[HN Gopher] You might not need a CRDT ___________________________________________________________________ You might not need a CRDT Author : paulgb Score : 264 points Date : 2022-12-05 13:57 UTC (9 hours ago) (HTM) web link (driftingin.space) (TXT) w3m dump (driftingin.space) | dustingetz wrote: | Remember, "conflict free" is the opposite of "transactional" | (ACID). When you choose conflict-free, you're choosing to give up | the ability to reject a change due to conflict and send it back | to the user for resolution. This is the crux of the issue in my | opinion. | JayStavis wrote: | Agreed - conflict resolution and authority handling are key to | advanced game networking techniques and underpin the | architecture of many big multiplayer games. | aboodman wrote: | I don't think this is true on both accounts: | | * There is no such thing as "conflict free". When users edit | data disconnected from each other, they can conflict, because | the user intent of the edits can be different and not possible | to reconcile. Nothing can fix that, not even crdts (As a | thought experiment - two users are working on a report about | ferrets. While offline, one decides the report should be about | lemurs, while the other decides it should be about otters. The | resulting set of changes will not be mergable without conflicts | in any reasonable sense.) | | * What CRDTs give you is actually _convergence_ which is very | important. But there are other ways to get convergence, and it | can be done transactionallym as long as your are willing to | weaken the D (durability). Replicache (my thing - | replicache.dev) is one way, the way described here is another. | | In general game networking is also transactional in this same | sense. Operations are transactions, that can affect multiple | data items and enforce arbitrary invariants. But the _order_ | they apply in drifts, and so durability is sacrificed. | | Stepping back even further, CRDTs are also often transactional | in the same way. Each operation in an op-based crdt can be | thought of a transaction that applies completely or not at all. | It's just that the set of available operations in a CRDT is | typically fixed. | dustingetz wrote: | Replicache is cool | | 1. So by "weaken durability" you mean basically allow | committed changes to be "lost" in the case of a common | conflict, aka give it up entirely? | | 2. CRDTs are meant to converge even in the presence of | network partitions (offline first), so you're saying your | committed transactions can be uncommitted? Or you're saying | writers have no idea if their transaction is committed | because it might converge the other way after arbitrary | delay? | | "Durability" means "transactions that have committed will | survive permanently" - wikipedia | aboodman wrote: | Thanks :) | | > So by "weaken durability" you mean basically allow | committed changes to be "lost" in the case of a common | conflict, aka give it up entirely? | | I should have said "weaken ordering". | | I mean that the transaction will run, but its order with | respect to other concurrently running transactions may be | different in the end than the submitting client initially | thought. As an extreme case, you submitted the transaction | and your local client thought you got the concert tickets, | but by the time it hits the server, the last ticket was | already sold. Transaction still runs: it just runs first | optimistically on the client, and then later | authoritatively on the server with a potentially different | result. Clients converge to serve view of the world and | application invariants are maintained regardless (number of | sold tickets doesn't go negative). | | Or just read Jepsen's take: | https://doc.replicache.dev/concepts/consistency | | In the case of concert tickets, this design doesn't make a | lot of sense, but in lots of other cases -- in particular | document editing applications -- it makes tons of sense. | The chance of concurrent edits is low, but it is critical | to maintain application-defined invariants when conflicts | do occur. | | > CRDTs are meant to converge even in the presence of | network partitions (offline first) | | Sure. Replicache converges in the presence of network | partitions, as does OP's design, as do most games, | distributed databases, etc. What CRDTs uniquely provide is | the ability to converge *without a leader* -- in a p2p | environment. As OP observes, this is not a property that | most applications today need. Because they have an obvious | leader: the server. | | > so you're saying your committed transactions can be | uncommitted? Or you're saying writers have no idea if their | transaction is committed because it might converge the | other way after arbitrary delay? | | They run optimistically (speculatively) on the client, and | then later authoritatively on the server. Application can | render the optimistic result, and then later the | authoritative result when it is known. | | This is exactly what CRDTs do. The only difference is that | CRDTs can do it without a leader, but the Replicache (and | OP's) design offers additional flexibility -- ability to | define your own operation/transactions. | | === | | We need new terminology for distributed systems. Terms | invented in the 70s aren't going to make sense. I believe | that "transactions" in the common parlance mean "a set of | code that runs together or not at all, isolated from other | concurrently running transactions". This is entirely | possible to provide with convergence if we are willing to | weaken ordering. | dustingetz wrote: | i accept we need new terminology 100% | | i accept we can weaken ordering, 100% | | under the model you propose, how does a writer know if | their transaction has been committed (by the authority) | and what is the upper bound on how long they will need to | wait to know the _final_ result? and is there an | operation for the writer to wait for it? | aboodman wrote: | Each client sends a sequence of mutations to the server, | each identified by a mutationID. During downstream sync, | server says up to which mutation it has processed. | | In Replicache, you can known when a mutation is finalized | with the `onSync` API, or more ergonomically, with the | upcoming `pendingMutation` API. | | https://doc.replicache.dev/api/classes/Replicache#onsync | https://doc.replicache.dev/api/classes/Replicache#experim | ent... | | PS- there actually is terminology for this already. | Replicache is technically a Causal+ Consistent | transactional system: | https://jepsen.io/consistency/models/causal | Sytten wrote: | Unless I am wrong, if your server uses end-to-end encryption you | are still better off with CRDT since the server cannot merge the | state. So there is a caveat to be said about this claim that you | only need CRDT for peer-to-peer scenarios. | paulgb wrote: | That's a good point, but the server can still enforce an | ordering of encrypted messages it can't read. You just don't | get the benefit of the server being able to detect invariant | violations in this case; all change operations would be sent to | all clients and the client state machine would throw away the | ones that created conflicts. | dgb23 wrote: | Quite a big takeaway here: | | > A general theme of successful multiplayer approaches we've seen | is not overcomplicating things. We've heard a number of companies | confess that their multiplayer approach feels naive -- especially | compared to the academic literature on the topic -- and yet it | works just fine in practice. | | The KISS and YAGNI principles apply. | | And on another note, it seems like one big industry is often | overlooked in these types of discussions. I wonder what web | development could learn from online game development when it | comes to "multiplayer" apps - I mean it's right there in the | name. | | One very pragmatic approach that strikes me as reasonable is | having fixed delay changes. The game Warcraft 3 (a real time | strategy game) did something like this. What it accomplishes is | that changes are basically forced to be in sync by delaying them. | | Note that you will get used to the delay very quickly, you won't | notice it too much after a short while. It is _much_ more | noticeable if the delays are variable, but a consistent delay | gets basically filtered out after a short while. | paulgb wrote: | > I wonder what web development could learn from online game | development when it comes to "multiplayer" apps - I mean it's | right there in the name. | | 100%. The architecture we used for https://plane.dev takes a | lot of inspiration from how game servers work, but using | HTTP/WebSocket instead of UDP. | | A lot of state sharing techniques that people are now using on | the web (like cursor position smoothing and optimistic updates) | have decades of precedent in multiplayer games. | | We've also seen more apps start to use an approach to pixel | streaming which more closely resembles stadia than traditional | VNC. https://digest.browsertech.com/archive/browsertech-digest- | wo... | JayStavis wrote: | The intersection of state replication and pixel streaming is | indeed interesting. Cloud games for example are usually using | game state sync techniques, and then add in the client<>gpu | latency on top of that. | | A true cloud native game (sadly part of Stadia's original | ambitions) would hopefully have a more e2e state management | approach inclusive of the pixel latency. | | This is very different from "local multiplayer" pixel | streaming, which may be what companies like Switchboard[1] | are doing: multiple clients control the same I/O channel | (e.g. you and I share a mouse on a single remote desktop) | | [1] https://www.switchboard.app/ | Joeri wrote: | _but using HTTP /WebSocket instead of UDP_ | | I think websockets run over UDP when hosted on http/3. | paulgb wrote: | They do, but with HTTP/3 there's also WebTransport, which | is even better because it means we will be able to avoid | head-of-line blocking. | munificent wrote: | There is definitely a lot we can learn from multiplayer games, | but the non-game use cases often have different constraints | that lead to different solutions. | | With a game, it's a given that everyone is playing in real- | time, right now, with as little lag as they can get away with. | The synchronization happens constantly and ideally the game's | internal state is completely in sync across all players. | Players _want_ the game to be kept in sync as quickly as | possible so that they can react to what other players are | doing. That means the amount of synchronization to perform at | any one point in time is generally fairly small. | | Also, the structure of the game itself often implicitly | partitions the game's mutable state among the players. Players | can obviously do things that affect each other and there are | mutable parts of the world that can be affected by multiple | players. But each player often controls their own avatar in the | game and has an implicit lock on that data. | | This means that there is rarely any "merging" happening. It's | usually just look at the events, determine an order to apply | them, and do them one at a time. If the result is a little | weird in terms of state (maybe a wall is built right when | another player is crossing it and the game lets the teleport | through), well, it's a game. The programmers can just declare | by fiat that it's part of the game mechanics. | | Outside of games, a common use case is "let me work on this | document offline for the duration of a flight and then resync | everything when I get to the airport". There may be hundreds of | changes and other users may also have huge batches of changes. | There can be a very complex reconciliation and megre process | because the changes are many and the data is precious. You | can't just say it's "part of the game" if your multi-user | spreadsheet goofs someone's tax statement. | smolder wrote: | I think the general term should be multi-actor. I understand | why people use the term multiplayer, I think: familiarity. But | players imply play which implies games, so it's not a great fit | outside of them. | | The type of thing you're talking about with Warcraft is a | deterministic simulation. As long as you have deterministic | simulation, and you synchronize your inputs by buffering | sufficiently for the input latency of all the remote | participants, that ensures everything stays 1:1 between each | local simulation. | yazaddaruvala wrote: | Honestly, I just restarting playing World of Warcraft, and even | tho there have been amazing tech improvements to Blizzard's | engine and servers, there is so much simple stuff lacking. | | I would honestly say it's likely better for game server | developers to be taking ideas from web development. | | When it comes to CRDT's everyone keeps talking about text- | editing, honestly one of the hardest problems for a CRDT to | solve. Text is entirely unstructured, requires extremely high | precision on the merge of any conflict. | | With a video game instead, players already used to "rewind and | replay" merges, they are already used to having a "miss" when | it should have been a "hit", and they roll with it because | while that is friction it's a game and precision isn't that | important. I'll say differently, precision is already not what | game servers provide. | | At the end of the day, game servers are "just" low-latency chat | applications with "bots" that process the messages. I'd love to | see the game server written by the maintainers of WhatsApp / | Messenger / Slack / etc. That would be a truly scalable game! | | P.S. If you think "chat is so simple, can't compare it to a | game server". Please just think through the design for FB's / | Discord's display widget that shows "currently active friends". | wtatum wrote: | This is an interesting idea but I do wonder if it can be | applied to generic collaborative editing "as is". The approach | for synchronized ticks (I think most RTS games do something | similar to the Blizzard approach) does require each participant | to "ack" the tick. You can see this when one participant is | lagging in that the simulation freezes for all players until | they acknowledge. The online gaming approach is to let other | players vote to kick someone who is lagging the game but that | probably wouldn't be considered suitable in something like | enterprise content management. | | Can some variation of the approach be taken without the strict | requirement that: - There is a known set of participants at any | time - The list of participants has a reasonable well-bounded | size - Peers can communicate with each other directly (WebRTC | is available but there are sharp corners) - It's acceptable to | freeze all players based on one lagging player - Need an | unambiguous way to handle simultaneous player actions against | shared state | | Most likely the "fix" for the friction is that you just slow | the ticks down _a lot_ compared to what you would do in a | multi-player game. | wongarsu wrote: | There are probably good techniques worth stealing from FPS or | MMORPGs. Those also usually run on a fixed tick rate of about | 50ms, but can't afford to have clients vote on it (FPS | because lag is unacceptable, MMORPGs because of user count). | There are various approaches in use, depending on whether you | want to just degrade the experience of the lagging | participant, or give preference based on action (e.g. FPS | resolve conflicts in favor of the person who fired). | matlin wrote: | There's actually an approach that sits in between this and formal | CRDTs. | | I had the same insight that having a total ordered log of changes | solves a lot of the issues with concurrent updates. We solved | this by creating one sequence CRDT that ensures all events from | clients eventually end up in the same order and then our data | structures are just reductions over those events. It's a little | bit like a distributed, synchronized Redux store. This let us | avoid having to think in terms of CRDT types (LWW registers, | Grow-only sets, sequences, etc) and just think in terms of the | domain-specific business logic (e.g. "what should happen when | someone marks a todo as done, after it's deleted?"). | | There's a little but more to so if this is interesting to anyone, | I can write a more detailed explanation. | lifeisstillgood wrote: | How do you solve the ordering? Timestamps can be wildly out of | whack? Be interested to hear more | matlin wrote: | We use logical timestamps instead of datetime. | | So each "event" has three properties that allow us to resolve | to a sensible order across clients: session (integer), | device_id (uuid), order (integer). The `session` number is | set by the highest `order` that your device has seen from the | server and the `order` gets incremented on each new event. | | So an example you might have a sequence of events like this. | [session] [order] 0 1 0 2 0 3 ---sync --- 3 4 3 5 3 6 | | We can then sort all events by (session, device_id, order) | and we'll get any events that happen on a device to be sorted | in a row even if some other device created a bunch of | concurrent events at the same time. | Karrot_Kream wrote: | What about situations where two different clients both | begin to publish new events from the same starting point? | Those events can't be absolutely ordered right? If you're | thinking of Lamport-style happens-before relations, then | you can't enforce total ordering of those events. Do you | just arbitrarily mark one of those client event streams as | failed and force the client to absorb new state and try | again? | LAC-Tech wrote: | Isn't that essentially doing a CRDT, but the replica is the | whole datastore? | | Ties in nicely with event sourcing - the ordered sequence of | events is the CRDT, and sending events between nodes are the | delta states. | derefr wrote: | Have you considered backing your application's state- | synchronization with a centralized CQRS/ES event store (e.g. | https://www.eventstore.com/)? You get the same semantics, | without paying the overhead of building them on top of a CRDT | abstraction; and the event store itself also "knows" (in the | sense that it supplies the framework and you supply the logic) | how to reduce over states in order to build snapshot states | ("aggregates") for clients to download for fast initial | synchronization. | matlin wrote: | We basically have that as well! We use CRDTs on the client to | allow for optimistic updates and fast replication of events | directly to other clients but we also do exactly what you | describe on the server so that a user loading the data for | the first time just gets the "snapshot" and then plays events | on top of it. | maclockard wrote: | I actually talked to the drifting in space team about our | experience implementing multiplayer. Really cool to see their | findings! | | Our team ended up having a similar take away, we wrote a blog | post detailing our approach and experience here: | https://hex.tech/blog/a-pragmatic-approach-to-live-collabora... | | For the most part is been surprisingly solid! I was very much | expecting to need to completely rewrite it by now, but its met | most of our present needs. | | The only real shortcoming is not being able to support line | specific commenting. While we _could_ support it, any minor | update to the text would result in the comment being marked | 'stale' with a high false positive rate. I've considered adopting | CRDT/OT just for the text editing portion to solve this. | jawadch93 wrote: | toxicFork wrote: | If I want to have an app that needs to eventually persist user | state to a server but still work well when the user is offline, | e.g. a personal vocabulary app that can work with multiple | devices but also can work when the user is offline, so the user | can still define new words manually on both devices, and then | when the phone is back online, the state still makes sense in | both, should I use CRDT? | eternalban wrote: | A definitive no is the answer. This complexity is necessary | only if others care about your shared state aka vocabulary, | and, multiple write ops are being done on same records at the | same relative time on different devices. For example, a | document editor _could_ use CRDTs without looking silly. A CRDT | is an _eventually consistent_ distributed object but all its | complexity is there to handle concurrent modifications. | | In your application, the consistency requirements are pretty | weak and easy to meet with a basic client/server architecture | with a well defined sync point set on its life-cycle. For | example, sync on boot, sync on net-reconnect, etc. Or, you | could use a simple P2P protocol so on boot you discover your | other connected devices and your apps can shake hand and | exchange notes to sync up as they modify their local state: | "Hey gang, + "KISS". "Keep it simple ..". Use json if you must. | | Further simple characteristics of this app is that the state | ops on your dictionary are commutative (like a CRDT!). On one | device you add word "Foo" and on the other you add "Bar". You | do not care (do you?) that you added them in a certain order, | so you don't even have to bother with what is rather difficult | to do in a performant manner: maintain a total ordering over | the state of the system. (Think sharded ordered-sets..) | fwip wrote: | There's a few edge cases that you'd probably need to handle | (like adding the same word with two different definitions), | but probably few enough that you could do them ad-hoc and | punt to the user. ("Hey, merging these two sets failed, | here's the conflict, which do you want to keep?") | | Something fancy like Operational Transforms or CRDT might | make your life easier if you find a library that's ergonomic | for your use case, but it's definitely not necessary. | jamil7 wrote: | The article talks about this, in the case where your clients | are syncing with a centralised server you don't necessarily | need to use CRDTs, and it might be simpler not to. | layer8 wrote: | This doesn't address how changes that were committed offline | are merged on the central server once the clients are online | again. | dinosaurdynasty wrote: | Basic CRDTs are still useful (idempotency, associativity, and | commutativity are powerful mental tools for all distributed | systems, even OT) but you can _massively_ simplify the CRDTs | if you assume a super-peer (aka centralized server all | updates go through). | | Example: last-write-wins register, which is a (v, t) pair, | normally has a merge function of "return the pair with the | greatest t (with deterministic tie breaking for same t)", and | for a super peer just have v's and "the latest value to hit | the server wins". | robertlagrant wrote: | Good shoutout to Automerge in this. | syrusakbary wrote: | > CRDTs are designed for decentralized systems where there is no | single central authority to decide what the final state should | be. There is some unavoidable performance and memory overhead | with doing this. Since Figma is centralized (our server is the | central authority), we can simplify our system by removing this | extra overhead and benefit from a faster and leaner | implementation | | I think this is on point. It's also super refreshing to see the | work on Aper [1] [2] (a Rust library implementing state machine | synchronization across a trusted network). Looking forward next | series of articles here! | | [1]: https://aper.dev/ | | [2]: https://github.com/drifting-in-space/aper | fwip wrote: | Just to clarify, Aper also uses a client/server model, correct? | It appears as though it uses a central server to totally order | incoming change requests. | paulgb wrote: | Yes, it does. | paulgb wrote: | Thanks, Syrus! | eatonphil wrote: | Great breakdown of CRDTs vs replicated state machines (like Raft | or VSR) and the difference between eventually convergent and | globally orderable workloads, thanks for this Paul! | satvikpendem wrote: | Does anyone have any resources on making an offline-first app | from scratch (I don't want to use stuff like Firebase and more | importantly want to learn how it's done)? Like a task manager for | example that can sync with an online account but still works | completely offline. | LAC-Tech wrote: | I swear I've looked into firebase multiple times and I have no | idea how they actually detect and handle conflicts. They just | hand-wave that it works offline... not that comforting. | | Regardless, there's no one size fits all for offline. | Approaches that work for append only data don't work for data | where multiple nodes are doing destructive updates. And even if | you're doing that, some data is amenable to deterministic | merging (ie, CRDTs) and some you need a user to step in to | decide (revision based multi-master systems, prior art being | Pouch/Couch or indeed git). | | Basically if you can tell me what exactly a taskmanager is and | how it might be updated I might be able to say something more | helfpul! | lewisjoe wrote: | This is the most underrated problem with CRDTs: | | > Both paths will allow us to ensure that each replica converges | on the same state, but that alone does not guarantee that this | state is the "correct" state from the point of view of user | intent. | | In context of richtext editors, it's easy to converge a rich-text | JSON into a single state for all collaborators. What's difficult | is to ensure that the converged state is renderable as richtext. | For example, is there a table cell that was inserted where a | column was deleted? | | I wrote a summary of this issue with CRDTs here - | https://writer.zohopublic.com/writer/published/grcwy5c699d67... | for those interested. | | While I'm optimistic it's not impossible to solve, it's | surprising why most CRDT literatures don't go into these details | on correctness. Appreciate the author for putting this out. | | PS: We currently use OT for collaboration in Zoho Writer | (https://writer.zoho.com/). I'm looking out for practical CRDT | ideas that works well with richtext. | truculent wrote: | To what extent is this an issue with CRDTs and to what extent | is it an issue with the data structure(s) used to represent the | state? | dkjaudyeqooe wrote: | I think the short answer is: if you want your CRDTs to | enforce application constraints you need to bake them into | the data structure. | truculent wrote: | That makes sense. Thank you. | [deleted] | jkaptur wrote: | The intuitively obvious approach here is to have some kind of | schema-guided or constraint-guided CRDT. That is, you get the | guarantee that not only do you end up with _valid JSON_ , you | get the guarantee that f(that_json) == true. I imagine this is | considerably easier said than done, but I'm curious if there's | research (or results!) in the area. | quickthrower2 wrote: | Parts of the tree with "don't merge children" might be ok, | then you show the user a conflict dialog when this happens. | | Or, inspired by your comment: run f for the tree, if false, | search for the smallest subset where f is false and show the | conflict dialog for that. The user chooses "mine" or | "theirs". | kevsim wrote: | We also had the OT vs. CRDT discussion with regards to our | editor at Kitemaker (kitemaker.co), and also ended up with OT. | For us there were a number of factors, but (in addition to the | case you mentioned) it came down to the fact that (at that | time) most of the CRDT libs like automerge generate huge diffs | that we'd need to send over the wire and also that the open | source editor we use (SlateJS) had operations that just slotted | very nicely into OT. | | Your editor feels really nice BTW. Well done! | expede wrote: | > I'm looking out for practical CRDT ideas that works well with | richtext. | | Have you seen Peritext from Ink & Switch? | https://www.inkandswitch.com/peritext/ It's relatively new, but | is a CRDT aimed at rich text! | jamil7 wrote: | Their linked summary mentions Peritext and the current | limitations. | | Agree it's exciting though and you can implement it fairly | easily yourself by following the blog post as it's so well | written and explained. | samwillis wrote: | Exactly, the "C" in CRDT is a little misleading. They are | "conflict free" in as much as they are able to merge and | resolve in the basic sense. That does not mean that the | resulting state is valid or meets your internal schema. | | A good example of this is using Yjs with ProseMirror, it's | possible for two concurrent edits to resolve to a structure | that doesn't meet your ProseMirror schema. An example I have | used before is a <figure> element that can contain a single | optional <caption> element. It's possible for two users to add | that caption to the figure concurrently, Yjs will happily merge | its XMLFragment structures and place two captions in the | figure. However when this is loaded into ProseMirror it will | see that the document does not match its schema and dump the | second caption. Fortunately the captions are ordered and so the | same state will re resolved by all parties. | | This is all ok if you are doing the merging on the front end, | but if you try to merge the documents on the server, not | involving ProseMirror, the structure you save, and potentially | index or save as html, will be incorrect. | | Point is, it's still essential with CRDTs to have a schema and | a validation/resolution process. That or you use completely | custom CRDTs that encodes into their resolution process the | validation of the schema. | lewisjoe wrote: | Your example is spot on :) | | I believe we shouldn't lock our richtext state to | Prosemirror's specs. The state should better be modelled as a | CRDT and ensuring correctness should be part of CRDT | operations. This way we unlock other possibilities like | server-side merging and the editing library can be swapped | (eg. lexical) | | All this sounds right but it's lot of work. | lijogdfljk wrote: | Yea, this is kinda why i don't understand a lot of these off | the shelf CRDT solutions. Like CRDTs for JSON. | | I'm still _trying_ to learn CRDTs; but early on in my | learning process i had the thought that CRDTs don 't have | mere data types - as you say, they have Data Type + Conflict | Resolution bundled into one. It's not enough to make a String | type, because different types need to be resolved | differently. Plus some types of resolution have a lot more | metadata, so you want to choose the best fitting one. | | I found that my goal to make SQL + CRDT meant i had to expose | this combo of Type + Resolution to the end user. It seems | essential to how the app works. | | But maybe i don't know anything, /shrug. I'm still learning | them. | toomim wrote: | Yes! We call this the "Merge Type" of data in the braid.org | group. | | Each datum as both a _data type_ and a _merge type_. The | programmer just needs to specify these two types, and then | the programming runtime can choose which synchronization | algorithm to use. | naasking wrote: | > Point is, it's still essential with CRDTs to have a schema | and a validation/resolution process. That or you use | completely custom CRDTs that encodes into their resolution | process the validation of the schema. | | I'm surprised that MS's concurrent revisions haven't taken | off because this is what they do by default: you specify a | custom merge function for any versioned data type so you can | resolve conflicts using any computable strategy. | CGamesPlay wrote: | I don't think I've seen any implementations of this, but the | table example you listed is solvable. The insight is to stop | thinking of the table as a bundle of markup and think about it | as itself a primitive CRDT type that contains text in the table | cells. | | The columns are a CRDT set of column IDs. The rows are a CRDT | map of row IDs to CRDT maps of column IDs to cell values, where | the cell values are again the same type as your document text. | Column headings are a specially formatted row. Rows are ordered | using a sequence type. | | Removing a row is a normal CRDT map deletion. Removing a column | is a normal CRDT map deletion from the columns map. Rendering | the table will ignore map entries that don't correspond to real | columns. | georgelyon wrote: | This comment really needs a "yo dawg" meme. | CGamesPlay wrote: | I mean, JSON is objects all the way down, CRDTs are | conflict-free maps all the way down. | inglor wrote: | So with your solution, the inserted table cell would | "disappear" from what is rendered? | | That's not necessarily (depends on the product) great or even | acceptable user experience. As a user I would certainly want | a way to know there is an "orphaned" "edited after delete" | cell to salvage the data inputted or understand the confusion | between editors. | jameshart wrote: | What you would 'certainly want' depends very much on the | application this is implemented in. The data isn't gone | from the CRDT table unless the program decides it is, so | many ui choices are possible. | georgelyon wrote: | The best UX resolution to this kind of issue I have seen is | to display "This table cell was added by Bob concurrently | to Alice removing its row. How would you like to proceed: | recover the row or ignore the change?" | drfuchs wrote: | Who gets the pop-up? If both Alice and Bob, and they | simultaneously give conflicting replies, you're right | back to where you started. | georgelyon wrote: | Yes, this is the correct behavior. If Alice and Bob | disagree about whether or not that row should be in the | table you aren't going to fix it with math. More often | than not, the expected outcome will be obvious (i.e. | spelling correction in a deleted cell can be ignored) and | if it isn't obvious, there will need to be human | interaction to determine the best path forward. | layer8 wrote: | This isn't really a problem. If Alice and Bob keep | disagreeing, then it is appropriate that a conflict | remains unresolved. | Dylan16807 wrote: | You're not back to where you started. You've gone from a | technical conflict to a direct disagreement about what is | supposed to happen. There's no technical solution to an | edit war. So maybe ask anew about the conflicting | answers, or pick one arbitrarily at that point. Or make | them play rock paper scissors. But it still helps to ask | _before_ it becomes a direct disagreement. | jlokier wrote: | Usually that doesn't happen because their replies don't | conflict within the time interval of a round trip between | the two of them, so one can see the other's action first, | and decide to agree. | | But if it does happen, either they get a new popup, | eventually, saying that their conflict-resolution | decisions also conflicted and what would they like to do, | or the system decays to a CRDT-style pre-determined | resolution rule for the second-order conflict -- with | similar problems to the original CRDT resolution rule we | tried to avoid. Things like "give priority to the person | who chose to keep information instead of deleting", "give | priority to the person with the biggest random number", | "merge them by keeping information along with a note in | the document about the attempt to delete the column", or | "delete the column and put the edited cell in a note". | But this time, only when the users conflict after the | first popup, so it doesn't occur as often. | | I think you're describing a metastability effect which | occurs in all distributed systems that don't use pre- | determined resolution rules like pure CRDTs. It happens | in network protocols, distributed transactions, and | consensus protocols in general. If the process you use to | resolve depends on input from multiple parties who don't | have a way to communicate and could choose conflicting | values, there's a non-zero probability of needing another | round or to escalate upwards to a higher-level system to | resolve. | | In many technical systems, _provided there 's a way to | communicate_ it's not difficult to make the probability | of repeated rounds or escalations tend arbitrarily close | to zero (but not actually zero). If there's a possibility | of network partition, though, the conflict can come back | later when communication returns; there's no way to | completely make it go away. | tomhallett wrote: | Stupid question - are there any good resources on User | Experience patterns/recipes for implementing OT/CRDTs? | | Context - a lot of the literature is about the backend | algorithm. But that still leaves individual teams | tripping over the frontend interaction design and | usability. | | Example of what I'm looking for - hex.tech has a video | tutorial of their "Cell locking" feature, where the cell | gets disabled and a "Mac Lockard has taken control of a | cell you were editing" toast notification appears. [1]. | This is fully baked and a recipe which if I put in a jira | ticket could actually get implemented. :) | | I would love to read a book which is a "Don't make me | think" and has an example with "This table cell was added | by Bob concurrently to Alice removing its row. How would | you like to proceed: recover the row or ignore the | change?". All the UI's in my head would look like | "Clippy" from MS Word - or "Pipey" from Silicon Valley | [2], haha. | | 1: https://hex.tech/blog/a-pragmatic-approach-to-live- | collabora... | | 2: https://www.youtube.com/watch?v=E2YcOV5C2x4 | toomim wrote: | This is close: https://decentpatterns.com/library/ | | ...but my general belief is that UI patterns need to | evolve as (1) specific UIs before they become (2) general | patterns, and we simply don't have enough good solutions | yet to make a pattern library out of them. | jdvh wrote: | For the table situation we solve this by have a generic concept | of a "card" that is rendered as a table cell when it's the | grandchild of a table (table > tr > td) but otherwise it's just | a standalone item, much like a markdown callout. | aboodman wrote: | This is a cool approach. It reminds me of statebox by mochimedia: | https://github.com/mochi/statebox. | | If I'm understanding correctly, it requires the mutations to be | deterministic in order for the nodes to converge. | | Replicache (replicache.dev - my thing) takes a similar approach | except it does _not_ requires the mutations to be deterministic, | which is very useful because it enables the mutations to reach | out to other services, be authenticated, etc. | | Both the idea here and Replicache's approach are closely related | to game networking. If you are interested in these ideas, a | really excellent set of content is: | https://www.gabrielgambetta.com/client-server-game-architect.... | | If you are interested in how Replicache's sync protocol works and | achieves transactional changes with server authority, it is | described here: https://doc.replicache.dev/concepts/how-it-works. | anonymousDan wrote: | Am I missing something? I'm all for SMR when possible but the | whole point of moving to something with weaker consistency is | that you don't want to rely on availability of a central server | (or quorum of servers in the case of SMR)? | LAC-Tech wrote: | First off, well done on examining your problem domain, realising | that your operations are not in-fact commutative, and realising | you need to discard your model. A lot of people would have just | decided it was 'agile' to 'iterate' over there fundamentally | broken solution until they had coded themselves in a corner, but | I digress. | | I would note that your path 2 is essentially another kind of CRDT | though. Instead of your replica being in individual document, | your replica is now the entire DB, and your changes are a grow | only sequence. (Wish I could remember how pointed this out to me | on HN, my mind was blown). | wim wrote: | For certain data structures the "pure" CRDT approach can get | tricky. In our case, we're building a collaborative notes/task | IDE [1] which is a hybrid of a text editor and an outliner at the | same time. So you can perform all the usual text editor | operations on a document as if it was text, but the underlying | data structure is a tree (or graph really). | | So just like the example in article we use a hybrid approach and | global order for certain (tree) operations to enforce invariants | on the data structure when syncing. That still works with | offline-mode and end-to-end encryption, but as we don't have the | requirement to get rid of a central server we can get away with | this model. | | [1] https://thymer.com | rubyist5eva wrote: | "You might not (probably need) need $FANCY_TRENDY_THING" | acrefoot wrote: | Whatever Clickup uses for their Clickup docs, it reliably gets | into a confused state and data is lost, during which not all | editors show the same state locally. | hardwaregeek wrote: | I really loved the Signals and Thread episode[0] on state machine | replication. It basically said "you could do all of these fancy | distributed systems techniques that assume a non total ordering | of events, ooooorr you could have a main server that determines a | total ordering for events and make your life infinitely easier" | | [0]: https://signalsandthreads.com/state-machine-replication- | and-... | Joeri wrote: | It breaks down as soon as you have an offline scenario. You can | only assume the server is authorative if there will not be | network partitions or if writes are not accepted on the other | side of the network partition (meaning on the client). As soon | as the client accepts writes while offline those writes become | authorative and will need to be merged with whatever changes | the server saw. | layer8 wrote: | It also breaks down when you have a passive/non-transactional | server, like an app syncing its state via files on Dropbox. | derefr wrote: | Or even just have a simple central presence server hand out | randomized-per-epoch node-IDs; have all nodes in the system use | node-IDs as the prefix of the event temporal ordering key per | "tick"; and trust your nodes to self-report their node-ID | correctly in all events they send. Then your data protocol can | still be p2p rather than all events having to flow through the | server (if that benefits you in your problem domain) while | getting all the same advantages of total ordering. | | And this works great for anything enterprise-y, where all the | nodes are operated by one party or a trusted consortium of | parties, and no nodes are are intentionally trying to cheat by | misreporting their assigned node-IDs. You only _need_ central | servers (or virtual central servers, a.k.a. blockchains) when | you get to things like MMO games, or HFT trading bots talking | to an exchange, where every client has a reason to want to | pretend it 's the one that "should" go earliest in the per-tick | transaction ordering. | lifeisstillgood wrote: | But if I have a conflict (ie one node changes a column | datatype, another adds data to column) how do i know which | one went first _inside the same tick?_ | | If I understand you, there will be a tick between 12.01 and | 12.02, and node A gets token 1234, and node B gets 5678. | Events are published p2p, and the ordering is 5678_+9seconds | and 1234_+12seconds - node A (1234) admits it has done the | action 3 seconds after B. But this still depends on | synchronised clocks I think? | | Also you still need to resolve each type of conflict (hence | having CRDTs really helps) - we know there is a conflict | between A and B changes, but how do we _resolve_ them? | | Maybe I dont get it :-) | meheleventyone wrote: | Yeah using an event stream like this requires synchronising | the clock of participants. It's similar to rollback | networking in games. Peers can recover from receiving late | updates but if all the updates are late from one peer the | situation still degrades. To avoid that participants run | time per tick slightly shorter or longer to keep within a | desired window. | pookha wrote: | I thought James Long's 200 or three hundred lines of code to | implement a real-world CRDT solution (Actual) was pretty cool. | Seemed to solve a lot of issues revolving around heavy duty | servers. | [deleted] | mathgladiator wrote: | Another reason to not need a CRDT is privacy. A CRDT assumes that | every participate has equal rights to all data. | | I'm biased to simple WebSockets with OT using JSON deltas, and | I'm building a platform that greatly simplifies how to | efficiently build collaborative apps: https://www.adama- | platform.com/ | CGamesPlay wrote: | These are orthogonal to one another. For read-protection, | encryption is a viable solution: specific paths into your data | structure (e.g. certain keys in a map) can be encrypted with a | key that isn't provided to all participants. For write- | protection, the solution is the same as with any syncing | service: when receiving an update, if you don't authorize the | sender of the update, you don't accept it. | samwillis wrote: | > A CRDT assumes that every participate has equal rights to all | data. | | Within a single data structure yes, so within a "room" or | "document". If you need to partition right to the data | (read/update) then you should use separate structures for each | area. | | Yjs implements this as as "sub documents". | charles_f wrote: | That sounds like a weak counter argument to me, 1) crdt is | focusing on the conflict resolution part, access control is not | in scope so you need to implement on top. 2) if you are already | implementing access control, then filtering out what people | don't have access to doesn't seem much more complexity. If you | go the full client-side way, you can also add a cryptographic | layer to whatever needs hiding 3) one thing I don't see in your | solution is the offline aspect to it. With a central authority | in the middle and online connectivity, conflicts become | unlikely and much smaller, but with offline support, documents | et can evolve in drastic ways where having a formalized | strategy like what crdt offers makes sense | pookha wrote: | CRDT's can be encrypted end-to-end. Privacy seems to be one of | the selling points. | pattrn wrote: | Is this really the case? While CRDT's are designed to work | peer-to-peer, they don't need to be fully connected to all | clients. Forcing the synchronization through controlled nodes | (a server or cluster) allows adding read/write permissions. | Depending on the use case, it may require additional logic for | reversing operations before propagating to other clients, or in | some cases forcing a client to revert to a snapshot (this can | be a bit complex). That's an approach I've used in the past. | | Have I overlooked something (highly likely)? | dxchester wrote: | We have used Automerge a bunch, but found that there is a | threshold where beyond a given document size, performance gets | exponentially worse, until even trivial updates take many | seconds' worth of CPU. That is often how it works when the | document end state is exclusively the sum of all the edits that | have ever happened. | | Our answer was to reimplement the Automerge API with different | mechanics underneath that allows for a "snapshots + recent | changes" paradigm, instead of "the doc is the sum of all | changes". That way performance doesn't have to degrade over time | as changes accumulate. | | Project is here: https://github.com/frameable/pigeon, with some | benchmarks: https://github.com/frameable/pigeon/wiki/Benchmarks | in the wiki... | supermatt wrote: | > Conflict-free replicated datasets (CRDTs) | | Conflict-free Replicated Data Types, surely? | paulgb wrote: | Oof, in the first sentence, too, that's embarrassing. Fixed, | thanks! | supermatt wrote: | No problem :) | nobu-mori wrote: | CRDTs have been on HN a lot recently. I'm working on a database | that deals in events rather than raw data. Application developers | specify event handlers in JavaScript. The database layer then | takes the event handlers and distributes them to the clients. | Clients receive the raw stream of events and reconcile the final | data state independently. The key aspect is all the event | handlers are reversible. This allows clients to insert locally | generated events immediately. If any remote events are received | out-of-order, the client can undo events, insert the new events, | and reapply everything on top of the new state. | | I'm curious how many people have a need for solving the real-time | multi-player use case. The database I'm working on was inspired | by the networking code used in fighting games (rollback netcode), | but I'm always curious to learn about additional use cases. | no_wizard wrote: | I think the most common outside of gaming is collaboration | software. Think whiteboards, docs (e.g. Google Docs) and other | collaborative tools. ___________________________________________________________________ (page generated 2022-12-05 23:00 UTC)