[HN Gopher] You might not need a CRDT
       ___________________________________________________________________
        
       You might not need a CRDT
        
       Author : paulgb
       Score  : 264 points
       Date   : 2022-12-05 13:57 UTC (9 hours ago)
        
 (HTM) web link (driftingin.space)
 (TXT) w3m dump (driftingin.space)
        
       | dustingetz wrote:
       | Remember, "conflict free" is the opposite of "transactional"
       | (ACID). When you choose conflict-free, you're choosing to give up
       | the ability to reject a change due to conflict and send it back
       | to the user for resolution. This is the crux of the issue in my
       | opinion.
        
         | JayStavis wrote:
         | Agreed - conflict resolution and authority handling are key to
         | advanced game networking techniques and underpin the
         | architecture of many big multiplayer games.
        
         | aboodman wrote:
         | I don't think this is true on both accounts:
         | 
         | * There is no such thing as "conflict free". When users edit
         | data disconnected from each other, they can conflict, because
         | the user intent of the edits can be different and not possible
         | to reconcile. Nothing can fix that, not even crdts (As a
         | thought experiment - two users are working on a report about
         | ferrets. While offline, one decides the report should be about
         | lemurs, while the other decides it should be about otters. The
         | resulting set of changes will not be mergable without conflicts
         | in any reasonable sense.)
         | 
         | * What CRDTs give you is actually _convergence_ which is very
         | important. But there are other ways to get convergence, and it
         | can be done transactionallym as long as your are willing to
         | weaken the D (durability). Replicache (my thing -
         | replicache.dev) is one way, the way described here is another.
         | 
         | In general game networking is also transactional in this same
         | sense. Operations are transactions, that can affect multiple
         | data items and enforce arbitrary invariants. But the _order_
         | they apply in drifts, and so durability is sacrificed.
         | 
         | Stepping back even further, CRDTs are also often transactional
         | in the same way. Each operation in an op-based crdt can be
         | thought of a transaction that applies completely or not at all.
         | It's just that the set of available operations in a CRDT is
         | typically fixed.
        
           | dustingetz wrote:
           | Replicache is cool
           | 
           | 1. So by "weaken durability" you mean basically allow
           | committed changes to be "lost" in the case of a common
           | conflict, aka give it up entirely?
           | 
           | 2. CRDTs are meant to converge even in the presence of
           | network partitions (offline first), so you're saying your
           | committed transactions can be uncommitted? Or you're saying
           | writers have no idea if their transaction is committed
           | because it might converge the other way after arbitrary
           | delay?
           | 
           | "Durability" means "transactions that have committed will
           | survive permanently" - wikipedia
        
             | aboodman wrote:
             | Thanks :)
             | 
             | > So by "weaken durability" you mean basically allow
             | committed changes to be "lost" in the case of a common
             | conflict, aka give it up entirely?
             | 
             | I should have said "weaken ordering".
             | 
             | I mean that the transaction will run, but its order with
             | respect to other concurrently running transactions may be
             | different in the end than the submitting client initially
             | thought. As an extreme case, you submitted the transaction
             | and your local client thought you got the concert tickets,
             | but by the time it hits the server, the last ticket was
             | already sold. Transaction still runs: it just runs first
             | optimistically on the client, and then later
             | authoritatively on the server with a potentially different
             | result. Clients converge to serve view of the world and
             | application invariants are maintained regardless (number of
             | sold tickets doesn't go negative).
             | 
             | Or just read Jepsen's take:
             | https://doc.replicache.dev/concepts/consistency
             | 
             | In the case of concert tickets, this design doesn't make a
             | lot of sense, but in lots of other cases -- in particular
             | document editing applications -- it makes tons of sense.
             | The chance of concurrent edits is low, but it is critical
             | to maintain application-defined invariants when conflicts
             | do occur.
             | 
             | > CRDTs are meant to converge even in the presence of
             | network partitions (offline first)
             | 
             | Sure. Replicache converges in the presence of network
             | partitions, as does OP's design, as do most games,
             | distributed databases, etc. What CRDTs uniquely provide is
             | the ability to converge *without a leader* -- in a p2p
             | environment. As OP observes, this is not a property that
             | most applications today need. Because they have an obvious
             | leader: the server.
             | 
             | > so you're saying your committed transactions can be
             | uncommitted? Or you're saying writers have no idea if their
             | transaction is committed because it might converge the
             | other way after arbitrary delay?
             | 
             | They run optimistically (speculatively) on the client, and
             | then later authoritatively on the server. Application can
             | render the optimistic result, and then later the
             | authoritative result when it is known.
             | 
             | This is exactly what CRDTs do. The only difference is that
             | CRDTs can do it without a leader, but the Replicache (and
             | OP's) design offers additional flexibility -- ability to
             | define your own operation/transactions.
             | 
             | ===
             | 
             | We need new terminology for distributed systems. Terms
             | invented in the 70s aren't going to make sense. I believe
             | that "transactions" in the common parlance mean "a set of
             | code that runs together or not at all, isolated from other
             | concurrently running transactions". This is entirely
             | possible to provide with convergence if we are willing to
             | weaken ordering.
        
               | dustingetz wrote:
               | i accept we need new terminology 100%
               | 
               | i accept we can weaken ordering, 100%
               | 
               | under the model you propose, how does a writer know if
               | their transaction has been committed (by the authority)
               | and what is the upper bound on how long they will need to
               | wait to know the _final_ result? and is there an
               | operation for the writer to wait for it?
        
               | aboodman wrote:
               | Each client sends a sequence of mutations to the server,
               | each identified by a mutationID. During downstream sync,
               | server says up to which mutation it has processed.
               | 
               | In Replicache, you can known when a mutation is finalized
               | with the `onSync` API, or more ergonomically, with the
               | upcoming `pendingMutation` API.
               | 
               | https://doc.replicache.dev/api/classes/Replicache#onsync 
               | https://doc.replicache.dev/api/classes/Replicache#experim
               | ent...
               | 
               | PS- there actually is terminology for this already.
               | Replicache is technically a Causal+ Consistent
               | transactional system:
               | https://jepsen.io/consistency/models/causal
        
       | Sytten wrote:
       | Unless I am wrong, if your server uses end-to-end encryption you
       | are still better off with CRDT since the server cannot merge the
       | state. So there is a caveat to be said about this claim that you
       | only need CRDT for peer-to-peer scenarios.
        
         | paulgb wrote:
         | That's a good point, but the server can still enforce an
         | ordering of encrypted messages it can't read. You just don't
         | get the benefit of the server being able to detect invariant
         | violations in this case; all change operations would be sent to
         | all clients and the client state machine would throw away the
         | ones that created conflicts.
        
       | dgb23 wrote:
       | Quite a big takeaway here:
       | 
       | > A general theme of successful multiplayer approaches we've seen
       | is not overcomplicating things. We've heard a number of companies
       | confess that their multiplayer approach feels naive -- especially
       | compared to the academic literature on the topic -- and yet it
       | works just fine in practice.
       | 
       | The KISS and YAGNI principles apply.
       | 
       | And on another note, it seems like one big industry is often
       | overlooked in these types of discussions. I wonder what web
       | development could learn from online game development when it
       | comes to "multiplayer" apps - I mean it's right there in the
       | name.
       | 
       | One very pragmatic approach that strikes me as reasonable is
       | having fixed delay changes. The game Warcraft 3 (a real time
       | strategy game) did something like this. What it accomplishes is
       | that changes are basically forced to be in sync by delaying them.
       | 
       | Note that you will get used to the delay very quickly, you won't
       | notice it too much after a short while. It is _much_ more
       | noticeable if the delays are variable, but a consistent delay
       | gets basically filtered out after a short while.
        
         | paulgb wrote:
         | > I wonder what web development could learn from online game
         | development when it comes to "multiplayer" apps - I mean it's
         | right there in the name.
         | 
         | 100%. The architecture we used for https://plane.dev takes a
         | lot of inspiration from how game servers work, but using
         | HTTP/WebSocket instead of UDP.
         | 
         | A lot of state sharing techniques that people are now using on
         | the web (like cursor position smoothing and optimistic updates)
         | have decades of precedent in multiplayer games.
         | 
         | We've also seen more apps start to use an approach to pixel
         | streaming which more closely resembles stadia than traditional
         | VNC. https://digest.browsertech.com/archive/browsertech-digest-
         | wo...
        
           | JayStavis wrote:
           | The intersection of state replication and pixel streaming is
           | indeed interesting. Cloud games for example are usually using
           | game state sync techniques, and then add in the client<>gpu
           | latency on top of that.
           | 
           | A true cloud native game (sadly part of Stadia's original
           | ambitions) would hopefully have a more e2e state management
           | approach inclusive of the pixel latency.
           | 
           | This is very different from "local multiplayer" pixel
           | streaming, which may be what companies like Switchboard[1]
           | are doing: multiple clients control the same I/O channel
           | (e.g. you and I share a mouse on a single remote desktop)
           | 
           | [1] https://www.switchboard.app/
        
           | Joeri wrote:
           | _but using HTTP /WebSocket instead of UDP_
           | 
           | I think websockets run over UDP when hosted on http/3.
        
             | paulgb wrote:
             | They do, but with HTTP/3 there's also WebTransport, which
             | is even better because it means we will be able to avoid
             | head-of-line blocking.
        
         | munificent wrote:
         | There is definitely a lot we can learn from multiplayer games,
         | but the non-game use cases often have different constraints
         | that lead to different solutions.
         | 
         | With a game, it's a given that everyone is playing in real-
         | time, right now, with as little lag as they can get away with.
         | The synchronization happens constantly and ideally the game's
         | internal state is completely in sync across all players.
         | Players _want_ the game to be kept in sync as quickly as
         | possible so that they can react to what other players are
         | doing. That means the amount of synchronization to perform at
         | any one point in time is generally fairly small.
         | 
         | Also, the structure of the game itself often implicitly
         | partitions the game's mutable state among the players. Players
         | can obviously do things that affect each other and there are
         | mutable parts of the world that can be affected by multiple
         | players. But each player often controls their own avatar in the
         | game and has an implicit lock on that data.
         | 
         | This means that there is rarely any "merging" happening. It's
         | usually just look at the events, determine an order to apply
         | them, and do them one at a time. If the result is a little
         | weird in terms of state (maybe a wall is built right when
         | another player is crossing it and the game lets the teleport
         | through), well, it's a game. The programmers can just declare
         | by fiat that it's part of the game mechanics.
         | 
         | Outside of games, a common use case is "let me work on this
         | document offline for the duration of a flight and then resync
         | everything when I get to the airport". There may be hundreds of
         | changes and other users may also have huge batches of changes.
         | There can be a very complex reconciliation and megre process
         | because the changes are many and the data is precious. You
         | can't just say it's "part of the game" if your multi-user
         | spreadsheet goofs someone's tax statement.
        
         | smolder wrote:
         | I think the general term should be multi-actor. I understand
         | why people use the term multiplayer, I think: familiarity. But
         | players imply play which implies games, so it's not a great fit
         | outside of them.
         | 
         | The type of thing you're talking about with Warcraft is a
         | deterministic simulation. As long as you have deterministic
         | simulation, and you synchronize your inputs by buffering
         | sufficiently for the input latency of all the remote
         | participants, that ensures everything stays 1:1 between each
         | local simulation.
        
         | yazaddaruvala wrote:
         | Honestly, I just restarting playing World of Warcraft, and even
         | tho there have been amazing tech improvements to Blizzard's
         | engine and servers, there is so much simple stuff lacking.
         | 
         | I would honestly say it's likely better for game server
         | developers to be taking ideas from web development.
         | 
         | When it comes to CRDT's everyone keeps talking about text-
         | editing, honestly one of the hardest problems for a CRDT to
         | solve. Text is entirely unstructured, requires extremely high
         | precision on the merge of any conflict.
         | 
         | With a video game instead, players already used to "rewind and
         | replay" merges, they are already used to having a "miss" when
         | it should have been a "hit", and they roll with it because
         | while that is friction it's a game and precision isn't that
         | important. I'll say differently, precision is already not what
         | game servers provide.
         | 
         | At the end of the day, game servers are "just" low-latency chat
         | applications with "bots" that process the messages. I'd love to
         | see the game server written by the maintainers of WhatsApp /
         | Messenger / Slack / etc. That would be a truly scalable game!
         | 
         | P.S. If you think "chat is so simple, can't compare it to a
         | game server". Please just think through the design for FB's /
         | Discord's display widget that shows "currently active friends".
        
         | wtatum wrote:
         | This is an interesting idea but I do wonder if it can be
         | applied to generic collaborative editing "as is". The approach
         | for synchronized ticks (I think most RTS games do something
         | similar to the Blizzard approach) does require each participant
         | to "ack" the tick. You can see this when one participant is
         | lagging in that the simulation freezes for all players until
         | they acknowledge. The online gaming approach is to let other
         | players vote to kick someone who is lagging the game but that
         | probably wouldn't be considered suitable in something like
         | enterprise content management.
         | 
         | Can some variation of the approach be taken without the strict
         | requirement that: - There is a known set of participants at any
         | time - The list of participants has a reasonable well-bounded
         | size - Peers can communicate with each other directly (WebRTC
         | is available but there are sharp corners) - It's acceptable to
         | freeze all players based on one lagging player - Need an
         | unambiguous way to handle simultaneous player actions against
         | shared state
         | 
         | Most likely the "fix" for the friction is that you just slow
         | the ticks down _a lot_ compared to what you would do in a
         | multi-player game.
        
           | wongarsu wrote:
           | There are probably good techniques worth stealing from FPS or
           | MMORPGs. Those also usually run on a fixed tick rate of about
           | 50ms, but can't afford to have clients vote on it (FPS
           | because lag is unacceptable, MMORPGs because of user count).
           | There are various approaches in use, depending on whether you
           | want to just degrade the experience of the lagging
           | participant, or give preference based on action (e.g. FPS
           | resolve conflicts in favor of the person who fired).
        
       | matlin wrote:
       | There's actually an approach that sits in between this and formal
       | CRDTs.
       | 
       | I had the same insight that having a total ordered log of changes
       | solves a lot of the issues with concurrent updates. We solved
       | this by creating one sequence CRDT that ensures all events from
       | clients eventually end up in the same order and then our data
       | structures are just reductions over those events. It's a little
       | bit like a distributed, synchronized Redux store. This let us
       | avoid having to think in terms of CRDT types (LWW registers,
       | Grow-only sets, sequences, etc) and just think in terms of the
       | domain-specific business logic (e.g. "what should happen when
       | someone marks a todo as done, after it's deleted?").
       | 
       | There's a little but more to so if this is interesting to anyone,
       | I can write a more detailed explanation.
        
         | lifeisstillgood wrote:
         | How do you solve the ordering? Timestamps can be wildly out of
         | whack? Be interested to hear more
        
           | matlin wrote:
           | We use logical timestamps instead of datetime.
           | 
           | So each "event" has three properties that allow us to resolve
           | to a sensible order across clients: session (integer),
           | device_id (uuid), order (integer). The `session` number is
           | set by the highest `order` that your device has seen from the
           | server and the `order` gets incremented on each new event.
           | 
           | So an example you might have a sequence of events like this.
           | [session] [order] 0 1 0 2 0 3 ---sync --- 3 4 3 5 3 6
           | 
           | We can then sort all events by (session, device_id, order)
           | and we'll get any events that happen on a device to be sorted
           | in a row even if some other device created a bunch of
           | concurrent events at the same time.
        
             | Karrot_Kream wrote:
             | What about situations where two different clients both
             | begin to publish new events from the same starting point?
             | Those events can't be absolutely ordered right? If you're
             | thinking of Lamport-style happens-before relations, then
             | you can't enforce total ordering of those events. Do you
             | just arbitrarily mark one of those client event streams as
             | failed and force the client to absorb new state and try
             | again?
        
         | LAC-Tech wrote:
         | Isn't that essentially doing a CRDT, but the replica is the
         | whole datastore?
         | 
         | Ties in nicely with event sourcing - the ordered sequence of
         | events is the CRDT, and sending events between nodes are the
         | delta states.
        
         | derefr wrote:
         | Have you considered backing your application's state-
         | synchronization with a centralized CQRS/ES event store (e.g.
         | https://www.eventstore.com/)? You get the same semantics,
         | without paying the overhead of building them on top of a CRDT
         | abstraction; and the event store itself also "knows" (in the
         | sense that it supplies the framework and you supply the logic)
         | how to reduce over states in order to build snapshot states
         | ("aggregates") for clients to download for fast initial
         | synchronization.
        
           | matlin wrote:
           | We basically have that as well! We use CRDTs on the client to
           | allow for optimistic updates and fast replication of events
           | directly to other clients but we also do exactly what you
           | describe on the server so that a user loading the data for
           | the first time just gets the "snapshot" and then plays events
           | on top of it.
        
       | maclockard wrote:
       | I actually talked to the drifting in space team about our
       | experience implementing multiplayer. Really cool to see their
       | findings!
       | 
       | Our team ended up having a similar take away, we wrote a blog
       | post detailing our approach and experience here:
       | https://hex.tech/blog/a-pragmatic-approach-to-live-collabora...
       | 
       | For the most part is been surprisingly solid! I was very much
       | expecting to need to completely rewrite it by now, but its met
       | most of our present needs.
       | 
       | The only real shortcoming is not being able to support line
       | specific commenting. While we _could_ support it, any minor
       | update to the text would result in the comment being marked
       | 'stale' with a high false positive rate. I've considered adopting
       | CRDT/OT just for the text editing portion to solve this.
        
       | jawadch93 wrote:
        
       | toxicFork wrote:
       | If I want to have an app that needs to eventually persist user
       | state to a server but still work well when the user is offline,
       | e.g. a personal vocabulary app that can work with multiple
       | devices but also can work when the user is offline, so the user
       | can still define new words manually on both devices, and then
       | when the phone is back online, the state still makes sense in
       | both, should I use CRDT?
        
         | eternalban wrote:
         | A definitive no is the answer. This complexity is necessary
         | only if others care about your shared state aka vocabulary,
         | and, multiple write ops are being done on same records at the
         | same relative time on different devices. For example, a
         | document editor _could_ use CRDTs without looking silly. A CRDT
         | is an _eventually consistent_ distributed object but all its
         | complexity is there to handle concurrent modifications.
         | 
         | In your application, the consistency requirements are pretty
         | weak and easy to meet with a basic client/server architecture
         | with a well defined sync point set on its life-cycle. For
         | example, sync on boot, sync on net-reconnect, etc. Or, you
         | could use a simple P2P protocol so on boot you discover your
         | other connected devices and your apps can shake hand and
         | exchange notes to sync up as they modify their local state:
         | "Hey gang, + "KISS". "Keep it simple ..". Use json if you must.
         | 
         | Further simple characteristics of this app is that the state
         | ops on your dictionary are commutative (like a CRDT!). On one
         | device you add word "Foo" and on the other you add "Bar". You
         | do not care (do you?) that you added them in a certain order,
         | so you don't even have to bother with what is rather difficult
         | to do in a performant manner: maintain a total ordering over
         | the state of the system. (Think sharded ordered-sets..)
        
           | fwip wrote:
           | There's a few edge cases that you'd probably need to handle
           | (like adding the same word with two different definitions),
           | but probably few enough that you could do them ad-hoc and
           | punt to the user. ("Hey, merging these two sets failed,
           | here's the conflict, which do you want to keep?")
           | 
           | Something fancy like Operational Transforms or CRDT might
           | make your life easier if you find a library that's ergonomic
           | for your use case, but it's definitely not necessary.
        
         | jamil7 wrote:
         | The article talks about this, in the case where your clients
         | are syncing with a centralised server you don't necessarily
         | need to use CRDTs, and it might be simpler not to.
        
           | layer8 wrote:
           | This doesn't address how changes that were committed offline
           | are merged on the central server once the clients are online
           | again.
        
           | dinosaurdynasty wrote:
           | Basic CRDTs are still useful (idempotency, associativity, and
           | commutativity are powerful mental tools for all distributed
           | systems, even OT) but you can _massively_ simplify the CRDTs
           | if you assume a super-peer (aka centralized server all
           | updates go through).
           | 
           | Example: last-write-wins register, which is a (v, t) pair,
           | normally has a merge function of "return the pair with the
           | greatest t (with deterministic tie breaking for same t)", and
           | for a super peer just have v's and "the latest value to hit
           | the server wins".
        
       | robertlagrant wrote:
       | Good shoutout to Automerge in this.
        
       | syrusakbary wrote:
       | > CRDTs are designed for decentralized systems where there is no
       | single central authority to decide what the final state should
       | be. There is some unavoidable performance and memory overhead
       | with doing this. Since Figma is centralized (our server is the
       | central authority), we can simplify our system by removing this
       | extra overhead and benefit from a faster and leaner
       | implementation
       | 
       | I think this is on point. It's also super refreshing to see the
       | work on Aper [1] [2] (a Rust library implementing state machine
       | synchronization across a trusted network). Looking forward next
       | series of articles here!
       | 
       | [1]: https://aper.dev/
       | 
       | [2]: https://github.com/drifting-in-space/aper
        
         | fwip wrote:
         | Just to clarify, Aper also uses a client/server model, correct?
         | It appears as though it uses a central server to totally order
         | incoming change requests.
        
           | paulgb wrote:
           | Yes, it does.
        
         | paulgb wrote:
         | Thanks, Syrus!
        
       | eatonphil wrote:
       | Great breakdown of CRDTs vs replicated state machines (like Raft
       | or VSR) and the difference between eventually convergent and
       | globally orderable workloads, thanks for this Paul!
        
       | satvikpendem wrote:
       | Does anyone have any resources on making an offline-first app
       | from scratch (I don't want to use stuff like Firebase and more
       | importantly want to learn how it's done)? Like a task manager for
       | example that can sync with an online account but still works
       | completely offline.
        
         | LAC-Tech wrote:
         | I swear I've looked into firebase multiple times and I have no
         | idea how they actually detect and handle conflicts. They just
         | hand-wave that it works offline... not that comforting.
         | 
         | Regardless, there's no one size fits all for offline.
         | Approaches that work for append only data don't work for data
         | where multiple nodes are doing destructive updates. And even if
         | you're doing that, some data is amenable to deterministic
         | merging (ie, CRDTs) and some you need a user to step in to
         | decide (revision based multi-master systems, prior art being
         | Pouch/Couch or indeed git).
         | 
         | Basically if you can tell me what exactly a taskmanager is and
         | how it might be updated I might be able to say something more
         | helfpul!
        
       | lewisjoe wrote:
       | This is the most underrated problem with CRDTs:
       | 
       | > Both paths will allow us to ensure that each replica converges
       | on the same state, but that alone does not guarantee that this
       | state is the "correct" state from the point of view of user
       | intent.
       | 
       | In context of richtext editors, it's easy to converge a rich-text
       | JSON into a single state for all collaborators. What's difficult
       | is to ensure that the converged state is renderable as richtext.
       | For example, is there a table cell that was inserted where a
       | column was deleted?
       | 
       | I wrote a summary of this issue with CRDTs here -
       | https://writer.zohopublic.com/writer/published/grcwy5c699d67...
       | for those interested.
       | 
       | While I'm optimistic it's not impossible to solve, it's
       | surprising why most CRDT literatures don't go into these details
       | on correctness. Appreciate the author for putting this out.
       | 
       | PS: We currently use OT for collaboration in Zoho Writer
       | (https://writer.zoho.com/). I'm looking out for practical CRDT
       | ideas that works well with richtext.
        
         | truculent wrote:
         | To what extent is this an issue with CRDTs and to what extent
         | is it an issue with the data structure(s) used to represent the
         | state?
        
           | dkjaudyeqooe wrote:
           | I think the short answer is: if you want your CRDTs to
           | enforce application constraints you need to bake them into
           | the data structure.
        
             | truculent wrote:
             | That makes sense. Thank you.
        
         | [deleted]
        
         | jkaptur wrote:
         | The intuitively obvious approach here is to have some kind of
         | schema-guided or constraint-guided CRDT. That is, you get the
         | guarantee that not only do you end up with _valid JSON_ , you
         | get the guarantee that f(that_json) == true. I imagine this is
         | considerably easier said than done, but I'm curious if there's
         | research (or results!) in the area.
        
           | quickthrower2 wrote:
           | Parts of the tree with "don't merge children" might be ok,
           | then you show the user a conflict dialog when this happens.
           | 
           | Or, inspired by your comment: run f for the tree, if false,
           | search for the smallest subset where f is false and show the
           | conflict dialog for that. The user chooses "mine" or
           | "theirs".
        
         | kevsim wrote:
         | We also had the OT vs. CRDT discussion with regards to our
         | editor at Kitemaker (kitemaker.co), and also ended up with OT.
         | For us there were a number of factors, but (in addition to the
         | case you mentioned) it came down to the fact that (at that
         | time) most of the CRDT libs like automerge generate huge diffs
         | that we'd need to send over the wire and also that the open
         | source editor we use (SlateJS) had operations that just slotted
         | very nicely into OT.
         | 
         | Your editor feels really nice BTW. Well done!
        
         | expede wrote:
         | > I'm looking out for practical CRDT ideas that works well with
         | richtext.
         | 
         | Have you seen Peritext from Ink & Switch?
         | https://www.inkandswitch.com/peritext/ It's relatively new, but
         | is a CRDT aimed at rich text!
        
           | jamil7 wrote:
           | Their linked summary mentions Peritext and the current
           | limitations.
           | 
           | Agree it's exciting though and you can implement it fairly
           | easily yourself by following the blog post as it's so well
           | written and explained.
        
         | samwillis wrote:
         | Exactly, the "C" in CRDT is a little misleading. They are
         | "conflict free" in as much as they are able to merge and
         | resolve in the basic sense. That does not mean that the
         | resulting state is valid or meets your internal schema.
         | 
         | A good example of this is using Yjs with ProseMirror, it's
         | possible for two concurrent edits to resolve to a structure
         | that doesn't meet your ProseMirror schema. An example I have
         | used before is a <figure> element that can contain a single
         | optional <caption> element. It's possible for two users to add
         | that caption to the figure concurrently, Yjs will happily merge
         | its XMLFragment structures and place two captions in the
         | figure. However when this is loaded into ProseMirror it will
         | see that the document does not match its schema and dump the
         | second caption. Fortunately the captions are ordered and so the
         | same state will re resolved by all parties.
         | 
         | This is all ok if you are doing the merging on the front end,
         | but if you try to merge the documents on the server, not
         | involving ProseMirror, the structure you save, and potentially
         | index or save as html, will be incorrect.
         | 
         | Point is, it's still essential with CRDTs to have a schema and
         | a validation/resolution process. That or you use completely
         | custom CRDTs that encodes into their resolution process the
         | validation of the schema.
        
           | lewisjoe wrote:
           | Your example is spot on :)
           | 
           | I believe we shouldn't lock our richtext state to
           | Prosemirror's specs. The state should better be modelled as a
           | CRDT and ensuring correctness should be part of CRDT
           | operations. This way we unlock other possibilities like
           | server-side merging and the editing library can be swapped
           | (eg. lexical)
           | 
           | All this sounds right but it's lot of work.
        
           | lijogdfljk wrote:
           | Yea, this is kinda why i don't understand a lot of these off
           | the shelf CRDT solutions. Like CRDTs for JSON.
           | 
           | I'm still _trying_ to learn CRDTs; but early on in my
           | learning process i had the thought that CRDTs don 't have
           | mere data types - as you say, they have Data Type + Conflict
           | Resolution bundled into one. It's not enough to make a String
           | type, because different types need to be resolved
           | differently. Plus some types of resolution have a lot more
           | metadata, so you want to choose the best fitting one.
           | 
           | I found that my goal to make SQL + CRDT meant i had to expose
           | this combo of Type + Resolution to the end user. It seems
           | essential to how the app works.
           | 
           | But maybe i don't know anything, /shrug. I'm still learning
           | them.
        
             | toomim wrote:
             | Yes! We call this the "Merge Type" of data in the braid.org
             | group.
             | 
             | Each datum as both a _data type_ and a _merge type_. The
             | programmer just needs to specify these two types, and then
             | the programming runtime can choose which synchronization
             | algorithm to use.
        
           | naasking wrote:
           | > Point is, it's still essential with CRDTs to have a schema
           | and a validation/resolution process. That or you use
           | completely custom CRDTs that encodes into their resolution
           | process the validation of the schema.
           | 
           | I'm surprised that MS's concurrent revisions haven't taken
           | off because this is what they do by default: you specify a
           | custom merge function for any versioned data type so you can
           | resolve conflicts using any computable strategy.
        
         | CGamesPlay wrote:
         | I don't think I've seen any implementations of this, but the
         | table example you listed is solvable. The insight is to stop
         | thinking of the table as a bundle of markup and think about it
         | as itself a primitive CRDT type that contains text in the table
         | cells.
         | 
         | The columns are a CRDT set of column IDs. The rows are a CRDT
         | map of row IDs to CRDT maps of column IDs to cell values, where
         | the cell values are again the same type as your document text.
         | Column headings are a specially formatted row. Rows are ordered
         | using a sequence type.
         | 
         | Removing a row is a normal CRDT map deletion. Removing a column
         | is a normal CRDT map deletion from the columns map. Rendering
         | the table will ignore map entries that don't correspond to real
         | columns.
        
           | georgelyon wrote:
           | This comment really needs a "yo dawg" meme.
        
             | CGamesPlay wrote:
             | I mean, JSON is objects all the way down, CRDTs are
             | conflict-free maps all the way down.
        
           | inglor wrote:
           | So with your solution, the inserted table cell would
           | "disappear" from what is rendered?
           | 
           | That's not necessarily (depends on the product) great or even
           | acceptable user experience. As a user I would certainly want
           | a way to know there is an "orphaned" "edited after delete"
           | cell to salvage the data inputted or understand the confusion
           | between editors.
        
             | jameshart wrote:
             | What you would 'certainly want' depends very much on the
             | application this is implemented in. The data isn't gone
             | from the CRDT table unless the program decides it is, so
             | many ui choices are possible.
        
             | georgelyon wrote:
             | The best UX resolution to this kind of issue I have seen is
             | to display "This table cell was added by Bob concurrently
             | to Alice removing its row. How would you like to proceed:
             | recover the row or ignore the change?"
        
               | drfuchs wrote:
               | Who gets the pop-up? If both Alice and Bob, and they
               | simultaneously give conflicting replies, you're right
               | back to where you started.
        
               | georgelyon wrote:
               | Yes, this is the correct behavior. If Alice and Bob
               | disagree about whether or not that row should be in the
               | table you aren't going to fix it with math. More often
               | than not, the expected outcome will be obvious (i.e.
               | spelling correction in a deleted cell can be ignored) and
               | if it isn't obvious, there will need to be human
               | interaction to determine the best path forward.
        
               | layer8 wrote:
               | This isn't really a problem. If Alice and Bob keep
               | disagreeing, then it is appropriate that a conflict
               | remains unresolved.
        
               | Dylan16807 wrote:
               | You're not back to where you started. You've gone from a
               | technical conflict to a direct disagreement about what is
               | supposed to happen. There's no technical solution to an
               | edit war. So maybe ask anew about the conflicting
               | answers, or pick one arbitrarily at that point. Or make
               | them play rock paper scissors. But it still helps to ask
               | _before_ it becomes a direct disagreement.
        
               | jlokier wrote:
               | Usually that doesn't happen because their replies don't
               | conflict within the time interval of a round trip between
               | the two of them, so one can see the other's action first,
               | and decide to agree.
               | 
               | But if it does happen, either they get a new popup,
               | eventually, saying that their conflict-resolution
               | decisions also conflicted and what would they like to do,
               | or the system decays to a CRDT-style pre-determined
               | resolution rule for the second-order conflict -- with
               | similar problems to the original CRDT resolution rule we
               | tried to avoid. Things like "give priority to the person
               | who chose to keep information instead of deleting", "give
               | priority to the person with the biggest random number",
               | "merge them by keeping information along with a note in
               | the document about the attempt to delete the column", or
               | "delete the column and put the edited cell in a note".
               | But this time, only when the users conflict after the
               | first popup, so it doesn't occur as often.
               | 
               | I think you're describing a metastability effect which
               | occurs in all distributed systems that don't use pre-
               | determined resolution rules like pure CRDTs. It happens
               | in network protocols, distributed transactions, and
               | consensus protocols in general. If the process you use to
               | resolve depends on input from multiple parties who don't
               | have a way to communicate and could choose conflicting
               | values, there's a non-zero probability of needing another
               | round or to escalate upwards to a higher-level system to
               | resolve.
               | 
               | In many technical systems, _provided there 's a way to
               | communicate_ it's not difficult to make the probability
               | of repeated rounds or escalations tend arbitrarily close
               | to zero (but not actually zero). If there's a possibility
               | of network partition, though, the conflict can come back
               | later when communication returns; there's no way to
               | completely make it go away.
        
               | tomhallett wrote:
               | Stupid question - are there any good resources on User
               | Experience patterns/recipes for implementing OT/CRDTs?
               | 
               | Context - a lot of the literature is about the backend
               | algorithm. But that still leaves individual teams
               | tripping over the frontend interaction design and
               | usability.
               | 
               | Example of what I'm looking for - hex.tech has a video
               | tutorial of their "Cell locking" feature, where the cell
               | gets disabled and a "Mac Lockard has taken control of a
               | cell you were editing" toast notification appears. [1].
               | This is fully baked and a recipe which if I put in a jira
               | ticket could actually get implemented. :)
               | 
               | I would love to read a book which is a "Don't make me
               | think" and has an example with "This table cell was added
               | by Bob concurrently to Alice removing its row. How would
               | you like to proceed: recover the row or ignore the
               | change?". All the UI's in my head would look like
               | "Clippy" from MS Word - or "Pipey" from Silicon Valley
               | [2], haha.
               | 
               | 1: https://hex.tech/blog/a-pragmatic-approach-to-live-
               | collabora...
               | 
               | 2: https://www.youtube.com/watch?v=E2YcOV5C2x4
        
               | toomim wrote:
               | This is close: https://decentpatterns.com/library/
               | 
               | ...but my general belief is that UI patterns need to
               | evolve as (1) specific UIs before they become (2) general
               | patterns, and we simply don't have enough good solutions
               | yet to make a pattern library out of them.
        
         | jdvh wrote:
         | For the table situation we solve this by have a generic concept
         | of a "card" that is rendered as a table cell when it's the
         | grandchild of a table (table > tr > td) but otherwise it's just
         | a standalone item, much like a markdown callout.
        
       | aboodman wrote:
       | This is a cool approach. It reminds me of statebox by mochimedia:
       | https://github.com/mochi/statebox.
       | 
       | If I'm understanding correctly, it requires the mutations to be
       | deterministic in order for the nodes to converge.
       | 
       | Replicache (replicache.dev - my thing) takes a similar approach
       | except it does _not_ requires the mutations to be deterministic,
       | which is very useful because it enables the mutations to reach
       | out to other services, be authenticated, etc.
       | 
       | Both the idea here and Replicache's approach are closely related
       | to game networking. If you are interested in these ideas, a
       | really excellent set of content is:
       | https://www.gabrielgambetta.com/client-server-game-architect....
       | 
       | If you are interested in how Replicache's sync protocol works and
       | achieves transactional changes with server authority, it is
       | described here: https://doc.replicache.dev/concepts/how-it-works.
        
       | anonymousDan wrote:
       | Am I missing something? I'm all for SMR when possible but the
       | whole point of moving to something with weaker consistency is
       | that you don't want to rely on availability of a central server
       | (or quorum of servers in the case of SMR)?
        
       | LAC-Tech wrote:
       | First off, well done on examining your problem domain, realising
       | that your operations are not in-fact commutative, and realising
       | you need to discard your model. A lot of people would have just
       | decided it was 'agile' to 'iterate' over there fundamentally
       | broken solution until they had coded themselves in a corner, but
       | I digress.
       | 
       | I would note that your path 2 is essentially another kind of CRDT
       | though. Instead of your replica being in individual document,
       | your replica is now the entire DB, and your changes are a grow
       | only sequence. (Wish I could remember how pointed this out to me
       | on HN, my mind was blown).
        
       | wim wrote:
       | For certain data structures the "pure" CRDT approach can get
       | tricky. In our case, we're building a collaborative notes/task
       | IDE [1] which is a hybrid of a text editor and an outliner at the
       | same time. So you can perform all the usual text editor
       | operations on a document as if it was text, but the underlying
       | data structure is a tree (or graph really).
       | 
       | So just like the example in article we use a hybrid approach and
       | global order for certain (tree) operations to enforce invariants
       | on the data structure when syncing. That still works with
       | offline-mode and end-to-end encryption, but as we don't have the
       | requirement to get rid of a central server we can get away with
       | this model.
       | 
       | [1] https://thymer.com
        
       | rubyist5eva wrote:
       | "You might not (probably need) need $FANCY_TRENDY_THING"
        
       | acrefoot wrote:
       | Whatever Clickup uses for their Clickup docs, it reliably gets
       | into a confused state and data is lost, during which not all
       | editors show the same state locally.
        
       | hardwaregeek wrote:
       | I really loved the Signals and Thread episode[0] on state machine
       | replication. It basically said "you could do all of these fancy
       | distributed systems techniques that assume a non total ordering
       | of events, ooooorr you could have a main server that determines a
       | total ordering for events and make your life infinitely easier"
       | 
       | [0]: https://signalsandthreads.com/state-machine-replication-
       | and-...
        
         | Joeri wrote:
         | It breaks down as soon as you have an offline scenario. You can
         | only assume the server is authorative if there will not be
         | network partitions or if writes are not accepted on the other
         | side of the network partition (meaning on the client). As soon
         | as the client accepts writes while offline those writes become
         | authorative and will need to be merged with whatever changes
         | the server saw.
        
           | layer8 wrote:
           | It also breaks down when you have a passive/non-transactional
           | server, like an app syncing its state via files on Dropbox.
        
         | derefr wrote:
         | Or even just have a simple central presence server hand out
         | randomized-per-epoch node-IDs; have all nodes in the system use
         | node-IDs as the prefix of the event temporal ordering key per
         | "tick"; and trust your nodes to self-report their node-ID
         | correctly in all events they send. Then your data protocol can
         | still be p2p rather than all events having to flow through the
         | server (if that benefits you in your problem domain) while
         | getting all the same advantages of total ordering.
         | 
         | And this works great for anything enterprise-y, where all the
         | nodes are operated by one party or a trusted consortium of
         | parties, and no nodes are are intentionally trying to cheat by
         | misreporting their assigned node-IDs. You only _need_ central
         | servers (or virtual central servers, a.k.a. blockchains) when
         | you get to things like MMO games, or HFT trading bots talking
         | to an exchange, where every client has a reason to want to
         | pretend it 's the one that "should" go earliest in the per-tick
         | transaction ordering.
        
           | lifeisstillgood wrote:
           | But if I have a conflict (ie one node changes a column
           | datatype, another adds data to column) how do i know which
           | one went first _inside the same tick?_
           | 
           | If I understand you, there will be a tick between 12.01 and
           | 12.02, and node A gets token 1234, and node B gets 5678.
           | Events are published p2p, and the ordering is 5678_+9seconds
           | and 1234_+12seconds - node A (1234) admits it has done the
           | action 3 seconds after B. But this still depends on
           | synchronised clocks I think?
           | 
           | Also you still need to resolve each type of conflict (hence
           | having CRDTs really helps) - we know there is a conflict
           | between A and B changes, but how do we _resolve_ them?
           | 
           | Maybe I dont get it :-)
        
             | meheleventyone wrote:
             | Yeah using an event stream like this requires synchronising
             | the clock of participants. It's similar to rollback
             | networking in games. Peers can recover from receiving late
             | updates but if all the updates are late from one peer the
             | situation still degrades. To avoid that participants run
             | time per tick slightly shorter or longer to keep within a
             | desired window.
        
         | pookha wrote:
         | I thought James Long's 200 or three hundred lines of code to
         | implement a real-world CRDT solution (Actual) was pretty cool.
         | Seemed to solve a lot of issues revolving around heavy duty
         | servers.
        
         | [deleted]
        
       | mathgladiator wrote:
       | Another reason to not need a CRDT is privacy. A CRDT assumes that
       | every participate has equal rights to all data.
       | 
       | I'm biased to simple WebSockets with OT using JSON deltas, and
       | I'm building a platform that greatly simplifies how to
       | efficiently build collaborative apps: https://www.adama-
       | platform.com/
        
         | CGamesPlay wrote:
         | These are orthogonal to one another. For read-protection,
         | encryption is a viable solution: specific paths into your data
         | structure (e.g. certain keys in a map) can be encrypted with a
         | key that isn't provided to all participants. For write-
         | protection, the solution is the same as with any syncing
         | service: when receiving an update, if you don't authorize the
         | sender of the update, you don't accept it.
        
         | samwillis wrote:
         | > A CRDT assumes that every participate has equal rights to all
         | data.
         | 
         | Within a single data structure yes, so within a "room" or
         | "document". If you need to partition right to the data
         | (read/update) then you should use separate structures for each
         | area.
         | 
         | Yjs implements this as as "sub documents".
        
         | charles_f wrote:
         | That sounds like a weak counter argument to me, 1) crdt is
         | focusing on the conflict resolution part, access control is not
         | in scope so you need to implement on top. 2) if you are already
         | implementing access control, then filtering out what people
         | don't have access to doesn't seem much more complexity. If you
         | go the full client-side way, you can also add a cryptographic
         | layer to whatever needs hiding 3) one thing I don't see in your
         | solution is the offline aspect to it. With a central authority
         | in the middle and online connectivity, conflicts become
         | unlikely and much smaller, but with offline support, documents
         | et can evolve in drastic ways where having a formalized
         | strategy like what crdt offers makes sense
        
         | pookha wrote:
         | CRDT's can be encrypted end-to-end. Privacy seems to be one of
         | the selling points.
        
         | pattrn wrote:
         | Is this really the case? While CRDT's are designed to work
         | peer-to-peer, they don't need to be fully connected to all
         | clients. Forcing the synchronization through controlled nodes
         | (a server or cluster) allows adding read/write permissions.
         | Depending on the use case, it may require additional logic for
         | reversing operations before propagating to other clients, or in
         | some cases forcing a client to revert to a snapshot (this can
         | be a bit complex). That's an approach I've used in the past.
         | 
         | Have I overlooked something (highly likely)?
        
       | dxchester wrote:
       | We have used Automerge a bunch, but found that there is a
       | threshold where beyond a given document size, performance gets
       | exponentially worse, until even trivial updates take many
       | seconds' worth of CPU. That is often how it works when the
       | document end state is exclusively the sum of all the edits that
       | have ever happened.
       | 
       | Our answer was to reimplement the Automerge API with different
       | mechanics underneath that allows for a "snapshots + recent
       | changes" paradigm, instead of "the doc is the sum of all
       | changes". That way performance doesn't have to degrade over time
       | as changes accumulate.
       | 
       | Project is here: https://github.com/frameable/pigeon, with some
       | benchmarks: https://github.com/frameable/pigeon/wiki/Benchmarks
       | in the wiki...
        
       | supermatt wrote:
       | > Conflict-free replicated datasets (CRDTs)
       | 
       | Conflict-free Replicated Data Types, surely?
        
         | paulgb wrote:
         | Oof, in the first sentence, too, that's embarrassing. Fixed,
         | thanks!
        
           | supermatt wrote:
           | No problem :)
        
       | nobu-mori wrote:
       | CRDTs have been on HN a lot recently. I'm working on a database
       | that deals in events rather than raw data. Application developers
       | specify event handlers in JavaScript. The database layer then
       | takes the event handlers and distributes them to the clients.
       | Clients receive the raw stream of events and reconcile the final
       | data state independently. The key aspect is all the event
       | handlers are reversible. This allows clients to insert locally
       | generated events immediately. If any remote events are received
       | out-of-order, the client can undo events, insert the new events,
       | and reapply everything on top of the new state.
       | 
       | I'm curious how many people have a need for solving the real-time
       | multi-player use case. The database I'm working on was inspired
       | by the networking code used in fighting games (rollback netcode),
       | but I'm always curious to learn about additional use cases.
        
         | no_wizard wrote:
         | I think the most common outside of gaming is collaboration
         | software. Think whiteboards, docs (e.g. Google Docs) and other
         | collaborative tools.
        
       ___________________________________________________________________
       (page generated 2022-12-05 23:00 UTC)