[HN Gopher] CRDTs: The Hard Parts [video]
       ___________________________________________________________________
        
       CRDTs: The Hard Parts [video]
        
       Author : benrbray
       Score  : 302 points
       Date   : 2020-07-11 13:52 UTC (9 hours ago)
        
 (HTM) web link (martin.kleppmann.com)
 (TXT) w3m dump (martin.kleppmann.com)
        
       | alextheparrot wrote:
       | This spawned an idea, which I feel a need to post.
       | 
       | It seems the hard part about CRDT is choosing the correct
       | commutative function, as merging two operations in line with user
       | intent is non-trivial. Would it be possible to use a combination
       | of superposition (Please correct me if this word is wrong) and
       | pruning to derive user intent?
       | 
       | The idea being that instead of combine being (A, A): A (A
       | commutative semigroup), couldn't we represent the operation as
       | (A, A): Set[A] and have a way of showing the user set results in
       | a way that their next operation shows us the "correct"
       | interpretation.
       | 
       | He's doing this implicitly with the file tree example, wherein
       | operations that don't create trees usually defy user expectations
       | (Symlinks aside), so he decides to prune those choices from the
       | result Set[A] before introducing a heuristic to further prune the
       | set. There's still an issue of users having opposite intent, but
       | at that point it just seems like we need to introduce "Set
       | pruning" or "Superposition collapse" operations as a user-level
       | primitive and then rely on recursion to resolve the possible
       | nesting of these result sets.
       | 
       | Does this riff with anyone / does anyone have further thoughts on
       | this formulation?
        
         | benrbray wrote:
         | This sounds similar to the concept of a "multi-value register"
         | [1] I've seen in a few places while reading about CRDTs the
         | last couple days. Is that what you're looking for? The idea is
         | that each process maintains a vector clock with the latest
         | timestamp from all other processes, and we don't delete values
         | until one version can be verified to be "later" than all the
         | others.
         | 
         | [1] Section 3.2ish of
         | https://hal.inria.fr/inria-00555588/document
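
A rough sketch of the multi-value register idea described above, assuming a dict-based vector clock per written value. The class and function names (`MVRegister`, `dominates`) are hypothetical and not taken from the linked paper or from any library. Concurrent writes are all kept, and a value is dropped only once another value's clock is strictly later.

```python
def dominates(a, b):
    """True if vector clock a is >= b for every replica and > b for at least one."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) >= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) > b.get(k, 0) for k in keys))


class MVRegister:
    """Illustrative multi-value register (hypothetical API)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.versions = []  # list of (value, vector clock) pairs

    def write(self, value):
        # The new clock covers everything this replica has seen so far,
        # plus one more event of its own, so it supersedes all held values.
        clock = {}
        for _, seen in self.versions:
            for replica, count in seen.items():
                clock[replica] = max(clock.get(replica, 0), count)
        clock[self.replica_id] = clock.get(self.replica_id, 0) + 1
        self.versions = [(value, clock)]

    def merge(self, other):
        combined = self.versions + other.versions
        kept = []
        for value, clock in combined:
            # Drop a value only if some other version is strictly later.
            if any(dominates(c, clock) for _, c in combined):
                continue
            if (value, clock) not in kept:
                kept.append((value, clock))
        self.versions = kept

    def values(self):
        return sorted(v for v, _ in self.versions)


a, b = MVRegister("a"), MVRegister("b")
a.write("moo")
b.write("boo")       # concurrent with a's write
a.merge(b)           # a now holds both candidates
a.write("zoo")       # a's next write supersedes both
b.merge(a)           # b drops "boo" because "zoo" is strictly later
```

The second write plays the role of the user's "next operation" from the parent comment: it collapses the set of candidates back to one value.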
        
           | alextheparrot wrote:
           | This was 100% the path I think I was traveling (Thank you for
           | the link). I had a hard time grokking the complete
           | specification from the paper, but the README of this
           | repository that implements CRDTs in that frame was helpful
           | [1].
           | 
           | The core part is separating the "merge" from the "resolve"
           | state. Merging state can be done in a variety of ways, so in
           | the default formulation there seems to be a focus on making
           | the "merge" operation also "resolve" to only one value, when
           | really there could be multiple formally valid merges that the
           | client may desire (Which is part of the difficulty that the
           | video notes as he proposes a variety of possible file system
           | mutations for a set of two uncoordinated operations).
           | 
           | The cleanest clarification of my thought process is similar
           | to the following:
           | 
           | Given the operation Merge(4, 2), I could propose that there
           | are two valid ways to perform this merge, addition and
           | multiplication. This means the result would be either 6 or 8.
           | The act of returning a single value (6 or 8) changes that
           | proposal into a statement, though, which is the "resolve".
           | 
           | One subversion of this restriction is to return a set of all
           | possible results for the operations we think are valid, so
           | {6, 8}. At this point, the user can say (Either explicitly or
           | implicitly) "I actually want 6" and we resolve it to the
           | single value of {6}. There are also special cases like
           | Merge(2, 2), where this whole situation is especially
           | ergonomic because the merge operations are equivalent.
           | 
           | There are problems, of course, with this approach.
           | 
           | One issue is the need to categorize all possible operations
           | the user may think are valid for Merge(4,2). If the user
           | intended to do division, the result set proposed above will
           | not include the state that they would expect. Still, this
           | seems more general now, as we just need to gather the set of
           | operations that the user may think are valid instead of
           | assuming which one is valid. There's also a ranking problem
           | that then exists at the UX level, as we need to find a way to
           | cleanly propose this set of alternatives.
           | 
           | Another issue, of course, arises if both users propose
           | conflicting resolutions (Actor1 says "I want multiplication"
           | and Actor2 says "I want addition"). This is the issue with
           | decoupling the "merge" and "resolve" steps, as now we may
           | cause a fork in the model which causes a fundamental
           | divergence of the collaborators' data.
           | 
           | [1] https://github.com/rust-crdt/rust-crdt
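
The Merge(4, 2) example above can be written down as a small sketch. It assumes the candidate merge rules are just addition and multiplication, as in the comment. The names `CANDIDATE_MERGES`, `merge` and `resolve` are made up for illustration and do not come from rust-crdt.

```python
import operator

# Hypothetical table of merge rules the user might consider valid.
CANDIDATE_MERGES = {"add": operator.add, "mul": operator.mul}


def merge(a, b, candidates=CANDIDATE_MERGES):
    """Return every candidate result, mapped to the rules that produced it."""
    results = {}
    for name, rule in candidates.items():
        results.setdefault(rule(a, b), set()).add(name)
    return results


def resolve(results, chosen):
    """Collapse the candidate set to the single value the user picked."""
    if chosen not in results:
        raise ValueError(f"{chosen!r} is not one of the proposed merges")
    return chosen


proposal = merge(4, 2)            # two distinct candidates: 6 and 8
picked = resolve(proposal, 6)     # the user says "I actually want 6"
same = merge(2, 2)                # both rules agree, so only one candidate
```

Because both candidate rules are commutative, `merge(a, b)` and `merge(b, a)` propose the same set. The forking problem mentioned above shows up only in `resolve`, when two actors pick different candidates.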
        
       | mike_red5hift wrote:
       | Does anyone take issue with the fact that CRDTs seem to require
       | keeping a history of every change ever made to the document?
       | 
       | Seems like it could get unwieldy very fast. Especially, in the
       | face of a bad actor spamming a document with updates.
       | 
       | I've considered using CRDTs in a few projects now, but the
       | requirement to keep a running log of updates forever has ruled
       | them out. I've ended up using other less sound (more prone to
       | failure), but more practical methods of doing sync.
       | 
       | Perhaps, I'm missing something. Wouldn't be the first time.
       | 
       | Are there alternatives without this requirement, or that would at
       | least allow a cap on the update log?
        
         | topicseed wrote:
         | Depending on the use case, you can at a specific and suitable
         | time create a snapshot of the value and clear the history of
         | changesets.
         | 
         | I do that with my OT collaborative editor.
        
         | neeeeees wrote:
         | "Ever" in practice is usually a few months or years.
        
         | appwiz wrote:
         | > I've ended up using other less sound (more prone to failure),
         | but more practical methods of doing sync.
         | 
         | Could you share the alternative methods that you've used?
        
           | mike_red5hift wrote:
           | Stuff like diffing multiple json documents and merging adds,
           | updates and deletes where conflicts did not exist and
           | overwriting conflicting attributes based on timestamps
           | (latest wins). There are some reasonably good jsonpatch
           | libraries out there that will do the heavy lifting.
           | 
           | Plenty of room for problems to arise, but it did not require
           | keeping a log of updates. My use cases did not require real-
           | time collaboration and the structure of the document was
           | known beforehand, though.
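
A minimal sketch of the "latest wins" attribute merge described above, assuming each attribute is stored together with the timestamp of its last write. The function name `lww_merge` and the `(value, timestamp)` layout are assumptions for illustration, not the commenter's actual code or a jsonpatch API.

```python
def lww_merge(local, remote):
    """Merge two flat documents of the form {key: (value, timestamp)}.

    Keys present on only one side are kept as-is; for keys present on
    both sides the entry with the larger timestamp wins.
    """
    merged = dict(local)
    for key, (value, ts) in remote.items():
        # Assumption: on equal timestamps the local entry is kept, so two
        # replicas can disagree on ties. A real design needs a tie-breaker.
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged


local = {"title": ("Draft", 1), "owner": ("ann", 1)}
remote = {"title": ("Final", 2), "tags": (["x"], 1)}
merged = lww_merge(local, remote)
```

The sketch ignores deletes and nested objects, and it relies on comparable clocks across devices, which is part of the "room for problems" mentioned above.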
        
           | topicseed wrote:
           | Operational Transformation, though not the easiest to grasp,
           | offers a solution to the same problem that is perhaps more
           | loyal to the original user intent.
        
             | heavenlyblue wrote:
             | OT is just a fancy name for a CRDT
        
               | alextheparrot wrote:
               | Is this really true (I just watched the video, so still
               | learning these concepts, so please be kind)?
               | 
               | The linked video makes a clear distinction that OT and
               | CRDT are different, as OT has the idea of a centralised
               | coordinator to ensure consistency by mutating the
               | proposed, conflicting operations whereas CRDT uses
               | commutativity to attempt to make conflicting operations
               | an inaccessible state
        
               | sagichmal wrote:
               | No, CRDTs and Operational Transforms are not the same at
               | all.
        
               | topicseed wrote:
               | Not true whatsoever.
        
         | vivekseth wrote:
         | You don't actually need to keep a log of all events. You only
         | need to keep around just enough information to merge any 2
         | replicas.
         | 
         | Here's a very simple example: If there are only 5 users
         | concurrently editing a document and each user has seen
         | operations o1...oN, then you can safely compress the data for
         | o1...oN.
         | 
         | Depending on the CRDT type you may not need to store any log at
         | all. For an add-only set, for example, you only need to store
         | the elements.
         | 
         | I think what's harder to solve is the metadata overhead
         | problem. Most CRDT based text editors have a huge per-character
         | overhead. As Martin mentioned, Automerge used to have an
         | overhead of ~200 bytes per character, but using a new binary
         | encoding format they were able to reduce the overhead down to
         | 1.5-7 bytes per character
         | (https://speakerdeck.com/ept/crdts-the-hard-parts?slide=67).
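
The add-only set mentioned above is small enough to sketch in full. This is a generic grow-only set (often called a G-Set), with a hypothetical class name. Its whole state is the set of elements, and merging is set union, so no operation log is needed.

```python
class GSet:
    """Grow-only set: the state is just the elements, with no history."""

    def __init__(self, items=()):
        self.items = set(items)

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        # Union is commutative, associative and idempotent, so replicas
        # converge regardless of merge order or repeated merges.
        return GSet(self.items | other.items)


left = GSet({"a", "b"})
right = GSet({"b", "c"})
both = left.merge(right)
```

Removal is what makes sets harder: supporting it usually means keeping extra metadata such as tombstones.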
        
         | im_down_w_otp wrote:
         | That's not quite right. There are a lot of sound strategies for
         | culling/merging/resolving CRDT state in-part or in-total
         | depending on the use case and/or the topology of the system
         | that interacts with the CRDT.
         | 
         | It's possible to construct a pathological case where it's
         | impossible to soundly GC the CRDT state, and where you have to
         | keep around an arbitrarily long list of per-agent updates or
         | list of agents forever, but that shouldn't be the normative
         | case.
        
           | zzzcpan wrote:
           | Yeah, CRDTs don't require keeping history of every change
           | forever or at all. In other words, all the changes coming
           | from a bad actor can be merged locally into a single change
           | or a small set of changes or whatever is appropriate that
           | will actually be propagated to other nodes. Plus nodes can
           | easily know how far all of them have progressed and drop
           | history before the most far behind point. Only during outages
           | history should grow a bit more than usual.
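
The "drop history before the most far behind point" idea can be sketched as follows. It assumes, purely for illustration, a single global sequence number per operation and a table of the highest sequence number each node has acknowledged. Real CRDT implementations typically track this per replica with vector clocks, and the name `prune_log` is hypothetical.

```python
def prune_log(log, acked):
    """Drop operations that every known node has already applied.

    log   -- list of (seq, op) pairs in increasing seq order
    acked -- dict mapping node id to the highest seq it has applied
    """
    if not acked:
        return list(log)  # nothing is known to be safe to drop
    stable = min(acked.values())  # the most-far-behind node
    return [(seq, op) for seq, op in log if seq > stable]


log = [(1, "ins a"), (2, "ins b"), (3, "del a"), (4, "ins c")]
acked = {"n1": 4, "n2": 2, "n3": 3}
remaining = prune_log(log, acked)  # n2 is furthest behind, at seq 2
```

A node that goes offline keeps the minimum low for as long as it is counted, so the log only shrinks once that node catches up or is removed from `acked`.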
        
             | mike_red5hift wrote:
             | Do you have any pointers to javascript libraries that work this
             | way? All the libs I've looked at (not recently) require the
             | server to keep a running log of updates for the life of the
             | document.
             | 
             | For example, the automerge library linked elsewhere on this
             | thread requires it.
             | 
             | My understanding (flawed) is that you need to keep all the
             | changes on the server because you never know how long it's
             | been since a client has pulled/pushed changes to a
             | document.
             | 
             | I guess arbitrary limits based on number of updates or time
             | can be imposed, but I haven't seen libraries that do that.
             | 
             | Thanks.
        
               | rapnie wrote:
               | Did you look at hypermerge, also in the automerge org? It
               | is based on DAT protocol and hypercore, and the p2p
               | version of automerge I think (just found the project).
               | 
               | https://github.com/automerge/hypermerge
        
               | mike_red5hift wrote:
               | No I didn't. I'll check it out. Thanks!
        
             | zmj wrote:
             | > Plus nodes can easily know how far all of them have
             | progressed and drop history before the most far behind
             | point.
             | 
             | That may be easy when coordinating servers that are almost
             | always online, but it's definitely not easy for
             | desktop/mobile clients that go offline for long periods
             | (and sometimes don't come back).
        
         | rzzzt wrote:
         | You could introduce a quiet period in which no updates are let
         | in, coalesce changes into a snapshot, clear history and open
         | the floodgates again.
        
         | ghj wrote:
         | We tend to overestimate how often a document will be modified.
         | For example wikipedia seems to keep "a history of every change
         | ever made to the document" and they seem to be managing fine.
         | If there are bad actors, what you need is better moderation.
         | Having easy undo helps a lot in that case!
        
       | infogulch wrote:
       | There was a recent post of a series that dives into CRDTs [1].
       | Someone linked to a paper, _Chronofold: a data structure for
       | versioned text_ [2] dated this April, that attempts to map CRDT
       | semantics onto text editing. It's still on my list.
       | 
       | [1]: https://news.ycombinator.com/item?id=23737639
       | 
       | [2]: https://arxiv.org/pdf/2002.09511v4.pdf
        
       | jwr wrote:
       | Incidentally, Martin's book ("Designing Data-Intensive
       | Applications") is excellent and highly recommended reading. If
       | you find yourself saying things like "this database is ACID
       | compliant", "we have an SQL database with transactions, so we're
       | fine" or "let's just add replication to Postgres and we'll be
       | fine", you need to read this book.
        
         | swyx wrote:
         | > If you find yourself saying things like "this database is
         | ACID compliant", "we have an SQL database with transactions, so
         | we're fine" or "let's just add replication to Postgres and
         | we'll be fine", you need to read this book.
         | 
         | can you elaborate why? are these sentences fundamentally wrong?
         | they don't appear so.
        
           | codepie wrote:
           | I found this short paper really interesting to understand the
           | fundamental limitations of distributed systems: Consistency
           | Tradeoffs in Modern Distributed Database System Design
           | http://www.cs.umd.edu/~abadi/papers/abadi-pacelc.pdf . The
           | theorem introduced in the paper (PACELC) provides a better
           | way to understand the design of modern distributed systems
           | than CAP.
        
           | adamnemecek wrote:
           | In the context of databases wrong is not binary. You need to
           | understand the failure models of your database and the
           | implications of your parameters.
        
           | romanhn wrote:
           | It's not that they are fundamentally wrong, but rather that
           | these seem like simple, black-and-white statements/decisions,
           | when they are anything but. When it comes to databases
           | (especially distributed ones), the devil is in the details,
           | and things like ACID compliance can surprisingly mean vastly
           | different things in different systems.
           | 
           | +1 from me on Martin's book, btw. One of the best technical
           | books I've ever read.
        
           | bmillare wrote:
           | It's simplistic. When you move into distributed systems,
           | these claims are not sufficient to indicate "so we're fine".
           | Even with the ACID, the "I" has multiple levels, and often
           | they get things wrong (see Jepsen).
        
           | robto wrote:
           | The entire book is about when and how these statements turn
           | out to not be absolutely true. And it's not a short book. I
           | don't have my copy in front of me right now, so I won't get
           | into specifics. But I consider it one of the most important
           | books I've read, if only for making me realize how difficult
           | it is to get distributed systems correct. Or, rather,
           | learning that getting distributed systems correct is
           | impossible and what sort of tradeoffs you can make in order
           | to keep things mostly working.
           | 
           | And it turns out that most services that I find myself
           | working on these days are distributed systems, so having a
           | healthy respect for all the ways things can break is a useful
           | place to be.
        
         | filipn wrote:
         | Oh man, I was just about to recommend this book, you beat me to
         | it. I am currently reading it, and I cannot recommend it
         | enough. If you're interested in distributed systems and
         | learning about database fundamentals, it really is a must read.
        
       | hencq wrote:
       | First off, this is an excellent talk. The presenter explains the
       | different topics in a way that even a layman like me can easily
       | follow along. The compression scheme he presents at the end seems
       | very interesting as well.
       | 
       | I do wonder if in practice OT isn't a simpler solution for most
       | applications. He mentions the differences in the beginning of the
       | presentation and the main advantage of CRDTs is that they don't
       | need a central server. It seems to me that for e.g. a web app you
       | have a central server anyway so all the extra complexity of CRDTs
       | isn't needed. I know almost nothing about this though, so would
       | love for someone more knowledgeable to explain why I might be
       | wrong.
        
         | vivekseth wrote:
         | For single server/database web-apps, CRDTs might be useful
         | because they allow offline edits, and (to me at least) they are
         | simple to understand and implement. OT does allow offline edits
         | too, but (I think) has poor performance if there are many
         | offline edits.
         | 
         | For multi server/database web-apps, CRDTs might be useful
         | because they reduce the centralization required for
         | collaboration, and increase the fault tolerance. In a load
         | balanced web app, different clients could connect to different
         | servers/databases and still achieve eventual consistency when
         | those back-end systems sync up. If any of those systems go
         | down, in theory traffic could be routed to other systems
         | seamlessly.
        
       | mdptt wrote:
       | What an excellent talk, many thanks.
       | 
       | One idea comes to my mind (a bit off topic): as we can store
       | the complete editing history in a document (even including mouse
       | movements) with a fairly small overhead using the ideas of
       | automerge, would this be useful for distinguishing texts that are
       | generated by machines and by humans? Or to detect plagiarism?
        
       | benrbray wrote:
       | See also automerge [1], discussed at the end. They are currently
       | working on performance improvements [2]. Quoting from the repo,
       | "automerge is a library of data structures for building
       | collaborative applications in JavaScript:
       | 
       | * You can have a copy of the application state locally on several
       | devices (which may belong to the same user, or to different
       | users). Each user can independently update the application state
       | on their local device, even while offline, and save the state to
       | local disk.
       | 
       | * (Similar to git, which allows you to edit files and commit
       | changes offline.)
       | 
       | * When a network connection is available, Automerge figures out
       | which changes need to be synced from one device to another, and
       | brings them into the same state. (Similar to git, which lets you
       | push your own changes, and pull changes from other developers,
       | when you are online.)
       | 
       | * If the state was changed concurrently on different devices,
       | Automerge automatically merges the changes together cleanly, so
       | that everybody ends up in the same state, and no changes are
       | lost. (Different from git: no merge conflicts to resolve!)"
       | 
       | [1] https://github.com/automerge/automerge [2]
       | https://github.com/automerge/automerge/pull/253
        
         | Someone wrote:
         | That sounds similar in functionality to the protocol for Google
         | Wave, now Apache Wave, and discontinued in early 2018
         | (https://en.m.wikipedia.org/wiki/Apache_Wave)
         | 
         | If so, what's different? Better algorithms, assuming fewer
         | collaborators and/or less frequent updates? Or is my
         | understanding that these are similar in functionality
         | incorrect?
        
           | jahewson wrote:
           | Google Wave's algorithm is now used by Google Docs, it's
           | Operational Transformation (OT) [1]. You can approximately
           | view it as a special form of CRDT but the theories underpinning
           | each have separate origins.
           | 
           | OT is efficient and fast at real-time editing with a central
           | server, whereas CRDTs are more capable in distributed/p2p
           | scenarios but bring significant overhead as they store a lot
           | of metadata.
           | 
           | 1) https://en.m.wikipedia.org/wiki/Operational_transformation
        
             | neilk wrote:
             | You seem to be implying that OT requires a central server.
             | OT was designed for peer-to-peer, at least as far as I
             | know.
             | 
             | My OT knowledge isn't deep; perhaps in practice, some
             | implementations like Google docs use an authoritative node?
        
         | benrbray wrote:
         | See also the PushPin project [3], which uses React+Automerge,
         | along with Capstone [4]
         | 
         | [3] https://automerge.github.io/pushpin/ [4]
         | https://www.inkandswitch.com/capstone-manuscript.html
        
         | flir wrote:
         | So if I change "foo" to "moo" and you change "foo" to "boo",
         | who wins?
        
           | wffurr wrote:
           | Depending on timing, you end up with "mboo" or "bmoo".
        
             | taeric wrote:
             | So, literally a change nobody wanted? Seems like it would
             | not work in any real sense. I change it to moo and add a
             | line checking something on it. You change it to boo and add
             | a line, as well. Congrats, now neither of our additional
             | lines makes sense...
             | 
             | This really feels like a solution in search of problems.
        
               | heavenlyblue wrote:
               | What do you expect to happen instead?
        
               | taeric wrote:
               | Things are always sequenced. One will win, and the other
               | will have to redo their change on top. Partially applying
               | the change at the word level just seems way too fraught
               | with false edits. It is already a source of a lot of bugs
               | at the file and project level.
        
               | djur wrote:
               | That is exactly what's happening here -- both deleted the
               | "f", so there's no conflict, and their inserts of "b" and
               | "m" are executed in order.
        
               | taeric wrote:
               | But one person edited the word foo. The other edited a
               | word that no longer exists. I get why this feels like a
               | clever combination of the edits. But I'm struggling to
               | see how doing this at the character level makes any
               | sense.
               | 
               | Consider instead that you could do this at the byte
               | level, with equally off results.
               | 
               | At higher levels, this trick sounds useful. But you pick
               | your abstraction height where all conflicts should just
               | go back to the user.
               | 
               | So, people edit the same document, but at different
               | paragraphs? Fine. They edit the same paragraph? Almost
               | certainly a problem. No different than code.
        
               | dkersten wrote:
               | > Congrats, now neither of our additional lines makes
               | sense...
               | 
               | What would you expect to happen? That one person's input
               | is ignored? That's hardly expected for that person. If
               | anything, it's much more confusing. This way, both people
               | see both edits and can react appropriately. If they
               | both remove the same thing, then no problem, if they keep
               | stomping on each other's work, then they need to
               | communicate anyway.
               | 
               | The important thing isn't that you ended up with
               | something neither of you wanted but that it's consistent
               | for all people. You see the exact same thing they see.
               | 
               | > This really feels like a solution in search of
               | problems.
               | 
               | Hardly. As someone who once wrote a collaborative editor
               | as a you project long ago, this seems really useful to
               | me. I've also worked on mobile sync (multiple devices
               | that could be edited offline syncing with the online
               | version) and again this would have been really beneficial
               | as the solution being used wasn't great at all.
        
               | taeric wrote:
               | Don't ignore, but just like I expect my car to not start
               | if I don't have my foot where it is supposed to be, I'd
               | expect the editor to indicate to me that my edit could
               | not go through.
               | 
               | And I think I wasn't clear. Collaborative editors at the
               | character level feel like the solution that is a misfire.
               | Doing the same things at a higher level of abstraction
               | works. Merge in document changes in remote sections. Code
               | merges with git work reasonably. None are bullet proof,
               | and I expect conflicts at a level lower than paragraph to
               | almost always need an audit. Certainly lower than the
               | line level.
        
               | josephg wrote:
               | This is how google docs works and this problem doesn't
               | come up much in practice. The reason is that usually when
               | you're collaboratively editing a document you do so in
               | real-time. If we both see one another's cursor on the
               | same content, we pay extra attention and make sure our
               | edits make sense.
               | 
               | For offline edits (eg multiple developers working on
               | independent features in a codebase), generating merge
               | conflicts is probably more appropriate. OT and CRDTs can
               | be written that generate merge conflicts in cases like
               | this - it's an implementation detail. It's just that most
               | algorithms are written first and foremost with real-time
               | collaborative editing in mind. And again, in the real-
               | time collaborative editing case, merge conflicts aren't
               | what users want.
        
               | akiselev wrote:
               | That's the best failure mode you can expect for the
               | average user without diving into the usability pit that
               | is git - both changes are kept and it's obvious to users
               | that there is a merge conflict. Otherwise, one update
               | would just eat the other silently, leaving one side
               | confused and the other oblivious.
        
               | taeric wrote:
               | Meh. Git isn't so bad, all told.
               | 
               | But, the point is that you get a marked conflict. And
               | take it back to the users.
        
               | tobr wrote:
               | It's an extremely valuable solution to a very real
               | problem. Surely you must have tried Google Docs? The
               | "mboo" is an indication that the two editors have
               | different ideas about where they're going. In a typical
               | setup both editors will see that someone else has their
               | cursor in the same place in the document, and they will
               | very quickly see that a conflict has happened. Now they
               | can coordinate on how to resolve it. It's not a problem,
               | but a desirable step in the process of two people working
               | together on something that isn't finished.
        
               | taeric wrote:
               | I've seen all too often where people don't see the
               | conflict because it allowed them to continue.
               | 
               | Such that editing a Google doc is easily up there with
               | many other experiences I don't like. Taking the act of
               | editing, that used to just be single user and forcing it
               | into distributed tricks from the get go.
               | 
               | Yes, collaboration is distributed. And sometimes it is
               | nice to both be working at the same place/time. Usually,
               | though, a batch process is easier to reason about and
               | execute.
        
             | sorokod wrote:
             | How about concurrently editing "your bonus is 1000" and
             | "your bonus is 10000"?
        
           | atombender wrote:
           | So this is either represented as a delete followed by an
           | insert (delete one character at offset N, insert "m" at
           | offset N), or as a replace (atomically replace character at
           | offset N with "m").
           | 
           | For an atomic replace operation, CRDT algorithms will solve
           | this by having the last write win. What CRDTs give you here
           | is a guarantee that the order is the same for every
           | participant. So if you're building a collaborative text
           | editor, for example, either everyone will see "moo" or
           | everyone will see "boo".
           | 
           | For a delete + insert, it might not be atomic, in which case
           | only the delete will "conflict". Since you both deleted at
           | the same time, it's not actually a conflict (you both did the
           | same thing), and the result will be either "mboo" or "bmoo".
           | But again, it will be the same for everyone.
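
The delete + insert case can be sketched with a toy character-level CRDT. This assumes a simplified tree-shaped sequence in which every character has a unique `(counter, replica)` id and names the character it was inserted after, with concurrent siblings ordered by id. The class `TextReplica` and its methods are hypothetical, and this is not Automerge's actual algorithm.

```python
ROOT = (0, "")  # virtual element in front of the first character


class TextReplica:
    def __init__(self, name):
        self.name = name
        self.counter = 0
        self.elements = {}    # id -> (id of predecessor at insert time, char)
        self.deleted = set()  # tombstones: ids of deleted characters

    def insert_after(self, parent, char):
        self.counter += 1
        new_id = (self.counter, self.name)
        self.elements[new_id] = (parent, char)
        return new_id

    def delete(self, elem_id):
        self.deleted.add(elem_id)

    def merge(self, other):
        # State-based merge: union of elements and of tombstones.
        self.elements.update(other.elements)
        self.deleted |= other.deleted
        self.counter = max(self.counter, other.counter)

    def text(self):
        children = {}
        for elem_id, (parent, _) in self.elements.items():
            children.setdefault(parent, []).append(elem_id)
        out = []

        def visit(node):
            # Siblings are ordered by id, highest first; this fixed rule is
            # what makes every replica render the same string.
            for child in sorted(children.get(node, []), reverse=True):
                if child not in self.deleted:
                    out.append(self.elements[child][1])
                visit(child)

        visit(ROOT)
        return "".join(out)


a, b = TextReplica("a"), TextReplica("b")
f = a.insert_after(ROOT, "f")
o1 = a.insert_after(f, "o")
a.insert_after(o1, "o")
b.merge(a)                                  # both replicas now read "foo"

a.delete(f); a.insert_after(ROOT, "m")      # a: "foo" -> "moo"
b.delete(f); b.insert_after(ROOT, "b")      # b: "foo" -> "boo", concurrently

a.merge(b)
b.merge(a)                                  # a.text() == b.text()
```

Both deletes target the same id, so they collapse into one tombstone, and both inserts survive. Whether the shared result reads "mboo" or "bmoo" depends only on the tie-break rule, which in this sketch is the replica name.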
        
             | Someone wrote:
             | > either everyone will see "moo" or everyone will see
             | > "boo"
             | 
             | That seems in conflict with the claim elsewhere in this
             | discussion that this works offline, too
             | (https://news.ycombinator.com/item?id=23802495)
             | 
             | I guess everyone will _eventually_ either see "moo" or
             | "boo"?
        
         | vlovich123 wrote:
         | It's hard enough getting your own code to run and work
         | correctly. When would this be a useful development paradigm?
        
           | kohtatsu wrote:
           | This is a library, not a tool.
        
           | skybrian wrote:
           | CRDTs and operational transforms are most useful for live,
           | online editing. Each client sees a nearly up-to-date version
           | of the document and any differences due to network lag are
           | relatively small.
           | 
           | The idea is that normally each user will see other users'
           | edits as they happen. They are trying to cooperate, not stomp
           | on each other's edits. So long as the merge is reasonably
           | intuitive, it can be fixed manually if it's not exactly what
           | the authors wanted.
           | 
           | CRDTs aren't very good for writing code asynchronously,
           | since you probably want each version to compile and pass
           | tests, and sometimes do code review as well. Git works better
           | for that. But they could be sort-of-okay for pair
           | programming, though it might be an overly-complicated
           | solution and better to use some kind of remote desktop.
        
           | topicseed wrote:
           | Realtime collaboration on documents (e.g. source code, rich
           | text editing, etc).
        
             | ickyforce wrote:
             | When I edit code it's broken most of the time. It wouldn't
             | make sense to collaboratively break it in various parts for
             | other people...
        
               | IggleSniggle wrote:
               | While you could do real-time collaboration on code edits
               | with this tool (and I've done that with some success via
               | pair coding, where you commit them to git _after_ your
               | shared edits are complete), this tool isn't strictly
               | about code. It's about any shared content at all. Think
               | google docs, but for any data, with offline/online sync.
               | The offline/online sync might not be great for
               | simultaneous edits, but in a shared "project" of say,
               | field research data, the combination of offline/online
               | sync and soft-realtime shared documents at any level of
               | the hierarchy is a nice "don't need to think about it
               | generally" approach.
               | 
               | I'm not sure which kind of CRDT these are, but since all
               | edits are part of history, as in git, you're not going to
               | lose history if someone DOES inadvertently stomp on
               | someone else's edits.
        
               | dkersten wrote:
               | I have done remote pair programming many times though and
               | in that case it's perfectly ok, because you're
               | communicating with the other person/people and can let
               | each other know if you will break something.
               | 
               | For that use case, I can see this being very useful.
        
         | nojvek wrote:
         | What does "no merge conflicts to resolve" even mean? Isn't it
         | fundamentally impossible to have both no merge conflicts and
         | meaningful data?
         | 
         | In case of a conflict you can either keep the most recent change
         | or both changes. But if, like git, you keep both changes with
         | <<<< HEAD and theirs markers, the code is now invalid and won't
         | compile. Suppose both changes were kept without conflict
         | resolution: now you have two things that may interfere with
         | each other.
         | 
         | I've had git mess up by trying to do an auto merge and still
         | breaking the logic.
         | 
         | So I don't think there is a golden algorithm. Just a bunch of
         | trade-offs, like any other problem.
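
A rough sketch of the "keep both changes" option, in the style of a multi-value register, using the concurrent "bonus" edits from earlier in the thread. The version-vector representation and the function names are assumptions for illustration only. Concurrent values are all kept until a later write that has seen them replaces them, which is where a user (or application logic) resolves the conflict.

```python
# A value tagged with a version vector: {replica name: edits seen from it}.
Versioned = tuple[str, dict[str, int]]


def dominates(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if version vector `a` has seen everything recorded in `b`."""
    return all(a.get(replica, 0) >= count for replica, count in b.items())


def merge(left: list[Versioned], right: list[Versioned]) -> list[Versioned]:
    """Keep every value that no other value has causally overwritten."""
    candidates = left + right
    survivors: list[Versioned] = []
    for value, clock in candidates:
        superseded = any(
            other != clock and dominates(other, clock)
            for _, other in candidates
        )
        if not superseded and (value, clock) not in survivors:
            survivors.append((value, clock))
    # Sort so that every replica presents the survivors in the same order.
    return sorted(survivors, key=lambda v: v[0])


# Two concurrent edits: neither has seen the other, so both are kept.
on_alice = [("your bonus is 1000", {"alice": 1})]
on_bob = [("your bonus is 10000", {"bob": 1})]
conflict = merge(on_alice, on_bob)

# A later write that has seen both edits replaces them.
resolved = merge(conflict, [("your bonus is 5000", {"alice": 2, "bob": 1})])
```

The merge itself never fails, but the result can still hold two values; whether that counts as "no merge conflicts" depends on how the application surfaces it.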
        
       | gritzko wrote:
       | "Data laced with history" (2018) [1] is very relevant here.
       | Interestingly, that extensive post obsessively vivisects RON 1.0
       | (Replicated Object Notation [2] as of 2017) which was based on
       | columnar compression techniques Automerge recently implemented
       | (53:24 in the talk).
       | 
       | Columnar formats have their upsides and downsides, though.
       | 
       | [1]: http://archagon.net/blog/2018/03/24/data-laced-with-history/
       | 
       | [2]: http://replicated.cc
        
         | lann wrote:
         | It's worth noting that you are the author of RON :)
        
           | gritzko wrote:
           | ... and working hard to release the new version. Hearing the
           | fuzzer buzzing as we speak...
        
             | burtmacklin wrote:
             | awesome, thank you for your work.
        
         | toomim wrote:
         | > Columnar formats have their upsides and downsides, though.
         | 
         | Oo, it sounds like you have some interesting thoughts. Could
         | you elaborate?
        
       | migueloller wrote:
       | I recently shared a thread [1] on Twitter with CRDT resources I
       | found useful if you're interested in that kind of thing.
       | 
       | [1] https://twitter.com/ollermi/status/1279067350269124609?s=21
        
       | sradman wrote:
       | CRDTs are great examples of what the NoSQL movement called
       | Eventual Consistency. I never understood why this movement
       | assumed that abandoning ACID-style consistency automatically gave
       | you Eventual Consistency.
       | 
       | As a side question, have any new algorithms been developed over
       | the last decade that have significantly improved automatic source
       | branch merging?
        
         | shinzui wrote:
         | Yes. Take a look at Pijul. Here is an article explaining the
         | algorithm. https://jneem.github.io/merging/
        
           | billconan wrote:
           | very good read. however, I'm confused. the article doesn't
           | mention how to handle changes within a line?
        
         | Taek wrote:
         | Eventual consistency and ACID are not at odds, you can have one
         | or the other or neither or both.
        
           | dboreham wrote:
           | Not true. Eventual consistency can not (provably) provide
           | consensus.
        
           | swyx wrote:
           | depends how you interpret ACID. for some interpretations, only
           | strong consistency qualifies
        
       | sillysaurusx wrote:
       | Hmm. Does anyone know of a transcript? I can't watch right now,
       | only read.
       | 
       | I wonder if there's some free YouTube transcription service...
       | even just showing the captions would work.
       | 
       | Oh, hm. It's not on YouTube anyway.
        
         | benrbray wrote:
         | Actually, the video on the page is a YouTube embed and the
         | English captions are pretty good (although understandably they
         | struggle with CRDT jargon).
         | 
         | I found that the last 4-5 references listed in the link are
         | pretty accessible, and most of the diagrams from the talk are
         | taken from one of the papers by Kleppmann.
        
       | sfvisser wrote:
       | From my (admittedly very limited) experience implementing CRDTs
       | it became clear that even technically correct is not good enough.
       | Optimistic merge strategies require a very clear understanding of
       | user intent and expectation.
       | 
       | Properly consistent can still mean utterly confusing.
        
         | dropofwill wrote:
         | I think that's basically the point of this talk, that
         | correctness is the minimum, but for user collaboration at
         | least, it's important to encode the user operations as 'close'
         | to the CRDT as possible.
         | 
         | For example, in this talk he discusses the problems with moving
         | items in a list as a pair of delete+insert operations. He then
         | proposes adding a move operation to the CRDT that solves some
         | of these problems.
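
A simplified sketch of the difference, under heavy assumptions: list order is modelled with plain numeric positions rather than a real list CRDT, and the element ids, timestamps, and function names are made up for illustration. It shows two users concurrently moving the same item; with delete+insert the item is duplicated after merging, while treating each item's position as a last-write-wins value leaves a single copy.

```python
# --- Move as delete + insert -------------------------------------------
# State: (live elements {id: (position, value)}, set of deleted ids).

def move_as_delete_insert(state, item_id, new_pos, new_id):
    elements, tombstones = state
    _, value = elements[item_id]
    remaining = {k: v for k, v in elements.items() if k != item_id}
    remaining[new_id] = (new_pos, value)  # re-inserted under a fresh id
    return remaining, tombstones | {item_id}


def merge_sets(a, b):
    tombstones = a[1] | b[1]
    union = {**a[0], **b[0]}
    return {k: v for k, v in union.items() if k not in tombstones}, tombstones


def render_elements(state):
    return [value for _, value in sorted(state[0].values())]


start = ({"x1": (1.0, "milk"), "x2": (2.0, "eggs"), "x3": (3.0, "bread")}, set())
alice_di = move_as_delete_insert(start, "x3", 0.5, "a1")  # bread to the front
bob_di = move_as_delete_insert(start, "x3", 1.5, "b1")    # bread after milk
merged_di = merge_sets(alice_di, bob_di)


# --- Move as a dedicated operation -------------------------------------
# State: {item: (timestamp, replica, position)}; position is last-write-wins.

def merge_positions(a, b):
    return {item: max(a[item], b[item]) for item in a}


def render_positions(state):
    return sorted(state, key=lambda item: state[item][2])


base = {"milk": (0, "", 1.0), "eggs": (0, "", 2.0), "bread": (0, "", 3.0)}
alice_mv = {**base, "bread": (1, "alice", 0.5)}
bob_mv = {**base, "bread": (1, "bob", 1.5)}
merged_mv = merge_positions(alice_mv, bob_mv)
```

Here `render_elements(merged_di)` contains "bread" twice, while `render_positions(merged_mv)` contains it once, at whichever position wins the tie-break.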
        
         | topicseed wrote:
         | That is exactly what is pushing many people away from CRDTs,
         | despite them being mathematically proven correct, and towards
         | Operational Transformation.
        
           | heavenlyblue wrote:
           | OT is just a fancy name for a CRDT.
        
             | sagichmal wrote:
             | No, it isn't.
        
       ___________________________________________________________________
       (page generated 2020-07-11 23:00 UTC)