[HN Gopher] CRDTs are the future
       ___________________________________________________________________
        
       CRDTs are the future
        
       Author : lewisjoe
       Score  : 614 points
       Date   : 2020-09-28 15:18 UTC (7 hours ago)
        
 (HTM) web link (josephg.com)
 (TXT) w3m dump (josephg.com)
        
       | zemo wrote:
       | I use CRDTs in production at Jackbox for audience functionality
       | and honestly I don't know why the only thing that people talk
       | about when it comes to CRDTs is collaborative text editing. Like,
       | sure, cool, but that's literally a single problem domain and
       | like, Google Docs already exists and works well for the majority
       | of users so how many developers actually need to create a
       | collaborative text editor? CRDTs are an incredibly abstract and
       | versatile concept; collaborative text editing is a tiny problem
       | domain. I would really like to see more writing about CRDTs that
       | is NOT about collaborative text editing.
        
         | wenc wrote:
         | Here's one written by an employee of Redis Labs about CRDTs for
         | georeplication of databases for strong eventual consistency.
         | [1]
         | 
         | There's also a 2-year-old list of CRDTs used in prod [2].
         | 
         | I know Azure CosmosDB (NoSQL) advertises its own use of CRDTs
         | for global replication.
         | 
         | [1] https://www.infoworld.com/article/3305321/when-to-use-a-crdt...
         | 
         | [2] https://github.com/ipfs/notes/issues/404
        
         | uryga wrote:
         | just wanted to say, i really like Jackbox's games and it's
         | really fun to see you here! i've been exposed to your games via
         | youtube vids and it feels weird (but cool!) to have that
         | intersect with HN. would love to read more about how you use
         | CRDTs to make it all work!
        
         | sagichmal wrote:
         | 1000% this.
        
         | rhencke wrote:
         | Would you consider being the writer of one of those articles?
        
           | zemo wrote:
           | well I really walked into that question huh
        
             | djm_ wrote:
             | Followed you in case you ever do! I'm a big fan of Jackbox
             | and I'd love to read about where CRDTs fit in your stack.
        
             | gen220 wrote:
             | I'd read this article.
             | 
             | Or, if you'd rather not, can you share some examples of
             | problem domains, that we at home can search for?
        
             | davidhowlett wrote:
             | If you write an article and post it here I will read it.
        
         | PaulDavisThe1st wrote:
         | We want something like that for distributed collaboration in
         | Ardour, a cross-platform DAW. The relevant state is serialized
         | as an XML file, and used natively as a complex object tree.
         | Users want to be able to edit (in the DAW) locally, then share
         | (and merge) their results with collaborators.
        
           | jka wrote:
           | I've plugged this collaboration project a few times recently,
           | and have no relationship to it other than discovering it (via
           | YJS' "who is using" list[1]) and finding it fascinating:
           | 
           | http://cattaz.io/
           | 
           | What I find most interesting about it is that it has reduced
           | the state of multiple 'smart' user-facing widgets/apps into a
           | common, lowest-common-denominator format (a text document)
           | that lends itself more easily and intuitively to
           | collaborative editing and CRDT operations.
           | 
           | I don't know for sure whether this is the path forward for
           | CRDT-based applications in general, but I think there are
           | valuable ideas there. It does raise the possibility of the
           | widgets/applications occasionally being in 'invalid' states;
           | but rarely in a way that the human participants wouldn't
           | notice or be able to fix themselves.
           | 
           | Whether that scales to the complexity of the state management
           | for a multi-track audio editing session, I don't know; but it
           | could be instructional to compare.
           | 
           | [1] - https://github.com/yjs/yjs#who-is-using-yjs
        
       | benrbray wrote:
       | CRDTs seem very promising, but we still have a long way to go.
       | The most exciting work in this area is being done by Ink&Switch
       | [0]. They have a number of interesting real-world app prototypes
       | based on CRDTs.
       | 
       | - An interesting case where CRDTs failed is Xi-editor, where they
       | tried to use CRDTs as the basis for a plugin system [1,2].
       | 
       | - One of the biggest problems with CRDTs is the overhead needed
       | to keep track of the full document history. The automerge [3]
       | project has been working on efficient compression of CRDTs for
       | JSON datatypes.
       | 
       | - The idea of monotonic updates is really appealing at first, but
       | I was disappointed when I realized there's no good solution to
       | handle deletions. Tombstones, to me, seem like kind of a hack,
       | albeit a necessary one. Practically, CRDTs aren't the silver
       | bullet they might seem like at first.
       | 
       | - Another lesson learned is that when ten people are editing the
       | same paragraph, there's not really a right answer. I think the
       | key to implementing CRDTs is doing it at the correct level of
       | granularity.
       | 
       | - ProseMirror intentionally chose NOT to use CRDTs [4].
       | 
       | - Some more good references are [5,6,7]
       | 
       | [0] https://inkandswitch.com/
       | 
       | [1] https://github.com/xi-editor/xi-editor/issues/1187#issuecomm...
       | 
       | [2] https://news.ycombinator.com/item?id=19886883
       | 
       | [3] https://github.com/automerge and
       | https://github.com/automerge/pushpin
       | 
       | [4] https://marijnhaverbeke.nl/blog/collaborative-editing.html
       | 
       | [5] Kleppmann 2020, "CRDTs: The Hard Parts"
       | https://www.youtube.com/watch?v=x7drE24geUw with HN Discussion:
       | https://news.ycombinator.com/item?id=23802208
       | 
       | [6] Kleppmann 2019, "Interleaving Anomalies in Text Editors"
       | https://martin.kleppmann.com/papers/interleaving-papoc19.pdf
       | 
       | [7] https://abishov.com/xi-editor/docs/crdt-details.html
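
To make the tombstone point above concrete, here is a minimal Python sketch of a state-based two-phase set (2P-Set), one of the simplest CRDTs that supports deletion. It is an illustration under stated assumptions, not code from automerge or any project linked above; the class name `TwoPhaseSet` and its methods are hypothetical. Removed items are never erased, only recorded in a grow-only tombstone set, which is why merges stay monotonic and why history keeps accumulating.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPhaseSet:
    """State-based 2P-Set: a removal is recorded as a tombstone, never erased."""
    added: set = field(default_factory=set)
    tombstones: set = field(default_factory=set)

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Only items this replica has seen can be removed; the tombstone stays forever.
        if item in self.added:
            self.tombstones.add(item)

    def value(self):
        # Visible contents: everything added that has no tombstone.
        return self.added - self.tombstones

    def merge(self, other):
        # Both sets only grow, so union is commutative, associative and idempotent.
        return TwoPhaseSet(self.added | other.added,
                           self.tombstones | other.tombstones)


a = TwoPhaseSet()
a.add("x")
a.add("y")
b = a.merge(TwoPhaseSet())  # replica b starts from a copy of a's state
a.remove("x")               # concurrently, a deletes "x"
b.add("z")                  # concurrently, b adds "z"
merged_ab = a.merge(b)
merged_ba = b.merge(a)
```

Both merge orders converge to the same value, and the tombstone for "x" survives the merge. A 2P-Set also cannot re-add an item once it has been removed, which is one reason practical designs reach for more elaborate structures.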
        
         | yawrp wrote:
         | You might enjoy this piece, covers a few of the tradeoffs you
         | mention https://hex.tech/blog/a-pragmatic-approach-to-live-collabora...
        
       | [deleted]
        
       | Splizard wrote:
       | No, peer2peer lockstep is the future. No central server, no speed
       | penalty. No storage penalty.
       | 
       | Has been used in RTS games to synchronize 1000s of units across
       | low-bandwidth connections.
       | 
       | Input may be delayed by latency which can be mitigated with
       | client-side prediction. Cosmic bit-shifts & indeterminism can be
       | a challenge in longer sessions but peers can sync with each other
       | when there is an OOS.
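
A rough Python sketch of the lockstep idea described above: every peer applies the same inputs for each tick in a deterministic order and compares a checksum of its state to detect an out-of-sync (OOS) condition. The `Peer` class, the input tuple shape, and the resync-by-copy step are assumptions made for illustration, not any particular game's protocol; integer-only state is used to sidestep floating-point nondeterminism.

```python
import copy
import hashlib
import json


class Peer:
    """One participant in a lockstep session (hypothetical, for illustration)."""

    def __init__(self):
        self.positions = {}  # "player:unit" -> integer position
        self.tick = 0

    def step(self, inputs):
        # inputs: iterable of (player_id, unit_id, dx) commands for this tick.
        # Sorting makes the application order independent of arrival order.
        for player_id, unit_id, dx in sorted(inputs):
            key = f"{player_id}:{unit_id}"
            self.positions[key] = self.positions.get(key, 0) + dx
        self.tick += 1

    def checksum(self):
        # Hash a canonical serialization so equal states give equal digests.
        blob = json.dumps([self.tick, self.positions], sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()


tick_inputs = [(2, 7, -1), (1, 3, 5), (1, 4, 2)]
p1, p2 = Peer(), Peer()
p1.step(tick_inputs)
p2.step(list(reversed(tick_inputs)))  # same inputs, different arrival order
in_sync = p1.checksum() == p2.checksum()

p2.positions["1:3"] += 1              # simulate a stray bit error on one peer
out_of_sync = p1.checksum() != p2.checksum()

p2.positions = copy.deepcopy(p1.positions)  # naive resync: adopt a peer's state
p2.tick = p1.tick
resynced = p1.checksum() == p2.checksum()
```

Only the inputs and the small checksum cross the network in this model, which is where the low-bandwidth claim comes from; the cost is that every peer must simulate every tick identically.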
        
       | alangibson wrote:
       | Anyone wanting more info on CRDTs can check out the index I'm
       | maintaining at https://github.com/alangibson/awesome-crdt
        
       | SirensOfTitan wrote:
       | So CRDTs are the future, but what about today for real,
       | production products? I'm just about to really dive into
       | collaborative editing features for our product, and OT still
       | seems to me to be a much safer bet unless you're dealing with a
       | more obscure environment.
        
         | omgtehlion wrote:
         | Yes, starting with OT looks easy. You can make 99% work in
         | almost no time. But the last 1% will bite you in the rear
         | really hard...
         | 
         | Actually, CRDT is not a single data structure or even
         | algorithm. It is a term for several families of data structures
         | and different algorithms on them. If your task is not editing
         | text, you may find a simple and already implemented CRDT for
         | your case.
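
As an example of the "simple and already implemented" end of the spectrum mentioned above, a grow-only counter (G-Counter) is a well-known state-based CRDT. The sketch below is a minimal Python version; the `GCounter` class and the replica names are made up for illustration and are not taken from any specific library.

```python
class GCounter:
    """Grow-only counter: each replica increments only its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> count contributed by that replica

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-replica maximum: commutative, associative and idempotent.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


a = GCounter("a")
b = GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
```

After the merges both replicas report the same total, regardless of the order or number of times states were exchanged.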
        
         | toomim wrote:
         | > So CRDTs are the future, but what about today for real,
         | production products?
         | 
         | Try Y.js: https://github.com/yjs/yjs
        
       | crawshaw wrote:
       | I have been consistently at odds with myself comparing CRDTs vs.
       | OT. On the one hand, CRDTs have a nicer core formalism. On the
       | other hand, OT works, and is closer to the actual driving events
       | of text editing.
       | 
       | I question the core argument of this article: that CRDTs now work
       | _and_ that distributed is better than centralized. I certainly
       | want more distribution than "everything is run on a google
       | server" but do I really foresee a need for distributing a single
       | document? One server with an optimal OT implementation can
       | probably handle near a million active connections.
       | 
       | In practice, that's plenty. Each piece of data having one owner
       | is quite reasonable. There are lots of pieces of data.
       | 
       | I remain on the fence for collaborative text editing. Though it's
       | great to see all the work pushing CRDTs forward!
        
         | josephg wrote:
         | Blog author here. I've been having this conversation with a lot
         | of folks over the last few weeks and I hear you.
         | 
         | Does it make sense for us as an opensource community to invest
         | our time and energy making one really good CRDT (with
         | implementations in a few languages)? Or does it make sense for
         | us to distribute that energy between a bunch of CRDT and OT
         | implementations, with different performance tradeoffs?
         | 
         | My take is that it's been hugely beneficial to us all that JSON
         | is a standard, because I can use it from every language, and
         | have confidence that the implementations are fast and good
         | quality. I think we have an opportunity to make that for a good
         | CRDT too. Even if OT would work fine in your architecture, if
         | we have a great, capable, fast CRDT kicking around, it could be
         | a reasonable default for most people. And my claim is that the
         | performance difference between CRDTs and OT is smaller than the
         | difference between high and low quality implementations. (I
         | expect a well written CRDT in wasm will outperform my OT code
         | in javascript.)
        
       | clarkmoody wrote:
       | _[CRDTs] would let us write software that treats users as digital
       | citizens, not as digital serfs_
       | 
       | Amen, brother.
        
       | yavi wrote:
       | If you're interested in building collaborative apps but not in the
       | architectural overhead of implementing CRDTs I'd recommend
       | checking out roomservice.dev [1]. They've begun to power some
       | other collaborative apps such as Tella.tv [2] - realtime browser-
       | based video editing.
       | 
       | [1] https://roomservice.dev
       | 
       | [2] https://news.ycombinator.com/item?id=24158509
       | 
       | Disclaimer: I've invested in roomservice.dev (and very excited
       | about what they're building!). No affiliation with Tella.
        
         | rememberlenny wrote:
         | +1 Roomservice.
         | 
         | CRDT as a service.
        
       | basicplus2 wrote:
       | CRDTs (Conflict-free Replicated Data Types)
        
       | stblack wrote:
       | It takes over 800 words, and several mentions of the CRDT
       | acronym, before the acronym is expanded for the reader.
       | 
       | Don't write like this. Respect your readers and help them
       | comprehend. Expand acronyms as early as you can, ideally at the
       | first mention.
        
         | butterisgood wrote:
         | I'm sure a casual, non-technical reader of Hacker News would be
         | unaware of most of the headlines here. Google is your friend,
         | and CRDTs are part of the language of distributed systems. To
         | some degree, one has to help themselves.
        
         | [deleted]
        
         | [deleted]
        
         | CydeWeys wrote:
         | Seriously. It needs to be explained the FIRST TIME it appears,
         | and it shouldn't be abbreviated in the title. I read for a
         | minute thinking he was talking about Chrome Remote Desktop
         | (which is what CRDT means to me).
         | 
         | Mods, can we expand the acronym in the title of this submission
         | please?
        
           | PaulDavisThe1st wrote:
           | Or some new-old sort of visual display device ... Cathode Ray
           | Display Terminal :)
        
             | crazygringo wrote:
             | That was my first thought. "Wait... is there such a thing
             | as a cathode-ray _DIGITAL_ tube now...?!?!"
             | 
             | I mean, if SLR's became DSLR's... I just assume any "D"
             | means we're digital now! :)
        
         | josephg wrote:
         | Author here. Thanks for the feedback - I'll update the article.
         | 
         | I'm a little embarrassed to admit I didn't even notice.
        
           | regulation_d wrote:
           | I'm of the camp that focusing on the acronym stuff is missing
           | the point: that was a thoughtful, well-written piece.
           | 
           | I, for one, am grateful that you took the time to write it.
        
             | josephg wrote:
             | Thanks. I expanded the acronym when it became relevant
             | to the story. But judging by the comments here, lots of
             | folks were distracted and frustrated that they didn't know
             | what the acronym meant earlier.
             | 
             | Anyway, I've updated the introductory paragraph to make it
             | more clear.
        
         | losvedir wrote:
         | > _Don't write like this. Respect your readers_
         | 
         | This is way over the top.
         | 
         | I thought the author did an amazing job of discussing a highly
         | technical topic in a very approachable way. Every blog on HN
         | should _aspire_ to write like this! It was so good it got me
         | reading other posts even.
         | 
         | Yes, it would have been nice for us non-domain experts if the
         | author had done the classic "Conflict-free replicated data type
         | (CRDT)" thing, but you can easily just say that, ya know? "Hey,
         | it would be helpful if you expanded CRDT early on."
        
         | colesantiago wrote:
         | Agreed, 29 mentions of the acronym 'CRDT' and I had no idea
         | what it was until I had to break my reading flow and google it,
         | it sounded like buzzword soup to me.
         | 
         | Engineers, when talking about technical concepts with acronyms,
         | always expand them for the first time to your readers!
        
         | samatman wrote:
         | As I tirelessly mention whenever this comes up on HN, which is
         | often: we have a specific technology that is designed precisely
         | for this situation.
         | 
         | It's called the link. All an author has to do is link the first
         | instance of an acronym or piece of jargon to some authoritative
         | description, and you get the best of both worlds: readers
         | familiar with e.g. CRDTs[0] can just keep reading, and the rest
         | can click the link and find out.
         | 
         | [0]: https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...
        
           | thesuitonym wrote:
           | Even better, just use the abbr tag. It's well supported and
           | doesn't rely on an outside source to tell your readers what
           | you're talking about.
        
             | agent86 wrote:
             | Relevant to the parent here, I prefer to use a link because
             | it is more commonly expected and it gives me the
             | opportunity to refer to a quality source for an in-depth
             | explanation of what I'm referencing.
             | 
             | It is especially useful when writing a technical document
             | that utilizes multiple products/stacks/terms. Creating
             | links to quality sources for those items gives someone new
             | to the content a good source to go deeper into those pieces
             | while allowing me to focus the article on the specific
             | aspect I'm writing about.
        
             | taywrobel wrote:
             | "It's well supported", yet on latest mobile Safari, it
             | doesn't appear to work at all. This w3 demo code does not
             | show me the full name anywhere in the rendered content -
             | https://www.w3schools.com/tags/tryit.asp?filename=tryhtml_ab...
        
               | hashtekar wrote:
               | Chrome on Android rather (un)helpfully selects 'The WHO'
               | and gives me a link to the rock band.
        
               | justusthane wrote:
               | That would have nothing to do with the HTML abbr tag.
               | That's just Google Assistant being Google Assistant.
        
               | ficklepickle wrote:
               | What about a long-press on the abbreviation? Does that do
               | anything? I'm not on mobile now but I vaguely recall that
               | working on some mobile platforms.
        
               | lucasverra wrote:
               | Noup, just tried
        
               | [deleted]
        
               | snazz wrote:
               | Although it works on _desktop_ Safari, there's no way to
               | tell that you should be able to hover over it.
        
             | JeremyBanks wrote:
             | Merely expanding an abbreviation is less useful than
             | explaining it.
        
             | mcculley wrote:
             | Do both!
        
             | heavyset_go wrote:
             | Does <abbr> work well on Android and iOS?
        
               | californical wrote:
               | Not working for me on iOS
        
             | heavenlyblue wrote:
             | It doesn't work on mobile
        
             | dllthomas wrote:
             | Better in some ways, worse in others. Better than either, I
             | think, is both.
        
           | SketchySeaBeast wrote:
           | Don't even have to - the third paragraph down they set the
           | standard with "Operational Transform (OT)"
        
             | samatman wrote:
             | I agree, as does every good style guide, and even that can
             | be improved by also making it a link.
             | 
             | Hypertext is great, links are practically free, I encourage
             | authors to be liberal in applying them.
        
           | jchook wrote:
           | In defense, the first sentence links to a YouTube video that
           | expands the acronym in the first 10 seconds.
           | 
           | The video also gives good context for the article, even for a
           | beginner to the topic.
        
           | kevinpet wrote:
           | That's still lazy writing. Every blog should be written with
           | the assumption it will be encountered by a non-specialist.
           | Expanding abbreviations on first use and offering a brief
           | explanation of jargon is enough to let these readers know if
           | the article is something they are interested in.
        
             | Shared404 wrote:
             | But also, the blog post should be able to focus on its own
             | business logic so to speak.
             | 
             | The same arguments for and against using libraries apply
             | here, and it's up to the author which works best for their
             | piece.
        
               | Sharlin wrote:
               | This is what introductory paragraphs/sections/chapters
               | are for. Someone already well acquainted with the subject
               | matter can quickly skim through them, while others less
               | familiar with it get a quick catch-up.
        
               | Shared404 wrote:
               | I agree. But that's not always the best solution.
               | 
               | Just like libraries, sometimes it is and sometimes it
               | isn't the best approach.
               | 
               | For example, in a "How to do $BASIC_THING in python"
               | article, putting an intro of "This is what a variable is"
               | may not be a bad idea. Meanwhile, in a "Writing an
               | operating system from scratch in an esolang I wrote"
               | article, maybe you'd be better off linking to previous
               | blog posts or other resources.
               | 
               | Obviously these are both extreme examples, but I think
               | it's still a valid view.
        
             | bosswipe wrote:
             | Every blog? That's silly. People are allowed to have
             | conversations about niche topics that you are not familiar
             | with. You aren't the audience of every blog.
        
               | s1mon wrote:
               | People are allowed and encouraged to speak freely about
               | anything on the internet, but people seem to forget that
               | this is the _world_ wide web, and writers can't control
               | who in the world shows up to their blog or site. With a
               | little help, someone who might not be in the core
               | audience, might actually enjoy or learn something. If
               | everything is written with jargon and abbreviations with
               | no context, it's really just lazy inconsiderate writing.
               | 
               | It never ceases to amaze me how many websites for
               | restaurants or whatever neglect to mention basic things
               | like what state (and country) they're in. Even newspaper
               | web sites assume that we know that the "Chronicle" or the
               | "Ledger" or whatever generic name is the local paper for
               | East Bumblefuck.
        
               | danenania wrote:
               | That's true, but most people underestimate how opaque
               | their writing can be even to other experts. It doesn't
               | mean you have to explain _every_ piece of jargon, but you
               | can often greatly improve the clarity of your writing,
               | including for expert readers, by targeting at least a few
               | levels of expertise below where you think your audience
               | is. We _all_ have gaps in our knowledge that will seem
               | basic or obvious to others, no matter how expert we are
               | in a topic.
        
               | josephg wrote:
               | Yep. It's the paradox that the more you understand
               | something, the harder it is to teach it because it's more
               | work to empathise with people who don't know the concept.
               | 
               | Anyway, blog author here - sorry I didn't explain CRDTs
               | earlier in the piece. It didn't occur to me that people
               | would be confused.
               | 
               | https://wiki.lesswrong.com/wiki/Inferential_distance
        
           | mlinksva wrote:
           | We also have select-contextmenu-search on both desktop and
           | mobile, for any word or acronym. Links are nice for
           | disambiguation or to point to a recommended resource, but
           | they're hardly essential, nor are in-line expansions or
           | definitions.
        
         | KirinDave wrote:
         | I don't understand why you think that writing on a very
         | technical subject needs to build you a ladder to climb on as a
         | prerequisite. There is a link to a very high quality talk right
         | at the top of the article for folks who wanted to dive deeper
         | that specifically makes that effort.
         | 
         | I found the article quite good, and if you had genuinely been
         | motivated to engage with the content you could have highlighted
         | the acronym and searched for it. There is a wealth of good info
         | for "CRDTs" that comes up on the first page of Google, Bing or
         | DDG.
         | 
         | Does the acronym actually illuminate what they are or how they
         | function? I submit to you that it probably doesn't.
        
         | mcdirty wrote:
         | I literally closed the article after reading the first blurb,
         | because it wasn't explained. Just started googling.
        
         | serverholic wrote:
         | Is it really that hard to google? If you're trying to learn
         | about a subject it can get annoying to repeatedly have to jump
         | to the meat of the article or fast forward if you're watching a
         | video.
        
           | lotsofpulp wrote:
           | Is it really that hard to spell it out and then put the
           | abbreviation in parentheses the first time it is used?
        
           | root_axis wrote:
           | Obviously googling isn't hard, but having to google what
           | could be easily explained in the text breaks one's
           | concentration, something that is critical for most readers.
        
             | serverholic wrote:
             | A single quick search and you're good to go. Besides, if
             | you need to look it up then you're probably better off
             | reading a quick summary anyways.
             | 
             | Spelling out "Conflict-free replicated data type" doesn't
             | really help beginners all that much and non-beginners will
             | just use "CRDT" anyways.
             | 
             | We don't need every article about the web to spell out HTTP
             | right? I don't get why the author is getting beat up just
             | because his free content isn't convenient enough.
        
               | root_axis wrote:
               | If the article is titled "HTTP is the future" yes, I
               | think unpacking the acronym is appropriate. Also, he's
               | not getting "beat up", it's just a mild criticism
               | regarding how the article was written, it's not that big
               | of a deal.
        
         | Quarrelsome wrote:
         | I was at least happy that the wiki detour introduced me to
         | "gossip protocols" which is probably now one of my all-time
         | favourite technology namings.
        
         | dang wrote:
         | The trouble with comments like this is that they make
         | discussions shallower and more generic [1], which makes for
         | much worse threads. Actually it's not so much a problem with
         | the comment as with the upvotes, but shallow-generic-indignant
         | comments routinely attract upvotes, so alas it amounts to the
         | same thing.
         | 
         | The most recent guideline we added says: "_Please don't
         | complain about website formatting, back-button breakage, and
         | similar annoyances. They're too common to be interesting.
         | Exception: when the author is present. Then friendly feedback
         | might be helpful._"
         | 
         | I suppose that complaints about writing style fall under the
         | same umbrella. Not that these things don't matter, of course
         | (when helping people with their pieces for HN I always tell
         | them to define jargon at point of introduction), but they
         | matter much less than the overall specific topic and much less
         | than the attention they end up getting. So they're basically
         | like weeds that grow and choke out the flowers.
         | 
         | (This is not a personal criticism--of course you didn't mean to
         | have this effect.)
         | 
         | https://news.ycombinator.com/newsguidelines.html
         | 
         | [1]
         | https://hn.algolia.com/?query=generic%20discussion%20by:dang...
        
         | spicymaki wrote:
         | Given that the author defines CRDT (conflict-free replicated
         | data type) a few paragraphs in, it might have been accidental.
         | The author might have re-ordered a few of the paragraphs during
         | editing.
        
         | tachyonbeam wrote:
         | You can also be even nicer and have the first expansion of the
         | acronym link to a wikipedia page or other relevant explanation.
        
         | deepsun wrote:
         | I strongly disagree, that forces the author to spend extra time on
         | explaining everything. That's why it's often so hard for me to
         | find quality in-depth advanced blogs on various technologies
         | and fields -- because they all tend to be really introductory.
         | So there's either papers or tutorials, but nothing in-between.
         | E.g. a different-angle explanation of the same thing, or
         | comparison with another tech who came from that.
         | 
         | In contrast, I much prefer a different approach to explaining
         | (I mostly see it on Cyrillic forums) -- instead of guiding you by
         | hand, they just give you clues where to look for. That way,
         | knowledge givers are way more approachable, because it costs
         | them very little to chat back something like "look for CRDT",
         | than go into in-depth explaining. In the end -- there's way
         | more information, and from top experts in the fields.
        
         | bosswipe wrote:
         | He wasn't writing for you, he was obviously writing for people
         | familiar with these algorithms.
        
         | nickflorez wrote:
         | Amen.
        
         | Naac wrote:
         | While I agree that reading the title was confusing (as I am
         | not familiar with CRDT), I think the writing style was
         | actually very good.
         | 
         | I read the title, wondered what CRDT was, and started reading.
         | In the back of my mind I was wondering what CRDT was, but
         | reading the article felt like I was going on a journey. Every
         | term that needed to be defined was defined. Finally, when CRDT
         | was mentioned in the article, it was immediately defined.
         | 
         | I generally agree that throwing acronyms around without
         | defining them is not fair to the reader, but I don't think this
         | article did that at all.
        
           | theon144 wrote:
           | Yup, strong agree. The article did a great job of capturing
           | the "story" of the competing approaches really well, I didn't
           | even mind that the acronym wasn't explained until later.
        
           | IncRnd wrote:
           | This is called "burying the lede", where the newsworthy
           | portion is buried somewhere later instead of being mentioned
           | upfront. It's best not to do this, since not all readers will
           | read two thirds of a story in order to determine the subject.
        
             | rpdillon wrote:
             | I don't think this is a good example of burying the lede.
             | If I wanted to bury the lede on this post, I'd do this:
             | 
             | > I've spent the last decade working on OT, and have always
             | thought it was the right way to implement a collaborative
             | editor. Then something amazing happened.
             | 
             | Instead, we get this:
             | 
             | > I saw Martin Kleppmann's talk a few weeks ago about
             | CRDTs, and I felt a deep sense of despair. Maybe all the
             | work I've been doing for the past decade won't be part of
             | the future after all, because Martin's work on CRDTs will
             | supersede it. Its really good.
             | 
             | That seems like the opposite of burying the lede. The main
             | point of the story is _not_ that CRDT stands for Conflict-
             | free Replicated Data Type, it's that the author now favors
             | CRDTs over OT for collaborative editors.
        
               | IncRnd wrote:
               | It's a quibble to say that the undefined term CRDT is
               | part of the lede or the lede itself, since people who
               | do not know the meaning of the acronym need to read a
               | significant part of the story to be told the definition.
               | 
               | That can be seen by glancing at the comments on this
               | page.
        
             | andrewprock wrote:
             | I've seen this writing tactic become more and more common
             | over the years. It shows disrespect for your audience, and
             | tends to play well only when "preaching to the choir".
             | 
             | Whenever I see this writing style, such that I cannot find
             | a thesis in the first two paragraphs, I almost universally
             | discard the writing as a waste of time.
        
       | natural20s wrote:
       | Ahhhh Google Wave. I was an early adopter and shed a tear when it
       | went away. The closest I've felt to that product is Slack but
       | find Slack too noisy. With Wave I felt like I was IN my work not
       | in a "sidebar" application that was pulling my attention from my
       | work. I suppose there were so many ways to use Wave and so many
       | ways to use Slack that your experience could be completely
       | different than mine. But RIP Google Wave.
        
         | hughw wrote:
         | I just never thought email needed fixing, and I suspected
         | "worse is better" [1] would apply to Wave adoption.
         | 
         | [1] https://en.wikipedia.org/wiki/Worse_is_better
        
           | TillE wrote:
           | Nobody uses email anymore! It's a last resort. If properly
           | nurtured, Google Wave easily could have become Slack and
           | more. It was pointing in that direction.
        
       | omgtehlion wrote:
       | CRDTs are hip and cool. But right now I'm trying to find an
       | implementation for desktop software, not some web-framework in-
       | electron. And could not find a concise and correct codebase.
       | 
       | All the implementations are: 1. javascript or 2. dependent on
       | their chosen method of synchronisation or 3. incorrect.
       | 
       | The result of a two week long search is that I'm reimplementing
       | the stuff myself...
        
         | WhatIsDukkha wrote:
         | https://github.com/automerge/automerge-rs
         | 
         | I can't speak to its usability as I'm waiting on a 1.0
        
           | omgtehlion wrote:
           | yeah... you better wait for 1.0...
        
           | memorythought wrote:
           | I'm one of the authors of this. Right now the code is very
           | unstable as we're tracking the performance branch of the JS
           | implementation. Once the JS version hits 1.0 I'll be putting
           | a bunch of effort into making the API cleaner and more rusty
           | and documenting things.
           | 
           | It does work and can actually be used as a backend for the JS
           | implementation if you use the wasm backend we've built. In
           | fact, this is how we have tested it, by compiling to WASM and
           | running the JS test script against it.
        
       | stephc_int13 wrote:
       | When dealing with this type of discussion I always try to
       | remember that making design decisions is a tradeoff, an arbitrage
       | highly dependent on your knowledge of the field, but also context
       | and taste.
       | 
       | Believing there is a silver bullet is a fool's errand.
       | 
       | From what I've read about CRDTs, it seems difficult to escape the
       | overengineering trap when dealing with them.
        
         | tabtab wrote:
         | I tend to agree. Each team, project, and organization has
         | different needs, preferences, and cultures. One-size-fits-all
         | is a really tall order.
         | 
         | I believe it's better to focus on kits of parts--APIs and/or
         | self-contained functions--that can be combined or ignored as
         | needed, along with a variety of reference application samples.
         | 
         | Having lots of ways to easily filter and sort content is also
         | very useful. For example, filtering and/or sorting annotations
         | by person, group, date, content (sub-strings) is very useful. A
         | query-by-example kind of interface is nice for this.
        
       | ffhhj wrote:
       | I'm looking for a solution to implement collaborative editing in
       | my visual programming node editor. Are CRDTs useful in this case?
        
       | aazaa wrote:
       | The video linked in the first sentence is well worth the time to
       | understand the background.
       | 
       | https://www.youtube.com/watch?v=x7drE24geUw
        
       | santiagobasulto wrote:
       | If you're a young technical entrepreneur looking for a 10-100M
       | startup opportunity and with a very interesting technical
       | challenge behind it: Create a collaborative replacement of
       | Jupyter Notebooks. There's already some effort done in JupyterLab
       | fork if you're interested [0], but with no significant
       | advancements.
       | 
       | So yes, I agree that CRDTs are indeed a promising endeavor.
       | 
       | [0] https://github.com/jupyterlab/jupyterlab/issues/5382
        
         | fancy_pantser wrote:
         | Domino Data Lab has been around for a while and closed another
         | $43M in funding earlier this year. They have a boatload of
         | tools around collaborative notebooks. They go even further and
         | have data science manager-level dashboards to track the
         | notebooks, their resources, and who is working on what. There
         | are others, but I'm calling this company out specifically
         | because they've shown great traction and I've spent a little
         | time with the cofounders when they were still at a shared
         | incubator space.
        
         | yunyu wrote:
         | https://deepnote.com/ is doing exactly this!
        
         | csours wrote:
         | How is the system you imagine different from repl.it?
        
         | darkhorse13 wrote:
         | Is it really such a good idea to entice young people like this?
         | Shouldn't someone at least be interested and have domain
         | knowledge in CRDTs and real-time collaboration before diving
         | into building a startup like this?
        
           | TheDong wrote:
           | There's no need to gatekeep building something on already
           | having knowledge.
           | 
           | If someone has time and energy and desire, not knowing
           | anything about document editing or CRDTs is not a blocker.
           | Those things can be learned in a week to a month by someone
           | who dedicates time to it.
           | 
           | Very few parts of software are inaccessible to someone with
           | basic CS knowledge. It's a great idea for people to try
           | something, regardless of their background, and if they fail
           | but learn something, that's still a fine outcome.
        
           | colesantiago wrote:
           | Worked for Figma, right?
           | 
           | I'm sure they fall into the collaborative software space,
           | utilise CRDTs and the founders are less than 40 years of age.
           | 
           | This seems like gatekeeping no?
        
           | santiagobasulto wrote:
           | Well, yes, of course. But my comment assumes the person might
           | be interested in the subject.
        
         | williamstein wrote:
         | CoCalc is a collaborative replacement of Jupyter notebooks.
         | It's a top-to-bottom re-implementation of the entire Jupyter
         | stack designed specifically for realtime collaboration. You can
         | use it via our hosted offering (https://cocalc.com), or install
         | it on prem via https://github.com/sagemathinc/cocalc-docker.
         | 
         | We released our collaborative Jupyter notebook in 2014 as a
         | plugin to Jupyter classic. We then iterated on what we learned
         | over the years, completely rewriting everything multiple times,
         | including the entire realtime collaboration stack. CoCalc's
         | Jupyter support is pretty mature and battle tested at this
         | point, and also includes a TimeTravel slider that lets you view
         | all past versions of a Jupyter notebook and integrated chat.
         | 
         | I was a college professor (at Univ of Washington) and started a
         | company around this in 2015, so CoCalc has so far been mainly
         | aimed at serving the needs of academics teaching courses. It's
         | been increasingly popular lately, e.g., in the last month over
         | a half million distinct Jupyter notebooks were edited on
         | https://cocalc.com. Of course, many of these notebooks are
         | homework problems. Anyway, our company is doing very well, and
         | we hope it will eventually be a "10M startup opportunity". :-)
        
         | maclockard wrote:
         | I actually just wrote about doing this with our code notebook
         | product just the other day https://hex.tech/blog/a-pragmatic-
         | approach-to-live-collabora...
        
           | bearly wrote:
           | Interesting decision process. I kept wondering if other
           | people had implemented the Figma approach and it looks like
           | you did a nice job with it. I also appreciate you putting
           | those cool explainers up front
        
       | lisper wrote:
       | CRDT = Conflict-Free Replicated Data Types. Think git for data
       | structures instead of directory trees.
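To make the "git for data structures" analogy concrete, here is a minimal sketch of a grow-only counter, one of the simplest state-based CRDTs. The class name `GCounter` and the replica ids are illustrative assumptions, not taken from any library mentioned in this thread.

```python
class GCounter:
    """Grow-only counter: each replica increments only its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas count independently, then exchange state in either order.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
```

Unlike git, there is never a merge conflict to resolve by hand: the merge function is designed so that every replica ends up with the same state.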
        
       | lpage wrote:
       | The three most recent HN discussions on CRDTs are all worth
       | perusing.
       | 
       | [1] is an excellent tutorial that assumes no initial familiarity
       | with CRDTs or the math that underpins them. It walks you through
       | both the formalisms and the implementation, which is pretty key
       | to understanding why making real-world CRDTs flexible enough to
       | handle things like rich text editing is hard.
       | 
       | [2] is a talk that goes more in-depth on the hard parts
       | 
       | [3] goes deeper on OT vs. CRDT
       | 
       | It's worth noting that many of the CRDT discussions focus on
       | collaborative text editing. That's a _really_ hard problem. CRDTs
       | are (and have been for some time) a useful primitive for building
       | distributed systems.
       | 
       | [1] https://news.ycombinator.com/item?id=23737639
       | 
       | [2] https://news.ycombinator.com/item?id=23802208
       | 
       | [3] https://news.ycombinator.com/item?id=22039950
        
         | sashachepurnoi wrote:
         | Thank you for the links!
        
         | regulation_d wrote:
         | Perhaps also of interest is Raph Levien's retrospective on the
         | choice of CRDT as the collab technology for Xi.
         | 
         | https://news.ycombinator.com/item?id=19886883
        
       | hinkley wrote:
       | A question that's been in my mind for a while is why Version
       | Control and Collaborative Editing work at such cross purposes
       | with each other when they are essentially solving the same
       | problem? The biggest difference is that one works interactively
       | and the other favors a CLI. Beyond that, how much of the
       | distinction is artificial?
       | 
       | In particular I've been wondering about the space between CRDTs
       | and the 'theory of patches' such as we discussed with Pijul the
       | other day.
       | 
       | I have a collaborative editing project that's been sitting in my
       | in-box for a long time now because I don't want to write my own
       | edit history code and existing tools don't have enough ability to
       | reason about the contents as structured data. The target audience
       | is technology-averse, so no 'dancing bears' are going to interest
       | them. It's not enough for it to work, it has to work very well.
        
         | josephg wrote:
         | Author of the blog post here. I totally agree with you.
         | 
         | People think of OT / CRDT as realtime algorithms for realtime
         | collaborative editing because they're always programmed and
         | used that way. But the conflict resolution approach doesn't
         | have to merge everything as-is. You could build a CRDT or OT
         | system that generated VCS-style conflicts if concurrent edits
         | happen on the same line of code. To make it a valid OT / CRDT
         | algorithm the main constraint is just that every peer needs to
         | resolve conflicts the same way. (So if I merge your changes or
         | you merge my changes, we end up with identical document
         | states). It would be easier to implement using OT because you
         | only have to consider the interaction between two peers. But I
         | think its definitely doable in a CRDT as well.
         | 
         | I think having something that seamlessly worked in both pair
         | programming setups and with git style feature branches &
         | merging would be fantastic.
         | 
         | I have a lot of thoughts about this and would be happy to talk
         | more about it with folks in this space.
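The constraint described above (every peer must resolve conflicts the same way) can be sketched with a toy three-way merge. This is a simplified illustration, not an actual OT or CRDT algorithm: it assumes both sides only replace lines (so all line lists have equal length), and the function name `merge_lines` and the peer ids are hypothetical.

```python
def merge_lines(base, mine, theirs):
    """Three-way merge of equal-length line lists.

    Each side is a (peer_id, lines) pair. Conflicting lines become a
    conflict record whose alternatives are ordered by peer id, so the
    result does not depend on which peer performs the merge.
    """
    (id_a, a), (id_b, b) = sorted([mine, theirs])
    merged = []
    for base_line, line_a, line_b in zip(base, a, b):
        if line_a == line_b:
            merged.append(line_a)   # same edit, or nobody edited
        elif line_a == base_line:
            merged.append(line_b)   # only b changed this line
        elif line_b == base_line:
            merged.append(line_a)   # only a changed this line
        else:
            # VCS-style conflict, recorded identically on every peer
            merged.append(("CONFLICT", (id_a, line_a), (id_b, line_b)))
    return merged


base = ["x = 1", "y = 2", "z = 3"]
alice = ("alice", ["x = 10", "y = 2", "z = 3"])
bob = ("bob", ["x = 11", "y = 2", "z = 30"])

merged_by_alice = merge_lines(base, alice, bob)
merged_by_bob = merge_lines(base, bob, alice)
```

Both peers end up with the same document state, including the same conflict on the first line, which is the property that matters for convergence.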
        
         | samatman wrote:
         | Strong agree.
         | 
         | There's a next level of VCS forming on the horizon, in some
         | combination of CRDTs, patch theory, and grammar-aware diffing.
         | 
         | Which should also learn from fossil, and consider metadata such
         | as issues and surrounding discussions to be a part of the repo.
         | 
         | A really robust solution would also be aware of dependencies
         | and build systems, and even deployment: I see these as all
         | fundamentally related, and connected to versioning in a way
         | that should be reflected and tracked through software.
        
         | exfalso wrote:
         | Around 6-7 years ago we started a collaborative editing project
         | for prezi.com. The problem basically boiled down to concurrent
         | editing of a big DOM-like data-structure. We looked at the
         | little literature that was available at the time including OT
         | and CRDTs, but quickly realized that none of the existing
         | approaches were mature enough for our needs. All of them were
         | stuck at "text editing", but we needed to edit these big object
         | DAGs.
         | 
         | So we ended up essentially implementing what you laid out, an
         | in-memory revision control system, although using a bit more
         | formal methods to reason about divergence/convergence of
         | clients. The most basic operation was the "diamond merge":
         | given operation x:A->B, y:A->C, construct x':C->D, y':B->D such
         | that x' . y == y' . x. It also had to satisfy certain other
         | algebraic laws, notably diamond composition, which allowed us
         | to compose these merging operations whenever we wanted,
         | guaranteeing that the clients will eventually converge to the
         | same data state. It was quite neat! Shame that it's all
         | proprietary.
         | 
         | Good old days. I remember, the most pesky operation was
         | implementing a good undo-redo algorithm, it's quite tricky,
         | even once you add inverses.
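As a rough illustration of the diamond property x' . y == y' . x, here is a sketch for the much simpler case of plain-text insert operations. It is not the proprietary system described above; the names `Insert`, `apply_op` and `transform`, and the site-id tie-break, are assumptions made for the example.

```python
from typing import NamedTuple


class Insert(NamedTuple):
    pos: int
    text: str
    site: int  # tie-breaker when two inserts target the same position


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        # `against` inserted text to the left of `op`, so shift `op` right.
        return op._replace(pos=op.pos + len(against.text))
    return op


A = "abc"
x = Insert(pos=1, text="X", site=1)   # x: A -> B
y = Insert(pos=1, text="Y", site=2)   # y: A -> C

x_prime = transform(x, y)             # x': C -> D
y_prime = transform(y, x)             # y': B -> D

via_y = apply_op(apply_op(A, y), x_prime)   # x' . y
via_x = apply_op(apply_op(A, x), y_prime)   # y' . x
```

Both paths around the diamond reach the same state D. Deletes, and structured data such as object graphs, need considerably more cases than this.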
        
           | josephg wrote:
           | It wasn't around at the time, but tree operations (with
           | object reparenting) are increasingly supported by OT systems
           | now:
           | 
           | https://github.com/ottypes/json1/
           | 
           | (Designed to be used with sharedb or similar.)
        
         | Joeri wrote:
         | My understanding may be flawed, but as far as I know you can
         | think of an OT log and a git log as being similar. Each party
         | generates deltas to the data structure that are recorded in the
         | log, and when these parallel histories meet they must be
         | merged. OT merges without involving the user, which sometimes
         | leads it to discard changes. Git merges like that if it can,
         | but when something must be discarded it asks the user. It is
         | the interactive merging and deep ability to navigate and edit
         | the log of changes that makes git so command-liney.
        
           | plesiv wrote:
           | Not intending to nit-pick, but Git doesn't store the content
           | as deltas. Each commit is the snapshot of the entirety of the
           | codebase at that point in time.
        
             | NateEag wrote:
             | Conceptually, yes, but under the hood, Git actually does
             | store content as deltas:
             | 
             | https://git-scm.com/book/en/v2/Git-Internals-Packfiles
        
         | drawkbox wrote:
         | Cloud based code environments are starting to merge this.
         | GitHub Codespaces [1], for one, is starting to do this. I don't
         | know if they use Operational Transformation (OT) or Conflict-
         | Free Replicated Data Types (CRDT), but they are repo backed. I
         | assume it is just using GitHub diffing tools in the repos and
         | maybe OT/CRDT in live sessions over WebRTC or similar.
         | 
         | Much of real-time collaboration goes back to networking and
         | real-time networking used in distributed multi-user systems
         | like games, where simulations need to sync on a server. In
         | games though, Dead Reckoning [2] is used as well as
         | interpolation and extrapolation in prediction, much of it can
         | be slightly different for instance with physics/effect, but
         | messages that are important to all like scores or game
         | start/end are reliably synced and determined on the server.
         | 
         | [1] https://visualstudio.microsoft.com/services/github-
         | codespace...
         | 
         | [2]
         | https://www.gamasutra.com/view/feature/131638/dead_reckoning...
        
           | ultimape wrote:
           | I wonder if there is a way to describe change sets as a
           | mathematical curve and achieve something like the rewind-
           | ability within Planetary Annihilation
           | https://www.forrestthewoods.com/blog/tech_of_planetary_annih...
           | which seems to be a smoother alternative to dead-reckoning
           | that bakes the history into it a bit better.
        
         | jerf wrote:
         | As it stands today, version control and collaborative editing
         | do _not_ solve the same problem. Version control deals with
         | large chunks of changes at a time. I don't even particularly
         | want a version control system that stored every single
         | keystroke made in source code. [1] Collaborative editing deals
         | with keystroke-by-keystroke updates. By the standard of
         | collaborative editing, even a single line source control commit
         | is a big change.
         | 
         | The problem spaces are quite different. Problems that emerge on
         | a minute-by-minute basis in collaborative editing emerge on a
         | week-by-week basis in source control, and when the problems
         | emerge in the latter, they tend to be much larger (because you
         | can build up a much bigger merge conflict on a routine basis
         | with the big chunks you're making).
         | 
         | Yes, it's true that if you squint hard, it _looks_ like version
         | control is a subset of collaborative editing, but I'd be
         | really hesitant to, say, try to start a start-up based on that
         | observation, because even if we take for the sake of argument
         | that it's a good idea to use the same underlying data
         | structures, the UI affordances you're going to need to navigate
         | the problem space are going to be very different, and some of
         | the obvious ways of trying to "fix" that would be awful, e.g.,
         | yes, you _could_ give me a "collaborative space" where I see
         | what everybody's doing in their code in real time... but it's a
         | _feature_, not a bug, that when I'm working on a feature I'm
         | isolated from what everyone else is doing at that exact moment.
         | When I run the compiler and it errors out, it's really, really
         | nice to have a good idea that it's _my_ change that produced
         | that result.
         | 
         | (I'm aware that collaborative editing also has the "I was
         | offline for a week and here's a bunch of conflicts", but I'm
         | thinking in terms of UI paradigms. That's not the common case
         | for most/all collaborative editing systems.)
         | 
         | [1]: Not saying the only solution is the one we have now. A
         | magic genie that watched over the code and made commits for you
         | at exactly the right level of granularity would be great, so
         | you'd never lose any useful context. But key-by-key isn't that
         | level of granularity.
        
           | oever wrote:
           | Version control is collaborative editing. Synchronizing on
           | every key stroke is _real-time_ collaborative editing. That's
           | nice if you're working on overlapping data at the same
           | time. In code this does not happen so often because code
           | repositories tend to be large.
           | 
           | Git does not work well for text because we have not figured
           | out a nice format for text yet that developers and other
           | people both enjoy. Developers want to stick to plain text as
           | their format because we have so far failed to create nice
           | tools and formats for structured data. Perhaps these
           | affordances can appear thanks to a popularization of real-
           | time collaborative editing.
        
           | hinkley wrote:
           | One of the reasons we compartmentalize code is so that people
           | can work on unrelated features without tripping over each
           | other at every turn.
           | 
           | The bits where they don't interact also don't conflict. The
           | bits where they do, look a lot more like collaborative
           | editing.
           | 
           | They're also the spots where merges usually go wrong.
        
             | jerf wrote:
             | I've been on systems where multiple developers were trying
             | to develop on the same system at once. I've also seen teams
             | trying to do it systematically. It scales basically to two
             | developers, sitting across from each other. Three, again,
             | physically colocated, on a good day. Even if they're
             | working on completely separate tasks, you hit "compile" and
             | it's a complete mystery what's going to happen. It's not
             | even stable if you do nothing and just hit "compile" again.
             | 
             | Beyond that it's insane. You _do not_ want that in your
             | version control system, as something built in, working all
             | the time, across your entire team. It would be a massive
             | anti-feature that would nuke your product.
             | 
             | Again, anyone thinking this sounds like a totally awesome
             | idea, I strongly encourage you to try out the simple
             | version, available right now, of just "five or six people
             | editing the same source code checkout", before
             | betting a start up on it. I guarantee a complete lack of
             | desire to productize the result if you try it for a week or
             | two.
        
               | lars wrote:
               | A middle ground could be nice: An IDE extension that
               | notifies you when something you're writing will conflict
               | in the future, should you and your coworker both commit
               | and push what you've typed out. It would allow you to
               | sort that out immediately, or at least plan ahead, rather
               | than being surprised by a large merge conflict n days
               | down the road.
        
         | PaulDavisThe1st wrote:
         | line-oriented data formats vs everything else. Why? Because of
         | "patching theory". If you don't understand that the data
         | describes objects and doesn't have line-by-line semantics, it
         | is hard to get merges correct.
         | 
         | Version control works wonders with line-oriented stuff, which
         | covers more or less every programming language in existence.
         | 
         | It doesn't do so well with non-line-oriented structured formats
         | such as XML (not sure how JSON or TOML fits in here).
         | 
         | Given that collaborative editing typically works with non-line-
         | oriented data formats, you can see the issue, I think.
        
           | samatman wrote:
           | That's what I refer to as "grammar-aware diffing" in the
           | sibling comment, and it's one of the low-hanging fruits here.
           | 
           | Even git allows for pluggable diffing, and doesn't force line
           | orientation. What's missing is the concept of moving
           | something, as distinct from deleting lines/chunks and then
           | inserting lines/chunks which just happen to be the same.
           | 
           | This is not a problem which CRDTs have, to put it mildly. I
           | believe pijul understands it as well. A lot of this stuff is
           | right out on the cutting edge, and as it matures it will
           | become practical to connect the edges, such as a CRDT which
           | collaborates with a parser to produce grammar-aware patches
           | which are automagically fed to pijul or something like it.
           | 
           | This comes with a host of problems, mostly that we're not
           | used to dealing with a history which has this level of
           | granularity, most of which we don't want to see, most of the
           | time. But they would be nice problems to have.
        
             | hinkley wrote:
             | Some of "We" depend on sub-line diff highlighting during
             | code reviews in order to reason about refactors and
             | adding/removing arguments from function signatures.
             | 
             | That this is generally a feature of the diff tool and not
             | the version control is a bit disappointing.
        
       | macintux wrote:
       | The title sounds like it could be fanboy clickbait but it's
       | actually a thoughtful look at how far CRDTs have come from the
       | viewpoint of an expert and skeptic.
       | 
       | A good read.
        
       | anne_biene wrote:
       | It is wonderful to see so much enthusiasm about this technology.
       | I have been working on CRDTs since 2012 and it has been quite a
       | ride.
       | 
       | For those looking for more information, have a look at the
       | information collected at http://crdt.tech/ (Disclaimer: I am
       | involved, though Martin did the bulk of the work.)
       | 
       | If you are into CRDTs for collaborative gaming, we are looking
       | for partners and investors: https://concordant.io (Disclaimer: I
       | am technical advisor in its team.)
        
       | bigfish24 wrote:
       | Great summary. CRDTs are a better fit for generalized data.
       | Having previously worked on an OT system, the central server
       | stickiness and merge complexity simply did not scale. There are
       | trade-offs with CRDTs, especially metadata, but as the post
       | mentions compression techniques are far more solvable in real-
       | world scenarios than a fundamental performance bottleneck at the
       | core.
        
       | csours wrote:
       | If you don't use CRDTs, you may be doomed to re-invent them.
       | Reading about them just now I realized that I spent the last year
       | developing a CRDT with LWW and OR characteristics.
       | 
       | edit: updated 'you are doomed' to 'you may be doomed'.
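For readers unfamiliar with the abbreviations, LWW is "last writer wins" and OR is "observed remove". The sketch below shows a minimal state-based version of each. The class names are illustrative, and the LWW register assumes the caller supplies timestamps that are meaningful across replicas, which is a real-world weak point of the approach.

```python
import uuid


class LWWRegister:
    """Last-writer-wins register; (timestamp, replica_id) breaks ties."""

    def __init__(self):
        self.value = None
        self.stamp = (0, "")

    def set(self, value, timestamp, replica_id):
        self._take_if_newer(value, (timestamp, replica_id))

    def merge(self, other):
        self._take_if_newer(other.value, other.stamp)

    def _take_if_newer(self, value, stamp):
        if stamp > self.stamp:
            self.value, self.stamp = value, stamp


class ORSet:
    """Observed-remove set: a remove only cancels the adds it has seen."""

    def __init__(self):
        self.adds = set()     # (element, unique tag) pairs
        self.removed = set()  # tags of adds that were observed and removed

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        self.removed |= {tag for e, tag in self.adds if e == element}

    def merge(self, other):
        self.adds |= other.adds
        self.removed |= other.removed

    def elements(self):
        return {e for e, tag in self.adds if tag not in self.removed}


# LWW: the write with the larger stamp survives on both replicas.
r1, r2 = LWWRegister(), LWWRegister()
r1.set("red", 1, "a")
r2.set("blue", 2, "b")
r1.merge(r2)
r2.merge(r1)

# OR-set: a remove on s2 races with a fresh add on s1; the add survives
# because s2 never observed it.
s1, s2 = ORSet(), ORSet()
s1.add("x")
s2.merge(s1)
s2.remove("x")
s1.add("x")
s1.merge(s2)
s2.merge(s1)
```

Note that the OR-set keeps the removed tags around indefinitely, which is the kind of metadata overhead mentioned elsewhere in this thread.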
        
       | jakobmartz3 wrote:
       | are they tho
        
       | lewisjoe wrote:
       | I'm part of the team that makes Zoho Writer (a Google Docs
       | alternative) - https://writer.zoho.com
       | 
       | We went with OT for our real-time syncing of edits in 2010 and a
       | decade later, we are still sticking with OT for reasons I already
       | stated sometime back -
       | https://news.ycombinator.com/item?id=24186883
       | 
       | However, in the spirit of "There are no solutions, only trade-
       | offs" CRDTs are absolutely necessary for certain types of syncing
       | - like syncing a set of database nodes.
       | 
       | But for systems which already mandate a central server
       | (SaaS/Cloud) and especially for a complex problem like rich-text
       | editing (i.e semantic trees) I still think OT provides better
       | trade-offs than CRDT.
       | 
       | I respect Joseph's conviction on CRDTs being the future, so I
       | guess we'll figure this out sometime soon.
        
         | cordite wrote:
         | What does OT stand for?
         | 
         | In the link, OT is aliased to "Operational Transformations"
        
           | mjhirn wrote:
           | "Operation Transformation" = "a system that supports
           | collaboration functionalities by separating the high-level
           | transformation (or integration) control from the low-level
           | transformation functions"
           | 
           | Source: OT's Wikipedia article
           | 
           | But I felt the same. Never heard of "Operation
           | Transformation" before and both OT and its alias were equally
           | opaque to me.
        
           | dwb wrote:
           | Have you not answered your own question? OT does indeed stand
           | for Operational Transformation.
           | 
           | https://en.wikipedia.org/wiki/Operational_transformation
        
         | RangerScience wrote:
         | Interesting. I might be adding real-time edit syncing to a
         | hobby project sometime soon. Can you share more about the
         | trade-offs?
        
           | lewisjoe wrote:
           | I haven't yet completely watched Martin's talk on CRDTs, so I
           | might come back and stand corrected. For now these are some
           | well known trade-offs
           | 
           | A central server: Most OT algorithms depend on a central
           | system for intention preservation. CRDTs are truly
           | distributed and need no central server at all.
           | 
           | Memory: Traditionally CRDTs consume more memory because
           | deletions are preserved. OT lets you garbage collect some
           | operations since a central system is already recording those
           | ops and sequencing them as well.
           | 
           | Analysing and cancelling ops: OT lets you easily analyse
           | incoming ops and modify/dummy-ify/cancel them without
           | breaking the consistency. This convenience is not necessary
           | for most cases, but really important for rich-text editing.
           | For example when someone merges a couple of table cells when
           | another user is deleting a column, we need to analyze these
           | operations and modify them so as not to end-up with an
           | invalid table structure.
        
             | passthefist wrote:
             | Seems like another one (based off the article) is ease of
             | use as well. I'm not familiar with either algorithm, but
             | sounds like OT is less complex and easier to understand,
             | which IMO is a decent tradeoff worth considering.
        
               | mdpye wrote:
               | Having worked a little with both, my impression is that
               | OT can get very complex in implementation edge cases.
               | CRDTs are incredibly difficult to design, but if you
               | successfully design one which can model your features,
               | implementation is pretty straightforward.
               | 
               | A real world implication is that if you want to add a new
               | operation to a system (like, table column merge, or
               | moving a range of text), with OT, you can probably find a
               | way to extend what you have to get it in there, with a
               | painfully non-linear cost as you add more new operations.
               | With CRDTs, you may find yourself entirely back at the
               | drawing board. But the stuff you do support, you will
               | support pretty well and reliably...
               | 
               | Personally, I prefer CRDTs for their elegance, but it can
               | be difficult in a world of evolving requirements
        
               | alextheparrot wrote:
               | I agree complexity is worth considering, though part of
               | me wonders how important that is in this case. The reason
               | for this intuition is that this is one of the core parts of
               | what they're selling.
               | 
               | If you're going to invest your complexity budget
               | somewhere, it seems like this is a good place for
               | companies dealing with these structures.
        
           | zamalek wrote:
           | Dealing with text is still an active area of research for
           | CRDTs. While the problem has been theoretically solved, the
           | solutions require much more memory/bandwidth than OT does.[1]
           | Conversely, CRDTs are _significantly_ better at replicating
           | graphs.
           | 
           | yjs[2] is one CRDT that handles text reasonably well, but it
           | can still run into performance edge cases (as they
           | plainly/honestly admit in their README).
           | 
           | [1]: https://github.com/automerge/automerge/issues/89 [2]:
           | https://github.com/yjs/yjs
        
           | z3t4 wrote:
           | The transform operation is simpler if you know the order
           | of things. For example in OT: nr2) Delete H from index 0.
           | nr1) Insert "Hello" at index 0. You know that nr1 should come
           | before nr2 because of a central counter. But with CRDT it's
           | a) Delete character id 0, b) Insert "Hello" at character with
           | id 0.
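
The index-shifting idea behind an OT transform can be sketched as
follows. This is a minimal illustration for one pair of concurrent
single-character operations, not how any particular editor does it.
The `Insert`, `Delete`, `transform` and `converge` names and the
`site` tie-break field are assumptions made up for this example; a
real system also has to handle long chains of operations, which is
where a central server's ordering comes in.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Insert:
    index: int
    text: str
    site: int  # hypothetical replica id, used only to break ties


@dataclass(frozen=True)
class Delete:
    index: int  # removes exactly one character
    site: int


Op = Union[Insert, Delete]


def apply(doc: str, op: Optional[Op]) -> str:
    if op is None:  # the op was cancelled by transform()
        return doc
    if isinstance(op, Insert):
        return doc[:op.index] + op.text + doc[op.index:]
    return doc[:op.index] + doc[op.index + 1:]


def transform(op: Op, applied: Op) -> Optional[Op]:
    """Rewrite `op` so it keeps its meaning after `applied` has run."""
    if isinstance(applied, Insert):
        shifts = op.index > applied.index or (
            op.index == applied.index
            and (isinstance(op, Delete) or op.site > applied.site)
        )
        if shifts:
            return replace(op, index=op.index + len(applied.text))
        return op
    # From here on, `applied` is a Delete.
    if isinstance(op, Delete) and op.index == applied.index:
        return None  # both sides deleted the same character
    if op.index > applied.index:
        return replace(op, index=op.index - 1)
    return op


def converge(doc: str, a: Op, b: Op) -> Tuple[str, str]:
    """Apply a and b in both orders, transforming whichever runs second."""
    a_then_b = apply(apply(doc, a), transform(b, a))
    b_then_a = apply(apply(doc, b), transform(a, b))
    return a_then_b, b_then_a


# Both orders end up with the same text.
print(converge("abc", Insert(0, "Hello", site=1), Delete(1, site=2)))
```

The operations address characters by index, so every concurrent
operation has to be rewritten against the ones that ran before it.
A CRDT that addresses characters by stable id avoids the rewriting
but has to keep those ids around.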
        
         | btreecat wrote:
         | My small startup company went with Zoho office at first because
         | of the price. But the features are what have us looking to stay
         | for a while.
         | 
         | One thing I would love to see is the addition of wildcard
         | addresses like the way google has and microsoft added
         | (user+site_string@domain.com).
         | 
         | Thanks for your hard work on a great product!
        
           | aidos wrote:
           | The Zoho ecosystem is this weird place where you can find
           | almost _everything_, virtually for free. If you've never
           | looked before, check it out - it's expansive.
           | 
           | Frustratingly though, there are so many features heaped in
           | that there is no cohesion. Things are frequently buggy,
           | unreliable and disjointed. I'd almost be able to forgive it
           | but unfortunately the support is really terrible too.
           | 
           | I assessed a _lot_ of crm software and each one I kept
           | finding things they didn't have that zoho had but for the
           | reasons above we ultimately chose something else. Which is a
           | shame, because I would pay them a lot more than they ask, for
           | them to just be a little better.
        
         | Proven wrote:
         | Don't click on a link if you're unsure - from the title or URL
         | - the content is relevant to you.
         | 
         | It's equally "disrespectful" to waste reader's time on 101
         | content if that's not what the post is about.
        
       | [deleted]
        
       | taeric wrote:
       | I hate that I am skeptical on this. I suspect Wave just left that
       | bad of a taste behind. So much hubris in what was claimed to be
       | possible.
       | 
       | The ideas do look nice. And I suspect it has gotten farther than
       | I give credit. However, sequencing the edits of independent
       | actors is likely not something you will solve with a data
       | structure.
       | 
       | Take the example of a doc getting overwhelmed. Let's say you can
       | make it so that you don't have a server to coordinate. Is it
       | realistic to think hundreds of people can edit a document in real
       | time at the same time and come up with something coherent?
       | 
       | Best I can currently imagine is it works if they are editing
       | hundreds of pages. But, that is back to the basic wiki structure
       | working fine.
       | 
       | So, help me fix my imagination. Why is this the future?
        
         | archagon wrote:
         | In the case of a text document, concurrent edits form branches
         | of a tree in many string CRDTs:
         | http://archagon.net/blog/2018/03/24/data-laced-with-history/
         | 
         | So yes, hundreds of people can edit a string and produce a
         | coherent result at the end. Contiguous runs of characters will
         | stick together and interleave with concurrent edits.
        
           | fwip wrote:
           | CRDTs don't guarantee coherence, but instead guarantee
           | consistency.
           | 
           | The result may often be coherent at the sentence level if the
           | edits are normal human edits, but often will not be at the
           | whole-document level.
           | 
           | For a simplistic example, if one person changes a frequently-
           | used term throughout the document, and another person uses
           | the old term in a bunch of places when writing new content,
           | the document will be semantically inconsistent, even though
           | all users made semantically consistent changes and are now
           | seeing the same eventually-consistent document.
           | 
           | For a contrived example of local inconsistency, consider the
           | phrase "James had a bass on his wall." Alice rewrites this to
           | "James had a bass on his wall, a trophy from his fishing trip
           | last summer," and Brianna separately chooses "James, being
           | musically inclined, had hung his favorite bass on his wall."
           | The CRDT dutifully applies both edits, and resolves this as:
           | "James, being musically inclined, had hung his favorite bass
           | on his wall, a trophy from his fishing trip last summer."
           | 
           | In nearly any system, semantic data is not completely
           | represented by any available data model. Any automatic
           | conflict-resolution model, no matter how smart, can lead to
           | semantically-nonsensical merges.
           | 
           | CRDTs are very very cool. Too often, though, people think
           | that they can substitute for manual review and conflict
           | resolution.
        
             | derefr wrote:
             | Right. The problem CRDTs solve is the problem of the three-
             | way merge conflict in git: the problem of the "correct"
             | merge being _underspecified_ by the formalism, and so
             | _implementation dependent_.
             | 
             | If two different git clients each implemented some
             | automated form of merge-conflict resolution; and then each
             | of them tried to resolve the same conflicting merge; then
             | each client might resolve the conflict in a _different,
             | implementation-dependent_ way, resulting in differing
             | commits. (This is already what happens even without
             | automation--the "implementation" being depended upon is
             | the set of manual case-by-case choices made by each human.)
             | 
             | CRDTs are data structures that explicitly specify, in the
             | definition of what a conforming implementation would look
             | like, how "merge conflicts" for the data should be
             | resolved. (Really, they specify their way _around_ the data
             | ever coming into conflict -- thus "conflict-free" -- but
             | it's easier to talk about them resolving conflicts.)
             | 
             | In the git analogy, you could think of a CRDT as a pair of
             | "data-format aware" algorithms: a merge algorithm, and a
             | pre-commit validation algorithm. The git client would, upon
             | commit, run the pre-commit validation algorithm specific to
             | the file's type, and only actually accept the commit if the
             | modified file remained "mergeable." The client would then,
             | upon merge, hand two of these files to a file-type-specific
             | merge algorithm, which would be guaranteed to succeed
             | assuming both inputs are "mergeable." Which they are,
             | because we only let "mergeable" files into commits.
             | 
             | Such a framework, by itself, doesn't guarantee that
             | anything _good_ or _useful_ will come out the other end of
             | the process. Garbage In, Garbage Out. What it _does_
             | guarantee is that clients doing the same _merge_ will
             | deterministically generate the same resulting _commit_.
             | It's up to the designer of each CRDT data-structure to
             | specify a _useful_ merge algorithm for it; and it's up to
             | the developer to define their data in terms of a CRDT
             | data-structure that has the right semantics.
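
The "same merge, same result" guarantee can be seen in one of the
simplest state-based CRDTs, a grow-only counter. The sketch below is
illustrative only: the `increment`, `merge` and `value` helpers and
the replica names are assumptions for this example, not any
library's API.

```python
from typing import Dict

State = Dict[str, int]  # replica id -> increments made by that replica


def increment(state: State, replica: str, amount: int = 1) -> State:
    new = dict(state)
    new[replica] = new.get(replica, 0) + amount
    return new


def merge(a: State, b: State) -> State:
    """Element-wise max: commutative, associative and idempotent."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(state: State) -> int:
    return sum(state.values())


alice = increment({}, "alice", 2)
bob = increment({}, "bob", 3)
carol = increment(alice, "carol")  # carol started from alice's state

# Any merge order, with any amount of re-merging, gives the same state.
print(value(merge(merge(alice, bob), carol)))
```

Because the merge is an element-wise maximum, replicas can exchange
states in any order, any number of times, and still agree. Whether a
counter is a useful model for the application's data is a separate
question.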
        
               | nvader wrote:
               | That just sparked a thought.
               | 
               | For a codebase, unit tests could be the pre-commit
               | validation algorithm. Then, as authors continue to edit
               | the piece, they both add unit tests, and merge the code.
               | In the face of a merge, the tests could be the deciding
               | factor between what emerges.
               | 
               | Of course, unless you have conflicts in the tests
               | themselves.
        
             | digikata wrote:
             | So the CRDTs could be applied to a document and an
             | edit/change log to guarantee the consistency of the log and
             | its entries, not necessarily the document itself?
        
           | omgtehlion wrote:
           | I upvote for the link alone. This article (data-laced-with-
           | history) is the best source if you are starting your journey
           | into CRDTs.
        
           | deegles wrote:
           | What if the document starts empty and syncing doesn't happen
           | until everyone presses submit? Will CRDTs produce a valid
           | document? Yes. Will it make any sense? Who knows. I think
           | that's what OP is getting at.
        
             | [deleted]
        
             | archagon wrote:
             | I read it as a question regarding OT vs. CRDTs, which I
             | believe would produce similar results even under heavy
             | concurrency. In terms of larger edits or refactors, you'd
             | probably need to do something else, e.g. lock the document
             | or section, unshare the document, use some sort of higher-
             | level CRDT that ships your changes atomically and forces a
             | manual merge on concurrent edits, etc. None of these
             | necessarily require a central server, though they may
             | require an active session between participants.
             | 
             | I should also note that even if you use regular merge, and
             | the end state of a text document is a complete mess after a
             | refactor + concurrent edits, there's enough data in the
             | tree to simply pull out any concurrent contributions. They
             | could then be reapplied manually if needed. Perhaps the app
             | could even notice this automatically and provide an
             | optional UI for this process. Similarly, it would be
             | possible for the concurrent editors to remove the refactor
             | edits and thus "fork" their document.
        
               | taeric wrote:
               | My question was not meant to be OT versus CRDT. Rather, I
               | am questioning expectations at that shared editing use
               | case.
               | 
               | Comparing to git (as others have done) is interesting.
               | The expectation is any merge is manually tested by the
               | user. Such that it is not just the git actions at play,
               | but all support activity. That is, the user flow assumes
               | all intermediate states are touched and verified by a
               | user. Where this is skipped, things increase the risk of
               | being broken. (Is why git bisect often fails projects
               | that don't build every commit.)
               | 
               | Same for games. Some machine gets to set the record
               | straight as to what actually happened. Pretty much
               | always. The faster the path to the authority for every
               | edit, the higher chance of coherence.
               | 
               | With hundreds of authorities, machine or not, this feels
               | intractable.
        
         | sagichmal wrote:
         | > Why is this the future?
         | 
         | Here is an interview with someone using CRDTs to build an edge
         | state product that answers this question at a high level.
         | 
         | https://www.infoq.com/articles/state-edge-peter-bourgon
        
         | jka wrote:
         | Your key insight, which is spot-on, is that nothing can prevent
         | human-level editing conflicts.
         | 
         | If I was going to take an attempt at justifying the importance
         | of CRDTs, I would say:
         | 
         | CRDTs are the future because they solve digital document-level
         | conflict.
         | 
         | They don't bypass the problem the way that diff/patch/git
         | conflict resolution does, by requiring human intervention.
         | 
         | Instead they truly and utterly obliterate the digital conflict
         | resolution problem: a group of people editing a document can
         | separately lose network connectivity, use different network
         | transports, reconvene as a subgroup of the original editors...
         | and their collective edits will always be resolved
         | automatically by software into a deterministic document that
         | fits within the original schema.
         | 
         | If viable, this has far-reaching implications, particularly
         | related to cloud-based document and sharing systems.
        
           | taeric wrote:
           | But how do they obliterate it? They just move the authority,
           | no?
           | 
           | That is, say you get a hundred machines editing a document.
           | They split into partitions for a time and eventually reunite
           | to a single one. What sort of coherent and usable data will
           | they make? Without basically electing a leader to reject
           | branches of the edits, sending them back to the machines
           | rejected?
        
             | jka wrote:
             | There's no leader node necessarily required; each
             | participant application in the session may have their own
             | local copy of the document, and they apply edits to that
             | using CRDT operations.
             | 
             | It's no doubt possible to construct applications that
             | _don't_ behave correctly for certain combinations of edits --
             | but the datastructures themselves should be robust under
             | any re-combination of the peer group's operations.
             | 
             | Edit / addendum: to phrase this another way and perhaps
             | answer you more clearly: it's a responsibility of the
             | application designer to come up with a document format for
             | their application (and corresponding in-app edit
             | operations) that will tend to result in 'sensible'
             | recombinations under collaborative editing.
             | 
             | My sense so far is that this is the tradeoff; the
             | complexity moves into the document format and edit
             | operations. But that's a (largely) one-off up-front cost,
             | and the infrastructure savings and offline/limited-
             | connectivity collaboration support it affords continue to
             | accrue over the lifetime of the software.
        
         | lallysingh wrote:
         | > sequencing the edits of independent actors is likely not
         | something you will solve with a data structure.
         | 
         | Any multiplayer game does this. Git does this as well.
         | 
         | So of course you can do this, it's a matter of how you
         | reconcile conflicts. Real-time interactive games will generally
         | choose a FIFO ordering based on what came into the server's NIC
         | first. Git makes the person pushing the merge reconcile first.
         | 
         | For docs, live editing seems to work the same as in games.
         | Reconciliation for the decentralized workflow will be
         | interesting, but it's just going to be minimizing the hit to a
         | user when their version loses the argument.
        
         | samatman wrote:
         | "Twitch plays Google Docs" is always going to be incoherent,
         | for social reasons. CRDTs can make it possible, they can't make
         | it a good idea.
         | 
         | But for a contrived example, a game with hundreds of players,
         | backed by an enormous JSON document, where the game engine is
         | in charge of making sure each move makes sense: A CRDT could
         | enable that, and each player could save a snapshot of the game
         | state as a simple text file, or save the entire history as the
         | whole CRDT.
         | 
         | Or as a less contrived example, instead of a game, it's a chat
         | client, and it provides rich text a la Matrix, but there's no
         | server, it's all resolved with CRDTs and all data is kept
         | client-local for each client.
         | 
         | There are a lot of cool things you can build with a performant
         | CRDT.
        
         | ragnese wrote:
         | > Is it realistic to think hundreds of people can edit a
         | document in real time at the same time and come up with
         | something coherent?
         | 
         | And here's the thing: Can 100 people edit a document, _even in
         | theory_, and have it make sense? I think the answer is "no,"
         | with or without technology.
         | 
         | I'm sure there are other uses for these data structures, but
         | shared editing is always the example I read about.
        
           | ssivark wrote:
           | Depends on what kind of document we're talking about, i.e. how
           | the grammar captures the domain model. Eg: A shared ledger in
           | the case of digital currencies, or the linux source code
           | being worked on remotely by many people are exactly examples
           | of such documents.
        
           | taeric wrote:
           | I meant this to be my takeaway. The data structure is nice.
           | And I suspect it is a perfect fit for some use cases. I
           | question the use case of shared editing. Not just the
           | solution, but the use case.
        
           | nonbirithm wrote:
           | A question I always have is if CRDTs solve some problem with
           | collaborative editing, then can git's merge algorithm be
           | rewritten to use CRDTs and benefit from it somehow?
           | 
           | Somehow I think the answer is no. There is a reason we still
           | have to manually drop down to a diff editor to resolve
           | certain kinds of conflicts after many decades.
        
             | dan-robertson wrote:
             | I think a better question is "what if merges were more well
             | behaved," where "well behaved" means they have nice
             | properties like associativity and having the minimal amount
             | of conflict without auto-resolving any cases that should
             | actually be a conflict.
             | 
             | The problem with using a CRDT is the CR part: there are
             | generally merge conflicts in version control for a reason.
             | If your data type isn't "state of the repo with no
             | conflicts" or "history of the repo and current state with
             | no conflicts" but something like "history of the repo and
             | current state including conflicts from unresolved merges"
             | then maybe that would work but it feels pretty complicated
             | to explain and not very different from regular git. Also
             | note that you need history to correctly merge (if you do a
             | 3-way merge of a history of a file of "add line foo; delete
             | line foo" with a history of "add line foo; delete line foo;
             | add line foo" and common ancestor "add line foo", you
             | should end with a history equal to the second one I
             | described. But if you only look at the files you will
             | probably end up deleting foo)
             | 
             | See also: darcs and pijul.
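
The "add line foo; delete line foo" case above can be reproduced
with a toy model. This is a sketch under strong assumptions: a file
is treated as an unordered set of lines, a history is a list of
(action, line) pairs, and `snapshot_merge` and `history_merge` are
hypothetical helpers written for this example. It is not git's merge
algorithm.

```python
from typing import List, Set, Tuple

History = List[Tuple[str, str]]  # (action, line); action is "add" or "delete"


def replay(history: History) -> Set[str]:
    """Turn a history into the file contents it produces."""
    lines: Set[str] = set()
    for action, line in history:
        if action == "add":
            lines.add(line)
        else:
            lines.discard(line)
    return lines


def snapshot_merge(base: Set[str], ours: Set[str], theirs: Set[str]) -> Set[str]:
    """Three-way merge that only looks at file contents, never at history."""
    merged = set()
    for line in base | ours | theirs:
        in_base, in_ours, in_theirs = line in base, line in ours, line in theirs
        # If our side did not touch the line, take their side; else take ours.
        keep = in_theirs if in_ours == in_base else in_ours
        if keep:
            merged.add(line)
    return merged


def history_merge(ours: History, theirs: History) -> History:
    """History-aware merge for the easy case where one side extends the other."""
    shorter, longer = sorted((ours, theirs), key=len)
    if longer[:len(shorter)] != shorter:
        raise ValueError("diverged histories need a real merge strategy")
    return longer


base_h: History = [("add", "foo")]
ours_h: History = base_h + [("delete", "foo")]
theirs_h: History = ours_h + [("add", "foo")]

# Looking only at contents, "foo" is lost; looking at history, it survives.
print(snapshot_merge(replay(base_h), replay(ours_h), replay(theirs_h)))
print(replay(history_merge(ours_h, theirs_h)))
```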
        
             | mattnewport wrote:
             | Git mostly treats merging as a line oriented diff problem.
             | Even though you can specify language aware diffing in
             | theory it doesn't seem to buy you much in practice (based
             | on my experience with the C# language-aware diff).
             | 
             | It wouldn't make much sense to me to just plug a text CRDT
             | in place of a standard text diff. CRDTs like automerge are
             | capable of representing more complex tree structures
             | however and if you squint you can sort of imagine a world
             | where merging source code edits was done at something more
             | like the AST level rather than as lines of text.
             | 
             | I've had some ugly merge conflicts that were a mix of
             | actual code changes and formatting changes which git diffs
             | tend not to be much help with. A system that really
             | understood the semantic structure of the code should in
             | theory be able to handle those a lot better.
             | 
             | IDEs have powerful refactoring support these days like
             | renaming class members but source control is ignorant of
             | those things. One can imagine a more integrated system that
             | could understand a rename as a distinct operation and have
             | no trouble merging a rename with an actual code change that
             | touched some code that referenced the renamed thing in many
             | situations. Manual review would probably still be necessary
             | but the automated merge could get it right a much higher
             | percentage of the time.
        
             | dnautics wrote:
             | The answer is no, but unlike git, crdts make a choice for
             | you, and all nodes get convergent consistency. The problem
             | heretofore with crdts is that those choices have not been
             | sane. I think there is a recent crop of crdts that are
             | "95% sane" and honestly that's probably good enough. There
             | is an argument that optimal human choices will never be
             | reconcilable with commutativity, which I totally buy, but
             | I think there is also an argument for "let not the perfect
             | be the enemy of the awesome". And having made a choice,
             | even if it's not optimal, is a much firmer ground to build
             | upon than blocking on leaving a merge conflict undecided.
        
           | CydeWeys wrote:
           | It depends how big the document is, i.e. what is the density
           | of users per page. If it's a 100 page document and the 100
           | users are all working on different sections, then it could
           | easily be possible.
           | 
           | I just don't remotely see a use case for this. Real-time
           | human collaboration in general fails at a scale much smaller
           | than this, and not because of the tools available.
        
           | jandrese wrote:
           | Maybe if your "document" is the Encyclopedia Britannica?
           | Wikipedia has hundreds of editors working at once, but that
           | only really works because it's broken up into millions of
           | smaller parts that don't interact much.
        
           | jka wrote:
           | JoeDocs[1] could be a useful project to track related to this
           | - the Coronavirus Tech Handbook[2] amongst other
           | collaborative documents is now hosted by their service.
           | 
           | They utilize the same YJS[3] library mentioned in the article
           | this thread discusses, and their GitHub repos include some
           | useful working demonstration application code.
           | 
           | [1] - https://joedocs.com/
           | 
           | [2] - https://coronavirustechhandbook.com/
           | 
           | [3] - https://docs.yjs.dev/
        
           | dan-robertson wrote:
           | Ultimately I think the answer is "it depends" but the issue
           | is that there is usually document structure which is not
           | visible in the data structure itself. For example imagine
           | getting 100 people to fill out a row on a spreadsheet about
           | their preferences for some things or their availability on
           | certain dates. If each person simultaneously tries to fill in
           | the third row of the spreadsheet (after the headings and the
           | author), then a spreadsheet CRDT probably would suck at
           | merging the edits. But if you had a CRDT for the underlying
           | structure of this specific document you could probably merge
           | the changes (eg sort the set of rows alphabetically by name
           | and do something else if multiple documents have rows keyed
           | by the same name).
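
A document-specific merge along these lines might look like the
sketch below. Everything in it is an assumption for illustration:
rows are keyed by person name, each row carries a (logical clock,
replica id) stamp that is assumed to be unique, and the "something
else" for clashing names is arbitrarily chosen to be "highest stamp
wins".

```python
from typing import Dict, List, Tuple

Stamp = Tuple[int, str]              # (logical clock, replica id), assumed unique
Rows = Dict[str, Tuple[Stamp, str]]  # person's name -> (stamp, availability)


def merge_rows(a: Rows, b: Rows) -> Rows:
    """Union of rows; if both sides have a name, the higher stamp wins."""
    merged = dict(a)
    for name, entry in b.items():
        if name not in merged or entry[0] > merged[name][0]:
            merged[name] = entry
    return merged


def render(rows: Rows) -> List[Tuple[str, str]]:
    """Row order is derived from the keys, so nobody fights over 'row 3'."""
    return [(name, rows[name][1]) for name in sorted(rows)]


alice_copy: Rows = {"Alice": ((1, "alice"), "Mon, Tue")}
bob_copy: Rows = {"Bob": ((1, "bob"), "Wed"), "Alice": ((2, "bob"), "Fri")}

print(render(merge_rows(alice_copy, bob_copy)))
```

Because position is computed from the data instead of being edited
directly, a hundred people each adding their own row never collide
on a row index.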
        
       | blackgirldev wrote:
       | Doesn't Redis implement CRDT's in production?
       | 
       | https://redislabs.com/blog/diving-into-crdts/
        
         | zegl wrote:
         | Riak as well, I've used it very successfully on projects in the
         | past.
         | 
         | https://docs.riak.com/riak/kv/latest/developing/data-types/i...
        
         | dnautics wrote:
         | Yes, as does riak. There are plenty of simple crdts and the
         | theory, while recent, has all of its fundamentals fleshed out.
         | We know what property makes data structures crdts, and how to
         | compose them, and how to prove they are crdts.
         | 
         | Currently we are in the "discovery of new crdts" and
         | "engineering and implementing of older crdts reliably" phase,
         | and in some cases "discovering when not to use crdts".
         | 
         | The crux of this issue is that crdts that play nice with
         | human expectations in regards to collaborative document editing
         | are not known, possibly excepting automerge (yjs). As it's a
         | 'softer' concept with no good axioms, there is no solid theory
         | on how to combine the theoretical requirements of crdts with
         | human expectations.
        
         | einpoklum wrote:
         | It looks like it's basically biasing in favor of some
         | operations over others. In the link they talk about CRDT sets,
         | saying at some point:
         | 
         | > 1. Adding wins over deleting.
         | 
         | yeah, so, _maybe_ you can remove elements from your set. If
         | you're lucky. I dunno about all that...
        
           | samatman wrote:
           | That's an overly pessimistic way to put it.
           | 
           | I think it's more accurate to say that _maybe_ you can remove
           | elements from your set... unless another actor wants them in
           | the set.
           | 
           | That's not always the behavior you want. But if it is, it's
           | great.
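
"Adding wins over deleting" can be sketched with an observed-remove
set. This is a minimal illustration of the general technique, not
the Redis implementation; the `ORSet` class and its method names are
made up for this example. Each add gets a unique tag, and a remove
only cancels the tags that replica has already seen.

```python
import itertools


class ORSet:
    def __init__(self, replica: str) -> None:
        self.replica = replica
        self.counter = itertools.count()
        self.adds = set()     # (element, tag) pairs ever added
        self.removes = set()  # (element, tag) pairs removed (tombstones)

    def add(self, element) -> None:
        tag = (self.replica, next(self.counter))  # unique per add
        self.adds.add((element, tag))

    def remove(self, element) -> None:
        # Only the adds this replica has observed are cancelled.
        self.removes |= {(e, t) for (e, t) in self.adds if e == element}

    def merge(self, other: "ORSet") -> None:
        self.adds |= other.adds
        self.removes |= other.removes

    def elements(self) -> set:
        return {e for (e, _tag) in self.adds - self.removes}


a, b = ORSet("a"), ORSet("b")
a.add("x")
b.merge(a)     # both replicas now contain "x"
a.remove("x")  # a removes the "x" it has seen...
b.add("x")     # ...while b concurrently re-adds it under a fresh tag
a.merge(b)
b.merge(a)
after_concurrent = (a.elements(), b.elements())

a.remove("x")  # nobody is re-adding this time, so the remove sticks
b.merge(a)
after_remove = (a.elements(), b.elements())

print(after_concurrent, after_remove)
```

The remove only loses when another replica adds the same element
concurrently. The price is the tombstones in `removes`, which this
sketch never garbage collects.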
        
       | mhale wrote:
       | I'm working on a project with some offline data synchronization
       | needs, but haven't started implementation yet. I've been
       | following CRDTs with interest. I also saw many of the same
       | downsides mentioned in the OP, e.g. bloat (which apparently are
       | being addressed remarkably well). Beyond OT, another approach
       | I've run across that looks very promising is Differential
       | Synchronization[1] by Neil Fraser. While it also relies on a
       | centralized server, it allows for servers to be chained in such a
       | way that seems to address many of the downsides of OT. I wonder
       | why I rarely ever see Differential Synchronization mentioned here
       | on HN? Is it due to lack of awareness or because of use-case fit
       | issues or some fatal flaw I haven't seen? Or something else?
       | 
       | [1] https://www.youtube.com/watch?v=S2Hp_1jqpY8
        
       | arendtio wrote:
       | I wonder why OT is restricted to a central server. In 2016/2017 I
       | wrote a Progressive Web App (PWA) for myself which uses an
       | algorithm which probably fits the category of OT. It uses a
       | WebDAV server for synchronization between devices. Yes, this is a
       | centralized server, but when some super slow & dumb WebDAV server
       | can serve this purpose, it should probably be possible to build
       | it on top of S3, a blockchain or something federated.
       | 
       | My biggest issues at the time were around CORS: with a PWA you
       | can't simply use every server the user enters, because the
       | same-origin policy keeps getting in your way.
        
       | yawrp wrote:
       | Interesting piece from last week comparing OT, CRDT, and Figma's
       | hybrid approach (good explainers of each too):
       | https://hex.tech/blog/a-pragmatic-approach-to-live-collabora...
        
       | xwdv wrote:
       | CRDT stands for conflict-free replicated data type.
        
         | contravariant wrote:
         | Thanks, I had to look it up as well. It's not the first article
         | I read on CRDTs but I definitely didn't recall what they were
         | from just the acronym.
        
       | dustingetz wrote:
       | A problem w/ e.g. CRDT datasync in web apps is data security.
       | HTTP resources impose control points where you know "why" the
       | client is asking for e.g. this chunk of social graph: it's
       | /profile/friendlist, so the UI can ask for a very controlled and
       | tightly specified data projection for that particular UI, to be
       | consumed by tightly controlled javascript. Datasync is NOT for
       | scraper bots, arbitrary read patterns or any notion of general
       | access.
       | 
       | Immutability makes data control way harder ...
        
       | rsync wrote:
       | "It was a general purpose medium (like paper). Unlike a lot of
       | other tools, it doesn't force you into its own workflow. You
       | could use it to do anything from plan holidays, make a wiki, play
       | D&D with your friends, schedule a meeting, etc."
       | 
       | So, sort of like email?
        
       ___________________________________________________________________
       (page generated 2020-09-28 23:00 UTC)