CRDTs are the future Author : lewisjoe Score : 614 points Date : 2020-09-28 15:18 UTC (7 hours ago) web link (josephg.com) | zemo wrote: | I use CRDTs in production at Jackbox for audience functionality | and honestly I don't know why the only thing that people talk | about when it comes to CRDTs is collaborative text editing. Like, | sure, cool, but that's literally a single problem domain and | like, Google Docs already exists and works well for the majority | of users, so how many developers actually need to create a | collaborative text editor? CRDTs are an incredibly abstract and | versatile concept; collaborative text editing is a tiny problem | domain. I would really like to see more writing about CRDTs that | is NOT about collaborative text editing. | wenc wrote: | Here's one written by an employee of Redis Labs about CRDTs for | georeplication of databases for strong eventual consistency. | [1] | | There's also a 2-year-old list of CRDTs used in prod [2]. | | I know Azure CosmosDB (NoSQL) advertises its own use of CRDTs | for global replication. | | [1] https://www.infoworld.com/article/3305321/when-to-use-a- | crdt... | | [2] https://github.com/ipfs/notes/issues/404 | uryga wrote: | just wanted to say, i really like Jackbox's games and it's | really fun to see you here! i've been exposed to your games via | youtube vids and it feels weird (but cool!) to have that | intersect with HN. would love to read more about how you use | CRDTs to make it all work! | sagichmal wrote: | 1000% this. | rhencke wrote: | Would you consider being the writer of one of those articles? | zemo wrote: | well I really walked into that question huh | djm_ wrote: | Followed you in case you ever do! I'm a big fan of Jackbox | and I'd love to read about where CRDTs fit in your stack. | gen220 wrote: | I'd read this article.
| | Or, if you'd rather not, can you share some examples of | problem domains that we at home can search for? | davidhowlett wrote: | If you write an article and post it here I will read it. | PaulDavisThe1st wrote: | We want something like that for distributed collaboration in | Ardour, a cross-platform DAW (digital audio workstation). The relevant state is serialized | as an XML file, and used natively as a complex object tree. | Users want to be able to edit (in the DAW) locally, then share | (and merge) their results with collaborators. | jka wrote: | I've plugged this collaboration project a few times recently, | and have no relationship to it other than discovering it (via | YJS' "who is using" list[1]) and finding it fascinating: | | http://cattaz.io/ | | What I find most interesting about it is that it has reduced | the state of multiple 'smart' user-facing widgets/apps into a | common, lowest-common-denominator format (a text document) | that lends itself more easily and intuitively to | collaborative editing and CRDT operations. | | I don't know for sure whether this is the path forward for | CRDT-based applications in general, but I think there are | valuable ideas there. It does raise the possibility of the | widgets/applications occasionally being in 'invalid' states, | but rarely in a way that the human participants wouldn't | notice or be able to fix themselves. | | Whether that scales to the complexity of the state management | for a multi-track audio editing session, I don't know; but it | could be instructive to compare. | | [1] - https://github.com/yjs/yjs#who-is-using-yjs | benrbray wrote: | CRDTs seem very promising, but we still have a long way to go. | The most exciting work in this area is being done by Ink&Switch | [0]. They have a number of interesting real-world app prototypes | based on CRDTs. | | - An interesting case where CRDTs failed is Xi-editor, where they | tried to use CRDTs as the basis for a plugin system [1,2].
| | - One of the biggest problems with CRDTs is the overhead needed | to keep track of the full document history. The automerge [3] | project has been working on efficient compression of CRDTs for | JSON datatypes. | | - The idea of monotonic updates is really appealing at first, but | I was disappointed when I realized there's no good solution to | handle deletions. Tombstones, to me, seem like kind of a hack, | albeit a necessary one. Practically, CRDTs aren't the silver | bullet they might seem like at first. | | - Another lesson learned is that when ten people are editing the | same paragraph, there's not really a right answer. I think the | key to implementing CRDTs is doing it at the correct level of | granularity. | | - ProseMirror intentionally chose NOT to use CRDTs [4]. | | - Some more good references are [5,6,7] | | [0] https://inkandswitch.com/ | | [1] https://github.com/xi-editor/xi- | editor/issues/1187#issuecomm... | | [2] https://news.ycombinator.com/item?id=19886883 | | [3] https://github.com/automerge and | https://github.com/automerge/pushpin | | [4] https://marijnhaverbeke.nl/blog/collaborative-editing.html | | [5] Kleppmann 2020, "CRDTs: The Hard Parts" | https://www.youtube.com/watch?v=x7drE24geUw with HN Discussion: | https://news.ycombinator.com/item?id=23802208 | | [6] Kleppmann 2019, "Interleaving Anomalies in Text Editors" | https://martin.kleppmann.com/papers/interleaving-papoc19.pdf | | [7] https://abishov.com/xi-editor/docs/crdt-details.html | yawrp wrote: | You might enjoy this piece, covers a few of the tradeoffs you | mention https://hex.tech/blog/a-pragmatic-approach-to-live- | collabora... | [deleted] | Splizard wrote: | No, peer2peer lockstep is the future. No central server, no speed | penalty. No storage penalty. | | Has been used in RTS games to synchronize 1000s of units across | low-bandwidth connections. | | Input may be delayed by latency which can be mitigated with | client-side prediction. 
Cosmic bit-shifts & indeterminism can be | a challenge in longer sessions, but peers can sync with each other | when there is an OOS (out-of-sync). | alangibson wrote: | Anyone wanting more info on CRDTs can check out the index I'm | maintaining at https://github.com/alangibson/awesome-crdt | SirensOfTitan wrote: | So CRDTs are the future, but what about today for real, | production products? I'm just about to really dive into | collaborative editing features for our product, and OT still | seems to me to be a much safer bet unless you're dealing with a | more obscure environment. | omgtehlion wrote: | Yes, starting with OT looks easy. You can make 99% work in | almost no time. But the last 1% will bite you in the rear | really hard... | | Actually, CRDT is not a single data structure or even | algorithm. It is a term for several families of data structures | and different algorithms on them. If your task is not editing | text, you may find a simple and already implemented CRDT for | your case. | toomim wrote: | > So CRDTs are the future, but what about today for real, | production products? | | Try Y.js: https://github.com/yjs/yjs | crawshaw wrote: | I have been consistently at odds with myself comparing CRDTs vs. | OT. On the one hand, CRDTs have a nicer core formalism. On the | other hand, OT works, and is closer to the actual driving events | of text editing. | | The core argument of this article, that CRDTs now work _and_ | distributed is better than centralized, I question. I certainly | want more distribution than "everything is run on a google | server" but do I really foresee a need for distributing a single | document? One server with an optimal OT implementation can | probably handle nearly a million active connections. | | In practice, that's plenty. Each piece of data having one owner | is quite reasonable. There are lots of pieces of data. | | I remain on the fence for collaborative text editing. Though it's | great to see all the work pushing CRDTs forward!
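omgtehlion's point that many CRDTs outside text editing are simple can be illustrated with the textbook grow-only counter (G-Counter), the kind of structure that might fit a non-text use such as tallying audience votes (an assumption; zemo doesn't say what Jackbox actually uses). A minimal Python sketch, assuming replicas exchange their full state; the class and method names are illustrative, not taken from any library mentioned in this thread:

```python
from typing import Dict


class GCounter:
    """Grow-only counter (G-Counter), a state-based CRDT.

    Each replica only ever increments its own entry. Merging takes the
    per-replica maximum, which is commutative, associative and idempotent,
    so replicas converge regardless of how often or in what order they
    exchange state.
    """

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: never loses an increment, never double-counts one.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(3)  # e.g. three votes seen by replica a
    b.increment(2)  # two votes seen concurrently by replica b
    a.merge(b)
    b.merge(a)
    a.merge(b)      # re-merging is harmless (idempotent)
    print(a.value(), b.value())  # 5 5
```

Supporting decrements takes a second G-Counter for the decrements (a PN-Counter); ordered text needs far more machinery, which is why the text-editing discussion keeps coming back to tombstones and interleaving.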
| josephg wrote: | Blog author here. I've been having this conversation with a lot | of folks over the last few weeks and I hear you. | | Does it make sense for us as an opensource community to invest | our time and energy making one really good CRDT (with | implementations in a few languages)? Or does it make sense for | us to distribute that energy between a bunch of CRDT and OT | implementations, with different performance tradeoffs? | | My take is that it's been hugely beneficial to us all that JSON | is a standard, because I can use it from every language, and | have confidence that the implementations are fast and good | quality. I think we have an opportunity to do the same for a good | CRDT too. Even if OT would work fine in your architecture, if | we have a great, capable, fast CRDT kicking around, it could be | a reasonable default for most people. And my claim is that the | performance difference between CRDTs and OT is smaller than the | difference between high and low quality implementations. (I | expect a well-written CRDT in wasm will outperform my OT code | in javascript.) | clarkmoody wrote: | _[CRDTs] would let us write software that treats users as digital | citizens, not as digital serfs_ | | Amen, brother. | yavi wrote: | If you're interested in building collaborative apps but not the | architectural overhead of implementing CRDTs, I'd recommend | checking out roomservice.dev [1]. They've begun to power some | other collaborative apps such as Tella.tv [2] - realtime browser- | based video editing. | | [1] https://roomservice.dev [2] | https://news.ycombinator.com/item?id=24158509 | | Disclaimer: I've invested in roomservice.dev (and very excited | about what they're building!). No affiliation with Tella. | rememberlenny wrote: | +1 Roomservice. | | CRDT as a service.
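For comparison with the OT side of the crawshaw/josephg exchange above: the heart of OT is a transform function that rewrites an operation so it can be applied after a concurrent one. A minimal Python sketch of only the insert-vs-insert case for plain text, assuming two sites and a site-id tiebreak (all names are hypothetical; real OT systems also have to handle deletes and longer concurrent histories):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where the text goes
    text: str
    site: int  # client id, used only to break ties deterministically


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` was applied.

    If `against` inserted text before op's position (or at the same position
    from a lower site id), op's position shifts right by that text's length.
    """
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def apply_op(doc: str, op: Insert) -> str:
    return doc[: op.pos] + op.text + doc[op.pos :]


if __name__ == "__main__":
    base = "helo"
    a = Insert(3, "l", site=1)  # client 1 fixes the typo
    b = Insert(4, "!", site=2)  # client 2 appends concurrently
    # Each client applies its own op first, then the other's op transformed.
    left = apply_op(apply_op(base, a), transform(b, a))
    right = apply_op(apply_op(base, b), transform(a, b))
    print(left, right)  # hello! hello!
```

Both clients converge on the same text even though they applied the edits in different orders; the `site` tiebreak only matters when two inserts land at the same index.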
| basicplus2 wrote: | CRDTs (Conflict-free Replicated Data Types) | stblack wrote: | It takes over 800 words, and several mentions of the CRDT | acronym, before the acronym is expanded for the reader. | | Don't write like this. Respect your readers and help them | comprehend. Expand acronyms as early as you can, ideally at the | first mention. | butterisgood wrote: | I'm sure a casual, non-technical reader of Hacker News would be | unaware of most of the headlines here. Google is your friend, | and CRDTs are part of the language of distributed systems. To | some degree, one has to help themselves. | [deleted] | [deleted] | CydeWeys wrote: | Seriously. It needs to be explained the FIRST TIME it appears, | and it shouldn't be abbreviated in the title. I read for a | minute thinking he was talking about Chrome Remote Desktop | (which is what CRDT means to me). | | Mods, can we expand the acronym in the title of this submission | please? | PaulDavisThe1st wrote: | Or some new-old sort of visual display device ... Cathode Ray | Display Terminal :) | crazygringo wrote: | That was my first thought. "Wait... is there such a thing | as a cathode-ray _DIGITAL_ tube now...?!?!" | | I mean, if SLRs became DSLRs... I just assume any "D" | means we're digital now! :) | josephg wrote: | Author here. Thanks for the feedback - I'll update the article. | | I'm a little embarrassed to admit I didn't even notice. | regulation_d wrote: | I'm of the camp that focusing on the acronym stuff is missing | the point: that was a thoughtful, well-written piece. | | I, for one, am grateful that you took the time to write it. | josephg wrote: | Thanks. I expanded the acronym when it became relevant | to the story. But judging by the comments here, lots of | folks were distracted and frustrated that they didn't know | what the acronym meant earlier. | | Anyway, I've updated the introductory paragraph to make it | clearer. | losvedir wrote: | > _Don't write like this.
Respect your readers_ | | This is way over the top. | | I thought the author did an amazing job of discussing a highly | technical topic in a very approachable way. Every blog on HN | should _aspire_ to write like this! It was so good it got me | reading other posts even. | | Yes, it would have been nice for us non-domain experts if the | author had done the classic "Conflict-free replicated data type | (CRDT)" thing, but you can easily just say that, ya know? "Hey, | it would be helpful if you expanded CRDT early on." | colesantiago wrote: | Agreed, 29 mentions of the acronym 'CRDT' and I had no idea | what it was until I had to break my reading flow and google it; | it sounded like buzzword soup to me. | | Engineers, when talking about technical concepts with acronyms, | always expand them the first time for your readers! | samatman wrote: | As I tirelessly mention whenever this comes up on HN, which is | often: we have a specific technology that is designed precisely | for this situation. | | It's called the link. All an author has to do is link the first | instance of an acronym or piece of jargon to some authoritative | description, and you get the best of both worlds: readers | familiar with e.g. CRDTs[0] can just keep reading, and the rest | can click the link and find out. | | [0]: https://en.wikipedia.org/wiki/Conflict- | free_replicated_data_... | thesuitonym wrote: | Even better, just use the abbr tag. It's well supported and | doesn't rely on an outside source to tell your readers what | you're talking about. | agent86 wrote: | Relevant to the parent here, I prefer to use a link because | it is more commonly expected and it gives me the | opportunity to refer to a quality source for an in-depth | explanation of what I'm referencing. | | It is especially useful when writing a technical document | that utilizes multiple products/stacks/terms.
Creating | links to quality sources for those items gives someone new | to the content a good source to go deeper into those pieces | while allowing me to focus the article on the specific | aspect I'm writing about. | taywrobel wrote: | "It's well supported", yet on the latest mobile Safari, it | doesn't appear to work at all. This w3 demo code does not | show me the full name anywhere in the rendered content - | https://www.w3schools.com/tags/tryit.asp?filename=tryhtml_ab... | hashtekar wrote: | Chrome on Android rather (un)helpfully selects 'The WHO' | and gives me a link to the rock band. | justusthane wrote: | That would have nothing to do with the HTML abbr tag. | That's just Google Assistant being Google Assistant. | ficklepickle wrote: | What about a long-press on the abbreviation? Does that do | anything? I'm not on mobile now but I vaguely recall that | working on some mobile platforms. | lucasverra wrote: | Nope, just tried. | [deleted] | snazz wrote: | Although it works on _desktop_ Safari, there's no way to | tell that you should be able to hover over it. | JeremyBanks wrote: | Merely expanding an abbreviation is less useful than | explaining it. | mcculley wrote: | Do both! | heavyset_go wrote: | Does <abbr> work well on Android and iOS? | californical wrote: | Not working for me on iOS | heavenlyblue wrote: | It doesn't work on mobile | dllthomas wrote: | Better in some ways, worse in others. Better than either, I | think, is both. | SketchySeaBeast wrote: | Don't even have to - the third paragraph down they set the | standard with "Operational Transform (OT)" | samatman wrote: | I agree, as does every good style guide, and even that can | be improved by also making it a link. | | Hypertext is great and links are practically free; I encourage | authors to be liberal in applying them. | jchook wrote: | In the author's defense, the first sentence links to a YouTube video that | expands the acronym in the first 10 seconds.
| | The video also gives good context for the article, even for a | beginner to the topic. | kevinpet wrote: | That's still lazy writing. Every blog should be written with | the assumption it will be encountered by a non-specialist. | Expanding abbreviations on first use and offering a brief | explanation of jargon is enough to let these readers know if | the article is something they are interested in. | Shared404 wrote: | But also, the blog post should be able to focus on its own | business logic, so to speak. | | The same arguments for and against using libraries apply | here, and it's up to the author which works best for their | piece. | Sharlin wrote: | This is what introductory paragraphs/sections/chapters | are for. Someone already well acquainted with the subject | matter can quickly skim through them, while others less | familiar with it get a quick catch-up. | Shared404 wrote: | I agree. But that's not always the best solution. | | Just like libraries, sometimes it is and sometimes it | isn't the best approach. | | For example, in a "How to do $BASIC_THING in Python" | article, putting an intro of "This is what a variable is" | may not be a bad idea. Meanwhile, in a "Writing an | operating system from scratch in an esolang I wrote" | article, maybe you'd be better off linking to previous | blog posts or other resources. | | Obviously these are both extreme examples, but I think | it's still a valid view. | bosswipe wrote: | Every blog? That's silly. People are allowed to have | conversations about niche topics that you are not familiar | with. You aren't the audience of every blog. | s1mon wrote: | People are allowed and encouraged to speak freely about | anything on the internet, but people seem to forget that | this is the _world_ wide web, and writers can't control | who in the world shows up to their blog or site. With a | little help, someone who might not be in the core | audience might actually enjoy or learn something.
If | everything is written with jargon and abbreviations with | no context, it's really just lazy, inconsiderate writing. | | It never ceases to amaze me how many websites for | restaurants or whatever neglect to mention basic things | like what state (and country) they're in. Even newspaper | web sites assume that we know that the "Chronicle" or the | "Ledger" or whatever generic name is the local paper for | East Bumblefuck. | danenania wrote: | That's true, but most people underestimate how opaque | their writing can be even to other experts. It doesn't | mean you have to explain _every_ piece of jargon, but you | can often greatly improve the clarity of your writing, | including for expert readers, by targeting at least a few | levels of expertise below where you think your audience | is. We _all_ have gaps in our knowledge that will seem | basic or obvious to others, no matter how expert we are | in a topic. | josephg wrote: | Yep. It's the paradox that the more you understand | something, the harder it is to teach it, because it's more | work to empathise with people who don't know the concept. | | Anyway, blog author here - sorry I didn't explain CRDTs | earlier in the piece. It didn't occur to me that people | would be confused. | | https://wiki.lesswrong.com/wiki/Inferential_distance | mlinksva wrote: | We also have select-contextmenu-search on both desktop and | mobile, for any word or acronym. Links are nice for | disambiguation or to point to a recommended resource, but | they're hardly essential, nor are in-line expansions or | definitions. | KirinDave wrote: | I don't understand why you think that writing on a very | technical subject needs to build you a ladder to climb on as a | prerequisite. There is a link to a very high-quality talk right | at the top of the article for folks who wanted to dive deeper | that specifically makes that effort.
| | I found the article quite good, and if you had genuinely been | motivated to engage with the content you could have highlighted | the acronym and searched for it. There is a wealth of good info | for "CRDTs" that comes up on the first page of Google, Bing or | DDG. | | Does the acronym actually illuminate what they are or how they | function? I submit to you that it probably doesn't. | mcdirty wrote: | I literally closed the article after reading the first blurb, | because it wasn't explained. Just started googling. | serverholic wrote: | Is it really that hard to google? If you're trying to learn | about a subject it can get annoying to repeatedly have to jump | to the meat of the article or fast forward if you're watching a | video. | lotsofpulp wrote: | Is it really that hard to spell it out and then put the | abbreviation in parenthesis the first time it is used? | root_axis wrote: | Obviously googling isn't hard, but having to google what | could be easily explained in the text breaks one's | concentration, something that is critical for most readers. | serverholic wrote: | A single quick search and you're good to go. Besides, if | you need to look it up then you're probably better off | reading a quick summary anyways. | | Spelling out "Conflict-free replicated data type" doesn't | really help beginners all that much and non-beginners will | just use "CRDT" anyways. | | We don't need every article about the web to spell out HTTP | right? I don't get why the author is getting beat up just | because his free content isn't convenient enough. | root_axis wrote: | If the article is titled "HTTP is the future" yes, I | think unpacking the acronym is appropriate. Also, he's | not getting "beat up", it's just a mild criticism | regarding how the article was written, it's not that big | of a deal. 
| Quarrelsome wrote: | I was at least happy that the wiki detour introduced me to | "gossip protocols", which is probably now one of my all-time | favourite technology namings. | dang wrote: | The trouble with comments like this is that they make | discussions shallower and more generic [1], which makes for | much worse threads. Actually it's not so much a problem with | the comment as with the upvotes, but shallow-generic-indignant | comments routinely attract upvotes, so alas it amounts to the | same thing. | | The most recent guideline we added says: "_Please don't | complain about website formatting, back-button breakage, and | similar annoyances. They're too common to be interesting. | Exception: when the author is present. Then friendly feedback | might be helpful._" | | I suppose that complaints about writing style fall under the | same umbrella. Not that these things don't matter, of course | (when helping people with their pieces for HN I always tell | them to define jargon at point of introduction), but they | matter much less than the overall specific topic and much less | than the attention they end up getting. So they're basically | like weeds that grow and choke out the flowers. | | (This is not a personal criticism--of course you didn't mean to | have this effect.) | | https://news.ycombinator.com/newsguidelines.html | | [1] | https://hn.algolia.com/?query=generic%20discussion%20by:dang... | spicymaki wrote: | Given that the author defines CRDT (conflict-free replicated | data type) a few paragraphs in, it might have been accidental. | The author might have re-ordered a few of the paragraphs during | editing. | tachyonbeam wrote: | You can also be even nicer and have the first expansion of the | acronym link to a Wikipedia page or other relevant explanation. | deepsun wrote: | I strongly disagree; that forces the author to spend extra time on | explaining everything.
That's why it's often so hard for me to | find quality in-depth advanced blogs on various technologies | and fields -- because they all tend to be really introductory. | So there are either papers or tutorials, but nothing in-between. | E.g., a different-angle explanation of the same thing, or a | comparison with another tech it came from. | | In contrast, I much prefer a different approach to explaining | (I mostly see it on Cyrillic forums) -- instead of guiding you by | hand, they just give you clues about where to look. That way, | knowledge givers are way more approachable, because it costs | them very little to chat back something like "look for CRDT", | rather than go into an in-depth explanation. In the end -- there's way | more information, and from top experts in the fields. | bosswipe wrote: | He wasn't writing for you; he was obviously writing for people | familiar with these algorithms. | nickflorez wrote: | Amen. | Naac wrote: | While I agree that reading the title was confusing (as I am | not familiar with CRDT), I think the writing style was | actually very good. | | I read the title, wondered what CRDT was, and started reading. | In the back of my mind I was wondering what CRDT was, but | reading the article felt like I was going on a journey. Every | term that needed to be defined was defined. Finally, when CRDT | was mentioned in the article, it was immediately defined. | | I generally agree that throwing acronyms around without | defining them is not fair to the reader, but I don't think this | article did that at all. | theon144 wrote: | Yup, strong agree. The article did a great job of capturing | the "story" of the competing approaches; I didn't | even mind that the acronym wasn't explained until later. | IncRnd wrote: | This is called "burying the lede", where the newsworthy | portion is buried somewhere later instead of being mentioned | upfront.
It's best not to do this, since not all readers will | read two-thirds of a story in order to determine the subject. | rpdillon wrote: | I don't think this is a good example of burying the lede. | If I wanted to bury the lede on this post, I'd do this: | | > I've spent the last decade working on OT, and have always | thought it was the right way to implement a collaborative | editor. Then something amazing happened. | | Instead, we get this: | | > I saw Martin Kleppmann's talk a few weeks ago about | CRDTs, and I felt a deep sense of despair. Maybe all the | work I've been doing for the past decade won't be part of | the future after all, because Martin's work on CRDTs will | supersede it. Its really good. | | That seems like the opposite of burying the lede. The main | point of the story is _not_ that CRDT stands for Conflict- | free Replicated Data Type, it's that the author now favors | CRDTs over OT for collaborative editors. | IncRnd wrote: | It's a quibble to say that the undefined term CRDT is | part of the lede or the lede itself, since people who | do not know the meaning of the acronym need to read a | significant part of the story to be told the definition. | | That can be seen by glancing at the comments on this | page. | andrewprock wrote: | I've seen this writing tactic become more and more common | over the years. It shows disrespect for your audience, and | tends to play well only when "preaching to the choir". | | Whenever I see this writing style, such that I cannot find | a thesis in the first two paragraphs, I almost universally | discard the writing as a waste of time. | natural20s wrote: | Ahhhh Google Wave. I was an early adopter and shed a tear when it | went away. The closest I've felt to that product is Slack, but I | find Slack too noisy. With Wave I felt like I was IN my work, not | in a "sidebar" application that was pulling my attention from my | work.
I suppose there were so many ways to use Wave and so many | ways to use Slack that your experience could be completely | different than mine. But RIP Google Wave. | hughw wrote: | I just never thought email needed fixing, and I suspected | "worse is better" [1] would apply to Wave adoption. | | [1] https://en.wikipedia.org/wiki/Worse_is_better | TillE wrote: | Nobody uses email anymore! It's a last resort. If properly | nurtured, Google Wave easily could have become Slack and | more. It was pointing in that direction. | omgtehlion wrote: | CRDTs are hip and cool. But right now I'm trying to find an | implementation for desktop software, not some web-framework in- | electron. And I could not find a concise and correct codebase. | | All the implementations are: 1. JavaScript or 2. dependent on | their chosen method of synchronisation or 3. incorrect. | | The result of a two-week-long search is that I'm reimplementing | the stuff myself... | WhatIsDukkha wrote: | https://github.com/automerge/automerge-rs | | I can't speak to its usability as I'm waiting on a 1.0 | omgtehlion wrote: | yeah... you'd better wait for 1.0... | memorythought wrote: | I'm one of the authors of this. Right now the code is very | unstable as we're tracking the performance branch of the JS | implementation. Once the JS version hits 1.0 I'll be putting | a bunch of effort into making the API cleaner and more rusty | and documenting things. | | It does work and can actually be used as a backend for the JS | implementation if you use the wasm backend we've built. In | fact, this is how we have tested it, by compiling to WASM and | running the JS test script against it. | stephc_int13 wrote: | When dealing with this type of discussion I always try to | remember that making design decisions is a tradeoff, an arbitrage | highly dependent on your knowledge of the field, but also context | and taste. | | Believing there is a silver bullet is a fool's errand.
| | From what I've read about CRDTs, it seems difficult to escape the | overengineering trap when dealing with them. | tabtab wrote: | I tend to agree. Each team, project, and organization has | different needs, preferences, and cultures. One-size-fits-all | is a really tall order. | | I believe it's better to focus on kits of parts--APIs and/or | self-contained functions--that can be combined or ignored as | needed, along with a variety of reference application samples. | | Having lots of ways to easily filter and sort content is also | very useful. For example, filtering and/or sorting annotations | by person, group, date, content (sub-strings) is very useful. A | query-by-example kind of interface is nice for this. | ffhhj wrote: | I'm looking for a solution to implement collaborative editing in | my visual programming node editor. Are CRDTs useful in this case? | aazaa wrote: | The video linked in the first sentence is well worth the time to | understand the background. | | https://www.youtube.com/watch?v=x7drE24geUw | santiagobasulto wrote: | If you're a young technical entrepreneur looking for a 10-100M | startup opportunity with a very interesting technical | challenge behind it: create a collaborative replacement for | Jupyter Notebooks. There's already some effort in a JupyterLab | fork if you're interested [0], but with no significant | advancements. | | So yes, I agree that CRDTs are indeed a promising endeavor. | | [0] https://github.com/jupyterlab/jupyterlab/issues/5382 | fancy_pantser wrote: | Domino Data Lab has been around for a while and closed another | $43M in funding earlier this year. They have a boatload of | tools around collaborative notebooks. They go even further and | have data science manager-level dashboards to track the | notebooks, their resources, and who is working on what.
There | are others, but I'm calling this company out specifically | because they've shown great traction and I spent a little | time with the cofounders when they were still at a shared | incubator space. | yunyu wrote: | https://deepnote.com/ is doing exactly this! | csours wrote: | How is the system you imagine different from repl.it? | darkhorse13 wrote: | Is it really such a good idea to entice young people like this? | Shouldn't someone at least be interested and have domain | knowledge in CRDTs and real-time collaboration before diving | into building a startup like this? | TheDong wrote: | There's no need to gatekeep building something behind already | having knowledge. | | If someone has time and energy and desire, not knowing | anything about document editing or CRDTs is not a blocker. | Those things can be learned in a week to a month by someone | who dedicates time to it. | | Very few parts of software are inaccessible to someone with | basic CS knowledge. It's a great idea for people to try | something, regardless of their background, and if they fail | but learn something, that's still a fine outcome. | colesantiago wrote: | Worked for Figma, right? | | I'm sure they fall into the collaborative software space, | utilise CRDTs and the founders are less than 40 years of age. | | This seems like gatekeeping, no? | santiagobasulto wrote: | Well, yes, of course. But my comment assumes the person might | be interested in the subject. | williamstein wrote: | CoCalc is a collaborative replacement for Jupyter notebooks. | It's a top-to-bottom re-implementation of the entire Jupyter | stack designed specifically for realtime collaboration. You can | use it via our hosted offering (https://cocalc.com), or install | it on-prem via https://github.com/sagemathinc/cocalc-docker. | | We released our collaborative Jupyter notebook in 2014 as a | plugin to Jupyter classic.
We then iterated on what we learned | over the years, completely rewriting everything multiple times, | including the entire realtime collaboration stack. CoCalc's | Jupyter support is pretty mature and battle-tested at this | point, and also includes a TimeTravel slider that lets you view | all past versions of a Jupyter notebook and integrated chat. | | I was a college professor (at Univ of Washington); I started a | company around this in 2015, so CoCalc has so far been mainly | aimed at serving the needs of academics teaching courses. It's | been increasingly popular lately, e.g., in the last month over | a half million distinct Jupyter notebooks were edited on | https://cocalc.com. Of course, many of these notebooks are | homework problems. Anyway, our company is doing very well, and | we hope it will eventually be a "10M startup opportunity". :-) | maclockard wrote: | I actually just wrote about doing this with our code notebook | product the other day https://hex.tech/blog/a-pragmatic- | approach-to-live-collabora... | bearly wrote: | Interesting decision process. I kept wondering if other | people had implemented the Figma approach and it looks like | you did a nice job with it. I also appreciate you putting | those cool explainers up front. | lisper wrote: | CRDT = Conflict-Free Replicated Data Types. Think git for data | structures instead of directory trees. | lpage wrote: | The three most recent HN discussions on CRDTs are all worth | perusing. | | [1] is an excellent tutorial that assumes no initial familiarity | with CRDTs or the math that underpins them. It walks you through | both the formalisms and the implementation, which is pretty key | to understanding why making real-world CRDTs flexible enough to | handle things like rich text editing is hard. | | [2] is a talk that goes more in-depth on the hard parts. | | [3] goes deeper on OT vs. CRDT. | | It's worth noting that many of the CRDT discussions focus on | collaborative text editing.
That's a _really_ hard problem. CRDTs | are (and have been for some time) a useful primitive for building | distributed systems. | | [1] https://news.ycombinator.com/item?id=23737639 | | [2] https://news.ycombinator.com/item?id=23802208 | | [3] https://news.ycombinator.com/item?id=22039950 | sashachepurnoi wrote: | Thank you for the links! | regulation_d wrote: | Perhaps also of interest is Raph Levien's retrospective on the | choice of CRDT as the collab technology for Xi. | | https://news.ycombinator.com/item?id=19886883 | hinkley wrote: | A question that's been in my mind for a while is why Version | Control and Collaborative Editing work at such cross purposes | with each other when they are essentially solving the same | problem? The biggest difference is that one works interactively | and the other favors a CLI. Beyond that, how much of the | distinction is artificial? | | In particular I've been wondering about the space between CRDTs | and the 'theory of patches' such as we discussed with Pijul the | other day. | | I have a collaborative editing project that's been sitting in my | in-box for a long time now because I don't want to write my own | edit history code and existing tools don't have enough ability to | reason about the contents as structured data. The target audience | is technology-averse, so no 'dancing bears' are going to interest | them. It's not enough for it to work, it has to work very well. | josephg wrote: | Author of the blog post here. I totally agree with you. | | People think of OT / CRDT as realtime algorithms for realtime | collaborative editing because they're always programmed and | used that way. But the conflict resolution approach doesn't | have to merge everything as-is. You could build a CRDT or OT | system that generated VCS-style conflicts if concurrent edits | happen on the same line of code. To make it a valid OT / CRDT | algorithm the main constraint is just that every peer needs to | resolve conflicts the same way. 
(So if I merge your changes or | you merge my changes, we end up with identical document | states). It would be easier to implement using OT because you | only have to consider the interaction between two peers. But I | think it's definitely doable in a CRDT as well. | | I think having something that seamlessly worked in both pair | programming setups and with git-style feature branches & | merging would be fantastic. | | I have a lot of thoughts about this and would be happy to talk | more about it with folks in this space. | samatman wrote: | Strong agree. | | There's a next level of VCS forming on the horizon, in some | combination of CRDTs, patch theory, and grammar-aware diffing. | | Which should also learn from Fossil, and consider metadata such | as issues and surrounding discussions to be a part of the repo. | | A really robust solution would also be aware of dependencies | and build systems, and even deployment: I see these as all | fundamentally related, and connected to versioning in a way | that should be reflected and tracked through software. | exfalso wrote: | Around 6-7 years ago we started a collaborative editing project | for prezi.com. The problem basically boiled down to concurrent | editing of a big DOM-like data-structure. We looked at the | little literature that was available at the time including OT | and CRDTs, but quickly realized that none of the existing | approaches were mature enough for our needs. All of them were | stuck at "text editing", but we needed to edit these big object | DAGs. | | So we ended up essentially implementing what you laid out, an | in-memory revision control system, although using a bit more | formal methods to reason about divergence/convergence of | clients. The most basic operation was the "diamond merge": | given operation x:A->B, y:A->C, construct x':C->D, y':B->D such | that x' . y == y' .
x. It also had to satisfy certain other | algebraic laws, notably diamond composition, which allowed us | to compose these merging operations whenever we wanted, | guaranteeing that the clients will eventually converge to the | same data state. It was quite neat! Shame that it's all | proprietary. | | Good old days. I remember the most pesky operation was | implementing a good undo-redo algorithm; it's quite tricky, | even once you add inverses. | josephg wrote: | It wasn't around at the time, but tree operations (with | object reparenting) are increasingly supported by OT systems | now: | | https://github.com/ottypes/json1/ | | (Designed to be used with sharedb or similar.) | Joeri wrote: | My understanding may be flawed, but as far as I know you can | think of an OT log and a git log as being similar. Each party | generates deltas to the data structure that are recorded in the | log, and when these parallel histories meet they must be | merged. OT merges without involving the user, which sometimes | leads it to discard changes. Git merges like that if it can, | but when something must be discarded it asks the user. It is | the interactive merging and deep ability to navigate and edit | the log of changes that makes git so command-liney. | plesiv wrote: | Not intending to nit-pick, but Git doesn't store the content | as deltas. Each commit is the snapshot of the entirety of the | codebase at that point in time. | NateEag wrote: | Conceptually, yes, but under the hood, Git actually does | store content as deltas: | | https://git-scm.com/book/en/v2/Git-Internals-Packfiles | drawkbox wrote: | Cloud-based code environments are starting to merge this. | GitHub Codespaces [1], for one, is starting this. I don't know if | they use Operational Transformation (OT) or Conflict-Free | Replicated Data Types (CRDT), but they are repo-backed. I assume | it is just using GitHub diffing tools in the repos and maybe | OT/CRDT in live sessions over WebRTC or similar.
| | Much of real-time collaboration goes back to networking and | real-time networking used in distributed multi-user systems | like games, where simulations need to sync on a server. In | games, though, Dead Reckoning [2] is used as well as | interpolation and extrapolation for prediction; much of it can | be slightly different, for instance with physics/effects, but | messages that are important to all, like scores or game | start/end, are reliably synced and determined on the server. | | [1] https://visualstudio.microsoft.com/services/github-codespace... | | [2] https://www.gamasutra.com/view/feature/131638/dead_reckoning... | ultimape wrote: | I wonder if there is a way to describe change sets as a | mathematical curve and achieve something like the rewind-ability | within Planetary Annihilation | https://www.forrestthewoods.com/blog/tech_of_planetary_annih... | which seems to be a smoother alternative to dead reckoning that | bakes the history into it a bit better. | jerf wrote: | As it stands today, version control and collaborative editing | do _not_ solve the same problem. Version control deals with | large chunks of changes at a time. I don't even particularly | want a version control system that stored every single | keystroke made in source code. [1] Collaborative editing deals | with keystroke-by-keystroke updates. By the standard of | collaborative editing, even a single-line source control commit | is a big change. | | The problem spaces are quite different. Problems that emerge on | a minute-by-minute basis in collaborative editing emerge on a | week-by-week basis in source control, and when the problems | emerge in the latter, they tend to be much larger (because you | can build up a much bigger merge conflict on a routine basis | with the big chunks you're making).
| | Yes, it's true that if you squint hard, it _looks_ like version | control is a subset of collaborative editing, but I'd be | really hesitant to, say, try to start a start-up based on that | observation, because even if we take for the sake of argument | that it's a good idea to use the same underlying data | structures, the UI affordances you're going to need to navigate | the problem space are going to be very different, and some of | the obvious ways of trying to "fix" that would be awful, e.g., | yes, you _could_ give me a "collaborative space" where I see | what everybody's doing in their code in real time... but it's a | _feature_, not a bug, that when I'm working on a feature I'm | isolated from what everyone else is doing at that exact moment. | When I run the compiler and it errors out, it's really, really | nice to have a good idea that it's _my_ change that produced | that result. | | (I'm aware that collaborative editing also has the "I was | offline for a week and here's a bunch of conflicts", but I'm | thinking in terms of UI paradigms. That's not the common case | for most/all collaborative editing systems.) | | [1]: Not saying the only solution is the one we have now. A | magic genie that watched over the code and made commits for you | at exactly the right level of granularity would be great, so | you'd never lose any useful context. But key-by-key isn't that | level of granularity. | oever wrote: | Version control is collaborative editing. Synchronizing on | every keystroke is _real-time_ collaborative editing. That's | nice if you're working on overlapping data at the same | time. In code this does not happen so often because code | repositories tend to be large. | | Git does not work well for text because we have not figured | out a nice format for text yet that developers and other | people both enjoy.
Developers want to stick to plain text as | their format because we have so far failed to create nice | tools and formats for structured data. Perhaps these | affordances can appear thanks to a popularization of | real-time collaborative editing. | hinkley wrote: | One of the reasons we compartmentalize code is so that people | can work on unrelated features without tripping over each | other at every turn. | | The bits where they don't interact also don't conflict. The | bits where they do look a lot more like collaborative | editing. | | They're also the spots where merges usually go wrong. | jerf wrote: | I've been on systems where multiple developers were trying | to develop on the same system at once. I've also seen teams | trying to do it systematically. It scales basically to two | developers, sitting across from each other. Three, again, | physically colocated, on a good day. Even if they're | working on completely separate tasks, you hit "compile" and | it's a complete mystery what's going to happen. It's not | even stable if you do nothing and just hit "compile" again. | | Beyond that it's insane. You _do not_ want that in your | version control system, as something built in, working all | the time, across your entire team. It would be a massive | anti-feature that would nuke your product. | | Again, to anyone thinking this sounds like a totally awesome | idea: I strongly encourage you to try out the simple | version, available right now, of just "five or six people | editing the same source code checkout" before | betting a startup on it. I guarantee a complete lack of | desire to productize the result if you try it for a week or | two. | lars wrote: | A middle ground could be nice: An IDE extension that | notifies you when something you're writing will conflict | in the future, should you and your coworker both commit | and push what you've typed out.
It would allow you to | sort that out immediately, or at least plan ahead, rather | than being surprised by a large merge conflict n days | down the road. | PaulDavisThe1st wrote: | Line-oriented data formats vs. everything else. Why? Because of | "patching theory". If you don't understand that the data | describes objects and doesn't have line-by-line semantics, it | is hard to get merges correct. | | Version control works wonders with line-oriented stuff, which | covers more or less every programming language in existence. | | It doesn't do so well with non-line-oriented structured formats | such as XML (not sure how JSON or TOML fits in here). | | Given that collaborative editing typically works with | non-line-oriented data formats, you can see the issue, I think. | samatman wrote: | That's what I refer to as "grammar-aware diffing" in the | sibling comment, and it's one of the low-hanging fruits here. | | Even git allows for pluggable diffing, and doesn't force line | orientation. What's missing is the concept of moving | something, as distinct from deleting lines/chunks and then | inserting lines/chunks which just happen to be the same. | | This is not a problem which CRDTs have, to put it mildly. I | believe Pijul understands it as well. A lot of this stuff is | right out on the cutting edge, and as it matures it will | become practical to connect the edges, such as a CRDT which | collaborates with a parser to produce grammar-aware patches | which are automagically fed to Pijul or something like it. | | This comes with a host of problems, mostly that we're not | used to dealing with a history which has this level of | granularity, most of which we don't want to see, most of the | time. But they would be nice problems to have. | hinkley wrote: | Some of "We" depend on sub-line diff highlighting during | code reviews in order to reason about refactors and | adding/removing arguments from function signatures.
| | That this is generally a feature of the diff tool and not | the version control is a bit disappointing. | macintux wrote: | The title sounds like it could be fanboy clickbait but it's | actually a thoughtful look at how far CRDTs have come from the | viewpoint of an expert and skeptic. | | A good read. | anne_biene wrote: | It is wonderful to see so much enthusiasm about this technology. | I have been working on CRDTs since 2012 and it has been quite a | ride. | | For those looking for more information, have a look at the | information collected at http://crdt.tech/ (Disclaimer: I am | involved, though Martin did the bulk of the work.) | | If you are into CRDTs for collaborative gaming, we are looking | for partners and investors: https://concordant.io (Disclaimer: I | am a technical advisor on its team.) | bigfish24 wrote: | Great summary. CRDTs are a better fit for generalized data. | Having previously worked on an OT system, the central server | stickiness and merge complexity simply did not scale. There are | trade-offs with CRDTs, especially metadata, but as the post | mentions, compression techniques are far more solvable in | real-world scenarios than a fundamental performance bottleneck at the | core. | csours wrote: | If you don't use CRDTs, you may be doomed to re-invent them. | Reading about them just now I realized that I spent the last year | developing a CRDT with LWW and OR characteristics. | | edit: updated 'you are doomed' to 'you may be doomed'.
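csours doesn't describe their design, so the following is only a textbook illustration of the two building blocks named: a last-writer-wins (LWW) register and an observed-remove (OR) set, sketched in Python. It assumes callers supply comparable timestamps and that replica ids are unique strings; all class and method names are hypothetical. The key property is that each merge is deterministic, so replicas that exchange state in any order end up equal.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set, Tuple

Tag = Tuple[str, int]  # (replica id, per-replica counter): unique for every add


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register: the write with the highest stamp survives."""
    value: Any
    stamp: Tuple[int, str]  # (timestamp, replica id); the replica id breaks ties

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        return self if self.stamp >= other.stamp else other


@dataclass
class ORSet:
    """Observed-remove set: a remove only cancels the adds it has seen."""
    replica: str
    counter: int = 0
    adds: Dict[Any, Set[Tag]] = field(default_factory=dict)
    removed: Set[Tag] = field(default_factory=set)

    def add(self, elem: Any) -> None:
        self.counter += 1
        self.adds.setdefault(elem, set()).add((self.replica, self.counter))

    def remove(self, elem: Any) -> None:
        # Tombstone only the tags observed locally; concurrent adds elsewhere survive.
        self.removed |= self.adds.get(elem, set())

    def merge(self, other: "ORSet") -> None:
        for elem, tags in other.adds.items():
            self.adds.setdefault(elem, set()).update(tags)
        self.removed |= other.removed

    def value(self) -> Set[Any]:
        return {e for e, tags in self.adds.items() if tags - self.removed}


# Concurrent writes with equal timestamps: both merge orders pick the same winner.
r1 = LWWRegister("red", (5, "alice"))
r2 = LWWRegister("blue", (5, "bob"))
assert r1.merge(r2) == r2.merge(r1) == r2

# Bob removes "x" while Alice concurrently re-adds it: the add Bob never saw wins.
alice, bob = ORSet("alice"), ORSet("bob")
alice.add("x")
bob.merge(alice)
bob.remove("x")
alice.add("x")
alice.merge(bob)
bob.merge(alice)
assert alice.value() == bob.value() == {"x"}
```

The trade-off is visible in the data: the LWW register silently drops the losing write, which is fine for a single field but lossy for text, while the OR-Set keeps every tag it has ever seen as metadata, the kind of overhead bigfish24 mentions.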
| jakobmartz3 wrote: | are they tho | lewisjoe wrote: | I'm part of the team that makes Zoho Writer (a Google Docs | alternative) - https://writer.zoho.com | | We went with OT for our real-time syncing of edits in 2010 and a | decade later, we are still sticking with OT for reasons I already | stated sometime back - | https://news.ycombinator.com/item?id=24186883 | | However, in the spirit of "There are no solutions, only | trade-offs", CRDTs are absolutely necessary for certain types of syncing | - like syncing a set of database nodes. | | But for systems which already mandate a central server | (SaaS/Cloud) and especially for a complex problem like rich-text | editing (i.e. semantic trees) I still think OT provides better | trade-offs than CRDT. | | I respect Joseph's conviction on CRDTs being the future, so I | guess we'll figure this out sometime soon. | cordite wrote: | What does OT stand for? | | In the link, OT is aliased to "Operational Transformations" | mjhirn wrote: | "Operation Transformation" = "a system that supports | collaboration functionalities by separating the high-level | transformation (or integration) control from the low-level | transformation functions" | | Source: OT's Wikipedia article | | But I felt the same. Never heard of "Operation | Transformation" before and both OT and its alias were equally | opaque to me. | dwb wrote: | Have you not answered your own question? OT does indeed stand | for Operational Transformation. | | https://en.wikipedia.org/wiki/Operational_transformation | RangerScience wrote: | Interesting. I might be adding real-time edit syncing to a | hobby project sometime soon. Can you share more about the | trade-offs? | lewisjoe wrote: | I haven't yet completely watched Martin's talk on CRDTs, so I | might come back and stand corrected. For now these are some | well-known trade-offs: | | A central server: Most OT algorithms depend on a central | system for intention preservation.
CRDTs are truly | distributed and need no central server at all. | | Memory: Traditionally CRDTs consume more memory because | deletions are preserved. OT lets you garbage collect some | operations since a central system is already recording those | ops and sequencing them as well. | | Analysing and cancelling ops: OT lets you easily analyse | incoming ops and modify/dummy-ify/cancel them without | breaking the consistency. This convenience is not necessary | for most cases, but really important for rich-text editing. | For example, when someone merges a couple of table cells while | another user is deleting a column, we need to analyze these | operations and modify them so as not to end up with an | invalid table structure. | passthefist wrote: | Seems like another one (based off the article) is ease of | use as well. I'm not familiar with either algorithm, but | sounds like OT is less complex and easier to understand, | which IMO is a decent tradeoff worth considering. | mdpye wrote: | Having worked a little with both, my impression is that | OT can get very complex in implementation edge cases. | CRDTs are incredibly difficult to design, but if you | successfully design one which can model your features, | implementation is pretty straightforward. | | A real-world implication is that if you want to add a new | operation to a system (like, table column merge, or | moving a range of text), with OT, you can probably find a | way to extend what you have to get it in there, with a | painfully non-linear cost as you add more new operations. | With CRDTs, you may find yourself entirely back at the | drawing board. But the stuff you do support, you will | support pretty well and reliably... | | Personally, I prefer CRDTs for their elegance, but it can | be difficult in a world of evolving requirements. | alextheparrot wrote: | I agree complexity is worth considering, though part of | me wonders how important that is in this case.
The reason | for this intuition is that this is one of the core parts of | what they're selling. | | If you're going to invest your complexity budget | somewhere, it seems like this is a good place for | companies dealing with these structures. | zamalek wrote: | Dealing with text is still an active area of research for | CRDTs. While the problem has been theoretically solved, the | solutions require much more memory/bandwidth than OT does.[1] | Conversely, CRDTs are _significantly_ better at replicating | graphs. | | yjs[2] is one CRDT that handles text reasonably well, but it | can still run into performance edge cases (as they | plainly/honestly admit in their README). | | [1]: https://github.com/automerge/automerge/issues/89 | [2]: https://github.com/yjs/yjs | z3t4 wrote: | The transform operation is simpler if you know the order | of things. For example, in OT: nr2) Delete H from index 0. | nr1) Insert "Hello" at index 0. You know that nr1 should come | before nr2 because of a central counter. But with CRDT it's | a) Delete character id 0, b) Insert "Hello" at character with | id 0. | btreecat wrote: | My small startup company went with Zoho office at first because | of the price. But the features are what have us looking to stay | for a while. | | One thing I would love to see is the addition of wildcard | addresses like the way Google has and Microsoft added | (user+site_string@domain.com). | | Thanks for your hard work on a great product! | aidos wrote: | The Zoho ecosystem is this weird place where you can find | almost _everything_, virtually for free. If you've never | looked before, check it out - it's expansive. | | Frustratingly, though, there are so many features heaped in | that there is no cohesion. Things are frequently buggy, | unreliable and disjointed. I'd almost be able to forgive it | but unfortunately the support is really terrible too.
| | I assessed a _lot_ of CRM software and each one I kept | finding things they didn't have that Zoho had, but for the | reasons above we ultimately chose something else. Which is a | shame, because I would pay them a lot more than they ask, for | them to just be a little better. | Proven wrote: | Don't click on a link if you're unsure - from the title or URL | - the content is relevant to you. | | It's equally "disrespectful" to waste readers' time on 101 | content if that's not what the post is about. | [deleted] | taeric wrote: | I hate that I am skeptical on this. I suspect Wave just left that | bad of a taste behind. So much hubris in what was claimed to be | possible. | | The ideas do look nice. And I suspect it has gotten farther than | I give credit. However, sequencing the edits of independent | actors is likely not something you will solve with a data | structure. | | Take the example of a doc getting overwhelmed. Let's say you can | make it so that you don't have a server to coordinate. Is it | realistic to think hundreds of people can edit a document in real | time at the same time and come up with something coherent? | | The best I can currently imagine is that it works if they are editing | hundreds of pages. But, that is back to the basic wiki structure | working fine. | | So, help me fix my imagination. Why is this the future? | archagon wrote: | In the case of a text document, concurrent edits form branches | of a tree in many string CRDTs: | http://archagon.net/blog/2018/03/24/data-laced-with-history/ | | So yes, hundreds of people can edit a string and produce a | coherent result at the end. Contiguous runs of characters will | stick together and interleave with concurrent edits. | fwip wrote: | CRDTs don't guarantee coherence, but instead guarantee | consistency. | | The result may often be coherent at the sentence level if the | edits are normal human edits, but often will not be at the | whole-document level.
| | For a simplistic example, if one person changes a frequently used | term throughout the document, and another person uses | the old term in a bunch of places when writing new content, | the document will be semantically inconsistent, even though | all users made semantically consistent changes and are now | seeing the same eventually-consistent document. | | For a contrived example of local inconsistency, consider the | phrase "James had a bass on his wall." Alice rewrites this to | "James had a bass on his wall, a trophy from his fishing trip | last summer," and Brianna separately chooses "James, being | musically inclined, had hung his favorite bass on his wall." | The CRDT dutifully applies both edits, and resolves this as: | "James, being musically inclined, had hung his favorite bass | on his wall, a trophy from his fishing trip last summer." | | In nearly any system, semantic data is not completely | represented by any available data model. Any automatic | conflict-resolution model, no matter how smart, can lead to | semantically nonsensical merges. | | CRDTs are very very cool. Too often, though, people think | that they can substitute for manual review and conflict | resolution. | derefr wrote: | Right. The problem CRDTs solve is the problem of the three-way | merge conflict in git: the problem of the "correct" | merge being _underspecified_ by the formalism, and so | _implementation dependent_. | | If two different git clients each implemented some | automated form of merge-conflict resolution; and then each | of them tried to resolve the same conflicting merge; then | each client might resolve the conflict in a _different, | implementation-dependent_ way, resulting in differing | commits. (This is already what happens even without | automation--the "implementation" being depended upon is | the set of manual case-by-case choices made by each human.)
| | CRDTs are data structures that explicitly specify, in the | definition of what a conforming implementation would look | like, how "merge conflicts" for the data should be | resolved. (Really, they specify their way _around_ the data | ever coming into conflict -- thus "conflict-free" -- but | it's easier to talk about them resolving conflicts.) | | In the git analogy, you could think of a CRDT as a pair of | "data-format aware" algorithms: a merge algorithm, and a | pre-commit validation algorithm. The git client would, upon | commit, run the pre-commit validation algorithm specific to | the file's type, and only actually accept the commit if the | modified file remained "mergeable." The client would then, | upon merge, hand two of these files to a file-type-specific | merge algorithm, which would be guaranteed to succeed | assuming both inputs are "mergeable." Which they are, | because we only let "mergeable" files into commits. | | Such a framework, by itself, doesn't guarantee that | anything _good_ or _useful_ will come out the other end of | the process. Garbage In, Garbage Out. What it _does_ | guarantee is that clients doing the same _merge_ will | deterministically generate the same resulting _commit_. It's | up to the designer of each CRDT data-structure to | specify a _useful_ merge algorithm for it; and it's up to | the developer to define their data in terms of a CRDT | data-structure that has the right semantics. | nvader wrote: | That just sparked a thought. | | For a codebase, unit tests could be the pre-commit | validation algorithm. Then, as authors continue to edit | the piece, they both add unit tests, and merge the code. | In the face of a merge, the tests could be the deciding | factor between what emerges. | | Unless, of course, you have conflicts in the tests | themselves.
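derefr's "merge algorithm plus pre-commit validation" framing can be made concrete with the smallest state-based CRDT there is. The Python sketch below uses a grow-only counter (G-Counter), chosen here only because it is tiny; it is not anything git does, and every function name is hypothetical. `is_valid_update` plays the "pre-commit validation" role (a replica may only raise its own slot), and `merge` is the data-type-specific merge whose result does not depend on who runs it or in what order.

```python
from typing import Dict

Counter = Dict[str, int]  # replica id -> count contributed by that replica


def increment(state: Counter, replica: str, n: int = 1) -> Counter:
    """Local update: a replica only ever bumps its own slot."""
    new = dict(state)
    new[replica] = new.get(replica, 0) + n
    return new


def is_valid_update(before: Counter, after: Counter, replica: str) -> bool:
    """'Pre-commit' check: only the committing replica's slot may change, and only upward."""
    for key in before.keys() | after.keys():
        old, new = before.get(key, 0), after.get(key, 0)
        if key == replica and new < old:
            return False
        if key != replica and new != old:
            return False
    return True


def merge(a: Counter, b: Counter) -> Counter:
    """Element-wise max: commutative, associative and idempotent."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def value(state: Counter) -> int:
    return sum(state.values())


base: Counter = {}
x = increment(base, "alice", 2)
y = increment(base, "bob", 3)
z = increment(x, "carol")

assert is_valid_update(base, x, "alice") and not is_valid_update(base, x, "bob")
assert merge(x, y) == merge(y, x)                      # order doesn't matter
assert merge(x, merge(y, z)) == merge(merge(x, y), z)  # grouping doesn't matter
assert merge(x, x) == x                                # re-delivery is harmless
assert value(merge(merge(x, y), z)) == 6
```

As derefr says, this guarantees determinism, not usefulness: the merge cannot tell whether the counts mean anything, which is the Garbage In, Garbage Out caveat in miniature.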
| digikata wrote: | So the CRDTs could be applied to a document and an | edit/change log to guarantee the consistency of the log and | its entries, not necessarily the document itself? | omgtehlion wrote: | I upvote for the link alone. This article (data-laced-with-history) | is the best source if you are starting your journey | into CRDTs. | deegles wrote: | What if the document starts empty and syncing doesn't happen | until everyone presses submit? Will CRDTs produce a valid | document? Yes. Will it make any sense? Who knows. I think | that's what OP is getting at. | [deleted] | archagon wrote: | I read it as a question regarding OT vs. CRDTs, which I | believe would produce similar results even under heavy | concurrency. In terms of larger edits or refactors, you'd | probably need to do something else, e.g. lock the document | or section, unshare the document, use some sort of | higher-level CRDT that ships your changes atomically and forces a | manual merge on concurrent edits, etc. None of these | necessarily require a central server, though they may | require an active session between participants. | | I should also note that even if you use regular merge, and | the end state of a text document is a complete mess after a | refactor + concurrent edits, there's enough data in the | tree to simply pull out any concurrent contributions. They | could then be reapplied manually if needed. Perhaps the app | could even notice this automatically and provide an | optional UI for this process. Similarly, it would be | possible for the concurrent editors to remove the refactor | edits and thus "fork" their document. | taeric wrote: | My question was not meant to be OT versus CRDT. Rather, I | am questioning expectations of that shared-editing use | case. | | Comparing to git (as others have done) is interesting. | The expectation is that any merge is manually tested by the | user. Such that it is not just the git actions at play, | but all support activity. That is, the user flow assumes | all intermediate states are touched and verified by a | user.
That is, the user flow assumes | all intermediate states are touched and verified by a | user. Where this is skipped, things increase the risk of | being broken. (Is why git bisect often fails projects | that don't build every commit.) | | Same for games. Some machine gets to set the record | straight as to what actually happened. Pretty much | always. The faster the path to the authority for every | edit, the higher chance of coherence. | | With hundreds of authorities, machine or not, this feels | intractable. | sagichmal wrote: | > Why is this the future? | | Here is an interview with someone using CRDTs to build an edge | state product that answers this question at a high level. | | https://www.infoq.com/articles/state-edge-peter-bourgon | jka wrote: | Your key insight, which is spot-on, is that nothing can prevent | human-level editing conflicts. | | If I was going to take an attempt at justifying the importance | of CRDTs, I would say: | | CRDTs are the future because they solve digital document-level | conflict. | | They don't bypass the problem the way that diff/patch/git | conflict resolution does, by requiring human intervention. | | Instead they truly and utterly obliterate the digital conflict | resolution problem: a group of people editing a document can | separately lose network connectivity, use different network | transports, reconvene as a subgroup of the original editors... | and their collective edits will always be resolved | automatically by software into a deterministic document that | fits within the original schema. | | If viable, this has far-reaching implications, particularly | related to cloud-based document and sharing systems. | taeric wrote: | But how do they obliterate it? They just move the authority, | no? | | That is, say you get a hundred machines editing a document. | They split into partitions for a time and eventually reunite | to a single one. What sort of coherent and usable data will | they make? 
Without basically electing a leader to reject | branches of the edits, sending them back to the rejected | machines? | jka wrote: | There's no leader node necessarily required; each | participant application in the session may have their own | local copy of the document, and they apply edits to that | using CRDT operations. | | It's no doubt possible to construct applications that _don't_ | behave correctly for certain combinations of edits -- | but the data structures themselves should be robust under | any re-combination of the peer group's operations. | | Edit / addendum: to phrase this another way and perhaps | answer you more clearly: it's a responsibility of the | application designer to come up with a document format for | their application (and corresponding in-app edit | operations) that will tend to result in 'sensible' | recombinations under collaborative editing. | | My sense so far is that this is the tradeoff; the | complexity moves into the document format and edit | operations. But that's a (largely) one-off up-front cost, | and the infrastructure savings and offline/limited- | connectivity collaboration support it affords continue to | accrue over the lifetime of the software. | lallysingh wrote: | > sequencing the edits of independent actors is likely not | something you will solve with a data structure. | | Any multiplayer game does this. Git does this as well. | | So of course you can do this; it's a matter of how you | reconcile conflicts. Real-time interactive games will generally | choose a FIFO ordering based on what came into the server's NIC | first. Git makes the person pushing the merge reconcile first. | | For docs, live editing seems to work the same as in games. | Reconciliation for the decentralized workflow will be | interesting, but it's just going to be minimizing the hit to a | user when their version loses the argument. | samatman wrote: | "Twitch plays Google Docs" is always going to be incoherent, | for social reasons.
CRDTs can make it possible; they can't make | it a good idea. | | But for a contrived example, a game with hundreds of players, | backed by an enormous JSON document, where the game engine is | in charge of making sure each move makes sense: A CRDT could | enable that, and each player could save a snapshot of the game | state as a simple text file, or save the entire history as the | whole CRDT. | | Or as a less contrived example, instead of a game, it's a chat | client, and it provides rich text a la Matrix, but there's no | server, it's all resolved with CRDTs and all data is kept | client-local for each client. | | There are a lot of cool things you can build with a performant | CRDT. | ragnese wrote: | > Is it realistic to think hundreds of people can edit a | document in real time at the same time and come up with | something coherent? | | And here's the thing: Can 100 people edit a document, _even in | theory_, and have it make sense? I think the answer is "no," | with or without technology. | | I'm sure there are other uses for these data structures, but | shared editing is always the example I read about. | ssivark wrote: | Depends on what kind of document we're talking about, i.e. how | the grammar captures the domain model. E.g. a shared ledger in | the case of digital currencies, or the Linux source code | being worked on remotely by many people, are exactly examples | of such documents. | taeric wrote: | I meant this to be my takeaway. The data structure is nice. | And I suspect it is a perfect fit for some use cases. I | question the use case of shared editing. Not just the | solution, but the use case. | nonbirithm wrote: | A question I always have is: if CRDTs solve some problem with | collaborative editing, can git's merge algorithm be | rewritten to use CRDTs and benefit from it somehow? | | Somehow I think the answer is no.
There is a reason we still | have to manually drop down to a diff editor to resolve | certain kinds of conflicts after many decades. | dan-robertson wrote: | I think a better question is "what if merges were more well | behaved," where "well behaved" means they have nice | properties like associativity and having the minimal amount | of conflict without auto-resolving any cases that should | actually be a conflict. | | The problem with using a CRDT is the CR part: there are | generally merge conflicts in version control for a reason. | If your data type isn't "state of the repo with no | conflicts" or "history of the repo and current state with | no conflicts" but something like "history of the repo and | current state including conflicts from unresolved merges" | then maybe that would work but it feels pretty complicated | to explain and not very different from regular git. Also | note that you need history to correctly merge (if you do a | 3-way merge of a history of a file of "add line foo; delete | line foo" with a history of "add line foo; delete line foo; | add line foo" and common ancestor "add line foo", you | should end with a history equal to the second one I | described. But if you only look at the files you will | probably end up deleting foo) | | See also: darcs and pijul. | mattnewport wrote: | Git mostly treats merging as a line oriented diff problem. | Even though you can specify language aware diffing in | theory it doesn't seem to buy you much in practice (based | on my experience with the C# language-aware diff). | | It wouldn't make much sense to me to just plug a text CRDT | in place of a standard text diff. CRDTs like automerge are | capable of representing more complex tree structures | however and if you squint you can sort of imagine a world | where merging source code edits was done at something more | like the AST level rather than as lines of text. 
| | I've had some ugly merge conflicts that were a mix of | actual code changes and formatting changes which git diffs | tend not to be much help with. A system that really | understood the semantic structure of the code should in | theory be able to handle those a lot better. | | IDEs have powerful refactoring support these days, like | renaming class members, but source control is ignorant of | those things. One can imagine a more integrated system that | could understand a rename as a distinct operation and have | no trouble merging a rename with an actual code change that | touched some code that referenced the renamed thing in many | situations. Manual review would probably still be necessary | but the automated merge could get it right a much higher | percentage of the time. | dnautics wrote: | The answer is no, but unlike git, crdts make a choice for | you, and all nodes get convergent consistency. The problem | heretofore with crdts is that those choices have not been | sane. I think there is a recent crop of crdts that are | "95% sane" and honestly that's probably good enough. There | is an argument that optimal human choices will never be | reconcilable with commutativity, which I totally buy, but | I think there is also an argument for "let not the perfect | be the enemy of the awesome". And having made a choice, | even if it's not optimal, is much firmer ground to build | upon than blocking on leaving a merge conflict undecided. | CydeWeys wrote: | It depends how big the document is, i.e. what is the density | of users per page. If it's a 100-page document and the 100 | users are all working on different sections, then it could | easily be possible. | | I just don't remotely see a use case for this. Real-time | human collaboration in general fails at a scale much smaller | than this, and not because of the tools available. | jandrese wrote: | Maybe if your "document" is the Encyclopedia Britannica?
| Wikipedia has hundreds of editors working at once, but that | only really works because it's broken up into millions of | smaller parts that don't interact much. | jka wrote: | JoeDocs[1] could be a useful project to track related to this | - the Coronavirus Tech Handbook[2] amongst other | collaborative documents is now hosted by their service. | | They utilize the same YJS[3] library mentioned in the article | this thread discusses, and their GitHub repos include some | useful working demonstration application code. | | [1] - https://joedocs.com/ | | [2] - https://coronavirustechhandbook.com/ | | [3] - https://docs.yjs.dev/ | dan-robertson wrote: | Ultimately I think the answer is "it depends" but the issue | is that there is usually document structure which is not | visible in the data structure itself. For example imagine | getting 100 people to fill out a row on a spreadsheet about | their preferences for some things or their availability on | certain dates. If each person simultaneously tries to fill in | the third row of the spreadsheet (after the headings and the | author), then a spreadsheet CRDT probably would suck at | merging the edits. But if you had a CRDT for the underlying | structure of this specific document you could probably merge | the changes (e.g. sort the set of rows alphabetically by name | and do something else if multiple documents have rows keyed | by the same name). | blackgirldev wrote: | Doesn't Redis implement CRDTs in production? | | https://redislabs.com/blog/diving-into-crdts/ | zegl wrote: | Riak as well; I've used it very successfully on projects in the | past. | | https://docs.riak.com/riak/kv/latest/developing/data-types/i... | dnautics wrote: | Yes, as does Riak. There are plenty of simple crdts and the | theory, while recent, has all of its fundamentals fleshed out. | We know what property makes data structures crdts, and how to | compose them, and how to prove they are crdts.
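A minimal sketch of the property mentioned above, assuming a state-based grow-only counter (G-Counter) stored as a plain dict from a hypothetical replica id to that replica's count. Merge takes the per-replica maximum, which is commutative, associative and idempotent, so replicas that exchange state in any order, any number of times, converge. This is an illustrative Python example, not code from Redis, Riak or any library linked in this thread.

```python
from typing import Dict

# Hypothetical minimal state-based G-Counter: each replica only ever
# increments its own slot, and merge takes the per-replica maximum.
GCounter = Dict[str, int]


def increment(state: GCounter, replica: str, n: int = 1) -> GCounter:
    """Return a new state with `replica`'s slot increased by n (n >= 0)."""
    if n < 0:
        raise ValueError("a G-Counter can only grow")
    new = dict(state)
    new[replica] = new.get(replica, 0) + n
    return new


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Join of two states: element-wise max over all replica ids."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(state: GCounter) -> int:
    """The counter's value is the sum of all replicas' slots."""
    return sum(state.values())


if __name__ == "__main__":
    base: GCounter = {}
    x = increment(base, "alice", 2)  # alice's replica
    y = increment(base, "bob", 3)    # bob's replica, edited concurrently
    z = increment(y, "carol")        # carol saw bob's edit first

    # Commutative, associative and idempotent: any merge order converges.
    assert merge(x, y) == merge(y, x)
    assert merge(merge(x, y), z) == merge(x, merge(y, z))
    assert merge(x, x) == x
    print(value(merge(merge(x, y), z)))  # 6
```

The same join-semilattice shape underlies richer types such as the add-wins sets discussed below, where a concurrent add and remove of the same element resolve in favour of the add.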
| | Currently we are in the "discovery of new crdts" and | "engineering and implementing of older crdts reliably" phase, | and in some cases "discovering when not to use crdts". | | The crux of this issue is that crdts that play nice with | human expectations with regard to collaborative document editing | are not known, possibly excepting automerge (yjs). As it's a | 'softer' concept with no good axioms, there is no solid theory | on how to combine the theoretical requirements of crdts with | human expectations. | einpoklum wrote: | It looks like it's basically biasing in favor of some | operations over others. In the link they talk about CRDT sets, | saying at some point: | | > 1. Adding wins over deleting. | | yeah, so, _maybe_ you can remove elements from your set. If | you're lucky. I dunno about all that... | samatman wrote: | That's an overly pessimistic way to put it. | | I think it's more accurate to say that _maybe_ you can remove | elements from your set... unless another actor wants them in | the set. | | That's not always the behavior you want. But if it is, it's | great. | mhale wrote: | I'm working on a project with some offline data synchronization | needs, but haven't started implementation yet. I've been | following CRDTs with interest. I also saw many of the same | downsides mentioned in the OP, e.g. bloat (which apparently is | being addressed remarkably well). Beyond OT, another approach | I've run across that looks very promising is Differential | Synchronization[1] by Neil Fraser. While it also relies on a | centralized server, it allows for servers to be chained in such a | way that seems to address many of the downsides of OT. Why do I | so rarely see Differential Synchronization mentioned here | on HN? Is it due to lack of awareness or because of use-case fit | issues or some fatal flaw I haven't seen? Or something else?
| | [1] https://www.youtube.com/watch?v=S2Hp_1jqpY8 | arendtio wrote: | I wonder why OT is restricted to a central server. In 2016/2017 I | wrote a Progressive Web App (PWA) for myself that uses an | algorithm that probably fits the category of OT. It uses a | WebDAV server for synchronization between devices. Yes, this is a | centralized server, but when some super slow & dumb WebDAV server | can serve this purpose, it should probably be possible to build | it on top of S3, a blockchain or something federated. | | My biggest issues at the time were around CORS, as with a PWA you | can't simply use every server the user enters, as the same-origin | policy keeps getting in your way. | yawrp wrote: | Interesting piece from last week comparing OT, CRDT, and Figma's | hybrid approach (good explainers of each too): | https://hex.tech/blog/a-pragmatic-approach-to-live-collabora... | xwdv wrote: | CRDT stands for conflict-free replicated data type. | contravariant wrote: | Thanks, I had to look it up as well. It's not the first article | I've read on CRDTs but I definitely didn't recall what they were | from just the acronym. | dustingetz wrote: | A problem w/ e.g. CRDT datasync in web apps is data security: | HTTP resources impose control points where you know "why" the | client is asking for e.g. this chunk of social graph; it's | /profile/friendlist so the UI can ask for a very controlled and | tightly specified data projection for that particular UI and | consumed by tightly controlled JavaScript. Datasync is NOT for | scraper bots, arbitrary read patterns or any notion of general | access. | | Immutability makes data control way harder ... | rsync wrote: | "It was a general purpose medium (like paper). Unlike a lot of | other tools, it doesn't force you into its own workflow. You | could use it to do anything from plan holidays, make a wiki, play | D&D with your friends, schedule a meeting, etc." | | So, sort of like email?
___________________________________________________________________ (page generated 2020-09-28 23:00 UTC)