[HN Gopher] CRDT: Fractional Indexing ___________________________________________________________________ CRDT: Fractional Indexing Author : mblode Score : 279 points Date : 2022-11-27 17:03 UTC (11 hours ago) (HTM) web link (madebyevan.com) (TXT) w3m dump (madebyevan.com) | [deleted] | parentheses wrote: | Maybe I'm missing the point. Random offsets feels like an | inelegant solution in the space of elegant solutions (read: | CRDTs). | paulgb wrote: | This stuff is fun to play with. I implemented a Rust version of | fractional indexing based on another of Evan's blog posts. | | https://docs.rs/fractional_index/latest/fractional_index/ | dang wrote: | Ok, let's try changing the URL from | https://madebyevan.com/algos/ (a generic list) to | https://docs.rs/fractional_index/latest/fractional_index/ (the | item you're taking about). | | Explanation here: | https://news.ycombinator.com/item?id=33764956. Thanks! | chrisoverzero wrote: | Does a Vec<u8> with that calculation listed end up smaller than | two integers of (presumably) arbitrary length? | | When I read about the problem, I imagined using the mediant to | calculate "between". I think that mediant of $value and 1/1 | would work for "after" and mediant of $value and 0/1 would work | for "before". | | There must be a requirement I'm missing, though! | | mediant: https://en.wikipedia.org/wiki/Mediant_(mathematics) | paulgb wrote: | That would be an interesting thing to test! My first instinct | is that the comparison operation (when denominators differ) | would be more expensive. | | In general some approaches should be better than others | depending on how inserts are distributed -- I think my | approach is optimal when inserts are made at random (I | haven't proved this), but in practice it may be common for a | bunch of things to be inserted in sequence for some | applications. In those cases, there are more space-optimal | approaches. | conaclos wrote: | This is basically the idea behind Logoot [Weis_2009] that was | improved by LSeq [Nedelec_2013] and later extended to the first | block-wise sequence CRDT: LogootSplit [Andre_2013]. LogootSplit | was recently improved as Dotted LogootSplit [1] [Elvinger_2021]. | | Disclamer: I'm the author of Dotted LogootSplit. | | [Weis_2009] https://hal.inria.fr/inria-00432368 | | [Nedelec_2013] https://hal.archives-ouvertes.fr/hal-00921633/en | | [Andre_2013] https://hal.archives-ouvertes.fr/hal-01246212 | | [Elvinger_2021] https://hal.univ-lorraine.fr/tel-03284806 | | [1] https://github.com/coast-team/dotted-logootsplit | superb-owl wrote: | CRDTs are incredible. I recommend checking out the original link | [1] as well as this "awesome" list [2] | | [1] https://madebyevan.com/algos/ | | [2] https://github.com/alangibson/awesome-crdt | jiffyjeff wrote: | Conflict-free replicated data type, or CRDT, is often referred to | as simply "CRDT" in trade jargon without any definition. | Apparently. | dang wrote: | This looks like great stuff if you follow the pointers, but lists | don't make good submissions to HN (which itself is already a | list). They tend not to lead to deep discussion because comments | are about lowest common denominator of the items on the list, and | this is usually pretty generic. What's better is to pick the most | interesting item on the list and submit that instead. | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor... | | Edit: let's do that in this case. I've changed the URL from | https://madebyevan.com/algos/ based on | https://news.ycombinator.com/item?id=33764875. | mblode wrote: | Thank you dang for updating the title and URL! | dang wrote: | The funny thing is that the discussion so far has still | mostly been generic, yet I swear the comments are higher- | quality than they would have been without the change. | (Impossible to say for sure, of course) | LAC-Tech wrote: | CRDTs are often talked about in the same breath as collaborative | editing software, but they're useful for much more than that. | | They really are a theoretical model of how distributed, | convergent, multi-master systems have to work. IE the DT in CRDT | could be a whole datastore, not as just an individual document. | | (Wish I could remember who on HN alerted me to this. I had read | the paper but didn't grok the full implications). | jdvh wrote: | We're using a single end-to-end encrypted document tree synced | with CRDTs for our collaborative task IDE[1]. All data for a | team is a single tree (graph really, if you count | transclusions) and its kind of magical how simple everything | gets when you know all state will sync in a deterministic way | between all clients. It doesn't matter whether you drag&drop | some object or add a new user, or rename something. It all | reduces to a handful of CRDT operations. | | (We have a central server and the app works offline so the | algorithm from the linked article doesn't apply in our case.) | | [1] https://thymer.com | jitl wrote: | Are you using Yjs? How are you thinking about scale-up for | teams with 100s - 1000s of users? | jdvh wrote: | We've looked at Yjs but ultimately decided to write our own | thing from scratch. | | Even scale-ups with thousands of users will have people | working on different parts of the document tree in | practice. The client doesn't need to receive every | keystroke for people editing somewhere far away in the | document tree. Those updates can be sent in batches every | once in a while. | | If 10.000 people decide to edit the exact same location in | a document then performance will degrade. The client will | start to lag behind if it can't keep up with the stream of | updates (cpu or internet bottleneck) but eventually it will | process everything and all clients will end up with the | same state. We have two channels. One for state updates | (soft real-time) and one for UI hints ("hard" real-time) | and other user actions that aren't CRDT mutations. | adamnemecek wrote: | Probably something by Martin Kleppmann? | newhouseb wrote: | You might be thinking of this [1] paper? The core idea being if | your problem space lends itself to monotonicity (see the paper | for a more precise definition than I can give), then you can | build a globally consistent database (around a CRDT) where the | end-user doesn't need to concern themselves with inconsistent | states while consistency is reached. | | [1] https://arxiv.org/pdf/1901.01930.pdf | LAC-Tech wrote: | I wasn't clear - by "the paper" I meant the original CRDT | paper. I read it, thought I understood it on some level, but | had not drawn the dots between the theory and real world | problems. | | Regardless, thanks very much for linking the paper! Right up | my alley. | sitkack wrote: | Marc Shapiro and Nuno Preguica created CRDTs in 2007. | | https://pages.lip6.fr/Marc.Shapiro/pubs.html#CRDTs | | The original paper would be | https://hal.inria.fr/inria-00177693/ | bhl wrote: | Is there a good resource on designing collaborative apps with | CRDTS, that is which type of CRDT and conflict resolution to pick | for a given application? Something like | https://mattweidner.com/2022/02/10/collaborative-data-design... | but generalizes. | samwillis wrote: | Anyone unsure of what a CRDT is (I think everyone on HN must know | by now), this is the perfect intro: | https://www.inkandswitch.com/peritext/ | | The two most widely used CRDT implementations (combining JSON | like general purpose types and rich text editing types) are: | | - Automerge https://github.com/automerge/automerge | | - Yjs https://github.com/yjs/yjs | | Both have JS and Rust implementations, and have bindings to most | online rich text editors. | | CRDTs are addictive one you get into them. | [deleted] | taneq wrote: | Even that link was 5 pages in (on my phone) before it deigned | to mention: | | > It is a Conflict-free Replicated Data Type (CRDT) | | What happened to the idea of defining all non-universally- | recognised acronyms the _first_ time you use the term? With | people making up new terms exponentially faster than ever | before, it's now more important than ever. | al2o3cr wrote: | The first use is a hyperlink to a whole article defining the | term, what are you on about? | lelandfe wrote: | TBF, it uses the acronym 8 times across 500 words before | giving you the actual term. | glenngillen wrote: | Also, outside the page title/headings and the reference to | the name of said external paper the first use in the | document _is_ where they use the expanded name. | jitl wrote: | To me the other algorithms described in the list are more novel | and interesting: | | https://madebyevan.com/algos/crdt-tree-based-indexing/ - for when | precise order is critical, like paragraphs in a document. This | algorithm is _almost_ like storing adjacency information like a | linked list, but is more convergent. Very interesting for [my | use-case](https://www.notion.so/blog/data-model-behind-notion). | | https://madebyevan.com/algos/crdt-mutable-tree-hierarchy/ - for | tree-shaped data, like blocks in a Notion page that should have | exactly one parent, but allow concurrent re-parenting operations | | https://madebyevan.com/algos/log-spaced-snapshots/ - log space | snapshots, for choosing what fidelity of historical information | to store. For context, many CRDTs for rich text or sequences | store unbounded history so that any edit made at any time can be | merged into the sequence. For long-lived documents, this could be | impractical to sync to all clients or keep in "hot" memory. | Instead, we can decide to compact historical data and move it to | cold storage, imposing a time boundary on what writes the system | can accept on the hot path. The log-spaced snapshots algorithm | here could be used to decide what should be kept "hot", and how | to tune the cold storage. | maxmcd wrote: | If you're working on CRDT stuff in production (or possibly in | production) do you have thoughts on the CRDT vs OT debate? I | would expect Notion to use operational transform given the | availability of reliable central servers. But I know quite | little! Interested in your thoughts. | ChadNauseam wrote: | I'm not the GP, but OT is pretty annoying to implement. There | are so many cases that it's quite difficult to formally prove | an OT correct. On the other hand, a large subset of CRDTs can | be implemented in Datalog and if you do that you can't | possibly end up with an invalid CRDT. | | From wikipedia: | | > Similarly, Joseph Gentle who is a former Google Wave | engineer and an author of the Share.JS library wrote, | "Unfortunately, implementing OT sucks. There's a million | algorithms with different tradeoffs, mostly trapped in | academic papers. [...] Wave took 2 years to write and if we | rewrote it today, it would take almost as long to write a | second time." But later he amends his comment with "I no | longer believe that wave would take 2 years to implement now | - mostly because of advances in web frameworks and web | browsers." | throwaway81523 wrote: | This is kind of interesting but "fractional indexing" doesn't | seem to be a computer science topic, and I think it might be | clearer to treat these indexes as lists of numbers (or ordinals | in o^o, if you prefer) rather than fractions. Those are simpler | to generate and compare than arbitrary-precision fractions. Or as | jitl's post suggests, using trees as indexes (I haven't yet | looked at jitl's linked articles). Those would presumably have | order type e0. It's not clear to me why you'd want that, but it | seems doable. In all these schemes you might occasionally want a | "stop the world garbage collection" where you reset all the | indices to be ordinary integers or maybe pairs of integers. I | guess that is also doable without having to pause all the | updates, at least if you use pairs. | zaphar wrote: | In a large distributed system stop the world garbage collection | is probably not going to work very well. | fatneckbeardz wrote: | reminds me a lot of Ford Circle / Farey Diagram / Stern Brocot | tree | | Basically a tree of fractions where you take two rational points | on a number line, a/b and c/d, then the next point in the tree is | (a+b) / (c+d). Turns out that every single point you create this | way has a unique position and never duplicate each other, and it | forms a tree like structure. | | https://en.wikipedia.org/wiki/Ford_circle | | not sure if this would be useful, but basically it could be a | fractional index that has a built in tree structure, since it | basically means any fraction is a leaf on a Stern-Brocot tree. | brilee wrote: | Doesn't this end up being effectively a binary heap, with a | maximum tree depth of 23 (floating point mantissa precision)? I | imagine there must be a rebalancing operation required every so | often, possibly more frequently for pathological insertion | orders. | luto wrote: | > Fractional positions should be represented using arbitrary- | precision decimals so that they don't run out of precision. | Floating-point numbers are insufficient. | newhouseb wrote: | Is this the same as LSeq [1] except rather than using bytes one | is basically using digits in a floating point representation | (given this is JS where most things are floats)? | | [1] https://bartoszsypytkowski.com/operation-based-crdts- | arrays-... | Shohrux wrote: | est wrote: | Reminds me of TokuDB engine for MySQL. Good times. | lelandfe wrote: | For those unaware: Evan is the cofounder, former CTO, and | technical wizard behind Figma; creator of esbuild, etc. ___________________________________________________________________ (page generated 2022-11-28 05:00 UTC)