[HN Gopher] CRDT: Fractional Indexing
       ___________________________________________________________________
        
       CRDT: Fractional Indexing
        
       Author : mblode
       Score  : 279 points
       Date   : 2022-11-27 17:03 UTC (11 hours ago)
        
 (HTM) web link (madebyevan.com)
 (TXT) w3m dump (madebyevan.com)
        
       | [deleted]
        
       | parentheses wrote:
       | Maybe I'm missing the point. Random offsets feels like an
       | inelegant solution in the space of elegant solutions (read:
       | CRDTs).
        
       | paulgb wrote:
       | This stuff is fun to play with. I implemented a Rust version of
       | fractional indexing based on another of Evan's blog posts.
       | 
       | https://docs.rs/fractional_index/latest/fractional_index/
        
         | dang wrote:
         | Ok, let's try changing the URL from
         | https://madebyevan.com/algos/ (a generic list) to
         | https://docs.rs/fractional_index/latest/fractional_index/ (the
         | item you're taking about).
         | 
         | Explanation here:
         | https://news.ycombinator.com/item?id=33764956. Thanks!
        
         | chrisoverzero wrote:
         | Does a Vec<u8> with that calculation listed end up smaller than
         | two integers of (presumably) arbitrary length?
         | 
         | When I read about the problem, I imagined using the mediant to
         | calculate "between". I think that mediant of $value and 1/1
         | would work for "after" and mediant of $value and 0/1 would work
         | for "before".
         | 
         | There must be a requirement I'm missing, though!
         | 
         | mediant: https://en.wikipedia.org/wiki/Mediant_(mathematics)
        
           | paulgb wrote:
           | That would be an interesting thing to test! My first instinct
           | is that the comparison operation (when denominators differ)
           | would be more expensive.
           | 
           | In general some approaches should be better than others
           | depending on how inserts are distributed -- I think my
           | approach is optimal when inserts are made at random (I
           | haven't proved this), but in practice it may be common for a
           | bunch of things to be inserted in sequence for some
           | applications. In those cases, there are more space-optimal
           | approaches.
        
       | conaclos wrote:
       | This is basically the idea behind Logoot [Weis_2009] that was
       | improved by LSeq [Nedelec_2013] and later extended to the first
       | block-wise sequence CRDT: LogootSplit [Andre_2013]. LogootSplit
       | was recently improved as Dotted LogootSplit [1] [Elvinger_2021].
       | 
       | Disclamer: I'm the author of Dotted LogootSplit.
       | 
       | [Weis_2009] https://hal.inria.fr/inria-00432368
       | 
       | [Nedelec_2013] https://hal.archives-ouvertes.fr/hal-00921633/en
       | 
       | [Andre_2013] https://hal.archives-ouvertes.fr/hal-01246212
       | 
       | [Elvinger_2021] https://hal.univ-lorraine.fr/tel-03284806
       | 
       | [1] https://github.com/coast-team/dotted-logootsplit
        
       | superb-owl wrote:
       | CRDTs are incredible. I recommend checking out the original link
       | [1] as well as this "awesome" list [2]
       | 
       | [1] https://madebyevan.com/algos/
       | 
       | [2] https://github.com/alangibson/awesome-crdt
        
       | jiffyjeff wrote:
       | Conflict-free replicated data type, or CRDT, is often referred to
       | as simply "CRDT" in trade jargon without any definition.
       | Apparently.
        
       | dang wrote:
       | This looks like great stuff if you follow the pointers, but lists
       | don't make good submissions to HN (which itself is already a
       | list). They tend not to lead to deep discussion because comments
       | are about lowest common denominator of the items on the list, and
       | this is usually pretty generic. What's better is to pick the most
       | interesting item on the list and submit that instead.
       | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...
       | 
       | Edit: let's do that in this case. I've changed the URL from
       | https://madebyevan.com/algos/ based on
       | https://news.ycombinator.com/item?id=33764875.
        
         | mblode wrote:
         | Thank you dang for updating the title and URL!
        
           | dang wrote:
           | The funny thing is that the discussion so far has still
           | mostly been generic, yet I swear the comments are higher-
           | quality than they would have been without the change.
           | (Impossible to say for sure, of course)
        
       | LAC-Tech wrote:
       | CRDTs are often talked about in the same breath as collaborative
       | editing software, but they're useful for much more than that.
       | 
       | They really are a theoretical model of how distributed,
       | convergent, multi-master systems have to work. IE the DT in CRDT
       | could be a whole datastore, not as just an individual document.
       | 
       | (Wish I could remember who on HN alerted me to this. I had read
       | the paper but didn't grok the full implications).
        
         | jdvh wrote:
         | We're using a single end-to-end encrypted document tree synced
         | with CRDTs for our collaborative task IDE[1]. All data for a
         | team is a single tree (graph really, if you count
         | transclusions) and its kind of magical how simple everything
         | gets when you know all state will sync in a deterministic way
         | between all clients. It doesn't matter whether you drag&drop
         | some object or add a new user, or rename something. It all
         | reduces to a handful of CRDT operations.
         | 
         | (We have a central server and the app works offline so the
         | algorithm from the linked article doesn't apply in our case.)
         | 
         | [1] https://thymer.com
        
           | jitl wrote:
           | Are you using Yjs? How are you thinking about scale-up for
           | teams with 100s - 1000s of users?
        
             | jdvh wrote:
             | We've looked at Yjs but ultimately decided to write our own
             | thing from scratch.
             | 
             | Even scale-ups with thousands of users will have people
             | working on different parts of the document tree in
             | practice. The client doesn't need to receive every
             | keystroke for people editing somewhere far away in the
             | document tree. Those updates can be sent in batches every
             | once in a while.
             | 
             | If 10.000 people decide to edit the exact same location in
             | a document then performance will degrade. The client will
             | start to lag behind if it can't keep up with the stream of
             | updates (cpu or internet bottleneck) but eventually it will
             | process everything and all clients will end up with the
             | same state. We have two channels. One for state updates
             | (soft real-time) and one for UI hints ("hard" real-time)
             | and other user actions that aren't CRDT mutations.
        
         | adamnemecek wrote:
         | Probably something by Martin Kleppmann?
        
         | newhouseb wrote:
         | You might be thinking of this [1] paper? The core idea being if
         | your problem space lends itself to monotonicity (see the paper
         | for a more precise definition than I can give), then you can
         | build a globally consistent database (around a CRDT) where the
         | end-user doesn't need to concern themselves with inconsistent
         | states while consistency is reached.
         | 
         | [1] https://arxiv.org/pdf/1901.01930.pdf
        
           | LAC-Tech wrote:
           | I wasn't clear - by "the paper" I meant the original CRDT
           | paper. I read it, thought I understood it on some level, but
           | had not drawn the dots between the theory and real world
           | problems.
           | 
           | Regardless, thanks very much for linking the paper! Right up
           | my alley.
        
             | sitkack wrote:
             | Marc Shapiro and Nuno Preguica created CRDTs in 2007.
             | 
             | https://pages.lip6.fr/Marc.Shapiro/pubs.html#CRDTs
             | 
             | The original paper would be
             | https://hal.inria.fr/inria-00177693/
        
       | bhl wrote:
       | Is there a good resource on designing collaborative apps with
       | CRDTS, that is which type of CRDT and conflict resolution to pick
       | for a given application? Something like
       | https://mattweidner.com/2022/02/10/collaborative-data-design...
       | but generalizes.
        
       | samwillis wrote:
       | Anyone unsure of what a CRDT is (I think everyone on HN must know
       | by now), this is the perfect intro:
       | https://www.inkandswitch.com/peritext/
       | 
       | The two most widely used CRDT implementations (combining JSON
       | like general purpose types and rich text editing types) are:
       | 
       | - Automerge https://github.com/automerge/automerge
       | 
       | - Yjs https://github.com/yjs/yjs
       | 
       | Both have JS and Rust implementations, and have bindings to most
       | online rich text editors.
       | 
       | CRDTs are addictive one you get into them.
        
         | [deleted]
        
         | taneq wrote:
         | Even that link was 5 pages in (on my phone) before it deigned
         | to mention:
         | 
         | > It is a Conflict-free Replicated Data Type (CRDT)
         | 
         | What happened to the idea of defining all non-universally-
         | recognised acronyms the _first_ time you use the term? With
         | people making up new terms exponentially faster than ever
         | before, it's now more important than ever.
        
           | al2o3cr wrote:
           | The first use is a hyperlink to a whole article defining the
           | term, what are you on about?
        
             | lelandfe wrote:
             | TBF, it uses the acronym 8 times across 500 words before
             | giving you the actual term.
        
             | glenngillen wrote:
             | Also, outside the page title/headings and the reference to
             | the name of said external paper the first use in the
             | document _is_ where they use the expanded name.
        
       | jitl wrote:
       | To me the other algorithms described in the list are more novel
       | and interesting:
       | 
       | https://madebyevan.com/algos/crdt-tree-based-indexing/ - for when
       | precise order is critical, like paragraphs in a document. This
       | algorithm is _almost_ like storing adjacency information like a
       | linked list, but is more convergent. Very interesting for [my
       | use-case](https://www.notion.so/blog/data-model-behind-notion).
       | 
       | https://madebyevan.com/algos/crdt-mutable-tree-hierarchy/ - for
       | tree-shaped data, like blocks in a Notion page that should have
       | exactly one parent, but allow concurrent re-parenting operations
       | 
       | https://madebyevan.com/algos/log-spaced-snapshots/ - log space
       | snapshots, for choosing what fidelity of historical information
       | to store. For context, many CRDTs for rich text or sequences
       | store unbounded history so that any edit made at any time can be
       | merged into the sequence. For long-lived documents, this could be
       | impractical to sync to all clients or keep in "hot" memory.
       | Instead, we can decide to compact historical data and move it to
       | cold storage, imposing a time boundary on what writes the system
       | can accept on the hot path. The log-spaced snapshots algorithm
       | here could be used to decide what should be kept "hot", and how
       | to tune the cold storage.
        
         | maxmcd wrote:
         | If you're working on CRDT stuff in production (or possibly in
         | production) do you have thoughts on the CRDT vs OT debate? I
         | would expect Notion to use operational transform given the
         | availability of reliable central servers. But I know quite
         | little! Interested in your thoughts.
        
           | ChadNauseam wrote:
           | I'm not the GP, but OT is pretty annoying to implement. There
           | are so many cases that it's quite difficult to formally prove
           | an OT correct. On the other hand, a large subset of CRDTs can
           | be implemented in Datalog and if you do that you can't
           | possibly end up with an invalid CRDT.
           | 
           | From wikipedia:
           | 
           | > Similarly, Joseph Gentle who is a former Google Wave
           | engineer and an author of the Share.JS library wrote,
           | "Unfortunately, implementing OT sucks. There's a million
           | algorithms with different tradeoffs, mostly trapped in
           | academic papers. [...] Wave took 2 years to write and if we
           | rewrote it today, it would take almost as long to write a
           | second time." But later he amends his comment with "I no
           | longer believe that wave would take 2 years to implement now
           | - mostly because of advances in web frameworks and web
           | browsers."
        
       | throwaway81523 wrote:
       | This is kind of interesting but "fractional indexing" doesn't
       | seem to be a computer science topic, and I think it might be
       | clearer to treat these indexes as lists of numbers (or ordinals
       | in o^o, if you prefer) rather than fractions. Those are simpler
       | to generate and compare than arbitrary-precision fractions. Or as
       | jitl's post suggests, using trees as indexes (I haven't yet
       | looked at jitl's linked articles). Those would presumably have
       | order type e0. It's not clear to me why you'd want that, but it
       | seems doable. In all these schemes you might occasionally want a
       | "stop the world garbage collection" where you reset all the
       | indices to be ordinary integers or maybe pairs of integers. I
       | guess that is also doable without having to pause all the
       | updates, at least if you use pairs.
        
         | zaphar wrote:
         | In a large distributed system stop the world garbage collection
         | is probably not going to work very well.
        
       | fatneckbeardz wrote:
       | reminds me a lot of Ford Circle / Farey Diagram / Stern Brocot
       | tree
       | 
       | Basically a tree of fractions where you take two rational points
       | on a number line, a/b and c/d, then the next point in the tree is
       | (a+b) / (c+d). Turns out that every single point you create this
       | way has a unique position and never duplicate each other, and it
       | forms a tree like structure.
       | 
       | https://en.wikipedia.org/wiki/Ford_circle
       | 
       | not sure if this would be useful, but basically it could be a
       | fractional index that has a built in tree structure, since it
       | basically means any fraction is a leaf on a Stern-Brocot tree.
        
       | brilee wrote:
       | Doesn't this end up being effectively a binary heap, with a
       | maximum tree depth of 23 (floating point mantissa precision)? I
       | imagine there must be a rebalancing operation required every so
       | often, possibly more frequently for pathological insertion
       | orders.
        
         | luto wrote:
         | > Fractional positions should be represented using arbitrary-
         | precision decimals so that they don't run out of precision.
         | Floating-point numbers are insufficient.
        
       | newhouseb wrote:
       | Is this the same as LSeq [1] except rather than using bytes one
       | is basically using digits in a floating point representation
       | (given this is JS where most things are floats)?
       | 
       | [1] https://bartoszsypytkowski.com/operation-based-crdts-
       | arrays-...
        
       | Shohrux wrote:
        
       | est wrote:
       | Reminds me of TokuDB engine for MySQL. Good times.
        
       | lelandfe wrote:
       | For those unaware: Evan is the cofounder, former CTO, and
       | technical wizard behind Figma; creator of esbuild, etc.
        
       ___________________________________________________________________
       (page generated 2022-11-28 05:00 UTC)