[HN Gopher] Building a BFT JSON CRDT
       ___________________________________________________________________
        
       Building a BFT JSON CRDT
        
       Author : raykyri
       Score  : 127 points
       Date   : 2022-11-21 16:41 UTC (6 hours ago)
        
 (HTM) web link (jzhao.xyz)
 (TXT) w3m dump (jzhao.xyz)
        
       | adamdusty wrote:
       | For people like me:
       | 
       | BFT - Byzantine Fault Tolerant [0]
       | 
       | CRDT - Conflict-free Replicated Data Type [1]
       | 
       | [0] https://en.wikipedia.org/wiki/Byzantine_fault
       | 
       | [1] https://en.wikipedia.org/wiki/Conflict-
       | free_replicated_data_...
        
         | [deleted]
        
         | karmakaze wrote:
         | RGA/CT - Replicated Growable Array/Causal Tree
         | 
         | I don't like unexplained acronyms/initialisms.
        
         | hansvm wrote:
         | JSON - JavaScript Object Notation [0]
         | 
         | [0] https://en.m.wikipedia.org/wiki/JSON
        
       | MWil wrote:
       | when you see it...them...
        
         | athorax wrote:
         | we are legion
        
       | Karrot_Kream wrote:
       | Hash graph reconciliation is generally the harder part of this
       | problem as noted in future work, but this is a very exciting
       | direction.
        
         | jchanimal wrote:
         | Check out this implementation of hash-stable trees. The same
         | dataset generates the same Merkle hash, regardless of insertion
         | orders. This makes verifiable sync computationally efficient.
         | https://github.com/mikeal/prolly-trees
        
       | simonw wrote:
       | > I write this blog post mostly as a note to my past self,
       | distilling a lot of what I've learned since into a blog post I
       | wish I had read before going in
       | 
       | The best kind of blog post!
        
       | lukeigel wrote:
       | Let's go Jackie!!!
        
       | thruflo wrote:
       | From the article, this appears to be an implementation of Making
       | CRDTs Byzantine Fault Tolerant [0] in Rust [1].
       | 
       | [0]: https://martin.kleppmann.com/papers/bft-crdt-papoc22.pdf
       | 
       | [1]: https://github.com/jackyzha0/bft-json-crdt
        
       | recuter wrote:
       | To somebody who knows Erlang, isn't this reinventing the wheel
       | basically?
        
         | NavinF wrote:
         | AFAIK Erlang isn't byzantine fault tolerant, doesn't use CRDTs,
         | and doesn't even have a JSON library in its stdlib. So just
         | from the title you can tell it has nothing to do with the
         | article.
        
         | jansommer wrote:
         | Could you elaborate? How do you get CRDT's for free in Erlang?
        
         | thruflo wrote:
         | Erlang is a great language and runtime to build distributed
         | systems in but it doesn't provide primitives to resolve
         | concurrent overlapping updates.
         | 
         | Hence projects like https://www.antidotedb.eu (CRDT database in
         | Erlang)
        
       | PointyFluff wrote:
       | A little early in the morning for so much alphabet soup.
        
         | MWil wrote:
         | or to give up on an opportunity to learn/enjoy something
        
       | jitl wrote:
       | I'm quite surprised by the [benchmarks versus Automerge JS &
       | Rust](https://github.com/jackyzha0/bft-json-crdt#benchmarks) when
       | it comes to memory:
       | 
       | > Ours (Basic) 27.6MB
       | 
       | > Ours (BFT) 59.5MB
       | 
       | > Automerge (Rust) 232.5MB
       | 
       | I would expect adding the public key tracking to use more memory;
       | I wonder how Automerge is spending so much more memory. Possibly
       | on a bunch of internal caches or memoization that give the order-
       | of-magnitude improvement in speed?
       | 
       | > Ops: 100k
       | 
       | > Ours (Basic) 9.321s
       | 
       | > Ours (BFT) 38.842s
       | 
       | > Automerge (Rust) 0.597s
        
       | samwillis wrote:
       | It's the Byzantine Fault Tolerant part of this that is
       | particularly innovative and based on Kleppmanns most recent work.
       | I believe it solves the issue of either malicious actors in the
       | networks modifying others transactions, spoofing them, or the
       | messages being modified by third parties ("outside" the network)
       | who have MITM the connection. These are really great problems to
       | solve.
       | 
       | However, when I was experimenting with CRDTs a while back, it
       | seemed to me the other big issue is where multiple transactions
       | from different users combine to create an invalid state. Most
       | CRDT toolkits are aiming for a combination of a JSON like type
       | structure, with the addition on specific structures for rich
       | text. The JSON structures for general purpose data, and those for
       | rich text as that is the most common current use case. These sort
       | of general purpose CRDTs don't have a way to handle, say, the
       | maximum length of an array/string, or minimum and maximum values
       | for a number. They leave this to the application using the
       | toolkit.
       | 
       | For the Yjs + ProseMirror system, effectively the CRDTs are
       | resolved outside of ProseMirror. Thats useful as they can be
       | combined/resolved on the server without a ProseMirror instance
       | running. However there is a strong possibility that the resulting
       | structure is no longer valid for your ProseMirror Schema, this is
       | currently handled by ProseMirror throwing away the invalid parts
       | of the document when it loads.
       | 
       | What I think is needed is a Schema system that is a layer either
       | on top of these toolkits, or as part of them, that provides rules
       | and conflict resolution. So there is a way to specify the maximum
       | length of an array/string, or what the maximum value of a number
       | is. Effectively generic CRDTs that have an understanding of
       | developer supplied rules and bounds.
       | 
       | The "maximum length" is an interesting one, as depending on the
       | order of transactions you could end up with a different result.
        
         | sambroner wrote:
         | When I was working on this problem with Fluid Framework, we did
         | a few interesting experiments with "owned objects" and
         | centralized ACL objects. I believe the team primarily
         | implemented centralized ACL because that implementation works
         | for many enterprise use cases.
         | 
         | With a centralized schema provider, you run a connected node on
         | a trusted server and reject changes that are out of schema or
         | should not be accessed by a user.
         | 
         | An owned object is an object where a user (or user group that
         | votes via quorum) that owns the object can veto changes to the
         | object. The changes are temporarily applied until accepted by
         | the owners. I haven't dug deep enough into this BFT
         | implementation to know how our model would map to this model.
        
           | sroussey wrote:
           | How does that compare to Google's Zanzibar?
        
         | sbazerque wrote:
         | I think this is the schema system you're asking for:
         | https://www.hyperhyperspace.org/
         | 
         | (specifically, the data representation part)
        
           | jitl wrote:
           | This is a cool idea, but I didn't find any examples of max
           | length constraints or other normalization rules in the source
           | code I reviewed. Maybe there's something in there.
           | 
           | Here's some source code for an early, work-in-progress Wiki
           | CRDT: https://github.com/hyperhyperspace/wiki-
           | collab/blob/master/s...
           | 
           | Page in the Wiki. Note that data types have a validate method
           | that returns true or false; maybe if false, they're just
           | dropped from the UI? Not sure how the method is used.
           | https://github.com/hyperhyperspace/wiki-
           | collab/blob/master/s...
           | 
           | I haven't found the underlying text or rich text CRDT
           | implementation yet.
        
             | jackyzhao wrote:
             | Hey! Author of the post responding here :) Unfortunately
             | max/min-length is a global invariant and not one that can
             | be reinforced by CRDTs without coordination
             | 
             | bloom-lang.net is a really cool project working on trying
             | to figure out what types of program state actually require
             | coordination at compile time
        
               | jitl wrote:
               | Right - I think what Sam is wondering about is if there's
               | a better way to maintain some of those invariants when
               | decoding the CRDT data or at least preserve user intent
               | with more fidelity by taking the invariants into account.
               | For example, if a ProseMirror schema says "figure can
               | have one caption child", then a CRDT library can assist
               | by making Child a LWW register instead of an array; or
               | picking the last item in the array instead of the first
               | when limiting length.
        
         | tekkk wrote:
         | Interesting, yeah access-control is kinda open problem with
         | Yjs. Regarding ProseMirror and rich-text documents, you can
         | mess up documents in other ways as well. Eg deploy a faulty
         | command with a transaction that inserts nodes with invalid
         | children (can be prevented though by using createChecked). Or
         | just changing your schema in an incompatible way with the
         | previous version. So you kinda have to deal with possible
         | malformed documents either way.
         | 
         | Havent had documents corrupted by Yjs allowing changes that are
         | not parseable by schema though - has this happened to you?
         | 
         | And about the schema layer on top of Yjs, you possibly could
         | inspect every update and apply some validation rules. Arent all
         | operations just inserts, updates or deletes of nodes? You can
         | at least rollback to previous version as you flush the updates
         | to the doc in the db. Not ideal though.
        
           | samwillis wrote:
           | No, I haven't seen a corrupt "unparceable" document. It's
           | more issues around documents that don't comply with the
           | schema.
           | 
           | Say for example you have a <figure> node that can contain a
           | single optional <caption>. If two users concurrently add a
           | caption, then merge their changes, the Yjs document will
           | contain both. It has no concept of what the valid structure
           | is. When this is loaded into the ProseMirror the second
           | caption will be dropped.
        
             | tekkk wrote:
             | Hmm hadn't thought about that. Yeah Yjs should use more
             | granular steps instead of just replacing the whole doc each
             | time remote update is received.
        
       | barbazoo wrote:
       | That was a bit creepy while it lasted. (the mouse pointer thing
       | that is)
        
       | EGreg wrote:
       | Yessss finally someone is doing it.
        
       ___________________________________________________________________
       (page generated 2022-11-21 23:00 UTC)