[HN Gopher] Building a BFT JSON CRDT ___________________________________________________________________ Building a BFT JSON CRDT Author : raykyri Score : 127 points Date : 2022-11-21 16:41 UTC (6 hours ago) (HTM) web link (jzhao.xyz) (TXT) w3m dump (jzhao.xyz) | adamdusty wrote: | For people like me: | | BFT - Byzantine Fault Tolerant [0] | | CRDT - Conflict-free Replicated Data Type [1] | | [0] https://en.wikipedia.org/wiki/Byzantine_fault | | [1] https://en.wikipedia.org/wiki/Conflict- | free_replicated_data_... | [deleted] | karmakaze wrote: | RGA/CT - Replicated Growable Array/Causal Tree | | I don't like unexplained acronyms/initialisms. | hansvm wrote: | JSON - JavaScript Object Notation [0] | | [0] https://en.m.wikipedia.org/wiki/JSON | MWil wrote: | when you see it...them... | athorax wrote: | we are legion | Karrot_Kream wrote: | Hash graph reconciliation is generally the harder part of this | problem as noted in future work, but this is a very exciting | direction. | jchanimal wrote: | Check out this implementation of hash-stable trees. The same | dataset generates the same Merkle hash, regardless of insertion | orders. This makes verifiable sync computationally efficient. | https://github.com/mikeal/prolly-trees | simonw wrote: | > I write this blog post mostly as a note to my past self, | distilling a lot of what I've learned since into a blog post I | wish I had read before going in | | The best kind of blog post! | lukeigel wrote: | Let's go Jackie!!! | thruflo wrote: | From the article, this appears to be an implementation of Making | CRDTs Byzantine Fault Tolerant [0] in Rust [1]. | | [0]: https://martin.kleppmann.com/papers/bft-crdt-papoc22.pdf | | [1]: https://github.com/jackyzha0/bft-json-crdt | recuter wrote: | To somebody who knows Erlang, isn't this reinventing the wheel | basically? | NavinF wrote: | AFAIK Erlang isn't byzantine fault tolerant, doesn't use CRDTs, | and doesn't even have a JSON library in its stdlib. So just | from the title you can tell it has nothing to do with the | article. | jansommer wrote: | Could you elaborate? How do you get CRDT's for free in Erlang? | thruflo wrote: | Erlang is a great language and runtime to build distributed | systems in but it doesn't provide primitives to resolve | concurrent overlapping updates. | | Hence projects like https://www.antidotedb.eu (CRDT database in | Erlang) | PointyFluff wrote: | A little early in the morning for so much alphabet soup. | MWil wrote: | or to give up on an opportunity to learn/enjoy something | jitl wrote: | I'm quite surprised by the [benchmarks versus Automerge JS & | Rust](https://github.com/jackyzha0/bft-json-crdt#benchmarks) when | it comes to memory: | | > Ours (Basic) 27.6MB | | > Ours (BFT) 59.5MB | | > Automerge (Rust) 232.5MB | | I would expect adding the public key tracking to use more memory; | I wonder how Automerge is spending so much more memory. Possibly | on a bunch of internal caches or memoization that give the order- | of-magnitude improvement in speed? | | > Ops: 100k | | > Ours (Basic) 9.321s | | > Ours (BFT) 38.842s | | > Automerge (Rust) 0.597s | samwillis wrote: | It's the Byzantine Fault Tolerant part of this that is | particularly innovative and based on Kleppmanns most recent work. | I believe it solves the issue of either malicious actors in the | networks modifying others transactions, spoofing them, or the | messages being modified by third parties ("outside" the network) | who have MITM the connection. These are really great problems to | solve. | | However, when I was experimenting with CRDTs a while back, it | seemed to me the other big issue is where multiple transactions | from different users combine to create an invalid state. Most | CRDT toolkits are aiming for a combination of a JSON like type | structure, with the addition on specific structures for rich | text. The JSON structures for general purpose data, and those for | rich text as that is the most common current use case. These sort | of general purpose CRDTs don't have a way to handle, say, the | maximum length of an array/string, or minimum and maximum values | for a number. They leave this to the application using the | toolkit. | | For the Yjs + ProseMirror system, effectively the CRDTs are | resolved outside of ProseMirror. Thats useful as they can be | combined/resolved on the server without a ProseMirror instance | running. However there is a strong possibility that the resulting | structure is no longer valid for your ProseMirror Schema, this is | currently handled by ProseMirror throwing away the invalid parts | of the document when it loads. | | What I think is needed is a Schema system that is a layer either | on top of these toolkits, or as part of them, that provides rules | and conflict resolution. So there is a way to specify the maximum | length of an array/string, or what the maximum value of a number | is. Effectively generic CRDTs that have an understanding of | developer supplied rules and bounds. | | The "maximum length" is an interesting one, as depending on the | order of transactions you could end up with a different result. | sambroner wrote: | When I was working on this problem with Fluid Framework, we did | a few interesting experiments with "owned objects" and | centralized ACL objects. I believe the team primarily | implemented centralized ACL because that implementation works | for many enterprise use cases. | | With a centralized schema provider, you run a connected node on | a trusted server and reject changes that are out of schema or | should not be accessed by a user. | | An owned object is an object where a user (or user group that | votes via quorum) that owns the object can veto changes to the | object. The changes are temporarily applied until accepted by | the owners. I haven't dug deep enough into this BFT | implementation to know how our model would map to this model. | sroussey wrote: | How does that compare to Google's Zanzibar? | sbazerque wrote: | I think this is the schema system you're asking for: | https://www.hyperhyperspace.org/ | | (specifically, the data representation part) | jitl wrote: | This is a cool idea, but I didn't find any examples of max | length constraints or other normalization rules in the source | code I reviewed. Maybe there's something in there. | | Here's some source code for an early, work-in-progress Wiki | CRDT: https://github.com/hyperhyperspace/wiki- | collab/blob/master/s... | | Page in the Wiki. Note that data types have a validate method | that returns true or false; maybe if false, they're just | dropped from the UI? Not sure how the method is used. | https://github.com/hyperhyperspace/wiki- | collab/blob/master/s... | | I haven't found the underlying text or rich text CRDT | implementation yet. | jackyzhao wrote: | Hey! Author of the post responding here :) Unfortunately | max/min-length is a global invariant and not one that can | be reinforced by CRDTs without coordination | | bloom-lang.net is a really cool project working on trying | to figure out what types of program state actually require | coordination at compile time | jitl wrote: | Right - I think what Sam is wondering about is if there's | a better way to maintain some of those invariants when | decoding the CRDT data or at least preserve user intent | with more fidelity by taking the invariants into account. | For example, if a ProseMirror schema says "figure can | have one caption child", then a CRDT library can assist | by making Child a LWW register instead of an array; or | picking the last item in the array instead of the first | when limiting length. | tekkk wrote: | Interesting, yeah access-control is kinda open problem with | Yjs. Regarding ProseMirror and rich-text documents, you can | mess up documents in other ways as well. Eg deploy a faulty | command with a transaction that inserts nodes with invalid | children (can be prevented though by using createChecked). Or | just changing your schema in an incompatible way with the | previous version. So you kinda have to deal with possible | malformed documents either way. | | Havent had documents corrupted by Yjs allowing changes that are | not parseable by schema though - has this happened to you? | | And about the schema layer on top of Yjs, you possibly could | inspect every update and apply some validation rules. Arent all | operations just inserts, updates or deletes of nodes? You can | at least rollback to previous version as you flush the updates | to the doc in the db. Not ideal though. | samwillis wrote: | No, I haven't seen a corrupt "unparceable" document. It's | more issues around documents that don't comply with the | schema. | | Say for example you have a <figure> node that can contain a | single optional <caption>. If two users concurrently add a | caption, then merge their changes, the Yjs document will | contain both. It has no concept of what the valid structure | is. When this is loaded into the ProseMirror the second | caption will be dropped. | tekkk wrote: | Hmm hadn't thought about that. Yeah Yjs should use more | granular steps instead of just replacing the whole doc each | time remote update is received. | barbazoo wrote: | That was a bit creepy while it lasted. (the mouse pointer thing | that is) | EGreg wrote: | Yessss finally someone is doing it. ___________________________________________________________________ (page generated 2022-11-21 23:00 UTC)