[HN Gopher] FoundationDB: A Distributed, Unbundled, Transactiona...
___________________________________________________________________
FoundationDB: A Distributed, Unbundled, Transactional Key Value Store [pdf]

Author : wwilson
Score  : 130 points
Date   : 2021-06-07 16:37 UTC (6 hours ago)

(HTM) web link (www.foundationdb.org)

| jwr wrote:
| I just implemented a database with changefeeds using FoundationDB
| (in Clojure), to eventually replace RethinkDB in my system. Very
| impressed so far.

| jbverschoor wrote:
| It's unfortunate that they went silent for years after the Apple
| acquisition. That period was key for database adoption. I have
| the feeling everybody kind of settled for pgsql.

| threeseed wrote:
| > I have the feeling everybody kind of settled for pgsql.
|
| That's probably because of spending time in this echo chamber.
|
| In reality everyone has likely been staying with the same
| databases they know and love but just moved to the cloud. It's
| why AWS, for example, now offers such a wide variety of databases,
| e.g. MySQL, PostgreSQL, SQL Server, Oracle, MongoDB, Cassandra,
| Redis.

| eloff wrote:
| Those are two completely non-overlapping use cases. If you can
| use pgsql for your problem, you have no business trying to use
| a distributed key value store instead. That would be at least
| as dumb as driving screws with a hammer.

| cwp wrote:
| Yeah, but there are quite a few efforts out there to extend
| PG into a distributed DB of one flavor or another. Some
| examples are YugabyteDB, CockroachDB, Aurora and Citus. It's
| a reasonable approach, but it's also reasonable to come at it
| from the other direction - build a SQL engine on top of a
| solid distributed key-value store. Counterfactuals are always
| dicey, but FDB vanishing behind the Apple wall of silence
| sure didn't help.

| eyelovewe wrote:
| CouchDB 4 is built upon FoundationDB, FWIW.

| jbverschoor wrote:
| Didn't know, very happy to hear.

| rubyn00bie wrote:
| Here's one of my favorite articles on FoundationDB, where it
| (FDB) passes Jepsen first try:
| https://web.archive.org/web/20150312112556/http://blog.found...
|
| > I ran FoundationDB Key-Value Store through every nemesis in
| Jepsen - including those that found failures in other databases -
| and FoundationDB passed all of them with flying colors.
|
| FoundationDB is one of the coolest pieces of technology I've used
| in the past decade. The tuple keyspace is incredibly useful, and so
| are the multi-key transactions. I've physically killed the power
| on an FDB node and on an FDB cluster multiple times (heh, home
| servers)... and _every_ time the cluster or node just comes back.

| gregwebs wrote:
| That's great that you are doing your own resiliency testing.
|
| Having someone other than those officially on the Jepsen
| project run the Jepsen test is a good start. However, many
| databases have claimed to run the Jepsen tests themselves and
| pass, but when there is an actual paid engagement for a
| distributed database there are always issues that are found.
| That's generally true even for unpaid official runs as well,
| although Zookeeper did pass existing tests. Every database is
| different, and the paid engagement will design specific tests
| to break the database in question.

| kendallgclark wrote:
| This was the FDB team's stock demo in the early days. It's a
| killer move.

| jFriedensreich wrote:
| I am pretty sure that the new Cloudant transaction/storage engine
| is also based on FoundationDB, which powers a lot of things
| behind the scenes at IBM. And CouchDB 4 with the FoundationDB
| storage engine is hopefully not too far out either. Let's see how
| long this whole transition takes, but I am still hopeful that the
| mindshare and motivation of the Apple, Snowflake, IBM and Apache
| communities will lead to something great.

| jorangreef wrote:
| Markus Pilman from Snowflake did an awesome talk on
| FoundationDB's testing at CMU's Quarantine Tech Talks (2020), How
| I Learned to Stop Worrying and Trust the Database:
|
| https://www.youtube.com/watch?v=OJb8A6h9jQQ

| sgk284 wrote:
| Here's another excellent talk at Strange Loop on FoundationDB's
| simulation testing by Will Wilson in 2014:
| https://www.youtube.com/watch?v=4fFDFbi3toc

| jtdev wrote:
| I'd love to see a good primer on data models and scenarios that
| are well suited to FDB.

| selljamhere wrote:
| Their docs might be a good place to start.
| https://apple.github.io/foundationdb/developer-guide.html#da...

| sigstoat wrote:
| this is limited by your creativity and willingness to make
| tradeoffs.
|
| the only really general statement i can think of is that the
| "larger"/"longer" your transactions are, the harder a time
| you'll have getting them to cooperate with FDB. "small"/"fast"
| transactions will be easier to fit into its model.
|
| (to likely replies: this isn't an absolute, see all the quotes.
| yes things like redwood will alleviate some of this, but not
| all.)

| vvern wrote:
| IIRC fdb uses fully optimistic concurrency control. It doesn't
| do any locking. If you have workloads which are highly
| contended, you'll need to do something in the layer above to
| coordinate. Otherwise, performance will be unbearable.
|
| This may be outdated, please let me know if the story has
| evolved here.

| georgelyon wrote:
| FDB is an awesome and unique piece of software (I attribute quite
| a bit of Snowflake's success to FDB). I've also had the pleasure
| of meeting some folks from the original team and they are true
| engineers. Does anyone know if/when Redwood (the new storage
| engine) has landed / will land?

| victor106 wrote:
| > I attribute quite a bit of Snowflake's success to FDB
|
| How so?

| foobiekr wrote:
| Snowflake is the biggest deployment of fdb in the world after
| iCloud.
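
Editor's sketch, following up on vvern's comment above about optimistic concurrency control: the toy model below shows why lock-free, validate-at-commit transactions behave badly under contention unless something retries or coordinates them. It is not FoundationDB's API or implementation. `ToyStore`, `ToyTransaction`, `ConflictError` and `run_with_retry` are hypothetical names invented for this illustration, and the single-process, per-key version check is a simplifying assumption.

```python
class ConflictError(Exception):
    """Raised at commit when a key this transaction read was rewritten."""


class ToyStore:
    """Hypothetical in-memory store; not the FoundationDB API."""

    def __init__(self):
        self.data = {}            # key -> current value
        self.versions = {}        # key -> commit version that last wrote it
        self.commit_version = 0   # increases by one per successful commit

    def begin(self):
        return ToyTransaction(self)


class ToyTransaction:
    def __init__(self, store):
        self.store = store
        self.read_version = store.commit_version  # version seen at start
        self.reads = set()        # keys read, validated at commit
        self.writes = {}          # buffered writes, applied only on commit

    def get(self, key, default=None):
        self.reads.add(key)
        if key in self.writes:    # read-your-own-writes
            return self.writes[key]
        return self.store.data.get(key, default)

    def set(self, key, value):
        self.writes[key] = value  # no lock is taken here

    def commit(self):
        # Optimistic validation: fail if any key we read was written by a
        # transaction that committed after we started.
        for key in self.reads:
            if self.store.versions.get(key, 0) > self.read_version:
                raise ConflictError(key)
        self.store.commit_version += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.versions[key] = self.store.commit_version


def run_with_retry(store, body, max_attempts=10):
    """Re-run body in a fresh transaction until it commits.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        txn = store.begin()
        body(txn)
        try:
            txn.commit()
            return attempt
        except ConflictError:
            continue              # lost the race; try again from scratch
    raise RuntimeError("gave up: too much contention")


def increment(txn):
    txn.set("counter", txn.get("counter", 0) + 1)


# Two transactions start from the same version and touch the same key.
store = ToyStore()
t1, t2 = store.begin(), store.begin()
increment(t1)
increment(t2)
t1.commit()                       # first committer wins
try:
    t2.commit()
    conflicted = False
except ConflictError:
    conflicted = True             # second committer must redo its work

attempts = run_with_retry(store, increment)
```

In this sketch every conflicting transaction throws away its work and starts over, so a hot key turns most of the cluster's effort into retries. That is the cost vvern is pointing at when suggesting that highly contended workloads coordinate in a layer above.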

| kendallgclark wrote:
| The founders are now building a distributed systems simulation
| product called Antithesis. My data fabric startup, Stardog, is a
| happy Antithesis early adopter customer. It's helping us
| reproduce and fix non-deterministic bugs deterministically.
| Good stuff.

| twoodfin wrote:
| Did they ever implement a SQL layer? They seemed like one of the
| only NoSQL products with the architecture to make it plausible to
| do so.

| polskibus wrote:
| What is the backup / restore story in FoundationDB? How does it
| compare to postgresql?

| ex3ndr wrote:
| Much, much better. Single-line backup/restore, and a Disaster
| Recovery mode that syncs a second DC and is able to switch over on
| the fly with barely any configuration (except one file).

| e12e wrote:
| This seems like a good place to ask - are there any new and
| exciting FOSS "applications" worth checking out? I recall from the
| initial publication of the source that there were references to a
| great SQL layer. I don't know if a FOSS work-alike ever
| materialized? Other things I'd hoped for were a network
| filesystem/blob layer, like maybe s3/nfs/webdavfs compatible.
| What are people building on top of FoundationDB today?
|
| Ed: I suppose various document/db applications - like IMAP - might
| be a good fit too?

| jFriedensreich wrote:
| large unstructured blobs and large files are among the things
| not well suited to FoundationDB, and CouchDB 4 actually reduced
| the supported blob size in the transition to FoundationDB. it
| looks like object/blob storage systems are at the moment
| separating further from key/value and document storage rather
| than growing together. but this is a good thing because the
| tradeoffs are very different and it allows each system to focus
| on what it does best. blob stores will hopefully move even more
| to content addressing and Merkle DAGs similar to git and ipfs.

| agency wrote:
| I'm curious about this as well. Is anyone working on building
| text search on top of FDB?
| It's kind of astounding to me that
| last time I checked Elasticsearch was still essentially the
| only game in town.

| jFriedensreich wrote:
| it's pretty hard to catch up with lucene, there is just so
| much work, features and brainpower in there at this point. as
| many features of FoundationDB, such as the transaction
| guarantees and reliability, are not super important for
| fulltext search, i cannot imagine any company, even apple or
| ibm, being able to justify that gigantic investment. instead
| i'm sure nearly any solution will continue to use lucene under
| the hood for the foreseeable future.

| sigstoat wrote:
| peruse the fdb forum. they produce document and record layers
| now. there are community layers of varying quality for a
| network block device, a filesystem, and a few other things.

| AtlasBarfed wrote:
| They got acquihired by Apple, didn't they? Was FDB ever OSS'd?
|
| Is it CP or AP? Comments seem to imply AP.

| ssgao wrote:
| FoundationDB is Apache 2.0
| https://github.com/apple/foundationdb/blob/master/LICENSE
|
| It is CP per
| https://apple.github.io/foundationdb/cap-theorem.html

| kendallgclark wrote:
| It wasn't an acquihire. Apple paid a lot of $$ for FDB.

| [deleted]

| ryanworl wrote:
| Two quotes from the paper that I think will motivate people to
| read it:
|
| "Rigorous correctness testing via simulation makes FDB extremely
| reliable. In the past several years, CloudKit [59] has deployed
| FDB for more than 0.5M disk years without a single data
| corruption event. Additionally, we constantly perform data
| consistency checks by comparing replicas of data records and
| making sure they are the same. To this date, no inconsistent data
| replicas have ever been found in our production clusters."
|
| "For example, early versions of FDB depended on Apache Zookeeper
| for coordination, which was deleted after real-world fault
| injection found two independent bugs in Zookeeper (circa 2010)
| and was replaced by a de novo Paxos implementation written in
| Flow. No production bugs have ever been reported since."

| jeffbee wrote:
| Ehhhh, doesn't align with my experience. I think FDB is
| actually really poorly tested. When I was evaluating it as a
| replacement for the metadata key-value store at a major, public
| web services company, we found that injecting faults into
| virtual NVMe devices on individual replicas would cause corrupt
| results to be returned to clients. We also found that it would
| just crash-loop on Linux systems with huge pages, because although
| someone from the project had written a huge-page-aware C++
| allocator "for performance", evidently nobody had ever actually
| tried to use it, including the author.
|
| It's also really, really weird that their non-scalable
| architecture hits a brick wall at 25 machines. Ignoring the
| correctness flaws, it only works if you can either design
| around that limit by sharding and never offer cross-shard
| transactions, or assure yourself that your use case
| will never outgrow half a rack of equipment.

| fnordpiglet wrote:
| Can you fix a point in time? Software evolves, and I think a
| point I saw is that it wasn't well tested, then they changed it
| once production workloads told them it needed to change.

| bpicolo wrote:
| What were the strong contenders?

| rbranson wrote:
| Were there other distributed databases that did pass the
| fault injection testing?

| jeffbee wrote:
| There weren't any, which is why that particular shop
| elected to roll their own distributed system on top of
| rocks.
|
| In general I think people who think they want to do
| FoundationDB owe themselves a serious contemplation of the
| cost/benefit of using Cloud Spanner instead.
| Obviously you
| cannot do your own fault injection testing of Spanner, but
| it does have end-to-end checksums.

| sigstoat wrote:
| > There weren't any, which is why that particular shop
| elected to roll their own distributed system on top of
| rocks.
|
| that's nuts. rocks could've been added as a storage
| engine to fdb far more easily.

| ryanworl wrote:
| This is currently in progress.
|
| https://github.com/apple/foundationdb/blob/e7d7b39f12afa8ea2...

| jeffbee wrote:
| For the record, I said the same thing. But it's a
| management problem because on the one hand you have a
| known open project with demonstrable flaws, and on the
| other you have your own in-house developers and you will
| tend to discount the bugs they haven't written yet.
|
| But, also for the same record, thinking you can implement
| a reliable, globally-replicated key-value store on top of
| FoundationDB that is cheaper and better than Cloud
| Spanner may be evidence of the same cognitive bias.

| sigstoat wrote:
| > But, also for the same record, thinking you can
| implement a reliable, globally-replicated key-value store
| on top of FoundationDB that is cheaper and better than
| Cloud Spanner may be evidence of the same cognitive bias.
|
| man, good thing nobody made any claim like that.

| sandinmyjoints wrote:
| What is the Flow referred to here?

| oconnor663 wrote:
| It's an async/await framework for C++. I'm not sure what the
| best source on this is, but here's a discussion:
| https://forums.foundationdb.org/t/why-was-flow-developed/171...
|
| My understanding is that FDB relies heavily on deterministic
| simulations for testing, and that their async/await model is
| a big part of how they make sure they cover different
| possible interleavings in a deterministic way.

| jorangreef wrote:
| Thanks for the quotes, I've been wanting to read this paper for
| some time.
| Great to see they went through the consensus
| literature and made a decision to go with Active Disk Paxos,
| instead of stopping short and not fully understanding the
| consensus they're building on. The consensus and replication
| protocol is such a huge part of building a distributed
| database.

| fizwhiz wrote:
| > de novo Paxos implementation written in Flow
|
| That's... brave. Flow is a DSL built on top of C++?

| alistairw wrote:
| Yeah, it's their own language on top of C++ to help them with
| testing distributed systems with deterministic simulation.
|
| Their talk from a while ago about it was something that
| really blew me away at the time [0]
|
| [0] https://www.youtube.com/watch?v=4fFDFbi3toc

| monstrado wrote:
| Have nothing but praise for FoundationDB. It has been by far the
| most rock solid distributed database I have ever had the pleasure
| of using. I used to manage HBase clusters, and the fact that I
| have never once had to worry about manually splitting "regions"
| is such a boon for administration... let alone JVM GC tuning.
|
| We run several FDB clusters using 3-DC replication and have never
| once lost data. I remember when we wanted to replace all of the
| FDB hardware (one cluster) in AWS, and so we just doubled the
| cluster size, waited for data shuffling to calm down, and just
| started axing the original hardware. We did this all while
| performing over 100K production TPS.
|
| One thing that makes the above seamless for all existing
| connections is that clients automatically update their "cluster
| file" in the event that new coordinators join or are reassigned.
| That alone is amazing... as you don't have to track down every
| single client and change / re-roll with new connection
| parameters.
|
| Anyway, I talk this database up every chance I get. Keep up the
| awesome work.
|
| - A very happy user.
___________________________________________________________________
(page generated 2021-06-07 23:00 UTC)