[HN Gopher] FoundationDB: A Distributed, Unbundled, Transactiona...
       ___________________________________________________________________
        
       FoundationDB: A Distributed, Unbundled, Transactional Key Value
       Store [pdf]
        
       Author : wwilson
       Score  : 130 points
       Date   : 2021-06-07 16:37 UTC (6 hours ago)
        
 (HTM) web link (www.foundationdb.org)
 (TXT) w3m dump (www.foundationdb.org)
        
       | jwr wrote:
       | I just implemented a database with changefeeds using FoundationDB
       | (in Clojure), to eventually replace RethinkDB in my system. Very
       | impressed so far.
        
       | jbverschoor wrote:
       | It's unfortunate that they went silent for years after the Apple
       | acquisition. That period was key for database adoption. I have
       | the feeling everybody kind of settled for pgsql.
        
         | threeseed wrote:
         | > I have the feeling everybody kind of settled for pgsql.
         | 
         | That's probably because of spending time in this echo chamber.
         | 
         | In reality everyone has likely been staying with the same
         | databases they know and love but just moved to the cloud. It's
         | why now AWS for example offers such a wide variety of databases
         | e.g. MySQL, PostgreSQL, SQL Server, Oracle, MongoDB, Cassandra,
         | Redis.
        
         | eloff wrote:
         | Those are two completely non-overlapping use cases. If you can
         | use pgsql for your problem, you have no business trying to use
         | a distributed key value store instead. That would be at least
         | as dumb as driving screws with a hammer.
        
           | cwp wrote:
           | Yeah, but there are quite a few efforts out there to extend
           | PG into a distributed DB of one flavor or another. Some
           | examples are YugabyteDB, CockroachDB, Aurora and Citus. It's
           | a reasonable approach, but it's also reasonable to come at it
           | from the other direction - build a SQL engine on top of a
           | solid distributed key-value store. Counterfactuals are always
           | dicey, but FDB vanishing behind the Apple wall of silence
           | sure didn't help.
        
       | eyelovewe wrote:
       | CouchDB 4 is built upon FoundationDB, FWIW
        
         | jbverschoor wrote:
         | Didn't know, very happy to hear
        
       | rubyn00bie wrote:
       | Here's one of my favorite articles on FoundationDB, where it
       | (FDB) passes Jepsen first try:
       | https://web.archive.org/web/20150312112556/http://blog.found...
       | 
       | > I ran FoundationDB Key-Value Store through every nemesis in
       | Jepsen - including those that found failures in other databases -
       | and FoundationDB passed all of them with flying colors.
       | 
       | FoundationDB is one of the coolest pieces of technology I've used
       | in the past decade. The tuple keyspace is incredibly useful, so
       | are the multi-key transactions. I've physically killed the power
       | on an FDB node and FDB cluster; multiple times (heh, home
       | servers)... and _every_ time the cluster or node just comes back.
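       | 
       | For illustration, a minimal sketch of why tuple-structured keys
       | are handy. It is plain Python, not the FDB bindings: TupleKV and
       | prefix_range are hypothetical names, and it leans on Python's
       | native tuple ordering rather than FDB's order-preserving byte
       | encoding. The point is only that keys sharing a prefix sort next
       | to each other, so one range read fetches a whole "record".

```python
import bisect


class TupleKV:
    """Toy ordered key-value store keyed by tuples (not FDB's byte encoding)."""

    def __init__(self):
        self._keys = []   # all keys, kept in sorted order
        self._vals = {}   # key -> value

    def set(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def prefix_range(self, prefix):
        """Return (key, value) pairs whose key starts with `prefix`, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break   # keys are sorted, so the matching run has ended
            out.append((key, self._vals[key]))
        return out


db = TupleKV()
db.set(("user", 42, "name"), "ada")
db.set(("user", 7, "name"), "bob")
db.set(("user", 42, "email"), "ada@example.com")
db.set(("order", 1), "widget")

# Everything stored under ("user", 42, ...) comes back from one range scan.
rows = db.prefix_range(("user", 42))
```

       | In this toy, elements at the same tuple position must be of
       | comparable types (all ints or all strings), otherwise Python
       | refuses to order them.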
        
         | gregwebs wrote:
         | That's great that you are doing your own resiliency testing.
         | 
         | Having someone other than those officially on the Jepsen
         | project run the Jepsen test is a good start. However, many
         | databases have claimed to run the Jepsen tests themselves and
         | pass, but when there is an actual paid engagement for a
         | distributed database there are always issues that are found.
         | That's generally true even for unpaid official runs as well
         | although Zookeeper did pass existing tests. Every database is
         | different and the paid engagement will write tests designed
         | specifically to break the database in question.
        
         | kendallgclark wrote:
         | This was the FDB team's stock demo in the early days. It's a
         | killer move.
        
       | jFriedensreich wrote:
       | I am pretty sure that the new cloudant transaction/storage engine
       | is also based on foundationDB, which powers a lot of things
       | behind the scenes at ibm. And couchdb 4 with foundationDB storage
       | engine is hopefully not too far out either. Let's see how long
       | this whole transition takes, but I am still hopeful that the
       | mindshare and motivation of apple, snowflake, ibm and apache
       | community will lead to something great.
        
       | jorangreef wrote:
       | Markus Pilman from Snowflake did an awesome talk on
       | FoundationDB's testing at CMU's Quarantine Tech Talks (2020), How
       | I Learned to Stop Worrying and Trust the Database:
       | 
       | https://www.youtube.com/watch?v=OJb8A6h9jQQ
        
         | sgk284 wrote:
         | Here's another excellent talk at Strange Loop on FoundationDB's
         | simulation testing by Will Wilson in 2014:
         | https://www.youtube.com/watch?v=4fFDFbi3toc
        
       | jtdev wrote:
       | I'd love to see a good primer on data models and scenarios that
       | are well suited to FDB.
        
         | selljamhere wrote:
         | Their docs might be a good place to start.
         | https://apple.github.io/foundationdb/developer-guide.html#da...
        
         | sigstoat wrote:
         | this is limited by your creativity and willingness to make
         | tradeoffs.
         | 
         | the only really general statement i can think of is that the
         | "larger"/"longer" your transactions are, the harder a time
         | you'll have getting it to cooperate with FDB. "small"/"fast"
         | transactions will be easier to fit into its model.
         | 
         | (to likely replies: this isn't an absolute, see all the quotes.
         | yes things like redwood will alleviate some of this, but not
         | all.)
        
           | vvern wrote:
           | IIRC FDB uses fully optimistic concurrency control. It doesn't
           | do any locking. If you have workloads which are highly
           | contended, you'll need to do something in the layer above to
           | coordinate. Otherwise, performance will be unbearable.
           | 
           | This may be outdated; please let me know if the story has
           | evolved here.
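           | 
           | For illustration, a minimal sketch of lock-free optimistic
           | concurrency in plain Python. ToyStore, Txn, Conflict and run
           | are hypothetical names, not FDB APIs, and the toy reads the
           | latest committed value where a real system would serve a
           | snapshot at the read version. It only shows the shape of the
           | idea: nothing blocks, conflicts are detected at commit time,
           | and the loser retries from scratch.

```python
class Conflict(Exception):
    """Raised when a key this transaction read was changed before it committed."""


class ToyStore:
    """In-memory stand-in for the database (hypothetical, not an FDB API)."""

    def __init__(self):
        self.data = {}        # key -> value
        self.last_write = {}  # key -> commit version of the latest write
        self.version = 0      # global commit version

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.read_version = store.version  # version observed at start
        self.reads = set()                 # keys read (the read conflict set)
        self.writes = {}                   # buffered writes, applied at commit

    def get(self, key):
        self.reads.add(key)
        if key in self.writes:
            return self.writes[key]
        return self.store.data.get(key)

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        # No locks were taken earlier; the read set is validated only now.
        for key in self.reads:
            if self.store.last_write.get(key, 0) > self.read_version:
                raise Conflict(key)
        self.store.version += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.last_write[key] = self.store.version


def run(store, body, max_attempts=10):
    """Retry `body` until it commits; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        txn = store.begin()
        body(txn)
        try:
            txn.commit()
            return attempt
        except Conflict:
            continue
    raise RuntimeError("too much contention")


# Two transactions race to increment the same key.
store = ToyStore()
t1, t2 = store.begin(), store.begin()
t1.set("counter", (t1.get("counter") or 0) + 1)
t2.set("counter", (t2.get("counter") or 0) + 1)
t1.commit()  # first committer wins
try:
    t2.commit()
    t2_conflicted = False
except Conflict:
    t2_conflicted = True  # t2's work is discarded and must be redone

attempts = run(store, lambda txn: txn.set("counter", txn.get("counter") + 1))
```

           | Under heavy contention on one key, most attempts in a loop
           | like run() end in Conflict, which is why a hot key usually
           | needs coordination or a different data layout in the layer
           | above.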
        
       | georgelyon wrote:
       | FDB is an awesome and unique piece of software (I attribute quite
       | a bit of Snowflake's success to FDB). I've also had the pleasure
       | of meeting some folks from the original team and they are true
       | engineers. Does anyone know if/when Redwood (the new storage
       | engine) has landed / will land?
        
         | victor106 wrote:
         | > I attribute quite a bit of Snowflake's success to FDB
         | 
         | How so?
        
           | foobiekr wrote:
           | Snowflake is the biggest deployment of fdb in the world after
           | iCloud.
        
         | kendallgclark wrote:
         | Founders are building a distributed systems simulation product
         | now called Antithesis. My data fabric startup, Stardog, is a
         | happy Antithesis early adopter customer. It's helping us
         | reproduce and fix non-deterministic bugs deterministically.
         | Good stuff.
        
       | twoodfin wrote:
       | Did they ever implement a SQL layer? They seemed like one of the
       | only NoSQL products with the architecture to make it plausible to
       | do so.
        
       | polskibus wrote:
       | What is the backup / restore story in FoundationDB? How does it
       | compare to postgresql?
        
         | ex3ndr wrote:
         | Much, much better. Single-line backup/restore, plus a Disaster
         | Recovery mode that syncs a second DC and can switch over on the
         | fly with barely any configuration (except one file).
        
       | e12e wrote:
       | This seems like a good place to ask - are there any new and
       | exciting FOSS "applications" worth checking out? I recall from
       | the initial publication of the source that there were references
       | to a great SQL layer? I don't know if a FOSS work-alike ever
       | materialized? Other things I'd hoped for were a network
       | filesystem/blob layer, like maybe s3/nfs/webdavfs compatible?
       | What are people building on top of foundationdb today?
       | 
       | Ed: i suppose various document/db applications - like IMAP might
       | be a good fit too?
        
         | jFriedensreich wrote:
         | large unstructured blobs and large files are among the things
         | not well suited to foundationdb and couchdb 4 actually reduced
         | supported blob size in the transition to foundationdb. it looks
         | like object/blob storage systems are at the moment rather
         | separating more from key/value and document storage than
         | growing together. but this is a good thing because the
         | tradeoffs are very different and it allows each system to focus
         | on what it does best. blob stores will hopefully move even more
         | to content addressing and merkle dag similar to git and ipfs.
        
         | agency wrote:
         | I'm curious about this as well. Is anyone working on building
         | text search on top of FDB? It's kind of astounding to me that
         | last time I checked Elasticsearch was still essentially the
         | only game in town.
        
           | jFriedensreich wrote:
           | its pretty hard to catch up with lucene, there is just so
           | much work, features and brainpower in there at this point. as
           | many features of foundationdb such as the transaction
           | guarantees and reliability are not super important for
           | fulltext search i cannot imagine any company even apple or
           | ibm being able to justify that gigantic investment, instead
           | i'm sure nearly any solution will continue to use lucene under
           | the hood for the foreseeable future.
        
         | sigstoat wrote:
         | peruse the fdb forum. they produce document and record layers
         | now. there are community layers of varying quality for a
         | network block device, a filesystem, and a few other things.
        
       | AtlasBarfed wrote:
       | They got acquihired by Apple, didn't they? Was FDB ever OSS'd?
       | 
       | Is it CP or AP? Comments seem to imply AP
        
         | ssgao wrote:
         | FoundationDB is Apache 2.0
         | https://github.com/apple/foundationdb/blob/master/LICENSE
         | 
         | It is CP per
         | https://apple.github.io/foundationdb/cap-theorem.html
        
         | kendallgclark wrote:
         | It wasn't an acquihire. Apple paid a lot of $$ for FDB.
        
       | ryanworl wrote:
       | Two quotes from the paper that I think will motivate people to
       | read it:
       | 
       | "Rigorous correctness testing via simulation makes FDB extremely
       | reliable. In the past several years, CloudKit [59] has deployed
       | FDB for more than 0.5M disk years without a single data
       | corruption event. Additionally, we constantly perform data
       | consistency checks by comparing replicas of data records and
       | making sure they are the same. To this date, no inconsistent data
       | replicas have ever been found in our production clusters."
       | 
       | "For example, early versions of FDB depended on Apache Zookeeper
       | for coordination, which was deleted after real-world fault
       | injection found two independent bugs in Zookeeper (circa 2010)
       | and was replaced by a de novo Paxos implementation written in
       | Flow. No production bugs have ever been reported since."
        
         | jeffbee wrote:
         | Ehhhh, doesn't align with my experience. I think FDB is
         | actually really poorly tested. When I was evaluating it for
         | replacement of the metadata key-value store at a major, public
         | web services company we found that injecting faults into
         | virtual NVMe devices on individual replicas would cause corrupt
         | results returned to clients. We also found that it would just
         | crash-loop on Linux systems with huge pages, because although
         | someone from the project had written a huge-page-aware C++
         | allocator "for performance", evidently nobody had ever actually
         | tried to use it, including the author.
         | 
         | It's also really, really weird that their non-scalable
         | architecture hits a brick wall at 25 machines. Ignoring the
         | correctness flaws, it only works if you can either design
         | around that limit by sharding and never offering cross-shard
         | transactions, or assure yourself that your use case will never
         | outgrow half a rack of equipment.
        
           | fnordpiglet wrote:
           | Can you pin that to a point in time? Software evolves, and one
           | point I saw made is that it wasn't well tested early on, and
           | that changed once production workloads showed it needed to.
        
           | bpicolo wrote:
           | What were the strong contenders?
        
           | rbranson wrote:
           | Were there other distributed databases that did pass the
           | fault injection testing?
        
             | jeffbee wrote:
             | There weren't any, which is why that particular shop
             | elected to roll their own distributed system on top of
             | rocks.
             | 
             | In general I think people who think they want to do
             | FoundationDB owe themselves a serious contemplation of the
             | cost/benefit of using Cloud Spanner instead. Obviously you
             | cannot do your own fault injection testing of Spanner, but
             | it does have end-to-end checksums.
        
               | sigstoat wrote:
               | > There weren't any, which is why that particular shop
               | elected to roll their own distributed system on top of
               | rocks.
               | 
               | that's nuts. rocks could've been added as a storage
               | engine to fdb far more easily.
        
               | ryanworl wrote:
               | This is in progress right now.
               | 
               | https://github.com/apple/foundationdb/blob/e7d7b39f12afa8
               | ea2...
        
               | jeffbee wrote:
               | For the record, I said the same thing. But it's a
               | management problem because on the one hand you have a
               | known open project with demonstrable flaws, and on the
               | other you have your own in-house developers and you will
               | tend to discount the bugs they haven't written yet.
               | 
               | But, also for the same record, thinking you can implement
               | a reliable, globally-replicated key-value store on top of
               | FoundationDB that is cheaper and better than Cloud
               | Spanner may be evidence of the same cognitive bias.
        
               | sigstoat wrote:
               | > But, also for the same record, thinking you can
               | implement a reliable, globally-replicated key-value store
               | on top of FoundationDB that is cheaper and better than
               | Cloud Spanner may be evidence of the same cognitive bias.
               | 
               | man, good thing nobody made any claim like that.
        
         | sandinmyjoints wrote:
         | What is the Flow referred to here?
        
           | oconnor663 wrote:
           | It's an async/await framework for C++. I'm not sure what the
           | best source on this is, but here's a discussion:
           | https://forums.foundationdb.org/t/why-was-flow-
           | developed/171...
           | 
           | My understanding is that FDB relies heavily on deterministic
           | simulations for testing, and that their async/await model is
           | a big part of how they make sure they cover different
           | possible interleavings in a deterministic way.
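           | 
           | For illustration, a minimal sketch of seed-driven scheduling
           | in plain Python. It is not Flow: worker and simulate are
           | hypothetical names, and generators stand in for async tasks.
           | Every scheduling decision comes from one seeded random
           | source, so a given seed always replays the same interleaving
           | and different seeds explore different ones.

```python
import random


def worker(name, steps, log):
    """A cooperative task: each `yield` is a point where the scheduler may switch."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield


def simulate(seed, steps=3):
    """Run two workers under a scheduler whose every choice comes from `seed`."""
    rng = random.Random(seed)  # the only source of nondeterminism
    log = []
    runnable = [worker("a", steps, log), worker("b", steps, log)]
    while runnable:
        task = rng.choice(runnable)  # pick which task runs next
        try:
            next(task)               # run it up to its next yield
        except StopIteration:
            runnable.remove(task)    # task finished
    return log


first = simulate(seed=42)
replay = simulate(seed=42)
schedules = {tuple(simulate(seed=s)) for s in range(50)}
```

           | If some seed exposes a bug, rerunning with that seed
           | reproduces the exact same order of events, which is what
           | makes such failures debuggable.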
        
         | jorangreef wrote:
         | Thanks for the quotes, I've been wanting to read this paper for
         | some time. Great to see they went through the consensus
         | literature and made a decision to go with Active Disk Paxos,
         | instead of stopping short and not fully understanding the
         | consensus they're building on. The consensus and replication
         | protocol is such a huge part of building a distributed
         | database.
        
         | fizwhiz wrote:
         | > de novo Paxos implementation written in Flow
         | 
         | That's... brave. Flow is a DSL built on top of C++?
        
           | alistairw wrote:
           | Yeah it's their own language on top of c++ to help them with
           | testing distributed systems with deterministic simulation.
           | 
           | Their talk from a while ago about it was something that
           | really blew me away at the time [0]
           | 
           | [0] https://www.youtube.com/watch?v=4fFDFbi3toc
        
       | monstrado wrote:
       | Have nothing but praise for FoundationDB. It has been by far the
       | most rock solid distributed database I have ever had the pleasure
       | of using. I used to manage HBase clusters, and the fact that I
       | have never once had to worry about manually splitting "regions"
       | is such a boon for administration...let alone JVM GC tuning.
       | 
       | We run several FDB clusters using 3-DC replication and have never
       | once lost data. I remember when we wanted to replace all of the
       | FDB hardware (one cluster) in AWS, and so we just doubled the
       | cluster size, waited for data shuffling to calm down, and just
       | started axing the original hardware. We did this all while
       | performing over 100K production TPS.
       | 
       | One thing that makes the above seamless for all existing
       | connections is that clients automatically update their "cluster
       | file" in the event that new coordinators join or are reassigned.
       | That alone is amazing...as you don't have to track down every
       | single client and change / re-roll with new connection
       | parameters.
       | 
       | Anyway, I talk this database up every chance I get. Keep up the
       | awesome work.
       | 
       | - A very happy user.
        
       ___________________________________________________________________
       (page generated 2021-06-07 23:00 UTC)