[HN Gopher] FoundationDB: A Distributed Key-Value Store
       ___________________________________________________________________
        
       FoundationDB: A Distributed Key-Value Store
        
       Author : eatonphil
       Score  : 218 points
       Date   : 2023-07-03 13:34 UTC (9 hours ago)
        
 (HTM) web link (cacm.acm.org)
 (TXT) w3m dump (cacm.acm.org)
        
       | debussyman wrote:
       | I worked next to the founders a decade ago and tried the first
       | versions of the project (before Apple acq). Loved the concept,
       | but it hasn't really lived up to the promise.
        
       | neftaly wrote:
       | I've been tooling around with "Tuple Database", which claims to
       | be FoundationDB for the frontend (by the original dev of Notion).
       | 
       | https://github.com/ccorcos/tuple-database/
       | 
       | I have found it conceptually similar to Relic or Datascript, but
        | with strong performance guarantees - something Relic considers a
       | potential issue. It also solves the problem of using reactive
       | queries to trigger things like popups and fullscreen requests,
       | which must be run in the same event loop as user input.
       | 
       | https://github.com/wotbrew/relic
       | https://github.com/tonsky/datascript
       | 
       | Having a full (fast!) database as my React state manager gives me
       | LAMP nostalgia :)
        
       | jwr wrote:
       | FoundationDB is absolutely incredible and I've been wondering why
       | it doesn't get more popular over time. I suspect it's too complex
       | to use directly in most applications, with people used to SQL-
       | based solutions or simple KV stores.
       | 
       | I always wanted my app to use a fully distributed database (for
       | redundancy). I've been using RethinkDB in production for over 8
       | years now. I'm slowly rebuilding my app to use FoundationDB.
       | 
       | What I discovered when I started using FDB surprised me a bit. To
       | make really good use of the database you can't really use a
       | "database layer" and fully abstract it away from your app. Your
       | code should be fully aware of transaction boundaries, for
       | example. To make good use of versionstamps (an incredible
       | feature) your code needs to be somewhat aware of them.
       | 
       | I think FDB is a great candidate for implementing a "user-
       | friendly" database on top of it, and in fact several databases
       | are doing exactly that (using FDB as a "lower layer"). But that
       | abstracts away too much, at least for me.
       | 
       | The superficial take on FDB is "waah, where are my features? it
       | doesn't do indexing? waaah, just use Postgres!".
       | 
       | But when you actually start mapping your app's data structures
       | onto FDB optimally, you discover a whole new world. For example,
       | I ended up writing my indexing code myself, in my language
       | (Clojure). FDB gives you all the tools, and a strict serializable
       | data model to work with -- your language brings _your_ data
       | structures and _your_ indexing functions. The combination is
       | incredible. Once you define your index functions in your
       | language, you will never want to look at SQL again. Plus, you get
       | incredible features like versionstamps -- I use them to replace
       | RethinkDB changefeeds and implement super quick polling for
       | recent changes.
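        | 
        | The pattern is roughly this (a toy in-memory sketch with made-up
        | names; real code would use the fdb client, subspaces, and
        | tr.set_versionstamped_key(), with an integer counter standing in
        | for FDB's versionstamp here):

```python
# Toy sketch of app-defined indexing plus a versionstamp-keyed changelog.
# An integer counter stands in for FDB's versionstamp; all names invented.

class ToyStore:
    def __init__(self):
        self.kv = {}          # stands in for the ordered key-value store
        self.stamp = 0        # stands in for FDB's versionstamp

    def put_user(self, user_id, email):
        old = self.kv.get(("user", user_id))
        if old is not None:
            # keep the secondary index consistent on update
            self.kv.pop(("index", "email", old, user_id), None)
        self.kv[("user", user_id)] = email
        # application-defined index function: index users by email
        self.kv[("index", "email", email, user_id)] = b""
        # append to a changelog keyed by "versionstamp" for cheap polling
        self.stamp += 1
        self.kv[("changes", self.stamp)] = ("user", user_id)

    def users_by_email(self, email):
        # a range scan over the index prefix, like fdb's get_range
        return [k[3] for k in sorted(self.kv)
                if k[:3] == ("index", "email", email)]

    def changes_since(self, last_stamp):
        # poll for everything committed after the last stamp we saw
        return [(k[1], v) for k, v in sorted(self.kv.items())
                if k[0] == "changes" and k[1] > last_stamp]

store = ToyStore()
store.put_user("u1", "a@example.com")
store.put_user("u2", "a@example.com")
print(store.users_by_email("a@example.com"))  # ['u1', 'u2']
print(store.changes_since(1))                 # [(2, ('user', 'u2'))]
```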
       | 
       | Oh, and did I mention that it is a fully distributed database
       | that _correctly_ implements the strict serializable consistency
       | model? There are _very_ few dbs that can claim that. If you
       | understand what that means, you probably know how incredible this
        | is and I don't have to convince you. If you _think_ you
       | understand, I suggest you go and explore
       | https://jepsen.io/consistency -- carefully reading and learning
       | about the differences in various consistency models.
       | 
       | I really worry that FoundationDB will not become popular because
       | of its inherent complexity, while worse solutions (ahem, MongoDB)
       | will be more fashionable.
       | 
       | I would encourage everyone to at least take a look at FDB. It
       | really is something quite different.
        
         | manish_gill wrote:
          | Seems like a very similar use case to ZooKeeper - use it for
         | distributed coordination / consistency etc and build your
         | actual database on top of it.
        
       | brainzap wrote:
       | I think FoundationDB will also have parts written in Swift, at
       | least that is what Apple showed at WWDC.
        
         | crabmusket wrote:
         | That was Foundation, not FoundationDB.
         | https://developer.apple.com/documentation/foundation
        
           | jen20 wrote:
           | No, it was FoundationDB [1].
           | 
           | [1]: https://developer.apple.com/videos/play/wwdc2023/10164/?
           | time...
        
       | romanhn wrote:
       | Back in 2014 or so, I saw the FoundationDB team demo the product
       | at a developer conference. They had the database running across a
       | bunch of machines, with a visual showing their health and data
       | distribution. One team member would then be turning machines on
       | and off (or maybe unplugging them from the network) and you could
       | see FDB effortlessly rebalancing the data across the available
       | nodes. It was a very striking, impressive presentation
       | (especially as we were dealing with the challenges of distributed
       | Cassandra at the time).
        
         | boxcarr wrote:
         | When I saw the post about Foundation DB, I remembered the exact
         | same demo running on a cluster of Raspberry Pi instances!
         | Sadly, no memory of it on YouTube.
        
           | boxcarr wrote:
            | Wayback Machine partially to the rescue. Found the page
            | (https://web.archive.org/web/20150325003301/https://foundatio...),
            | but the Vimeo video was nuked when foundationdb.com shut
            | down. Here's the HN thread about that demo:
           | 
           | https://news.ycombinator.com/item?id=5739721
        
             | romanhn wrote:
             | I feel like I saw something a bit more refined (I recall
             | node statuses aggregated on one cool UI), so this may have
             | been an earlier iteration, but the beginning of the
             | following video has some of what we're talking about:
             | https://youtu.be/Nrb3LN7X1Pg
        
       | jbverschoor wrote:
       | Around 2010-2013 (gaming), I found fdb, and to me it seemed like
       | the perfect database because of their architecture. I tried it a
       | bit, and was really happy with it.
       | 
       | Unfortunately they were acquired by Apple, only to resurface
       | something like 10 years later. All momentum was gone, and I'm not
       | really aware nor interested in where they stand. I'll stick with
       | my rusty old Postgres for a long time before I'd try anything
       | else out.
        
       | mlerner wrote:
       | Really neat paper - a while ago I wrote a summary of the system:
       | https://www.micahlerner.com/2021/06/12/foundationdb-a-distri...
        
       | [deleted]
        
       | mrtracy wrote:
       | FoundationDB has, in my experience, always been well regarded in
       | DB development circles; I think their test architecture -
       | developed to easily reproduce rare concurrency failures - is its
       | best legacy, as mentioned in a comment above and frequently
       | before.
       | 
       | However, since these topics are always filled with effusive
       | praise in the comments, let me give an example of a distributed
       | scenario where FDB has shortcomings: OLTP SQL.
       | 
       | First, FDB is clearly designed for "read often, update rarely"
       | workloads, in a relative sense. It produces multiple consistent
       | replicas which are consistently queryable at a past time stamp,
       | without a transaction - excellent for that profile. However, its
       | transaction consistency method is both optimistic and
       | centralized, and can lead to difficulty writing during high
       | contention and (brief) system-wide transaction downtime if there
       | is a failover; while it will work, it's not optimal for "write
       | often, read once" workloads.
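        | 
        | The difference shows up in even a toy model of optimistic
        | concurrency (illustrative only, not FDB's actual protocol): a
        | transaction commits only if nothing it read was modified in the
        | meantime, so a contending writer translates directly into
        | retries:

```python
# Toy model of optimistic concurrency control: commit succeeds only if
# the read set is unchanged; contention on a hot key forces retries.

class OptimisticStore:
    def __init__(self):
        self.data = {}
        self.version = {}      # per-key commit version
        self.commits = 0

    def read(self, key):
        return self.data.get(key, 0), self.version.get(key, 0)

    def try_commit(self, key, new_value, read_version):
        # conflict check: abort if someone committed after our read
        if self.version.get(key, 0) != read_version:
            return False
        self.commits += 1
        self.data[key] = new_value
        self.version[key] = self.commits
        return True

def increment(store, key, interleaved_write=None):
    """Optimistic retry loop; returns how many attempts were needed."""
    attempts = 0
    while True:
        attempts += 1
        value, rv = store.read(key)
        if interleaved_write:            # simulate a contending writer
            interleaved_write()
            interleaved_write = None     # only contend once
        if store.try_commit(key, value + 1, rv):
            return attempts

store = OptimisticStore()
assert increment(store, "hot") == 1      # uncontended: one attempt
contender = lambda: store.try_commit("hot", 100, store.read("hot")[1])
assert increment(store, "hot", contender) == 2  # contention: one retry
```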
       | 
       | Secondly, while it is an _ordered_ key value store - facilitating
       | building SQL on top of it - the popular thought of layering SQL
       | _on top of the distributed layer_ comes with many shortcomings.
       | 
       | My key example of this is schema changes. Optimistic application,
       | and keeping schema information entirely "above" the transaction
       | layer, can make it extremely slow to apply changes to large
       | tables, and possibly require taking them partially offline during
       | the update. There are ways to manage this, but online schema
       | changes will be a competitive advantage for other systems.
       | 
       | Even for read-only queries, you lose opportunities to push many
       | types of predicates down to the storage node, where they can be
       | executed with fewer round trips. Depending on how distributed
       | your system is, this could add up to significant additional
       | latency.
       | 
       | Afaik, all of the spanner-likes of the world push significant
       | schema-specific information into their transaction layers - and
       | utilize pessimistic locking - to facilitate these scenarios with
       | competitive performance.
       | 
       | For reasons like these, I think FDB will find (and has found) the
        | most success in warehousing scenarios, where individual records
        | are queried often once written, and updates come in at a slower
        | pace than the reads.
        
         | mike_hearn wrote:
         | You can do online schema changes with FDB, it all depends on
         | what you do with the FDB primitives.
         | 
         | A great example of how to best utilize FDB is Permazen [1],
         | described well in its white paper [2].
         | 
         | Permazen is a Java library, so it can be utilized from any JVM
         | language e.g. via Truffle you get Python, JavaScript, Ruby,
         | WASM + any bytecode language. It supports any sorted K/V
         | backend so you can build and test locally with a simple disk or
         | in memory impl, or RocksDB, or even a regular SQL database.
         | Then you can point it at FoundationDB later when you're ready
         | for scaling.
         | 
          | Permazen is _not_ a SQL implementation. Instead it's "language
         | integrated" meaning you write queries using the Java
         | collections library and some helpers, in particular,
         | NavigableSet and NavigableMap. In effect you write and hard
         | code your query plans. However, for this you get many of the
         | same features an RDBMS would have and then some more, for
         | example you get indexes, indexes with compound keys, strongly
         | typed and enforced schemas with ONLINE updates, strong type
         | safety during schema changes (which are allowed to be
         | arbitrary), sophisticated transaction support, tight control
         | over caching and transactional "copy out", watching fields or
         | objects for changes, constraints and the equivalent of foreign
         | key constraints with better validation semantics than what JPA
         | or SQL gives you, you can define any custom data derivation
         | function for new kinds of "index", a CLI for ad-hoc querying,
         | and a GUI for exploration of the data.
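          | 
          | In Python terms (Permazen itself is Java, built on
          | NavigableMap; the class and method names below are invented),
          | a "language-integrated" index query is just a hand-written
          | range scan over sorted compound keys:

```python
# Rough Python analogue of a hard-coded query plan over a compound
# index: a sorted list of (field_value, object_id) pairs plus bisect.
import bisect

class CompoundIndex:
    def __init__(self):
        self.keys = []   # sorted (field_value, object_id) pairs

    def add(self, field_value, object_id):
        bisect.insort(self.keys, (field_value, object_id))

    def lookup(self, field_value):
        # hard-coded "query plan": binary-search to the first entry for
        # field_value, then scan forward while the prefix still matches
        i = bisect.bisect_left(self.keys, (field_value,))
        out = []
        while i < len(self.keys) and self.keys[i][0] == field_value:
            out.append(self.keys[i][1])
            i += 1
        return out

idx = CompoundIndex()
idx.add("red", 3)
idx.add("blue", 1)
idx.add("red", 2)
print(idx.lookup("red"))   # [2, 3]
```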
         | 
         | Oh yes, it also has a Raft implementation, so if you want
         | multi-cluster FDB with Raft-driven failover you could do that
         | too (iirc, FDB doesn't have this out of the box).
         | 
         | And because the K/V format is stable, it has some helpers to
         | write in memory stores to byte arrays and streams, so you can
         | use it as a serialization format too.
         | 
         | FDB has something a bit like this in its Record layer, but it's
         | nowhere near as powerful or well thought out. Permazen is
         | obscure and not widely used, but it's been deployed to
         | production as part of a large US 911 dispatching system and is
         | maintained.
         | 
         | Incremental schema evolution is possible because Permazen
         | stores schema data in the K/V store, along with a version for
         | each persisted object (row), and upgrades objects on the fly
         | when they're first accessed.
         | 
         | [1] https://permazen.io/
         | 
         | [2]
         | https://cdn.jsdelivr.net/gh/permazen/permazen@master/permaze...
        
           | SamReidHughes wrote:
           | 100%. I don't have the time to read the paper but online
           | schema changes, with the ability to fail and abort the entire
           | operation if one row is invalid, are basically the same
           | problem as background index building.
           | 
           | If instead of using some generic K/V backend, it made use of
           | specific FDB features, it might be even better. Conflict
           | ranges and snapshot reads have been useful for me for some
           | background index building designs, and atomic ops have their
           | uses.
           | 
           | > Oh yes, it also has a Raft implementation, so if you want
           | multi-cluster FDB with Raft-driven failover you could do that
           | too (iirc, FDB doesn't have this out of the box).
           | 
           | I don't know what you mean by this. Multiple FDB clusters?
        
             | mike_hearn wrote:
             | It supports atomic ops and snapshot reads. Don't remember
             | about conflict ranges. It doesn't require all backends to
             | be identical, it supports a kind of graceful degradation
             | when backends don't have all the features. The creator is
             | quite keen on FDB and made sure Permazen works well with
             | it.
             | 
             | Yes multiple FDB clusters. IIRC FDB replication doesn't
             | support full geo-replication, or didn't. There's a post by
             | me about it somewhere on their forums.
        
         | Dave_Rosenthal wrote:
         | I totally agree with your high level point that there isn't a
         | great SQL (OLTP, or otherwise) layer for FoundationDB. Building
         | something like this would be very hard--but I don't think the
         | FoundationDB storage engine itself would end up inflicting the
         | limitations you mention if it was well executed. And
         | FoundationDB _was_ specifically designed for real-time
         | workloads with mixed reads /writes (i.e. the OLTP case).
         | 
         | Whether or not concurrency is optimistic (or done with locks,
         | or whatever) doesn't really have a bearing on things. Any
          | database is going to suffer if it has a bunch of updates to
          | specific hot keys that need to be isolated (in the ACID
         | sense). As long as your reads and writes are sufficiently
         | spread out you'll avoid lock contention/optimistic transaction
         | retries.
         | 
         | You speak to the real main limitation of FoundationDB when you
         | talk about stuff like schema changes. There is a five-second
         | transaction limit which in practice means that you cannot, for
         | example, do a single giant transaction to change every row in a
          | table. This was definitely a deliberate design
         | choice, but not one without tradeoffs. The bad side is that if
          | you want to be able to do something like this (lock out clients
         | while you migrate a table) you need a different design that
         | uses another strategy, like indirection. The good side is that
         | screwed-up transactions that lock big chunks of your DB for a
         | long time don't take down your system.
         | 
         | I find that the people who are relatively new to databases tend
         | to wish that the five second limit was gone because it makes
         | things simpler to code. People that are running them in
         | production tend to like it more because it avoids a slew of
         | production issues.
         | 
         | That said, I think for many situations a timeout like 30 or 60
         | seconds (with a warning at 10) would be a better operating
         | point rather than the default 5 second cliff.
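          | 
          | The usual workaround for the limit is exactly that kind of
          | indirection: walk the key range in small batches, each
          | committed as its own short transaction. A toy sketch (batch
          | size and upgrade function are invented for illustration):

```python
# Sketch of migrating a large table under a per-transaction limit:
# apply `upgrade` to every row, at most `batch_size` rows per commit,
# so no single transaction runs long enough to hit the time limit.

def migrate_in_batches(keys_sorted, store, upgrade, batch_size=3):
    cursor = 0
    commits = 0
    while cursor < len(keys_sorted):
        batch = keys_sorted[cursor:cursor + batch_size]
        # in FDB, this loop body would be one short-lived transaction
        for k in batch:
            store[k] = upgrade(store[k])
        commits += 1
        cursor += len(batch)   # resume point survives across commits
    return commits

store = {f"row{i}": i for i in range(7)}
n = migrate_in_batches(sorted(store), store, lambda v: v * 10)
print(n)                      # 3 commits: batches of 3, 3, 1
print(store["row6"])          # 60
```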
        
           | mrtracy wrote:
           | I think that the SQL-on-top, and optimistic model, are
           | definitely things that can have a workflow-dependent
           | performance impact and are relevant.
           | 
           | All databases do suffer under some red line of write
            | contention; but optimistic databases will suffer _more_, and
           | will start degrading at a _lower level of contention_.
           | "Avoiding contention" is database optimization table stakes,
           | and you should be structuring every schema you can to do so;
           | but hot keys are almost inevitable when a certain class of
           | real-time product scales, and they will show up in ways you
           | do not expect. When it happens, you'd like your DBMS to give
           | as much runway as possible before you have to make the tough
           | changes to break through.
           | 
           | SQL-on-top becomes an issue for geographic distribution;
           | without "pushing down" predicates, read-modify-write
           | workloads, table joins, etc. on the client can incur
           | significant round-trip time issuing queries. I think the lack
           | of this is always going to present a persistent disadvantage
           | vs selecting a competitor.
           | 
            | And again, given FDB's multiple-full-secondary model, it's
            | only a problem when working in real time; slower queries can
            | work off a local secondary. But latest-data latency is
            | relevant for many applications.
        
           | aseipp wrote:
           | FWIW, I believe read transactions are unlimited in duration
            | now that the Redwood engine is available. But I haven't
           | tested Redwood myself. Write transactions are still
           | definitely limited to 5 seconds, though.
        
         | gregwebs wrote:
          | TiDB uses TiKV as an equivalent to FoundationDB. It supports
          | online migrations and pushing down read queries to the KV
          | layer. It also defaults to optimistic locking, but supports
          | pessimistic. It also doesn't have a five-second transaction
          | limit. A SQL layer on top of FoundationDB could probably
          | solve all these problems, and it wouldn't be novel.
        
         | preseinger wrote:
         | do you think the things you mention were deliberate design
         | decisions?
        
           | mike_hearn wrote:
           | Yes, one of the nice things about FDB is it has extensive
           | design docs. Optimizing for reading more often than writing
           | is obviously a pretty normal design choice, outside of log
           | ingestion you'll normally be reading more than writing. There
           | are people using FDB for logs (snowflake iirc?) and it's been
           | optimized for that sort of use case more in recent years, but
           | it's not like it was an unreasonable choice.
        
             | aseipp wrote:
             | Snowflake uses FoundationDB for warehouse metadata in the
             | control plane, IIRC. It is not in the data plane path for
             | log ingestion or other warehousing tech. That said the
             | control plane is, uh, pretty important!
        
           | mrtracy wrote:
           | They absolutely were, yes. There are very valuable
           | application profiles where FoundationDB's design is
           | excellent, and you can see that from its internal usage at
           | large companies like Apple and Snowflake.
        
       | monstrado wrote:
       | I built an online / mutable time-series database using FDB a few
       | years back at a previous company. Not only was it rock solid, but
        | it scaled linearly pretty effortlessly. It truly is one of the
        | most novel pieces of modern technology out there, and I wish
        | there were more layers built on top of it.
        
       | georgelyon wrote:
       | FoundationDB is a truly one-of-a-kind bit of technology. Others
       | have already linked to the testing methodology that allows them
       | to run orders of magnitude more database hours in test than have
       | run in production: https://www.youtube.com/watch?v=4fFDFbi3toc
       | 
        | A less known but also great talk is the follow-up, which
        | discussed what a few of the team worked on next, effectively
        | trying to generalize the methodology to any computer program:
       | https://www.youtube.com/watch?v=fFSPwJFXVlw
       | 
       | I liken the approach to being able to fuzz the execution space of
       | the program, not just the inputs.
        
       | [deleted]
        
       | jeffbee wrote:
       | How hard have people pushed this thing? We get regular threads of
       | effusive praise, but little criticism. Last time I mentioned that
       | years ago my colleagues found half a dozen ways to lose data in
        | FDB, I got called out here and even in private emails, but it
       | seems more valuable to know where the limits of these systems
       | are, and not very valuable to read the positive feelings of
       | people who used FDB in trivial and uncritical ways.
        
         | ryanworl wrote:
         | FoundationDB is used at Datadog as the metadata store for
         | Husky, the storage and query engine powering a significant
         | number of Datadog products, such as logs, network performance
         | monitoring, and trace analytics.
         | 
         | 1. https://www.datadoghq.com/blog/engineering/introducing-
         | husky...
         | 
         | 2. https://www.datadoghq.com/blog/engineering/husky-deep-dive/
         | 
         | 3. https://www.youtube.com/watch?v=mNneCaZewTg
         | 
         | 4. https://www.youtube.com/watch?v=1-zo9jqdRZU
         | 
         | I was involved with this project from the beginning and it
         | would've taken significantly longer to deliver without
         | FoundationDB.
        
           | jeffbee wrote:
           | I know there are multiple companies that use it. The question
           | is not whether people put things into FDB. The question is
           | whether anyone has checked to see if their junk was still
           | there later. I don't consider large scale deployments to be
           | proof of anything. When I worked on Gmail we were still
           | finding data-loss bugs in either BigTable or Colossus
           | regularly, even after those systems had been the largest
           | datastores on the planet for many years.
        
             | [deleted]
        
         | eatonphil wrote:
         | Is Snowflake big enough of a deal?
         | 
         | https://news.ycombinator.com/item?id=16880404
         | 
          | Also, the post itself, whose authors include Apple and
          | Snowflake devs, mentions that it's run in production by Apple
          | and Snowflake.
         | 
         | I haven't seen yet though what Apple uses it for.
        
           | tilolebo wrote:
            | It is used by CloudKit.
           | 
           | https://machinelearning.apple.com/research/foundationdb-
           | reco...
        
           | jeffbee wrote:
           | The time at which my colleagues found easy ways to lose data
           | was well after Apple had claimed to use it in iCloud at
           | scale. So, I don't think deployment at scale is a proof of
           | correctness. The thing that needs doing is regularly looking
           | in the database for things that should be there.
        
             | endisneigh wrote:
             | I'm curious - could you elaborate on the circumstances?
             | Like the version of FDB, cluster size, network
             | circumstances, etc?
        
         | Dave_Rosenthal wrote:
         | Yes, there are definitely a lot of big companies that have used
         | FoundationDB very hard at huge scale for many years. That said,
         | yeah, it feels like there are also a lot of folks on HN who
         | just jump on the "cool, fault simulation" bandwagon and don't
         | have a lot of personal real-world experience.
         | 
         | What I can tell you, for sure, is that if you find an issue
         | with something as important and fundamental as data loss the
         | team working on FoundationDB would take it super seriously.
        
       | cetinsert wrote:
       | https://deno.com/deploy is building
       | 
       | https://deno.com/kv on FoundationDB!
        
       | qaq wrote:
        | Surprised no one has used it as a foundation for a NewSQL DB;
        | the thing is battle tested and actively developed by Apple and
        | Snowflake.
        
         | danpalmer wrote:
         | I think I remember the FDB team developing one that was closed
         | source back before their acquisition. I thought the business
         | model was going to be open core and closed, paid, layers on
         | top. I seem to remember them benchmarking the SQL layer and it
         | being highly performant still, despite the complexity it added.
         | 
          | Maybe this thing still exists in closed-source form at Apple? It
         | wouldn't surprise me if it does and forms the basis of a
         | Spanner alternative, they're big enough to need it. Or maybe
         | they canned it pre/post acquisition.
         | 
         | Edit: ah, you've already mentioned the closed source layer that
         | exists at Apple. There we go!
        
         | endisneigh wrote:
         | There's https://www.tigrisdata.com/
         | 
         | It's similar to mongo (it's nosql)
        
           | eatonphil wrote:
           | Not a NewSQL database though as GP mentioned. I don't think
           | Tigris has a SQL layer.
        
             | endisneigh wrote:
             | Yes I know. I explicitly said it was similar to mongo. Just
             | responding to the bit that it's battle tested and used as a
             | foundation (no pun intended) for another db. As far as I
             | know it's the only database that has a company around it
              | that is using FDB.
        
           | qaq wrote:
            | There was a PoC of SQLite on top of FDB. There is also a
            | SQL layer that Apple did not open source, which they use at
            | scale. It just seems a wasted opportunity.
        
             | endisneigh wrote:
              | It's because you introduce a lot of latency. CockroachDB
             | for example (which is a great db) has a lot of latency
             | compared to Postgres.
             | 
             | At the time of its release it was probably hard to justify
             | having an order of magnitude more latency than competitors
             | (of course they were not fault tolerant, but still).
        
               | riku_iki wrote:
                | hypothetically, you can run CockroachDB with
                | replication factor 1, and also have low latency and an
                | apples-to-apples comparison.
        
         | canadiantim wrote:
         | I know some people have had success using FoundationDB as a KV
         | store with SurrealDB[1]
         | 
         | [1] https://github.com/orgs/surrealdb/discussions/25
        
           | qaq wrote:
            | that's a document-graph database though
        
       | endisneigh wrote:
       | I've been using FDB for toy projects for a while. It's truly rock
       | solid. In my experience it's the best open source database I've
       | used, including mariadb, Postgres and cockroach. That being said,
       | I wish there were more layers as the functionality out of the box
       | is very very limited.
       | 
       | Ideally someone could implement the firestore or dynamodb api on
       | top.
       | 
       | https://github.com/losfair/mvsqlite
       | 
       | Is basically distributed SQLite backed by FDB. I've been scared
       | to use it since I don't know rust and can't attest to if mvcc had
       | been implemented correctly.
       | 
       | In using this I actually realized how coupled the storage engine
       | is to the storage system and how few open source projects make
        | the storage engine easily swappable.
        
         | fhrow4484 wrote:
         | > That being said, I wish there were more layers as the
         | functionality out of the box is very very limited.
         | 
          | The record layer (https://github.com/FoundationDB/fdb-record-
          | layer), which allows you to store protobufs and define the
          | primary keys and indexes directly on those proto fields, is
          | truly amazing:
         | 
         | https://github.com/FoundationDB/fdb-record-layer/blob/main/d...
        
         | facu17y wrote:
          | MVCC is already taken care of by FDB, no?
        
           | endisneigh wrote:
           | Yea, but mvsqlite implements its own to get around the
           | limitations around transactions.
        
         | tommiegannert wrote:
         | I really wanted to use FoundationDB for building a graph
          | database, but was taken aback by the limitations on record
          | sizes (10 kB keys + 100 kB values) and, to a lesser extent,
          | transaction sizes (10 MB) [1]. And the documentation [2]
          | doesn't really give any answer other than "build it
          | yourself."
         | 
         | mvsqlite seems to improve the transaction size [3], which is
         | nice. Does it also improve the key/value limitations?
         | 
         | > Transaction size cannot exceed 10,000,000 bytes of affected
         | data. [---] Keys cannot exceed 10,000 bytes in size. Values
         | cannot exceed 100,000 bytes in size.
         | 
         | [1] https://apple.github.io/foundationdb/known-limitations.html
         | 
         | [2] https://apple.github.io/foundationdb/largeval.html
        
           | aseipp wrote:
            | Transaction size and duration are limited to keep the latency
           | and throughput of the system manageable under load, from my
           | understanding. It makes sense to some degree even with no
           | background in the design; if you are serving X/rps with a
           | latency of Y milliseconds, using Z resources, and you double
           | Y, you now need to double your resources Z as well, to serve
            | the same number of clients. You always hit a cap somewhere,
           | so if you want consistent throughput and latency, it's maybe
           | not a bad tradeoff.
           | 
           | mvsqlite fixes the transaction size through its own
           | transaction layer, from my understanding; I don't know how
            | that would impact performance. The 10 kB/100 kB key/value limit
           | is probably not fixable in any way, but it's not really a
           | huge problem as a user in practice for FDB because you can
           | just shard the value across two keys in a consistent
           | transaction and it's fine. 10 kilobyte keys have pretty much
           | never ever been an issue in my cases either; you can
           | typically just do something like hash a really big key before
           | insert and use that.
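The value-sharding trick described above can be sketched as follows. This is a toy illustration using a plain Python dict as a stand-in for the FDB keyspace; the function names and key layout are my own, not the real fdb bindings. In real FDB, the writes and the range read would each happen inside a single transaction, so readers always see a complete set of chunks.

```python
# Toy sketch of sharding an oversized value across multiple keys, using a
# plain dict as a stand-in for the FDB keyspace.

CHUNK_SIZE = 100_000  # FDB's 100 kB value limit

def write_large_value(store, key, value):
    """Split a value into <= 100 kB chunks, one key per chunk."""
    for i in range(0, max(len(value), 1), CHUNK_SIZE):
        # Fixed-width chunk index so a sorted prefix scan returns chunks
        # in order (mimicking an FDB range read).
        store[f"{key}/{i // CHUNK_SIZE:08d}"] = value[i:i + CHUNK_SIZE]

def read_large_value(store, key):
    """Reassemble the value from a simulated prefix range read."""
    prefix = f"{key}/"
    return b"".join(store[k] for k in sorted(store) if k.startswith(prefix))

store = {}
write_large_value(store, "blob", b"x" * 250_000)   # stored as 3 chunks
assert read_large_value(store, "blob") == b"x" * 250_000
```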
        
       | tanepiper wrote:
       | A few years ago I was working at an agency, one of their teams
       | was building a real-time gaming system on top of FoundationDB.
       | 
       | Apple then bought it up and shut the open source down. They had
       | to rebuild whole layers from scratch.
        
         | Dave_Rosenthal wrote:
         | Yeah, that sucked for sure and we hated to disappoint people
         | like that (co-founder here). But you have it exactly backwards.
         | FoundationDB was never open source. There was a binary that you
         | could download and use as a trial, or you could buy a license
         | for real use. The users that bought licenses got to keep using
         | those licenses. Some of those customers went on to build
         | billion-dollar businesses on top of FoundationDB (Snowflake!).
         | A few years after acquiring the tech, Apple themselves open
         | sourced it (!) so now it is open source. The big challenge for
         | users is that most of the sophisticated "layers" that make the
         | tech into more of an easy-to-use database rather than just a
         | storage engine are still proprietary.
        
         | 58028641 wrote:
         | As far as I can tell, FoundationDB was never open source until
         | Apple open sourced it.
        
           | tanepiper wrote:
           | At least one reference on here from 2018 -
           | https://news.ycombinator.com/item?id=16878786
           | 
           | And here's a news story - https://www.forbes.com/sites/benkep
           | es/2015/03/25/a-cautionar...
        
             | detaro wrote:
             | ... which both state that it wasn't open source before the
             | Apple buyout.
        
             | endisneigh wrote:
             | It was never open source before apple. Rather the binary
             | was freely available to be used. When apple bought them
             | they took it away but continued to support customers with
             | contracts. In that way it was inaccessible until it was
             | open sourced.
        
               | metadat wrote:
                | Yes, I got bitten by this and will never forget: FDB
               | abruptly shut off public access in mid-2015. Fortunately
               | for me, it only cost half a day to migrate my system to
               | Postgres.
        
           | stephenr wrote:
           | Hey now, don't let verifiable facts and observed history get
           | in the way of a chance to bash Apple.
        
       | hadjian wrote:
       | His jokes are hilarious!
        
       | mprime1 wrote:
       | Part of the FDB team (great folks) went on to create something
       | quite incredible that I've had the pleasure of early access to.
       | If you're into dependability check this out:
       | https://antithesis.com/
        
       | leetrout wrote:
       | What always fascinated me is they built the simulator for the
       | database first(ish) and relied on it as a first class citizen
       | while building the DB:
       | 
       | https://www.youtube.com/watch?v=4fFDFbi3toc
       | 
       | > We wanted FoundationDB to survive failures of machines,
       | networks, disks, clocks, racks, data centers, file systems, etc.,
       | so we created a simulation framework closely tied to Flow. By
       | replacing physical interfaces with shims, replacing the main
       | epoll-based run loop with a time-based simulation, and running
       | multiple logical processes as concurrent Flow Actors, Simulation
       | is able to conduct a deterministic simulation of an entire
       | FoundationDB cluster within a single thread! Even better, we are
       | able to execute this simulation in a deterministic way, enabling
       | us to reproduce problems and add instrumentation ex post facto.
       | This incredible capability enabled us to build FoundationDB
       | exclusively in simulation for the first 18 months and ensure
       | exceptional fault tolerance long before it sent its first real
       | network packet. For a database with as strong a contract as
       | FoundationDB, testing is crucial, and over the years we have run
       | the equivalent of a trillion CPU-hours of simulated stress
       | testing.
       | 
       | https://pierrezemb.fr/posts/notes-about-foundationdb/
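The core idea quoted above (a virtual clock and seeded randomness instead of real time and real I/O) can be illustrated with a toy event loop. This is a hand-rolled sketch, not FDB's Flow framework: events carry virtual timestamps in a heap, "time" advances by popping the queue rather than sleeping, and all nondeterminism flows from one seeded RNG, so any run can be replayed exactly from its seed.

```python
import heapq
import random

def simulate(seed, n_events=5):
    """Run a toy deterministic simulation and return its event trace."""
    rng = random.Random(seed)
    queue, trace = [], []
    # Schedule some simulated "network messages" at random virtual times.
    for msg in range(n_events):
        heapq.heappush(queue, (rng.uniform(0, 10), msg))
    while queue:
        clock, msg = heapq.heappop(queue)  # advance virtual time; no sleeping
        trace.append((round(clock, 3), msg))
        if rng.random() < 0.3:             # randomly inject a "retry"
            heapq.heappush(queue, (clock + rng.uniform(0, 5), msg))
    return trace

# The same seed always reproduces the same schedule, injected faults included,
# which is what makes "add instrumentation ex post facto" possible.
assert simulate(42) == simulate(42)
assert simulate(42) != simulate(43)
```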
        
         | riwsky wrote:
         | "The Jepsen is coming... from INSIDE THE HOUSE!"
        
         | AaronFriel wrote:
         | When I was writing a Haskell client library for Hyperdex,
         | another distributed kv store, I found it incredibly helpful to
         | implement a simulator for correctness. This helped me identify
         | which behavior was unspecified (arithmetic overflow: should
         | error) or where my simulator deviated.
         | 
         | https://github.com/AaronFriel/hyhac/blob/master/test/Test/Hy...
         | 
         | Alas, Hyperdex development stalled a few years later, which is
         | a shame.
        
         | pavlov wrote:
         | For some types of distributed systems, you can do this kind of
         | simulated testing in advance by building a TLA+ model.
         | 
         | It's not a full-blown simulator (because generally the
         | application code doesn't even exist yet when you're building
         | the TLA+ model). But it can let you collect data and validate
         | assumptions about your design before writing a single line of
         | code.
        
           | rockwotj wrote:
           | My beef with TLA+ is that it's not the same code: while
           | you're testing the design, you aren't testing the
           | implementation of the design, which is just as important (if
           | not harder) to get right.
        
             | aseipp wrote:
             | Yes, but there really aren't many good solutions to that
             | problem that aren't either extremely language-specific or
             | domain-specific. And if you're careful you can get a lot of
             | direct mileage out of it. For example, MongoDB (yes, that
             | one!)
             | used it in the development of their Atlas system and has a
             | paper about using TLA+ to model the system, characterize
             | behaviors, then generate compilable-code test cases from
             | that minimal set of behaviors -- which are then directly
             | linked against the core internals of the Atlas codebase as
             | a client library. They then run those tests and re-generate
             | them when the model changes. "Model based test case
             | generation" is the strategy here. So you can characterize
             | what happens in split brain scenarios, state machine
             | transition failures (conflicting transactions), etc.
             | 
             | In reality the design stage is a pretty critical phase
             | where you need all the help you can get, so even if you
             | don't like TLA+ you're way better off modeling than not
             | modeling at all.
             | 
             | As an example of the language specific thing, though,
             | there's a library for Haskell I like that's very cool,
             | called Spectacle, which also implements the temporal logic
             | of TLA+ along with a model checker, but as a Haskell DSL.
             | An interesting benefit of this is that you can model check
             | actual real Haskell code that runs e.g. in your services,
             | but I haven't taken this very far. There are also
             | alternative solutions like Stateright for Rust. But again,
             | not everyone has the benefit of these...
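As a flavor of what model checking buys you, here is a toy explicit-state checker in Python. This is my own sketch, not the API of TLA+, Spectacle, or Stateright: it exhaustively explores all interleavings of two processes doing a non-atomic read-then-increment and finds the classic lost-update bug.

```python
from collections import deque

def swap(t, i, v):
    """Return tuple t with element i replaced by v (states stay hashable)."""
    return tuple(v if j == i else x for j, x in enumerate(t))

def next_states(state):
    """Each process reads the counter, then writes read+1 (non-atomically)."""
    counter, pcs, regs = state
    for p in (0, 1):
        if pcs[p] == "read":
            yield (counter, swap(pcs, p, "write"), swap(regs, p, counter))
        elif pcs[p] == "write":
            yield (regs[p] + 1, swap(pcs, p, "done"), regs)

def check(init, invariant):
    """BFS over all reachable states; return a violating state or None."""
    seen, frontier = {init}, deque([init])
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            return state
        for s in next_states(state):
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return None

# State = (counter, program counters, per-process registers).
init = (0, ("read", "read"), (None, None))
# Invariant: once both processes are done, the counter must be 2.
bad = check(init, lambda s: s[1] != ("done", "done") or s[0] == 2)
assert bad is not None  # the checker finds the lost-update interleaving
```

The violating state it finds is the interleaving where both processes read 0 before either writes, leaving the counter at 1.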
        
               | bigfish24 wrote:
               | Do you know the paper for Atlas?
        
               | aseipp wrote:
               | Yes, I managed to find it: "eXtreme Modeling in Practice"
               | https://arxiv.org/pdf/2006.00915.pdf
               | 
               | Unfortunately I got the product wrong; it was not Atlas,
               | it was Realm Sync. All of the test-case generation stuff
               | is in Section 5.
        
             | pavlov wrote:
             | Yes, the model is more like an executable form of
             | documentation. There's no guarantee that code comments
             | match what the code actually does; similarly there's no
             | guarantee that the TLA+ model matches what the system does.
             | 
             | Documentation is still generally useful, and so is a model.
             | You have to be committed to keeping both up to date as the
             | code evolves.
        
         | falsandtru wrote:
         | I'm loving this point. The unfortunate thing is that the tests
         | are closed source (I saw a maintainer say so, probably in an
         | issue a while back). The system seems testable, but the tests
         | still seem to be closed source, so we cannot fork the project
         | even if FDB becomes totally closed source again.
        
           | aseipp wrote:
           | No, the simulation harness and tests are open source and you
           | can run them. It would be impossible for anyone outside to
           | contribute without them -- for example, Snowflake, which
           | heavily depends on FDB. The harness is built into the server
           | binary directly, so the same code is always used; it's simply
           | a different operational mode from the real server. I used to
           | have a project that did lots of simulation runs on my big
           | 32-core server and then aggregated the logs into ClickHouse
           | for analysis. It wasn't that hard.
           | 
           | However, they (at least at the time most of the developers
           | were at Apple, many have now moved to Snowflake and the Apple
           | team has grown a little I think) haven't released or
           | integrated their nightly cluster and performance testing
           | systems into the open, nor have they integrated them with
           | GitHub Actions or nightly runs or anything. My understanding
           | is that
           | this is "just" a lot of compute cluster/platform
           | orchestration code on top of the tests that exist in the
           | repository. So, while Apple or Snowflake integrates changes
           | across hundreds of concurrent fuzzing simulations on whatever
           | platforms they have, if you write patches yourself, you're
           | stuck with long simulation runs. Maybe that's changed; I
           | haven't kept up since the 7.0 series.
           | 
           | In practice if you write patches and they accept them, they
           | will just do the testing in their runs for you, on a cluster
           | far larger than what you could have. Failure reports will
           | tell you how to reproduce them from the test files. As a
           | contributor, testing the system on your own is mostly a
           | matter of how much money or how many CPU cores you can
           | personally stand to set on fire.
           | 
           | Someone could probably integrate this functionality into a
           | Kubernetes operator or something so that outside engineers
           | could run large scale simulations reliably. But it is really
           | expensive and CPU/compute intense, no matter how you go about
           | it.
           | 
           | [1] https://forums.foundationdb.org/t/how-to-use-
           | foundationdb-un...
           | 
           | [2] https://github.com/apple/foundationdb/tree/main/tests
        
             | falsandtru wrote:
             | Those files are not the test implementations, just
             | specifications of the test cases and a few options. But I
             | found the implementations. I am not sure if this is all of
             | the simulation tests, but it seems to cover the basic
             | cases.
             | 
             | https://github.com/apple/foundationdb/tree/main/fdbserver/w
             | o...
             | 
             | > Someone could probably integrate this functionality into
             | a Kubernetes operator or something so that outside
             | engineers could run large scale simulations reliably. But
             | it is really expensive and CPU/compute intense, no matter
             | how you go about it.
             | 
             | Maybe this.
             | 
             | https://github.com/FoundationDB/fdb-joshua
        
               | aseipp wrote:
               | Yeah, that's basically an actually good implementation of
               | the pile of crap that I threw together several years ago
               | while writing a few patches. :)
               | 
                | And yes, I linked to the spec files because I feel there
                | actually isn't that much test code written in Flow; the
                | high-level specs in the .txt files can be mixed and
                | matched to create a lot of variety from a small number
                | of primitives, so that's really where all the good
                | stuff is. Implementation vs. interface, and all that.
        
         | samsquire wrote:
         | Thanks for sharing that and quoting an incredibly useful
         | snippet.
         | 
         | This is such an interesting topic!
         | 
         | Some thoughts:
         | 
         | * I wonder if the approach could be used to implement
         | debuggable replayability, with accurate tracing and profiling.
         | A bit like what verdagon is doing with Vale.
         | 
         | * It could be used to integrate the event loop with tracing
         | (rather than instrumentation with Jaeger)
         | 
         | * I really like the idea that "every object" is an event loop,
         | which reminds me of Microsoft Orleans with its actor model for
         | its grains.
         | 
         | * I am interested in actor and lightweight-thread
         | architectures.
         | 
         | * I am interested in the scalability of the nodejs event loop
         | architecture and Win32 desktop application programming.
         | 
         | * I think this approach could be used to test and simulate
         | microservices.
         | 
         | * Approach could be used to test GUIs with React Redux reducer
         | style.
        
         | ajmurmann wrote:
         | Working on a distributed key/value store myself, I couldn't
         | agree more and think what FoundationDB did for testing from the
         | start is absolutely the way to go. Testing distributed systems
         | is very tricky, and tests can be incredibly time-consuming and
         | bring everything to a halt.
        
       | ibotty wrote:
       | Every time I look at FoundationDB for replacing some Redis usage
       | I wonder about key expiry/TTL, look for it and find nothing.
       | 
       | Is this such a strange use case that there is not even a blog
       | entry about it, only some forum entries?
        
         | endisneigh wrote:
         | You would need to implement that yourself. It can easily be
         | done by storing tuples with your expiry timestamp; you could
         | then watch the keys to remove expired ones automatically. FDB
         | is very barebones by design. Alternatively (and easier):
         | 
         | https://forums.foundationdb.org/t/designing-key-value-expira...
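The pattern suggested above can be sketched like this. It is a toy illustration with a dict standing in for the FDB keyspace; the key layout and function names are my own, not the real fdb bindings. In real FDB the data write and the index write would share one transaction, and the sweep would run as a periodic job or be driven by a watch.

```python
# Toy sketch of the TTL-index pattern: next to each data key we write an
# index entry ("expires", expiry_ts, key). A sweep does a range scan over
# the index in timestamp order and deletes everything already expired.

def set_with_ttl(store, key, value, ttl, now):
    store[("data", key)] = value
    store[("expires", now + ttl, key)] = b""

def sweep_expired(store, now):
    """Delete every key whose expiry timestamp is <= now."""
    for idx in sorted(k for k in store if k[0] == "expires"):
        _, expiry_ts, key = idx
        if expiry_ts > now:
            break  # index is sorted by timestamp; the rest are still live
        del store[("data", key)]
        del store[idx]

store = {}
set_with_ttl(store, "session:1", b"alice", ttl=10, now=100)
set_with_ttl(store, "session:2", b"bob", ttl=60, now=100)
sweep_expired(store, now=120)       # session:1 expired at t=110
assert ("data", "session:1") not in store
assert store[("data", "session:2")] == b"bob"
```

Sorting the index by expiry timestamp is what makes the sweep cheap: it only touches expired entries plus one live one, rather than scanning every key.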
        
       ___________________________________________________________________
       (page generated 2023-07-03 23:00 UTC)