[HN Gopher] FoundationDB: A Distributed Key-Value Store ___________________________________________________________________ FoundationDB: A Distributed Key-Value Store Author : eatonphil Score : 218 points Date : 2023-07-03 13:34 UTC (9 hours ago) (HTM) web link (cacm.acm.org) (TXT) w3m dump (cacm.acm.org) | debussyman wrote: | I worked next to the founders a decade ago and tried the first | versions of the project (before Apple acq). Loved the concept, | but it hasn't really lived up to the promise. | neftaly wrote: | I've been tooling around with "Tuple Database", which claims to | be FoundationDB for the frontend (by the original dev of Notion). | | https://github.com/ccorcos/tuple-database/ | | I have found it conceptually similar to Relic or Datascript, but | with strong performance guarantees - something Relic considers a | potential issue. It also solves the problem of using reactive | queries to trigger things like popups and fullscreen requests, | which must be run in the same event loop as user input. | | https://github.com/wotbrew/relic | https://github.com/tonsky/datascript | | Having a full (fast!) database as my React state manager gives me | LAMP nostalgia :) | jwr wrote: | FoundationDB is absolutely incredible and I've been wondering why | it doesn't get more popular over time. I suspect it's too complex | to use directly in most applications, with people used to SQL-based | solutions or simple KV stores. | | I always wanted my app to use a fully distributed database (for | redundancy). I've been using RethinkDB in production for over 8 | years now. I'm slowly rebuilding my app to use FoundationDB. | | What I discovered when I started using FDB surprised me a bit. To | make really good use of the database you can't really use a | "database layer" and fully abstract it away from your app. Your | code should be fully aware of transaction boundaries, for | example. 
To make good use of versionstamps (an incredible | feature) your code needs to be somewhat aware of them. | | I think FDB is a great candidate for implementing a "user-friendly" | database on top of it, and in fact several databases | are doing exactly that (using FDB as a "lower layer"). But that | abstracts away too much, at least for me. | | The superficial take on FDB is "waah, where are my features? it | doesn't do indexing? waaah, just use Postgres!". | | But when you actually start mapping your app's data structures | onto FDB optimally, you discover a whole new world. For example, | I ended up writing my indexing code myself, in my language | (Clojure). FDB gives you all the tools, and a strict serializable | data model to work with -- your language brings _your_ data | structures and _your_ indexing functions. The combination is | incredible. Once you define your index functions in your | language, you will never want to look at SQL again. Plus, you get | incredible features like versionstamps -- I use them to replace | RethinkDB changefeeds and implement super quick polling for | recent changes. | | Oh, and did I mention that it is a fully distributed database | that _correctly_ implements the strict serializable consistency | model? There are _very_ few dbs that can claim that. If you | understand what that means, you probably know how incredible this | is and I don't have to convince you. If you _think_ you | understand, I suggest you go and explore | https://jepsen.io/consistency -- carefully reading and learning | about the differences in various consistency models. | | I really worry that FoundationDB will not become popular because | of its inherent complexity, while worse solutions (ahem, MongoDB) | will be more fashionable. | | I would encourage everyone to at least take a look at FDB. It | really is something quite different. 
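To make the "write your own index functions" idea above a little more concrete, here is a minimal Python sketch. It does not use the FoundationDB client at all: `OrderedKV` is a hypothetical in-memory stand-in for an ordered key-value store, `commit` stands in for an atomic transaction, and a plain integer counter stands in for a versionstamp. The key layouts (`"user"`, `"idx_city"`, `"changes"`) and the helper names are assumptions made up for illustration, and the sketch only handles inserts (it never removes stale index entries).

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple

Key = Tuple[Any, ...]


class OrderedKV:
    """Toy in-memory ordered key-value store (NOT the FoundationDB API)."""

    def __init__(self) -> None:
        self._keys: List[Key] = []       # always kept in sorted order
        self._vals: Dict[Key, Any] = {}
        self.version = 0                 # integer stand-in for a versionstamp

    def commit(self, writes: Dict[Key, Any], changed: Optional[Key] = None) -> int:
        """Apply all writes together and return the new commit version."""
        self.version += 1
        batch = dict(writes)
        if changed is not None:
            # Change-log entry keyed by commit version, so it sorts by time.
            batch[("changes", self.version)] = changed
        for key, value in batch.items():
            if key not in self._vals:
                bisect.insort(self._keys, key)
            self._vals[key] = value
        return self.version

    def scan(self, prefix: Key, start_after: Optional[Key] = None) -> List[Tuple[Key, Any]]:
        """Return (key, value) pairs under prefix, in key order."""
        if start_after is None:
            i = bisect.bisect_left(self._keys, prefix)
        else:
            i = bisect.bisect_right(self._keys, start_after)
        out = []
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            key = self._keys[i]
            out.append((key, self._vals[key]))
            i += 1
        return out


def save_user(db: OrderedKV, user_id: int, user: Dict[str, str]) -> int:
    """Write the record and its index entry in one commit (inserts only)."""
    writes = {
        ("user", user_id): user,
        ("idx_city", user["city"], user_id): b"",  # the index lives in the key
    }
    return db.commit(writes, changed=("user", user_id))


def users_in_city(db: OrderedKV, city: str) -> List[int]:
    """Index lookup is just a prefix range read."""
    return [key[2] for key, _ in db.scan(("idx_city", city))]


def changes_since(db: OrderedKV, last_seen: int) -> List[Tuple[int, Key]]:
    """Poll for changes committed after the version the caller last saw."""
    rows = db.scan(("changes",), start_after=("changes", last_seen))
    return [(key[1], what) for key, what in rows]
```

The point of the sketch is the shape, not the store: the application decides what an index entry looks like, writes it in the same transaction as the record, and reads it back with an ordered range scan. Polling `changes_since` with the last version seen is the same pattern jwr describes for replacing changefeeds.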
| manish_gill wrote: | Seems like a very similar use-case to Zookeeper - use it for | distributed coordination / consistency etc and build your | actual database on top of it. | brainzap wrote: | I think FoundationDB will also have parts written in Swift, at | least that is what Apple showed at WWDC. | crabmusket wrote: | That was Foundation, not FoundationDB. | https://developer.apple.com/documentation/foundation | jen20 wrote: | No, it was FoundationDB [1]. | | [1]: https://developer.apple.com/videos/play/wwdc2023/10164/?time... | romanhn wrote: | Back in 2014 or so, I saw the FoundationDB team demo the product | at a developer conference. They had the database running across a | bunch of machines, with a visual showing their health and data | distribution. One team member would then be turning machines on | and off (or maybe unplugging them from the network) and you could | see FDB effortlessly rebalancing the data across the available | nodes. It was a very striking, impressive presentation | (especially as we were dealing with the challenges of distributed | Cassandra at the time). | boxcarr wrote: | When I saw the post about Foundation DB, I remembered the exact | same demo running on a cluster of Raspberry Pi instances! | Sadly, no memory of it on YouTube. | boxcarr wrote: | Wayback Machine partially to the rescue. Found the page | (https://web.archive.org/web/20150325003301/https://foundatio...), | but the Vimeo video was nuked when foundationdb.com shut down. 
| Here's the HN thread about that demo: | | https://news.ycombinator.com/item?id=5739721 | romanhn wrote: | I feel like I saw something a bit more refined (I recall | node statuses aggregated on one cool UI), so this may have | been an earlier iteration, but the beginning of the | following video has some of what we're talking about: | https://youtu.be/Nrb3LN7X1Pg | jbverschoor wrote: | Around 2010-2013 (gaming), I found fdb, and to me it seemed like | the perfect database because of their architecture. I tried it a | bit, and was really happy with it. | | Unfortunately they were acquired by Apple, only to resurface | something like 10 years later. All momentum was gone, and I'm not | really aware nor interested in where they stand. I'll stick with | my rusty old Postgres for a long time before I'd try anything | else out. | mlerner wrote: | Really neat paper - a while ago I wrote a summary of the system: | https://www.micahlerner.com/2021/06/12/foundationdb-a-distri... | [deleted] | mrtracy wrote: | FoundationDB has, in my experience, always been well regarded in | DB development circles; I think their test architecture - | developed to easily reproduce rare concurrency failures - is its | best legacy, as mentioned in a comment above and frequently | before. | | However, since these topics are always filled with effusive | praise in the comments, let me give an example of a distributed | scenario where FDB has shortcomings: OLTP SQL. | | First, FDB is clearly designed for "read often, update rarely" | workloads, in a relative sense. It produces multiple consistent | replicas which are consistently queryable at a past time stamp, | without a transaction - excellent for that profile. 
However, its | transaction consistency method is both optimistic and | centralized, and can lead to difficulty writing during high | contention and (brief) system-wide transaction downtime if there | is a failover; while it will work, it's not optimal for "write | often, read once" workloads. | | Secondly, while it is an _ordered_ key value store - facilitating | building SQL on top of it - the popular thought of layering SQL | _on top of the distributed layer_ comes with many shortcomings. | | My key example of this is schema changes. Optimistic application, | and keeping schema information entirely "above" the transaction | layer, can make it extremely slow to apply changes to large | tables, and possibly require taking them partially offline during | the update. There are ways to manage this, but online schema | changes will be a competitive advantage for other systems. | | Even for read-only queries, you lose opportunities to push many | types of predicates down to the storage node, where they can be | executed with fewer round trips. Depending on how distributed | your system is, this could add up to significant additional | latency. | | Afaik, all of the spanner-likes of the world push significant | schema-specific information into their transaction layers - and | utilize pessimistic locking - to facilitate these scenarios with | competitive performance. | | For reasons like these, I think FDB will find (and has found) the | most success in warehousing scenarios, where individual datum are | queried often once written, and updates come in at a slower pace | than the reads. | mike_hearn wrote: | You can do online schema changes with FDB, it all depends on | what you do with the FDB primitives. | | A great example of how to best utilize FDB is Permazen [1], | described well in its white paper [2]. | | Permazen is a Java library, so it can be utilized from any JVM | language e.g. via Truffle you get Python, JavaScript, Ruby, | WASM + any bytecode language. 
It supports any sorted K/V | backend so you can build and test locally with a simple disk or | in memory impl, or RocksDB, or even a regular SQL database. | Then you can point it at FoundationDB later when you're ready | for scaling. | | Permazen is _not_ a SQL implementation. Instead it's "language | integrated" meaning you write queries using the Java | collections library and some helpers, in particular, | NavigableSet and NavigableMap. In effect you write and hard | code your query plans. However, for this you get many of the | same features an RDBMS would have and then some more, for | example you get indexes, indexes with compound keys, strongly | typed and enforced schemas with ONLINE updates, strong type | safety during schema changes (which are allowed to be | arbitrary), sophisticated transaction support, tight control | over caching and transactional "copy out", watching fields or | objects for changes, constraints and the equivalent of foreign | key constraints with better validation semantics than what JPA | or SQL gives you, you can define any custom data derivation | function for new kinds of "index", a CLI for ad-hoc querying, | and a GUI for exploration of the data. | | Oh yes, it also has a Raft implementation, so if you want | multi-cluster FDB with Raft-driven failover you could do that | too (iirc, FDB doesn't have this out of the box). | | And because the K/V format is stable, it has some helpers to | write in memory stores to byte arrays and streams, so you can | use it as a serialization format too. | | FDB has something a bit like this in its Record layer, but it's | nowhere near as powerful or well thought out. Permazen is | obscure and not widely used, but it's been deployed to | production as part of a large US 911 dispatching system and is | maintained. 
| | Incremental schema evolution is possible because Permazen | stores schema data in the K/V store, along with a version for | each persisted object (row), and upgrades objects on the fly | when they're first accessed. | | [1] https://permazen.io/ | | [2] | https://cdn.jsdelivr.net/gh/permazen/permazen@master/permaze... | SamReidHughes wrote: | 100%. I don't have the time to read the paper but online | schema changes, with the ability to fail and abort the entire | operation if one row is invalid, are basically the same | problem as background index building. | | If instead of using some generic K/V backend, it made use of | specific FDB features, it might be even better. Conflict | ranges and snapshot reads have been useful for me for some | background index building designs, and atomic ops have their | uses. | | > Oh yes, it also has a Raft implementation, so if you want | multi-cluster FDB with Raft-driven failover you could do that | too (iirc, FDB doesn't have this out of the box). | | I don't know what you mean by this. Multiple FDB clusters? | mike_hearn wrote: | It supports atomic ops and snapshot reads. Don't remember | about conflict ranges. It doesn't require all backends to | be identical, it supports a kind of graceful degradation | when backends don't have all the features. The creator is | quite keen on FDB and made sure Permazen works well with | it. | | Yes multiple FDB clusters. IIRC FDB replication doesn't | support full geo-replication, or didn't. There's a post by | me about it somewhere on their forums. | Dave_Rosenthal wrote: | I totally agree with your high level point that there isn't a | great SQL (OLTP, or otherwise) layer for FoundationDB. Building | something like this would be very hard--but I don't think the | FoundationDB storage engine itself would end up inflicting the | limitations you mention if it was well executed. And | FoundationDB _was_ specifically designed for real-time | workloads with mixed reads /writes (i.e. 
the OLTP case). | | Whether or not concurrency is optimistic (or done with locks, | or whatever) doesn't really have a bearing on things. Any | database is going to suffer if it has a bunch of updates to | specific hot keys that need to be isolated (in the ACID | sense). As long as your reads and writes are sufficiently | spread out you'll avoid lock contention/optimistic transaction | retries. | | You speak to the real main limitation of FoundationDB when you | talk about stuff like schema changes. There is a five-second | transaction limit which in practice means that you cannot, for | example, do a single giant transaction to change every row in a | table. This was definitely a deliberate design | choice, but not one without tradeoffs. The bad side is that if | you want to be able to do something like this (lock out clients | while you migrate a table) you need a different design that | uses another strategy, like indirection. The good side is that | screwed-up transactions that lock big chunks of your DB for a | long time don't take down your system. | | I find that the people who are relatively new to databases tend | to wish that the five second limit was gone because it makes | things simpler to code. People that are running them in | production tend to like it more because it avoids a slew of | production issues. | | That said, I think for many situations a timeout like 30 or 60 | seconds (with a warning at 10) would be a better operating | point rather than the default 5 second cliff. | mrtracy wrote: | I think that the SQL-on-top, and optimistic model, are | definitely things that can have a workflow-dependent | performance impact and are relevant. | | All databases do suffer under some red line of write | contention; but optimistic databases will suffer _more_, and | will start degrading at a _lower level of contention_. 
| "Avoiding contention" is database optimization table stakes, | and you should be structuring every schema you can to do so; | but hot keys are almost inevitable when a certain class of | real-time product scales, and they will show up in ways you | do not expect. When it happens, you'd like your DBMS to give | as much runway as possible before you have to make the tough | changes to break through. | | SQL-on-top becomes an issue for geographic distribution; | without "pushing down" predicates, read-modify-write | workloads, table joins, etc. on the client can incur | significant round-trip time issuing queries. I think the lack | of this is always going to present a persistent disadvantage | vs selecting a competitor. | | And again, given FDB's multiple-full-secondary model, it's | only a problem when working in real time, slower queries can | work off a local secondary. But latest-data-latency is | relevant for many applications. | aseipp wrote: | FWIW, I believe read transactions are unlimited in duration | now that the Redwood engine has been available. But I haven't | tested Redwood myself. Write transactions are still | definitely limited to 5 seconds, though. | gregwebs wrote: | TiDB uses TiKV as an equivalent to FoundationDB. It supports | online migrations and pushing down read queries to the kv | layer. It also defaults to optimistic locking, but supports | pessimistic. It also doesn't have a five second rate | transaction limit. A SQL layer on top of FoundationDB could | probably solve all these problems and it wouldn't be novel. | preseinger wrote: | do you think the things you mention were deliberate design | decisions? | mike_hearn wrote: | Yes, one of the nice things about FDB is it has extensive | design docs. Optimizing for reading more often than writing | is obviously a pretty normal design choice, outside of log | ingestion you'll normally be reading more than writing. There | are people using FDB for logs (snowflake iirc?) 
and it's been | optimized for that sort of use case more in recent years, but | it's not like it was an unreasonable choice. | aseipp wrote: | Snowflake uses FoundationDB for warehouse metadata in the | control plane, IIRC. It is not in the data plane path for | log ingestion or other warehousing tech. That said the | control plane is, uh, pretty important! | mrtracy wrote: | They absolutely were, yes. There are very valuable | application profiles where FoundationDB's design is | excellent, and you can see that from its internal usage at | large companies like Apple and Snowflake. | monstrado wrote: | I built an online / mutable time-series database using FDB a few | years back at a previous company. Not only was it rock solid, but | it scaled linearly pretty effortlessly. It truly is one of the novel | modern pieces of technology out there, and I wish there were | more layers built on top of it. | georgelyon wrote: | FoundationDB is a truly one-of-a-kind bit of technology. Others | have already linked to the testing methodology that allows them | to run orders of magnitude more database hours in test than have | run in production: https://www.youtube.com/watch?v=4fFDFbi3toc | | A lesser-known but also great talk is the follow-up, which talked about | what a few of the team worked on next, effectively trying to | generalize the methodology to any computer program: | https://www.youtube.com/watch?v=fFSPwJFXVlw | | I liken the approach to being able to fuzz the execution space of | the program, not just the inputs. | [deleted] | jeffbee wrote: | How hard have people pushed this thing? We get regular threads of | effusive praise, but little criticism. 
Last time I mentioned that | years ago my colleagues found half a dozen ways to lose data in | FDB I got called out here and even in private emails, but it | seems more valuable to know where the limits of these systems | are, and not very valuable to read the positive feelings of | people who used FDB in trivial and uncritical ways. | ryanworl wrote: | FoundationDB is used at Datadog as the metadata store for | Husky, the storage and query engine powering a significant | number of Datadog products, such as logs, network performance | monitoring, and trace analytics. | | 1. https://www.datadoghq.com/blog/engineering/introducing- | husky... | | 2. https://www.datadoghq.com/blog/engineering/husky-deep-dive/ | | 3. https://www.youtube.com/watch?v=mNneCaZewTg | | 4. https://www.youtube.com/watch?v=1-zo9jqdRZU | | I was involved with this project from the beginning and it | would've taken significantly longer to deliver without | FoundationDB. | jeffbee wrote: | I know there are multiple companies that use it. The question | is not whether people put things into FDB. The question is | whether anyone has checked to see if their junk was still | there later. I don't consider large scale deployments to be | proof of anything. When I worked on Gmail we were still | finding data-loss bugs in either BigTable or Colossus | regularly, even after those systems had been the largest | datastores on the planet for many years. | [deleted] | eatonphil wrote: | Is Snowflake big enough of a deal? | | https://news.ycombinator.com/item?id=16880404 | | Also, in the post itself, authors including Apple and Snowflake | devs, it mentions it's run in production by Apple and | Snowflake. | | I haven't seen yet though what Apple uses it for. | tilolebo wrote: | It is used by CloudKit | | https://machinelearning.apple.com/research/foundationdb- | reco... 
| jeffbee wrote: | The time at which my colleagues found easy ways to lose data | was well after Apple had claimed to use it in iCloud at | scale. So, I don't think deployment at scale is a proof of | correctness. The thing that needs doing is regularly looking | in the database for things that should be there. | endisneigh wrote: | I'm curious - could you elaborate on the circumstances? | Like the version of FDB, cluster size, network | circumstances, etc? | Dave_Rosenthal wrote: | Yes, there are definitely a lot of big companies that have used | FoundationDB very hard at huge scale for many years. That said, | yeah, it feels like there are also a lot of folks on HN who | just jump on the "cool, fault simulation" bandwagon and don't | have a lot of personal real-world experience. | | What I can tell you, for sure, is that if you find an issue | with something as important and fundamental as data loss the | team working on FoundationDB would take it super seriously. | cetinsert wrote: | https://deno.com/deploy is building | | https://deno.com/kv on FoundationDB! | qaq wrote: | Surprised no one used it as a foundation for a NewSQL DB, the thing | is battle tested and actively developed by Apple and Snowflake. | danpalmer wrote: | I think I remember the FDB team developing one that was closed | source back before their acquisition. I thought the business | model was going to be open core and closed, paid, layers on | top. I seem to remember them benchmarking the SQL layer and it | being highly performant still, despite the complexity it added. | | Maybe this thing still exists in closed source form at Apple? It | wouldn't surprise me if it does and forms the basis of a | Spanner alternative, they're big enough to need it. Or maybe | they canned it pre/post acquisition. | | Edit: ah, you've already mentioned the closed source layer that | exists at Apple. There we go! 
| endisneigh wrote: | There's https://www.tigrisdata.com/ | | It's similar to mongo (it's nosql) | eatonphil wrote: | Not a NewSQL database though as GP mentioned. I don't think | Tigris has a SQL layer. | endisneigh wrote: | Yes I know. I explicitly said it was similar to mongo. Just | responding to the bit that it's battle tested and used as a | foundation (no pun intended) for another db. As far as I | know it's the only database that has a company around it | that is using FDB | qaq wrote: | there was a poc of sqlite on top of FDB. There is also a sql | layer that Apple did not open source that they use at scale. | Just seems a wasted opportunity. | endisneigh wrote: | It's because you introduce a lot of latency. Cockroachdb | for example (which is a great db) has a lot of latency | compared to Postgres. | | At the time of its release it was probably hard to justify | having an order of magnitude more latency than competitors | (of course they were not fault tolerant, but still). | riku_iki wrote: | hypothetically, you can run cockroach with replication | factor 1, and also have low latency and an apples to apples | comparison. | canadiantim wrote: | I know some people have had success using FoundationDB as a KV | store with SurrealDB[1] | | [1] https://github.com/orgs/surrealdb/discussions/25 | qaq wrote: | that's a document-graph database though | endisneigh wrote: | I've been using FDB for toy projects for a while. It's truly rock | solid. In my experience it's the best open source database I've | used, including mariadb, Postgres and cockroach. That being said, | I wish there were more layers as the functionality out of the box | is very very limited. | | Ideally someone could implement the firestore or dynamodb api on | top. | | https://github.com/losfair/mvsqlite | | Is basically distributed SQLite backed by FDB. I've been scared | to use it since I don't know rust and can't attest to if mvcc had | been implemented correctly. 
| | In using this I actually realized how coupled the storage engine | is to the storage system and how few open source projects make | the storage engine easily swap-able. | fhrow4484 wrote: | > That being said, I wish there were more layers as the | functionality out of the box is very very limited. | | The record layer https://github.com/FoundationDB/fdb-record-layer | which allows storing protobuf, and defining the primary | keys and indexes directly on those proto fields is truly amazing: | | https://github.com/FoundationDB/fdb-record-layer/blob/main/d... | facu17y wrote: | mvcc is already taken care of by fdb, no? | endisneigh wrote: | Yea, but mvsqlite implements its own to get around the | limitations around transactions. | tommiegannert wrote: | I really wanted to use FoundationDB for building a graph | database, but was taken aback by the limitations in record | (10+100 kB) and somewhat transaction sizes (10 MB) [1]. And the | documentation [2] doesn't really give any answers other than "build | it yourself." | | mvsqlite seems to improve the transaction size [3], which is | nice. Does it also improve the key/value limitations? | | > Transaction size cannot exceed 10,000,000 bytes of affected | data. [---] Keys cannot exceed 10,000 bytes in size. Values | cannot exceed 100,000 bytes in size. | | [1] https://apple.github.io/foundationdb/known-limitations.html | | [2] https://apple.github.io/foundationdb/largeval.html | aseipp wrote: | Transaction size and duration is limited to keep the latency | and throughput of the system manageable under load, from my | understanding. It makes sense to some degree even with no | background in the design; if you are serving X/rps with a | latency of Y milliseconds, using Z resources, and you double | Y, you now need to double your resources Z as well, to serve | the same amount of clients. You always hit a cap somewhere, | so if you want consistent throughput and latency, it's maybe | not a bad tradeoff. 
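A common way to live with the key and value limits quoted above is to split a large value across several keys and to replace an oversized key with a fixed-size hash. The Python sketch below only builds the key/value pairs; it never talks to a database, and the helper names (`split_value`, `join_value`, `bounded_key`) and the `(name, chunk_index)` key layout are assumptions for illustration. In a real store all chunks of one value would be written and read inside a single transaction so readers never see a partial value.

```python
import hashlib
from typing import Dict, Tuple

VALUE_LIMIT = 100_000  # bytes, the value limit quoted in the thread
KEY_LIMIT = 10_000     # bytes, the key limit quoted in the thread

ChunkKey = Tuple[bytes, int]  # hypothetical layout: (name, chunk index)


def split_value(name: bytes, blob: bytes, chunk: int = VALUE_LIMIT) -> Dict[ChunkKey, bytes]:
    """Map one large value to several (name, index) -> chunk pairs."""
    return {
        (name, offset // chunk): blob[offset:offset + chunk]
        for offset in range(0, len(blob), chunk)
    }


def join_value(pairs: Dict[ChunkKey, bytes], name: bytes) -> bytes:
    """Reassemble the chunks for name in index order."""
    parts = sorted((key[1], data) for key, data in pairs.items() if key[0] == name)
    return b"".join(data for _, data in parts)


def bounded_key(raw: bytes, limit: int = KEY_LIMIT) -> bytes:
    """Return raw unchanged if it fits, otherwise its 32-byte SHA-256 digest."""
    return raw if len(raw) <= limit else hashlib.sha256(raw).digest()
```

Two trade-offs are worth keeping in mind with this sketch: a hashed key no longer sorts next to its neighbours, so it cannot take part in meaningful range reads, and the total size of all chunks written together still counts against the transaction size limit.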
| | mvsqlite fixes the transaction size through its own | transaction layer, from my understanding; I don't know how | that would impact performance. The 10kb/100Kb key value limit | is probably not fixable in any way, but it's not really a | huge problem as a user in practice for FDB because you can | just shard the value across two keys in a consistent | transaction and it's fine. 10 kilobyte keys have pretty much | never ever been an issue in my cases either; you can | typically just do something like hash a really big key before | insert and use that. | tanepiper wrote: | A few years ago I was working at an agency, one of their teams | was building a real-time gaming system on top of FoundationDB. | | Apple then bought it up and shut the open source down. They had | to rebuild whole layers from scratch. | Dave_Rosenthal wrote: | Yeah, that sucked for sure and we hated to disappoint people | like that (co-founder here). But you have it exactly backwards. | FoundationDB was never open source. There was a binary that you | could download and use as a trial, or you could buy a license | for real use. The users that bought licenses got to keep using | those licenses. Some of those customers went on to build | billion-dollar businesses on top of FoundationDB (Snowflake!) A | few years after acquiring the tech Apple themselves open | sourced it (!) so now it is open source. The big challenge for | users is that most of the sophisticated "layers" that make the | tech into more of an easy-to-use database rather than just a | storage engine are still proprietary. | 58028641 wrote: | As far as I can tell, FoundationDB was never open source until | Apple open sourced it. | tanepiper wrote: | At least one reference on here from 2018 - | https://news.ycombinator.com/item?id=16878786 | | And here's a news story - https://www.forbes.com/sites/benkep | es/2015/03/25/a-cautionar... | detaro wrote: | ... which both state that it wasnt open-source before the | apple buyout. 
| endisneigh wrote: | It was never open source before apple. Rather the binary | was freely available to be used. When apple bought them | they took it away but continued to support customers with | contracts. In that way it was inaccessible until it was | open sourced. | metadat wrote: | Yes, I got bitten by this and will never forget- FDB | abruptly shut off public access in mid-2015. Fortunately | for me, it only cost half a day to migrate my system to | Postgres. | stephenr wrote: | Hey now, don't let verifiable facts and observed history get | in the way of a chance to bash Apple. | hadjian wrote: | His jokes are hilarious! | mprime1 wrote: | Part of the FDB team (great folks) went on to create something | quite incredible I have the pleasure of having early access to. | If you're into dependability check this out: | https://antithesis.com/ | leetrout wrote: | What always fascinated me is they built the simulator for the | database first(ish) and relied on it as a first class citizen | while building the DB: | | https://www.youtube.com/watch?v=4fFDFbi3toc | | > We wanted FoundationDB to survive failures of machines, | networks, disks, clocks, racks, data centers, file systems, etc., | so we created a simulation framework closely tied to Flow. By | replacing physical interfaces with shims, replacing the main | epoll-based run loop with a time-based simulation, and running | multiple logical processes as concurrent Flow Actors, Simulation | is able to conduct a deterministic simulation of an entire | FoundationDB cluster within a single-thread! Even better, we are | able to execute this simulation in a deterministic way, enabling | us to reproduce problems and add instrumentation ex post facto. | This incredible capability enabled us to build FoundationDB | exclusively in simulation for the first 18 months and ensure | exceptional fault tolerance long before it sent its first real | network packet. 
For a database with as strong a contract as the | FoundationDB, testing is crucial, and over the years we have run | the equivalent of a trillion CPU-hours of simulated stress | testing. | | https://pierrezemb.fr/posts/notes-about-foundationdb/ | riwsky wrote: | "The Jepsen is coming... from INSIDE THE HOUSE!" | AaronFriel wrote: | When I was writing a Haskell client library for Hyperdex, | another distributed kv store, I found it incredibly helpful to | implement a simulator for correctness. This helped me identify | which behavior was unspecified (arithmetic overflow: should | error) or where my simulator deviated. | | https://github.com/AaronFriel/hyhac/blob/master/test/Test/Hy... | | Alas, I think Hyperdex development paused a few years later. | It's a shame that it stopped then. | pavlov wrote: | For some types of distributed systems, you can do this kind of | simulated testing in advance by building a TLA+ model. | | It's not a full-blown simulator (because generally the | application code doesn't even exist yet when you're building | the TLA+ model). But it can let you collect data and validate | assumptions about your design before writing a single line of | code. | rockwotj wrote: | My beef with TLA+ is that it's not the same code, so while | you're testing the design yes, you aren't testing the | implementation of the design, which is just as important (if | not harder too) to get right. | aseipp wrote: | Yes, but there really aren't too many good solutions to | that that aren't either extremely language or domain | specific. And if you're careful you can get a lot of direct | mileage out of it. For example, MongoDB (yes, that one!) | used it in the development of their Atlas system and has a | paper about using TLA+ to model the system, characterize | behaviors, then generate compilable-code test cases from | those minimal set of behaviors -- which are then directly | linked against the core internals of the Atlas codebase as | a client library. 
They then run those tests and re-generate | them when the model changes. "Model based test case | generation" is the strategy here. So you can characterize | what happens in split brain scenarios, state machine | transition failures (conflicting transactions), etc. | | In reality the design stage is a pretty critical phase so | you need all the help you can get, so even if you don't | like TLA+ you're way better off than not modeling at all. | | As an example of the language specific thing, though, | there's a library for Haskell I like that's very cool, | called Spectacle, which also implements the temporal logic | of TLA+ along with a model checker, but as a Haskell DSL. | An interesting benefit of this is that you can model check | actual real Haskell code that runs e.g. in your services, | but I haven't taken this very far. There are also | alternative solutions like Stateright for Rust. But again, | not everyone has the benefit of these... | bigfish24 wrote: | Do you know the paper for Atlas? | aseipp wrote: | Yes, I managed to find it: "eXtreme Modeling in Practice" | https://arxiv.org/pdf/2006.00915.pdf | | Unfortunately I got the product wrong; it was not Atlas, | it was Realm Sync. All of the test-case generation stuff | is in Section 5. | pavlov wrote: | Yes, the model is more like an executable form of | documentation. There's no guarantee that code comments | match what the code actually does; similarly there's no | guarantee that the TLA+ model matches what the system does. | | Documentation is still generally useful, and so is a model. | You have to be committed to keeping both up to date as the | code evolves. | falsandtru wrote: | I'm loving this point. The unfortunate thing is those tests are | closed source (I saw a maintainer says so probably in an issue | before). It seems testable but still seems to be closed source. | So we cannot fork the project even if FDB becomes totally | closed source again. 
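For readers who have not seen deterministic simulation before, the idea quoted earlier in this subthread (swap the real network and clock for shims, run everything on one thread, and make a seed the only source of randomness) can be shown with a toy. This Python sketch is not FoundationDB's Flow-based simulator and shares no code with it; `simulate`, its parameters, and the trace format are all made up for illustration.

```python
import heapq
import random
from typing import List, Tuple


def simulate(seed: int, n_messages: int = 20, drop_rate: float = 0.2) -> List[Tuple[int, str]]:
    """Run a toy single-threaded network simulation; same seed, same trace."""
    rng = random.Random(seed)               # the only source of randomness
    clock = 0                               # simulated time, never the wall clock
    in_flight: List[Tuple[int, int, str]] = []  # heap of (deliver_at, seq, payload)
    trace: List[Tuple[int, str]] = []

    # All messages are sent at simulated time 0; the "network" shim decides
    # whether each one is dropped or how long it is delayed.
    for seq in range(n_messages):
        if rng.random() < drop_rate:
            trace.append((clock, f"drop msg{seq}"))
            continue
        delay = rng.randint(1, 50)
        heapq.heappush(in_flight, (clock + delay, seq, f"msg{seq}"))

    # Time-based run loop: jump the clock straight to the next event.
    while in_flight:
        clock, _, payload = heapq.heappop(in_flight)
        trace.append((clock, f"deliver {payload}"))
    return trace
```

Because nothing in the loop reads a real clock or a real socket, a failing run can be replayed exactly by reusing its seed, which is the property that makes it practical to add instrumentation after the fact and rerun the same failure.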
| aseipp wrote: | No, the simulation harness and tests are open source and you | can run them. It would be impossible for anyone to contribute | without it anyway; Snowflake, for example, depends heavily | on it. It's built into the server binary directly, so | the same code is always used, and it's simply a different | operational mode when compared to the real server. I used to | have a project to do lots of simulation runs on my big 32-core | server and then aggregate the logs into ClickHouse for | analysis. It wasn't that hard. | | However, they (at least at the time, when most of the developers | were at Apple; many have now moved to Snowflake and the Apple | team has grown a little, I think) haven't released or | integrated their nightly cluster and performance testing | systems into the open, nor have they integrated them with GitHub | Actions or nightly runs or anything. My understanding is that | this is "just" a lot of compute cluster/platform | orchestration code on top of the tests that exist in the | repository. So, while Apple or Snowflake integrates changes | across hundreds of concurrent fuzzing simulations on whatever | platforms they have, if you write patches yourself, you're | stuck with long simulation runs. Maybe that's changed; I | haven't kept up since the 7.0 series. | | In practice if you write patches and they accept them, they | will just do the testing in their runs for you, on a cluster | far larger than what you could have. Failure reports will | tell you how to reproduce them from the test files. As a | contributor, testing the system on your own is mostly a | matter of how much money or how many CPU cores you can | personally stand to set on fire. | | Someone could probably integrate this functionality into a | Kubernetes operator or something so that outside engineers | could run large scale simulations reliably. But it is really | expensive and CPU/compute intense, no matter how you go about | it.
| | [1] https://forums.foundationdb.org/t/how-to-use- | foundationdb-un... | | [2] https://github.com/apple/foundationdb/tree/main/tests | falsandtru wrote: | Those files are not the implementations of the tests; they just | specify the test cases and a few options. But I found | the implementations. I am not sure if these are all of the | simulation tests, but they seem to cover the basic cases. | | https://github.com/apple/foundationdb/tree/main/fdbserver/w | o... | | > Someone could probably integrate this functionality into | a Kubernetes operator or something so that outside | engineers could run large scale simulations reliably. But | it is really expensive and CPU/compute intense, no matter | how you go about it. | | Maybe this. | | https://github.com/FoundationDB/fdb-joshua | aseipp wrote: | Yeah, that's basically an actually good implementation of | the pile of crap that I threw together several years ago | while writing a few patches. :) | | And yes, I linked to the spec files because I feel there | actually isn't that much test code written in Flow; | the high-level specs in the .txt files can be mixed | and matched so much to create a lot of variety from some | small number of primitives, so that's really where all | the good stuff is. Implementation vs interface, and all | that. | samsquire wrote: | Thanks for sharing that and quoting an incredibly useful | snippet. | | This is such an interesting topic! | | Some thoughts: | | * I wonder if the approach could be used to implement | debuggable replayability, with accurate tracing and profiling. | A bit like what verdagon is doing with Vale. | | * It could be used to integrate the event loop with tracing | (rather than instrumentation with Jaeger) | | * I really like the idea that "every object" is an event loop, | which reminds me of Microsoft Orleans with its actor model for | its grains. | | * I am interested in actor and lightweight thread | architectures.
| | * I am interested in the scalability of the Node.js event loop | architecture and Win32 desktop application programming. | | * I think this approach could be used to test and simulate | microservices. | | * The approach could be used to test GUIs with React Redux reducer | style. | ajmurmann wrote: | Working on a distributed key/value store myself, I couldn't | agree more and think what FoundationDB did for testing from the | start is absolutely the way to go. Testing distributed systems | is very tricky and tests can be incredibly time consuming and | bring everything to a halt. | ibotty wrote: | Every time I look at FoundationDB for replacing some Redis usage | I wonder about key expiry/TTL, look for it and find nothing. | | Is this such a strange use case that there is not even a blog | entry about it, only some forum entries? | endisneigh wrote: | You would need to implement that yourself. It can easily be done | by storing tuples that include your expiry date. You could then watch | the keys to remove expired keys automatically. FDB is very | barebones by design. Alternatively (and easier): | | https://forums.foundationdb.org/t/designing-key-value-expira... ___________________________________________________________________ (page generated 2023-07-03 23:00 UTC)
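The "store tuples with your expiry date" suggestion in the last reply can be sketched roughly as follows. This is a minimal illustration, not FoundationDB client code: `OrderedKV` is a hypothetical in-memory stand-in for an ordered key-value store, and the `("data", key)` / `("expiry", expires_at, key)` layout, the integer timestamps, and all function names are assumptions made for the example. The idea is to keep a second index ordered by expiry time so that a sweeper can find every expired key with a single range read.

```python
import bisect


class OrderedKV:
    """In-memory stand-in for an ordered KV store (not the FDB client)."""

    def __init__(self):
        self._keys = []   # tuple keys, kept sorted
        self._vals = {}

    def set(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key):
        return self._vals.get(key)

    def clear(self, key):
        if key in self._vals:
            del self._vals[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def range(self, begin, end):
        """Return a copy of all keys k with begin <= k < end, in order."""
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        return self._keys[lo:hi]


def set_with_ttl(db, key, value, now, ttl):
    old = db.get(("data", key))
    if old is not None:
        # Drop the stale index entry so the old deadline cannot fire.
        db.clear(("expiry", old[1], key))
    expires_at = now + ttl
    db.set(("data", key), (value, expires_at))
    db.set(("expiry", expires_at, key), b"")


def get_live(db, key, now):
    row = db.get(("data", key))
    if row is None or row[1] <= now:
        return None  # missing, or expired but not swept yet
    return row[0]


def sweep_expired(db, now):
    """Delete every key whose expiry time is <= now; return how many."""
    expired = db.range(("expiry",), ("expiry", now + 1))
    for index_key in expired:
        db.clear(("data", index_key[2]))
        db.clear(index_key)
    return len(expired)


if __name__ == "__main__":
    db = OrderedKV()
    set_with_ttl(db, "session:1", "alice", now=100, ttl=10)
    print(get_live(db, "session:1", now=105))
    print(sweep_expired(db, now=110))
```

On a real cluster each of these functions would presumably run inside one transaction so that the data row and its index entry change together, and the sweep would be triggered periodically or by a watch, as the reply suggests. Checking the deadline on read in `get_live` keeps results correct even when the sweeper lags.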