[HN Gopher] What's the big deal about embedded key-value databases? ___________________________________________________________________ What's the big deal about embedded key-value databases? Author : eatonphil Score : 96 points Date : 2022-08-23 15:58 UTC (7 hours ago) (HTM) web link (notes.eatonphil.com) (TXT) w3m dump (notes.eatonphil.com) | eis wrote: | A few more entries that might be of interest: * | DynamoDB and the Dynamo KV store * LMDB (embedded kv) | * Dgraph (distributed graph db) and its embedded kv store | BadgerDB | didgetmaster wrote: | I am building a general-purpose data management system called | Didgets (https://didgets.com/) that extensively uses KV stores | that I invented. Since it was primarily designed to be a file | system replacement, I used them for attaching contextual meta- | data tags to file objects. | | My whole container started to look like a sparsely populated | relational table where every row/column intersection could have | multiple values (e.g. a photo could have a tag for every person | in the picture attached). I started experimenting with using the | KV stores as columns to form regular relational tables. | | It turns out that it was relatively easy and was extremely fast. | I started building tables with 50+ million rows and many columns | and performing queries against them. Benchmarking the system | against other databases revealed that it was very fast (and | didn't need separate indexes to accomplish this). | | Here is a video showing how it does a bunch of queries 10x faster | than the same data stored in a highly indexed table in Postgres: | https://www.youtube.com/watch?v=OVICKCkWMZE | atmin wrote: | No mention of SQLite as an embedded SQL database? | eatonphil wrote: | This post is about key-value stores. | | While foundationdb uses SQLite I didn't otherwise think of | SQLite as being relevant here. :) | morelisp wrote: | "Time is a flat circle." - someone at Sleepycat, probably. 
| rad_gruchalski wrote: | This is a good read. By the way, Kafka Streams is also built on | top of RocksDB. Not strictly a database but relevant to a certain | extent. | Xeoncross wrote: | I highly recommend people comfortable with Go check out the | building blocks at https://github.com/thomasjungblut/go-sstables | | This codebase shows how SSTables, WAL, memtables, recordio, | skiplists, segment files, and other storage engine components | work in a digestible way. Includes a demo database showing how it | all comes together to make a RocksDB / LevelDB competitor (not | really). | tristan957 wrote: | I work on a storage engine at $dayJob. We have created a | connector for MongoDB, although for a very ancient version. We | are currently working with $cloudProvider to use our storage | engine in their cloud DBaaS offerings. | | This field is pretty interesting when you're talking about | performance vs space amp vs write amp vs read amp. | aviramha wrote: | Great article! One cool thing about RocksDB: it's actually even | used in other KV databases such as Redis on Flash | https://redis.com/blog/hood-redis-enterprise-flash-database-... | eatonphil wrote: | Yup, FB's ZippyDB [0] is another example mentioned in the | article. | | [0] https://engineering.fb.com/2021/08/06/core-data/zippydb/ | | Edit: I've added Redis Enterprise Flash to the list now. | Thanks! | dboreham wrote: | The article misses the point. All data storage and query | systems end up architected in layers. Upper layers deal with | higher abstractions (objects, rows, whatever). Lower layers | deal with simpler functions, closer to the hardware. The upper | layers are consumers of the lower layers. This is where | "embedded KV stores" like LevelDB, RocksDB, etc. come from. They | began as the embedded storage layer for some bigger thing. | Every product you think of as a database or document store is | built like this, including MySQL and PostgreSQL and Oracle. 
| Such a storage layer, shipped as an independent library, is how | you (or anyone) builds your own database-ish thing. That's what | the article should say. | | The list of examples is odd. For instance MongoRocks is cited | for using RocksDB, but actual stock MongoDB uses WiredTiger, | which isn't mentioned. | | Disclosure: I played a part in the late-beginning of this space | when Netscape funded Sleepycat to develop BerkeleyDB. dbm and | ndbm existed beforehand, but BerkeleyDB used in LDAP servers is, | I think, the genesis point for this pattern as it exists today. | eatonphil wrote: | If there's a difference between what you wrote and what I | wrote I'm missing it. | | But you're also welcome to write your own post. :) | morelisp wrote: | I do feel like there's a historical perspective missing | from the article which the GP touches on. Embedded KV | stores aren't new (although some of the algorithms behind | the current crop certainly are). They used to dominate | "backend" software development until their popularity waned | as the world got obsessed with "model the domain, damn the | computation cost" (because all resources were doubling or | more yearly) followed by "we'll just distribute it". | | The need for parallelism killed the first approach and the | cost of increasingly complex reduce steps killed the | second. Now we're back to "how much can we fit in RAM on a | local machine" and it turns out, if you can still bang bits | for smart key formats, a hell of a lot. | galaxyLogic wrote: | > Upper layers deal with higher abstractions (objects, rows, | whatever) | | Right, I'm waiting for a standard for a level above relational | databases, which is Object-databases. I know there are several | ones already and there are Object-Relational mapping layers. | | I think the key point there is that Object databases are a | level ABOVE relational databases. 
They are not "better" but | they deal with the higher level of objects rather than | "tables", just like relational databases can be seen as | a level above key-value stores. | | I would like Object databases to become better and easier to | use and more standardized. | | I think there is value in being able to see both levels, the | objects, and the relational data that makes up the objects. | morelisp wrote: | Neither objects nor relations are "above" the other. You | can map them in a vacuous mathematical sense, but it's a | massively leaky abstraction in either direction. | eatonphil wrote: | Some concrete examples: | | 1. Yugabyte's relational query layer sits on top of a | document store (DocDB): | https://www.yugabyte.com/blog/how-we-built-a-high- | performanc.... | | 2. You can put documents in a PostgreSQL JSON(B) column. | nicholasjarnold wrote: | > They began as the embedded storage layer for some bigger | thing. | | I immediately thought of Kafka's streaming query stuff when I | read the headline (ksqlDB). I'm not sure if that's the origin | story of RocksDB, but it's the storage engine underlying that | streaming query tooling in Kafka's ecosystem. | eis wrote: | TiKV is not an embedded key-value store, it is distributed. | eatonphil wrote: | Thanks! Fixed and attributed you at the end. | x3n0ph3n3 wrote: | My team has a use-case that involves a precomputed RocksDB | database saved on an AWS EFS volume that is mounted on a lambda | with hundreds to thousands of invocations per second. It allows for some | extremely fast querying of relatively static data. Another | process is responsible for periodically updating the database and | writing it back to the EFS volume. | samsquire wrote: | With Rockset's converged indexes and an SQL query optimiser you | can build an SQL database. | | https://rockset.com/blog/converged-indexing-the-secret-sauce... | | Rockset's converged indexes + denormalisation means you can have | fast querying. 
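To make the converged-index idea in the comment above a bit more concrete, here is a minimal Python sketch of the general technique: every field of a document is written to one ordered key-value store under three key layouts (row, column, and inverted), so the same data can be fetched by document, scanned by field, or looked up by value. Everything named here is an assumption for illustration: `SortedKV` is a toy stand-in for an embedded store such as RocksDB, and the `"R"`/`"C"`/`"S"` key prefixes only loosely follow the idea described in the linked Rockset post, not Rockset's actual key encoding.

```python
from bisect import bisect_left, insort


class SortedKV:
    """Toy ordered key-value store (hypothetical stand-in for an embedded KV library)."""

    def __init__(self):
        self._keys = []   # tuple keys, kept in sorted order
        self._data = {}   # key -> value

    def put(self, key, value):
        if key not in self._data:
            insort(self._keys, key)
        self._data[key] = value

    def scan(self, prefix):
        """Yield (key, value) for every key that starts with `prefix`, in key order."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            yield self._keys[i], self._data[self._keys[i]]
            i += 1


def index_document(kv, doc_id, doc):
    """Write each field under three key layouts in the same store."""
    for field, value in doc.items():
        kv.put(("R", doc_id, field), value)        # row layout: fetch a whole document
        kv.put(("C", field, doc_id), value)        # column layout: scan a single field
        kv.put(("S", field, value, doc_id), None)  # inverted layout: find documents by value


def find_ids(kv, field, value):
    """Document ids whose `field` equals `value` (uses the inverted layout)."""
    return [key[3] for key, _ in kv.scan(("S", field, value))]


def column(kv, field):
    """All values of `field`, ordered by document id (uses the column layout)."""
    return [value for _, value in kv.scan(("C", field))]


def fetch(kv, doc_id):
    """Rebuild one document from the row layout."""
    return {key[2]: value for key, value in kv.scan(("R", doc_id))}


kv = SortedKV()
index_document(kv, 1, {"city": "Oslo", "age": 30})
index_document(kv, 2, {"city": "Lima", "age": 41})
index_document(kv, 3, {"city": "Oslo", "age": 25})
print(find_ids(kv, "city", "Oslo"), column(kv, "age"), fetch(kv, 2))
```

The trade-off is write and space amplification: each field is stored three times so that a query planner can pick whichever layout makes a given query a short prefix scan. The sketch assumes all values of a given field share one comparable type, since keys are compared as tuples.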
| mprovost wrote: | I feel like this is missing any mention of the history of KV | stores. Unix came with an embedded database (dbm) from the early | days (1979) [0] which was rewritten at Berkeley into the more | popular bdb in the 80s. [1] Sendmail was one of the more common | programs that used it. And then when djb built his replacement | for sendmail, qmail, he invented cdb. [2] | | [0] https://en.wikipedia.org/wiki/DBM_(computing) | | [1] https://en.wikipedia.org/wiki/Berkeley_DB | | [2] https://cr.yp.to/cdb.html | LAC-Tech wrote: | When I read about event sourcing, my mind immediately went to how | that would map to a K/V database. Has anyone done this in | production? | | Also - no mention of LMDB? RocksDB and LMDB feel like the ones | that stand out in that field - LevelDB definitely had a | reputation for corrupting data. | adammarples wrote: | Plug for my Python dict wrapper | https://github.com/adammarples/rocksdbdict | Adiqq wrote: | Honestly, I'm still not sure why I would use something like | RocksDB instead of or in addition to plain PostgreSQL/MongoDB/Redis | instances. | | I don't work with a lot of data, but typically my decisions are based | on basic factors and purpose: | | PostgreSQL - SQL, structured data, cannot scale horizontally | | MongoDB - NoSQL, unstructured data | | Redis - key-value, distributed cache | | I get it that you can replace the storage engine and you can | theoretically get more performance, but in practice compatibility | and standardization are more important, because a lot of products | (including third-party) will already use | PostgreSQL/MongoDB/Redis, so it's a no-brainer to use it as well | for your solution. | | However for me to pick RocksDB or some other, new, shiny | database/storage engine, there would have to be more compelling | reasons. | jzelinskie wrote: | Unless you are building a database, these embedded KV store | libraries are less likely to be the best solution for the job. 
If | you are considering them for an app that isn't a database, you | should also take a long, hard look at SQLite first. | | What's also interesting is the trend of newer distributed | "database systems" like Vitess[0] or SpiceDB[1] that forgo | embedded KV stores and instead reuse existing SQL databases as | their "embedded database". Vitess leverages MySQL and SpiceDB | leverages MySQL, PostgreSQL, CockroachDB, or Spanner. Systems | built this way get to leverage many high-level features from | existing database systems such that they can focus on | innovating in even higher-level functionality. In the case of | Vitess, it's scaling, distributing, and schema management of | MySQL. In the case of SpiceDB, it's building a database | specifically optimized for querying access control data in a | way that can coordinate with causality across multiple | services. | | [0]: https://github.com/vitessio/vitess | | [1]: https://github.com/authzed/spicedb | zarzavat wrote: | In your list RocksDB is most like Redis, but even faster | because the data doesn't have to leave the process. | | Think of it as a high performance sports car like a Ferrari. | It's not good at taking the kids to school or buying groceries. | But if you need to prioritise performance at the expense of all | other considerations then it's exactly what you need. | Xeoncross wrote: | Like S3 or Redis, RocksDB is much more performant when you | don't need the query engine and want to have highly compact | storage with fast lookups and high write throughput. | | Storage engines have different levels of complexity based on the | query requirements. Simple K/V stores can run circles around | Postgres/MySQL as long as you don't need the extra features. | rajko_rad wrote: | Two more examples to check out: Yugabyte also persists with | RocksDB https://www.yugabyte.com/blog/how-we-built-a-high- | performanc... 
| | And this is very cool, distributed SQLite with FDB: | https://univalence.me/posts/mvsqlite | eatonphil wrote: | Thank you, edited to include Yugabyte! | ramoz wrote: | Should see a rise in embedded KV popularity in correlation with | ML applications. Storing embeddings in something like LevelDB in | formats such as FlatBuffers offers a high-performance solution for | online prediction (i.e. for mapping business values to their | embedding format on the fly to send off to some model for | inference). | jupp0r wrote: | Would that be on mobile devices for offline usage? I'm thinking | that for typical backend use cases one would use a dedicated | key value store service, right? | ramoz wrote: | This would depend on your requirements and type of inference. | Say you need to compute inference across thousands of | content/documents/images every second or so, out of some | corpus of millions to billions, then having a KV store on | disk/SSD (NVMe) might be far more efficient & cheaper (in | terms of grabbing those embeddings to conduct a downstream ML | task). How you update the corpus matters too -- a lot of | embedding spaces need to be updated in aggregate. | lacker wrote: | IMO it's just confusing to call both, say, RocksDB and MySQL | "databases". They sit at different levels of the stack and it is | easier to just think of them as entirely different things, your | "SQL database" and your "storage engine". So your stack looks | like | | Application -> MySQL -> RocksDB -> Filesystem | | In general the MySQL layer is doing all the convenient stuff for | application developers like supporting different queries and | datatypes. The RocksDB layer is optimizing for performance | metrics like throughput and reliability and just treats data as | sequences of bytes. | tomhallett wrote: | 100% agreed. TIL that MySQL uses RocksDB under the hood. 
| | Here's another example of a realtime database which uses | RocksDB under the hood: https://rockset.com/blog/how-we-use- | rocksdb-at-rockset/ | eatonphil wrote: | As far as I'm aware, MySQL does not use RocksDB under the | hood by default. MyRocks is a distribution of MySQL that uses | RocksDB. | moralestapia wrote: | Yeah, weird comment from GP. By the time RocksDB was born, | MySQL was already going to prom. | ruw1090 wrote: | Close, but in database years it was actually already in | its midlife crisis. | icelancer wrote: | Only if you configure it that way. Same as MyISAM/InnoDB/etc. | lcnPylGDnU4H9OF wrote: | Actually, this helps a lot. I'd never heard of RocksDB and I'm | barely familiar with InnoDB and hopefully I am not wrong to | compare the two. | jeffbee wrote: | I think the use of bare RocksDB is more common than the use of | MyRocks. | NetOpWibby wrote: | You should add RethinkDB! I moved to it from MongoDB years ago. | jeffbee wrote: | RethinkDB is utterly defunct as a project, has not had a | substantive release in years, and in my experience just flat | out doesn't work. And let's not even discuss Mongo. Asking | yourself to choose between these is like selecting your | favorite brand of thumbtack to step on. | gqewogpdqa wrote: | Lol. When did you last use MongoDB and why is it a thumbtack? | NetOpWibby wrote: | RethinkDB still works well for me /shrug | orthecreedence wrote: | Are you still using it? How is the pace going on the community- | supported version? I stopped using it after the company folded, | but I do kind of miss it. Definitely one of the more | interesting designs, and light years beyond what MongoDB was at | the time. | NetOpWibby wrote: | I'm definitely still using it, via rethinkdb-ts (npm | package). I even forked it to make it work with Deno. | | The built-in Data Explorer is a must-have for me and idk of | any other database that has something similar. 
| eis wrote: | There are plenty of data explorers for other databases, | especially SQL DBs. I don't think it being built into the | DB should be a make-it-or-break-it feature. | | I used RethinkDB back in the day because it was the first | DB that had pretty good replication and sharding - it was | zero effort. I found the functional programming model | strange: some stuff got executed locally, other parts | remotely, and it was not very straightforward when things | didn't go as planned. | | By the time the RethinkDB company folded, CockroachDB | emerged and has been my go-to distributed DB since. | eatonphil wrote: | No I don't think that's relevant. They implement their own | btree it seems [0]. | | They don't use a key-value store library. | | I know it's a bit of a fine line. But I'm talking about | standalone libraries people embed across different | applications/databases. That's what RocksDB/LevelDB/Pebble are. | | [0] | https://github.com/rethinkdb/rethinkdb/tree/v2.4.x/src/btree | tristan957 wrote: | HSE[0] is another storage engine to throw on the pile. | | [0]: https://github.com/hse-project/hse | NonNefarious wrote: | The term is "key/value." | mdzn wrote: | The article says that Consul or etcd are designed to always be | up, but it's actually quite the opposite. They both leverage Raft | for maintaining consensus and thus optimize for consistency at | the cost of availability in case of network partitions. See CAP | theorem. | cloudhead wrote: | All distributed databases are designed to "always be up", | that's the point of making them distributed, otherwise a single | instance is fine. | morelisp wrote: | There are reasons to distribute DBs that do not need to be up | constantly, e.g. distributing work (transactions or queries) | across more resources than are available on one machine; or | to bring a replica closer to some other service to reduce | latency. 
| | Kafka Streams is the first kind; the source-of-truth storage | is HA (as HA as the Kafka topics it's backed with at least) | but can only be queried with high consistency when the | consumer is active, and it goes down for rebalances when you | scale out or fail over (and in many operational setups also | when you upgrade). | | For an example of the second kind, see Fly.io's Litestream | explanation - https://fly.io/blog/all-in-on-sqlite-litestream/. | | That being said, I think the etcd etc. examples are just | meant to be in contrast to stock Redis or Memcache, which | offer very little HA support, generally just failover with | minimal consistency guarantees. | kefir wrote: | Apache Ignite 3 also uses RocksDB as a pluggable storage | https://www.gridgain.com/resources/blog/apache-ignite-3-alph... | eatonphil wrote: | Thanks! Adding this. ___________________________________________________________________ (page generated 2022-08-23 23:00 UTC)