[HN Gopher] immudb - world's fastest immutable database, built on a zero trust model
___________________________________________________________________
immudb - world's fastest immutable database, built on a zero trust model
Author : dragonsh
Score  : 124 points
Date   : 2021-12-27 15:00 UTC (8 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| timdaub wrote:
| I went on their website and tried to understand how immutability is enforced, but I couldn't find anything.
|
| I'm sceptical, particularly because they make a deliberate comparison to blockchain that I doubt they'll be able to deliver on.
|
| The PoW immutability of e.g. BTC and ETH is strong, as it yields the following guarantees for stored data:
|
| - Immutability of the BTC blockchain is protected through all the cumulative work that has happened on a specific branch of the chain. Even if someone replayed BTC, it'd take millennia to recompute the work on an average machine.
|
| - The immutability isn't enforced on a file level, as I suspect it is with immudb. Immutability is enforced through the network, which has additionally shown itself to hold conservative political views too. You can go sync a BTC node and change the underlying LevelDB; still, that won't change the network state. Immutability on a single system is physically impossible if, e.g., you consider deleting the file a mutation.
|
| - immudb says "it's immutable like a blockchain but less complicated", but Bitcoin isn't more complicated than some sophisticated enterprise db solution.
|
| - I think immudb should be maximally upfront about what they mean by immutability: it seems they want to communicate that they're doing event sourcing - that's different from immutability.
|
| Finally, there's a rather esoteric argument. If you run an immutable database as an organization where one individual node cannot alter the network state, but you have (in)direct control over all nodes: isn't it always mutable, as you could e.g. choose to swap out consensus?
|
| So from a philosophical perspective, immutability can truly only occur if mutability is out of an individual's control.
|
| Why do I have the authority to say this? Because I too once worked on a database with blockchain characteristics called https://www.bigchaindb.com
|
| Edit: The best solution that also has a theoretically unlimited throughput is this toy project: https://github.com/hoytech/quadrable
|
| Conceptually, it computes a merkle tree over all data and regularly commits to Ethereum. Through this commitment the data may still change locally, but it would then at least be provably tampered with. So I guess for databases, the attribute we can really implement is "tamper-proof".
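
A minimal sketch of the external-commitment idea described in the comment above: compute a Merkle root over a set of records and compare it against a root that was previously published somewhere outside the writer's control. The record layout and helper names are illustrative only, not quadrable's or immudb's actual API.

    package main

    import (
    	"bytes"
    	"crypto/sha256"
    	"fmt"
    )

    // merkleRoot computes a Merkle root over the given records by hashing the
    // leaves and then pairwise-hashing each level; the last node of an odd
    // level is carried up unchanged.
    func merkleRoot(records [][]byte) []byte {
    	if len(records) == 0 {
    		return nil
    	}
    	level := make([][]byte, 0, len(records))
    	for _, r := range records {
    		h := sha256.Sum256(r)
    		level = append(level, h[:])
    	}
    	for len(level) > 1 {
    		next := make([][]byte, 0, (len(level)+1)/2)
    		for i := 0; i < len(level); i += 2 {
    			if i+1 == len(level) {
    				next = append(next, level[i])
    				continue
    			}
    			h := sha256.Sum256(append(level[i], level[i+1]...))
    			next = append(next, h[:])
    		}
    		level = next
    	}
    	return level[0]
    }

    func main() {
    	records := [][]byte{[]byte("alice:100"), []byte("bob:250")}
    	committed := merkleRoot(records) // imagine this root is published externally

    	// Later: recompute and compare. A local edit changes the root, so the
    	// change is detectable after the fact.
    	records[1] = []byte("bob:9999")
    	fmt.Println("unchanged?", bytes.Equal(merkleRoot(records), committed))
    }

As the comment notes, this makes local edits detectable rather than impossible: tamper-evidence, not tamper-proofing.
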
| jandrese wrote:
| The big question is: if someone gets on your DB server and wants to change a record, how does the software prevent them from altering it and then recomputing the remainder of the chain?
| layer8 wrote:
| I'd say the attribute is "tamper-proof history", not "tamper-proof data (current content)".
| YogurtFiend wrote:
| I'm not sure that this is a _useful_ tool. Let's talk about the threat model, or the attacks that this defends against.
|
| If a client is malicious, they might try to manipulate the data in the database in an untoward way. In a "normal" database, this might cause data loss, if the database isn't being continuously backed up. But immudb does continuous backups (effectively, since it's immutable), so if a malicious client has been detected, it's possible to restore an older version of the database. The real problem is: how would you know that a client has tampered with your database? Well, because this database is "tamper-proof," duh! But the issue lies in the definition of tamper-proof. From my reading of the source code and documentation, the "proof that no tampering has occurred" is a proof that the current state of the database can be reached by applying some database operations to a previous state. As a result, a malicious client could simply ask the database to "delete everything and insert this new data," to make the database look like whatever it wanted. This is a valid way to transition the state of the database from its old state to the new state, and so shouldn't be rejected by the tamper detection mechanism.
|
| "Ah," but you say, "it would look super sus [as the kids say] to just delete the entire database. We'd know that something was up!" The problem with this solution is: how are you going to automate "looking super sus"? You could enact a policy to flag any update that touches more than N records at a time, but that's not really a solution. The "right" solution is to trace the provenance of database updates. Rather than allowing arbitrary database updates, you want to allow your database to be changed only by updates that are sensible for your application. The _actual_ statement you want to prove is that "the current state of the database is a known past state of the database updated by operations that my application ought to have issued." Of course, what are "operations that my application ought to have issued"? Well, it depends how deep you want to go with your threat model. A simple thing you could do is have a list of all the queries that your application issues, and check to make sure all operations come from that list. This still allows other attacks through, and you could go even more in depth if you wanted to.
|
| Importantly, immudb doesn't appear to contend with any of this. They claim that their database is "tamper-proof," when in reality you'd need a complicated external auditing system to make it meaningfully tamper-proof for your application. (Again, a threat model ought to include a precise definition of "tamper-proof," which would help clear up these issues.)
|
| It's also worth comparing this to https://en.wikipedia.org/wiki/Certificate_Transparency, which is an append-only database. Compared to immudb, the _exposed data model_ for certificate transparency logs is an append-only set, which means that it doesn't have any of these same problems. The problem with immudb is that the data model it exposes is more complicated, but its built-in verification tools haven't been upgraded to match.
|
| (Also, for context, I've tried to obtain a copy of their white paper, but after an hour the email with the link to it never arrived.)
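
A minimal sketch of the query-allowlist check suggested above ("have a list of all the queries that your application issues"). The table names and the regex-based normalization are hypothetical; a real auditor would work on parsed statements rather than normalized text.

    package main

    import (
    	"fmt"
    	"regexp"
    	"strings"
    )

    // allowedQueries is a hypothetical allowlist of statement shapes the
    // application is known to issue, with parameters replaced by '?'.
    var allowedQueries = map[string]bool{
    	"INSERT INTO audit_log (actor, action, ts) VALUES (?, ?, ?)": true,
    	"SELECT * FROM audit_log WHERE actor = ?":                    true,
    }

    var literal = regexp.MustCompile(`'[^']*'|\b\d+\b`)

    // normalize strips literals and collapses whitespace so a statement can be
    // compared against the allowlist by shape rather than by exact text.
    func normalize(q string) string {
    	q = literal.ReplaceAllString(q, "?")
    	return strings.Join(strings.Fields(q), " ")
    }

    func audit(q string) error {
    	if !allowedQueries[normalize(q)] {
    		return fmt.Errorf("statement not in application allowlist: %q", q)
    	}
    	return nil
    }

    func main() {
    	fmt.Println(audit("SELECT * FROM audit_log WHERE actor = 'alice'"))
    	fmt.Println(audit("DELETE FROM audit_log")) // flagged: not a known shape
    }

This is only the "simple thing" from the comment; it narrows what an attacker can do through the application path, but it does not by itself establish provenance of every update.
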
| layer8 wrote:
| Regarding backups, note that you still need separate backups with immudb.
| gigatexal wrote:
| So is this a useful alternative to blockchains or just hype?
| newtonapple wrote:
| Has anyone tried immudb in production? What are some of immudb's performance characteristics? It'd be nice to know how it performs under various conditions: queries per second, database / table sizes, SQL join performance, etc.
|
| Also, what are the system requirements for immudb? What kind of machine would I need to run a medium to large website (say, 1TB of data, 5-25K qps, e.g. Wikipedia)?
|
| The documentation mentions that it can use S3 as its storage. Are there performance implications if you do this?
| tarr11 wrote:
| Previous HN thread about immutable databases:
|
| https://news.ycombinator.com/item?id=23290769
| artemonster wrote:
| Can someone ELI5 how immutability applies to databases and which advantages it brings? Thank you!
| gopalv wrote:
| > immutability ... which advantages it brings
|
| Immutability brings a bunch of perf short-cuts which are usually impossible to build with a mutable store.
|
| You'll find a lot of metric stores optimized for fast ingest that take advantage of immutability as a core assumption, though they don't tend to do what immudb does with the cryptographic signatures to check for tampering.
|
| Look at GE Historian or Apache Druid for most of what I'm talking about here.
|
| You can build out a tiered storage system which pushes the data to a remote cold store and keeps only immediate writes or recent reads locally.
|
| You can run a filter condition once on an immutable block/tablet and never run it again: something like a count(*) where rpm > X and plane_id = ? can be remembered as compressed bitsets of each column, rather than as final row selection masks. Then you can reuse half of that when you change the plane_id = ? parameter.
|
| The fact that the data will never be updated makes it incredibly fast to query as you constantly stream more data while refreshing the exact same dashboard every 3 seconds for a monitoring screen - every 3s, it will only actually process the data that arrived in those 3 seconds, not repeat the query over the last 24h all over again.
|
| The moment you allow even a DELETE operation, all of this becomes a complex mess of figuring out how to adjust for changes (you can invalidate the bit-vectors of the updated cols etc., but it is harder).
| jandrese wrote:
| If the data is being added or updated continually, how do you prevent the database from growing without bound?
| mjh2539 wrote:
| You don't. You just keep throwing disks at it.
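
A rough sketch of the per-block predicate caching described in the thread above: because a block never changes, the result of a filter over it can be computed once and reused on every refresh. A plain bool slice stands in for a compressed bitset, and the block/column layout is made up for illustration.

    package main

    import "fmt"

    // block is a hypothetical immutable chunk of rows; once written it never changes.
    type block struct {
    	id  int
    	rpm []int
    }

    // predicateCache remembers, per (block, predicate) pair, which rows matched.
    // Because blocks are immutable, a cached result never has to be invalidated.
    type predicateCache map[string][]bool

    func matchRPMAbove(c predicateCache, b block, threshold int) []bool {
    	key := fmt.Sprintf("b%d:rpm>%d", b.id, threshold)
    	if bits, ok := c[key]; ok {
    		return bits // reuse: the block cannot have changed
    	}
    	bits := make([]bool, len(b.rpm))
    	for i, v := range b.rpm {
    		bits[i] = v > threshold
    	}
    	c[key] = bits
    	return bits
    }

    func main() {
    	cache := predicateCache{}
    	old := block{id: 1, rpm: []int{900, 1500, 2100}}
    	fresh := block{id: 2, rpm: []int{1800, 600}}

    	// A dashboard refresh walks all blocks, but after the first pass only
    	// newly arrived blocks require fresh predicate evaluation.
    	count := 0
    	for _, b := range []block{old, fresh} {
    		for _, hit := range matchRPMAbove(cache, b, 1000) {
    			if hit {
    				count++
    			}
    		}
    	}
    	fmt.Println("rows with rpm > 1000:", count)
    }

On the next refresh, only blocks that arrived since the last one need evaluation; everything older is answered from the cache, which is what makes the every-3-seconds dashboard cheap.
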
| throwaway984393 wrote:
| Immutability is probably the most powerful concept that applies to how modern technology can be used. Versioned, immutable, and cryptographically-signed artifacts do a bunch of things for you.
|
| From an operational standpoint, it allows you to roll out a change in exactly the way you tested, confident that it will work the way it's intended. It also allows you to roll back or forward to any change with the same confidence. It also means you can restore a database _immediately_ to the last known good state. Changes essentially cannot fail; no monkey-patching a schema or dataset, no "migrations" that have to be meticulously prepared and tested to make sure they won't accidentally break in production.
|
| From a security and auditing standpoint, it ensures that a change is exactly what it's supposed to be. No random changes by who-knows-who at who-knows-when. You see a reliable history of all changes.
|
| From a development standpoint, it allows you to see the full history of changes and verify the source or integrity of data, which is important in some fields like research.
| bob1029 wrote:
| There is also a performance advantage if you can build everything under these constraints. A pointer to something held in an immutable log will never become invalid or otherwise point to garbage data in the future. At worst, whatever is pointed to has since been updated or compensated for in some _future_ transaction which is held further towards the end of the log. Being able to make these assumptions allows for all kinds of clever tricks.
|
| The inability to mutate data pointed to in prior areas of the log does come with tradeoffs regarding other performance optimizations that expressly rely on mutability, but in my experience constraining the application to work with an immutable log (i.e. dealing with stale reads & compensating transactions) usually results in a substantial performance uplift compared to solutions relying on mutability. One recent idea that furthers this difference is NAND storage, where there _may_ be a substantial cost to be paid if one wants to rewrite prior blocks of data (depending on the type of controller/algorithm used by the device).
| fouc wrote:
| > A pointer to something held in an immutable log will never become invalid or otherwise point to garbage data in the future.
|
| Now I'm wondering if we can have immutable versioned APIs.
| pharmakom wrote:
| Is it possible to delete data for compliance reasons? Not as a frequent operation, but say on a monthly batch?
| jeroiraz wrote:
| Logical deletion is in place; physical deletion is already on the roadmap.
| cabalamat wrote:
| Would it be possible to have something like this that works by writing to a PROM? That would make it immutable at the hardware level.
| ShamelessC wrote:
| > Data stored in immudb is cryptographically coherent and verifiable. Unlike blockchains, immudb can handle millions of transactions per second, and can be used both as a lightweight service or embedded in your application as a library. immudb runs everywhere, on an IoT device, your notebook, a server, on-premise or in the cloud.
|
| Seems pretty useful actually. Can anyone with a relevant background comment on when this would be a bad idea to use?
| KarlKemp wrote:
| The data that is at risk of being changed with malicious intent is certainly not insignificant, but still just a fraction of all data. Changing to this adds a new and complicated system, replacing whatever you're currently using, which will have seen far better testing and is known by the people working with it.
| staticassertion wrote:
| If you can trust your writers there's likely no need for this. A modern approach tends to have databases owned by a single service, which exposes the model via RPCs. So you generally don't have more than one writer, which means you're pretty much de facto "zero trust" if that single writer follows a few rules (i.e. mutual auth, logging, etc).
|
| But in some cases you don't have that same constraint. For example, databases that store logs (Elastic, Splunk, etc) might have many readers and writers, including humans.
|
| In that case enforced immutability might be a nice property to have. Attackers who get access to your Splunk/ES cluster certainly will have fun with it.
| imglorp wrote:
| There are a few properties to be aware of. Although it might be a KV store, you're probably going to want sensible queries using something other than the primary key, e.g. time series or secondary keys. So in addition to the KV store, there is probably a need for an external index and query mechanism. Another issue is obtaining consistent hashing, where multiple documents might have the same content but vary by order or by date format. Finally, do you have to go back to the beginning and hash everything to get a proof of one transaction, or is there some shortcut aggregation possible?
|
| We evaluated AWS QLDB for these things in our application as a financial ledger and were impressed at their progress with a novel data store. They invented some of the tech in house for this product instead of grabbing an off-the-shelf open product. Lock-in would be a downside here.
|
| Immudb looks promising because it's not locked to a cloud host.
|
| https://aws.amazon.com/qldb/faqs/
| jaboutboul wrote:
| Note that it does KV and SQL.
| mistrial9 wrote:
| > e.g. time series or secondary keys
|
| It's not an "all or nothing" question. For example, a fast-enough "return the most recent in a time series" is not exactly time-series, but it solves many use cases.
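
A small sketch of the "consistent hashing" concern raised above: if two logically identical documents can differ in key order or date format, hash a canonical form instead of the raw bytes. The date-normalization rule here is a stand-in; real systems tend to use a defined canonicalization such as RFC 8785 JSON.

    package main

    import (
    	"crypto/sha256"
    	"encoding/json"
    	"fmt"
    	"time"
    )

    // canonicalHash normalizes a document before hashing so that two records
    // that differ only in key order or date formatting produce the same digest.
    func canonicalHash(doc map[string]any) ([32]byte, error) {
    	norm := make(map[string]any, len(doc))
    	for k, v := range doc {
    		if s, ok := v.(string); ok {
    			// Normalize anything that parses as an RFC 1123 date to RFC 3339 UTC.
    			if t, err := time.Parse(time.RFC1123, s); err == nil {
    				norm[k] = t.UTC().Format(time.RFC3339)
    				continue
    			}
    		}
    		norm[k] = v
    	}
    	// encoding/json sorts map keys, giving a stable byte representation.
    	b, err := json.Marshal(norm)
    	if err != nil {
    		return [32]byte{}, err
    	}
    	return sha256.Sum256(b), nil
    }

    func main() {
    	a := map[string]any{"amount": 42, "when": "Mon, 02 Jan 2006 15:04:05 UTC"}
    	b := map[string]any{"when": "Mon, 02 Jan 2006 15:04:05 UTC", "amount": 42}
    	ha, _ := canonicalHash(a)
    	hb, _ := canonicalHash(b)
    	fmt.Println("equal digests:", ha == hb)
    }
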
| rattlesnakedave wrote:
| Seems like it would still be vulnerable to rollback attacks. Signed rows would probably get you farther with less novel tech involved if you want immutability.
| throwaway984393 wrote:
| Don't forget to star this repo if you like immudb!
|
| I didn't realize GitHub had "Like and subscribe" culture now. :/
| willcipriano wrote:
| > You can add new versions of existing records, but never change or delete records. This lets you store critical data without fear of it being tampered.
|
| > immudb can be used as a key-value store or relational data structure and supports both transactions and blobs, so there are no limits to the use cases.
|
| This is game changing. Use it for, say, a secondary data store for high-value audit logs. I'll consider using it in the future.
| voidfunc wrote:
| What happens if you have some data that absolutely _must_ change or be deleted? For example, a record gets committed with something sensitive by mistake.
| pmontra wrote:
| Or customers ask for their personal data to be deleted, GDPR, right to be forgotten, etc.
|
| I guess we must consider what can go in an immutable storage and what must not.
| rch wrote:
| You should be storing potentially GDPR-covered data encrypted with entity-specific keys, which are destroyed when necessary.
| gnufx wrote:
| Right, regardless of the storage, but in the research computing circles I see, it's just not done. The promises of "data destruction" that get demanded are basically accompanied by fingers crossed behind the back (is that an international thing to "cover" for lying?), considering the filesystem and backup mechanisms etc.
| jaboutboul wrote:
| There is a data expiration and also a logical deletion feature for exactly this use case.
| dillondoyle wrote:
| I don't totally understand the value of the second, but doesn't the first already exist in things like BigQuery?
| zimpenfish wrote:
| > doesn't the first already exist in things like BigQuery?
|
| You can truncate a BQ table and reload it if you want to change things. Had to do this at a previous gig (twice a day!) because the data warehouse people would only take data from BQ but the main data was in Firebase (yes, it was an insane place.)
| mathnmusic wrote:
| Or a traditional database with read-only credentials and a function that adds "ORDER BY version DESC LIMIT 1".
| [deleted]
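
A sketch of the append-only pattern in the comment above: every change is a new row with a higher version, the application role only gets INSERT/SELECT, and reads append "ORDER BY version DESC LIMIT 1". The sqlite3 driver and schema are assumptions for the sake of a runnable example; the same idea works on any SQL database, though without the cryptographic verifiability the next comment asks about.

    package main

    import (
    	"database/sql"
    	"fmt"
    	"log"

    	_ "github.com/mattn/go-sqlite3" // assumed driver; any SQL database works
    )

    func main() {
    	db, err := sql.Open("sqlite3", ":memory:")
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer db.Close()

    	// Every "update" is a new row with a higher version; nothing is overwritten.
    	// The application role would get INSERT/SELECT privileges only, no UPDATE/DELETE.
    	must(db, `CREATE TABLE accounts (
    		id      TEXT NOT NULL,
    		version INTEGER NOT NULL,
    		balance INTEGER NOT NULL,
    		PRIMARY KEY (id, version))`)

    	must(db, `INSERT INTO accounts VALUES ('alice', 1, 100)`)
    	must(db, `INSERT INTO accounts VALUES ('alice', 2, 250)`) // "update" = append

    	// Reads always take the newest version.
    	var balance int
    	err = db.QueryRow(`SELECT balance FROM accounts
    		WHERE id = 'alice' ORDER BY version DESC LIMIT 1`).Scan(&balance)
    	if err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println("current balance:", balance) // 250; older versions remain queryable
    }

    func must(db *sql.DB, stmt string) {
    	if _, err := db.Exec(stmt); err != nil {
    		log.Fatal(err)
    	}
    }
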
| lojack wrote:
| But is a traditional database cryptographically secure? If a super user with write permissions (or, say, direct access to the physical data store) modifies records, are users able to validate the integrity of the data?
| jayd16 wrote:
| You could use permissions and stored procedures that ensure append-only.
| jeroiraz wrote:
| The difference is that client applications do not need to trust that proper "append-only" permissions were enforced on the server side; they will have the chance to detect any tampering, while in the former approach it won't be noticeable.
| ledgerdev wrote:
| Does this have, or are there any plans for, a change-feed? Has anyone used this as an event sourcing db?
| chalcolithic wrote:
| > millions of transactions per second
|
| I wonder, if I wanted to survey the landscape of all databases that claim such numbers, how could I possibly find them?
| furstenheim wrote:
| GDPR compliance will be tricky. How does one delete data?
| KarlKemp wrote:
| Pruning is on the roadmap.
| jquery wrote:
| How is it immutable if you can prune it?
| jeroiraz wrote:
| Several solutions may be possible. The simplest would be to delete the payloads associated with entries. While the actual data won't be there, it will still be possible to build cryptographic proofs. Then it's possible to prune by physically deleting entire transaction data, which may or may not affect proof generation. However, tampering will still be subject to detection.
| jcims wrote:
| Are records atomically immutable, or is there a set concept such that the lack of mutation can be verified over a set of records?
| jeroiraz wrote:
| Currently it's logical deletion and time-based expiration. Actual values associated with expired entries are not fetched. Physical deletion is already on the roadmap.
| ledgerdev wrote:
| My preferred method is to tokenize sensitive data before storing it in the immutable logs/database.
| [deleted]
| fragmede wrote:
| Store the data encrypted, then delete the keys when requested.
| endisneigh wrote:
| This isn't really deleting it though. What happens if in the future technology changes and current cryptography is moot?
| nowherebeen wrote:
| Then fire up a new database with the latest customer data every 18 months. And completely delete the old database once you confirm it no longer has value.
| cookiengineer wrote:
| Or just store the customer database in /tmp and reboot the server every 18 months. /s
| endisneigh wrote:
| I thought the point of this is to have an exhaustive record for audit purposes.
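
A minimal sketch of the approach suggested in the comments above (storing encrypted data with per-entity keys and destroying the key on an erasure request, sometimes called crypto-shredding). The in-memory key map is a stand-in for a real key-management service, and error handling is elided.

    package main

    import (
    	"crypto/aes"
    	"crypto/cipher"
    	"crypto/rand"
    	"fmt"
    )

    // A toy key vault: one random key per customer, stored OUTSIDE the immutable
    // log. Deleting the key "forgets" that customer's records even though the
    // ciphertext itself can never be removed from the log.
    var keys = map[string][]byte{}

    func encryptFor(customer string, plaintext []byte) []byte {
    	key, ok := keys[customer]
    	if !ok {
    		key = make([]byte, 32)
    		rand.Read(key)
    		keys[customer] = key
    	}
    	block, _ := aes.NewCipher(key)
    	gcm, _ := cipher.NewGCM(block)
    	nonce := make([]byte, gcm.NonceSize())
    	rand.Read(nonce)
    	return gcm.Seal(nonce, nonce, plaintext, nil) // nonce || ciphertext
    }

    func decryptFor(customer string, ciphertext []byte) ([]byte, error) {
    	key, ok := keys[customer]
    	if !ok {
    		return nil, fmt.Errorf("key destroyed: data for %q is unrecoverable", customer)
    	}
    	block, _ := aes.NewCipher(key)
    	gcm, _ := cipher.NewGCM(block)
    	nonce, body := ciphertext[:gcm.NonceSize()], ciphertext[gcm.NonceSize():]
    	return gcm.Open(nil, nonce, body, nil)
    }

    func main() {
    	record := encryptFor("alice", []byte("alice@example.com"))

    	// Erasure request: destroy the key; the immutable record stays but is garbage.
    	delete(keys, "alice")
    	_, err := decryptFor("alice", record)
    	fmt.Println(err)
    }

As noted above, this rests on the ciphertext remaining unreadable in the future, which is the trade-off the surrounding thread debates.
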
| okr wrote:
| You clone the database and remove/update the corresponding lines. GDPR does not mean you have to fix it right away, imho.
| gnabgib wrote:
| Article 17 does include the term "without undue delay"[0], but such vague language seems ripe for some court precedent.
|
| A clone and remove/update per GDPR request seems like undue delay, certainly one that could be avoided by alternative architecture choices (keep the personally identifiable information (PII) in a mutable store).
|
| [0]: https://gdpr-info.eu/art-17-gdpr/
| sigzero wrote:
| No, it's not undue delay. That's just how it currently works, and that is a fine argument.
| [deleted]
| peoplefromibiza wrote:
| But you have to do it in a pretty short timeframe.
|
| > _Under Article 12.3 of the GDPR, you have 30 days to provide information on the action your organization will decide to take on a legitimate erasure request. This timeframe can be extended up to 60 days depending on the complexity of the request._
|
| Even if they ask for more time, the first communication has to come within 30 days.
| 1cvmask wrote:
| Words like immutable make me allergic.
| abc_lisper wrote:
| See a doctor then. It isn't expected. Could be a lack of CS education, in which case, read some books. If that doesn't fix it, see a psychiatrist - something could be wrong with your brain.
___________________________________________________________________
(page generated 2021-12-27 23:00 UTC)