[HN Gopher] immudb - world's fastest immutable database, built o...
       ___________________________________________________________________
        
       immudb - world's fastest immutable database, built on a zero trust
       model
        
       Author : dragonsh
       Score  : 124 points
       Date   : 2021-12-27 15:00 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | timdaub wrote:
       | I went on their website and tried to understand how immutability
       | is enforced but I couldn't find anything.
       | 
        | I'm sceptical, particularly because they make a deliberate
        | comparison to blockchain that I doubt they'll be able to
        | deliver on.
       | 
        | The PoW immutability of e.g. BTC and ETH is strong, as it
        | yields the following guarantees for stored data:
       | 
        | - Immutability of the BTC blockchain is protected through all
        | the cumulative work that has happened on a specific branch of
        | the chain. Even if someone replayed BTC, it'd take millennia to
        | recompute the work on an average machine.
       | 
        | - The immutability isn't enforced on a file level, as I suspect
        | it is with immudb. Immutability is enforced through the network,
        | which has additionally shown itself to hold conservative
        | political views too. You can go sync a BTC node and change the
        | underlying LevelDB; still, that won't change the network state.
        | Immutability on a single system is physically impossible if,
        | e.g., you consider deleting the file a mutation.
       | 
       | - immudb says "it's immutable like a blockchain but less
       | complicated", but Bitcoin isn't more complicated than some
       | sophisticated enterprise db solution.
       | 
        | - I think immudb should be maximally upfront about what they
        | mean by immutability: it seems they want to communicate that
        | they're doing event sourcing - that's different from
        | immutability.
       | 
        | Finally there's a rather esoteric argument. If you run an
        | immutable database as an organization where one individual node
        | cannot alter the network state but you have (in)direct control
        | over all nodes: isn't it always mutable, as you could e.g.
        | choose to swap out the consensus mechanism?
       | 
        | So from a philosophical perspective, immutability can truly
        | only occur if mutability is out of any individual's control.
       | 
        | Why do I have the authority to say this? Because I too once
        | worked on a database with blockchain characteristics,
        | https://www.bigchaindb.com
       | 
       | Edit: The best solution that also has a theoretically unlimited
       | throughput is this toy project:
       | https://github.com/hoytech/quadrable
       | 
        | Conceptually, it computes a merkle tree over all data and
        | regularly commits the root to Ethereum. Despite this commitment
        | the data may still change locally, but then it would at least
        | be provably tampered with. So I guess for databases, the
        | attribute we can really implement is "tamper-proof".
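        | 
        | To make that concrete, here is a minimal sketch in Go of
        | computing a root you could commit (illustrative only;
        | quadrable's actual trie and commitment scheme differ):
        | 
        |     package main
        | 
        |     import (
        |         "crypto/sha256"
        |         "fmt"
        |     )
        | 
        |     // merkleRoot hashes every entry, then pairs hashes
        |     // level by level until a single root remains.
        |     func merkleRoot(entries [][]byte) []byte {
        |         if len(entries) == 0 {
        |             return make([]byte, sha256.Size)
        |         }
        |         level := make([][]byte, 0, len(entries))
        |         for _, e := range entries {
        |             h := sha256.Sum256(e)
        |             level = append(level, h[:])
        |         }
        |         for len(level) > 1 {
        |             var next [][]byte
        |             for i := 0; i < len(level); i += 2 {
        |                 if i+1 == len(level) {
        |                     next = append(next, level[i]) // odd node
        |                     continue
        |                 }
        |                 h := sha256.Sum256(
        |                     append(level[i], level[i+1]...))
        |                 next = append(next, h[:])
        |             }
        |             level = next
        |         }
        |         return level[0]
        |     }
        | 
        |     func main() {
        |         root := merkleRoot([][]byte{
        |             []byte("k1=v1"), []byte("k2=v2"),
        |         })
        |         fmt.Printf("commit this root: %x\n", root)
        |     }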
        
         | jandrese wrote:
          | The big question is: if someone gets on your DB server and
          | wants to change a record, how does the software prevent them
          | from altering it and then recomputing the remainder of the
          | chain?
        
         | layer8 wrote:
         | I'd say the attribute is "tamper-proof history", not "tamper-
         | proof data (current content)".
        
       | YogurtFiend wrote:
       | I'm not sure that this is a _useful_ tool. Let's talk about the
       | threat model or the attacks that this defends against.
       | 
       | If a Client is malicious, they might try to manipulate the data
       | in the database in an untoward way. In a "normal" database, this
       | might cause data loss, if the database isn't being continuously
       | backed up. But immudb does continuous backups (effectively, since
       | it's immutable) so, if a malicious client has been detected, it's
       | possible to restore an older version of the database. The real
       | problem is how would you know that a client has tampered with
       | your database? Well, because this database is "tamper-proof,"
       | duh! But the issue lies in the definition of tamper-proof. From
       | my reading of the source code and documentation, the "proof that
       | no tampering has occurred" is a proof that the current state of
       | the database can be reached by applying some database operations
       | to a previous state. As a result, a malicious client could simply
       | ask the database to "delete everything and insert this new data,"
       | to make the database look like whatever it wanted. This is a
       | valid way to transition the state of the database from its old
       | state to the new state, and so shouldn't be rejected by the
       | tamper detection mechanism.
       | 
       | "Ah," but you say, "it would look super sus [as the kids say] to
       | just delete the entire database. We'd know that something was
       | up!" The problem with this solution is how are you going to
       | automate "looking super sus?" You could enact a policy to flag
       | any update that updates more than N records at a time, but that's
       | not really a solution. The "right" solution is to trace the
       | provenance of database updates. Rather than allowing arbitrary
       | database updates, you want to allow your database to be changed
       | only by updates that are sensible for your application. The
       | _actual_ statement you want to prove is that "the current state
       | of the database is a known past state of the database updated by
       | operations that my application ought to have issued." Of course
       | what are "operations that my application ought to have issued?"
       | Well, it depends how deep you want to go with your threat model.
       | A simple thing you could do is have a list of all the queries
       | that your application issues, and check to make sure all
       | operations come from that list. This still allows other attacks
       | through, and you could go even more in depth if you wanted to.
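        | 
        | A minimal sketch of that allowlist check in Go (all names
        | hypothetical, not immudb's API):
        | 
        |     package main
        | 
        |     import "fmt"
        | 
        |     // allowed holds the exact statements the application
        |     // is known to issue; anything else is rejected before
        |     // it ever reaches the database.
        |     var allowed = map[string]bool{
        |         "INSERT INTO audit(event, ts) VALUES (?, ?)": true,
        |         "SELECT event FROM audit WHERE ts > ?":       true,
        |     }
        | 
        |     func execChecked(stmt string) error {
        |         if !allowed[stmt] {
        |             return fmt.Errorf("not in allowlist: %q", stmt)
        |         }
        |         // forward stmt to the real database here
        |         return nil
        |     }
        | 
        |     func main() {
        |         fmt.Println(execChecked("DELETE FROM audit"))
        |     }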
       | 
       | Importantly, immudb doesn't appear to contend with any of this.
       | They claim that their database is "tamper-proof," when in reality
       | you'd need a complicated external auditing system to make it
       | meaningfully tamper-proof for your application. (Again, a threat
       | model ought to include a precise definition of "tamper-proof,"
       | which would help clear up these issues.)
       | 
       | It's also worth comparing this to
       | https://en.wikipedia.org/wiki/Certificate_Transparency, which is
       | an append-only database. Compared to immudb, the _exposed data
       | model_ for certificate transparency logs is an append-only set,
       | which means that it doesn't have any of these same problems. The
       | problem with immudb is that the data model it exposes is more
        | complicated, but its built-in verification tools haven't been
       | upgraded to match.
       | 
       | (Also, for context, I've tried to obtain a copy of their white
       | paper, but after an hour the email with the link to it never
       | arrived.)
        
         | layer8 wrote:
         | Regarding backups, note that you still need separate backups
         | with immudb.
        
       | gigatexal wrote:
       | So is this a useful alternative to blockchains or just hype?
        
       | newtonapple wrote:
       | Has anyone tried immudb in production? What are some of immudb's
       | performance characteristics? It'd be nice to know how it performs
        | under various conditions: queries per sec, database/table
        | sizes, SQL join performance, etc.
       | 
       | Also, what are the system requirements for immudb? What kind of
       | machine would I need to run a medium to large website (say, 1TB
       | of data, 5-25K qps, e.g. Wikipedia)?
       | 
        | The documentation mentions that it can use S3 as its storage.
        | Are there performance implications if you do this?
        
       | tarr11 wrote:
       | Previous HN thread about immutable databases:
       | 
       | https://news.ycombinator.com/item?id=23290769
        
       | artemonster wrote:
       | Can someone ELI5 how immutability applies to databases and which
       | advantages it brings. Thank you!
        
         | gopalv wrote:
         | > immutability ... which advantages it brings
         | 
          | Immutability brings a bunch of perf shortcuts which are
          | usually impossible to build with a mutable store.
         | 
          | You'll find a lot of metric stores optimized for fast ingest
          | that take advantage of immutability as a core assumption,
          | though they don't tend to do what immudb does with the
          | cryptographic signatures to check for tampering.
         | 
         | Look at GE Historian or Apache Druid for most of what I'm
         | talking about here.
         | 
          | You can build out a tiered storage system which pushes the
          | data to a remote cold store and keeps only immediate writes
          | or recent reads locally.
         | 
          | You can run a filter condition once on an immutable
          | block/tablet and never run it again: a query like count(*)
          | where rpm > X and plane_id = ? can be remembered as
          | compressed bitsets per column, rather than as a final
          | row-selection mask, and half of that work can be reused when
          | you change the plane_id = ? parameter (see the sketch at the
          | end of this comment).
         | 
          | The fact that the data will never be updated makes it
          | incredibly fast to query as you constantly stream more data
          | while refreshing the exact same dashboard every 3 seconds on
          | a monitoring screen - every 3s, it only has to process the
          | data that arrived in those 3 seconds, not repeat the query
          | over the last 24h all over again.
         | 
         | The moment you allow even a DELETE operation, all of this
         | becomes a complex mess of figuring out how to adjust for
         | changes (you can invalidate the bit-vectors of the updated cols
         | etc, but it is harder).
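          | 
          | Here's a rough Go sketch of that reuse, with a made-up
          | two-column block, just to make the idea concrete:
          | 
          |     package main
          | 
          |     import "fmt"
          | 
          |     // One immutable block of two columns. Because the
          |     // block never changes, a predicate bitset computed
          |     // over it never goes stale.
          |     type block struct {
          |         rpm, planeID []int
          |     }
          | 
          |     func bits(col []int, pred func(int) bool) []bool {
          |         out := make([]bool, len(col))
          |         for i, v := range col {
          |             out[i] = pred(v)
          |         }
          |         return out
          |     }
          | 
          |     func countAnd(a, b []bool) int {
          |         n := 0
          |         for i := range a {
          |             if a[i] && b[i] {
          |                 n++
          |             }
          |         }
          |         return n
          |     }
          | 
          |     func main() {
          |         blk := block{
          |             rpm:     []int{900, 1200, 1500},
          |             planeID: []int{1, 2, 1},
          |         }
          |         // computed once per block, cached forever
          |         high := bits(blk.rpm,
          |             func(v int) bool { return v > 1000 })
          |         // only the plane_id half changes per parameter
          |         for _, id := range []int{1, 2} {
          |             p := bits(blk.planeID,
          |                 func(v int) bool { return v == id })
          |             fmt.Println(id, countAnd(high, p))
          |         }
          |     }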
        
           | jandrese wrote:
           | If the data is being added or updated continually how do you
           | prevent the database from growing without bound?
        
             | mjh2539 wrote:
             | You don't. You just keep throwing disks at it.
        
         | throwaway984393 wrote:
          | Immutability is probably the most powerful concept in how
          | modern technology is built and used. Versioned, immutable, and
         | cryptographically-signed artifacts do a bunch of things for
         | you.
         | 
         | From an operational standpoint, it allows you to roll out a
         | change in exactly the way you tested, confident that it will
         | work the way it's intended. It also allows you to roll back or
         | forward to any change with the same confidence. It also means
         | you can restore a database _immediately_ to the last known good
         | state. Changes essentially cannot fail; no monkey-patching a
          | schema or dataset, no "migrations" that have to be
         | meticulously prepared and tested to make sure they won't
         | accidentally break in production.
         | 
         | From a security and auditing standpoint, it ensures that a
         | change is exactly what it's supposed to be. No random changes
         | by who-knows-who at who-knows-when. You see a reliable history
         | of all changes.
         | 
         | From a development standpoint, it allows you to see the full
         | history of changes and verify the source or integrity of data,
         | which is important in some fields like research.
        
           | bob1029 wrote:
           | There is also a performance advantage if you can build
           | everything under these constraints. A pointer to something
           | held in an immutable log will never become invalid or
           | otherwise point to garbage data in the future. At worst,
           | whatever is pointed to has since been updated or compensated
           | for in some _future_ transaction which is held further
           | towards the end of the log. Being able to make these
           | assumptions allows for all kinds of clever tricks.
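            | 
            | As a toy illustration (not immudb's actual format), an
            | append-only log where an offset, once handed out, stays
            | valid forever:
            | 
            |     package main
            | 
            |     import "fmt"
            | 
            |     // Bytes are never rewritten, so an offset is a
            |     // permanent, always-valid reference.
            |     type alog struct {
            |         entries [][]byte
            |     }
            | 
            |     func (l *alog) add(e []byte) int {
            |         l.entries = append(l.entries, e)
            |         return len(l.entries) - 1 // stable offset
            |     }
            | 
            |     func (l *alog) read(off int) []byte {
            |         return l.entries[off]
            |     }
            | 
            |     func main() {
            |         var l alog
            |         off := l.add([]byte("order#42 created"))
            |         // a compensating entry, not an update in place
            |         l.add([]byte("order#42 amended"))
            |         fmt.Printf("%s\n", l.read(off)) // still intact
            |     }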
           | 
           | The inability to mutate data pointed to in prior areas of the
           | log does come with tradeoffs regarding other performance
           | optimizations that expressly rely on mutability, but in my
           | experience constraining the application to work with an
           | immutable log (i.e. dealing with stale reads & compensating
           | transactions) usually results in substantial performance
            | uplift compared to solutions relying on mutability. One
            | recent development that widens this difference is NAND
            | storage, where there _may_ be a substantial cost to rewrite
            | prior blocks of data (depending on the type of
            | controller/algorithm used by the device).
        
             | fouc wrote:
             | > A pointer to something held in an immutable log will
             | never become invalid or otherwise point to garbage data in
             | the future.
             | 
             | Now I'm wondering if we can have immutable versioned APIs
        
       | pharmakom wrote:
        | Is it possible to delete data for compliance reasons? Not as a
        | frequent operation, but, say, as a monthly batch?
        
         | jeroiraz wrote:
         | logical deletion is in place, physical deletion is already on
         | the roadmap
        
       | cabalamat wrote:
       | Would it be possible to have something like this that works by
       | writing to a PROM? That would make it immutable at the hardware
       | level.
        
       | ShamelessC wrote:
       | > Data stored in immudb is cryptographically coherent and
       | verifiable. Unlike blockchains, immudb can handle millions of
       | transactions per second, and can be used both as a lightweight
       | service or embedded in your application as a library. immudb runs
       | everywhere, on an IoT device, your notebook, a server, on-premise
       | or in the cloud.
       | 
       | Seems pretty useful actually. Can anyone with a relevant
       | background comment on when this would be a bad idea to use?
        
         | KarlKemp wrote:
         | The data that is at risk of being changed with malicious intent
         | is certainly not insignificant, but still just a fraction of
          | all data. Switching to this adds a new and complicated
          | system, replacing whatever you're currently using, which
          | will have seen far better testing and is known to the people
          | working with it.
        
         | staticassertion wrote:
         | If you can trust your writers there's likely no need for this.
         | A modern approach tends to have databases owned by a single
         | service, which exposes the model via RPCs. So you generally
         | don't have more than one writer, which means you're pretty much
         | de-facto "zero trust" if that single writer follows a few rules
          | (i.e. mutual auth, logging, etc.).
         | 
         | But in some cases you don't have that same constraint. For
         | example, databases that store logs (Elastic, Splunk, etc) might
         | have many readers and writers, including humans.
         | 
         | In that case enforced immutability might be a nice property to
         | have. Attackers who get access to your Splunk/ES cluster
         | certainly will have fun with it.
        
         | imglorp wrote:
          | There are a few properties to be aware of. Although it might
          | be a KV store, you're probably going to want sensible queries
          | on something other than the primary key, e.g. time series or
          | secondary keys. So in addition to the KV store, there is
          | probably a need for an external index and query mechanism.
          | Another issue is obtaining consistent hashing, where multiple
          | documents might have the same content but vary by order or by
          | date format. Finally, do you have to go back to the beginning
          | and hash everything to get a proof of one transaction, or is
          | some shortcut aggregation possible?
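          | 
          | (On the last question: Merkle-tree stores generally answer
          | it with O(log n) inclusion proofs rather than rehashing
          | from the start. A sketch of verifying one, with a made-up
          | proof encoding:)
          | 
          |     package main
          | 
          |     import (
          |         "bytes"
          |         "crypto/sha256"
          |         "fmt"
          |     )
          | 
          |     // One proof step: a sibling hash and its side.
          |     type step struct {
          |         sib  []byte
          |         left bool
          |     }
          | 
          |     // verify folds the leaf hash up the tree; the cost
          |     // is the proof length, not the whole history.
          |     func verify(leaf []byte, proof []step,
          |         root []byte) bool {
          |         h := sha256.Sum256(leaf)
          |         cur := h[:]
          |         for _, s := range proof {
          |             var buf []byte
          |             if s.left {
          |                 buf = append(buf, s.sib...)
          |                 buf = append(buf, cur...)
          |             } else {
          |                 buf = append(buf, cur...)
          |                 buf = append(buf, s.sib...)
          |             }
          |             h = sha256.Sum256(buf)
          |             cur = h[:]
          |         }
          |         return bytes.Equal(cur, root)
          |     }
          | 
          |     func main() {
          |         h := sha256.Sum256([]byte("tx1"))
          |         // an empty proof: the root is the leaf hash
          |         fmt.Println(verify([]byte("tx1"), nil, h[:]))
          |     }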
         | 
          | We evaluated AWS QLDB for these things in our application as
          | a financial ledger and were impressed by their progress with
          | a novel data store. They invented some of the tech in house
          | for this product instead of grabbing an off-the-shelf open
          | product. Lock-in would be a downside here.
         | 
         | Immudb looks promising because it's not locked to a cloud host.
         | 
         | https://aws.amazon.com/qldb/faqs/
        
           | jaboutboul wrote:
           | note that it does KV and SQL
        
           | mistrial9 wrote:
           | > Eg time series or secondary keys
           | 
           | not an "all or nothing" question.. for example, a fast-enough
           | "return the most recent in a time series" is not exactly
           | time-series, but solves many use cases
        
         | rattlesnakedave wrote:
         | Seems like it would still be vulnerable to rollback attacks.
         | Signed rows would probably get you farther with less novel tech
         | involved if you want immutability.
        
       | throwaway984393 wrote:
       | Don't forget to star this repo if you like immudb!
       | 
       | I didn't realize GitHub had "Like and subscribe" culture now. : /
        
       | willcipriano wrote:
       | > You can add new versions of existing records, but never change
       | or delete records. This lets you store critical data without fear
       | of it being tampered.
       | 
       | > immudb can be used as a key-value store or relational data
       | structure and supports both transactions and blobs, so there are
       | no limits to the use cases.
       | 
        | This is game changing. Use it as, say, a secondary data store
        | for high-value audit logs. I'll consider using it in the
        | future.
        
         | voidfunc wrote:
         | What happens if you have some data that absolutely _must_
         | change or be deleted? For example, a record gets committed with
         | something sensitive by mistake.
        
           | pmontra wrote:
            | Or customers ask for their personal data to be deleted:
            | GDPR, right to be forgotten, etc.
           | 
           | I guess we must consider what can go in an immutable storage
           | and what must not.
        
             | rch wrote:
             | You should be storing potentially GDPR-covered data
             | encrypted with entity specific keys, which are destroyed
             | when necessary.
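              | 
              | A minimal Go sketch of that pattern (AES-GCM; the
              | in-memory key map stands in for a real KMS):
              | 
              |     package main
              | 
              |     import (
              |         "crypto/aes"
              |         "crypto/cipher"
              |         "crypto/rand"
              |     )
              | 
              |     // entity ID -> key; use a real KMS in practice
              |     var keys = map[string][]byte{}
              | 
              |     func keyFor(id string) ([]byte, error) {
              |         if k, ok := keys[id]; ok {
              |             return k, nil
              |         }
              |         k := make([]byte, 32)
              |         if _, err := rand.Read(k); err != nil {
              |             return nil, err
              |         }
              |         keys[id] = k
              |         return k, nil
              |     }
              | 
              |     // seal encrypts PII before it enters the
              |     // immutable store
              |     func seal(id string, pt []byte) ([]byte, error) {
              |         k, err := keyFor(id)
              |         if err != nil {
              |             return nil, err
              |         }
              |         blk, err := aes.NewCipher(k)
              |         if err != nil {
              |             return nil, err
              |         }
              |         gcm, err := cipher.NewGCM(blk)
              |         if err != nil {
              |             return nil, err
              |         }
              |         n := make([]byte, gcm.NonceSize())
              |         if _, err := rand.Read(n); err != nil {
              |             return nil, err
              |         }
              |         return gcm.Seal(n, n, pt, nil), nil
              |     }
              | 
              |     // destroying the key renders every ciphertext
              |     // stored for that entity permanently unreadable
              |     func forget(id string) { delete(keys, id) }
              | 
              |     func main() {
              |         ct, _ := seal("u1", []byte("pii@example.com"))
              |         forget("u1") // erasure: ct is now unreadable
              |         _ = ct
              |     }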
        
               | gnufx wrote:
                | Right, regardless of the storage, but in the research
                | computing circles I see, it's just not done. The
                | promises of "data destruction" that get demanded are
                | basically accompanied by fingers crossed behind the
                | back (is that an international gesture to "cover" for
                | lying?), considering the filesystem and backup
                | mechanisms etc.
        
             | jaboutboul wrote:
             | There is a data expiration and also logical deletion
             | feature for exactly this use case.
        
         | dillondoyle wrote:
          | I don't totally understand the value of the second, but
          | doesn't the first already exist in things like BigQuery?
        
           | zimpenfish wrote:
           | > isn't the first already exist in things like BigQuery?
           | 
           | You can truncate a BQ table and reload it if you want to
           | change things. Had to do this at a previous gig (twice a
           | day!) because the data warehouse people would only take data
           | from BQ but the main data was in Firebase (yes, it was an
           | insane place.)
        
           | mathnmusic wrote:
           | Or a traditional database with read-only credentials and a
           | function that adds "ORDER BY version DESC LIMIT 1".
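            | 
            | Sketched in Go with database/sql (schema, driver and
            | names assumed, e.g. Postgres):
            | 
            |     // Append-only versioning on a traditional SQL
            |     // database: writers may only INSERT; readers take
            |     // the highest version per key.
            |     package records
            | 
            |     import "database/sql"
            | 
            |     // Latest returns the newest value for key.
            |     func Latest(db *sql.DB, key string) (string, error) {
            |         var v string
            |         err := db.QueryRow(
            |             `SELECT value FROM records
            |               WHERE key = $1
            |               ORDER BY version DESC LIMIT 1`,
            |             key).Scan(&v)
            |         return v, err
            |     }
            | 
            |     // Put appends a new version; nothing is updated.
            |     // (A real version would serialize writers per key.)
            |     func Put(db *sql.DB, key, val string) error {
            |         _, err := db.Exec(
            |             `INSERT INTO records(key, value, version)
            |              SELECT $1, $2, COALESCE(MAX(version), 0) + 1
            |                FROM records WHERE key = $1`,
            |             key, val)
            |         return err
            |     }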
        
             | [deleted]
        
             | lojack wrote:
              | but is a traditional database cryptographically secure?
              | if a super user with write permissions (or, say, direct
              | access to the physical data store) modifies records, are
              | users able to validate the integrity of the data?
        
               | jayd16 wrote:
               | You could use permissions and stored procedures that
               | ensure append only.
        
               | jeroiraz wrote:
                | The difference is that client applications do not need
                | to trust that proper "append-only" permissions were
                | enforced on the server side; they will have the chance
                | to detect any tampering, while in the former approach
                | it won't be noticeable.
        
       | ledgerdev wrote:
       | Does this have, or are there any plans for a change-feed? Has
       | anyone used this as an event sourcing db?
        
       | chalcolithic wrote:
       | >millions of transactions per second I wonder if I wanted to
       | survey a landscape of all databases that claim such numbers how
       | could I possibly find them?
        
       | furstenheim wrote:
       | GDPR compliance will be tricky. How does one delete data?
        
         | KarlKemp wrote:
         | Pruning is on the roadmap.
        
           | jquery wrote:
           | How is it immutable if you can prune it?
        
             | jeroiraz wrote:
              | several solutions may be possible. The simplest would be
              | to delete the payloads associated with entries. While
              | the actual data won't be there, it will still be
              | possible to build cryptographic proofs. It's also
              | possible to prune by physically deleting entire
              | transaction data, which may or may not affect proof
              | generation. However, tampering will still be subject to
              | detection.
        
               | jcims wrote:
               | Are records atomically immutable or is there a set
               | concept such that the lack of mutation can be verified
               | over a set of records?
        
         | jeroiraz wrote:
          | currently it's logical deletion and time-based expiration.
          | Actual values associated with expired entries are not
          | fetched. Physical deletion is already on the roadmap.
        
         | ledgerdev wrote:
         | My preferred method is to tokenize sensitive data before
         | storing in the immutable logs/database.
        
         | [deleted]
        
         | fragmede wrote:
         | Store the data encrypted, then delete the keys when requested.
        
           | endisneigh wrote:
           | This isn't really deleting it though. What happens if in the
           | future technology changes and current cryptography is moot?
        
             | nowherebeen wrote:
             | Then fire up a new database with the latest customer data
             | every 18 months. And completely delete the old database
             | once you confirm it no longer has value.
        
               | cookiengineer wrote:
               | Or just store the customer database in /tmp and reboot
               | the server every 18 months. /s
        
               | endisneigh wrote:
               | I thought the point of this is to have an exhaustive
               | record for audit purposes.
        
         | okr wrote:
         | You clone the database and remove/update the corresponding
          | lines. GDPR does not mean you have to fix it right away, imho.
        
           | gnabgib wrote:
           | Article 17 does include the term "without undue delay"[0],
           | but such vague language seems ripe for some court precedent.
           | 
            | A clone and remove/update per GDPR request seems like undue
            | delay, certainly one that could be avoided by alternative
            | architecture choices (keep the personally identifiable
            | information (PII) in a mutable store).
           | 
           | [0]: https://gdpr-info.eu/art-17-gdpr/
        
             | sigzero wrote:
             | No, it's not undue delay. That's just how it currently
             | works and that is a fine argument.
        
             | [deleted]
        
           | peoplefromibiza wrote:
            | But you have to do it in a pretty short timeframe
           | 
           | > _Under Article 12.3 of the GDPR, you have 30 days to
           | provide information on the action your organization will
           | decide to take on a legitimate erasure request. This
           | timeframe can be extended up to 60 days depending on the
           | complexity of the request._
           | 
            | even if they ask for more time, the first communication has
            | to come within 30 days
        
       | 1cvmask wrote:
       | Words like immutable make me allergic
        
         | abc_lisper wrote:
          | See a doctor then. It isn't expected. Could be a lack of CS
         | education, in which case, read some books. If that doesn't fix
         | it, see a psychiatrist - something could be wrong with your
         | brain.
        
       ___________________________________________________________________
       (page generated 2021-12-27 23:00 UTC)