hngopher.com

       [HN Gopher] Immudb 1.0 - open-source, immutable database with SQ...
       ___________________________________________________________________
        
       Immudb 1.0 - open-source, immutable database with SQL and verified
       timetravel
        
       Author : vchain-dz
       Score  : 247 points
       Date   : 2021-05-25 11:49 UTC (11 hours ago)
        
 (HTM) web link (www.codenotary.com)
 (TXT) w3m dump (www.codenotary.com)
        
       | nerdponx wrote:
       | So is this something I would want to use for a basic CRUD
       | application, and reap the benefits of time travel and
       | immutability?
       | 
       | Or are there downsides that would relegate it to specific use
       | cases? And what would those use cases be?
        
         | brokencube wrote:
         | It wouldn't be suitable for any application where you care
         | about GDPR (i.e. you store personal information and have users
         | in the EU)
         | 
         | The "right to be forgotten" is not compatible with immutable
         | data. You can't simply mark data as deleted, you need to
         | 'purge' it from your system (and possibly backups, depending
         | on how long you keep historic backups) - that isn't possible in
         | a system with immutable data.
        
           | blablabla123 wrote:
           | I mean there are solutions for this. About CQRS/Event
           | sourcing I've read that it's possible to solve it by
           | encrypting the data with different keys and then
           | rotating/throwing away the keys every now and then. Seems a
           | bit hacky but probably there are more elegant approaches.
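
The key-discarding approach described above (often called crypto-shredding) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not anything immudb provides: `ShreddableLog` and `_keystream_xor` are hypothetical names, the cipher is an unauthenticated HMAC-SHA256 keystream chosen only to keep the sketch stdlib-only (a real system would use a vetted AEAD cipher), and whether destroying keys satisfies an erasure request is a legal question the code does not answer.

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with HMAC-SHA256(key, nonce || offset) blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hmac.new(key, nonce + offset.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)


class ShreddableLog:
    """Append-only log of ciphertexts plus a separate, mutable per-user key store."""

    def __init__(self):
        self.log = []           # (user_id, nonce, ciphertext); entries are never modified
        self.keys = {}          # user_id -> key; lives outside the immutable log
        self.forgotten = set()  # users whose key has been destroyed

    def append(self, user_id: str, plaintext: bytes) -> int:
        if user_id in self.forgotten:
            raise KeyError(f"key for {user_id!r} was destroyed")
        key = self.keys.setdefault(user_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        self.log.append((user_id, nonce, _keystream_xor(key, nonce, plaintext)))
        return len(self.log) - 1

    def read(self, index: int):
        user_id, nonce, ciphertext = self.log[index]
        key = self.keys.get(user_id)
        # Without the key the ciphertext stays in the log but is unreadable.
        return None if key is None else _keystream_xor(key, nonce, ciphertext)

    def forget(self, user_id: str) -> None:
        """'Erase' a user by destroying their key; the log itself is untouched."""
        self.keys.pop(user_id, None)
        self.forgotten.add(user_id)


log = ShreddableLog()
i = log.append("alice", b"alice@example.com")
print(log.read(i))   # readable while the key exists
log.forget("alice")
print(log.read(i))   # None once the key is gone
```

The design choice is that only the small key store needs to be mutable; the log and all of its backups can stay append-only.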
        
       | LukeEF wrote:
       | There seems to be a growth in the number of time traveling
       | immutable-first databases available. We have OpenCrux, Datomic,
       | TerminusDB, Noms, Dolt, and now Immudb. Three using datalog for
       | query and two forcing SQL (not sure about Noms).
       | 
       | What sort of use cases are most common? GitHub repository says:
       | 
       | > Companies use immudb to protect credit card transactions and to
       | secure processes by storing digital certificates and checksums.
       | 
       | But I am not sure how people are building that into their
       | architecture to be honest.
        
         | clusterhacks wrote:
         | I have used "immutable" schema designs when there were strong
         | requirements for full audit needs over time. It works very well
         | even in a normal RDBMS system. It also allows some very neat
         | reporting e.g. compare the same report at different points in
         | time.
         | 
         | The basic idea was that every operation (create, update,
         | delete) is actually a normal SQL insert and all reads are
         | against views defined such that the most recent tuples are
         | returned unless they are flagged as "deleted."
         | 
         | I have typically used these types of designs in mostly simple
         | applications with tables where the row counts are in the low
         | millions of tuples. Dealing with this design in the billions of
         | tuples (probably sharded somehow) might have motivated us out
         | of a normal RDBMS and into one of the specialized immutable DBs
         | mentioned.
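
The insert-only design described above can be sketched with SQLite from Python's standard library. The table, view, and helper names (`customer_log`, `customer`, `write`, `as_of`) are hypothetical and only illustrate the idea; the commenter's actual schema is not shown in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_log (
    version INTEGER PRIMARY KEY AUTOINCREMENT,  -- write order, never reused
    id      INTEGER NOT NULL,                   -- logical row identity
    name    TEXT,
    deleted INTEGER NOT NULL DEFAULT 0          -- tombstone flag
);
-- All reads go through this view: latest version per id, hidden if deleted.
CREATE VIEW customer AS
SELECT l.id, l.name
FROM customer_log AS l
WHERE l.version = (SELECT MAX(version) FROM customer_log WHERE id = l.id)
  AND l.deleted = 0;
""")


def write(id_, name, deleted=False):
    """Create, update and delete are all plain INSERTs."""
    conn.execute(
        "INSERT INTO customer_log (id, name, deleted) VALUES (?, ?, ?)",
        (id_, name, int(deleted)),
    )


def current():
    return conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()


def as_of(version):
    """Point-in-time read: the state as it was right after `version`."""
    return conn.execute(
        """
        SELECT l.id, l.name
        FROM customer_log AS l
        WHERE l.version = (SELECT MAX(version) FROM customer_log
                           WHERE id = l.id AND version <= ?)
          AND l.deleted = 0
        ORDER BY l.id
        """,
        (version,),
    ).fetchall()


write(1, "Ada")                # create -> version 1
write(2, "Bob")                # create -> version 2
write(1, "Ada L.")             # update -> version 3
write(2, None, deleted=True)   # delete -> version 4

print(current())   # only the live, latest rows
print(as_of(2))    # the same "report" as of version 2
```

Running the same query against `as_of(n)` for different `n` is what makes the "compare the same report at different points in time" use case cheap.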
        
           | ak39 wrote:
           | Thanks. Can you comment on how this differs from a "mutable"
           | RDBMS model but one with automatic history based on triggers
           | for example?
        
       | gregwebs wrote:
       | > Data stored in immudb is cryptographically coherent and
       | verifiable, just like blockchains, but without all the
       | complexity. Unlike blockchains, immudb can handle millions of
       | transactions per second, and can be used both as a lightweight
       | service or embedded in your application as a library.
       | 
       | > Companies use immudb to protect credit card transactions and to
       | secure processes by storing digital certificates and checksums.
       | 
       | This explanation is available on their github repo [1]. It has
       | been a common refrain on Hacker News that you don't need a
       | blockchain and instead can just use a database, but this product
       | may actually fill the gap where tamper resistance is desired.
       | 
       | [1] https://github.com/codenotary/immudb
        
         | simtel20 wrote:
         | Yeah, this interests me because I'm thinking about how to use
         | grafeas - its role is critical for reliable software
         | development going forward - but storing its data in a backend
         | like this would add one more layer of trust and verifiability
         | to a software supply chain. There are some interesting
         | possibilities with making e.g. public software repos' metadata
         | clonable, verifiable and queryable via local immutable copies.
        
           | decodebytes wrote:
           | Maybe take a look at rekor, part of the sigstore project,
           | it's built specifically for software supply chain
           | transparency (disclaimer I am one of the community):
           | 
           | https://github.com/sigstore/rekor
        
         | c01n wrote:
         | Does immudb offer mechanisms for distributed consensus? That is
         | one of the top features of blockchains, and they do this
         | while remaining P2P.
        
           | jeroiraz wrote:
           | the order of changes is not subject to consensus, but clients
           | have the tools to ensure no history rewrite happened
        
             | judge2020 wrote:
             | sounds like git :)
        
               | stingraycharles wrote:
               | I think both blockchains and git are based on the concept
               | of merkle trees, so that sounds about right.
               | 
               | https://en.m.wikipedia.org/wiki/Merkle_tree
        
             | c01n wrote:
             | Can Immudb work in a decentralized network while remaining
             | secure from attacks in such networks, or is Immudb meant for
             | centralized systems? If the latter, I think you cannot
             | compare it to blockchains. Maybe a better comparison is Git.
        
               | jeroiraz wrote:
               | immudb is not meant for public decentralized networks,
               | although it might be possible to use embedded immudb to
               | build a public blockchain... but that's a different
               | story. immudb server is tailored to provide a database
               | where any tampering will be subject to detection by any
               | single client application consuming its data.
        
         | decodebytes wrote:
         | Maybe take a look at rekor, part of the sigstore project, it's
         | built specifically for software supply chain transparency
         | (disclaimer I am one of the community). Being a transparency
         | log, you get much better guarantees around inclusion proof (it
         | uses a merkle tree):
         | 
         | https://github.com/sigstore/rekor
        
         | cryptonector wrote:
         | It sounds a lot like ZFS.
         | 
         | What I really want is a way to get a hash of a root node /
         | snapshot.
        
         | simias wrote:
         | Was there a gap in the first place? We have been able to design
         | tamper-proof data storage since way before the blockchain. All
         | you need is checksums, public key cryptography and a way to
         | publish your signed checksums.
         | 
         | I'm not saying that this isn't a good project but it's a bit
         | strange to frame it as if it was a major technical
         | breakthrough.
         | 
         | If anything what catches my eye in this announcement is the
         | "time travel" feature as well as the wire protocol
         | compatibility with Postgres, that's pretty cool.
        
         | capableweb wrote:
         | I don't quite understand how something you run yourself on your
         | own hardware can be tamper-proof (digitally, not physically).
         | If you're running the software you can modify it, so no matter
         | how many processes there are in place for resisting mutability,
         | you'll always be able to find some way to mutate it.
         | 
         | Compared to blockchain which is running on X number of nodes
         | that you'd have to have access to in order to modify something,
         | immudb doesn't actually seem to replace the use case when you
         | need something actually tamper-proof.
        
           | jeroiraz wrote:
           | the entire state of the database gets captured by a hash
           | value. Having light-weight clients (or auditors) keep track
           | of it is how tampering is detected, regardless of where the
           | database server is running
        
             | exfalso wrote:
             | This is insufficient. The strongest guarantee you can get
             | without consensus is that the state of the DB you see on
             | the client is/was a correct state at some point, it doesn't
             | provide freshness/rollback attack prevention, aka that the
             | state you see is in fact the latest one.
             | 
             | Keeping track of the "HEAD" hash on the clients _is_ what
             | consensus protocols achieve. You can also achieve it with
             | trusted counters like the one SGX provides (depends on
             | Intel ME so not exactly recommended, also most probably
             | switched off in cloud environments). Alternative is an
             | implementation of something like
             | https://dl.acm.org/doi/10.5555/3241189.3241289.
             | 
             | You can of course say that it's the clients' responsibility
             | to do this, but in practice they won't and they'll
             | implicitly trust the server state.
             | 
             | Having said this, the project does look promising, we may
             | end up using it in a confidential compute setting where
             | clients can verify the server code running, and we'll add
             | rollback protection on top
        
               | toolslive wrote:
               | > aka that the state you see is in fact the latest one.
               | 
               | This is an impossible guarantee. Suppose the state that
               | is sent to you from the server needs some time to get to
               | you. Meanwhile, the state on the server could have
               | changed. You don't even need a remote server to have this
               | issue. Your thread (where you see the latest state) is
               | put to sleep for a while (scheduler, OS, ...). It wakes up.
               | Is the state it observes still the latest? That's
               | impossible to know. The only thing you can do is to
               | refuse future updates if the state they were built upon
               | is not the current state of the database.
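
The "refuse updates built on a stale state" rule in the last sentence above is essentially a compare-and-swap on the state hash. A minimal sketch, assuming a hypothetical in-memory `VersionedStore` (this is not immudb's API):

```python
import hashlib


class StaleStateError(Exception):
    """Raised when a write was prepared against an outdated head."""


class VersionedStore:
    def __init__(self):
        self.entries = []
        self.head = hashlib.sha256(b"genesis").hexdigest()

    def append(self, expected_head: str, payload: bytes) -> str:
        # The writer must name the state it believes is current.
        if expected_head != self.head:
            raise StaleStateError(f"expected {expected_head[:8]}, head is {self.head[:8]}")
        self.entries.append(payload)
        # The new head commits to the previous head and the new payload.
        self.head = hashlib.sha256(bytes.fromhex(self.head) + payload).hexdigest()
        return self.head


store = VersionedStore()
seen_by_a = store.head          # client A reads the head
seen_by_b = store.head          # client B reads the same head
store.append(seen_by_a, b"A's update")
try:
    store.append(seen_by_b, b"B's update")   # B's view is now stale
except StaleStateError as err:
    print("rejected:", err)
```

A client can never prove its read is the latest, but the server can at least guarantee that no write is accepted on top of a state that has already moved on.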
        
             | capableweb wrote:
             | I see. It's a blockchain without calling it a blockchain,
             | so people who hate blockchain can use it without having to
             | realize they use a blockchain.
        
               | ForHackernews wrote:
               | It's only the actually-useful bits of a "blockchain"
               | without the planet-cooking proof-of-waste consensus
               | algorithm brute-forcing sha256 over and over again.
        
               | f38zf5vdt wrote:
               | And with git being the most superior blockchain of them
               | all.
        
               | jacquesm wrote:
               | Blockchain is just a special case of Merkle trees, there
               | isn't anything original about them other than that
               | Bitcoin served as a marketing engine for the term
               | blockchain because some people made a ton of money with
               | it.
               | 
               | https://en.wikipedia.org/wiki/Merkle_tree
        
               | [deleted]
        
               | joshuak wrote:
               | No, block chains are different than Merkle trees entirely.
               | Block chains include previous hashes in each block,
               | whereas Merkle trees, as the name implies, are trees of
               | block hashes. In Merkle trees blocks do not include the
               | previous block's hash.
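
The structural difference described above can be made concrete with a small sketch. `hash_chain` and `merkle_root` are illustrative helpers, and duplicating the last node on odd-sized levels is just one common convention. Note that Bitcoin uses both structures: block headers are chained by previous-block hash, and each header also carries a Merkle root of that block's transactions.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def hash_chain(blocks):
    """Linear chain: each link hashes the previous link together with the block."""
    prev = b"\x00" * 32
    links = []
    for block in blocks:
        prev = h(prev + block)
        links.append(prev)
    return links


def merkle_root(blocks):
    """Tree: leaves are block hashes, parents hash pairs of children."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # pad odd levels by repeating the last node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"tx-1", b"tx-2", b"tx-3"]
print(hash_chain(blocks)[-1].hex())   # head of the chain
print(merkle_root(blocks).hex())      # root of the tree
```

In the chain, changing an early block changes every later link; in the tree, individual blocks do not reference each other and only the path up to the root changes.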
        
               | decodebytes wrote:
               | Rekor is just that. It's a merkle tree implementation
               | (with extras such as timestamping)
               | 
               | https://github.com/sigstore/rekor
        
             | [deleted]
        
           | foepys wrote:
           | Blockchains like Bitcoin are actually not tamper-proof. They
           | can be attacked by 51% attacks where you can even rewrite
           | history if you have enough hashpower. The protocol is
           | _explicitly_ designed to always follow the longest chain thus
           | the only defense is to hash faster than the attackers. This
           | might vary for other blockchains but the biggest and most
           | mentioned is particularly unsafe in that regard.
        
             | kdragon wrote:
             | You can't rewrite the history of blocks that have already
             | been distributed. You may fool SPV nodes but any node with
             | a copy of the blockchain (even if pruned) will reject your
             | version.
        
             | [deleted]
        
             | PhilippGille wrote:
             | There are two things you skip over:
             | 
             | 1. You can't rewrite everything. Given enough hash power to
             | create a longer chain, you can create a block that a)
             | removes any transactions from a block in the original chain
             | and b) contains new valid transactions (must be signed by
             | you so you must be the owner of the Bitcoins used in the
             | tx), allowing you to double spend your coins, but you can't
             | change other people's transactions.
             | 
             | 2. With each new block changing a past one becomes harder,
             | while you make it sound as if you could arbitrarily rewrite
             | history. Merchants usually wait several blocks before
             | accepting your on-chain payment. Exchanges wait 6 blocks, as
             | it's seen as infeasible for non-nation-state actors to
             | change a block that's buried under 5 other blocks.
             | 
             | TL;DR: 1) Other people's transactions can at most be
             | removed but not be changed and 2) data on the Bitcoin
             | blockchain is tamper proof after x blocks.
        
           | rcoder wrote:
           | You can build a Merkle tree from any data that is append-
           | only; Git does this, as do ZFS and Dat/Hypercore. That lets
           | you make strong assertions about data integrity, even without
           | blocking local writes.
           | 
           | Now add a mirror: git upstream, FS snapshot, immudb replica,
           | etc...or even just an outside log of the merkle proofs
           | themselves. Then, if your database ever fails a check against
           | that proof, you know the data has been modified, not just
           | appended to.
           | 
           | To use a familiar Git workflow example: you can do whatever
           | local writes you want, but if you disallow force pushes no
           | one can erase history on the upstream repo.
           | 
           | Put another way: if you have immutable backups you don't need
           | a blockchain to ensure data integrity. OTOH, if you can't
           | trust your own infrastructure even as far as a secure remote
           | backup you have other problems that a blockchain won't solve
           | either.
        
           | rhacker wrote:
           | you can do something quite simple like posting a tweet or
           | inserting something into a public chain, like Ethereum. Then
           | follow that back to the private immudb hash.
        
           | theamk wrote:
           | You can still have the extra verifier nodes, but those don't
           | have to be on the critical read/write path.
           | 
           | Presumably you can create a config where you have your "main"
           | beefy server where all the activity is -- which is backed up,
           | redundant, etc... And a bunch of "client" servers, which just
           | pull and verify the data all the time. And the client servers
           | notify if there are any errors using some out-of-band
           | channel, probably using the same system you use for general
           | server health monitoring.
           | 
           | So you are getting same security guarantees as "private
           | blockchain", but with drastically higher performance, and
           | only needing one beefy server. And the downside is that you
           | won't auto-stop all operations on tampering, you'll only get
           | an alert for it.
        
           | akiselev wrote:
           | _> immudb doesn't actually seem to replace the use case when
           | you need something actually tamper-proof._
           | 
           | I think that's an unrealistic requirement. There's tamper-
           | evident and tamper-resistant but AFAIK _nothing_ is tamper
           | proof. Best you can do is an HSM with a tamper resistant HMAC
           | with keys and a running checksum in unrecoverable ROM coupled
           | to the packaging.
        
             | capableweb wrote:
             | > nothing is tamper proof
             | 
             | I beg to differ.
             | 
             | If I place a signed message in the Bitcoin chain, can you
             | then modify that message?
             | 
             | If you can prove that you can somehow modify the message,
             | I'll give you $1,000,000 USD tomorrow.
        
               | akiselev wrote:
               | That's "tamper proof" in the colloquial sense. As a term
               | of art, it means something very specific. For example,
               | see FIPS 140-2/3 [1]
               | 
               | It makes no sense to say that the blockchain is tamper
               | proof because the blockchain is just information. Tamper
               | "proofness"/resistance is first a property of the devices
               | storing the information - once you get into custody
               | chains, provenance documents, etc. that's when a system
               | becomes tamper resistant. At best the blockchain as a
               | system is "tamper evident" in the colloquial sense
               | because the network of all the other nodes decides which
               | bits of information form the "real" blockchain. However,
               | without verifying the (physical) identity and data
               | integrity of the devices that run (at least?) 50%+1 of
               | the nodes, you have no idea whether the system has been
               | tampered with.
               | 
               | [1] https://en.wikipedia.org/wiki/FIPS_140
        
               | k_ wrote:
               | Not after you post it, but by infecting your device
               | before you make that message, and tampering when you
               | place it in the Bitcoin chain
        
         | [deleted]
        
         | codetrotter wrote:
         | > this product may actually fill the gap where tamper
         | resistance is desired
         | 
         | I think in the future, all enterprise storage solutions will be
         | append-only by default. To protect against cryptolocker
         | malware. But also with isolated functionality for actually
         | deleting data, for example because of GDPR requests or because
         | of malware that tries to fill all writable storage with
         | garbage. So that data can still be deleted, but not from any of
         | the regular servers that are reading and appending data to the
         | system. Instead from separate servers that are isolated and for
         | data storage management only.
        
         | ampdepolymerase wrote:
         | How does this compare feature wise to
         | https://aws.amazon.com/qldb/
        
           | jeroiraz wrote:
           | there are many differences (as immudb contributor):
           | 
           | - immudb can be used embedded or as a client-server database,
           |   while qldb is an AWS service
           | - immudb behaves as a key-value store but also provides SQL
           |   support, while qldb provides a document-like data model with
           |   the PartiQL language
           | - immudb provides time travel features
           | - immudb is faster, with a built-in mode of operation designed
           |   for fast writes which works with eventual indexing
           | 
           | Finally but super important, immudb can be deployed anywhere
           | and it's open source!
        
             | giaour wrote:
             | QLDB provides time travel features, too (if by "time
             | travel" you mean being able to query the state of a record
             | at an arbitrary point in the past):
             | https://docs.aws.amazon.com/qldb/latest/developerguide/worki...
        
               | jeroiraz wrote:
               | immudb already included history support for key-value
               | entries in previous releases. But since v1.0.0, immudb
               | provides query resolution at a given point, using the
               | current data on that specific moment but also being able
               | to combine data at different points in time on the same
               | query. It's not clear to me if that's something that can
               | be achieved with "SELECT * FROM history"; it requires at
               | most one result per different entry (the most recent one)
        
               | giaour wrote:
               | QLDB is a document DB, so you are limited to a single
               | point or range per query. Also keep in mind `history` in
               | QLDB is a function, not just a store of previous values;
               | given a table "foo" and a key "bar", getting its
               | immutable state from last Tuesday at 4 PM EDT would be:
               | 
               | SELECT * FROM history('foo', `2021-05-18T20:00:00`,
               | `2021-05-18T20:00:00`) as t WHERE t.metadata.id = 'bar';
        
               | jeroiraz wrote:
               | temporal features provided in immudb allow query (and
               | subquery) resolution based on older states of the
               | database. So for instance, it can be thought of as
               | retrieving the documents in their current state in a given
               | time range. Querying the history of changes of a given
               | key or document is slightly different and it's also
               | covered by the history operation in immudb.
        
               | giaour wrote:
               | Ok, that sounds extremely similar to the history function
               | in QLDB.
               | 
               | In the examples shown in the AWS docs, the results of a
               | historical query are not changes made to the document,
               | but the fully resolved state of a document at the
               | requested timestamp (or within the timestamp range). Like
               | other threads on this page mention, this is an unusual
               | but not uncommon DB feature these days.
        
       | deknos wrote:
       | this is hugely interesting, i have to look into this, but... for
       | dev/test environments, can i have an "unverified" version, where
       | clients reget/reset the state?
        
       | supergirl wrote:
       | not exactly immutable is it? their docs say you can do UPSERT for
       | example. the key is that once you update something, the clients
       | can check using crypto that something was changed. you can't do
       | this in regular databases.
        
         | dmacvicar wrote:
         | Immutable in the sense that the old value is preserved, even if
         | you update it, and you can't change the history (tamper-
         | evident).
        
       | boshomi wrote:
       | GDPR requires you to erase user data if users withdraw their
       | consent or their data is no longer required for the purpose for
       | which you originally collected or processed it.
       | 
       | Therefore, you must carefully check that no personal data is
       | stored in immutable databases.
        
       | endisneigh wrote:
       | How is this any different than taking every mutation, signing it
       | using whatever signing mechanism you'd like and adding a column,
       | in addition to the ones you'd like, with the hash?
       | 
       | Then, if anything changes you know it's been mutated because the
       | computed signature has changed.
        
         | jeroiraz wrote:
         | In some way, it's basically that but on steroids... Note that
         | if the signature includes the previous one then you are
         | protecting the history of changes. However, this simple
         | approach may not scale when dealing with a considerable amount
         | of data: proving some older entry was not tampered with may
         | require validating all signatures from that point up to the
         | latest one. immudb employs hash trees to optimise these proofs.
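
To illustrate why hash trees shorten these proofs, here is a generic Merkle inclusion proof sketch. It is not immudb's actual data structure or API; `build_levels`, `inclusion_proof` and `verify` are hypothetical helpers, and padding odd levels by repeating the last node is a simplification that hardened designs avoid.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, leaf hashes first, root level last."""
    level = [h(b"\x00" + leaf) for leaf in leaves]   # 0x00 prefix marks a leaf
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]              # pad odd levels
            levels[-1] = level
        level = [h(b"\x01" + level[i] + level[i + 1])  # 0x01 prefix marks a parent
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def inclusion_proof(levels, index):
    """Sibling hashes from leaf to root, each flagged if it sits on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf, proof, root):
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = h(b"\x01" + pair)
    return node == root


entries = [f"entry-{i}".encode() for i in range(1000)]
levels = build_levels(entries)
root = levels[-1][0]
proof = inclusion_proof(levels, 3)
print(len(proof), verify(entries[3], proof, root))
```

With a chained signature, checking entry 3 out of 1000 means walking roughly a thousand links; here the client recomputes only one hash per tree level against a root it already trusts.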
        
         | ianpurton wrote:
         | Your solution wouldn't handle the case of row deletion.
         | 
         | It's a little harder than you might think to make a database
         | with tamper resistance.
        
           | hypertele-Xii wrote:
           | According to its own description, this database does not
           | support deletion at all.
           | 
           | "You can [...] never change or delete records."
        
           | endisneigh wrote:
           | Oh I'm sure - but without delving into philosophy, how would
           | you know that something was deleted and tampered with vs.
           | Immudb (for example) being compromised and turns out it's
           | possible to delete something without you knowing vs. it never
           | existed to begin with?
           | 
           | In my mind the only way to guarantee is to maintain a copy
           | yourself and check against the "original", but if you're
           | going to do that, then what I described is sufficient, no?
           | 
           | I only mention this because the project mentions that the
           | history is protected by clients, which I imagine is similar
           | to what I'm describing, e.g. copying and checking against the
           | original.
        
             | ianpurton wrote:
             | > In my mind the only way to guarantee is to maintain a
             | copy yourself and check against the "original", but if
             | you're going to do that, then what I described is
             | sufficient, no?
             | 
             | The attacker in that case could update your copy. But you
             | have somewhat started to fix the issue.
             | 
             | To cover the case where a bad admin has access to the DB
             | and any copies, you need to send a hash every so often to
             | an outside source. In this case they use clients (I'm not
             | sure exactly how they do this).
             | 
             | In fact you need a list of hashes, one for every 100 rows
             | for example. Regenerating the hashes and checking them
             | against an external source should detect a tamper.
             | 
             | In the case of Bitcoin (which is extremely tamper
             | resistant) every node operator is a validator. The hashes
             | are stored in a merkle tree.
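
The "one hash every 100 rows, checked against an outside source" idea above can be sketched as follows. The function names are hypothetical, and the external source is modelled as a plain list that the database admin is assumed unable to modify.

```python
import hashlib


def checkpoints(rows, every=100):
    """Running SHA-256 over all rows so far, recorded after each `every` rows."""
    running = hashlib.sha256()
    out = []
    for count, row in enumerate(rows, start=1):
        # Length-prefix each row so row boundaries cannot be shifted unnoticed.
        running.update(len(row).to_bytes(8, "big") + row)
        if count % every == 0:
            out.append(running.hexdigest())
    return out


def first_mismatch(rows, published, every=100):
    """Index of the first checkpoint that disagrees with the published list, or None."""
    local = checkpoints(rows, every)
    for n, (mine, theirs) in enumerate(zip(local, published)):
        if mine != theirs:
            return n
    if len(local) < len(published):
        return len(local)   # rows covered by a published checkpoint are missing
    return None


rows = [f"row-{i}".encode() for i in range(350)]
published = checkpoints(rows)            # sent to the outside source over time

tampered = list(rows)
tampered[150] = b"row-150 (edited)"
print(first_mismatch(rows, published))       # None: nothing changed
print(first_mismatch(tampered, published))   # 1: the second block of 100 rows
```

Because the hash is cumulative, an edit or deletion also invalidates every later checkpoint, so the first mismatch narrows the tampering down to one block of rows.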
        
       | foobarbazetc wrote:
       | Definitely not the first database to allow time travel, TM or
       | not.
        
         | slver wrote:
         | I think it's the first to allow it with TM.
        
       | alrs wrote:
       | > For any question contact us on Discord.
       | 
       | Hard no.
        
       | endymi0n wrote:
       | There, I did it for you in PostgreSQL: ALTER TABLE table_name SET
       | (autovacuum_enabled = false);
       | 
       | Snark aside, it's still not 100% clear what's the upside of using
       | a completely different database, just for that use case.
        
         | _bohm wrote:
         | Huh? Dead tuples are not queryable in Postgres.
        
       | anentropic wrote:
       | > immudb is the first database which allows you to do queries
       | across time.
       | 
       | I don't think it is
       | 
       | e.g. Datomic already had this for a long time, no?
        
         | dspillett wrote:
         | Several databases (MS SQL Server, MariaDB, Postgres with
         | appropriate extension) support system versioned temporal tables
         | (added in the SQL2011 standard, though I don't know if any DB
         | entirely follows the standard) which I'm pretty sure counts as
         | "queries across time".
         | 
         | Maybe they are claiming to be the first with it built-in as a
         | core part of the engine that it is specifically optimised for,
         | but even that might not be true.
        
           | refset wrote:
           | > even that might not be true
           | 
           | It's not. For example, see SAP HANA's "Timeline Index"
           | https://websci.informatik.uni-freiburg.de/publications/sigmo...
        
         | [deleted]
        
         | foobarbazetc wrote:
         | Yeah, it's not. Which makes the rest suspect.
        
         | branko_d wrote:
         | Oracle has had flashback queries for a long time.
         | 
         | Though this does not do what immudb claims:
         | 
         | > _immudb is the first database to provide tamper-evident data
         | management, immutable history and client-cryptographic proof._
         | 
         | And:
         | 
         | > _Clients do not need to trust the server and every new client
         | adds trust to the deployment_
        
         | waheoo wrote:
         | Yes. https://youtube.com/watch?v=Cym4TZwTCNU
        
         | endymi0n wrote:
         | Datomic does, so does Oracle, Snowflake and BigQuery.
        
           | CharlesW wrote:
           | Teradata Vantage, too.
        
         | cbsmith wrote:
         | Yeah, that line was a real head scratcher. I think someone in
         | the marketing department got a bit ahead of their reality.
        
           | foobarbazetc wrote:
           | It's not marketing as much as it is "try to get a patent for
           | something that's been done for decades by doing it slightly
           | differently".
        
         | [deleted]
        
         | chatmasta wrote:
         | We're building something similar to this at Splitgraph, at
         | least in the sense that we have immutable data in a Postgres-
         | compatible DB with point-in-time queries across versioned,
         | addressable snapshots. In our case, we apply the idea of
         | immutability to "data images" that are analogous to Docker
         | images. You build and push them in the same way, and then you
         | can reference any "image" (version) [0] of data by addressing
         | it with the correct tag.
         | 
         | For example, here is a link to a live query on our Data
         | Delivery Network (DDN) that runs a JOIN on two daily snapshots
         | (20200809 and 20200810). [1] In this case, these images are the
         | result of a daily script that builds and pushes a new image
         | each day. The storage costs are minimal, as each new image only
         | needs to store the changed rows, rather than a duplicative
         | snapshot.
         | 
         | Each immutable image is comprised of a set of small content-
         | addressable cstore fragments uploaded to object storage, which
         | we only load into the database when they become necessary to
         | satisfy a query. When a query arrives at the DDN, we intercept
         | it at the network level by scripting PgBouncer with embedded
         | Python to orchestrate the infrastructure required to answer the
         | query. The embedded code parses the AST of the query for table
         | references, which it uses to "mount" a temporary schema for
         | serving the query. The temporary schema includes an FDW that
         | implements a "layered querying" protocol (think AUFS) to lazily
         | download only the fragments required to satisfy the query.
         | 
         | (Also, we support live data. But that's for another time!)
         | 
         | [0] https://www.splitgraph.com/docs/concepts/images
         | 
         | [1]
         | https://www.splitgraph.com/workspace/ddn?layout=hsplit&query...
        
         | ignoramous wrote:
         | Doesn't Bigtable, according to the 2006 paper, allow for this
         | too?
         | 
         | > _Each cell in a Bigtable can contain multiple versions of the
         | same data; these versions are indexed by timestamp. Bigtable
         | timestamps are 64-bit integers. They can be assigned by
         | Bigtable, in which case they represent realtime in
         | microseconds..._
         | 
         | https://research.google/pubs/pub27898.pdf
        
       | parentheses wrote:
       | Seems like a database that stores content hashes. Very cool but
       | what makes it better than simply adding a table to my database
       | (or a DB specifically for this) and running `insert into
       | content_hashes...`?
       | 
       | The above approach also allows me to choose any database because
       | I can model this data however I want.
        
         | jeroiraz wrote:
         | immudb can hold the actual data. An equivalent approach using
         | an existing database without these features would involve
         | creating a cryptographic data structure that captures not only
         | individual content but the entire history of changes, plus the
         | functionality to construct and verify the cryptographic proofs
         | that validate the data read.
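
A minimal sketch of what "a cryptographic data structure which captures the entire history of changes" could look like if built by hand: a hash chain in which each entry commits to the one before it. This is not immudb's actual implementation; the names `GENESIS`, `entry_hash`, `append` and `verify` are hypothetical and chosen only for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append(log: list, payload: dict) -> str:
    """Append a change and return the new head hash, which a client can keep."""
    prev = log[-1]["hash"] if log else GENESIS
    digest = entry_hash(prev, payload)
    log.append({"prev": prev, "payload": payload, "hash": digest})
    return digest


def verify(log: list, trusted_head: str) -> bool:
    """Recompute the whole chain and compare it with the head the client kept."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry_hash(prev, entry["payload"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return prev == trusted_head


log = []
append(log, {"key": "balance", "value": 100})
head = append(log, {"key": "balance", "value": 80})
```

A client that remembers only `head` can later call `verify(log, head)`; rewriting any earlier entry changes every hash after it, so the check fails.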
        
       | hutrdvnj wrote:
       | What happens if you have to delete some data e.g. due to law?
        
         | jacquesm wrote:
         | You have several options here:
         | 
         | - store the data encrypted using a secondary protocol, lose the
         | key
         | 
         | - rewrite the whole db
         | 
         | If either of these is not feasible then you should have thought
         | longer about what tech is suitable for which application.
         | Operating your company in a legal manner is a pretty strong
         | factor when making such choices.
        
           | remram wrote:
           | Is losing the key sufficient to comply with the law? "We
           | didn't actually delete anything but I promise I don't
           | remember how to decrypt it" would be acceptable for the court
           | to not e.g. seize your drives?
        
             | speed_spread wrote:
             | It's the same as "we actually deleted the data and I
             | promise we didn't keep any backup copies", except it's
             | probably even easier to enforce, since you only have to
             | secure the key instead of the whole database.
        
             | imhoguy wrote:
             | IANAL. With the GDPR right to be forgotten you need to get
             | rid of any identifiable subject information. If you can't
             | tell a subject from the data then you comply. Encrypted data
             | without a key is just noise.
             | 
             | You are allowed to keep aggregations and hashes of data.
             | These shouldn't allow anyone to identify a subject. E.g. you
             | can keep a list of banned emails as MD5s to verify on sign
             | up etc.
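
The banned-email check described above can be sketched as follows. This is an illustration only: it swaps in SHA-256 from Python's `hashlib` instead of the MD5 the comment mentions, and the helper names `email_digest` and `is_banned` are hypothetical.

```python
import hashlib


def email_digest(email: str) -> str:
    """Normalize the address and return a hex digest that stands in for it."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Only digests are stored; the plaintext addresses are discarded.
banned_digests = {email_digest("spammer@example.com")}


def is_banned(email: str) -> bool:
    """Check a sign-up address against the stored digests."""
    return email_digest(email) in banned_digests
```

At sign-up the service hashes the submitted address and looks the digest up, so it never needs to keep the original address around.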
        
               | remram wrote:
               | In this situation though, any client who still knows the
               | key can access the data, since there is no way to remove
               | data from the database server, or make it unavailable at
               | the server level.
               | 
               | Assuming the clients and server are operated by different
               | entities (otherwise the immutability and verifiability
               | are not that interesting), if someone comes to the server
               | operator with a court order and asks that data be removed,
               | it seems like there is nothing they can do.
        
               | setr wrote:
               | You can't do much of anything if you've already given
               | away the information in question -- the same is true if
               | someone copied the data itself.
               | 
               | You have to not give away the key in the first place, at
               | least not to any clients that you don't own.
               | 
               | E.g. following the rule "any problem can be solved with a
               | level of indirection", external clients get some auth key
               | A, which they feed to an internal client, which internally
               | maps it to some data key B, decrypts the data and hands it
               | back to the external client.
               | 
               | When the data is removed, you delete the mapping from
               | your internal client.
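
The indirection described above can be sketched roughly like this. Everything here is hypothetical: `InternalClient`, its methods, and the toy `keystream_xor` cipher (SHA-256 in counter mode, XORed with the data) exist only to keep the example self-contained. The toy cipher is unauthenticated and must not be used for real data.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher for illustration only; encrypting twice decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class InternalClient:
    """Hypothetical internal service that owns the data keys."""

    def __init__(self):
        self.immutable_store = {}  # record id -> ciphertext, never deleted
        self.key_for_token = {}    # auth key A -> (record id, data key B)

    def write(self, record_id: str, plaintext: bytes) -> str:
        """Encrypt under a fresh data key and hand back only an auth token."""
        data_key = secrets.token_bytes(32)
        self.immutable_store[record_id] = keystream_xor(data_key, plaintext)
        token = secrets.token_hex(16)
        self.key_for_token[token] = (record_id, data_key)
        return token

    def read(self, token: str):
        """Map auth key A to data key B, decrypt, and return the plaintext."""
        entry = self.key_for_token.get(token)
        if entry is None:
            return None
        record_id, data_key = entry
        return keystream_xor(data_key, self.immutable_store[record_id])

    def forget(self, token: str) -> None:
        """Drop the mapping; the ciphertext stays but can no longer be read."""
        self.key_for_token.pop(token, None)
```

External clients only ever hold the token, so deleting the mapping is enough to make the stored ciphertext unreadable without touching the immutable store.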
        
           | hutrdvnj wrote:
           | > store the data encrypted using a secondary protocol, lose
           | the key
           | 
           | Thing is that you have to do this upfront. I think it's very
           | possible to get into a situation where the data you have to
           | delete is in plaintext. Dropping the whole DB and recreating
           | it from scratch is a bit hefty.
        
       | cyberge99 wrote:
       | I love what you've done. I think you may have an issue with the
       | TimeTravel trademark however. Snowflake uses it in your exact
       | market segment (not to mention where else it may be used in a
       | similar context). Good stuff though, I'll be checking it out.
        
       | artemonster wrote:
       | Can someone ELI5 what is an "immutable database"? If you can add
       | to the table, that means mutation, right? I am missing
       | something...
        
         | f38zf5vdt wrote:
         | SQL system versioned tables but with git hash tree versioning
         | for every mutable command.
        
         | goto11 wrote:
         | It basically means "append only". You can add new data to the
         | database, but you can't change or delete existing data.
        
         | dspillett wrote:
         | _> immudb is the first database to provide tamper-evident data
         | management, immutable history and client-cryptographic proof.
         | Every change is preserved and can't be changed without clients
         | noticing._
         | 
         | Sounds like they are recording all changes (like SQL:2011's
         | system-versioned tables, as implemented more-or-less by several
         | common DB engines) but with some sort of hash-chain ledger so
         | that history can be verified and therefore any tampering
         | detected.
         | 
         |  _> If you can add to the table, that means mutation, right?_
         | 
         | It isn't keeping the current view of the data immutable, but is
         | keeping an immutable history of the data. It is immutable in
         | the sense that nothing written to it is ever lost, and you can
         | use the "time-travel" query functions (like SELECT stuff FROM
         | atable FOR SYSTEM_TIME AS OF '2021-03-05') to retrieve it even
         | if it looks to have been completely mangled or deleted if you
         | use a non-time-travelling query.
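
The "nothing written is ever lost" behaviour can be sketched with a small append-only store. This illustrates the general idea, not immudb or any SQL engine; `VersionedTable` and its methods are hypothetical names.

```python
import bisect


class VersionedTable:
    """Append-only store: every write adds a (timestamp, value) version."""

    def __init__(self):
        self._history = {}  # key -> list of (timestamp, value) in write order

    def put(self, key, value, ts):
        versions = self._history.setdefault(key, [])
        if versions and ts <= versions[-1][0]:
            raise ValueError("timestamps must increase per key")
        versions.append((ts, value))

    def delete(self, key, ts):
        # A delete is just another version: a tombstone (None).
        self.put(key, None, ts)

    def get(self, key, as_of=None):
        """Return the current value, or the value visible at time `as_of`."""
        versions = self._history.get(key, [])
        if as_of is None:
            return versions[-1][1] if versions else None
        timestamps = [ts for ts, _ in versions]
        i = bisect.bisect_right(timestamps, as_of)
        return versions[i - 1][1] if i else None
```

A plain `get(key)` sees only the latest version, including a tombstone, while `get(key, as_of=...)` plays the role of the `FOR SYSTEM_TIME AS OF` clause and returns whatever was current at that time.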
        
         | qsort wrote:
         | It's immutable in the same sense a purely functional data
         | structure is immutable. You represent mutation by making a new
         | version of the data structure. Of course you don't _literally_
         | do that on the database because it would be inefficient, but
         | there are several algorithmic tricks that can expose an
         | interface that works _as if_.
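
One such trick is path copying: an update rebuilds only the nodes on the path to the changed key and shares everything else with the old version. Below is a minimal sketch using an unbalanced binary search tree stored as nested tuples; the functions `insert` and `lookup` are illustrative and do not describe how any particular database works.

```python
def insert(node, key, value):
    """Return a new tree root with `key` set; the old root is left untouched."""
    if node is None:
        return (key, value, None, None)
    k, v, left, right = node
    if key < k:
        # Only the left path is rebuilt; the right subtree is shared.
        return (k, v, insert(left, key, value), right)
    if key > k:
        # Only the right path is rebuilt; the left subtree is shared.
        return (k, v, left, insert(right, key, value))
    return (key, value, left, right)


def lookup(node, key):
    """Find `key` in the version of the tree rooted at `node`."""
    while node is not None:
        k, v, left, right = node
        if key == k:
            return v
        node = left if key < k else right
    return None
```

Each root returned by `insert` is a complete, immutable version, so keeping an old root is enough to keep querying the old state.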
        
           | artemonster wrote:
           | that makes sense on a language level, when you hold a
           | reference to some data and you can assume nothing can be
           | changed about it. how does that hold on DB level?
        
             | qsort wrote:
             | In the same way. A database is basically just a giant data
             | structure, a table is not unlike a B-Tree (in some engines
             | it _literally_ is a B-tree). Data warehouses already do
             | something like this informally, as they are structured in a
             | star schema around a single "append-only" fact table.
        
         | [deleted]
        
         | ianpurton wrote:
         | You would be able to query and INSERT but not DELETE or
         | UPDATE.
         | 
         | This is useful, for example, in banking applications that keep
         | an audit trail.
         | 
         | A sysadmin would not be able to update or delete items in the
         | audit table and so can't cover up a crime.
         | 
         | If the database is tampered with at the file level, they have a
         | way to detect that. (Probably some kind of merkle tree.)
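
The comment above only guesses at a Merkle tree, so the following is a generic sketch of the technique rather than a description of immudb's file format. The function `merkle_root`, the leaf/node prefixes, and the duplicate-last-node rule for odd levels are assumptions picked for illustration.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: list) -> str:
    """Hash each record into a leaf, then hash pairs upward to a single root."""
    if not records:
        return _h(b"").hex()
    # Distinct prefixes keep leaf hashes and inner-node hashes apart.
    level = [_h(b"\x00" + record) for record in records]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed rule: duplicate the last node
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

If an auditor keeps the root somewhere the sysadmin cannot reach, recomputing it over the stored records reveals any record that was altered at the file level.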
        
           | artemonster wrote:
           | Alright, makes perfect sense. Thank you!
        
       | JulianMorrison wrote:
       | If this is deployed in a situation where record volumes are
       | large, for example recording credit card transactions, there is
       | going to have to be a process to "retire" old records (and
       | perhaps, move them to external archives). The alternative is
       | endlessly growing storage, and the resulting performance
       | degradation.
       | 
       | At a first glance, I don't see anything like that in there.
        
       | 1cvmask wrote:
       | Any major customers using this and if so how?
        
       | tutfbhuf wrote:
       | I would like to have such a database based on git. Where every
       | change is a git commit. This should then work with things like
       | github where you can connect to your database via github api. The
       | db git repositories could be either private or even public. You
       | can then deploy a serverless webpage to gh-pages and use a
       | serverless gh-gitdb as storage.
       | 
       | serverless := you don't have to operate the infrastructure
       | yourself
        
         | agbell wrote:
         | It seems like this is somewhat in that direction. It looks like
         | it is using merkle trees to store the history.
        
         | quasiperson wrote:
         | You should check out https://www.dolthub.com/ then. They are
         | working on something very similar.
        
         | lifty wrote:
         | Check out https://replicache.dev and https://github.com/attic-
         | labs/noms
        
       | arpinum wrote:
       | The QLDB performance comparison looks quite dodgy, but I can't
       | find their QLDB benchmark code to see what they are doing wrong.
        
       | 0xbadcafebee wrote:
       | > This new functionality allows travel back in time through the
       | data change history, and even compares these values in the same
       | query!
       | 
       | So we can actually treat our databases like immutable
       | infrastructure and actually roll back changes now without the
       | hulking kludge that is snapshots/restores and database
       | migrations? That's game-changing.
        
       | robto wrote:
       | Reminds me a lot of Fluree[0], an immutable, cryptographically
       | verifiable, temporal database, but with RDF as a query language,
       | which I think is very nice. SQL is nice because it's familiar but
       | it's honestly not that hard to improve on.
       | 
       | [0]https://flur.ee/
        
       ___________________________________________________________________
       (page generated 2021-05-25 23:00 UTC)