[HN Gopher] Implementing a Merkle Tree for an Immutable Verifiab... ___________________________________________________________________ Implementing a Merkle Tree for an Immutable Verifiable Log Author : eatonphil Score : 96 points Date : 2022-05-06 16:45 UTC (6 hours ago) (HTM) web link (aly.arriqaaq.com) (TXT) w3m dump (aly.arriqaaq.com) | skybrian wrote: | I knew that Go had a checksum database but not that it had a | tamper-proof log. This seems like something that other package | management systems should do as well. | gumby wrote: | What do you think your git repo is? | colburnmh wrote: | True to an extent--meaning mostly true in practice. | | You can mutate the Git data if you chose to using commands | like "filter-branch". "filter-branch" isn't used frequently, | since it causes issues with every up/down-stream replica if | the data has been pushed/pulled, but it is possible. But, | even some commonly used commands like "amend", "rebase", and | "squash" cause limited data mutations which are broadly | considered appropriate and useful. | cryptonector wrote: | Uhm, you can mutate _any_ Merkle hash tree / log you want, | provided you can replace the copies of the past that | _others_ store. Correspondingly, the security of a Merkle | hash tree / blockchain depends on having others store | copies of it _and_ check that additions chain from their | past heads. | | Git, of course, lets you "mutate" the Merkle hash tree[0], | but -to the extent that the hash function used allows- your | rewriting of published branch history will be detected by | others. (Those others can recover by reviewing the deltas | and using `git rebase --onto`.) | | There's no Merkle-hash-tree-defeating code in Git as such | because this is a _generic_ issue for Merkle hash trees, | not a Git-specific one. | | And to be clear: there is _nothing_ wrong with `git rebase` | and "rEwRiTiNg HiStOry" as long as you're not rewriting | _published_ history. And for those times when you must | rewrite published history (for whatever reason), you can | recover forks with `git rebase --onto`. | | The idea that "oh no!!1! Git lets you rewrite history, | that's baaad and you shouldn't ever ever do that!!" is a | terrible canard that must be called out as such every time | it comes up. Mercurial does too. It's not really a problem | unless you rewrite published history, but the vast majority | of the time any user rewrites Git (or Mercurial, or...) | history, they're doing so locally, on dev branches, not on | published trunks. | | I think the history-rewrite-is-always-bad meme must have | been started by people who hated Git or who just didn't | understand. But it's a very destructive meme that causes | people to be happy pushing and accepting ugly and useless | history because "you don't want to ever fix it". That meme | must be defeated. | | [0] Well, Git doesn't let you mutate the objects in the | tree itself, it only lets you add or remove objects, and it | lets you change the commits pointed to by branches, tags, | and other symbolic refs. | kjeetgill wrote: | This is kind of the brilliance of git forcing users to | refer to commits by hashes, all the things you're referring | to _don 't_ modify commits, they make new ones. | | Braches are the the only mutable place where you have it | point to different commits in a sequence. | | I could imagine in some horrific alternate universe someone | deciding to hide hashes as _not user friendly_. Like, I | think we got incredibly lucky how that played out. | GauntletWizard wrote: | You can experience this alternate universe right now in | Github Actions, which allows you to refer to other | "Actions" by their _tag_ , and encourages you to pin | yourself to a "v3" which the team will then destroy and | replace to update you. | | If this sounds terrible, insecure, and begging to be | exploited, it's because every idiot on the Github Actions | Team should be censured for their poor understanding of | Git, Github, and yet proceeding to ship anyway. | hermanb wrote: | I've been wondering about this too and always used full | sha's until now. But recently I've made an action myself: | You actually need to publish the action to the | marketplace with each tag manually. It feels like there | might be more going on. | | Is GitHub storing those published tags and avoiding | tampering by only letting you use those tags once? Are | they warning or blocking runs if you tamper? ... | | I'm really curious because it seems like SUCH a giant | risk otherwise. | GauntletWizard wrote: | Nope, they even suggest (and companies have built tooling | around) deleting versions of the tags. | WorldMaker wrote: | Deleting a tag is a force push operation like any other | and repo policies that block force pushes will block tag | updates. | | Tags themselves aren't necessarily the worst idea, but | yes policies encouraging force pushes are likely to | experience exploitation. | | Also, annotated tags have their own "commit" hashes, and | can be code signed like any other commit. There are more | precautions that could be taken. | tylersmith wrote: | You can always create new trees by commiting or rebasing | but absent being able to find a proper hash collision you | can't change content within a given commit hash. | tylorr wrote: | Those all generate new commits, they don't modify existing | ones. Though the only way of easily detecting that is | comparing it to a clone of the repo. | jacques_chester wrote: | I know that at least RubyGems, NPM, PyPI and Maven Central are | in various stages of using Sigstore for signing and transparent | logging. | TedDoesntTalk wrote: | How is this different from a blockchain (serious question)? | [deleted] | greiskul wrote: | A blockchain is a merkle tree with a distributed consensus | algorithm. | charcircuit wrote: | A blockchain is just a data structure. A consensus algorithm | can agree what the current state / last block of a block | chain is, but they are separate. | | Similarly if one says B tree or LSM tree that doesn't mean | they are also talking about a consensus algorithm. | WorldMaker wrote: | The only thing that keeps a blockchain appearing to be a | rebase-only straight line and not an otherwise ordinary | merkle tree _is_ the consensus algorithm on what the | "current state / last block of the chain" is. | | The difference between a blockchain and a merkle tree in no | way resembles that between a B tree or LSM tree. It's | closer to the difference between a Binary Search Tree and a | Red/Black Tree. They are both Binary Search trees and most | operations are the exact same, but Red/Black Trees have an | extra "color" attribute and rely on that for a better | balancing algorithm than the "default" BST algorithm. | charcircuit wrote: | I think you are over complicating things. A blockchain is | just a linked list with content addressed next pointers. | You don't need a distributed system to use it. When on a | single machine you don't need a consensus algorithm to | agree on what the head of the list is. Even if you have | multiple different heads none of them will be a tree. A | node can only ever point to 1 other node. | [deleted] | est31 wrote: | There is no token/currency associated to it, and it's made for | a single specific purpose only, compared to most blockchains | which try to run programs, manage wallets, etc. It also doesn't | have a decentralized consensus algorithm, nor is there a gossip | network to inform about new transactions that are to be added | to the log. It's centralized per design. | | Which of these differences you see as important to the | blockchain / non-blockchain distinction, depends on the precise | definition for blockchain that you want to use. | | That being said, even the CT website uses the term "ledger" to | explain what the logs are: | https://certificate.transparency.dev/howctworks/ | radicalbyte wrote: | The term "ledger" pre-dates crypto. An append-only log is a | ledger if the writers are trusted. | a1369209993 wrote: | A blockchain provides a guarantee of (statisical eventual) | write availability, since anyone can mine new blocks | (eventually). A verifiable log only accepts new entries from a | limited, generally closed set of writers (usually just one), | and you can only add new entries by going through one of those | writers. (Blockchains also tend to have better read | availability due to being more widely distributed, but there's | no reason why you _couldn 't_ destribute a verifiable log that | widely, it just tends not to happen in practice.) | | This makes a verifiable log unsuitable for financial purposes | since a adversary with lead pipe capabilities (or corrupt judge | capabilities as the case may be) can block undesirable | transactions at a single (or few) point of failure, whereas | against a blockchain they'd have to target anyone with | significant compute capacity. | | On the other hand, verifiable logs don't need proof of | work/stake/etc to limit hostile forking, so if the log is | describing things rather than acting a ground source of truth | (so you can just ignore it if it starts refusing writes), it's | much more efficient than a blockchain. | lichtenberger wrote: | We're also storing a similar hash tree (a hash is stored in each | node -- an insert/update/delete triggers a recomputation of the | hash in all ancestor nodes) to verify if a subtree has changed in | SirixDB[1]. However, only pointers to neighbour nodes as well as | the content is used for recomputation of the ancestor hashes | instead of all children during updates. Furthermore checksums of | child page-fragments are stored in parent references. | | The hashes are used for fast diff calculations along with an | optional DeweyID node scheme to quickly get diffs in a resource | in preorder or to check if a subtree has changed without fetching | all ancestor nodes (due to the stored DeweyIDs). | | For instance you can simply check for updates of a node in a JSON | structure with the following query: let $node | := jn:doc('mycol.jn','mydoc.jn')=>fieldName[[1]] let | $result := for $node-in-rev in jn:all-times($node) | return if | ((not(exists(jn:previous($node-in-rev)))) | or (sdb:hash($node-in-rev) ne sdb:hash(jn:previous($node-in- | rev)))) then $node-in-rev | else () return [ for | $jsonItem in $result return { "node": $jsonItem, | "revision": sdb:revision($jsonItem) } ] | | It iterates through all revisions of a node and returns the node | and the revision (and we could also add the author) when it has | been updated. | | To make this even easier I could write a native Java function for | use in the query. | | [1] https://sirix.io | jaboutboul wrote: | "Recently I'd been reading on the application of immutable | databases for tracking changes in sensitive data, and for | auditing purposes." | | Pretty sure he is referring to https://immudb.io. | SemanticStrengh wrote: | there is also https://aws.amazon.com/qldb/ Thoses tech make | blockchain obscolete for many uses. | jaboutboul wrote: | Yes could be. We should point out that QLDB isn't open source | and only runs as an AWS service. | losteric wrote: | Isn't QLDB a blockchain tech? Albeit a real-world practical | one. | SemanticStrengh wrote: | It is not it seems https://aws.amazon.com/qldb/faqs/#:~:tex | t=Amazon%20QLDB%20ca.... | jaboutboul wrote: | It's not a blockchain. It's meant to be a ledger database | so it works in a similar way to a Blockchain, same internal | data structure, but without the distributed consensus | concept. | cryptonector wrote: | Or Git. Or... | | This is a common pattern now. ___________________________________________________________________ (page generated 2022-05-06 23:00 UTC)