[HN Gopher] Implementing a Merkle Tree for an Immutable Verifiab...
       ___________________________________________________________________
        
       Implementing a Merkle Tree for an Immutable Verifiable Log
        
       Author : eatonphil
       Score  : 96 points
       Date   : 2022-05-06 16:45 UTC (6 hours ago)
        
 (HTM) web link (aly.arriqaaq.com)
 (TXT) w3m dump (aly.arriqaaq.com)
        
       | skybrian wrote:
       | I knew that Go had a checksum database but not that it had a
       | tamper-proof log. This seems like something that other package
       | management systems should do as well.
        
         | gumby wrote:
         | What do you think your git repo is?
        
           | colburnmh wrote:
           | True to an extent--meaning mostly true in practice.
           | 
           | You can mutate the Git data if you chose to using commands
           | like "filter-branch". "filter-branch" isn't used frequently,
           | since it causes issues with every up/down-stream replica if
           | the data has been pushed/pulled, but it is possible. But,
           | even some commonly used commands like "amend", "rebase", and
           | "squash" cause limited data mutations which are broadly
           | considered appropriate and useful.
        
             | cryptonector wrote:
             | Uhm, you can mutate _any_ Merkle hash tree  / log you want,
             | provided you can replace the copies of the past that
             | _others_ store. Correspondingly, the security of a Merkle
             | hash tree  / blockchain depends on having others store
             | copies of it _and_ check that additions chain from their
             | past heads.
             | 
             | Git, of course, lets you "mutate" the Merkle hash tree[0],
             | but -to the extent that the hash function used allows- your
             | rewriting of published branch history will be detected by
             | others. (Those others can recover by reviewing the deltas
             | and using `git rebase --onto`.)
             | 
             | There's no Merkle-hash-tree-defeating code in Git as such
             | because this is a _generic_ issue for Merkle hash trees,
             | not a Git-specific one.
             | 
             | And to be clear: there is _nothing_ wrong with `git rebase`
             | and  "rEwRiTiNg HiStOry" as long as you're not rewriting
             | _published_ history. And for those times when you must
             | rewrite published history (for whatever reason), you can
             | recover forks with `git rebase --onto`.
             | 
             | The idea that "oh no!!1! Git lets you rewrite history,
             | that's baaad and you shouldn't ever ever do that!!" is a
             | terrible canard that must be called out as such every time
             | it comes up. Mercurial does too. It's not really a problem
             | unless you rewrite published history, but the vast majority
             | of the time any user rewrites Git (or Mercurial, or...)
             | history, they're doing so locally, on dev branches, not on
             | published trunks.
             | 
             | I think the history-rewrite-is-always-bad meme must have
             | been started by people who hated Git or who just didn't
             | understand. But it's a very destructive meme that causes
             | people to be happy pushing and accepting ugly and useless
             | history because "you don't want to ever fix it". That meme
             | must be defeated.
             | 
             | [0] Well, Git doesn't let you mutate the objects in the
             | tree itself, it only lets you add or remove objects, and it
             | lets you change the commits pointed to by branches, tags,
             | and other symbolic refs.
        
             | kjeetgill wrote:
             | This is kind of the brilliance of git forcing users to
             | refer to commits by hashes, all the things you're referring
             | to _don 't_ modify commits, they make new ones.
             | 
             | Braches are the the only mutable place where you have it
             | point to different commits in a sequence.
             | 
             | I could imagine in some horrific alternate universe someone
             | deciding to hide hashes as _not user friendly_. Like, I
             | think we got incredibly lucky how that played out.
        
               | GauntletWizard wrote:
               | You can experience this alternate universe right now in
               | Github Actions, which allows you to refer to other
               | "Actions" by their _tag_ , and encourages you to pin
               | yourself to a "v3" which the team will then destroy and
               | replace to update you.
               | 
               | If this sounds terrible, insecure, and begging to be
               | exploited, it's because every idiot on the Github Actions
               | Team should be censured for their poor understanding of
               | Git, Github, and yet proceeding to ship anyway.
        
               | hermanb wrote:
               | I've been wondering about this too and always used full
               | sha's until now. But recently I've made an action myself:
               | You actually need to publish the action to the
               | marketplace with each tag manually. It feels like there
               | might be more going on.
               | 
               | Is GitHub storing those published tags and avoiding
               | tampering by only letting you use those tags once? Are
               | they warning or blocking runs if you tamper? ...
               | 
               | I'm really curious because it seems like SUCH a giant
               | risk otherwise.
        
               | GauntletWizard wrote:
               | Nope, they even suggest (and companies have built tooling
               | around) deleting versions of the tags.
        
               | WorldMaker wrote:
               | Deleting a tag is a force push operation like any other
               | and repo policies that block force pushes will block tag
               | updates.
               | 
               | Tags themselves aren't necessarily the worst idea, but
               | yes policies encouraging force pushes are likely to
               | experience exploitation.
               | 
               | Also, annotated tags have their own "commit" hashes, and
               | can be code signed like any other commit. There are more
               | precautions that could be taken.
        
             | tylersmith wrote:
             | You can always create new trees by commiting or rebasing
             | but absent being able to find a proper hash collision you
             | can't change content within a given commit hash.
        
             | tylorr wrote:
             | Those all generate new commits, they don't modify existing
             | ones. Though the only way of easily detecting that is
             | comparing it to a clone of the repo.
        
         | jacques_chester wrote:
         | I know that at least RubyGems, NPM, PyPI and Maven Central are
         | in various stages of using Sigstore for signing and transparent
         | logging.
        
       | TedDoesntTalk wrote:
       | How is this different from a blockchain (serious question)?
        
         | [deleted]
        
         | greiskul wrote:
         | A blockchain is a merkle tree with a distributed consensus
         | algorithm.
        
           | charcircuit wrote:
           | A blockchain is just a data structure. A consensus algorithm
           | can agree what the current state / last block of a block
           | chain is, but they are separate.
           | 
           | Similarly if one says B tree or LSM tree that doesn't mean
           | they are also talking about a consensus algorithm.
        
             | WorldMaker wrote:
             | The only thing that keeps a blockchain appearing to be a
             | rebase-only straight line and not an otherwise ordinary
             | merkle tree _is_ the consensus algorithm on what the
             | "current state / last block of the chain" is.
             | 
             | The difference between a blockchain and a merkle tree in no
             | way resembles that between a B tree or LSM tree. It's
             | closer to the difference between a Binary Search Tree and a
             | Red/Black Tree. They are both Binary Search trees and most
             | operations are the exact same, but Red/Black Trees have an
             | extra "color" attribute and rely on that for a better
             | balancing algorithm than the "default" BST algorithm.
        
               | charcircuit wrote:
               | I think you are over complicating things. A blockchain is
               | just a linked list with content addressed next pointers.
               | You don't need a distributed system to use it. When on a
               | single machine you don't need a consensus algorithm to
               | agree on what the head of the list is. Even if you have
               | multiple different heads none of them will be a tree. A
               | node can only ever point to 1 other node.
        
           | [deleted]
        
         | est31 wrote:
         | There is no token/currency associated to it, and it's made for
         | a single specific purpose only, compared to most blockchains
         | which try to run programs, manage wallets, etc. It also doesn't
         | have a decentralized consensus algorithm, nor is there a gossip
         | network to inform about new transactions that are to be added
         | to the log. It's centralized per design.
         | 
         | Which of these differences you see as important to the
         | blockchain / non-blockchain distinction, depends on the precise
         | definition for blockchain that you want to use.
         | 
         | That being said, even the CT website uses the term "ledger" to
         | explain what the logs are:
         | https://certificate.transparency.dev/howctworks/
        
           | radicalbyte wrote:
           | The term "ledger" pre-dates crypto. An append-only log is a
           | ledger if the writers are trusted.
        
         | a1369209993 wrote:
         | A blockchain provides a guarantee of (statisical eventual)
         | write availability, since anyone can mine new blocks
         | (eventually). A verifiable log only accepts new entries from a
         | limited, generally closed set of writers (usually just one),
         | and you can only add new entries by going through one of those
         | writers. (Blockchains also tend to have better read
         | availability due to being more widely distributed, but there's
         | no reason why you _couldn 't_ destribute a verifiable log that
         | widely, it just tends not to happen in practice.)
         | 
         | This makes a verifiable log unsuitable for financial purposes
         | since a adversary with lead pipe capabilities (or corrupt judge
         | capabilities as the case may be) can block undesirable
         | transactions at a single (or few) point of failure, whereas
         | against a blockchain they'd have to target anyone with
         | significant compute capacity.
         | 
         | On the other hand, verifiable logs don't need proof of
         | work/stake/etc to limit hostile forking, so if the log is
         | describing things rather than acting a ground source of truth
         | (so you can just ignore it if it starts refusing writes), it's
         | much more efficient than a blockchain.
        
       | lichtenberger wrote:
       | We're also storing a similar hash tree (a hash is stored in each
       | node -- an insert/update/delete triggers a recomputation of the
       | hash in all ancestor nodes) to verify if a subtree has changed in
       | SirixDB[1]. However, only pointers to neighbour nodes as well as
       | the content is used for recomputation of the ancestor hashes
       | instead of all children during updates. Furthermore checksums of
       | child page-fragments are stored in parent references.
       | 
       | The hashes are used for fast diff calculations along with an
       | optional DeweyID node scheme to quickly get diffs in a resource
       | in preorder or to check if a subtree has changed without fetching
       | all ancestor nodes (due to the stored DeweyIDs).
       | 
       | For instance you can simply check for updates of a node in a JSON
       | structure with the following query:                   let $node
       | := jn:doc('mycol.jn','mydoc.jn')=>fieldName[[1]]         let
       | $result := for $node-in-rev in jn:all-times($node)
       | return                          if
       | ((not(exists(jn:previous($node-in-rev))))
       | or (sdb:hash($node-in-rev) ne sdb:hash(jn:previous($node-in-
       | rev)))) then                            $node-in-rev
       | else                            ()         return [           for
       | $jsonItem in $result           return { "node": $jsonItem,
       | "revision": sdb:revision($jsonItem) }         ]
       | 
       | It iterates through all revisions of a node and returns the node
       | and the revision (and we could also add the author) when it has
       | been updated.
       | 
       | To make this even easier I could write a native Java function for
       | use in the query.
       | 
       | [1] https://sirix.io
        
       | jaboutboul wrote:
       | "Recently I'd been reading on the application of immutable
       | databases for tracking changes in sensitive data, and for
       | auditing purposes."
       | 
       | Pretty sure he is referring to https://immudb.io.
        
         | SemanticStrengh wrote:
         | there is also https://aws.amazon.com/qldb/ Thoses tech make
         | blockchain obscolete for many uses.
        
           | jaboutboul wrote:
           | Yes could be. We should point out that QLDB isn't open source
           | and only runs as an AWS service.
        
           | losteric wrote:
           | Isn't QLDB a blockchain tech? Albeit a real-world practical
           | one.
        
             | SemanticStrengh wrote:
             | It is not it seems https://aws.amazon.com/qldb/faqs/#:~:tex
             | t=Amazon%20QLDB%20ca....
        
             | jaboutboul wrote:
             | It's not a blockchain. It's meant to be a ledger database
             | so it works in a similar way to a Blockchain, same internal
             | data structure, but without the distributed consensus
             | concept.
        
         | cryptonector wrote:
         | Or Git. Or...
         | 
         | This is a common pattern now.
        
       ___________________________________________________________________
       (page generated 2022-05-06 23:00 UTC)