[HN Gopher] A new hash algorithm for Git
       ___________________________________________________________________
        
       A new hash algorithm for Git
        
       Author : Tomte
       Score  : 400 points
       Date   : 2020-02-04 07:44 UTC (15 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | angrygoat wrote:
       | This article is via an LWN subscriber link; a cheerful reminder
       | that LWN are good and they are worth subscribing to :)
       | https://lwn.net/subscribe/
        
         | john-radio wrote:
         | OP's link says it is "subscription-only content," but it is
         | still publicly available. It says that it has been "made
         | available by an LWN subscriber." How does that work?
        
           | robjan wrote:
           | Subscribers receive a sharing link which can be used to share
           | articles with friends. They tolerate sharing on HN because it
           | brings in new customers.
        
             | globuous wrote:
             | Damn, lwn is sweeet !! :)
        
             | john-radio wrote:
             | That's one smart model.
        
           | cesarb wrote:
           | It's a "subscriber link": https://lwn.net/op/FAQ.lwn#slinks
        
       | brobdingnagians wrote:
       | I was interested in how fossil handled the SHA1 transition, and
       | found this nicely explained as below:
       | 
       | https://fossil-scm.org/home/doc/trunk/www/hashpolicy.wiki
        
         | velcrovan wrote:
          | Fossil's main author is chiming in on the discussion of this
          | on Fossil's forums:
         | 
         | (https://fossil-scm.org/forum/forumpost/50a5bea5fb)
         | 
         | > That's appalling. Fossil's implementation doesn't require a
         | conversion.
         | 
         | "This is a key point, that I want to highlight. I'm sorry that
         | it wasn't made more clear in the LWN posting nor in the HN
         | discussion.
         | 
         | "With Fossil, to begin using the new SHA3 hash algorithm, you
         | just upgrade your fossil binary. No further actions, workflow
         | changes, disruptions, or thought are required on the part of
         | the user.
         | 
         | * "Old check-ins with SHA1 hashes continue to use their SHA1
         | hash names."
         | 
         | * "New check-ins automatically get more secure SHA3 hash
         | names."
         | 
         | * "No repository conversions need to occur"
         | 
         | * "Given a hash prefix, Fossil automatically figures out
         | whether it is dealing with a SHA1 or a SHA3 hash"
         | 
         | * "No human brain-cycles are wasted trying to navigate through
         | a hash-algorithm cut-over."
         | 
         | "Contrast this to Git, where a repository must be either all-
         | SHA1 or all-SHA2. Hence, to cut-over a repository requires
         | rebuilding the repository and in the process renaming all
         | historical artifacts -- essentially rebasing the entire
         | repository. The historical artifact renaming means that
         | external links to historical check-ins (such as in tickets) are
         | broken. And during the transition period, users have to be
         | constantly aware of whether they are using SHA1 or SHA2 hash
         | names. It is a big mess. It is no wonder, then, that few people
         | have been eager to transition their repositories over to the
         | newer SHA2 format."
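
[Editor's note] Fossil can "automatically figure out" which algorithm a full artifact name uses because the digest lengths never collide: SHA-1 names are 40 hex digits, SHA3-256 names are 64. A minimal sketch of that dispatch (illustrative only, not Fossil's actual code; short hash prefixes would still need an index lookup to disambiguate):

```python
# Distinguish SHA-1 from SHA3-256 artifact names by length alone:
# SHA-1 is 160 bits (40 hex digits), SHA3-256 is 256 bits (64 hex digits).
def hash_algorithm(artifact_name: str) -> str:
    if len(artifact_name) == 40:
        return "sha1"
    if len(artifact_name) == 64:
        return "sha3-256"
    raise ValueError("not a full-length artifact name")
```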
        
           | seniorsassycat wrote:
            | The way I read the Fossil author's comments, old commits
           | continue to use sha1 hashes. A repository will be vulnerable
           | to sha1 collision attacks as long as there is an object in
           | the repository that has not been hashed with the new
           | algorithm.
           | 
           | For example, floppy.c could be replaced in a repo with file
           | with the same sha1 hash as long as the last commit that
           | modifies floppy.c used a sha1 hash.
           | 
           | Right?
        
             | wyoung2 wrote:
             | In addition to D. Richard Hipp's thoughts as HN user SQLite
             | -- author also of Fossil, so he oughtta know -- I offer
             | these:
             | 
             | 1. Keep in mind that Fossil and Git are both applications
             | of blockchain technology, which in this particular
             | practical case means you must not only forge a single
             | artifact's hash, you must also do it in a way that allows
             | it to fit into the overall blockchain.
             | 
             | 2. Fossil's sync protocol purposefully won't apply Dr.
             | Hipp's hypothetical evil.c to an existing Fossil blockchain
             | if presented it. Fossil will say, "I've already got that
             | one, thanks," and move on. Only new or outdated clones
              | could be so fooled.
        
               | ajkjk wrote:
               | > applications of blockchain technology
               | 
               | Are we saying this now? More like blockchain is an
               | application of git technology.
        
               | seniorsassycat wrote:
               | https://en.wikipedia.org/wiki/Merkle_tree
        
               | SQLite wrote:
               | "blockchain" is self-descriptive, easier to pronounce
               | (only two syllables instead of three), and easier to
               | spell correctly. :-)
        
               | _jal wrote:
               | No. We are not.
               | 
               | If you're looking for prior art, ZFS's application of
               | Merkle trees predates both. I think there was some other
               | public use before that, but I can't recall right now.
        
             | Tyr42 wrote:
             | They are also using "Hardened SHA1", which detects
             | collision attacks, and assigns a longer id to commits which
             | seem malicious, while being backwards compatible.
        
             | SQLite wrote:
             | Just to be clear: Every time you modify a file, the new
             | changes get put in using SHA3. In an older repository, any
             | given commit might have some files identified using SHA1
             | (assuming they have not changed in 3 years) and others
             | identified using SHA3.
             | 
              | For example, the manifest of the latest SQLite check-in can
              | be seen at
              | (https://www.sqlite.org/src/artifact/29a969d6b1709b80). You
             | can see that most of the files have longer SHA3 hashes, but
             | some of the files that have not been touched in three years
             | still carry SHA1 hashes.
             | 
             | An attack like what you describe is possible _if_ you could
             | generate an evil.c file that has the exact same SHA1 hash
             | as the older floppy.c file. Then you could substitute the
             | evil.c artifact in place of the floppy.c artifact, get some
             | unsuspecting victim to clone your modified repository, and
             | cause mischief that way. Note, however, that this is a pre-
             | image attack, which is rather more difficult to pull off
             | than the collision attacks against SHA1, and (to my
             | knowledge) has never been publicly demonstrated.
             | Furthermore, the evil.c file with the same SHA1 hash would
             | need to be valid C code that does something evil while
             | still yielding the same hash (good luck with that!) and
             | Fossil (like Git) has also switched over to Hardened SHA1,
             | making the attack even harder still.
             | 
             | As still more defense, Fossil also maintains a MD5 hash
             | against the entire content of the commit. So, in addition
             | to finding evil.c that compiles, does your evil bidding,
             | has the same hardened-SHA1 hash as floppy.c, you also have
             | to make sure that the entire commit has the same MD5 hash
             | after substituting the text of evil.c in place of floppy.c.
             | 
             | So, no, it is not really practical to hack a Fossil
             | repository as you describe.
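
[Editor's note] The layered defense described above (a per-file hash plus an MD5 over the whole commit content) can be sketched roughly as follows. This is an illustration of the idea, not Fossil's implementation; the function names and the way the commit content is concatenated here are assumptions:

```python
import hashlib

# Verify a commit two ways: every file must match its recorded per-file
# SHA-1, and the concatenated commit content must match a recorded MD5.
# A substituted evil.c therefore has to survive two independent checks.
def verify_commit(files, recorded_sha1, recorded_md5):
    for name, data in files.items():
        if hashlib.sha1(data).hexdigest() != recorded_sha1[name]:
            return False
    whole = b"".join(files[name] for name in sorted(files))
    return hashlib.md5(whole).hexdigest() == recorded_md5
```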
        
               | wyoung2 wrote:
               | > Furthermore, the evil.c file with the same SHA1 hash
               | would need to be valid C code that does something evil
               | while still yielding the same hash
               | 
               | ...and also produce an innocent-looking diff!
               | 
               | I mean, you could stuff a bunch of random bytes into a C
               | comment to force the desired hash in the output using
               | these documented attack techniques, but anyone inspecting
               | the diffs between versions is likely to see such an
               | explosion of noise and call foul.
               | 
               | If you want an analogy, it's like someone saying they've
               | learned to impersonate federal agent identification
                | cards, only it requires the person carrying the fake
                | ID to have a thousand rainbow-dyed ducks on a leash in
               | tow behind him.
               | 
               | Such attacks are fine when it's dumb software systems
               | doing the checks, but for a source code repository where
               | people do in fact visually check the diffs occasionally?
               | 
               | Well, let's just say that when someone manages to use
               | SHAttered and/or SHAmbles type attacks on Git (or even
               | Fossil) I expect that it won't take a genius detective to
               | see that the repo's been attacked.
        
               | seniorsassycat wrote:
               | That's an argument for why you shouldn't worry about sha1
                | attacks in source control, but we should take the attack
                | for granted when discussing how to mitigate it.
               | 
               | If we weren't worried about sha1 collisions in git then
               | we wouldn't switch to a new hash function.
        
               | wyoung2 wrote:
               | When _is_ the right time to worry? Maybe wait until
               | someone publishes a practical attack, then wait years for
               | the new code to get sufficiently far out into the world
               | that you can switch to it?
               | 
               | I mean, I see you're expressing concern, but the first
               | major red flag on this went up three years ago, and
                | another big one went up last month.
                | (https://sha-mbles.github.io/)
               | 
               | When we dealt with this same problem over in Fossil land,
               | we ended up needing to wait most of three years for
               | Debian to finally ship a new enough binary that we could
               | switch the default to SHA-3. Fortunately (?) RHEL doesn't
               | ship Fossil, else we'd likely have had to wait even
               | longer.
               | 
               | Atop that same problem, Git's also got tremendously more
               | inertia. Git has to wait out not only the Debian and RHEL
               | stable package policies but also all of that
               | infrastructure tooling they brag on. Every random
               | programmer's editor, merge tool, Git front end... all of
               | that which a project depends on will have to convert over
               | before that one project can move to a post-SHA-1 future.
               | 
               | This is going to be a colossal mess.
        
               | tjoff wrote:
               | Many diff tools don't highlight whitespace-only changes.
               | Or at least not in a clear manner.
               | 
                | Also, if something is replaced in the history, how often
               | do people go back and view diffs in old code? Hardly
               | often enough to rely on it being spotted.
        
               | wyoung2 wrote:
               | It only takes one person to raise the flag.
               | 
               | Sure, many thousands of people doing blind "git clone &&
               | configure && sudo make install" could be burned by a
               | problem like this, but _someone_ would eventually do a
               | diff and see the problem on any project big enough to
               | have those thousands of trusting users in the first
               | place.
               | 
               | I'm not excusing these SHA-1 weaknesses, only pointing
               | out that it won't be trivial to apply them to program
               | source code repos no matter how cheap the attacks get.
               | 
               | For instance, the demonstration case for SHAttered was a
               | pair of PDFs: humans can't reasonably inspect those to
               | find whatever noise had to be stuffed into them to
               | achieve the result.
               | 
               | I also understand that these SHA-1 weaknesses have been
               | used to attack X.509 certificates, but there again you
               | have a case very unlike a software code repo, where the
               | one doing the checking isn't another programmer but a
               | program.
        
               | remram wrote:
               | The problem is that we are considering an issue where
               | different people can get different objects for the same
               | hash. If the people checking all see the valid files,
               | they cannot raise any alarms to save the poor victims who
               | got poisoned with the wrong objects. They'll clone from
               | the wrong fork, and no amount of checking hashes or
               | signed tags will prevent them from running compromised
               | code.
        
               | wyoung2 wrote:
               | > If the people checking all see the valid files
               | 
               | ...which will likely contain thousands of bytes of
               | pseudorandom data in order to force the hash collision...
               | 
               | > they cannot raise any alarms
               | 
               | You think a human won't be able to notice that the diff
               | from the last version they tested looks awfully funny?
               | Code that can fool the compiler into producing an evil
               | binary is one thing, but code that can pass a human code
               | review is quite another.
               | 
               | You might be surprised how often that occurs.
               | 
               | I don't do a diff before each third-party DVCS repo pull,
               | but I do diff the code when integrating such third-party
               | code into my projects, if only so I understand what
               | they've done since the last time I updated. Commit
               | messages, ChangeLogs, and release announcements only get
               | you so far.
               | 
               | Back when I was producing binary packages for a popular
               | software distribution, I'd often be forced to diff the
               | code when producing new binaries, since several of the
               | popular binary package distribution systems are based on
               | patches atop pristine upstream source packages. (RPM,
               | DEB, Cygwin packages...)
               | 
               | Each time a binary package creator updates, there's a
               | good chance they've had to diff the versions to work out
               | how to apply their old distro-specific patches atop the
               | new codebase.
               | 
                | _Someone's_ going to notice the first time this
               | happens, and my guess is that it'll happen rather
               | quickly.
        
               | seniorsassycat wrote:
                | Isn't this the same attack given as an example of why git
                | is migrating hash functions in the subject article?
               | 
                | The attack may be difficult and unlikely; I'm not
                | questioning that, but if I understand correctly then
               | Fossil's migration is straightforward because they did
               | not address the same issues Git chose to.
        
               | SQLite wrote:
               | > if I understand correctly then Fossil's migration is
               | straightforward because they did not address the same
               | issues Git chose to.
               | 
               | I think more is at play here.
               | 
               | (1) You can set Fossil to ignore all SHA1 artifacts using
               | the "shun-sha1" hash policy.
               | 
               | (2) The excess complication in the Git migration strategy
               | is likely due to the inability of the underlying Git file
               | formats to handle two different hash algorithms in the
               | same repository at the same time.
               | 
               | But, I could be wrong. Post a rebuttal if you have
               | evidence to the contrary.
        
               | apeace wrote:
               | And if you are that concerned about this type of attack,
               | it may be worth your time to simply start a new Fossil
               | repository using the sha3-only hash policy (writing a
               | script to replay commits into the new repo, so you don't
               | lose history).
               | 
               | It seems like a problem very few people need to worry
               | about and Fossil has made the right trade-offs.
        
               | mb7733 wrote:
               | Doesn't all of this apply to git just as well, except for
               | the last bit about the MD5 hash?
               | 
               | It just seems to me that the Fossil maintainers have
               | decided that keeping all old SHA1 hashes is acceptable,
               | while the git maintainers have decided that it is not.
               | 
               | Unless I've misunderstood, this is why it was "so easy"
               | for Fossil to transition to a new hashing algorithm. Not
               | some superiority in the design of Fossil, as implied on
               | the Fossil forums.
        
         | Tyr42 wrote:
         | Ah, so it uses "Hardened SHA1", which detects if you are trying
         | to exploit SHA1, and then produces a longer, unambiguous hash.
         | But otherwise Hardened SHA1 has the same output as SHA1, so
          | it's a drop-in replacement.
         | 
         | Then it also has a similar looking-ish migration to SHA3-256.
        
           | velcrovan wrote:
           | Fossil defaults to SHA3-256 since 2.10 (released in October
           | 2019). But it has had SHA3-256 since March 2017, and
           | generally any repos/clones managed with a Fossil version
           | since then have been seamlessly updated to SHA3-256 in the
           | background.
        
       | zokier wrote:
          | Does anyone know of a standard format for a sort of
          | tagged-union hash type, something similar to the crypt format
          | for passwords? It feels like everyone needs to support
          | multiple hash types at some point, and ends up reinventing
          | that particular wheel again and again.
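
[Editor's note] One existing answer is the multiformats "multihash" encoding, which prefixes a digest with an algorithm identifier and length. A crypt(3)-style textual version is easy to sketch; the `$alg$hex` layout below is an illustration, not any standard:

```python
import hashlib

# Tag a digest with its algorithm name, crypt(3)-style: "$<alg>$<hexdigest>".
# Algorithm names are whatever hashlib accepts, e.g. "sha1", "sha256", "sha3_256".
def tagged_digest(alg: str, data: bytes) -> str:
    return "$%s$%s" % (alg, hashlib.new(alg, data).hexdigest())

def parse_tagged(tag: str):
    _, alg, hexdigest = tag.split("$")
    return alg, hexdigest
```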
        
         | loeg wrote:
         | It isn't too bad to just exhaustively look up provided hashes
         | in all your databases (at least, for Git). You should probably
         | only support 1 primary hash at a time, and 1 additional legacy
         | hash for migration purposes. This makes lookup twice as
         | expensive; for git, this is not usually the slow part (the slow
         | part is 'git status' having to compare the entire local
         | filesystem checkout to the repo).
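
[Editor's note] The primary-plus-legacy lookup loeg describes might look like this; the object stores as plain dicts are a hypothetical stand-in:

```python
# Resolve an object id against a primary store first, falling back to a
# legacy store kept only during migration. The extra cost is paid only
# on primary-store misses.
def lookup(object_id, primary, legacy):
    if object_id in primary:
        return primary[object_id]
    return legacy.get(object_id)
```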
        
       | kazinator wrote:
       | > _There is, of course, a way to unambiguously give a hash value
       | in the new Git code, and they can even be mixed on the command
       | line; this example comes from the transition document:_
       | git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
       | 
       | > _For a Git user interface this is relatively straightforward
       | and concise_
       | 
       | No, it isn't. It's a complete and utter user interface
       | clusterfuck. Just say no to this insanity.
        
         | microtherion wrote:
         | Note the qualifier "for a Git user interface".
         | 
         | The average git command is along the lines of "git ph-nglui
         | --mglw=nafh Cthulhu...R'lyeh -- wgah^nagl fhtagn"
        
           | kazinator wrote:
            | That cute rhetoric will not fool anyone. Common git workflows
            | use fairly succinct git commands:
            | 
            |     git diff
            |     git commit -p
            |     git rebase -i HEAD~3
           | 
            | The command quoted in my original comment is just this if we
            | strip away the SHA256 garbage:
            | 
            |     git log abac87a..f787cac
            | 
            | (Or maybe it is:
            | 
            |     git log abac87a^..f787cac^
           | 
           | I cannot guess whether the ^ operator still has the same
           | meaning or whether it is part of this ^{sha...} notation.)
           | 
            | The hashes will typically be copied and pasted, so you type
            | just the _git log_, _.._, and spaces.
           | 
           | The fixed parts of convoluted git syntax can be hidden behind
           | shell functions and aliases. But notations for _referencing_
           | objects are not fixed; they will end up as arguments.
        
             | microtherion wrote:
             | As others have pointed out, there already is precedent for
             | ^{...}, so if you're comfortable with the other uses, I'm
             | not sure why you should NOT be comfortable with this new
             | addition.
        
             | [deleted]
        
             | jolmg wrote:
             | > I cannot guess whether the ^ operator still has the same
              | meaning or whether it is part of this ^{sha...} notation.
             | 
             | This isn't the first ^{...} notation. The manpage
             | gitrevisions(7) also mentions <rev>^{/<text>} for
             | referencing a commit based on a regular expression of its
              | commit message, like
              | 
              |     git checkout 'add-search^{/finished query builder}'
             | 
             | Though, this new notation is probably more in-line with the
             | notation <rev>^{<type>}, which lets you disambiguate what
             | you put in <rev> as in deadbeef^{tag}, so that it's not
             | confused with deadbeef^{commit}.
             | 
             | EDIT: The article doesn't mention it, but I imagine one
             | interpretation would take precedence and cause git to issue
             | a warning when it's ambiguous. Right now, if I tag a commit
             | with the hash of another commit, its interpretation as a
             | tag takes precedence and I get a warning at the top,
             | "warning: refname '368bc6e' is ambiguous." That would mean
             | you'd only ever write ^{sha256} when the provided part of a
             | sha256 hash is ambiguous with an existing sha1 hash or
             | something else like a tag. That's also vice versa with
             | ^{sha1}.
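
[Editor's note] The precedence jolmg observes (the ref wins, with a warning, when a name is ambiguous) could be modeled like this; the resolver and its dict-backed stores are hypothetical, not git's code:

```python
import warnings

# Resolve a name that may be both a ref and an object id: the ref wins,
# and an ambiguity warning is emitted, mirroring the observed behavior
# ("warning: refname '368bc6e' is ambiguous").
def resolve(name, refs, objects):
    if name in refs:
        if name in objects:
            warnings.warn("refname '%s' is ambiguous." % name)
        return refs[name]
    return objects.get(name)
```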
        
             | wyoung2 wrote:
             | > That cute rhetoric will not fool anyone.
             | 
             | Well, let's see, the Fossil equivalents are:
             | 
             | 1. Do nothing at all for a conversion from the SHA-1 to
             | SHA-3 -- yes, 3, not 2 as in Git! -- because it's automatic
             | for months now and dead easy going back 3 years now.
             | (https://www.fossil-
             | scm.org/fossil/doc/trunk/www/hashpolicy.w...)
             | 
             | 2. "fossil diff"
             | 
             | 3. "fossil ci"
             | 
             | 4. Why are you rebasing in the first place, again?
             | https://www.fossil-
             | scm.org/fossil/doc/trunk/www/rebaseharm.m...
        
               | danShumway wrote:
               | Articles like this are eye opening to me, in a bad way.
               | Every once in a while, I get really curious about giving
               | Fossil a try, because it does have some legitimately cool
               | ideas, and then I see the documentation saying things
               | like:
               | 
               | > Rebasing is the same as lying
               | 
               | And I think, "Holy crud do I not want to be part of this
               | community."
               | 
               | The nice thing about Git is that (within reason) once I
               | understood it, I was able to use it in very flexible
               | ways.
               | 
               | It's really common for different projects I manage to
               | range all over the place from the extreme "commits as
               | literal history" perspective all the way to the "commits
               | as literature/guide" perspective. Sometimes I don't
               | rebase at all, sometimes I rebase a lot. Sometimes I
               | commit everything, all the time, sometimes I refuse to
               | commit any code that isn't a deployable feature.
               | Sometimes I leave branches as historical artifacts,
               | sometimes I don't care about history and I'm just trying
               | to coordinate developers across timelines.
               | 
               | That's not to say that Git isn't opinionated about some
               | things -- nearly all good tools have at least a few
               | strong opinions. But Git passes the (IMO extremely low)
               | bar of not conflating a workflow decision with a moral
               | failing. Over the years as a software engineer, I've
               | learned to be somewhat skeptical of programming/workflow
               | heuristics advertised as rules, and to be _very_
               | skeptical of heuristics advertised as ideologies.
               | 
               | I really don't understand the perspective of someone who
               | can't think of even one good reason why they would ever
               | want to edit history. You've never accidentally committed
                | a password to a repo, or had to respond to a takedown
               | request?
        
               | wyoung2 wrote:
               | > sometimes I don't care about history and I'm just
               | trying to coordinate developers across timelines
               | 
               | The fact that Fossil preserves history does not prevent
               | you from coordinating with people across timelines. It is
               | rather the whole point of a DVCS.
               | 
               | > conflating a workflow decision with a moral failing
               | 
               | I think it's fairer to say that we don't think a data
               | repository is any place for lies of any sort, even white
               | lies.
               | 
               | > I've learned to be somewhat skeptical of
               | programming/workflow heuristics advertised as rules, and
               | to be very skeptical of heuristics advertised as
               | ideologies.
               | 
               | Sure, flexible tools are often better than inflexible
               | ones, but you also have to consider the cost of the
               | flexibility. Here, it means someone can say "this
               | happened at some point in the past," and it's just plain
               | wrong.
               | 
               | That isn't always an important thing. Most filesystems
               | and databases operate on the same principle, presenting
               | only the current truth, not any past truth.
               | 
               | Yet, we also have snapshotting in DBMSes and filesystems,
               | because it's often very useful to be able to say, "This
               | was the state of the system as of 2020.02.04."
               | 
               | You don't need a snapshotting filesystem for everything,
               | and you don't need Fossil for everything, but it sure is
               | nice to have ready access to both when needed.
               | 
               | > You've never accidentally committed a password to repo,
               | or had to respond to a takedown request?
               | 
                | Fossil has shunning for that:
                | https://fossil-scm.org/fossil/doc/trunk/www/shunning.wiki
               | 
               | And no, shunning is nothing at all like rebase, which
               | should be clear from the article.
               | 
                | Fossil also has the `amend` command:
                | http://fossil-scm.org/fossil/help?cmd=amend
               | 
               | And no, it is also not like rebase, because it only adds
               | to the project history, it never destroys information.
        
               | dahart wrote:
               | > I think it's fairer to say that we don't think a data
               | repository is any place for lies of any sort, even white
               | lies.
               | 
               | I, too, wish this extreme hyperbole would be just left
                | out of the discussion completely. It is off-putting, and I
                | think it's intentionally a bad-faith argument; it fails
                | to acknowledge the utility, the design intent, and the
               | context behind rebase, which has been talked about at
               | length by Linus and others.
               | 
               | When rebase is used as designed, according to the golden
               | rule, it's not modifying published history, so it's not
               | "lying". Whether rebase has safety problems is a separate
                | issue from whether its use as designed amounts to being
               | "dishonest".
               | 
               | I'm all in favor of improved design choices, and if
               | Fossil is making those better design choices, let them
               | stand on their own without intentionally denigrating git
               | and every user of git through utter exaggeration.
        
               | danShumway wrote:
               | My understanding is that shunning is blacklisting
               | specific artifacts. That's nice, but I don't understand
               | how that solves the problem.
               | 
               | When I revise history in Git, even if it's just doing
               | something as simple as removing sensitive information, I
               | often need to replace that information, either through
               | new commits, or by introducing minor edits to surrounding
               | commits. I could add those changes on top of my current
               | HEAD, but then checkouts of old versions would be broken.
               | On the other hand, if I can just replay my commits while
               | inserting extra code, I'll end up with something that's
               | pretty close to my original history, with just the
               | offending information excluded/replaced.
               | 
               | That carries the cost that people will need to force pull
               | my repo, but at least the repo history will still roughly
               | correspond to what development looked like, rather than
                | being out-of-order and mostly impossible to build except
                | at my current HEAD.
               | 
               | As a followup question, what do you do if the sensitive
               | information you need to exclude is in a commit message?
               | `amend` won't help you, since it's not destroying
               | information. Do you shun that commit and then... what?
               | 
               | It just seems like destroying information isn't enough
               | unless you can also replace it?
               | 
               | > Sure, flexible tools are often better than inflexible
               | ones, but you also have to consider the cost of the
               | flexibility.
               | 
               | I appreciate this -- I like having multiple tools for
               | different purposes. I don't see a problem with having a
               | VC that focuses on auditability, or having one that goes
               | in a radically different direction from Git. Fossil has
               | very interesting ideas, which is why I try to pay it some
               | attention whenever I see it mentioned or linked to.
               | 
               | However, whenever I follow those links and start digging
               | deeper into the philosophy behind its design decisions,
               | inevitably the conversation changes from, "here's our
               | alternative approach to Git" to "what Git does is
               | fundamentally wrong". It's not, "Fossil doesn't have this
               | problem because we eschew rebasing", it's "why would
               | anyone rebase?"
               | 
               | (Nearly) all architectural decisions have good and bad
               | consequences. Sometimes those consequences are
               | imbalanced, so we have heuristics that can say things
               | like, "often X is a bad idea." That's fine.
               | 
               | More harmfully, sometimes people extend heuristics into
               | rules that say, "it's never a good idea to do X".
               | Programming rules are usually wrong.
               | 
                | But programming ideologies are the worst, because they say,
               | "there is something mentally or morally wrong with a
               | person who would do X". This is toxic for the reasons
               | that Fossil devs already mention in their documentation:
               | 
               | > programmers should avoid linking their code with their
               | sense of self
               | 
               | Programming ideologies explicitly encourage developers to
               | have egos, because ideology conflates architectural
               | decisions and workflow processes with individual worth.
               | Programming ideologies make it harder for people to grow
               | as programmers, because they tie intellectual growth to
               | fears about being wrong. They're completely toxic.
               | 
               | And is Fossil's documentation promoting an ideology? I'm
               | guessing that you'd disagree with me on this, but my take
               | is that when Fossil's official documentation says things
               | like:
               | 
               | > Honorable writers adjust their narrative to fit
               | history. Rebase adjusts history to fit the narrative.
               | 
               | or
               | 
               | > It is dishonest. It deliberately omits historical
               | information. It causes problems for collaboration. And it
               | has no offsetting benefits.
               | 
               | That's not designing a focused tool to support specific
               | heuristics, or making a case that, "sometimes strict
               | auditability is important". That's just trolling for
               | fights.
        
               | wyoung2 wrote:
               | > ideology conflates architectural decisions and workflow
               | processes with individual worth
               | 
               | No. You start with the ideology based on your local
               | culture and project needs, then you pick the tool that
               | supports your project's needs.
               | 
               | This is why we spend so much time talking about
               | philosophy in the Fossil vs. Git article, particularly
                | this section:
                | https://fossil-scm.org/fossil/doc/trunk/www/fossil-v-git.wik...
               | 
               | Which of the two philosophies matches better with the way
               | your project works? That alone is a pretty good guide to
               | whether you want Fossil or Git. (Or something else!)
        
               | kazinator wrote:
               | I have no interest in Fossil because it stores stuff in
               | sqlite databases instead of the filesystem which I think
               | is a stupid approach. I'm also not interested in version
               | control systems that are dragging along a wiki and bug
               | tracker. I just want a C program in /usr/bin that does
               | version control.
        
               | wyoung2 wrote:
               | SQLite can be considerably faster than the filesystem:
               | https://www.sqlite.org/fasterthanfs.html
               | 
               | If you think your filesystem-based Git repo is easy to
               | manipulate, go poking around in there, and what you'll
               | find is a bespoke one-off pile-of-files database! Given a
               | choice between Git's DB and SQLite, I put more trust into
               | SQLite.
               | 
               | > I just want a C program in /usr/bin that does version
               | control.
               | 
               | ...which Git doesn't provide. Git is hundreds of files
               | scattered all over your filesystem, a large number of
               | which aren't C binaries anyway, and of those that are,
               | only one of them is the front-end program sitting in
                | /usr/bin, whereas Fossil _can_ be built as a single
                | static executable in /usr/bin.
               | 
                | And if you _can't_ build Fossil statically on your
               | system, it's likely due to an OS limitation rather than
               | something about Fossil itself, as on RHEL where they've
               | made fully static linking rather difficult in the past
               | few releases.
               | 
               | Getting back to Git, large chunks of Git are written in
               | POSIX shell, Perl, Python, and Tcl/Tk. Almost all of
               | Fossil is written in C, and the rest of the code is
               | embedded within that binary running under built-in
               | interpreters rather than depending on platform
               | interpreters.
               | 
               | This has nice knock-on effects, one of which is that
               | Fossil is truly native on Windows, whereas you have to
               | drag along a Linux portability environment to run Git on
               | Windows. Another is that Fossil plays nicely with
               | chroot/jail/container technology.
               | 
               | > I'm also not interested in version control systems that
               | are dragging along a wiki and bug tracker.
               | 
               | Not a GitHub or GitLab user, then, I'm guessing?
        
               | kazinator wrote:
               | The diatribe against rebasing is stupid. In fact, not
                | having more than one parent is a good thing, because with
                | multiple parents you don't know what is relevant.
               | The history has turned into a hairball. When you try to
               | navigate back in time, you face forking roads at every
               | step and it turns into a maze walk.
               | 
               | The point is valid that when we rebase, we are losing
               | history: the context of where that change was originally
               | parented.
               | 
               | However, (1) the history does not matter if the change
               | was parented in some temporary context, like your
               | unpublished changes and (2) the information can be
               | tracked in other ways, such as a Gerrit Change-Id (or
               | something like it) in the commit message.
               | 
               | Regarding (1) the extra parent pointers in a merge commit
               | cause retention of garbage. If we do everything with
               | merge instead of rebase, we will never lose any of the
               | temporary commits. If we prepare an unpublished change
               | through numerous rebase operations, all that temporary
               | crap will stay referenced from the head, waste space and
               | confuse other people with irrelevant information when
               | they try to navigate the history.
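                | (To illustrate point (2): a Gerrit-style Change-Id
                | trailer rides along in the commit message, so it
                | survives rebases unchanged even though the commit's
                | hash does not. A sketch with a made-up commit message,
                | not Gerrit's actual implementation:)

```python
import re

# Hypothetical commit message carrying a Gerrit-style Change-Id trailer.
# The trailer is part of the message, so rebasing the commit onto a new
# parent rewrites its SHA but leaves the Change-Id intact.
message = """\
net: fix refcount leak in foo_close()

Drop the extra reference taken on the error path.

Change-Id: I8f2c1a9d4b7e6a5c3d2b1a0f9e8d7c6b5a4f3e2d
"""

def change_id(msg):
    """Return the Change-Id trailer value, or None if absent."""
    m = re.search(r"^Change-Id:\s*(I[0-9a-f]{40})\s*$", msg, re.MULTILINE)
    return m.group(1) if m else None
```

                | Tools can then correlate the pre- and post-rebase
                | commits by this identifier instead of by hash.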
        
               | wyoung2 wrote:
               | > history does not matter if the change was parented in
               | some temporary context
               | 
               | It does if it means a big ball o' hackage lands on the
               | public working branch, since it complicates merges,
               | backouts, cherrypicks, and bisects.
               | 
               | Git users can also hide individual commit messages behind
               | one big combined message, losing part of the project's
               | development history and logical progression.
               | 
               | When I pull your repo and build it, and I find that it
               | doesn't build on my system, I don't want to dig through a
               | 500-line merge commit to figure out why you changed this
               | one line from the one that used to build last week, I
               | want the 14-line diff it was part of so I can begin to
               | understand what you were thinking when you committed it.
               | If I later find out that that 14-line change was wrong
               | but the rest of your 500-line merge was fine, I want to
               | be able to back it out with a single command. (In Fossil,
               | it's `fossil merge --backout abcd1234`.)
               | 
               | > confuse other people with irrelevant information when
               | they try to navigate the history.
               | 
               | How much time do you spend navigating the project's
               | history vs looking at the tip of the current branch?
               | 
               | I'd wager that the times you dig back into the history,
               | it's because you are in fact trying to figure out why you
               | got here, which means a trail of detailed breadcrumbs
               | will be more likely helpful than "...and between one week
               | and the next, something changed in commit abcd1234, but
               | we've lost all of its internal context, so we'll be
               | spending next week reconstructing it because Angie's on
               | vacation now."
        
               | kazinator wrote:
               | Regarding (1), not "everything will work as before".
               | 
               | What happens if a Fossil repo that has had SHA3 commits
               | written to it is accessed by old Fossil software before
               | that change was introduced?
        
               | wyoung2 wrote:
               | If you try to use Fossil 1.37 -- the last 1.x release --
               | to clone a repo that has SHA-3 hashed artifacts in it, it
               | says, "server returned an error - clone aborted". Since
               | 1.37 pre-dates this feature, it can't give a more
               | detailed diagnosis than that.
               | 
               | If you have an old clone made from before the transition
               | and try to update it, I'm not sure what it says, since I
               | don't have any of those around any more. It has, after
               | all, been three years since Fossil began to move on this
               | problem, so that it's largely a past issue for us now.
               | 
               | This transition time was indeed annoying for us over in
               | Fossil land, but Git's going to have to go through a
               | transition like this, too. The question isn't whether but
               | how long we'll have to wait for it to begin and how long
               | it'll take to complete.
        
             | allover wrote:
              | You are now ignoring the fact that the initial quote you
              | objected to was intentionally tongue-in-cheek:
              | 
              | > 'For a Git user interface this is relatively
              | _straightforward_ and concise'.
              | 
              | It kinda looks like you missed the joke and are now
              | doubling-down on your disagreement.
              | 
              | The author does _not_ think the proposed example is
              | reasonable. You're in agreement.
        
               | kazinator wrote:
               | Since git is something that I rely on for everyday use,
                | and long-term data storage, and its development is being
               | threatened by the inclusion of moronic changes I
               | completely disagree with, I'm completely unreceptive to
               | jokes. This is no laughing matter.
        
               | allover wrote:
               | I agree, things shouldn't be this bad.
               | 
               | But unless you're going to take this up with Linus,
               | you're just yelling at your fellow disappointed
               | spectators.
        
               | hinkley wrote:
               | And the responder makes a pretty unsubtle allusion to
               | Lovecraft.
               | 
               | Anyone who compares the git CLI to being driven insane by
               | Elder Gods is not defending the git CLI.
        
               | allover wrote:
               | Not sure if the HN thread or my comment has thrown you,
               | but I'm replying to 'kazinator'.
               | 
                | I know _he's_ not defending it.
               | 
               | What I said is that he (kazinator) is inadvertently
               | attacking somebody that's _also_ not defending it (the
               | author).
        
         | hinkley wrote:
         | When people started using the phrase "Stockholm Syndrome" with
         | respect to git I took it as a sort of hyperbole. A rhetorical
         | device.
         | 
         | But the more 'improvements' they make to it the more literal
         | that accusation becomes in my head. And what's worse is that
          | I've grown enough calluses now that my response is an eyeroll
         | instead of pain. I use git all the time, but it's terrible and
         | I need something that is better, not just sucks less. And
         | apparently soon, because I don't know when that koolaid is
         | going to start looking good but it's not long now.
         | 
         | Send help.
        
         | scarejunba wrote:
         | Perhaps suggest an alternative? It may help understand why this
         | was chosen.
        
           | kazinator wrote:
           | One attractive alternative is not to do a thing.
           | 
           | Don't cave in to sky-is-falling bullshit regarding the
           | existing SHA1.
           | 
           | Git is not a crypto system; it's just version control.
           | 
           | We've used version control systems just fine that had no
            | integrity features at all. For instance, you can go into an
            | RCS ,v file and diddle anything you want. Some BSD people are
           | still on CVS, and their world hasn't fallen apart.
        
           | loeg wrote:
           | One alternative would be to just do lookups in both hash
           | databases (until SHA1 is fully migrated away from), and
           | reject invocations that conflict. Git's CLI already rejects
           | ambiguous short hash prefixes for SHA1, it could easily
           | reject ambiguous prefixes between SHA1 and SHA256 and
           | otherwise allow unique prefixes for either hash. This would
           | be pretty ergonomic for users.
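            | A minimal sketch of that dual-lookup idea in Python
            | (hypothetical in-memory object stores, not Git's actual
            | code): a prefix is accepted only if it matches exactly one
            | object across both databases.

```python
# Hypothetical object stores: one keyed by SHA-1 hex, one by SHA-256 hex.
sha1_objects = {
    "17af96c52312aee2a1467ce1cf1b7d5b2a2f843f",
}
sha256_objects = {
    "17af9" + "0" * 59,  # deliberately shares the prefix "17af9"
    "9c56cc51b374c3ba189210d5b6d4bf57790d351c96c47c02190ecf1e430635ab",
}

def resolve(prefix):
    """Return the unique object hash matching `prefix`, or raise."""
    matches = [h for h in sha1_objects | sha256_objects
               if h.startswith(prefix)]
    if not matches:
        raise KeyError("no object matches %r" % prefix)
    if len(matches) > 1:
        raise ValueError("ambiguous prefix %r" % prefix)
    return matches[0]
```

            | Here `resolve("17af9")` raises, because the prefix matches
            | one object in each store, while the longer
            | `resolve("17af96")` is unique and succeeds.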
        
             | jolmg wrote:
             | For most cases that would suffice and would be ergonomic,
             | but what if a full SHA1 also qualifies as a prefix of one
             | or more SHA256, and you want the SHA1? There's still a need
             | for a mechanism to disambiguate for these cases, even if it
             | ends up very rarely needed.
        
               | loeg wrote:
               | You're talking about a 160 bit truncated hash collision
               | on SHA256, which is extraordinarily unlikely if SHA256 is
                | not itself completely broken (more so than SHA1 already
               | is!). I don't think any syntax is needed for that in the
               | porcelain CLI; it could be handled with non-user-facing
               | commands if it ever came up (it won't).
        
               | jolmg wrote:
               | > extraordinarily unlikely if SHA256 is not itself
               | completely broken (moreso than SHA1 already is
               | 
               | I was hoping I captured that by saying "very rarely".
               | However, if SHA1 collisions can be made willingly,
               | doesn't that mean that one can also willingly make a SHA1
               | hash that matches with the prefix of an existing SHA256
               | hash?
        
               | minitech wrote:
               | > doesn't that mean that one can also willingly make a
               | SHA1 hash that matches with the prefix of an existing
               | SHA256 hash?
               | 
               | No, the "prefix of an existing SHA256 hash" stops being
               | relevant at that point - that's just a full preimage
                | attack on SHA1, which isn't known to be feasible yet.
               | 
               | > I was hoping I captured that by saying "very rarely"
               | 
               | It's rarer than that. :)
        
               | loeg wrote:
               | As far as I know, that kind of collision isn't practical
               | at this time. So predicating UI decisions on that basis
               | seems like a mistake to me (given how long git has
               | already ignored the looming threat of SHA1 being broken).
               | 
               | When and if someone injects a SHA1 attack into your
               | repository, and the main git CLI throws up its hands and
               | says "hash collision" trying to access it, I'm not seeing
               | major problems here. The git CLI doesn't need to provide
               | convenient commands to interact with attacks that are not
               | practical today. To the extent that these will become
               | practical, I think git should drop the SHA1 lookup after
               | a migration period regardless, and it would not hurt to
               | provide a gitconfig knob to disable SHA1 lookup.
        
         | gouggoug wrote:
         | > _For a Git user interface this is relatively straightforward
         | and concise_
         | 
         | You forgot to include the end of that sentence, that
         | acknowledges your issue with it:
         | 
         | > _, but one can still imagine that users might tire of it
         | relatively quickly._
        
           | hinkley wrote:
           | I've had this argument at work.
           | 
           | "Tire of it quickly" and "have an immediate gag reflex" are
           | two completely different categories of negative reaction.
           | 
           | It's hard to see the sunset when you're down in the muck, and
           | eventually 'less bad' starts to look like progress to you.
           | It's a trap and you should be aware of it.
        
       | sandGorgon wrote:
       | does anyone know if github/bitbucket support it today ?
        
         | freddie_mercury wrote:
         | Why would they support it? The article clearly states it is
         | nowhere close to being useful yet.
         | 
         | It is untested, unstable code that can only write to
         | repositories and not read them.
         | 
         | "Much of the work to implement the SHA-256 transition has been
         | done, but it remains in a relatively unstable state and most of
         | it is not even being actively tested yet. In mid-January,
         | carlson posted the first part of this transition code, which
         | clearly only solves part of the problem:
         | 
         | "First, it contains the pieces necessary to set up repositories
         | and write _but not read_ extensions.objectFormat. In other
         | words, you can create a SHA-256 repository, but will be unable
            | to read it."
        
           | sandGorgon wrote:
           | actually - i might have worded it confusingly.
           | 
           | For smaller projects (like my own), can i move to sha-256
           | with no expectation of backward compatibility _today_ ?
        
             | SAI_Peregrinus wrote:
             | "First, it contains the pieces necessary to set up
             | repositories and write _but not read_
             | extensions.objectFormat. In other words, you can create a
              | SHA-256 repository, but will be unable to read it."
             | 
             | If you want it to be write-only, sure, go ahead!
        
       | tcharlton wrote:
       | I can't find documentation for the command in the article:
        | 
        |     git convert-repo --to-hash=sha-256 --frobnicate-blobs \
        |         --climb-subtrees --liability-waiver=none \
        |         --use-shovels --carbon-offsets
       | 
       | Surely some of those options aren't real...
        
         | amarshall wrote:
         | > A new version of Git can be made...with a simple command
         | like: <command> ... note that the specific command-line options
         | may differ
         | 
         | Gives me the impression that it's a construction of the article
         | alone. Unsurprising, given the snark of the options.
        
         | buserror wrote:
          | Of course they are ?!?!!?
          | https://git-man-page-generator.lokaltog.net/
         | 
         | (never fails to amuse me)
        
           | bangboombang wrote:
           | Oh dear god. Because it's git related, my brain somehow still
           | tries to make sense of that stuff because it just seems so
           | real.
        
             | ekimekim wrote:
             | It's a curious feeling. Like reading code that is
             | syntactically valid but utterly nonsensical.
        
           | hrgiger wrote:
           | reminds me https://projects.haykranen.nl/java/
        
         | pabs3 wrote:
         | That seems to be intended to be humour.
        
           | kzrdude wrote:
           | Well, I loved it, for one.
        
         | throwaway744678 wrote:
         | I believe we have here an example of Poe's law [1]
         | 
         | [1] https://en.wikipedia.org/wiki/Poe%27s_law
        
         | [deleted]
        
       | strenholme wrote:
       | I'm already seeing a lot of discussion both here and over at LWN
       | about which hash algorithm to use.
       | 
       | The Git team made the right choice: SHA2-256 is the best choice
       | here; it has been around for 19 years and is still secure, in the
       | sense that there are no known attacks against it.
       | 
       | Both BLAKE[2/3] and SHA-3 (Keccak) have been around for 12 years
       | and are both secure; just as BLAKE2 and BLAKE3 are faster reduced
       | round variants of BLAKE, Keccak/SHA-3 has the official faster
        | reduced-round KangarooTwelve and MarsupilamiFourteen variants.
       | 
       | BLAKE is faster when using software to perform the hash; Keccak
       | is faster when using hardware to perform the hash. I prefer the
       | Keccak approach because it gives us more room for improved
       | performance once CPU makers create specialized instructions to
       | run it, while being fast enough in software. And, yes, SHA-3 has
       | the advantage of being the official successor to SHA-2.
        
         | _verandaguy wrote:
         | Honest question: what are the use cases in Git where hash
         | computation speed is a meaningful optimization?
        
           | strenholme wrote:
           | It's actually not a big deal with Git, which is why SHA2-256
           | is the right choice.
        
           | loeg wrote:
           | Rewriting all repos from SHA1 to hash-next?
        
           | SQLite wrote:
           | My experience in developing and maintaining Fossil is that
           | the hashing speed is not a factor, unless you are checking in
           | huge JPEGs or MP3s or something. And even then, the relative
           | performance of the various hash algorithms is not enough to
           | worry about.
        
           | papreclip wrote:
           | less wasted computation means less global warming
        
         | jayflux wrote:
         | > BLAKE is faster when using software to perform the hash
         | 
          | Is BLAKE 3 still faster than sha-256 when using the CPU's
          | specialized instructions? I think most modern desktop CPUs
          | have built-in instructions for SHA256.
         | 
         | I'm guessing when people compare BLAKE 3 to SHA 256 they're
         | comparing software to software, but this wouldn't be the case
         | in reality?
        
           | strenholme wrote:
           | I haven't seen any benchmarks for BLAKE3 vs. the Intel/AMD
           | SHA extensions. My guess is that Intel hardware accelerated
           | SHA-256 will be faster than BLAKE3 running in software for
           | most real world uses.
           | 
           | I can tell you this much: It is only with Ice Lake, which was
           | released in the last year, that mainstream Intel chips
            | _finally_ got native high-speed SHA-NI support. Coffee Lake and
           | Comet Lake, which are still the CPUs in a lot of new laptops
           | being sold right now, do not support SHA-NI.
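            | (A quick way to get a number for your own machine, using
            | only the Python standard library: hashlib's SHA-256 goes
            | through OpenSSL, which uses SHA-NI where the CPU has it,
            | while the stdlib blake2b is a plain software
            | implementation -- BLAKE2, not BLAKE3, so this only
            | approximates the comparison.)

```python
import hashlib
import time

def throughput_mb_s(name, size=1 << 20, rounds=20):
    """Rough single-threaded throughput in MB/s for one hash."""
    buf = b"\x55" * size
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, buf).digest()
    elapsed = time.perf_counter() - start
    return size * rounds / elapsed / 1e6

for name in ("sha256", "sha512", "blake2b"):
    print("%-8s %8.1f MB/s" % (name, throughput_mb_s(name)))
```

            | On a CPU with SHA extensions, sha256 typically comes out
            | well ahead of the software hashes here.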
        
             | wahern wrote:
             | AMD Zen supports SHA extensions across all SKUs. Here are
             | `openssl speed` numbers on an AMD EPYC 3201:
              | type         16 bytes    64 bytes     256 bytes    1024 bytes    8192 bytes    16384 bytes
              | blake2s256   46720.33k   187461.21k   305314.65k   373840.55k    398207.66k    401528.15k
              | blake2b512   38423.44k   155318.81k   422325.08k   592401.75k    674843.31k    681743.70k
              | sha256       84620.44k   279840.47k   723573.76k   1199678.81k   1484693.50k   1510484.65k
              | sha512       33854.38k   135674.20k   275343.70k   444872.36k    545802.92k    554166.95k
              | sha3-256     26146.35k   103860.27k   253944.92k   308119.21k    347477.33k    351906.47k
              | sha3-512     26349.83k   105590.85k   144236.03k   173082.62k    189448.19k    189814.10k
             | 
             | It's possible that Blake3 might be faster than accelerated
             | SHA-256 on large inputs, where Blake3 can maximally
             | leverage its SIMD friendliness. OTOH, Blake3 really pushes
             | the envelope in terms of minimal security margin.
             | Performance isn't everything. SHA-3 is so slow because NIST
             | wanted a failsafe.
             | 
              | OpenSSL info:
              | 
              |     OpenSSL 1.1.1c  28 May 2019
              |     built on: Tue Aug 20 11:46:33 2019 UTC
              |     options: bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr)
              |     compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall
              |         -Wa,--noexecstack -g -O2
              |         -fdebug-prefix-map=/build/openssl-D7S1fy/openssl-1.1.1c=.
              |         -fstack-protector-strong -Wformat -Werror=format-security
              |         -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC
              |         -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT
              |         -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM
              |         -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM
              |         -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
              |         -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time
              |         -D_FORTIFY_SOURCE=2
             | 
             | NOTE: /proc/cpuinfo shows sha_ni detection, and the apt-get
             | source of this version of OpenSSL confirms SHA extension
             | support in the source code, but I didn't confirm that it
             | was actually being used at runtime.
        
               | strenholme wrote:
                | Assuming Blake3 will be across the board 43% faster (7
                | instead of 10 rounds) than 32-bit blake2s256, we would
                | get:
                | 
                |     Blake3   SHA-256
                |     66743    84620     Tiny
                |     846287   1199679   Medium (1024 bytes)
                |     973919   1510485   Largeish (16384 bytes)
                | 
                | This is based on the parent's numbers with a fudge factor
                | to account for Blake3 being a faster version of
                | blake2b512.
                | 
                | Of course, this does not take into account that Blake3
                | has tree hashing and other modes which scale better to
                | multiple cores.
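                | (The fudge-factor arithmetic can be reproduced
                | directly: scale the measured BLAKE2 figures from the
                | `openssl speed` run upthread by 10/7, since BLAKE3 runs
                | 7 rounds where BLAKE2 runs 10. Note the "Tiny" row
                | scales the blake2s256 16-byte figure, while the larger
                | rows scale blake2b512.)

```python
# Scale measured BLAKE2 throughput (kB/s) by 10/7 to estimate BLAKE3,
# mirroring the back-of-the-envelope estimate in the comment above.
FUDGE = 10 / 7  # 10 rounds (BLAKE2) vs. 7 rounds (BLAKE3)

measured = {
    "Tiny (16 B, blake2s256)":        46720.33,
    "Medium (1024 B, blake2b512)":    592401.75,
    "Largeish (16384 B, blake2b512)": 681743.70,
}

estimates = {label: round(kbps * FUDGE) for label, kbps in measured.items()}
for label, est in estimates.items():
    print("%-32s ~%d kB/s" % (label, est))
```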
        
           | ptomato wrote:
           | On my machine with sha extensions, blake3 is about 15% faster
           | (single threaded in both cases) than sha256.
        
             | abecedarius wrote:
             | Also, Blake3 has some kind of advantage in
             | parallelizability, iirc.
        
               | ptomato wrote:
               | yeah, blake3 multi-threaded is about 11 times faster for
               | me than sha256 single-threaded.
        
         | KMag wrote:
         | SHA-256 is probably the right choice, but I don't think it's as
         | obvious as you suggest, given SHA-512/256.
         | 
         | SHA-512/256 is a standard peer-reviewed and well-studied way to
         | run SHA-512 with a different initial state and then truncate
         | output to 256 bits.
         | 
         | This is heavy bikeshedding, but SHA-512/256 would be a more
         | conservative choice than SHA-256. Under standard assumptions,
          | SHA-512/256 is no weaker than SHA-256. The structure is extremely
         | similar to SHA-256, but a collision on intermediate state
         | requires a collision on all 512 bits of state instead of 256.
         | 
         | On most 64-bit CPUs without dedicated hash instructions,
         | SHA-512/256 is faster for messages longer than a couple of
         | blocks, due to processing blocks twice as large in fewer than
         | twice as many operations.
         | 
         | Currently, the latest server and laptop CPUs have SHA-256
         | hardware acceleration but not SHA-512 acceleration. I'm not
         | sure how many phone CPUs support sha256 but not ARMv8.2-SHA
         | extensions (SHA-512). If it weren't for this difference in
         | hardware acceleration, there would be few reasons to use
         | SHA-256.
         | 
         | That being said, the current difference in hardware
         | acceleration support probably makes SHA-256 the right choice
         | here.
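
KMag's point that SHA-512/256 is a distinct hash, not merely SHA-512 truncated after the fact, can be checked directly. A minimal Python sketch (it assumes the underlying OpenSSL build exposes the "sha512_256" algorithm, which recent builds do):

```python
import hashlib

msg = b"hello, git"

# SHA-512 truncated to 256 bits after the fact.
truncated = hashlib.sha512(msg).digest()[:32]

# Real SHA-512/256: same compression function, but a different
# initial state, so the two digests are unrelated.
sha512_256 = hashlib.new("sha512_256", msg).digest()

print(truncated.hex())
print(sha512_256.hex())
```

Truncation is what blunts length extension (an attacker never sees the full 512-bit internal state); the distinct initial value provides domain separation between the variants.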
        
           | strenholme wrote:
           | SHA-512/256 is a lot newer than SHA2-256 (usually called
           | SHA-256, but I prefer the SHA2 prefix to make it clear that
           | it's a very different beast than SHA3-256), and its speed on
           | 32-bit CPUs is less than optimal, so I don't see it as being
           | a more conservative choice. In terms of security, it uses the
           | same 19-year-old unbroken algorithm as SHA2-256.
           | 
           | I am aware of the length extension issues, but they are not
           | relevant for Git's use case.
           | 
           | In terms of support, SHA-512/256 has, as you mentioned, less
           | hardware acceleration support, and it's also not supported in
           | a lot of mainstream programs like GNU Coreutils. I also know
           | that some companies mandate using SHA2-256 whenever a
           | cryptographic hash is needed.
           | 
           | Git made the right choice with SHA2-256: It's the most widely
           | supported secure cryptographic hash out there.
        
       | mratsim wrote:
       | > Thus, unlike some other source-code management systems, Git
       | does not (conceptually, at least) record "deltas" from one
       | revision to the next. It thus forms a sort of blockchain, with
       | each block containing the state of the repository at a given
       | commit.
       | 
       | Color me surprised, dropping the "blockchain" word in the middle
       | of the introduction
        
         | AndrewDucker wrote:
         | Git is a blockchain.
         | 
         | Being, as it is, a chain of signed blocks.
        
           | tonyedgecombe wrote:
           | Git is a Merkle tree as is a blockchain.
           | 
           | https://en.wikipedia.org/wiki/Merkle_tree
        
             | the8472 wrote:
             | It is more a DAG than a tree.
        
               | afiori wrote:
               | And talking about hash attacks it becomes relevant to
               | consider the possibility of it being just a Directed
               | Graph
        
           | afiori wrote:
            | It is still a sort of name-dropping, in the sense that it
            | is used because of the trendiness of the term.
           | 
           | It is entirely possible and likely that it is used for
           | didactic purposes as many people are familiar with the
           | blockchain structure and its use of hashes.
        
             | bawolff wrote:
              | I thought it was a joke. The whole "blockchain isn't
              | that innovative if you use a strict technical
              | definition, because lots of things were chains of blocks
              | before bitcoin was cool" meme.
              | 
              | And honestly, fair enough. The innovative part of
              | bitcoin is not the blockchain but all the economics &
              | game theory going on to create trust in the system.
        
               | afiori wrote:
                | I think this is the reason why the parent was
                | criticizing it. "Blockchain" as a term generally means
                | "crypto-magic-stuff on a blockchain", so for git to
                | use it instead of the more academic "Merkle tree" (or
                | "Merkle DAG", if that exists) sounds a bit like low-
                | effort name-dropping.
                | 
                | Again, it is not a criticism of the article, but it is
                | not a criticism of the criticism either.
        
       | kazinator wrote:
       | I'm completely against this security theater nonsense; please
       | keep my git SHA-1.
       | 
       | Please fork git for this and call it something else, like git6,
       | and ensure that git6 cannot push to git repos.
        
       | powerapple wrote:
        | Is it a real problem for git? Do we merge code based on
        | hashes instead of looking at the code?
        
         | donatj wrote:
          | The problem is that if you can make evil code with the same
          | hash as innocuous code, you can poison people who pull from
          | a given repo you have access to. It would allow you to make
          | changes to the history without merging anything or anyone
          | being the wiser.
         | 
         | It makes the distributed aspect of git untrustworthy, as
         | previously you knew if you pulled from anywhere and the hash
         | was good, you'd pulled the correct code. With SHA1 being
         | functionally broken that's no longer necessarily the case.
        
       | [deleted]
        
       | throwaway-q2233 wrote:
        | Can't we just take the SHA-1 of the SHA-256?
        
         | loeg wrote:
         | Nope.
        
       | anaisbetts wrote:
       | I don't understand the practical attack vector for breaking SHA1s
       | in Git. Not only are objects checksummed by SHA1, they also
       | encode the _length_. Finding a SHA1 collision is plausible, but
       | finding a SHA1 collision that both lets you do something
       | Nefarious, _and_ is the length you need, seems really really
       | unlikely
        
         | acidictadpole wrote:
         | The author does seem to concede that hitting all the checkmarks
         | in an attack on git would be pretty tricky:
         | 
         | > An attacker would not just have to do that, though; this new
         | version would have to contain the desired hostile code, still
         | function as a working floppy driver, and not look like an
         | obfuscated C code contest entry
         | 
         | The whole idea is that they want to switch away before these
         | things become likely. They are unlikely now, but SHA-1 is only
         | getting weaker as time goes by and more research is done.
        
           | pdonis wrote:
           | _> and not look like an obfuscated C code contest entry_
           | 
           | The full quote here is even better:
           | 
           | "and not look like an obfuscated C code contest entry (at
           | least not more than it already does)."
        
         | xxs wrote:
          | The length actually is already part of the hash input...
          | and the SHA-1 collision used PDFs of the same length.
        
         | phaemon wrote:
         | As mentioned, the shattered PDFs[1] have the same length,
         | however it's worth noting that adding the Git header breaks the
         | matching, ie. you get different SHA sums for the files in Git
         | because of the header.
         | 
         | [1] https://shattered.io/
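
The header phaemon describes is easy to reproduce; here is a minimal sketch of how git derives a blob's object id (mirroring what `git hash-object` does for a blob):

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    # Git hashes "blob <size>\0" followed by the content, so two
    # files colliding under plain SHA-1 get distinct object ids
    # unless the collision also survives the added prefix.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# The well-known id of the empty blob, present in every repository:
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```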
        
         | tialaramex wrote:
         | This makes no sense. The collision manufacture algorithm of
         | course produces the same length output in both the A and B
         | documents. Doing otherwise would be considerably harder in
         | fact.
        
         | kzrdude wrote:
         | SHA1 is crumbling. It's being replaced because it is likely to
         | be broken further, in practice.
        
         | upofadown wrote:
         | To make the collision work you need to produce two different
         | files, both with some randomish looking junk in them. So if you
         | can do that in a way where you can substitute one of the files
         | for the other without getting caught then you are almost for
         | sure smart enough to also figure out a way to make the lengths
         | the same.
        
         | HereBeBeasties wrote:
         | You're assuming that 100% of the source code matters, but most
         | source code has comments. Some has a lot of comments
         | (boilerplate headers). Delete all the comments and superfluous
         | whitespace, add nefarious code, put in a comment in the
         | remaining bytes for the sole purpose of causing a hash
         | collision (likely plenty of bytes to play with).
        
           | scoutt wrote:
           | Yes, but...
           | 
           | > this new version would have to contain the desired hostile
           | code, still function as a working floppy driver, and _not
           | look like an obfuscated C code contest entry_
           | 
           | It's still plausible that one can pull a trick like that to
           | introduce malicious code into the repo, but improbable.
        
         | est31 wrote:
          | The shattered collision attack featured two PDFs with the
          | same SHA-1 and, wait for it, the same length. Also note
          | that even with normal SHA-1, the length is hashed into the
          | final hash already; that's part of the Merkle–Damgård
          | scheme. You can read about it on Wikipedia.
          | 
          | Reusing the precise collision from the shattered attack is
          | made impossible by initializing the state with _anything_
          | other than the prefix from the shattered attack. But the
          | cost of mounting such an attack yourself is only 11k USD.
          | However, as git uses the sha1collisiondetection library,
          | such an attack would be detected by current git. Thus, this
          | library is a much better protection than the length
          | encoding.
        
       | nnx wrote:
       | Surprising they didn't go with Blake3 instead since it has much
       | higher performance and Git's performance-oriented ethos.
        
         | nullc wrote:
         | > Git's performance-oriented ethos
         | 
          | Then SHA-256 will likely be preferable in the long run:
          | it's faster with SHA-NI than Blake3.
          | 
          | If you're not developing on a system with SHA-NI, get with
          | the program. Zen2 is freaking awesome. :)
        
           | wolfgke wrote:
           | > Zen2 is freeking awesome. :)
           | 
           | SHA-NI was introduced with the Intel Goldmont
           | microarchitecture.
        
             | nullc wrote:
             | Yes, but Goldmont is not particularly awesome. :)
             | Presumably goldmont would be a downgrade for many people.
             | 
             | (On AMD the first generation zen have sha-ni, FWIW)
        
               | wolfgke wrote:
               | Of course, processors that use one of the
               | Atom/Celeron/Pentium microarchitectures are not the best
               | choice if you desire maximum speed, but otherwise they
               | are surprisingly interesting processors (IMHO much more
               | interesting than what Intel delivers with the Core
               | series).
               | 
               | At this time, Intel often experiments with or introduces
               | features that are particularly interesting for embedded
               | usages first on the Atom. For example the already
               | mentioned SHA-NI. Another example are the MOVBE
               | instructions (insanely useful if you handle big-endian
                | data, for example in network packets (I am aware that on
               | older x86 processors, there exists the BSWAP
               | instruction)) - they were first introduced with Atom.
        
           | majewsky wrote:
           | Great! I can't wait to have to throw away perfectly fine
           | systems because of a new Git version. /s
        
         | pjc50 wrote:
         | I'm waiting for Blake7.
        
           | xxs wrote:
           | fond memories indeed....
        
         | curben wrote:
         | The decision was made before the release of Blake3. The article
         | did mention the algorithm is no longer hardcoded (hence the
         | ability to support both SHA1 & SHA256). This means it's
         | possible to transition to Blake3 (or any other) in future,
         | though it won't be trivial.
        
         | simias wrote:
         | Is a significant part of git's typical profile spent computing
         | hashes? I'm genuinely asking because I don't know the answer.
         | I'd expect all the diffing and (potentially fuzzy) merging to
         | be significantly more expensive operations, at least as far as
         | big-O is concerned.
        
           | hannob wrote:
           | > Is a significant part of git's typical profile spent
           | computing hashes?
           | 
           | No. Hashes are really cheap.
           | 
            | This annoys me a bit, because every discussion about
            | hashing goes into endless bikeshedding over which hash
            | function to use. The simple truth is: SHA2, SHA3, and
            | Blake2/3 are all good enough, from both a security and a
            | performance perspective, for almost any use case, and
            | their advantages and disadvantages are so minor that it
            | really doesn't matter.
        
             | tialaramex wrote:
              | Length extension is an unnecessary problem in MD
              | constructions. It makes sense to get rid of the
              | problem. So if you are building a new thing today
              | there's some sense in not picking SHA-256, in order
              | that you won't later hit your head on a length
              | extension attack. SHA-512/256 (that's not a choice
              | between two hashes, it's the name of a single hash in
              | the SHA2 family) is a reasonable choice, though. And of
              | course if Git were vulnerable to length extension
              | somehow, they'd have been in trouble years ago, so for
              | them, why not SHA-256.
        
         | bjoli wrote:
         | There are organisations that can only use approved crypto for
         | various certifications and government contracts. It would be
         | bad to drive such users away from git.
         | 
         | Under "feedback from git people" on https://www.mercurial-
         | scm.org/wiki/SHA1TransitionPlan
        
         | Ayesh wrote:
          | Linux also has an ethos of choosing boring technology. SHA2
          | has been around for so long and is battle-tested. For the
          | majority of us, it is the natural choice. I'm not implying
          | anything negative about SHA3/Blake/Keccak.
        
         | dchest wrote:
          | BLAKE3 was released a month ago! The decision on the new
          | hash for git was made about two years ago. BLAKE2 was
          | considered, though.
         | 
         | SHA-256 is fine. The biggest problem is switching to it...
        
         | fanf2 wrote:
         | They made the choice years ago and blake3 was announced last
         | month.
        
       | velox_io wrote:
       | Unless I'm missing something, why not just allow repositories to
       | be upgraded to SHA2 hashes? The only problem is ensuring
       | everyone's tooling supports it.
        
         | majewsky wrote:
         | This question is exactly what a major portion of the article
         | covers.
        
           | velox_io wrote:
            | It isn't the easiest article to read, plus they
            | overcomplicate things by talking about things such as
            | truncating SHA2 hashes.
            | 
            | I don't see why changing the hashing algorithm is so
            | problematic, hence the reason why I asked the question.
            | Converting a repository to SHA2 should be straightforward
            | (the only issue is everyone's tooling); you could also
            | run the repositories side-by-side. I'm genuinely
            | interested, as I think Git & BitTorrent are quite elegant
            | solutions to complex problems.
        
             | majewsky wrote:
             | > the only issue is everyone's tooling
             | 
             | Exactly! If you've ever worked in a corporate environment,
             | you know the fun of having to support 10-year-old versions
             | of your favorite cutting-edge software.
        
         | londons_explore wrote:
         | I don't think it's that unreasonable to release git binaries
         | today with sha256 support, then wait 5 years, then make all new
         | commits use sha256.
         | 
         | Anyone who tries to use a git client more than 5 years old
         | wouldn't be able to pull+push to a new repo. Sounds reasonable
         | to me. Git clients more than a few years old are pretty broken
         | already due to TLS changes.
         | 
         | Keeping around a dual hash system forever sounds like baggage
         | and complexity that outweighs the benefits.
        
       | k5hp wrote:
       | Just in case: http://archive.is/omsjJ
        
       | ericfrederich wrote:
       | I'll have to update my program which generates vanity hashes. I
       | do enjoy starting projects with an obligatory "Initial Commit"
       | with a deadbeef SHA-1
        
         | bmn__ wrote:
         | I like to start a repo with an "empty" commit, that is to say
         | its tree is the magic 4b825dc.
         | https://news.ycombinator.com/item?id=18342763
         | 
         | I wonder if it would still be practically possible to
         | manipulate the commit id.
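
The magic 4b825dc tree mentioned above falls out of git's object-hashing scheme: an empty tree is just the type-and-size header with a zero-length body. A quick check:

```python
import hashlib

# A git tree object with no entries: "tree 0\0" and nothing else.
empty_tree_id = hashlib.sha1(b"tree 0\x00").hexdigest()
print(empty_tree_id)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```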
        
           | wyoung2 wrote:
           | Wow! I wouldn't have guessed that Git had that vulnerability.
           | Fossil solves it easily: creating a new repo involves
           | generating a random project code (a nonce) which goes into
           | the hash of the first commit, so that even two identical
           | commit sequences won't produce identical blockchains.
           | 
           | Fossil lets you force the project ID on creating the repo,
           | but the capability only exists for special purposes.
        
           | loeg wrote:
           | Yep. You can inject arbitrary metadata into the git commit
           | object and the git cli ignores it, other than including it in
           | the hash. E.g., https://github.com/kevinwallace/gitbrute ,
           | https://github.com/kevinwallace/gitbrute/commit/0001111 .
        
       | PaulHoule wrote:
        | I am not a fan of SHA-256; you are better off with SHA-384
        | or SHA-512/256, which resist length-extension attacks and
        | are actually a little faster on 64-bit machines.
        
       | GlitchMr wrote:
       | I wonder if it would make sense to use `concat(sha1, sha256)`
       | hash algorithm. This wouldn't change the prefixes while improving
       | strength of an algorithm (by including SHA256 in a hash).
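
The proposal, as a sketch (an illustration of the idea only, not anything git implements):

```python
import hashlib

def concat_id(data: bytes) -> str:
    # Hypothetical object id: the familiar 40-char SHA-1 hex digest
    # followed by the 64-char SHA-256 digest, 104 hex chars total.
    return (hashlib.sha1(data).hexdigest()
            + hashlib.sha256(data).hexdigest())

oid = concat_id(b"some object payload")
print(oid[:40])   # the old-style SHA-1 prefix is preserved
print(len(oid))   # 104
```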
        
         | ascar wrote:
         | I'm probably missing something, but isn't it simpler to just
         | make both available separately and allow users to still
         | reference by sha1, if they want to, while sha256 can be used
         | for collision detection by git operations internally?
        
           | patrec wrote:
           | Correct, and I think this is what they are doing -- you can
           | optionally keep the sha1s around.
        
         | rocqua wrote:
         | There is a downside that this would mean commit-prefixes remain
         | sensitive to collisions. Hence anyone checking out a commit by
         | a hash-prefix would still be vulnerable.
         | 
         | Not a dealbreaker by far, but still a slight mark against this
         | solution.
        
           | u801e wrote:
              | Does git have code to detect whether a hash prefix is
              | ambiguous? I know that if you use a short prefix (which
              | is more likely to be shared by multiple objects), git
              | will output an error message stating that the object
              | reference is ambiguous, IIRC.
        
             | loeg wrote:
             | Yes.
        
           | IshKebab wrote:
           | I don't see how it would change anything. A collision of a
           | short prefix is trivial to generate with any hash.
        
         | timvisee wrote:
          | Very interesting idea. But wouldn't existing hashes be kept
          | intact anyway?
        
         | patrec wrote:
          | I suppose you are advocating two distinct Merkle trees?
          | Because otherwise the prefixes will change anyway.
         | 
         | But the only reason this would be attractive is because then
         | people could keep using the existing prefixes to refer to the
         | whole commit. But of course doing this would be insecure. So
         | for this to make any sense at all, people would need to make
         | good choices on when to use an insecure prefix and when to use
         | the whole hash, because it's security relevant. This seems a
         | bit doubtful to me.
        
           | GlitchMr wrote:
            | To be fair, the prefix problem would exist no matter what
            | hash function you picked. GitHub displays 7 characters of
            | a hash, giving 28 bits. You could generate collisions
            | with a birthday attack in pretty much no time. Prefixes
            | are always going to be insecure because they are so
            | short.
           | 
           | In fact, https://github.com/bradfitz/gitbrute exists.
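
The point is cheap to demonstrate: a birthday search over a 28-bit prefix needs only on the order of 2^14 hashes. A minimal sketch:

```python
import hashlib
from itertools import count

def find_prefix_collision(nhex: int = 7):
    # Birthday attack on the first nhex hex chars (4 bits each).
    # For 7 chars (28 bits) this takes roughly 2**14 attempts.
    seen = {}
    for i in count():
        msg = b"message %d" % i
        prefix = hashlib.sha1(msg).hexdigest()[:nhex]
        if prefix in seen:
            return seen[prefix], msg, prefix
        seen[prefix] = msg

a, b, p = find_prefix_collision()
print(a, b, p)  # two distinct inputs sharing a 7-char SHA-1 prefix
```

This only attacks the displayed abbreviation, of course; the full 160-bit hash is untouched, which is why the concatenation idea above leaves git's internal integrity to the SHA-256 half.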
        
             | patrec wrote:
             | Correct, but backwards compatibility does make a difference
             | here, as in: there are surely quite a few cases where it
             | would not be attractive to use a shortened hash if git
             | hashes are changed incompatibly anyway, but where it will
             | be attractive to use the shortened hash, because that keeps
             | an existing setup working as before.
             | 
             | Also: the prefixing increases the length of the hash (and
             | hence the desire to shorten it) without adding any
             | security.
        
               | GlitchMr wrote:
                | Yeah, kinda agreeing here. The hash length will need
                | to be increased anyway, but a concatenation of SHA-1
                | and SHA-256 would be 104 hex characters in total when
                | displayed (40 + 64), which is a lot.
                | 
                | It may be better to display SHA-256 commit hashes,
                | but accept SHA-1 hash prefixes for old commits. It
                | may be confusing for git to accept hashes that aren't
                | visible in `git log`, but it's probably for the
                | better.
        
         | dchest wrote:
         | Something to remember about the security of concatenated
         | hashes: https://crypto.stackexchange.com/a/63543/291
        
           | bangboombang wrote:
           | This is pretty interesting and shows you shouldn't try to
           | pull any sort of stunts if you're not a crypto expert. I've
           | actually wondered before whether md5 + sha1 would result in
           | something stronger than those two used individually. Now I
           | know.
        
             | bawolff wrote:
              | The linked article doesn't contradict the original
              | post. The linked article says the strength of two hash
              | algos (of this type) combined is only as strong as the
              | stronger of the two, not the sum of their strengths.
              | But the original poster only needed the combined hash
              | to be as strong as the SHA-256 for his/her purpose.
              | 
              | Notwithstanding, I still don't like it as an idea.
        
             | GlitchMr wrote:
             | By the way, this may be rather obvious, but concatenating
             | hash algorithms is a terrible idea for passwords. A
             | password cracker could easily pick the less secure
             | algorithm to crack, and ignore the other hash.
             | 
             | Note that git doesn't concern itself with reversing a hash
             | function. The commit contents are part of a repository,
             | there is no value in guessing the commit contents basing on
             | its hash. Here, the hash function choice is purely about
             | collision resistance.
             | 
             | But yeah, don't do weird things with hashes. Cryptography
             | is hard. Don't invent memecrypto:
             | https://twitter.com/sciresm/status/912082817412063233, it's
             | not going to increase the security. Use a single algorithm
             | if you can. Don't transform the output of a hash function
             | in any way.
        
           | GlitchMr wrote:
            | I'm well aware concatenation wouldn't necessarily improve
            | the strength. However, the idea is that even if SHA-1
            | were hopelessly broken, CONCAT(SHA1(x), SHA256(x)) would
            | be at least as strong as SHA-256 (where "at least" means
            | it may have the same strength).
        
             | Double_a_92 wrote:
                | If you know that it's a concatenation, couldn't you
                | _only_ look at the SHA1 part and _completely bypass_
                | any other strong hash? On second thought, probably
                | not, because you might find _any_ possible collision
                | that isn't a collision on all the other hash
                | algorithms. If you bruteforce through a password list
                | it would still apply, though.
        
               | GlitchMr wrote:
               | This doesn't work for collision resistance attacks. git
               | commits aren't password hashes. Specifically, the
               | attacker's goal in this case is to find different values
               | a and b for which hash(a) = hash(b), rather than finding
               | a value of m in h = hash(m) for known h.
        
               | pwagland wrote:
               | This would help if you _only_ shared the prefix, however
               | git would still use the full hash.
               | 
               | The proposed method would have the advantage of keeping
               | existing known abbreviations, which are _already_ less
               | secure than SHA-1, while keeping the security of the
               | second hash.
               | 
               | It also has the disadvantage that the full hash would
               | become excessively large and unwieldy, so pros and cons.
        
               | simias wrote:
               | Things like signed commits would still use the full hash,
               | so that would make tampering with that impossible.
               | 
               | This solution would basically just make the UI backward-
               | compatible while still requiring the complete
               | modification of the internal to change the hash function.
               | 
               | You'd still risk a collision if you refer to commits
               | using a shortened hash outside of git but something tells
               | me that you don't even need a vulnerability to take
               | advantage of that if you have an attack vector. For
                | instance, GitHub seems to use 7 hex digits in short hashes;
               | this could probably be bruteforced relatively easily (be
               | it for SHA-1 or SHA-256). To give you an idea I looked at
               | the current bitcoin difficulty (which AFAIK uses two
               | rounds of SHA-256 internally and works by bruteforcing
                | hashes with a certain number of leading zeroes) and the
               | hashes look like this: 000000000000000000028048b31e42bd53
               | d3b36da90d1a840ae695ec1a5ee738
        
       | donatj wrote:
        | Excuse my ignorance, but couldn't they just add a SHA256
        | hash to commit objects (or some new commit-verify object) of
        | the entire tree's current concatenated content, leave
        | everything else SHA1, and get the same benefit without
        | rewriting the entire thing from the ground up? Git could
        | even do that as part of the git gc step slowly over time -
        | tag commits with a secondary hash.
       | 
       | Rewriting the whole thing including every git repos history seems
       | like throwing the baby out with the bathwater, when you could
       | just add a secondary transparent verification instead. Just seems
       | like there has to be a better way.
        
         | kzrdude wrote:
         | Hashing everything in one go doesn't scale well. When making a
         | new commit you want to only hash a proportional part of the
         | repository, and the tree structure of git allows that, only the
         | files and "tree" objects (directory listings) that change are
         | hashed again.
        
         | wongarsu wrote:
         | You can't change past commits to add that hash (without
         | changing all commit hashes), so this method could only protect
         | new commits. For any existing repo this would lead to a very
         | weird security model: We admit that sha1 hashes are broken, and
         | only guarantee that commits made by git versions newer than git
         | x.x.x are safe from after-the-fact modification (or
         | alternatively only commits made after date X).
        
           | jayd16 wrote:
           | What if we use the exploit to add the new data but keep the
           | sha1 the same? :)
        
           | gregmac wrote:
           | My inclination is that protecting only new commits might be
           | enough, but it gets me thinking: What would a practical
           | attack on this look like, assuming sha1 was broken? Let's say
           | I'm trying to insert a line of code that does something
           | nefarious, and that it's now trivial to generate "magic text"
           | you can stick anywhere in a file (eg, inside a comment at the
           | end of a line) to get any desired sha1 hash.
           | 
           | Are all the other future commits still valid, or am I going
           | to suddenly get conflicts or garbled text? Depending on where
           | the modification is done, that code might have gone through
           | much more churn -- especially if there are a bunch of sha-256
           | commits after it (which I can't attack). I don't know enough
           | about how git stores content blobs to answer this.
           | 
           | Second problem: Can I push my replacement commit to another
           | repository (eg, github)? Would even force push work? Do I
           | have to delete branches and re-push my own? If I already have
           | enough permission on the repository to do this, it means I
           | can already push whatever I want -- so does this attack _even
           | matter at all_?
           | 
           | Assuming that's successful (or I can trick people into using
           | my own repository), what will happen to someone that already
           | has a clone and does a pull? Will they get my change (and
           | will it work or be a pile of conflicts or garbled text)?
           | 
           | Even if only fresh clones will get the changes it could still
           | be quite devastating -- especially if using CI -- but I'm
           | just not clear if this attack is even theoretically possible.
        
             | masklinn wrote:
             | > Are all the other future commits still valid, or am I
             | going to suddenly get conflicts or garbled text? Depending
             | on where the modification is done, that code might have
             | gone through much more churn -- especially if there are a
             | bunch of sha-256 commits after it (which I can't attack). I
             | don't know enough about how git stores content blobs to
             | answer this.
             | 
             | A blob is a "snapshot" of a file. The next version of a
             | file is a completely different blob with no direct relation
             | to the previous.
             | 
             | "Pack files" use delta compression in order to lower the
             | actual size of "similar" blobs.
             | 
             | You _could_ get conflicts if you tried merging or rebasing
             | over the nefarious blob, and the  "patch history" (git log
             | -p, which builds the patch view on the fly) would show
             | possibly unexpected complete file replacements.
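To make the "snapshot with no relation to the previous version" point concrete: a blob id is just a hash over a short header plus the complete file contents. A minimal Python sketch of that scheme (illustrative only, not git's actual implementation code):

```python
import hashlib

def blob_id(data: bytes) -> str:
    # Git hashes "blob <size>\0" followed by the full file
    # contents, so each version of a file gets an unrelated id.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# The well-known id of the empty blob:
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Changing a single byte changes the id completely; deltas between similar blobs exist only inside pack files, as a storage optimization.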
        
             | OJFord wrote:
             | > My inclination is that protecting only new commits might
             | be enough
             | 
             | Why? It's not the same as saying 'versions after vX are
             | safe', it's the same as saying 'any unsafety after vX was
             | there before, not introduced since' (both with 'as a result
             | of SHA-1 collision' qualifiers of course).
             | 
             | > Can I push my replacement commit to another repository
             | (eg, github)? Would even force push work?
             | 
             | Implementation dependent I suppose, but I wouldn't have
             | thought so - I don't see why they'd actually check the
             | content when the hash is supposed to indicate whether it
             | differs or not.
             | 
             | > Do I have to delete branches and re-push my own? If I
             | already have enough permission on the repository to do
             | this, it means I can already push whatever I want -- so
             | _does this attack even matter at all_?
             | 
              | I think an attack would look more like:
              | 
              |   1. Create hostile commit that collides with extant
              |      commit SHA
              |   2. Infiltrate a package repository, or GitHub, or
              |      corporate network, or ...
              |   3. Insert hostile commit in place of real one
             | 
             | Of course it's a problem if 2 & 3 happen alone anyway, but
             | the problem with the collision commit is that it makes it
             | so much less detectable.
        
               | Nullabillity wrote:
               | Git commits are snapshots, not diffs. Each commit
               | contains a tree, which contains a list of files and their
               | respective hashes. As long as its whole tree is SHA-256
               | then a commit should be safe, regardless of its history.
               | 
               | The downside to the migration would be that all unchanged
               | files would be stored twice (once identified by SHA1,
               | once identified by SHA-256). But you could work around
               | that by hardlinking identical files.
        
               | loeg wrote:
               | This doesn't protect subdirectories unless you rewrite
               | the entire tree structure with SHA256. I don't know if
               | Git does that now, or not. Git generally points to
               | unmodified subdirectories with the existing content hash;
               | if the SHA1 is pointed to by SHA256, which is implied by
                | the transition plan proposed in the great-grandparent
               | comment, then those subdirectories are essentially
               | unprotected.
        
           | tzs wrote:
            | Couldn't they make a table that lists all the old
            | objects by SHA-1 hash and, for each one, gives the new
            | SHA-256 hash of that object, and then commit this table
            | to the repository?
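For illustration, such a translation table is straightforward to build if you can enumerate the raw object bytes; a toy Python sketch (the enumeration step is assumed, not shown):

```python
import hashlib

def translation_table(raw_objects):
    # Map each object's SHA-1 name to its SHA-256 name.
    # `raw_objects` is an iterable of raw object byte strings
    # (header + payload), which is what git actually hashes.
    return {
        hashlib.sha1(raw).hexdigest(): hashlib.sha256(raw).hexdigest()
        for raw in raw_objects
    }
```

Caveat: this toy version hashes the same bytes under both algorithms, which only works for blobs. Tree and commit objects embed the hashes of other objects, so a real conversion has to rewrite those references before computing the SHA-256 side.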
        
             | loeg wrote:
             | Yep.
        
           | arve0 wrote:
            | I'm not familiar with the internal data structure of
            | git, but couldn't you add the new hash as a commit in a
            | new format "on the side", leaving the original commit as
            | is?
        
             | WorldMaker wrote:
              | Git does have a commit-related object called a note that
             | you can attach as a separate object. [1]
             | 
             | Presumably the proposed "hash translation store" could use
             | an approach similar to notes, and include the hash
             | translations as objects in the git database (hopefully in a
             | way that could be signed by a tag).
             | 
             | [1] http://alblue.bandlem.com/2011/11/git-tip-of-week-git-
             | notes....
        
             | loeg wrote:
             | This is kind of what rewriting the repo is. Yes, you could
             | leave the SHA1 commit tree around afterwards (i.e., for
             | convenience of existing URLs), but you wouldn't want to
             | keep SHA1 around as the authoritative hashname.
        
           | [deleted]
        
       | cm2187 wrote:
        | Stating the obvious, but the hash is hex, which leaves lots
        | of characters free for a one-character prefix on SHA-256
        | hashes. Like the character "s", for instance.
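A toy sketch of that idea in Python (the "s" prefix is just the commenter's example, not anything git actually adopted):

```python
def classify_oid(oid: str):
    # SHA-1 ids are 40 hex characters, so any non-hex lead
    # character can unambiguously mark a different algorithm.
    if oid.startswith("s"):
        return "sha-256", oid[1:]
    return "sha-1", oid

print(classify_oid("s" + "ab" * 32))
print(classify_oid("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"))
```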
        
       | speedgoose wrote:
       | >...a simple command like:
       | 
       | > git convert-repo --to-hash=sha-256 --frobnicate-blobs --climb-
       | subtrees --liability-waiver=none --use-shovels --carbon-offsets
       | 
        | Is it sarcasm?
        
         | andrewflnr wrote:
         | Yes.
        
       | pkilgore wrote:
        | I love LWN's technical writing -- it's worth the cost of a
       | subscription!
        
       | sunil_saini wrote:
        | The above article suggests that a SHA-1 collision is
        | infeasible because the attacker has to come up with code
        | that not only generates the same hash but also benefits him.
        | But couldn't he just add some malicious code and then some
        | random text in comments to produce the same hash?
        
         | pornel wrote:
         | "produce same (specific) hash" is a pre-image attack, which is
         | very very hard. So hard, that even MD5 isn't broken for pre-
         | image, and there's only a theoretical pre-image attack against
         | MD4.
         | 
          | We only know of collision attacks, which are "produce 2
          | files with the same hash, but you can't control what
          | hash". So you can't target any existing repo. You need to
          | use social engineering to get one of your special files
          | into a repo.
        
       | zackmorris wrote:
       | Summary of hashing function security in bits, for convenience:
       | 
       | https://en.wikipedia.org/wiki/Secure_Hash_Algorithms
       | 
       | Since collision resistance is roughly half the number of bits, it
       | seems unconscionable to me that anything below 256 bit hashes
       | even exist, because 64 bits is crackable but 128 bits effectively
       | never will be. This was well-understood even in the 90s when MD5
       | and SHA were first published.
       | 
       | Just thinking about this for the first time, I don't buy any
       | argument about storage or performance, since those become less
       | important as time goes on. It feels like Linus made a mistake
       | here, and offloaded the inevitable work of upgrading repositories
       | onto the general public (socialized the cost) which is something
       | that all programmers should work harder to avoid.
       | 
       | Said as an armchair warrior who has never accomplished anything
       | of any importance, I realize.
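The "collision resistance is roughly half the number of bits" point above is the birthday bound, and it is easy to demonstrate empirically on a weakened hash; a small Python sketch, truncating SHA-256 to 16 bits as a stand-in:

```python
import hashlib

def toy_hash(data: bytes, bits: int = 16) -> int:
    # SHA-256 truncated to `bits` bits: a stand-in n-bit hash.
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - bits)

def inputs_until_collision(bits: int = 16) -> int:
    # Feed distinct inputs until two share a hash value. The
    # birthday bound predicts on the order of 2^(bits/2) tries,
    # i.e. a few hundred for 16 bits -- not the full 2^16.
    seen = set()
    i = 0
    while True:
        h = toy_hash(i.to_bytes(8, "big"), bits)
        if h in seen:
            return i
        seen.add(h)
        i += 1

print(inputs_until_collision(16))
```

Scaling the same square-root behavior up is why an n-bit hash offers only about n/2 bits of collision security, while preimage resistance stays near n bits.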
        
         | mathnmusic wrote:
         | Also relevant: Multihash is a format for self-describing hashes
         | that helps with data portability and future-proofing:
         | https://github.com/multiformats/multihash
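As a concrete illustration of the format: a multihash prefixes the digest with a varint hash-function code and a varint digest length. For sha2-256 the code is 0x12 and the length is 32 (0x20), both fitting in a single byte, so every sha2-256 multihash starts with 0x1220. A minimal Python sketch:

```python
import hashlib

def multihash_sha256(data: bytes) -> bytes:
    # <hash fn code><digest length><digest> -- self-describing,
    # so a reader knows which algorithm produced the digest.
    digest = hashlib.sha256(data).digest()
    return bytes([0x12, len(digest)]) + digest

mh = multihash_sha256(b"multihash")
print(mh.hex()[:4])  # 1220
print(len(mh))       # 34
```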
        
       | alkonaut wrote:
        | I didn't get the argument against just converting. Sure, some code
       | bases are large and spread out, but any git repo needs to have
       | one blessed central point, and everyone needs to be able to just
       | re-clone from the central repository whenever history is
       | rewritten for whatever reason (could be that a huge file is
       | trimmed from the past etc). Why can't all commits in the Kernel
       | history be rewritten to SHA256? (Other than that it would be an
       | annoying interruption in the development)?
        
         | corbet wrote:
         | The kernel doesn't really have the one central blessed point of
         | which you speak. Sure you can grab mainline releases from
         | Linus's repository, but that's not where the development
         | actually happens. It really is a distributed project, and
         | having to delete all those old repositories would really hurt.
        
           | alkonaut wrote:
            | If 2 separate copies of the same repository do the same
            | rewrite to SHA-256, their histories are still compatible
            | and equal up to the point where they diverge. So other
            | than the rewrite needing to happen in more places, it
            | should still be doable. It needs to happen at more or
            | less the same time, however.
        
         | velcrovan wrote:
         | The whole point of git is that there doesn't need to be a
         | blessed central point.
        
           | RichardCA wrote:
           | Most development shops are using the traditional client-
           | server model, or self-host using Gitlab.
           | 
           | I personally would never allow a repo with two hashing
           | algorithms to exist on my watch.
           | 
           | If you have ever had to use a tool like BFG to prune large
           | objects from a repo you'll see it's not that bad, but it does
           | require users to re-clone.
           | 
           | I would want to use the same process for SHA256 - that is let
           | it be the default for new projects and then convert older
           | projects based on need.
           | 
           | But there needs to be a BFG style conversion tool that spits
           | out an object id map as output.
           | 
           | Here's more info on BFG: https://rtyley.github.io/bfg-repo-
           | cleaner/
        
       | jakeogh wrote:
       | Is there an archive of crypto related future predictions?
       | 
       | How long until a specified length preimage attack can break
       | bittorrent blocks?
       | 
          | I remember a paper published about a decade ago estimating
          | very short (well-funded) ASIC SHA-1 collision times.
          | Anyone have that ref?
       | 
       | EDIT: Should I have not said preimage? My understanding is
       | bittorrent is broken (by DDoS, not infohash(?)) if you can make a
       | bad block that matches the length and sha1 of a target block.
        
         | glandium wrote:
         | > How long until a specified length preimage attack can break
         | bittorrent blocks?
         | 
         | Even MD5 still doesn't have a known preimage attack, so... many
         | many years?
        
           | tialaramex wrote:
           | To be fair for MD5 there is a known attack, it's just
           | impractical. It's a real attack though because the whole
           | point of a crypto hash is that you'd have to brute force it
           | to win, and the paper shows a slightly quicker way because
           | MD5 is broken. It's just not quick enough that you could
           | actually do it.
           | 
           | Oh wait, perhaps you actually meant preimage as you said
           | rather than I assumed second preimage. OK yes, that isn't
           | ever going to be possible for non-trivial inputs.
        
             | rocqua wrote:
             | I think OP meant 'viable' pre-image attack. Not just an
             | attack that is better than brute force.
        
             | GoblinSlayer wrote:
             | Cheap talk is hardly an attack though.
        
         | pabs3 wrote:
         | There is one for hashes:
         | 
         | http://valerieaurora.org/hash.html
        
           | strenholme wrote:
           | Not to mention this one, which covers more hash algorithms:
           | 
           | https://electriccoin.co/blog/lessons-from-the-history-of-
           | att...
        
         | hannob wrote:
          | Preimage is really a whole different beast than collision.
          | 
          | It's also not particularly surprising. Just by its length,
          | SHA-1 has at best 80 bits of collision security and 160
          | bits of preimage security.
          | 
          | Now it's important to understand that attacks usually
          | don't cause full devastation; they just do a bit better
          | than the optimal brute-force bound.
          | 
          | Attacks in the 60-bit range are what's feasible; attacks
          | in the 70-bit range are what's dangerous. It's easy to
          | imagine that a relatively small deviation from optimal
          | security gets SHA-1 from 80 bits into dangerous territory
          | (the actual attacks are in the low 60s range). However,
          | getting from 160 bits down to the 60/70-bit range would
          | require massive improvements in attacks.
          | 
          | It's safe to say that SHA-1 is still very far from
          | preimage attacks. Still, to be clear, I'd recommend
          | getting rid of it wherever you can. The far bigger risk is
          | that you think you only need preimage security, while you
          | actually need collision security for scenarios you haven't
          | thought about.
        
         | tialaramex wrote:
         | > EDIT: Should I have not said preimage? My understanding is
         | bittorrent is broken (by DDoS, not infohash(?)) if you can make
         | a bad block that matches the length and sha1 of a target block.
         | 
         | There are three different attacks
         | 
         | 1. Collision, which is practical (expensive but practical) for
         | SHA-1 today, lets somebody make two documents A and B which
         | have the same hash. This is only useful if you can fool people
         | somehow into accepting document B when they think it's document
         | A because of the hash, for example with digital signatures.
         | 
          | 2. Pre-image, which is not practical for any hashes you
          | care about, including MD5. This lets you find a document A
          | given the hash(A) value. This is very niche: for large
          | documents, by the pigeonhole principle there will be many
          | such pre-images and it's impossible to get the "right"
          | one; for small inputs it can sometimes be relevant.
         | 
         | 3. Second Pre-image, likewise not practical. Given either
         | document A or hash(A) which you could easily determine from
         | document A, this lets you produce a new document A' that is
         | different from A but hash(A') == hash(A). This would be
         | extremely bad, and is what you'd need to attack real world
         | Bittorrent from somebody else.
         | 
          | Often people say "pre-image" meaning strictly second
          | pre-image; it's usually clear from context, and a true
          | pre-image attack as I explained above is only rarely
          | relevant.
         | 
         | Collision would only let bad guys corrupt their own
         | purposefully constructed collision bittorrent, which like, why?
         | So yes, Bittorrent would only really be in serious trouble if
         | there was a second pre-image attack. But on the other hand,
         | don't use broken cryptographic primitives. Attacks only get
         | better, always.
        
           | jakeogh wrote:
           | Thanks. Is there a name for collision with the same preimage
           | size?
        
           | dfox wrote:
            | The reason why people mostly mean second pre-image when
            | saying unqualified "pre-image" is that probably any
            | imaginable method of reversing a hash function (given a
            | sufficiently long input to the hash) will, with
            | overwhelming probability, produce a hash input that is
            | different from the original.
        
           | SAI_Peregrinus wrote:
           | 1.5 Chosen-prefix collision: Given a prefix A, generate two
           | values AB and AC, where B and C differ but are both prefixed
           | with A. (AX is A concatenated with X). This exists for SHA1.
            | It's more powerful than a basic collision, where you can't
           | pick the prefix, but weaker than either type of pre-image.
        
             | wyoung2 wrote:
             | It's worth noting that this attack is a property of the
             | Merkle-Damgard hash construction, not of SHA-1
             | specifically, which means SHA-2 (Git's path forward) is
             | also vulnerable:
             | 
             | https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_c
             | o...
             | 
             | https://www.reddit.com/r/crypto/comments/44p5jc/eli5_why_ar
             | e...
             | 
             | Fossil uses SHA-3, which has an entirely different
             | construction, which is not at this time known to have a
             | similar weakness. SHA-3 is also much newer, with a much
             | shorter list of known attacks.
        
       ___________________________________________________________________
       (page generated 2020-02-04 23:00 UTC)