[HN Gopher] A new hash algorithm for Git ___________________________________________________________________ A new hash algorithm for Git Author : Tomte Score : 400 points Date : 2020-02-04 07:44 UTC (15 hours ago) (HTM) web link (lwn.net) (TXT) w3m dump (lwn.net) | angrygoat wrote: | This article is via an LWN subscriber link; a cheerful reminder | that LWN are good and they are worth subscribing to :) | https://lwn.net/subscribe/ | john-radio wrote: | OP's link says it is "subscription-only content," but it is | still publicly available. It says that it has been "made | available by an LWN subscriber." How does that work? | robjan wrote: | Subscribers receive a sharing link which can be used to share | articles with friends. They tolerate sharing on HN because it | brings in new customers. | globuous wrote: | Damn, lwn is sweeet !! :) | john-radio wrote: | That's one smart model. | cesarb wrote: | It's a "subscriber link": https://lwn.net/op/FAQ.lwn#slinks | brobdingnagians wrote: | I was interested in how fossil handled the SHA1 transition, and | found this nicely explained below: | | https://fossil-scm.org/home/doc/trunk/www/hashpolicy.wiki | velcrovan wrote: | Fossil's main author is chiming in on the discussion of this on | Fossil's forums: | | (https://fossil-scm.org/forum/forumpost/50a5bea5fb) | | > That's appalling. Fossil's implementation doesn't require a | conversion. | | "This is a key point, that I want to highlight. I'm sorry that | it wasn't made more clear in the LWN posting nor in the HN | discussion. | | "With Fossil, to begin using the new SHA3 hash algorithm, you | just upgrade your fossil binary. No further actions, workflow | changes, disruptions, or thought are required on the part of | the user. | | * "Old check-ins with SHA1 hashes continue to use their SHA1 | hash names." | | * "New check-ins automatically get more secure SHA3 hash | names."
| | * "No repository conversions need to occur" | | * "Given a hash prefix, Fossil automatically figures out | whether it is dealing with a SHA1 or a SHA3 hash" | | * "No human brain-cycles are wasted trying to navigate through | a hash-algorithm cut-over." | | "Contrast this to Git, where a repository must be either all- | SHA1 or all-SHA2. Hence, to cut-over a repository requires | rebuilding the repository and in the process renaming all | historical artifacts -- essentially rebasing the entire | repository. The historical artifact renaming means that | external links to historical check-ins (such as in tickets) are | broken. And during the transition period, users have to be | constantly aware of whether they are using SHA1 or SHA2 hash | names. It is a big mess. It is no wonder, then, that few people | have been eager to transition their repositories over to the | newer SHA2 format." | seniorsassycat wrote: | The way I read the Fossil author's comments, old commits | continue to use sha1 hashes. A repository will be vulnerable | to sha1 collision attacks as long as there is an object in | the repository that has not been hashed with the new | algorithm. | | For example, floppy.c could be replaced in a repo with a file | with the same sha1 hash as long as the last commit that | modifies floppy.c used a sha1 hash. | | Right? | wyoung2 wrote: | In addition to D. Richard Hipp's thoughts as HN user SQLite | -- author also of Fossil, so he oughtta know -- I offer | these: | | 1. Keep in mind that Fossil and Git are both applications | of blockchain technology, which in this particular | practical case means you must not only forge a single | artifact's hash, you must also do it in a way that allows | it to fit into the overall blockchain. | | 2. Fossil's sync protocol purposefully won't apply Dr. | Hipp's hypothetical evil.c to an existing Fossil blockchain | if presented with it. Fossil will say, "I've already got that | one, thanks," and move on.
Only new or outdated clones | could be so-fooled. | ajkjk wrote: | > applications of blockchain technology | | Are we saying this now? More like blockchain is an | application of git technology. | seniorsassycat wrote: | https://en.wikipedia.org/wiki/Merkle_tree | SQLite wrote: | "blockchain" is self-descriptive, easier to pronounce | (only two syllables instead of three), and easier to | spell correctly. :-) | _jal wrote: | No. We are not. | | If you're looking for prior art, ZFS's application of | Merkle trees predates both. I think there was some other | public use before that, but I can't recall right now. | Tyr42 wrote: | They are also using "Hardened SHA1", which detects | collision attacks, and assigns a longer id to commits which | seem malicious, while being backwards compatible. | SQLite wrote: | Just to be clear: Every time you modify a file, the new | changes get put in using SHA3. In an older repository, any | given commit might have some files identified using SHA1 | (assuming they have not changed in 3 years) and others | identified using SHA3. | | For example, the manifest of the latest SQLite check-in can be | seen at | (https://www.sqlite.org/src/artifact/29a969d6b1709b80). You | can see that most of the files have longer SHA3 hashes, but | some of the files that have not been touched in three years | still carry SHA1 hashes. | | An attack like what you describe is possible _if_ you could | generate an evil.c file that has the exact same SHA1 hash | as the older floppy.c file. Then you could substitute the | evil.c artifact in place of the floppy.c artifact, get some | unsuspecting victim to clone your modified repository, and | cause mischief that way. Note, however, that this is a pre- | image attack, which is rather more difficult to pull off | than the collision attacks against SHA1, and (to my | knowledge) has never been publicly demonstrated.
| Furthermore, the evil.c file with the same SHA1 hash would | need to be valid C code that does something evil while | still yielding the same hash (good luck with that!) and | Fossil (like Git) has also switched over to Hardened SHA1, | making the attack even harder still. | | As still more defense, Fossil also maintains an MD5 hash | against the entire content of the commit. So, in addition | to finding an evil.c that compiles, does your evil bidding, | and has the same hardened-SHA1 hash as floppy.c, you also have | to make sure that the entire commit has the same MD5 hash | after substituting the text of evil.c in place of floppy.c. | | So, no, it is not really practical to hack a Fossil | repository as you describe. | wyoung2 wrote: | > Furthermore, the evil.c file with the same SHA1 hash | would need to be valid C code that does something evil | while still yielding the same hash | | ...and also produce an innocent-looking diff! | | I mean, you could stuff a bunch of random bytes into a C | comment to force the desired hash in the output using | these documented attack techniques, but anyone inspecting | the diffs between versions is likely to see such an | explosion of noise and call foul. | | If you want an analogy, it's like someone saying they've | learned to impersonate federal agent identification | cards, only it requires that the person carrying the fake | ID have a thousand rainbow-dyed ducks on a leash in | tow behind him. | | Such attacks are fine when it's dumb software systems | doing the checks, but for a source code repository where | people do in fact visually check the diffs occasionally? | | Well, let's just say that when someone manages to use | SHAttered and/or SHAmbles type attacks on Git (or even | Fossil) I expect that it won't take a genius detective to | see that the repo's been attacked.
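The mixed SHA1/SHA3 naming described in the comments above works because the two name lengths cannot collide: a full SHA-1 name is 40 hex digits, a full SHA3-256 name is 64. A minimal Python sketch of that length-based disambiguation (hashing a raw blob for illustration, not Fossil's actual artifact format or its C implementation):

```python
import hashlib

def classify_full_name(name: str) -> str:
    """Tell a full SHA-1 artifact name from a SHA3-256 one by length alone."""
    if len(name) == 40:
        return "sha1"
    if len(name) == 64:
        return "sha3"
    raise ValueError("not a full artifact name")

blob = b"int main(void) { return 0; }\n"
sha1_name = hashlib.sha1(blob).hexdigest()      # 40 hex digits
sha3_name = hashlib.sha3_256(blob).hexdigest()  # 64 hex digits

assert classify_full_name(sha1_name) == "sha1"
assert classify_full_name(sha3_name) == "sha3"
```

For abbreviated prefixes the length trick no longer works, which is presumably why a tool holding both kinds of artifacts has to look a short prefix up in both namespaces.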
| seniorsassycat wrote: | That's an argument for why you shouldn't worry about sha1 | attacks in source control, but we should take the attack | for granted when discussing how to mitigate the attack. | | If we weren't worried about sha1 collisions in git then | we wouldn't switch to a new hash function. | wyoung2 wrote: | When _is_ the right time to worry? Maybe wait until | someone publishes a practical attack, then wait years for | the new code to get sufficiently far out into the world | that you can switch to it? | | I mean, I see you're expressing concern, but the first | major red flag on this went up three years ago, and | another big one went up last month. (https://sha- | mbles.github.io/) | | When we dealt with this same problem over in Fossil land, | we ended up needing to wait most of three years for | Debian to finally ship a new enough binary that we could | switch the default to SHA-3. Fortunately (?) RHEL doesn't | ship Fossil, else we'd likely have had to wait even | longer. | | Atop that same problem, Git's also got tremendously more | inertia. Git has to wait out not only the Debian and RHEL | stable package policies but also all of that | infrastructure tooling they brag on. Every random | programmer's editor, merge tool, Git front end... all of | that which a project depends on will have to convert over | before that one project can move to a post-SHA-1 future. | | This is going to be a colossal mess. | tjoff wrote: | Many diff tools don't highlight whitespace-only changes. | Or at least not in a clear manner. | | Also, if something is replaced in the history how often | do people go back and view diffs in old code? Hardly | often enough to rely on it being spotted. | wyoung2 wrote: | It only takes one person to raise the flag. 
| | Sure, many thousands of people doing blind "git clone && | configure && sudo make install" could be burned by a | problem like this, but _someone_ would eventually do a | diff and see the problem on any project big enough to | have those thousands of trusting users in the first | place. | | I'm not excusing these SHA-1 weaknesses, only pointing | out that it won't be trivial to apply them to program | source code repos no matter how cheap the attacks get. | | For instance, the demonstration case for SHAttered was a | pair of PDFs: humans can't reasonably inspect those to | find whatever noise had to be stuffed into them to | achieve the result. | | I also understand that these SHA-1 weaknesses have been | used to attack X.509 certificates, but there again you | have a case very unlike a software code repo, where the | one doing the checking isn't another programmer but a | program. | remram wrote: | The problem is that we are considering an issue where | different people can get different objects for the same | hash. If the people checking all see the valid files, | they cannot raise any alarms to save the poor victims who | got poisoned with the wrong objects. They'll clone from | the wrong fork, and no amount of checking hashes or | signed tags will prevent them from running compromised | code. | wyoung2 wrote: | > If the people checking all see the valid files | | ...which will likely contain thousands of bytes of | pseudorandom data in order to force the hash collision... | | > they cannot raise any alarms | | You think a human won't be able to notice that the diff | from the last version they tested looks awfully funny? | Code that can fool the compiler into producing an evil | binary is one thing, but code that can pass a human code | review is quite another. | | You might be surprised how often that occurs. 
| | I don't do a diff before each third-party DVCS repo pull, | but I do diff the code when integrating such third-party | code into my projects, if only so I understand what | they've done since the last time I updated. Commit | messages, ChangeLogs, and release announcements only get | you so far. | | Back when I was producing binary packages for a popular | software distribution, I'd often be forced to diff the | code when producing new binaries, since several of the | popular binary package distribution systems are based on | patches atop pristine upstream source packages. (RPM, | DEB, Cygwin packages...) | | Each time a binary package creator updates, there's a | good chance they've had to diff the versions to work out | how to apply their old distro-specific patches atop the | new codebase. | | _Someone's_ going to notice the first time this | happens, and my guess is that it'll happen rather | quickly. | seniorsassycat wrote: | Isn't this the same attack given as an example of why git is | migrating hash functions in the subject article? | | The attack may be difficult and unlikely, I'm not | questioning that, but if I understand correctly then | Fossil's migration is straightforward because they did | not address the same issues Git chose to. | SQLite wrote: | > if I understand correctly then Fossil's migration is | straightforward because they did not address the same | issues Git chose to. | | I think more is at play here. | | (1) You can set Fossil to ignore all SHA1 artifacts using | the "shun-sha1" hash policy. | | (2) The excess complication in the Git migration strategy | is likely due to the inability of the underlying Git file | formats to handle two different hash algorithms in the | same repository at the same time. | | But, I could be wrong. Post a rebuttal if you have | evidence to the contrary.
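The "shun-sha1" setting mentioned in (1) is one of several hash policies Fossil documents in hashpolicy.wiki (sha1, auto, sha3, sha3-only, shun-sha1). Roughly, a policy selects which algorithm names new artifacts and whether existing SHA-1 artifacts are still honored. A toy Python sketch of that idea; the table is my paraphrase of the documented policies, not Fossil's code, and the exact accept/reject semantics are simplified:

```python
import hashlib

# My paraphrase of Fossil's documented hash policies (hashpolicy.wiki):
# (algorithm used for NEW artifacts, still honor existing SHA-1 artifacts?)
POLICIES = {
    "sha1":      ("sha1", True),
    "auto":      ("sha3", True),   # flips to sha3 once any sha3 artifact is seen
    "sha3":      ("sha3", True),
    "sha3-only": ("sha3", True),   # never creates new sha1 names
    "shun-sha1": ("sha3", False),  # additionally ignores existing sha1 artifacts
}

def name_new_artifact(content: bytes, policy: str) -> str:
    """Hash new content with whichever algorithm the policy dictates."""
    algo, _honor_sha1 = POLICIES[policy]
    h = hashlib.sha1 if algo == "sha1" else hashlib.sha3_256
    return h(content).hexdigest()

def honors_sha1(policy: str) -> bool:
    """Whether existing SHA-1 artifacts are still recognized under this policy."""
    return POLICIES[policy][1]
```

Under this model, point (2) above is the contrast: the policy only affects naming, so old and new names coexist in one repository, which is what Git's current file formats reportedly cannot do.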
| apeace wrote: | And if you are that concerned about this type of attack, | it may be worth your time to simply start a new Fossil | repository using the sha3-only hash policy (writing a | script to replay commits into the new repo, so you don't | lose history). | | It seems like a problem very few people need to worry | about and Fossil has made the right trade-offs. | mb7733 wrote: | Doesn't all of this apply to git just as well, except for | the last bit about the MD5 hash? | | It just seems to me that the Fossil maintainers have | decided that keeping all old SHA1 hashes is acceptable, | while the git maintainers have decided that it is not. | | Unless I've misunderstood, this is why it was "so easy" | for Fossil to transition to a new hashing algorithm. Not | some superiority in the design of Fossil, as implied on | the Fossil forums. | Tyr42 wrote: | Ah, so it uses "Hardened SHA1", which detects if you are trying | to exploit SHA1, and then produces a longer, unambiguous hash. | But otherwise Hardened SHA1 has the same output as SHA1, so | it's a drop-in replacement. | | Then it also has a similar-looking migration to SHA3-256. | velcrovan wrote: | Fossil has defaulted to SHA3-256 since 2.10 (released in October | 2019). But it has had SHA3-256 since March 2017, and | generally any repos/clones managed with a Fossil version | since then have been seamlessly updated to SHA3-256 in the | background. | zokier wrote: | Does anyone know of a standard format for a sort of tagged-union | hash type, something similar to the crypt format for passwords? | It feels like everyone needs to support multiple hash types at | some point, and basically has to reinvent that particular wheel | again and again. | loeg wrote: | It isn't too bad to just exhaustively look up provided hashes | in all your databases (at least, for Git). You should probably | only support 1 primary hash at a time, and 1 additional legacy | hash for migration purposes.
This makes lookup twice as | expensive; for git, this is not usually the slow part (the slow | part is 'git status' having to compare the entire local | filesystem checkout to the repo). | kazinator wrote: | > _There is, of course, a way to unambiguously give a hash value | in the new Git code, and they can even be mixed on the command | line; this example comes from the transition document:_ | git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256} | | > _For a Git user interface this is relatively straightforward | and concise_ | | No, it isn't. It's a complete and utter user interface | clusterfuck. Just say no to this insanity. | microtherion wrote: | Note the qualifier "for a Git user interface". | | The average git command is along the lines of "git ph-nglui | --mglw=nafh Cthulhu...R'lyeh -- wgah^nagl fhtagn" | kazinator wrote: | That cute rhetoric will not fool anyone. Common git workflows | use fairly succinct git commands: git diff | git commit -p git rebase -i HEAD~3 | | The command quoted in my original comment is just this if we | strip away the SHA256 garbage: git log | abac87a..f787cac | | (Or maybe it is: git log abac87a^..f787cac^ | | I cannot guess whether the ^ operator still has the same | meaning or whether it is part of this ^{sha...} notation.) | | The hashes will typically be copied and pasted, so you type | just the _git log_, _.._ and spaces. | | The fixed parts of convoluted git syntax can be hidden behind | shell functions and aliases. But notations for _referencing_ | objects are not fixed; they will end up as arguments. | microtherion wrote: | As others have pointed out, there already is precedent for | ^{...}, so if you're comfortable with the other uses, I'm | not sure why you should NOT be comfortable with this new | addition. | [deleted] | jolmg wrote: | > I cannot guess whether the ^ operator still has the same | meaning or whether it is part of this ^{sha...} notation.)
| | This isn't the first ^{...} notation. The manpage | gitrevisions(7) also mentions <rev>^{/<text>} for | referencing a commit based on a regular expression of its | commit message, like git checkout 'add- | search^{/finished query builder}' | | Though, this new notation is probably more in-line with the | notation <rev>^{<type>}, which lets you disambiguate what | you put in <rev> as in deadbeef^{tag}, so that it's not | confused with deadbeef^{commit}. | | EDIT: The article doesn't mention it, but I imagine one | interpretation would take precedence and cause git to issue | a warning when it's ambiguous. Right now, if I tag a commit | with the hash of another commit, its interpretation as a | tag takes precedence and I get a warning at the top, | "warning: refname '368bc6e' is ambiguous." That would mean | you'd only ever write ^{sha256} when the provided part of a | sha256 hash is ambiguous with an existing sha1 hash or | something else like a tag. The same goes, vice versa, for | ^{sha1}. | wyoung2 wrote: | > That cute rhetoric will not fool anyone. | | Well, let's see, the Fossil equivalents are: | | 1. Do nothing at all for a conversion from SHA-1 to | SHA-3 -- yes, 3, not 2 as in Git! -- because it's been automatic | for months now and dead easy going back 3 years now. | (https://www.fossil- | scm.org/fossil/doc/trunk/www/hashpolicy.w...) | | 2. "fossil diff" | | 3. "fossil ci" | | 4. Why are you rebasing in the first place, again? | https://www.fossil- | scm.org/fossil/doc/trunk/www/rebaseharm.m... | danShumway wrote: | Articles like this are eye-opening to me, in a bad way. | Every once in a while, I get really curious about giving | Fossil a try, because it does have some legitimately cool | ideas, and then I see the documentation saying things | like: | | > Rebasing is the same as lying | | And I think, "Holy crud do I not want to be part of this | community."
| | The nice thing about Git is that (within reason) once I | understood it, I was able to use it in very flexible | ways. | | It's really common for different projects I manage to | range all over the place from the extreme "commits as | literal history" perspective all the way to the "commits | as literature/guide" perspective. Sometimes I don't | rebase at all, sometimes I rebase a lot. Sometimes I | commit everything, all the time, sometimes I refuse to | commit any code that isn't a deployable feature. | Sometimes I leave branches as historical artifacts, | sometimes I don't care about history and I'm just trying | to coordinate developers across timelines. | | That's not to say that Git isn't opinionated about some | things -- nearly all good tools have at least a few | strong opinions. But Git passes the (IMO extremely low) | bar of not conflating a workflow decision with a moral | failing. Over the years as a software engineer, I've | learned to be somewhat skeptical of programming/workflow | heuristics advertised as rules, and to be _very_ | skeptical of heuristics advertised as ideologies. | | I really don't understand the perspective of someone who | can't think of even one good reason why they would ever | want to edit history. You've never accidentally committed | a password to a repo, or had to respond to a takedown | request? | wyoung2 wrote: | > sometimes I don't care about history and I'm just | trying to coordinate developers across timelines | | The fact that Fossil preserves history does not prevent | you from coordinating with people across timelines. It is | rather the whole point of a DVCS. | | > conflating a workflow decision with a moral failing | | I think it's fairer to say that we don't think a data | repository is any place for lies of any sort, even white | lies. | | > I've learned to be somewhat skeptical of | programming/workflow heuristics advertised as rules, and | to be very skeptical of heuristics advertised as | ideologies.
| | Sure, flexible tools are often better than inflexible | ones, but you also have to consider the cost of the | flexibility. Here, it means someone can say "this | happened at some point in the past," and it's just plain | wrong. | | That isn't always an important thing. Most filesystems | and databases operate on the same principle, presenting | only the current truth, not any past truth. | | Yet, we also have snapshotting in DBMSes and filesystems, | because it's often very useful to be able to say, "This | was the state of the system as of 2020.02.04." | | You don't need a snapshotting filesystem for everything, | and you don't need Fossil for everything, but it sure is | nice to have ready access to both when needed. | | > You've never accidentally committed a password to a repo, | or had to respond to a takedown request? | | Fossil has shunning for that: https://fossil- | scm.org/fossil/doc/trunk/www/shunning.wiki | | And no, shunning is nothing at all like rebase, which | should be clear from the article. | | Fossil also has the `amend` command: http://fossil- | scm.org/fossil/help?cmd=amend | | And no, it is also not like rebase, because it only adds | to the project history, it never destroys information. | dahart wrote: | > I think it's fairer to say that we don't think a data | repository is any place for lies of any sort, even white | lies. | | I, too, wish this extreme hyperbole would be just left | out of the discussion completely. It is offputting, and I | think it's intentionally a bad-faith argument; it fails | to acknowledge the utility, the design intent, and the | context behind rebase, which has been talked about at | length by Linus and others. | | When rebase is used as designed, according to the golden | rule, it's not modifying published history, so it's not | "lying". Whether rebase has safety problems is a separate | issue from whether its use as designed amounts to being | "dishonest".
| | I'm all in favor of improved design choices, and if | Fossil is making those better design choices, let them | stand on their own without intentionally denigrating git | and every user of git through utter exaggeration. | danShumway wrote: | My understanding is that shunning is blacklisting | specific artifacts. That's nice, but I don't understand | how that solves the problem. | | When I revise history in Git, even if it's just doing | something as simple as removing sensitive information, I | often need to replace that information, either through | new commits, or by introducing minor edits to surrounding | commits. I could add those changes on top of my current | HEAD, but then checkouts of old versions would be broken. | On the other hand, if I can just replay my commits while | inserting extra code, I'll end up with something that's | pretty close to my original history, with just the | offending information excluded/replaced. | | That carries the cost that people will need to force pull | my repo, but at least the repo history will still roughly | correspond to what development looked like, rather than | being out-of-order and mostly impossible to build except | for at my current HEAD. | | As a followup question, what do you do if the sensitive | information you need to exclude is in a commit message? | `amend` won't help you, since it's not destroying | information. Do you shun that commit and then... what? | | It just seems like destroying information isn't enough | unless you can also replace it? | | > Sure, flexible tools are often better than inflexible | ones, but you also have to consider the cost of the | flexibility. | | I appreciate this -- I like having multiple tools for | different purposes. I don't see a problem with having a | VC that focuses on auditability, or having one that goes | in a radically different direction from Git. 
Fossil has | very interesting ideas, which is why I try to pay it some | attention whenever I see it mentioned or linked to. | | However, whenever I follow those links and start digging | deeper into the philosophy behind its design decisions, | inevitably the conversation changes from, "here's our | alternative approach to Git" to "what Git does is | fundamentally wrong". It's not, "Fossil doesn't have this | problem because we eschew rebasing", it's "why would | anyone rebase?" | | (Nearly) all architectural decisions have good and bad | consequences. Sometimes those consequences are | imbalanced, so we have heuristics that can say things | like, "often X is a bad idea." That's fine. | | More harmfully, sometimes people extend heuristics into | rules that say, "it's never a good idea to do X". | Programming rules are usually wrong. | | But programming ideologies are the worst, because they say, | "there is something mentally or morally wrong with a | person who would do X". This is toxic for the reasons | that Fossil devs already mention in their documentation: | | > programmers should avoid linking their code with their | sense of self | | Programming ideologies explicitly encourage developers to | have egos, because ideology conflates architectural | decisions and workflow processes with individual worth. | Programming ideologies make it harder for people to grow | as programmers, because they tie intellectual growth to | fears about being wrong. They're completely toxic. | | And is Fossil's documentation promoting an ideology? I'm | guessing that you'd disagree with me on this, but my take | is that when Fossil's official documentation says things | like: | | > Honorable writers adjust their narrative to fit | history. Rebase adjusts history to fit the narrative. | | or | | > It is dishonest. It deliberately omits historical | information. It causes problems for collaboration. And it | has no offsetting benefits.
| | That's not designing a focused tool to support specific | heuristics, or making a case that, "sometimes strict | auditability is important". That's just trolling for | fights. | wyoung2 wrote: | > ideology conflates architectural decisions and workflow | processes with individual worth | | No. You start with the ideology based on your local | culture and project needs, then you pick the tool that | supports your project's needs. | | This is why we spend so much time talking about | philosophy in the Fossil vs. Git article, particularly | this section: https://fossil- | scm.org/fossil/doc/trunk/www/fossil-v-git.wik... | | Which of the two philosophies matches better with the way | your project works? That alone is a pretty good guide to | whether you want Fossil or Git. (Or something else!) | kazinator wrote: | I have no interest in Fossil because it stores stuff in | sqlite databases instead of the filesystem which I think | is a stupid approach. I'm also not interested in version | control systems that are dragging along a wiki and bug | tracker. I just want a C program in /usr/bin that does | version control. | wyoung2 wrote: | SQLite can be considerably faster than the filesystem: | https://www.sqlite.org/fasterthanfs.html | | If you think your filesystem-based Git repo is easy to | manipulate, go poking around in there, and what you'll | find is a bespoke one-off pile-of-files database! Given a | choice between Git's DB and SQLite, I put more trust into | SQLite. | | > I just want a C program in /usr/bin that does version | control. | | ...which Git doesn't provide. Git is hundreds of files | scattered all over your filesystem, a large number of | which aren't C binaries anyway, and of those that are, | only one of them is the front-end program sitting in | /usr/bin, whereas Fossil _can_ be built to a single | static executable in /usr/bin. 
| | And if you _can't_ build Fossil statically on your | system, it's likely due to an OS limitation rather than | something about Fossil itself, as on RHEL where they've | made fully static linking rather difficult in the past | few releases. | | Getting back to Git, large chunks of Git are written in | POSIX shell, Perl, Python, and Tcl/Tk. Almost all of | Fossil is written in C, and the rest of the code is | embedded within that binary running under built-in | interpreters rather than depending on platform | interpreters. | | This has nice knock-on effects, one of which is that | Fossil is truly native on Windows, whereas you have to | drag along a Linux portability environment to run Git on | Windows. Another is that Fossil plays nicely with | chroot/jail/container technology. | | > I'm also not interested in version control systems that | are dragging along a wiki and bug tracker. | | Not a GitHub or GitLab user, then, I'm guessing? | kazinator wrote: | The diatribe against rebasing is stupid. In fact, not | having more than one parent is a good thing, because | with multiple parents, you don't know what is relevant. | The history has turned into a hairball. When you try to | navigate back in time, you face forking roads at every | step and it turns into a maze walk. | | The point is valid that when we rebase, we are losing | history: the context of where that change was originally | parented. | | However, (1) the history does not matter if the change | was parented in some temporary context, like your | unpublished changes, and (2) the information can be | tracked in other ways, such as a Gerrit Change-Id (or | something like it) in the commit message. | | Regarding (1), the extra parent pointers in a merge commit | cause retention of garbage. If we do everything with | merge instead of rebase, we will never lose any of the | temporary commits.
If we prepare an unpublished change | through numerous rebase operations, all that temporary | crap will stay referenced from the head, waste space and | confuse other people with irrelevant information when | they try to navigate the history. | wyoung2 wrote: | > history does not matter if the change was parented in | some temporary context | | It does if it means a big ball o' hackage lands on the | public working branch, since it complicates merges, | backouts, cherrypicks, and bisects. | | Git users can also hide individual commit messages behind | one big combined message, losing part of the project's | development history and logical progression. | | When I pull your repo and build it, and I find that it | doesn't build on my system, I don't want to dig through a | 500-line merge commit to figure out why you changed this | one line from the one that used to build last week, I | want the 14-line diff it was part of so I can begin to | understand what you were thinking when you committed it. | If I later find out that that 14-line change was wrong | but the rest of your 500-line merge was fine, I want to | be able to back it out with a single command. (In Fossil, | it's `fossil merge --backout abcd1234`.) | | > confuse other people with irrelevant information when | they try to navigate the history. | | How much time do you spend navigating the project's | history vs looking at the tip of the current branch? | | I'd wager that the times you dig back into the history, | it's because you are in fact trying to figure out why you | got here, which means a trail of detailed breadcrumbs | will be more likely helpful than "...and between one week | and the next, something changed in commit abcd1234, but | we've lost all of its internal context, so we'll be | spending next week reconstructing it because Angie's on | vacation now." | kazinator wrote: | Regarding (1), not "everything will work as before". 
| | What happens if a Fossil repo that has had SHA3 commits | written to it is accessed by old Fossil software before | that change was introduced? | wyoung2 wrote: | If you try to use Fossil 1.37 -- the last 1.x release -- | to clone a repo that has SHA-3 hashed artifacts in it, it | says, "server returned an error - clone aborted". Since | 1.37 pre-dates this feature, it can't give a more | detailed diagnosis than that. | | If you have an old clone made from before the transition | and try to update it, I'm not sure what it says, since I | don't have any of those around any more. It has, after | all, been three years since Fossil began to move on this | problem, so that it's largely a past issue for us now. | | This transition time was indeed annoying for us over in | Fossil land, but Git's going to have to go through a | transition like this, too. The question isn't whether but | how long we'll have to wait for it to begin and how long | it'll take to complete. | allover wrote: | You are now ignoring the fact that the initial quote you | objected to was intentionally tongue-in-cheek: | | > 'For a Git user interface this is relatively | _straightforward_ and concise'. | | It kinda looks like you missed the joke and are now | doubling-down on your disagreement. | | The author does _not_ think the proposed example is | reasonable. You're in agreement. | kazinator wrote: | Since git is something that I rely on for everyday use, | and long-term data storage, and its development is being | threatened by the inclusion of moronic changes I | completely disagree with, I'm completely unreceptive to | jokes. This is no laughing matter. | allover wrote: | I agree, things shouldn't be this bad. | | But unless you're going to take this up with Linus, | you're just yelling at your fellow disappointed | spectators. | hinkley wrote: | And the responder makes a pretty unsubtle allusion to | Lovecraft.
| | Anyone who compares the git CLI to being driven insane by | Elder Gods is not defending the git CLI. | allover wrote: | Not sure if the HN thread or my comment has thrown you, | but I'm replying to 'kazinator'. | | I know _he's_ not defending it. | | What I said is that he (kazinator) is inadvertently | attacking somebody that's _also_ not defending it (the | author). | hinkley wrote: | When people started using the phrase "Stockholm Syndrome" with | respect to git I took it as a sort of hyperbole. A rhetorical | device. | | But the more 'improvements' they make to it the more literal | that accusation becomes in my head. And what's worse is that | I've grown enough calluses now that my response is an eyeroll | instead of pain. I use git all the time, but it's terrible and | I need something that is better, not just sucks less. And | apparently soon, because I don't know when that koolaid is | going to start looking good but it's not long now. | | Send help. | scarejunba wrote: | Perhaps suggest an alternative? It may help understand why this | was chosen. | kazinator wrote: | One attractive alternative is not to do a thing. | | Don't cave in to sky-is-falling bullshit regarding the | existing SHA1. | | Git is not a crypto system; it's just version control. | | We've used version control systems just fine that had no | integrity features at all. For instance you can go into an RCS | ,v file and diddle anything you want. Some BSD people are | still on CVS, and their world hasn't fallen apart. | loeg wrote: | One alternative would be to just do lookups in both hash | databases (until SHA1 is fully migrated away from), and | reject invocations that conflict. Git's CLI already rejects | ambiguous short hash prefixes for SHA1, it could easily | reject ambiguous prefixes between SHA1 and SHA256 and | otherwise allow unique prefixes for either hash. This would | be pretty ergonomic for users.
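The dual-lookup behavior loeg describes can be sketched in a few lines. This is a hypothetical illustration, not Git's actual implementation; the function and variable names are invented for the example:

```python
def resolve_prefix(prefix, sha1_objects, sha256_objects):
    """Resolve a short hex prefix against both object stores,
    rejecting it only if more than one object matches."""
    matches = [oid for oid in sha1_objects | sha256_objects
               if oid.startswith(prefix)]
    if len(matches) > 1:
        raise ValueError("ambiguous prefix: %s" % prefix)
    if not matches:
        raise KeyError("no object matches prefix: %s" % prefix)
    return matches[0]
```

A prefix stays usable as long as it is unique across the union of both stores, which is exactly how Git already treats ambiguous short SHA-1 prefixes today.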
| jolmg wrote: | For most cases that would suffice and would be ergonomic, | but what if a full SHA1 also qualifies as a prefix of one | or more SHA256, and you want the SHA1? There's still a need | for a mechanism to disambiguate for these cases, even if it | ends up very rarely needed. | loeg wrote: | You're talking about a 160 bit truncated hash collision | on SHA256, which is extraordinarily unlikely if SHA256 is | not itself completely broken (moreso than SHA1 already | is!). I don't think any syntax is needed for that in the | porcelain CLI; it could be handled with non-user-facing | commands if it ever came up (it won't). | jolmg wrote: | > extraordinarily unlikely if SHA256 is not itself | completely broken (moreso than SHA1 already is | | I was hoping I captured that by saying "very rarely". | However, if SHA1 collisions can be made willingly, | doesn't that mean that one can also willingly make a SHA1 | hash that matches with the prefix of an existing SHA256 | hash? | minitech wrote: | > doesn't that mean that one can also willingly make a | SHA1 hash that matches with the prefix of an existing | SHA256 hash? | | No, the "prefix of an existing SHA256 hash" stops being | relevant at that point - that's just a full preimage | attack on SHA1. Isn't known to be feasible yet. | | > I was hoping I captured that by saying "very rarely" | | It's rarer than that. :) | loeg wrote: | As far as I know, that kind of collision isn't practical | at this time. So predicating UI decisions on that basis | seems like a mistake to me (given how long git has | already ignored the looming threat of SHA1 being broken). | | When and if someone injects a SHA1 attack into your | repository, and the main git CLI throws up its hands and | says "hash collision" trying to access it, I'm not seeing | major problems here. The git CLI doesn't need to provide | convenient commands to interact with attacks that are not | practical today. 
To the extent that these will become | practical, I think git should drop the SHA1 lookup after | a migration period regardless, and it would not hurt to | provide a gitconfig knob to disable SHA1 lookup. | gouggoug wrote: | > _For a Git user interface this is relatively straightforward | and concise_ | | You forgot to include the end of that sentence, that | acknowledges your issue with it: | | > _, but one can still imagine that users might tire of it | relatively quickly._ | hinkley wrote: | I've had this argument at work. | | "Tire of it quickly" and "have an immediate gag reflex" are | two completely different categories of negative reaction. | | It's hard to see the sunset when you're down in the muck, and | eventually 'less bad' starts to look like progress to you. | It's a trap and you should be aware of it. | sandGorgon wrote: | does anyone know if github/bitbucket support it today ? | freddie_mercury wrote: | Why would they support it? The article clearly states it is | nowhere close to being useful yet. | | It is untested, unstable code that can only write to | repositories and not read them. | | "Much of the work to implement the SHA-256 transition has been | done, but it remains in a relatively unstable state and most of | it is not even being actively tested yet. In mid-January, | carlson posted the first part of this transition code, which | clearly only solves part of the problem: | | "First, it contains the pieces necessary to set up repositories | and write _but not read_ extensions.objectFormat. In other | words, you can create a SHA-256 repository, but will be unable | to read it. " | sandGorgon wrote: | actually - i might have worded it confusingly. | | For smaller projects (like my own), can i move to sha-256 | with no expectation of backward compatibility _today_ ? | SAI_Peregrinus wrote: | "First, it contains the pieces necessary to set up | repositories and write _but not read_ | extensions.objectFormat. 
In other words, you can create a | SHA-256 repository, but will be unable to read it. " | | If you want it to be write-only, sure, go ahead! | tcharlton wrote: | I can't find documentation for the command in the article: | | git convert-repo --to-hash=sha-256 --frobnicate-blobs --climb-subtrees \ | --liability-waiver=none --use-shovels --carbon-offsets | | Surely some of those options aren't real... | amarshall wrote: | > A new version of Git can be made...with a simple command | like: <command> ... note that the specific command-line options | may differ | | Gives me the impression that it's a construction of the article | alone. Unsurprising, given the snark of the options. | buserror wrote: | Of course they are ?!?!!? | https://git-man-page-generator.lokaltog.net/ | | (never fails to amuse me) | bangboombang wrote: | Oh dear god. Because it's git related, my brain somehow still | tries to make sense of that stuff because it just seems so | real. | ekimekim wrote: | It's a curious feeling. Like reading code that is | syntactically valid but utterly nonsensical. | hrgiger wrote: | reminds me https://projects.haykranen.nl/java/ | pabs3 wrote: | That seems to be intended to be humour. | kzrdude wrote: | Well, I loved it, for one. | throwaway744678 wrote: | I believe we have here an example of Poe's law [1] | | [1] https://en.wikipedia.org/wiki/Poe%27s_law | [deleted] | strenholme wrote: | I'm already seeing a lot of discussion both here and over at LWN | about which hash algorithm to use. | | The Git team made the right choice: SHA2-256 is the best choice | here; it has been around for 19 years and is still secure, in the | sense that there are no known attacks against it. | | Both BLAKE[2/3] and SHA-3 (Keccak) have been around for 12 years | and are both secure; just as BLAKE2 and BLAKE3 are faster reduced-round | variants of BLAKE, Keccak/SHA-3 has the official faster | reduced-round KangarooTwelve and MarsupilamiFourteen variants.
| | BLAKE is faster when using software to perform the hash; Keccak | is faster when using hardware to perform the hash. I prefer the | Keccak approach because it gives us more room for improved | performance once CPU makers create specialized instructions to | run it, while being fast enough in software. And, yes, SHA-3 has | the advantage of being the official successor to SHA-2. | _verandaguy wrote: | Honest question: what are the use cases in Git where hash | computation speed is a meaningful optimization? | strenholme wrote: | It's actually not a big deal with Git, which is why SHA2-256 | is the right choice. | loeg wrote: | Rewriting all repos from SHA1 to hash-next? | SQLite wrote: | My experience in developing and maintaining Fossil is that | the hashing speed is not a factor, unless you are checking in | huge JPEGs or MP3s or something. And even then, the relative | performance of the various hash algorithms is not enough to | worry about. | papreclip wrote: | less wasted computation means less global warming | jayflux wrote: | > BLAKE is faster when using software to perform the hash | | Is BLAKE 3 still faster than sha-256 when using the CPU's | specialized instructions? I think most modern desktop CPUs have | built-in instructions for SHA256. | | I'm guessing when people compare BLAKE 3 to SHA 256 they're | comparing software to software, but this wouldn't be the case | in reality? | strenholme wrote: | I haven't seen any benchmarks for BLAKE3 vs. the Intel/AMD | SHA extensions. My guess is that Intel hardware-accelerated | SHA-256 will be faster than BLAKE3 running in software for | most real world uses. | | I can tell you this much: It is only with Ice Lake, which was | released in the last year, that mainstream Intel chips | _finally_ got native high-speed SHA-NI support. Coffee Lake and | Comet Lake, which are still the CPUs in a lot of new laptops | being sold right now, do not support SHA-NI.
| wahern wrote:
| AMD Zen supports SHA extensions across all SKUs. Here are
| `openssl speed` numbers on an AMD EPYC 3201:
|
|   type         16 bytes    64 bytes     256 bytes    1024 bytes    8192 bytes    16384 bytes
|   blake2s256   46720.33k   187461.21k   305314.65k   373840.55k    398207.66k    401528.15k
|   blake2b512   38423.44k   155318.81k   422325.08k   592401.75k    674843.31k    681743.70k
|   sha256       84620.44k   279840.47k   723573.76k   1199678.81k   1484693.50k   1510484.65k
|   sha512       33854.38k   135674.20k   275343.70k   444872.36k    545802.92k    554166.95k
|   sha3-256     26146.35k   103860.27k   253944.92k   308119.21k    347477.33k    351906.47k
|   sha3-512     26349.83k   105590.85k   144236.03k   173082.62k    189448.19k    189814.10k
|
| It's possible that Blake3 might be faster than accelerated
| SHA-256 on large inputs, where Blake3 can maximally
| leverage its SIMD friendliness. OTOH, Blake3 really pushes
| the envelope in terms of minimal security margin.
| Performance isn't everything. SHA-3 is so slow because NIST
| wanted a failsafe.
|
| OpenSSL info:
|   OpenSSL 1.1.1c 28 May 2019
|   built on: Tue Aug 20 11:46:33 2019 UTC
|   options: bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr)
|   compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall
|     -Wa,--noexecstack -g -O2
|     -fdebug-prefix-map=/build/openssl-D7S1fy/openssl-1.1.1c=.
|     -fstack-protector-strong -Wformat -Werror=format-security
|     -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC
|     -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2
|     -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5
|     -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
|     -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM
|     -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM
|     -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
|
| NOTE: /proc/cpuinfo shows sha_ni detection, and the apt-get
| source of this version of OpenSSL confirms SHA extension
| support in the source code, but I didn't confirm that it
| was actually being used at runtime.
| strenholme wrote:
| Assuming Blake3 will be across the board 43% faster (7
| instead of 10 rounds) than 32-bit blake2s256, we would get:
|
|                            Blake3    SHA-256
|   Tiny                     66743     84620
|   Medium (1024 bytes)      846287    1199679
|   Largeish (16384 bytes)   973919    1510485
|
| This is based on the parent's numbers with a fudge factor
| to account for Blake3 being a faster version of blake2b512.
|
| Of course, this does not take into account that Blake3 has
| tree hashing and other modes which scale better to
| multiple cores.
| ptomato wrote:
| On my machine with sha extensions, blake3 is about 15% faster
| (single threaded in both cases) than sha256.
| abecedarius wrote:
| Also, Blake3 has some kind of advantage in
| parallelizability, iirc.
| ptomato wrote:
| yeah, blake3 multi-threaded is about 11 times faster for
| me than sha256 single-threaded.
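Rough single-threaded numbers like the ones quoted above are easy to reproduce. A minimal sketch using Python's standard `hashlib` (BLAKE3 is not in the standard library, so `blake2b` stands in for the BLAKE family here; absolute figures vary wildly by machine and by whether OpenSSL uses SHA-NI):

```python
import hashlib
import time

def throughput_mb_s(name, size=1 << 20, repeats=10):
    """Rough single-threaded throughput of a hashlib algorithm in MB/s."""
    data = b"\x00" * size
    hashlib.new(name)  # warm-up / availability check
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.new(name, data).digest()
    elapsed = time.perf_counter() - start
    return (size * repeats / elapsed) / 1e6

for algo in ("sha1", "sha256", "sha512", "blake2b", "sha3_256"):
    print("%-9s %9.1f MB/s" % (algo, throughput_mb_s(algo)))
```

This measures bulk hashing only; it says nothing about small-input latency, which is what dominates for typical source files.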
| | That being said, the current difference in hardware | acceleration support probably makes SHA-256 the right choice | here. | strenholme wrote: | SHA-512/256 is a lot newer than SHA2-256 (usually called | SHA-256, but I prefer the SHA2 prefix to make it clear that | it's a very different beast than SHA3-256), and its speed on | 32-bit CPUs is less than optimal, so I don't see it as being | a more conservative choice. In terms of security, it uses the | same 19-year-old unbroken algorithm as SHA2-256. | | I am aware of the length extension issues, but they are not | relevant for Git's use case. | | In terms of support, SHA-512/256 has, as you mentioned, less | hardware acceleration support, and it's also not supported in | a lot of mainstream programs like GNU Coreutils. I also know | that some companies mandate using SHA2-256 whenever a | cryptographic hash is needed. | | Git made the right choice with SHA2-256: It's the most widely | supported secure cryptographic hash out there. | mratsim wrote: | > Thus, unlike some other source-code management systems, Git | does not (conceptually, at least) record "deltas" from one | revision to the next. It thus forms a sort of blockchain, with | each block containing the state of the repository at a given | commit. | | Color me surprised, dropping the "blockchain" word in the middle | of the introduction | AndrewDucker wrote: | Git is a blockchain. | | Being, as it is, a chain of signed blocks. | tonyedgecombe wrote: | Git is a Merkle tree as is a blockchain. | | https://en.wikipedia.org/wiki/Merkle_tree | the8472 wrote: | It is more a DAG than a tree. | afiori wrote: | And talking about hash attacks it becomes relevant to | consider the possibility of it being just a Directed | Graph | afiori wrote: | It is still a sort of namedropping. In the sense that it is | used due to the trendiness of the term. 
| | It is entirely possible and likely that it is used for | didactic purposes, as many people are familiar with the | blockchain structure and its use of hashes. | bawolff wrote: | I thought it was a joke. The whole "blockchain isn't that | innovative if you use a strict technical definition, because | lots of things were chains of blocks before bitcoin was | cool" meme. | | And honestly fair enough. The innovative part of bitcoin is | not the blockchain but all the economics & game theory | going on to create trust in the system | afiori wrote: | I think that's why the parent was | criticizing it. Blockchain as a term generally means | "crypto-magic-stuff on a blockchain", so for git to use it | instead of the more academic Merkle trees (or Merkle DAGs, | if they exist) sounds a bit like low-effort name dropping. | | Again it is not a criticism of the article, but it is not | a criticism of the criticism either. | kazinator wrote: | I'm completely against this security theater nonsense; please | keep my git SHA-1. | | Please fork git for this and call it something else, like git6, | and ensure that git6 cannot push to git repos. | powerapple wrote: | is it a real problem for git? Do we merge code based on hashes | instead of looking at the code? | donatj wrote: | The problem is that if you can make evil code with the same hash | as innocuous code, you can poison people who pull from a given | repo you have access to. It would allow you to make changes to | the history without merging anything or anyone being the wiser. | | It makes the distributed aspect of git untrustworthy, as | previously you knew if you pulled from anywhere and the hash | was good, you'd pulled the correct code. With SHA1 being | functionally broken that's no longer necessarily the case. | [deleted] | throwaway-q2233 wrote: | Can't you just take the SHA-1 of the SHA-256? | loeg wrote: | Nope. | anaisbetts wrote: | I don't understand the practical attack vector for breaking SHA1s | in Git.
Not only are objects checksummed by SHA1, they also | encode the _length_. Finding a SHA1 collision is plausible, but | finding a SHA1 collision that both lets you do something | nefarious _and_ is the length you need seems really, really | unlikely. | acidictadpole wrote: | The author does seem to concede that hitting all the checkmarks | in an attack on git would be pretty tricky: | | > An attacker would not just have to do that, though; this new | version would have to contain the desired hostile code, still | function as a working floppy driver, and not look like an | obfuscated C code contest entry | | The whole idea is that they want to switch away before these | things become likely. They are unlikely now, but SHA-1 is only | getting weaker as time goes by and more research is done. | pdonis wrote: | _> and not look like an obfuscated C code contest entry_ | | The full quote here is even better: | | "and not look like an obfuscated C code contest entry (at | least not more than it already does)." | xxs wrote: | Length actually is already part of the hash result... and the | SHA-1 collision uses same-length PDFs. | phaemon wrote: | As mentioned, the shattered PDFs[1] have the same length; | however, it's worth noting that adding the Git header breaks the | matching, i.e. you get different SHA sums for the files in Git | because of the header. | | [1] https://shattered.io/ | tialaramex wrote: | This makes no sense. The collision manufacture algorithm of | course produces the same-length output in both the A and B | documents. Doing otherwise would be considerably harder in | fact. | kzrdude wrote: | SHA1 is crumbling. It's being replaced because it is likely to | be broken further, in practice. | upofadown wrote: | To make the collision work you need to produce two different | files, both with some randomish looking junk in them.
So if you | can do that in a way where you can substitute one of the files | for the other without getting caught then you are almost for | sure smart enough to also figure out a way to make the lengths | the same. | HereBeBeasties wrote: | You're assuming that 100% of the source code matters, but most | source code has comments. Some has a lot of comments | (boilerplate headers). Delete all the comments and superfluous | whitespace, add nefarious code, put in a comment in the | remaining bytes for the sole purpose of causing a hash | collision (likely plenty of bytes to play with). | scoutt wrote: | Yes, but... | | > this new version would have to contain the desired hostile | code, still function as a working floppy driver, and _not | look like an obfuscated C code contest entry_ | | It's still plausible that one can pull a trick like that to | introduce malicious code into the repo, but improbable. | est31 wrote: | The shattered collision attack featured two PDFs with the same | SHA-1 and, wait for it, the same length. Also note that even with | normal SHA-1, the length is hashed into the final SHA-1 hash | already; that's what the Merkle-Damgard scheme is about. You | can read about it on Wikipedia. | | Reusing the precise collision from the shattered attack is made | impossible by initializing the state with _anything_ other than | the prefix from the shattered attack. But the cost for mounting | such an attack yourself is only 11k USD. However, as git uses | the sha1collisiondetection library, such an attack would be | detected by current git. Thus, this library is a much better | protection than the length encoding. | nnx wrote: | Surprising they didn't go with Blake3 instead, given its much | higher performance and Git's performance-oriented ethos. | nullc wrote: | > Git's performance-oriented ethos | | Then sha256 will likely be preferable in the long run: It's | faster with SHA-NI than blake3.
| | If you're not developing on a system with sha-ni, get with the | program. Zen2 is freaking awesome. :) | wolfgke wrote: | > Zen2 is freaking awesome. :) | | SHA-NI was introduced with the Intel Goldmont | microarchitecture. | nullc wrote: | Yes, but Goldmont is not particularly awesome. :) | Presumably goldmont would be a downgrade for many people. | | (On AMD the first generation zen have sha-ni, FWIW) | wolfgke wrote: | Of course, processors that use one of the | Atom/Celeron/Pentium microarchitectures are not the best | choice if you desire maximum speed, but otherwise they | are surprisingly interesting processors (IMHO much more | interesting than what Intel delivers with the Core | series). | | At this time, Intel often experiments with or introduces | features that are particularly interesting for embedded | usages first on the Atom. For example, the already | mentioned SHA-NI. Another example is the MOVBE | instructions (insanely useful if you handle big-endian | data, for example in network packets (I am aware that on | older x86 processors, there exists the BSWAP | instruction)) - they were first introduced with Atom. | majewsky wrote: | Great! I can't wait to have to throw away perfectly fine | systems because of a new Git version. /s | pjc50 wrote: | I'm waiting for Blake7. | xxs wrote: | fond memories indeed.... | curben wrote: | The decision was made before the release of Blake3. The article | did mention the algorithm is no longer hardcoded (hence the | ability to support both SHA1 & SHA256). This means it's | possible to transition to Blake3 (or any other) in the future, | though it won't be trivial. | simias wrote: | Is a significant part of git's typical profile spent computing | hashes? I'm genuinely asking because I don't know the answer. | I'd expect all the diffing and (potentially fuzzy) merging to | be significantly more expensive operations, at least as far as | big-O is concerned.
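Several comments in this thread note that Git hashes a header containing the object type and length ahead of the content, which is why the shattered PDFs get different SHA sums inside Git. This is documented `git hash-object` behavior and can be reproduced in a few lines:

```python
import hashlib

def git_blob_sha1(data: bytes) -> str:
    """Compute the SHA-1 object id Git assigns to a blob: the hash
    covers a "blob <length>\\0" header followed by the content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# The empty blob has the well-known id e69de29bb2d1d6434b8b29ae775ad8c2e48c5391,
# which is not the plain SHA-1 of the empty string.
print(git_blob_sha1(b""))
```

Because the header is hashed too, a collision on raw file contents does not automatically collide as a Git object; an attacker has to target the header-plus-content encoding.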
| hannob wrote: | > Is a significant part of git's typical profile spent | computing hashes? | | No. Hashes are really cheap. | | This annoys me a bit, because every discussion about hashing | goes into endless bikeshedding over which hash function to use. | The simple truth is: SHA2, SHA3, Blake2/3 are all good enough | from both a security and performance perspective for | almost any use case, and the advantages and disadvantages are | so minor that it really doesn't matter. | tialaramex wrote: | Length extension is an unnecessary problem in MD | constructions. It makes sense to get rid of the problem. So | if you are building a new thing today there's some sense in | not picking SHA-256 in order that you won't later hit your | head on a length extension attack. SHA-512/256 (that's not | a choice, it's just one hash in the SHA2 family) is a | reasonable choice though, and of course if Git was | vulnerable to length extension somehow they'd be in trouble | years ago, so for them why not SHA-256. | bjoli wrote: | There are organisations that can only use approved crypto for | various certifications and government contracts. It would be | bad to drive such users away from git. | | Under "feedback from git people" on | https://www.mercurial-scm.org/wiki/SHA1TransitionPlan | Ayesh wrote: | Linux also has the ethos to choose boring technology. SHA2 has | been here for so long and is battle-tested. For the majority of | us, it is the natural choice. I'm not implying anything | negative about SHA3/Blake/Keccak. | dchest wrote: | BLAKE3 was released a month ago! The decision for the new hash in | git was made about two years ago. BLAKE2 was considered, | though. | | SHA-256 is fine. The biggest problem is switching to it... | fanf2 wrote: | They made the choice years ago and blake3 was announced last | month. | velox_io wrote: | Unless I'm missing something, why not just allow repositories to | be upgraded to SHA2 hashes?
The only problem is ensuring | everyone's tooling supports it. | majewsky wrote: | This question is exactly what a major portion of the article | covers. | velox_io wrote: | It isn't the easiest article to read, plus they overcomplicate | things by discussing details such as truncating | SHA2 hashes. | | I don't see why changing the hashing algorithm is so | problematic, hence the reason why I asked the question. | Converting a repository to SHA2 should be straightforward | (the only issue is everyone's tooling); you could also run | the repositories side-by-side. I'm genuinely interested as I | think Git & BitTorrent are quite elegant solutions to complex | problems. | majewsky wrote: | > the only issue is everyone's tooling | | Exactly! If you've ever worked in a corporate environment, | you know the fun of having to support 10-year-old versions | of your favorite cutting-edge software. | londons_explore wrote: | I don't think it's that unreasonable to release git binaries | today with sha256 support, then wait 5 years, then make all new | commits use sha256. | | Anyone who tries to use a git client more than 5 years old | wouldn't be able to pull+push to a new repo. Sounds reasonable | to me. Git clients more than a few years old are pretty broken | already due to TLS changes. | | Keeping around a dual hash system forever sounds like baggage | and complexity that outweighs the benefits. | k5hp wrote: | Just in case: http://archive.is/omsjJ | ericfrederich wrote: | I'll have to update my program which generates vanity hashes. I | do enjoy starting projects with an obligatory "Initial Commit" | with a deadbeef SHA-1 | bmn__ wrote: | I like to start a repo with an "empty" commit, that is to say | its tree is the magic 4b825dc. | https://news.ycombinator.com/item?id=18342763 | | I wonder if it would still be practically possible to | manipulate the commit id. | wyoung2 wrote: | Wow! I wouldn't have guessed that Git had that vulnerability.
| Fossil solves it easily: creating a new repo involves | generating a random project code (a nonce) which goes into | the hash of the first commit, so that even two identical | commit sequences won't produce identical blockchains. | | Fossil lets you force the project ID on creating the repo, | but the capability only exists for special purposes. | loeg wrote: | Yep. You can inject arbitrary metadata into the git commit | object and the git cli ignores it, other than including it in | the hash. E.g., https://github.com/kevinwallace/gitbrute , | https://github.com/kevinwallace/gitbrute/commit/0001111 . | PaulHoule wrote: | I am not a fan of SHA-256; you are better off with SHA-384 or | SHA-512/256, which resist length-extension attacks and are actually a little | faster on 64-bit machines. | GlitchMr wrote: | I wonder if it would make sense to use a `concat(sha1, sha256)` | hash algorithm. This wouldn't change the prefixes while improving | the strength of the algorithm (by including SHA256 in the hash). | ascar wrote: | I'm probably missing something, but isn't it simpler to just | make both available separately and allow users to still | reference by sha1, if they want to, while sha256 can be used | for collision detection by git operations internally? | patrec wrote: | Correct, and I think this is what they are doing -- you can | optionally keep the sha1s around. | rocqua wrote: | There is a downside that this would mean commit-prefixes remain | sensitive to collisions. Hence anyone checking out a commit by | a hash-prefix would still be vulnerable. | | Not a dealbreaker by far, but still a slight mark against this | solution. | u801e wrote: | Does git have code to detect whether a hash prefix is | ambiguous? I know that if you use a short prefix (which is | more likely to be shared by multiple objects), git will | output an error message stating that the object reference is | ambiguous IIRC. | loeg wrote: | Yes. | IshKebab wrote: | I don't see how it would change anything.
A collision of a | short prefix is trivial to generate with any hash. | timvisee wrote: | Very interesting idea. But wouldn't existing hashes be kept | intact anyway? | patrec wrote: | I suppose you are advocating two distinct Merkle trees? | Because otherwise the prefixes will change anyway. | | But the only reason this would be attractive is because then | people could keep using the existing prefixes to refer to the | whole commit. But of course doing this would be insecure. So | for this to make any sense at all, people would need to make | good choices on when to use an insecure prefix and when to use | the whole hash, because it's security relevant. This seems a | bit doubtful to me. | GlitchMr wrote: | To be fair, the prefix problem would exist no matter what | hash function you pick. GitHub displays 7 characters of | a hash, giving 28 bits. You could very quickly generate | collisions with a birthday attack in pretty much no time. | Prefixes are always going to be insecure because they are so | short. | | In fact, https://github.com/bradfitz/gitbrute exists. | patrec wrote: | Correct, but backwards compatibility does make a difference | here, as in: there are surely quite a few cases where it | would not be attractive to use a shortened hash if git | hashes are changed incompatibly anyway, but where it will | be attractive to use the shortened hash, because that keeps | an existing setup working as before. | | Also: the prefixing increases the length of the hash (and | hence the desire to shorten it) without adding any | security. | GlitchMr wrote: | Yeah, kinda agreeing here. The hash length will need to | be increased anyway, but a concatenation of SHA1 and SHA256 | will be 104 characters in total when displayed (40 + 64), | which is a lot. | | It may be better to display SHA-256 commit hashes, but | accept SHA-1 hash prefixes for old commits.
It may be | confusing for git to accept hashes that aren't visible in | `git log`, but it's probably for the better. | dchest wrote: | Something to remember about the security of concatenated | hashes: https://crypto.stackexchange.com/a/63543/291 | bangboombang wrote: | This is pretty interesting and shows you shouldn't try to | pull any sort of stunts if you're not a crypto expert. I've | actually wondered before whether md5 + sha1 would result in | something stronger than those two used individually. Now I | know. | bawolff wrote: | The linked article doesn't contradict the original post. The | linked article says the strength of 2 hash algos (of this type) | is only as strong as the strongest and not the sum of their | strengths. But the original poster only needed the combined | hash to be as strong as SHA-256 for his/her purpose. | | Notwithstanding, I still don't like it as an idea. | GlitchMr wrote: | By the way, this may be rather obvious, but concatenating | hash algorithms is a terrible idea for passwords. A | password cracker could easily pick the less secure | algorithm to crack, and ignore the other hash. | | Note that git doesn't concern itself with reversing a hash | function. The commit contents are part of a repository, | there is no value in guessing the commit contents based on | its hash. Here, the hash function choice is purely about | collision resistance. | | But yeah, don't do weird things with hashes. Cryptography | is hard. Don't invent memecrypto: | https://twitter.com/sciresm/status/912082817412063233, it's | not going to increase the security. Use a single algorithm | if you can. Don't transform the output of a hash function | in any way. | GlitchMr wrote: | I'm well aware concatenation wouldn't necessarily improve the | strength. However, the idea is, even if SHA-1 were hopelessly | broken, CONCAT(SHA1(x), SHA256(x)) would be at least as | strong as SHA-256 (where "at least" means it may have the same | strength).
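As a quick illustration of the concatenation scheme under discussion (a sketch only; git implements nothing like this, and the function name here is made up): the pair is at least as collision-resistant as SHA-256 alone, at the price of a 104-character id.

```python
import hashlib

def concat_digest(data: bytes) -> str:
    """Return the SHA-1 and SHA-256 hex digests concatenated (40 + 64 chars).

    A collision for the pair requires colliding both hashes on the same
    input pair, so the combination is at least as collision-resistant as
    SHA-256 -- but the displayed id grows to 104 characters.
    """
    return hashlib.sha1(data).hexdigest() + hashlib.sha256(data).hexdigest()

d = concat_digest(b"hello\n")
print(len(d))    # 104
print(d[:40])    # the SHA-1 half: existing short prefixes keep working
```

The first 40 characters are exactly the old SHA-1 name, which is why this scheme would preserve existing abbreviated references.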
| Double_a_92 wrote: | If you know that it's a concatenation, couldn't you _only_ | look at the SHA1 part and _completely bypass_ any other | strong hash? On second thought probably not, because you | might find _any_ possible collision, that isn't a | collision on all the other hash algorithms. If you | bruteforce through a password list it would still apply | though. | GlitchMr wrote: | This doesn't work for collision resistance attacks. git | commits aren't password hashes. Specifically, the | attacker's goal in this case is to find different values | a and b for which hash(a) = hash(b), rather than finding | a value of m in h = hash(m) for known h. | pwagland wrote: | This would help if you _only_ shared the prefix, however | git would still use the full hash. | | The proposed method would have the advantage of keeping | existing known abbreviations, which are _already_ less | secure than SHA-1, while keeping the security of the | second hash. | | It also has the disadvantage that the full hash would | become excessively large and unwieldy, so pros and cons. | simias wrote: | Things like signed commits would still use the full hash, | so that would make tampering with that impossible. | | This solution would basically just make the UI backward- | compatible while still requiring the complete | modification of the internals to change the hash function. | | You'd still risk a collision if you refer to commits | using a shortened hash outside of git but something tells | me that you don't even need a vulnerability to take | advantage of that if you have an attack vector. For | instance github seems to use 7 hex digits in short hashes, | this could probably be bruteforced relatively easily (be | it for SHA-1 or SHA-256).
To give you an idea I looked at | the current bitcoin difficulty (which AFAIK uses two | rounds of SHA-256 internally and works by bruteforcing | hashes with a certain number of leading zeroes) and the | hashes look like this: 000000000000000000028048b31e42bd53d3b36da90d1a840ae695ec1a5ee738 | donatj wrote: | Excuse my ignorance, but couldn't they just add a SHA256 hash to | commit objects (or some new commit-verify object) of the entire | tree's current concatenated content, leave everything else SHA1 | and get the same benefit without rewriting the entire thing from | the ground up? Git could even do that as part of the git gc step | slowly over time - tag commits with a secondary hash. | | Rewriting the whole thing including every git repo's history seems | like throwing the baby out with the bathwater, when you could | just add a secondary transparent verification instead. Just seems | like there has to be a better way. | kzrdude wrote: | Hashing everything in one go doesn't scale well. When making a | new commit you want to only hash a proportional part of the | repository, and the tree structure of git allows that: only the | files and "tree" objects (directory listings) that change are | hashed again. | wongarsu wrote: | You can't change past commits to add that hash (without | changing all commit hashes), so this method could only protect | new commits. For any existing repo this would lead to a very | weird security model: We admit that sha1 hashes are broken, and | only guarantee that commits made by git versions newer than git | x.x.x are safe from after-the-fact modification (or | alternatively only commits made after date X). | jayd16 wrote: | What if we use the exploit to add the new data but keep the | sha1 the same? :) | gregmac wrote: | My inclination is that protecting only new commits might be | enough, but it gets me thinking: What would a practical | attack on this look like, assuming sha1 was broken?
Let's say | I'm trying to insert a line of code that does something | nefarious, and that it's now trivial to generate "magic text" | you can stick anywhere in a file (eg, inside a comment at the | end of a line) to get any desired sha1 hash. | | Are all the other future commits still valid, or am I going | to suddenly get conflicts or garbled text? Depending on where | the modification is done, that code might have gone through | much more churn -- especially if there are a bunch of sha-256 | commits after it (which I can't attack). I don't know enough | about how git stores content blobs to answer this. | | Second problem: Can I push my replacement commit to another | repository (eg, github)? Would even force push work? Do I | have to delete branches and re-push my own? If I already have | enough permission on the repository to do this, it means I | can already push whatever I want -- so does this attack _even | matter at all_? | | Assuming that's successful (or I can trick people into using | my own repository), what will happen to someone that already | has a clone and does a pull? Will they get my change (and | will it work or be a pile of conflicts or garbled text)? | | Even if only fresh clones will get the changes it could still | be quite devastating -- especially if using CI -- but I'm | just not clear if this attack is even theoretically possible. | masklinn wrote: | > Are all the other future commits still valid, or am I | going to suddenly get conflicts or garbled text? Depending | on where the modification is done, that code might have | gone through much more churn -- especially if there are a | bunch of sha-256 commits after it (which I can't attack). I | don't know enough about how git stores content blobs to | answer this. | | A blob is a "snapshot" of a file. The next version of a | file is a completely different blob with no direct relation | to the previous. 
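As an aside, git's blob naming is easy to reproduce: the id is the SHA-1 of a short header plus the raw content, which is why consecutive versions of a file are unrelated objects at this layer. A minimal sketch (the function name is ours; the `blob <size>\0` header is git's actual blob format):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object id git assigns a blob: SHA-1 over "blob <size>\\0" + bytes.

    Any edit to the content yields an unrelated id; there is no delta
    relationship between versions at this layer (packfiles add deltas
    later, purely as a storage optimization).
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Same id that `printf 'hello\n' | git hash-object --stdin` reports.
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

So a colliding replacement blob would slot into the tree without disturbing any other object, which is why the history-churn question above comes down to how later commits reference that blob.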
| | "Pack files" use delta compression in order to lower the | actual size of "similar" blobs. | | You _could_ get conflicts if you tried merging or rebasing | over the nefarious blob, and the "patch history" (git log | -p, which builds the patch view on the fly) would show | possibly unexpected complete file replacements. | OJFord wrote: | > My inclination is that protecting only new commits might | be enough | | Why? It's not the same as saying 'versions after vX are | safe', it's the same as saying 'any unsafety after vX was | there before, not introduced since' (both with 'as a result | of SHA-1 collision' qualifiers of course). | | > Can I push my replacement commit to another repository | (eg, github)? Would even force push work? | | Implementation dependent I suppose, but I wouldn't have | thought so - I don't see why they'd actually check the | content when the hash is supposed to indicate whether it | differs or not. | | > Do I have to delete branches and re-push my own? If I | already have enough permission on the repository to do | this, it means I can already push whatever I want -- so | _does this attack even matter at all_? | | I think an attack would look more like: 1. | Create hostile commit that collides with extant commit SHA | 2. Infiltrate a package repository, or GitHub, or corporate | network, or ... 3. Insert hostile commit in place of | real one | | Of course it's a problem if 2 & 3 happen alone anyway, but | the problem with the collision commit is that it makes it | so much less detectable. | Nullabillity wrote: | Git commits are snapshots, not diffs. Each commit | contains a tree, which contains a list of files and their | respective hashes. As long as its whole tree is SHA-256 | then a commit should be safe, regardless of its history. | | The downside to the migration would be that all unchanged | files would be stored twice (once identified by SHA1, | once identified by SHA-256). 
But you could work around | that by hardlinking identical files. | loeg wrote: | This doesn't protect subdirectories unless you rewrite | the entire tree structure with SHA256. I don't know if | Git does that now, or not. Git generally points to | unmodified subdirectories with the existing content hash; | if the SHA1 is pointed to by SHA256, which is implied by | the transition plan proposed in the great-grandparent | comment, then those subdirectories are essentially | unprotected. | tzs wrote: | Couldn't they make a table that contains a list of all the | old objects by SHA1 hash, and for each contains the new | SHA256 hash of that object, and then commit this table in the | repository? | loeg wrote: | Yep. | arve0 wrote: | I'm not familiar with the internal data structure of git, but | couldn't you add the new hash as a commit in a new format "on | the side", leaving the original commit as is? | WorldMaker wrote: | Git does have a commit-related object called a note that | you can attach as a separate object. [1] | | Presumably the proposed "hash translation store" could use | an approach similar to notes, and include the hash | translations as objects in the git database (hopefully in a | way that could be signed by a tag). | | [1] http://alblue.bandlem.com/2011/11/git-tip-of-week-git- | notes.... | loeg wrote: | This is kind of what rewriting the repo is. Yes, you could | leave the SHA1 commit tree around afterwards (i.e., for | convenience of existing URLs), but you wouldn't want to | keep SHA1 around as the authoritative hash name. | [deleted] | cm2187 wrote: | Stating the obvious, but the hash is hex, which leaves lots of | characters for a one character prefix for sha256 hashes. Like the | character "s" for instance. | speedgoose wrote: | >...a simple command like: | | > git convert-repo --to-hash=sha-256 --frobnicate-blobs --climb- | subtrees --liability-waiver=none --use-shovels --carbon-offsets | | Is it sarcasm? | andrewflnr wrote: | Yes.
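The translation-table idea discussed a few comments up can be sketched as follows; the helper names are made up, and the object serialization is simplified relative to real git, which hashes a "<type> <size>\0" header plus content and keeps the id map in both directions:

```python
import hashlib

def object_names(obj: bytes) -> tuple:
    """Return the (sha1, sha256) names for one serialized object."""
    return hashlib.sha1(obj).hexdigest(), hashlib.sha256(obj).hexdigest()

def build_translation_table(objects):
    """Map every object's SHA-1 name to its SHA-256 name.

    This is the forward half of the lookup a transitioning repository
    would need so that old SHA-1 references keep resolving.
    """
    return dict(object_names(obj) for obj in objects)

table = build_translation_table([b"blob 4\x00one\n", b"blob 4\x00two\n"])
print(len(table))  # 2
```

Such a table only certifies objects that existed when it was built; anything committed afterwards has to be named with the new hash directly.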
| pkilgore wrote: | I love LWN's technical writing--it's worth the cost of a | subscription! | sunil_saini wrote: | The above article suggests that a SHA-1 collision is infeasible | because the attacker has to come up with code that not only generates the | same hash but also benefits him. But can't he just add some | malicious code and add some random text in comments to produce the | same hash? | pornel wrote: | "produce same (specific) hash" is a pre-image attack, which is | very very hard. So hard, that even MD5 isn't broken for pre- | image, and there's only a theoretical pre-image attack against | MD4. | | We only know collision attacks, which is "produce 2 files with | the same hash, but you can't control what hash". So you can't | target any existing repo. You need to use social engineering to | get one of your special files into a repo. | zackmorris wrote: | Summary of hashing function security in bits, for convenience: | | https://en.wikipedia.org/wiki/Secure_Hash_Algorithms | | Since collision resistance is roughly half the number of bits, it | seems unconscionable to me that anything below 256 bit hashes | even exists, because 64 bits is crackable but 128 bits effectively | never will be. This was well-understood even in the 90s when MD5 | and SHA were first published. | | Just thinking about this for the first time, I don't buy any | argument about storage or performance, since those become less | important as time goes on. It feels like Linus made a mistake | here, and offloaded the inevitable work of upgrading repositories | onto the general public (socialized the cost), which is something | that all programmers should work harder to avoid. | | Said as an armchair warrior who has never accomplished anything | of any importance, I realize.
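To put numbers on the birthday bound mentioned above (collision resistance is roughly half the hash length): a 7-hex-character prefix is 28 bits, so a birthday search expects a collision after roughly 2^14 attempts. A toy sketch against truncated SHA-256 (the function name is made up; this says nothing about full-length hashes, only about short display prefixes):

```python
import hashlib
from itertools import count

def find_prefix_collision(hex_chars: int = 7):
    """Birthday-search two distinct inputs whose SHA-256 digests share
    the same leading hex characters.

    For an n-bit prefix this takes about 2**(n/2) attempts on average,
    so the 28-bit / 7-character case falls in well under a second.
    """
    seen = {}
    for i in count():
        msg = b"commit %d" % i
        prefix = hashlib.sha256(msg).hexdigest()[:hex_chars]
        if prefix in seen:
            return seen[prefix], msg, prefix
        seen[prefix] = msg

m1, m2, shared = find_prefix_collision(7)
print(m1, m2, "->", shared)
```

This is exactly why short prefixes are a convenience, not a security boundary, regardless of which hash function sits underneath.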
| mathnmusic wrote: | Also relevant: Multihash is a format for self-describing hashes | that helps with data portability and future-proofing: | https://github.com/multiformats/multihash | alkonaut wrote: | I didn't get the argument against just converting? Sure some code | bases are large and spread out, but any git repo needs to have | one blessed central point, and everyone needs to be able to just | re-clone from the central repository whenever history is | rewritten for whatever reason (could be that a huge file is | trimmed from the past etc). Why can't all commits in the Kernel | history be rewritten to SHA256? (Other than that it would be an | annoying interruption in the development)? | corbet wrote: | The kernel doesn't really have the one central blessed point of | which you speak. Sure you can grab mainline releases from | Linus's repository, but that's not where the development | actually happens. It really is a distributed project, and | having to delete all those old repositories would really hurt. | alkonaut wrote: | If 2 separate copies of the same repository do the same | rewrite to sha256, their histories are still compatible and | equal up to the point where they diverge. So other than that | the rewrite needs to happen in more places, it should still | be doable. Needs to happen at more or less the same time | however. | velcrovan wrote: | The whole point of git is that there doesn't need to be a | blessed central point. | RichardCA wrote: | Most development shops are using the traditional client- | server model, or self-host using Gitlab. | | I personally would never allow a repo with two hashing | algorithms to exist on my watch. | | If you have ever had to use a tool like BFG to prune large | objects from a repo you'll see it's not that bad, but it does | require users to re-clone. | | I would want to use the same process for SHA256 - that is, let | it be the default for new projects and then convert older | projects based on need.
| | But there needs to be a BFG style conversion tool that spits | out an object id map as output. | | Here's more info on BFG: https://rtyley.github.io/bfg-repo- | cleaner/ | jakeogh wrote: | Is there an archive of crypto-related future predictions? | | How long until a specified length preimage attack can break | bittorrent blocks? | | I remember a paper published a ~decade ago estimating very short | (well funded) ASIC sha1 collisions. Anyone have that ref? | | EDIT: Should I have not said preimage? My understanding is | bittorrent is broken (by DDoS, not infohash(?)) if you can make a | bad block that matches the length and sha1 of a target block. | glandium wrote: | > How long until a specified length preimage attack can break | bittorrent blocks? | | Even MD5 still doesn't have a known preimage attack, so... many | many years? | tialaramex wrote: | To be fair, for MD5 there is a known attack, it's just | impractical. It's a real attack though, because the whole | point of a crypto hash is that you'd have to brute force it | to win, and the paper shows a slightly quicker way because | MD5 is broken. It's just not quick enough that you could | actually do it. | | Oh wait, perhaps you actually meant preimage as you said, | rather than, as I assumed, second preimage. OK yes, that isn't | ever going to be possible for non-trivial inputs. | rocqua wrote: | I think OP meant 'viable' pre-image attack. Not just an | attack that is better than brute force. | GoblinSlayer wrote: | Cheap talk is hardly an attack though. | pabs3 wrote: | There is one for hashes: | | http://valerieaurora.org/hash.html | strenholme wrote: | Not to mention this one, which covers more hash algorithms: | | https://electriccoin.co/blog/lessons-from-the-history-of- | att... | hannob wrote: | Preimage is really a whole different beast than collision. | | It's also not particularly surprising.
Just by its length SHA-1 | has in its best case 80 bits of collision security and 160 | bits of preimage security. | | Now it's important to understand that cryptanalytic attacks | usually don't cause full devastation; they just make attacks a | bit cheaper than the optimal brute-force cost. | | Attacks in the 60 bit range are what's possible, attacks in the | 70 bit range are what's dangerous. It's easy to imagine that a | relatively small deviation from optimal security gets SHA-1 | from 80 into the dangerous territory (the attacks are in the | low 60s range). However, getting from 160 bits down to the 60/70 | bit range would require massive improvements in attacks. | | It's safe to say that SHA-1 is still very far from preimage | attacks. Still, to be clear, I'd recommend getting rid of it | wherever you can. The far bigger risk is that you think you | only need preimage security, while you actually need collision | security for scenarios you haven't thought about. | tialaramex wrote: | > EDIT: Should I have not said preimage? My understanding is | bittorrent is broken (by DDoS, not infohash(?)) if you can make | a bad block that matches the length and sha1 of a target block. | | There are three different attacks: | | 1. Collision, which is practical (expensive but practical) for | SHA-1 today, lets somebody make two documents A and B which | have the same hash. This is only useful if you can fool people | somehow into accepting document B when they think it's document | A because of the hash, for example with digital signatures. | | 2. Pre-image, which is not practical for any hashes you care | about, including MD5. This lets you find the document A given | the hash(A) value. This is very niche, since obviously for | large documents by the pigeonhole principle there will be many | such pre-images and it's impossible to get the "right" one; for | small inputs it can be relevant, sometimes. | | 3. Second Pre-image, likewise not practical.
Given either | document A or hash(A), which you could easily determine from | document A, this lets you produce a new document A' that is | different from A but hash(A') == hash(A). This would be | extremely bad, and is what you'd need to attack real-world | Bittorrent from somebody else. | | Often people say "pre-image" meaning strictly second pre-image; | it's usually clear from context, and a true pre-image attack as | I explained above is only rarely relevant. | | Collision would only let bad guys corrupt their own | purposefully constructed collision bittorrent, which, like, why? | So yes, Bittorrent would only really be in serious trouble if | there was a second pre-image attack. But on the other hand, | don't use broken cryptographic primitives. Attacks only get | better, always. | jakeogh wrote: | Thanks. Is there a name for collision with the same preimage | size? | dfox wrote: | The reason why people mostly mean second pre-image when | saying unqualified "pre-image" is that probably any | imaginable method of reversing a hash function (given | sufficiently long input to the hash) will with overwhelming | probability produce hash input that is different from the | original. | SAI_Peregrinus wrote: | 1.5 Chosen-prefix collision: Given a prefix A, generate two | values AB and AC, where B and C differ but are both prefixed | with A. (AX is A concatenated with X). This exists for SHA1. | It's more powerful than a basic collision where you can't | pick the prefix, but weaker than either type of pre-image. | wyoung2 wrote: | It's worth noting that this attack is a property of the | Merkle-Damgård hash construction, not of SHA-1 | specifically, which means SHA-2 (Git's path forward) is | also vulnerable: | | https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_c | o... | | https://www.reddit.com/r/crypto/comments/44p5jc/eli5_why_ar | e...
| | Fossil uses SHA-3, which has an entirely different | construction, which is not at this time known to have a | similar weakness. SHA-3 is also much newer, with a much | shorter list of known attacks. ___________________________________________________________________ (page generated 2020-02-04 23:00 UTC)