[HN Gopher] SHA-1 'fully and practically broken' by new collisio... ___________________________________________________________________ SHA-1 'fully and practically broken' by new collision (2020) Author : ofou Score : 140 points Date : 2021-10-12 19:41 UTC (3 hours ago) (HTM) web link (duo.com) (TXT) w3m dump (duo.com) | dang wrote: | Some past threads: | | _SHA-1 collisions now cost $45k [pdf]_ - | https://news.ycombinator.com/item?id=23350223 - May 2020 (62 | comments) | | _The first chosen-prefix collision for SHA-1_ - | https://news.ycombinator.com/item?id=21979333 - Jan 2020 (352 | comments) | | _Abusing SHA-1 collisions for Chromium updates_ - | https://news.ycombinator.com/item?id=20114809 - June 2019 (36 | comments) | | _A SHA-1 chosen-prefix collision attack_ - | https://news.ycombinator.com/item?id=19907127 - May 2019 (71 | comments) | | _From Collisions to Chosen-Prefix Collisions Application to Full | SHA-1 [pdf]_ - https://news.ycombinator.com/item?id=19878917 - | May 2019 (18 comments) | | _SHA-1 Collision Detection on GitHub.com_ - | https://news.ycombinator.com/item?id=13917990 - March 2017 (90 | comments) | | _Linus' reply on Git and SHA-1 collision_ - | https://news.ycombinator.com/item?id=13719368 - Feb 2017 (262 | comments) | | _When Will We See Collisions for SHA-1? (2012)_ - | https://news.ycombinator.com/item?id=13719079 - Feb 2017 (1 | comment) | | _Announcing the first SHA-1 collision_ - | https://news.ycombinator.com/item?id=13713480 - Feb 2017 (485 | comments) | | _How would Git handle a SHA-1 collision on a blob?_ - | https://news.ycombinator.com/item?id=13547348 - Feb 2017 (5 | comments) | | _Why it's harder to forge a SHA-1 certificate than to find a | SHA-1 collision_ - https://news.ycombinator.com/item?id=10778773 | - Dec 2015 (43 comments) | | _The Cost of Creating Collisions Using SHA-1_ - | https://news.ycombinator.com/item?id=8629906 - Nov 2014 (9 | comments) | | _When Will We See Collisions for SHA-1?_ - | https://news.ycombinator.com/item?id=4618069 - Oct 2012 (51 | comments) | cletus wrote: | Git was created 16 years ago. The impending breakage of SHA-1 was | known even at that time, just like how MD5 had been broken before | it. | | I'm honestly still shocked that updating the hashing algorithm | wasn't built into Git from day one. I really wonder why. Did | people think this wouldn't happen? Were they so in love with the | performance of C/C++ being able to pass around 20 byte hashes on | the stack without worrying about a more complicated structure (eg | a collection of variable length hashes)? | | It just seems like such a massive and foreseeable oversight | that's going to cause pain for some time to come for really no | reason at all. | tantalor wrote: | > going to cause pain | | What kind of attack on a git repo are you worried about? | snovv_crash wrote: | Subrepo hash substitution could enable a supply-chain attack | dec0dedab0de wrote: | Linus posted about it on google plus in 2017. I haven't re-read | it yet, but I remember one of the ideas floating around hn at | the time was to just have two hashes per commit. That is, two | insecure hashes may be secure together for git's purposes. Even | though we can generate collisions for md5, and sha1, it would | be much more difficult to have a file generate an arbitrary | collision for both at the same time. | | Here is a link to Linus's post on the internet archive | | https://web.archive.org/web/20170717192607/https://plus.goog... | | And here are some hn posts around the same time | | https://news.ycombinator.com/item?id=13733481 | | https://news.ycombinator.com/item?id=13719368 | bawolff wrote: | > That is, two insecure hashes may be secure together for | git's purposes. | | Generally that's not true for Merkle-Damgard (e.g. sha1, | sha2) hashes - afaik the difficulty of finding a collision in | the more difficult hash is the same (up to big-oh) as finding | it in both hashes.
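A minimal sketch of the double-hash idea, assuming Python's hashlib; the combined id format here is invented purely for illustration, and bawolff's caveat above applies:

    import hashlib

    def double_digest(data: bytes) -> str:
        # Identify content by two digests at once. Per Joux's
        # multicollision result, concatenating two iterated
        # (Merkle-Damgard) hashes is only about as collision-resistant
        # as the stronger of the two, not the product of both.
        return hashlib.sha1(data).hexdigest() + "-" + hashlib.md5(data).hexdigest()

    print(double_digest(b"hello\n"))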
| umvi wrote: | "We note that classical collisions and chosen-prefix collisions | do not threaten all usages of SHA-1." | ori_b wrote: | For someone to be able to break your repo using sha1 | collisions, they need to be able to commit to it. | | If you don't trust someone, don't let them commit to your repo. | | > _Were they so in love with the performance of C/C++ being | able to pass around 20 byte hashes on the stack without | worrying about a more complicated structure (eg a collection of | variable length hashes)?_ | | The hashes show up everywhere. They're how every object in git | is identified. They make it onto disk. They're not just commit | ids. | | Changing hashes changes the disk format. | lamontcg wrote: | Yeah, I don't think any of these proposed attacks work without | write access to the repo, at which point the game is pretty | much over already. | [deleted] | alerighi wrote: | SHA-1 will still work fine for the purpose of git. It is just | no longer considered secure for cryptographic operations, such | as digital signatures, but that doesn't mean you can't use it | for other purposes, like git does. Using it is still fine and | will always be fine. | | Making the hashing algorithm exchangeable would have introduced | complexity into software that is already complex, and made it | less efficient (one of the reasons git was created was speed | for large projects like the Linux kernel), for no real purpose. | If you want to change the algorithm, given that you will break | compatibility with all existing repositories, tools, and | clients, you make a fork of git, because you are changing too | much. | | I don't see the point of migrating to SHA-256. The collisions | are still very unlikely to happen accidentally, and sure, if | you want to damage a repository you can create one on purpose, | but you could just as well commit some hooks that contain | malware, or alter the git history in any way you want, so | what's the point? | simias wrote: | Honestly I find these rationalizations around the use of | SHA-1 annoying and counter-productive. The rule is simple: | don't use SHA-1. If you already use SHA-1, migrate away from | it. You know that plenty of software out there that | interfaces with git expects the commit hash to be unique. Is | it a security risk? Maybe, maybe not. I don't care to find | out. | | It doesn't matter until it starts mattering. If the Git devs | had done the right thing over a decade ago we wouldn't be | having this discussion. The longer they wait, the more painful | the migration will be. | | SHA-2 was published in 2001, git was released in 2005, and now | we're in 2021 and we're having this discussion again. The | first concrete attack was published in "early 2005" according | to wikipedia, so there's really no excuse. | | Just do it: make a major change where you replace SHA-1 with | SHA-256 and call it a day. It's going to be painful for a few | months and then we'll move on. | | For me these discussions demonstrate the immaturity of
software engineering. In other industries regulators would've | banned the use of SHA-1 and you couldn't get certified if you | used it. | | Do electronic engineers regularly try to argue "well ok, RoHS | says we can't have lead solder in this product, but frankly | for this one it's fine, the casing is completely waterproof | and there are no risks for the customer"? No, they don't. If | the spec says no lead, then either it's no lead or you can't | sell your product. End of story. | | SHA-1 is the lead solder of software engineering. Only | acceptable for military use. | umvi wrote: | > You know that plenty of software out there that | interfaces with git expects the commit hash to be unique | | You also know that plenty of software out there that | interfaces with git has hardcoded assumptions (like, for | example, the assumption that the commit hash will be | exactly 40 characters long). Some tools parse the output of | git log and other commit-bearing commands to make | decisions. Will changing git to SHA-256 create new | unforeseen security risks due to breakage of those tools | (for example, by only grabbing the first 40 characters of a | SHA-256 digest instead of all 64, or by just outright | crashing)? Maybe, maybe not. | | IMO you would create _more_ security risks with the | git integration breakage that would accompany migrating to | sha256 vs. staying with sha1. | | At this point it's almost like you want a new tool/new | command. `git` vs. `git2`. New projects use git2, existing | projects use git (or something like that). Otherwise | confusion and backwards-compatibility breakage will abound. | musicale wrote: | > New projects use git2, existing projects use git (or | something like that). Otherwise confusion and backwards- | compatibility breakage will abound. | | Like uh, python2 and python3? ;-p | nybble41 wrote: | > It is just no longer considered secure for cryptographic | operations, such as digital signatures, but that doesn't mean | you can't use it for other purposes, like git does. | | Git effectively uses its hashes as a component of a digital | signature scheme, in the form of signed tags and signed | commits. The GPG signature covers the commit object, which | identifies the content (tree) by its hash. | madars wrote: | Collisions definitely do matter for git security: many people | pin explicit git hashes for their dependencies, and thus they | can be tricked into running malicious forks. This requires | placing a chosen commit in the git repo (so unlike a second | preimage break it does not mean that you could attack repos | you have no control over), but that's not an unrealistic | threat model overall. | y4mi wrote: | What is the threat model though? | | I don't think it's possible to create a collision that's | also executable code which adds a security hole or | anything. | | So what exactly would they achieve with the collision? | | And how do they push these gigantic files that have the | hash collisions to a server? The upload time would be | significant. | noway421 wrote: | The possible attack is to prepare 2 versions of a commit, | both resulting in the same commit id. Then later on, | after the project is successful/etc, swap out the commit | with the second version, while keeping the other commits | intact. | | Granted, the file that the commit touches would need to | not be touched in other commits. That's not out of the | question in a typical software project - maybe a file in | the utils folder which is only written once and never | changed? | | > I don't think it's possible to create a collision | that's also executable code | | You can include an unreadable binary blob in the commit. | Tweak the blob to find the collision while keeping the | code the way the attack requires.
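One mechanical detail worth making concrete here: git does not hash raw file bytes, it hashes a framed object, so two colliding files only become colliding blobs if the attacker runs the collision search over that framing. A chosen-prefix attack allows exactly that, since the identical header can simply be folded into the chosen prefix. A minimal sketch of how blob ids are derived, assuming Python's hashlib:

    import hashlib

    def git_blob_id(content: bytes) -> str:
        # A git object id is the SHA-1 of "blob <size>\0" plus the
        # content, so any collision must hold over this framed message.
        header = b"blob %d\x00" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    # Matches `git hash-object` for the same bytes:
    print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a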
| [deleted] | DSingularity wrote: | That one is nasty. | OJFord wrote: | A denial of service of sorts? (Something broken and | unusable is delivered instead, as distinct from something | usable but maliciously so.) | | I agree that the chances of ever getting a second | pre-image that not only makes sense, but does so in some | malicious way, may as well be zero, surely? | lvh wrote: | The part about the 2nd pre-image chances being | effectively zero may be true, but the nasty cases | described upthread don't need a 2nd pre-image. (You could | do a lot worse with one, granted!) | [deleted] | madars wrote: | The files don't need to be gigantic. You could, for | example, have a binary config file which in one colliding | version encodes a potentially dangerous debugging | setting, e.g., "allow_unauthenticated_rpc = false", but in | the other has it set to "true". | lvh wrote: | 1) People systematically underestimate the possibility of | creating collisions that still do something | "interesting", like being polyglots (files that can be | interpreted in multiple formats, executable or | otherwise). See PoC||GTFO, specifically anything by Ange | Albertini, for examples; grep | https://github.com/angea/pocorgtfo/blob/master/README.md | for "MD5". I specifically recommend this writeup: | https://github.com/angea/pocorgtfo/blob/master/writeups/19/R... | | 1bis) You can use an existing collision to create new | collisions. People seem to think you need to generate all | the work again from scratch; this is not true. See | PoC||GTFO for proof by example. | | 1cis) The files do not need to be gigantic. See PoC||GTFO | for proof by example. | | 2) You can do the collision in advance, and publish the | malicious version later. What it accomplishes is that the | concept of "this Git hash unambiguously specifies a | revision" no longer works, and one of them can be | malicious. | | 3) The standard should be "obviously safe beyond a | reasonable doubt", not "not obviously unsafe to a | non-expert". By the latter standard, pretty much any random | encryption construction is fine. (The examples I gave use | MD5, not SHA-1, but that's a matter of degrees.) | | 4) SHA-256 was published years before git first was. | whoisburbansky wrote: | What do you mean by the `bis` and `cis` suffixes to your | entry labels? | lvh wrote: | It's just a subdivision; it might as well have said 1a, | 1b... -- but "bis" and "cis/tris" (and possibly tetrakis) | tend to emphasize that they're addenda, not equal points. | whoisburbansky wrote: | Ah, gotcha. I thought I recognized the prefixes from | Organic, but I don't think I've seen them used like this | here. Neat! | [deleted] | occamrazor wrote: | In addition to the attack described in a sibling comment, | when a hashing algorithm has been broken in some way, it | is safe to assume that other more advanced collision | attacks will soon be discovered. | cogman10 wrote: | And, to that point, I'm not really convinced that a | cryptographic hashing algorithm is really a great choice for | git. | | It is nice that it checks off the boxes for even distribution | of hashes, but there's a bunch of other hashing algorithms | that can do that without the performance penalty inherent in | crypto hashes. For example, FNV seems like a good fit for | something like git. | simias wrote: | Is hashing a significant bottleneck in any git deployment? | I'd expect that the diffing would be vastly more expensive, | for instance. | | Besides, don't many modern CPUs support things like SHA-256 | in hardware? | TonyTrapp wrote: | FNV is really cool and has reasonable quality given its | simplicity. But it does have issues (sticky state, and I | think the avalanche characteristics also weren't great) | that are solved by only slightly more complex hashing | algorithms.
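For reference, a sketch of 64-bit FNV-1a (constants are the published FNV parameters) that makes the trade-off concrete: it is fast and reasonably well distributed, but it has no collision resistance at all, so anyone could mint two different objects with the same id on purpose, which matters once ids double as integrity checks:

    FNV64_OFFSET = 0xcbf29ce484222325  # FNV-1a 64-bit offset basis
    FNV64_PRIME = 0x100000001b3        # FNV-1a 64-bit prime

    def fnv1a_64(data: bytes) -> int:
        h = FNV64_OFFSET
        for byte in data:
            h ^= byte                      # fold in the next byte...
            h = (h * FNV64_PRIME) % 2**64  # ...then multiply, mod 2^64
        return h

    print(hex(fnv1a_64(b"hello, world")))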
| lvh wrote: | > SHA-1 will still work fine for the purpose of git. It is | just no longer considered secure for cryptographic | operations, such as digital signatures, but that doesn't mean | you can't use it for other purposes, like git does. Using it | is still fine and will always be fine. | | This suggests git does not rely on its hash for security | properties, which seems false? What is the purpose of pinning | revisions or signing git tags? | elfchief wrote: | Git was not intended (AFAIK) to be cryptographically secure. | Being unsuitable for crypto != being unsuitable for other uses. | wcoenen wrote: | Surely signed tags and signed commits in git are supposed to | be cryptographically secure? [0] | | Doesn't the security of those signatures depend on the | security of the SHA-1 hashes that are being signed? | | [0] https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work | danachow wrote: | No - after all, the most common use of digital signatures | is to sign documents that can be easily tampered with. All | the security is in the signature, not the content being | signed. | lvh wrote: | This makes no sense. Sure: you can generate an X.509 | certificate that says whatever you want, but the point is | that you can validate the signature and see that it's a | forgery. In the case of a hash-addressed system like git, | the problem is that the signature is over a collision, so | it no longer certifies the thing it's supposed to certify. | Git uses the hash as a shorthand for a revision, | including its entire history--so yes, it is using the | hash that way. | | By that logic, would MD5 be fine? MD4? CRC32? | wongarsu wrote: | A digital signature usually signs (~encrypts) a hash of | the content. So asking what exactly signing a commit or | tag entails is a very valid question. I would expect that | signing a tag is only as strong as SHA-1, since a tag is | essentially a label for a commit hash. For signing | commits I have no clue, but would be quite interested as | well.
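To make that concrete: an annotated tag is a small text object, and GPG signs exactly those bytes, so the signature vouches for the commit only through its SHA-1. A sketch of what `git cat-file -p <tag>` prints for a signed tag (the hash, name, and date below are invented for illustration):

    object 9fceb02d0ae598e95dc970b74767f19372d61af8
    type commit
    tag v1.0
    tagger Jane Dev <jane@example.com> 1600000000 +0000

    Release 1.0
    -----BEGIN PGP SIGNATURE-----
    [...]
    -----END PGP SIGNATURE-----

Signed commits are analogous: the signature covers the commit object, whose tree and parent lines are likewise bare SHA-1s, which is nybble41's point above.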
| formerly_proven wrote: | Git is designed around content-addressed storage and while you | _can_ cater for hash migrations, it gets pretty messy | design-wise. Git's core data structures were also designed | really quickly to patch a very urgent need of the kernel | hackers. I doubt it has anything to do with being in love with | passing 20 byte strings on the stack. The fine git folks have | produced a rather detailed document about the hash migration | (https://git-scm.com/docs/hash-function-transition/) and it is | not particularly simple to do this. | john_alan wrote: | > Our work shows that SHA-1 is now fully and practically broken | for use in digital signatures | | Yet 2nd preimage resistance is intact. Zzz. | r0f1 wrote: | Obligatory link: https://shattered.io/ | nikeee wrote: | The article states that the collision found is something | different from the one found by Google et al. | | It's a chosen-prefix attack. | TravisHusky wrote: | Of course it has its own website ;) | fanf2 wrote: | That was an earlier break; this article is about | https://sha-mbles.github.io/ | ofou wrote: | For the ones interested, you can use SHA-256 today. Warning: | still in experimental mode. # in the repository's config file | [extensions] | objectFormat = sha256
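A note on the config above: the object format is fixed when a repository is created, so the cleanest way to experiment is to let git set the extension itself (assuming git 2.29 or later, where the option landed); flipping the setting in an existing SHA-1 repository will not convert its objects:

    $ git init --object-format=sha256 my-sha256-repo

As of this thread, SHA-256 repositories also cannot push to or fetch from SHA-1 ones; that interop is part of the transition plan linked elsewhere in the thread.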
| musicale wrote: | Old collision (2020). ;-) | bjarneh wrote: | > The technique that the researchers developed is quite complex | and required two months of computations on 900 individual GPUs. | motohagiography wrote: | I only half joke when I ask, when you break a cryptographic | hash, does it mean that now it's just a really amazing | compression algorithm, but with a very heavy compute | requirement? The non-joking half is speculating what data | compression in a post-quantum compute world looks like. | fwip wrote: | No, because reversing the hash has infinite possible answers | (well, "very many" for bounded input size). | | You can't decompress 32 bytes into 1GB because you don't know | which of the 1GB-sized answers is the intended one. | ganzuul wrote: | Natural data is easily identifiable. | dTal wrote: | No it's not. You can't distinguish the complete works of | Shakespeare from the complete works of Shakespeare with | the words changed a bit, or switching to Harry Potter for | a bit in the middle, or a completely dazzling and | original work of fiction. Any hash might generate all | three of these, and essentially boundless other works. | It's a library of Babel. | ganzuul wrote: | Your examples really fall out of the scope of the | premise. | fwip wrote: | No, they're 100% correct. | contravariant wrote: | Only if you count all natural-looking data, of which there | is way more than you could possibly imagine. | | It is of course likely that humanity hasn't yet created | more than 2^100 (~10^30) files, so in _theory_ given a | registry of all files in existence you might be able to | identify it by its hash. However, while this is simple, | it's definitely not easy. | ganzuul wrote: | I can imagine the output of program space. That's pretty | big. All possible universes in fact. | | It's an issue of probability and bins. While natural data | is infinite, there is vastly more unnatural data. At some | point you have enough metadata (e.g. natural vs. random) | to know that the original data came from Earth to pick | out the right needle from the needle stack. | | Unless the data is from a completely alien source, we are | close enough to that source for our heuristics to be of a | manageable size. | contravariant wrote: | It's somewhat irrelevant how much unnatural data there | is. The point is that there's way more plausible data (or | sentences, if we're restricting ourselves to natural | language) than there are possible 160, 256, or 512 bit | hashes. | | Unless of course you enumerate all of human | communication, but like I said that doesn't count as | 'easy'. | fwip wrote: | Nope, not on these scales. | | Say you have a 32 byte hash, that's 2^(32*8) possible | values, right? Now say your input is 128 bytes, that | means that (on average) each 32 byte hash maps to | 2^(96*8) values. | | "Natural looking" data will occur an astronomical number | of times in a search space that large - which is, again, | only 96 bytes larger than your hash. | __s wrote: | That's because natural data has low entropy. | | But let's say every paragraph only offers 1 bit of | entropy. Then a 160 bit hash gives you fuzzy accuracy up | to 160 paragraphs. After that you'll have to extend the | hash with hints to guide which sequence of paragraphs | you're looking for, & hints for where the typos are. | | ofc, 100x compression of English text doesn't require | this amount of compute to decompress: | https://en.wikipedia.org/wiki/Hutter_Prize | | It's also impractical since most compression use cases | want to put the work in compression, & have decompression | be straightforward. | | edit: 100x was a misreading, it's currently 8.6x: | http://prize.hutter1.net/#prev | ganzuul wrote: | This is very abstract, but I believe that since both | input and output are very close in program space compared | to the total size of PS, the heuristics to map between | them are going to be of a manageable size. This is based | on an intuition about a single source for all activity in | this universe. | AnimalMuppet wrote: | "Paragraph" is highly optimistic. IIRC, each English | _character_ has about 1 bit of entropy. | __s wrote: | Shannon estimated 0.6 to 1.3: | https://cs.fit.edu/~mmahoney/dissertation/entropy1.html | | For the sake of argument I figured I'd be highly | optimistic. The linked prize shows practical evidence of | 1GB of Wikipedia being compressed below 1 bit per | character _(& that's with a limit of 50 hours cpu time, | 10GB memory, & 100GB disk)_ | motohagiography wrote: | This is what made it a funny thought experiment to me. I | was doing some space-related stuff a while back and when | you're dealing with something as small as 100mb, and you | add the constraint of up to a light-year or more of | latency, which makes error correction immensely costly, | having a quantum processor receiver find the collision of a | unique hash could be faster than transmitting the data with | errors. | | There's probably a distance where the latency in light | years is longer than it takes for a quantum processor with | some shortcuts to exhaust the search space for the | collision on the hash - and potential hashing algorithms | that contain information and hints about a hyperplane that | generates the original data over the field of | possibilities. To a quantum computer with sufficient qubits | at a long enough latency distance away, the 32-bit hash of | a 2048-bit RSA key may "arrive" faster than transmitting the | whole key. | | It feels like a fundamental misunderstanding of something | in communication theory, but this was the thinking that | prompted it. | mypalmike wrote: | Using the pigeonhole principle, consider how many 100 | megabit values must, on average, hash to the same 32 bit | value. The answer is a huge number (I could be wrong but | it seems to me it might be 2 ^ (100,000,000 - 32)?). This | is a bit of a paradox in cryptographic hashing, where | finding a collision is difficult, but the number of | collisions is enormous. | akomtu wrote: | Not so simple. A hash corresponds to infinitely many | messages, but how many of them are in the ASCII charset and | under 1KB long? It may happen that each hash has a unique | message within these constraints. | hyperpape wrote: | Ignoring punctuation, capitalization and spaces, you have | | (26 ^ 1024) / (2 ^ 160) ascii texts, which is too large | for python to do the division.
| | Thinking harder, that's: | | >>> (13 ** 160) * (26 ** (1024 - 160)) 58601538220582696 072767254440146387121221583753570902541264313546485610004 750780866792754920589068132679848156271786567137191486170 703327101040110575724309837442235432328033598945697712388 381451978878967640960102861159559384620162207350857311340 050840762653291815079540852172190063832289697996458447828 737045986675145162241398406780211500684400272119809812611 929904403730528597187389968557044373171358434578669341421 589594031073341308966343982813670005358851607748477530217 997935791824321484740631022470362180786257680155724965742 143671853261978124815484529745096216129616265428595187946 218158257908697520717067407972750772493927919233876373156 233168124018474649093065762508852701249697457996541217284 731525708907010521446628919717790946317667295418177373934 569079380730731418728467142214924963037213836468723407639 463340501537517771097304361708288408926134607971881944851 903293709136443956469032501471787137653780375970169583703 004025580731889262167424195839803018053029467192083418365 771596772896857995516305314630974667718236900515720973033 360569178506327940782971768109891923840004914279569619570 809054106685577577395046255601816801966032983011243478952 485788667076648479153854024730784994930592675717244288051 727831378594097970916753614065993732935733615600050932039 565604760497376113470099235874052260244552013678784657526 956911961147614534279815393143753046456632251016766323098 5347176517861376 | | If you restrict it to words, and assume that there are only | 1000 words in the English language, averaging 5 letters | each, you get: | | (1000 ^ 200) / (2 ^ 160) which becomes: | | >>> (500 ** 160) * (1000 ** 40) 6842277657836020854119773 355907793609766904013068924666782559979930620520927053718 196475529111921787261962890625000000000000000000000000000 000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000 00000000000000 | | There will be massive numbers of realistic texts that | match any given hash.
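As an aside, Python handles that first division fine if you use exact integer floor division; it is only true division (which goes through a float) that overflows. A quick check, with output truncated here:

    >>> (26 ** 1024) // (2 ** 160)
    58601538220582696072767254440146...
    >>> (26 ** 1024) / (2 ** 160)
    Traceback (most recent call last):
      ...
    OverflowError: integer division result too large for a float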
| dTal wrote: | The pigeonhole principle forbids it, since the hash | itself is an ASCII message under 1KB long. Even within | your message constraints, the message can contain | any possible hash, and more besides. Therefore there are | more valid messages than hashes. | FartyMcFarter wrote: | Compression algorithms must encode data in a reversible way, | which hash functions can't do in general. | | For example, SHA-1 outputs a 160 bit hash, which means some | input of 161 or more bits will definitely have the same hash | as some other input of the same size. Even smaller inputs may | have collisions too. | [deleted] | tomsmeding wrote: | These hashes, regardless of their actual cryptographic | strength, are intended to be indistinguishable from a random | generator keyed with the input, kind of. Assuming that they | succeed in that, hashes should be quite well distributed. | That means that for each n-bit hash output, there | will be about two (n+1)-bit input strings that produce that | hash, about four (n+2)-bit input strings, eight (n+3)-bit | input strings, etc. So while there are probably few _ASCII_ | input strings that map to one particular hash, for example, | there will be loads and loads of arbitrary bitstrings that | map to that hash. | | Hashes are very bad compressors. :) | kazinator wrote: | The compression joke evokes a misconception (as STEM jokes | often do) which, in this case, is that breaking a hash means | finding a way to recover the full text of a multi-megabyte | document from a 100-something-bit hash. | | Whereas all it means is that it's become feasible for someone | to produce a fake document with the same hash as the genuine | one; the attack _depends_ on the existence of collisions, | which depend on the hash being one-way, such that an infinite | set of documents could be "recovered" from the hash. | | Most of the documents you could "decompress" from a hash are | going to be gibberish, but some of them are going to be | legit. If a hash is "broken", that means it's somehow unduly | easy to search that ocean of gibberish collisions to find a | chosen text which goes to a certain desired hash. | | Nothing quantum can redefine a function that isn't one-to-one | and onto into one that is. The hashing function remains ever | the mathematical object that it is: a one-way function. But | calculations of or searches through that function can be | faster. | | In a post-quantum world, pre-quantum crypto and hashing could | well turn out to be "for shit" as such, but not due to ruses | like suddenly reversing irreversible functions. | | Not saying that the joke isn't funny! We can imagine some | hard-core computer science or math sitcom in which the fall | characters say things like this, and only the very narrow | target audience gets it. | lallysingh wrote: | Or roughly 1.3M GPU hours. Less as GPUs get upgraded. Not sure | how many AWS F1 (FPGA) hours it would take. | wongarsu wrote: | So somewhere between $300,000 (running consumer GPUs cheaply) | and $30 million (paying AWS on-demand prices for Tesla V100). | | Even the upper bound for that seems well worth it for | well-funded attackers if the target is juicy enough. | asdfman123 wrote: | I wonder how much money in Bitcoin they would have made if they | had used those hashes to mine instead. | | Speaking of which, it would be funny if they were pretending to | do security research but added a secret backdoor to mine | bitcoin instead, somehow exporting those hashes or using the | partial results of SHA-1 calculations (BTC isn't SHA-1). | | I'm just joking, but I wonder if that's possible. If anyone is | Machiavellian enough for that, it's security researchers. | rgovostes wrote: | (Jan 7, 2020) | ImJasonH wrote: | Reminder that GitHub has blocked Git commit collisions since | 2017, and as far as anybody is aware hasn't seen one in the | wild. | | https://github.blog/2017-03-20-sha-1-collision-detection-on-... | tantalor wrote: | > A higher probability exists that every member of your | programming team will be attacked and killed by wolves in | unrelated incidents on the same night. | | - Scott Chacon | asdfman123 wrote: | That's even true if you make your developers commute into | Yellowstone National Park by ski every day | bawolff wrote: | This is stupid. The probability of any attack happening at | random is close to zero. E.g. the probability of a buffer | overflowing just right to give you a shell through random | chance is less than being eaten by wolves.
| | The question is about the risk of someone intentionally | performing the attack, not the probability that it will | accidentally happen at random. | api wrote: | Random collisions in 160-bit space are incredibly unlikely. | This is talking about intentional collisions, and means that | it's entirely feasible for someone with significant compute | power to create a git commit that has the exact same hash as | another git commit. This could allow someone to silently modify | a git commit history to e.g. inject malware or a known "bug" | into a piece of software. The modified repository would be | indistinguishable if you're only using git hashes. | | Git uses SHA-1 for unique identifiers, which is technically | okay as long as they are not considered secure. If git were | designed today it would probably use SHA2 or SHA3, but it's | probably not going to change due to the massive install base. | | Edit: anyone know if git's PGP signing feature creates a larger | hash of the data in the repo? If not, maybe git should add a | feature where signing is done after the computation of a larger | hash such as SHA-512 over all commits since the previous | signature. | asdfman123 wrote: | > Random collisions in 160-bit space are incredibly unlikely | | Just for fun: to get a 5% chance of a hash collision between | ANY two numbers in a 160 bit space, you'd have to generate | 3.9e23 hashes. | | So you'd have to generate 1000 hashes per second for _12 | trillion years_. | | Formula: | | n = sqrt(2 * 2^160 * ln(1/(1-0.05))) | | https://en.wikipedia.org/wiki/Birthday_problem#Probability_o...
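The same arithmetic as a quick Python check (0.05 is the 5% target from the comment above):

    from math import log, sqrt

    p, bits = 0.05, 160
    n = sqrt(2 * 2**bits * log(1 / (1 - p)))
    print(f"{n:.2e} hashes")                        # ~3.87e+23
    years = n / 1000 / (3600 * 24 * 365.25)
    print(f"{years:.2e} years at 1000 hashes/sec")  # ~1.23e+13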
| tialaramex wrote: | The defence used by GitHub specifically defends _against_ | these intentional collisions, not some mirage of random | collisions. | | Basically you collide a hash like SHA-1 or MD5 by getting it | into a state where transitions don't twiddle as many bits, | and then smashing the remaining bits by brute force trial. | But such states are _weird_, so from inside the hash | algorithm you can notice "Huh, this is that weird state I | care about" and flag that, at the cost of making the algorithm | a little slower. The tweaked SHA1 code is publicly available. | | If you're thinking "Oh! I should rip out our safe SHA256 code | and use this unsafe but then retroactively safer SHA1": no. | Don't do that. SHA-256 is safer and faster. This is an | emergency patch for people for whom apparently 20 years' | notice wasn't enough warning. | | In theory the known way to do this isn't the only way, but | we have reassuring evidence for MD5 that independent forces | (probably the NSA) who have every reason to choose a | different way to attack the hash to avoid detection _do_ | trigger the same weird states, even though they're spending | eye-watering sums of money to break hashes themselves, not | just copy-pasting a result from a published paper. | [deleted] | noway421 wrote: | From a practical point of view, how would injecting malware | happen? If you're trying to insert a malicious diff somewhere | deep in the git history, you would need to recompute all the | other commits after the injected commit - which would most | certainly change their commit ids too if they are touching | the same file. When other commit ids change, the malicious | change becomes detectable. | | There's also the case for auditing: force pushing into an | existing repo triggers an event in GitHub and is logged. | While the logging event can be missed, it leaves a paper | trail. | | With things like reproducible builds, this also becomes | harder. Distributing (through a fork, or putting it up on a | website like mytotallylegitgitrepos.com) source code | which builds into a binary that doesn't match the upstream | hash is suspicious. | AnimalMuppet wrote: | Can you create an intentional collision, and _also_ have the | counterfeit have compilable code? (Or, in other | circumstances, intelligible English text?) | jfrunyon wrote: | Yes, just like you can create an intentional collision, and | _also_ have the counterfeit be a PDF. | [https://shattered.io/] | [deleted] | throw_away wrote: | Looks like there's a migration plan for git to SHA-256: | | https://git-scm.com/docs/hash-function-transition/ | jfrunyon wrote: | https://stackoverflow.com/questions/23584990/what-data-is-be... | says git signing just signs the commit hash (plus | metadata: committer, commit message, etc). | a_t48 wrote: | As far as I know it only signs individual commits. | oconnore wrote: | I was surprised that no one suggested truncating SHA-256 to | 160 bits (the same as for SHA2-256/224, or SHA2-512/256). The | attacks on SHA-1 are not directly based on the length of the | hash; they are based on weaknesses in the algorithm. | | Even attacking SHA2-256/128 would be quite difficult as I | understand it, even though it's the same length as MD5. | | Truncated hashes also of course have the great property that | they mitigate length extension attacks on Merkle-Damgard | constructions.
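A minimal sketch of that suggestion, assuming Python's hashlib (illustration only, not git's actual migration plan):

    import hashlib

    def sha256_160(data: bytes) -> str:
        # Keep the familiar 160-bit / 40-hex-char id width, but with
        # SHA-256's security margin. Truncation also withholds part of
        # the final internal state, which is why truncated variants
        # like SHA-512/256 resist length-extension attacks.
        return hashlib.sha256(data).digest()[:20].hex()

    print(sha256_160(b"hello\n"))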
| adrian_b wrote: | It is not yet possible "to create a git commit that has the | exact same hash as another git commit" in the sense that if | someone else has already made a commit, you can make another | commit with the same hash. | | What is possible now is something that is much easier: if you | have enough money and time, you can create 2 commits with the | same hash. They start with different parts, which may be | chosen arbitrarily, and then have to include some computed | parts that ensure the hashes will match; those will be | gibberish that might be disguised in some constants or a | binary blob, if possible. | | Then you can commit one of them and presumably you can later | substitute the initial commit with the other one without | anybody being able to detect the substitution. | outworlder wrote: | > presumably you can later substitute the initial commit | with the other one without anybody being able to detect the | substitution. | | How? Which operation would be involved? Will it not show | anywhere else (reflog)? | adrian_b wrote: | Because such an operation is unlikely to succeed even if | the hashes match, replacing SHA-1 was not a high priority | for Git. | noway421 wrote: | `git push --force` I assume. In GitHub, this will leave a | trail of events though - Webhook events and auditing. | DSMan195276 wrote: | I would think it's not that simple, because `git push | --force` doesn't do anything if it thinks the histories | are the same, and in this case we've created a history | that appears to be the same. You'd likely need a custom | git client (which isn't a problem), but I don't know | enough about the internals of a git push to know if the | server would even accept objects if they match hashes of | objects it already has (it may just go "I don't need | that" and ignore it, because what's the point in doing | anything with it?). Presumably it would depend on the | exact server implementation whether you could get it to | replace an existing object with a "new" one which is | actually the same hash, but frankly I think that's | unlikely, because it would be pointless work from the view | of the server. If it does happen I'm not sure what | auditing you would actually be able to see; webhooks and | such might not be triggered because the history didn't | actually "change". | | What you could do, however, is just host it yourself | somewhere else, say put it on a fork. Or if you have | access to the actual repository hosting the original | version, you could just manually replace it yourself. Git | clients aren't going to just automatically pull the "new" | version, though, so you'll have some combo of people with | the old and people with the new, and it gets a little | messier from there. | sverhagen wrote: | If you can force push, why would you then not push a | different commit before pushing your updated commit with | the same, original hash, or does that also not work? | formerly_proven wrote: | The reflog shows changes to references. References are | hashes. Git is CAS, and uses the CAS assumption: H(m1) = | H(m2) <=> m1 = m2. | outworlder wrote: | > This could allow someone to silently modify a git commit | history to e.g. inject malware or a known "bug" into a piece | of software. | | You need a collision. You also need it to be syntactically | correct. You need it to not raise any red flags if you are | contributing a patch. And ultimately you need it to do what | you want. | | That's a pretty tall order. | api wrote: | All you need is one variable sufficiently large that you | can change until you find a collision, such as a comment | (which just about all languages allow) or a meaningless | number. | | You could even vary whitespace until it fits, like spaces | at the end of lines. | ImJasonH wrote: | Yep. GitHub is saying they would block an object that looked | like it was crafted to produce a collision using SHAttered, | and hasn't seen it, intentionally or otherwise, in the wild. | lifthrasiir wrote: | The chosen-prefix collision with a complexity of 2^63.4 (2020). | akomtu wrote: | Looks like they needed 1.5 million GPU hours to find a | collision with a random hash. | smegcicle wrote: | that's it, it's over, git is finished | [deleted] ___________________________________________________________________ (page generated 2021-10-12 23:00 UTC)