[HN Gopher] A Git repository with 2^28 commits--one for every 7-...
___________________________________________________________________
A Git repository with 2^28 commits--one for every 7-character shorthash

Author : breck
Score  : 59 points
Date   : 2021-07-09 20:39 UTC (2 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| detaro wrote:
| > _The repository has so many commits that git push hangs and
| runs out of memory, presumably because it tries to regenerate a
| packfile on the fly._
|
| Any guesses if this would also happen if you tried to push it
| bit-by-bit? (although you'd of course need reasonably large
| groups of commits still, to not end up with an impossible number
| of pushes)
| sverhagen wrote:
| As the README.md acknowledges, the usefulness may be limited,
| except for the fun in experimentation. What may not be obvious to
| basic Git users is that, while it may take 2^28 commits to fill
| up the entire address space of the 7-character shorthash, they
| are not designed to be unique (they are just the first part of
| the longer, unique hash). As a result, even relatively small
| repositories often already have _some_ duplicate shorthashes. And
| people scripting around their Git shorthashes must be prepared to
| deal with larger shorthashes, like 8-characters, 9, 10, 11,
| whatever it takes to disambiguate. My random Git repository of a
| mere 16865 commits (well, that's just "master") that I'm looking
| at over here, nothing out of the ordinary, needs shorthashes up
| to 11 characters to disambiguate all of them. (Not all the
| clashes may be on the same or main branch.)
| emerged wrote:
| To be clear it's code to generate a repository, not a repository.
| app4soft wrote:
| Yeah.
|
| Linked repo has only 2 commits.[0]
|
| [0] https://github.com/not-an-aardvark/every-git-commit-shorthas...
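Editorial aside on sverhagen's point that small repositories already see duplicate shorthashes: the sketch below works through the arithmetic in Python. It assumes commit hashes behave like uniformly random 28-bit prefixes, which is a modeling assumption and not a statement about how Git itself abbreviates hashes. The function names (`collision_probability`, `commits_for_even_odds`, `min_unique_prefix_length`) are hypothetical helpers invented for this illustration.

```python
import math
from typing import Iterable

HEX_DIGITS = 7
SPACE = 16 ** HEX_DIGITS  # 7 hex digits = 28 bits = 2^28 possible shorthashes


def collision_probability(commits: int, space: int = SPACE) -> float:
    """Exact birthday-problem probability that at least two of `commits`
    uniformly random prefixes coincide (assumption: uniform hashes)."""
    if commits > space:
        return 1.0  # pigeonhole: more commits than prefixes
    # log P(no collision) = sum over k = 1 .. commits-1 of log(1 - k/space)
    log_no_collision = sum(math.log1p(-k / space) for k in range(1, commits))
    return -math.expm1(log_no_collision)  # 1 - exp(log_no_collision)


def commits_for_even_odds(space: int = SPACE) -> int:
    """Smallest number of commits with collision probability >= 50%."""
    threshold = -math.log(2)  # log(0.5)
    log_no_collision = 0.0    # one commit can never collide
    n = 1
    while log_no_collision > threshold:
        if n >= space:        # pigeonhole: space + 1 commits must collide
            return space + 1
        log_no_collision += math.log1p(-n / space)
        n += 1
    return n


def min_unique_prefix_length(hashes: Iterable[str], start: int = HEX_DIGITS) -> int:
    """Shortest prefix length >= `start` at which every hash is distinct.
    Assumes a non-empty collection of equal-length hex strings; this is an
    illustration, not Git's own abbreviation logic."""
    unique = set(hashes)
    longest = max(len(h) for h in unique)
    for length in range(start, longest + 1):
        if len({h[:length] for h in unique}) == len(unique):
            return length
    raise ValueError("hashes cannot be told apart by any prefix")


if __name__ == "__main__":
    print("address space:", SPACE)
    print("commits for 50% collision odds:", commits_for_even_odds())
    print("collision odds at 16865 commits:", collision_probability(16865))
```

Under the uniformity assumption, `commits_for_even_odds()` comes out at 19291, the same figure the project's README gives and that is quoted further down the thread. `min_unique_prefix_length` returns the single length needed to tell every hash in a list apart, which is the sense in which sverhagen's repository "needs shorthashes up to 11 characters".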
| Ericson2314 wrote:
| And those two commits are
| 00000002bdd056473559d2bd0eb835561b3c874b
| 00000002f7c605501165ee5e3c2db20ffe178848
|
| What the hell?!
| surye wrote:
| Hah, that's clever. The author is using their other toy
| research tool/project:
| https://github.com/not-an-aardvark/lucky-commit
| redler wrote:
| It's commit mining.
| [deleted]
| pronoiac wrote:
| Doing the math, 2 to the 28th is around 268 million.
| 988747 wrote:
| I can imagine a big company with a monorepo (e.g. Google)
| reaching that number in a few years.
| whatshisface wrote:
| Generously, there are about 40,000 people at Google who might
| commit to the monorepo. That's only 6,000 or so commits per
| person, a fairly achievable number. Although since they're
| not purposely generating every shorthash, it would take
| significantly longer for the absolute last unique hash to be
| created.
| yellow_lead wrote:
| I wonder what the number of commits is before you need to
| start worrying about 7-character collisions. (Birthday
| problem anyone?)
| charcircuit wrote:
| Don't forget the commits made by programs.
| pronoiac wrote:
| From the "unexpectedly useful for security research" link:
|
| > Due to the birthday problem, any repository that has at
| least 19291 commits is likely to have a pair of ambiguous
| commits somewhere.
| ghoward wrote:
| I can't find the link. :(
|
| Edit: never mind; I am stupid.
| codetrotter wrote:
| Note that collisions in short hashes are not actually a problem
| as such.
|
| > Git can figure out a short, unique abbreviation for your
| SHA-1 values. If you pass --abbrev-commit to the git log
| command, the output will use shorter values but keep them
| unique; it defaults to using seven characters but makes them
| longer if necessary to keep the SHA-1 unambiguous
|
| https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
|
| and also
|
| > Git doesn't really truncate anything, internally everything
| will be handled with the complete value.
|
| https://stackoverflow.com/questions/7128444/how-does-github-...
| IgorPartola wrote:
| I wonder how many tools out there hard-code the 7-character
| length for a git commit hash and would break upon a
| collision.
| wruza wrote:
| This number may be even lower if you take the birthday
| problem into account. I'm not a statistics guy to confirm
| that or to make proper calculations, but I believe it applies
| to this case as well, because the first few bits of a hash are
| like what a birthday is to an otherwise unique person.
|
| https://en.wikipedia.org/wiki/Birthday_problem
| mrkramer wrote:
| So this is basically a proof of work algorithm.
| posnet wrote:
| An assignment in one of my university security courses was to
| mine "gitcoin".
|
| It was a git-based proof of work: the server would only
| accept a pushed commit if it had more leading zeros in its
| hash than the previous commit on that branch.
| distrill wrote:
| That sounds like a ton of fun, and tbh way cooler than
| anything I built in school.
| colejohnson66 wrote:
| Git, but on a Blockchain? /s
| Cerium wrote:
| Git is a Blocktree - a type of directed acyclic graph based
| proof of nothing crypto product that is invulnerable to fork
| based attacks by supporting it out of the box. /s
| gopherbro wrote:
| It seems someone wrote a script for it.
___________________________________________________________________
(page generated 2021-07-09 23:00 UTC)