[HN Gopher] A Git repository with 2^28 commits--one for every 7-...
       ___________________________________________________________________
        
       A Git repository with 2^28 commits--one for every 7-character
       shorthash
        
       Author : breck
       Score  : 59 points
       Date   : 2021-07-09 20:39 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | detaro wrote:
       | > _The repository has so many commits that git push hangs and
       | runs out of memory, presumably because it tries to regenerate a
       | packfile on the fly._
       | 
       | Any guesses if this would also happen if you tried to push it
       | bit-by-bit? (although you'd of course need reasonably large
       | groups of commits still, to not end up with an impossible number
       | of pushes)
        
       | sverhagen wrote:
       | As the README.md acknowledges, the usefulness may be limited,
       | except for the fun in experimentation. What may not be obvious to
       | basic Git users is that, while it may take 2^28 commits to fill
       | up the entire address space of the 7-character shorthash, they
       | are not designed to be unique (they are just the first part of
       | the longer, unique hash). As a result, even relatively small
       | repositories often already have _some_ duplicate shorthashes. And
       | people scripting around their Git shorthashes must be prepared to
       | deal with longer shorthashes, like 8 characters, 9, 10, 11,
       | whatever it takes to disambiguate. My random Git repository of a
       | mere 16865 commits (well, that's just "master") that I'm looking
       | at over here, nothing out of the ordinary, needs shorthashes up
       | to 11 characters to disambiguate all of them. (Not all the
       | clashes may be on the same or main branch.)
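A minimal sketch of the check described above: given a list of full hashes, find the shortest abbreviation length at which no two of them collide. The helper name `min_unique_abbrev` and the stand-in hashes (SHA-1 of a counter) are assumptions for illustration; on a real repository the input would come from something like `git rev-list --all`.

```python
import hashlib


def min_unique_abbrev(hashes, floor=7):
    """Return the shortest prefix length >= floor at which all hashes differ."""
    unique = sorted(set(hashes))
    needed = floor
    # After sorting, the longest shared prefix always occurs between neighbors.
    for a, b in zip(unique, unique[1:]):
        common = 0
        while common < len(a) and common < len(b) and a[common] == b[common]:
            common += 1
        needed = max(needed, common + 1)
    return needed


# Stand-in data: SHA-1 of a counter, not real commit hashes.
fake_hashes = [hashlib.sha1(str(i).encode()).hexdigest() for i in range(16865)]
print(min_unique_abbrev(fake_hashes))
```

Sorting first keeps this at one pass over neighboring pairs instead of comparing every pair of hashes.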
        
       | emerged wrote:
       | To be clear it's code to generate a repository, not a repository.
        
         | app4soft wrote:
         | Yeah.
         | 
         | Linked repo has only 2 commits.[0]
         | 
         | [0] https://github.com/not-an-aardvark/every-git-commit-shorthas...
        
           | Ericson2314 wrote:
           | And those two commits are
           | 00000002bdd056473559d2bd0eb835561b3c874b
           | 00000002f7c605501165ee5e3c2db20ffe178848
           | 
           | What the hell?!
        
             | surye wrote:
             | Hah, that's clever. The author is using their other toy
             | research tool/project:
             | https://github.com/not-an-aardvark/lucky-commit
        
             | redler wrote:
             | It's commit mining.
        
       | pronoiac wrote:
       | Doing the math, 2 to the 28th is around 268 million.
        
         | 988747 wrote:
         | I can imagine a big company with a monorepo (i.e. Google)
         | reaching that number in a few years.
        
           | whatshisface wrote:
           | Generously, there are about 40,000 people at Google who might
           | commit to the monorepo. That's only 6,000 or so commits per
           | person, a fairly achievable number. Although since they're
           | not purposely generating every shorthash, it would take
           | significantly longer for the absolute last unique hash to be
           | created.
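The arithmetic in this subthread can be checked with a short sketch. It assumes commit hashes behave like uniformly random draws, and it uses the coupon-collector estimate (expected draws to see all n values is about n times (ln n + Euler's gamma)) to put a number on "significantly longer". The 40,000 figure is the one quoted above; the variable names are only for illustration.

```python
import math

SPACE = 2 ** 28          # number of distinct 7-hex-character prefixes
COMMITTERS = 40_000      # figure quoted in the comment above

# Commits per person if every commit landed on a new prefix.
per_person = SPACE / COMMITTERS

# Coupon-collector estimate: expected uniformly random draws needed to see
# all SPACE prefixes is SPACE * H(SPACE), with H(n) ~ ln(n) + Euler's gamma.
EULER_GAMMA = 0.5772156649
expected_total = SPACE * (math.log(SPACE) + EULER_GAMMA)

print(f"{SPACE:,} prefixes")
print(f"{per_person:,.0f} commits per person at best")
print(f"{expected_total / SPACE:.1f}x as many commits if hashes are random")
```

Under these assumptions the best case is roughly 6,700 commits per person, and covering every prefix by chance would take about 20 times as many commits as there are prefixes.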
        
             | yellow_lead wrote:
             | I wonder what the number of commits is before you need to
             | start worrying about 7 character collisions. (Birthday
             | problem anyone?)
        
             | charcircuit wrote:
             | Don't forget the commits made by programs.
        
           | pronoiac wrote:
           | From the "unexpectedly useful for security research" link:
           | 
           | > Due to the birthday problem, any repository that has at
           | least 19291 commits is likely to have a pair of ambiguous
           | commits somewhere.
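The quoted threshold can be reproduced with a small birthday-problem calculation. This is a sketch that assumes the 7-character prefixes are uniformly distributed over 2^28 values; the function name `birthday_threshold` is made up for the example.

```python
import math


def birthday_threshold(space, target=0.5):
    """Smallest n for which n uniformly random values drawn from `space`
    possibilities contain a repeat with probability >= target."""
    log_no_collision = 0.0
    log_target = math.log(1.0 - target)
    n = 0
    while log_no_collision > log_target:
        # Adding the (n+1)-th value: it must avoid the n values already drawn.
        log_no_collision += math.log1p(-n / space)
        n += 1
    return n


print(birthday_threshold(2 ** 28))
```

Working in log space avoids multiplying thousands of factors that are all very close to 1. The same function returns 23 for the classic 365-day case.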
        
             | ghoward wrote:
             | I can't find the link. :(
             | 
             | Edit: never mind; I am stupid.
        
           | codetrotter wrote:
           | Note that collisions in short hash are not actually a problem
           | as such.
           | 
           | > Git can figure out a short, unique abbreviation for your
           | SHA-1 values. If you pass --abbrev-commit to the git log
           | command, the output will use shorter values but keep them
           | unique; it defaults to using seven characters but makes them
           | longer if necessary to keep the SHA-1 unambiguous
           | 
           | https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
           | 
           | and also
           | 
           | > Git doesn't really truncate anything, internally everything
           | will be handled with the complete value.
           | 
           | https://stackoverflow.com/questions/7128444/how-does-github-...
        
             | IgorPartola wrote:
             | I wonder how many tools out there hard-code the 7-character
             | length for a git commit hash and would break upon a
             | collision.
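As a rough illustration of what a tool could do instead of assuming 7 characters, the sketch below resolves an abbreviation of any length and reports ambiguity explicitly. `resolve_prefix` and `AmbiguousPrefix` are hypothetical names, not part of Git; the two hashes are the ones quoted earlier in this thread, which happen to share their first 8 characters.

```python
class AmbiguousPrefix(Exception):
    """Raised when an abbreviated hash matches more than one commit."""


def resolve_prefix(prefix, full_hashes):
    """Map an abbreviated hash of any length to the single full hash it names."""
    wanted = prefix.lower()
    matches = [h for h in full_hashes if h.startswith(wanted)]
    if not matches:
        raise KeyError(prefix)
    if len(matches) > 1:
        raise AmbiguousPrefix(f"{prefix} matches {len(matches)} commits")
    return matches[0]


# The two commits quoted earlier in the thread.
COMMITS = [
    "00000002bdd056473559d2bd0eb835561b3c874b",
    "00000002f7c605501165ee5e3c2db20ffe178848",
]

print(resolve_prefix("00000002b", COMMITS))
try:
    resolve_prefix("0000000", COMMITS)
except AmbiguousPrefix as err:
    print("ambiguous:", err)
```

A tool that stores or parses abbreviations would treat the length as variable and handle the ambiguous case rather than silently picking one match.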
        
           | wruza wrote:
           | This number may be even lower if you take the birthday
           | problem into account. I'm not a statistics guy to confirm
           | that or to make proper calculations, but I believe it applies
           | to this case as well, because first few bits of a hash are
           | like what a birthday is to an otherwise unique person.
           | 
           | https://en.wikipedia.org/wiki/Birthday_problem
        
       | mrkramer wrote:
       | So this is basically a proof of work algorithm.
        
         | posnet wrote:
         | An assignment in one of my university security courses was to
         | mine "gitcoin".
         | 
         | It was a git-based proof of work: the server would only accept
         | a pushed commit if its hash had more leading zeros than the
         | previous commit on that branch.
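A toy version of that kind of mining might look like the sketch below. It relies on the fact that Git hashes an object as SHA-1 over a header (`<type> <size>` plus a NUL byte) followed by the body; everything else here is an assumption for illustration: the commit body, the author line, the `nonce` line in the message, and the `mine` helper. It is not the course's actual setup or the linked project's method.

```python
import hashlib
import itertools


def git_object_id(kind, body):
    """SHA-1 that Git assigns to an object: '<kind> <size>', a NUL byte, body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def leading_zeros(hex_digest):
    """Count the leading '0' hex digits of a digest."""
    return len(hex_digest) - len(hex_digest.lstrip("0"))


def mine(template, must_beat):
    """Try nonces until the commit id has more leading zeros than must_beat."""
    for nonce in itertools.count():
        body = template.format(nonce=nonce).encode()
        commit_id = git_object_id("commit", body)
        if leading_zeros(commit_id) > must_beat:
            return nonce, commit_id


# Hypothetical commit body; the tree is Git's well-known empty tree.
TEMPLATE = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author Student <student@example.com> 1625862000 +0000\n"
    "committer Student <student@example.com> 1625862000 +0000\n"
    "\n"
    "mine a coin\n"
    "nonce {nonce}\n"
)

nonce, commit_id = mine(TEMPLATE, must_beat=2)
print(nonce, commit_id)
```

Each extra leading hex zero multiplies the expected work by 16, so beating a commit with two zeros takes about 4,096 attempts on average, while eight zeros (as in the hashes quoted earlier) takes billions.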
        
           | distrill wrote:
           | That sounds like a ton of fun, and tbh way cooler than
           | anything I built in school.
        
         | colejohnson66 wrote:
         | Git, but on a Blockchain? /s
        
           | Cerium wrote:
           | Git is a Blocktree - a type of directed acyclic graph based
           | proof of nothing crypto product that is invulnerable to fork
           | based attacks by supporting it out of the box. /s
        
       | gopherbro wrote:
       | It seems someone wrote a script for it.
        
       ___________________________________________________________________
       (page generated 2021-07-09 23:00 UTC)