[HN Gopher] Continuous Unix commit history from 1970 until today
       ___________________________________________________________________
        
       Continuous Unix commit history from 1970 until today
        
       Author : FrankyHollywood
       Score  : 188 points
       Date   : 2022-06-16 14:04 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | danschuller wrote:
       | We have all this commit data at scale, it really feels like there
       | are interesting stories or lessons that could be extracted from
       | them.
       | 
       | There's kind of the obvious operational stuff like: What are the
       | properties of commits that introduce bugs compared to those that
       | don't. Which type of commits are rarely changed and which are
       | more likely to be changed over time. But what I'd find even more
       | interesting is some insight into how we solve problems and how
       | well we're able to solve them. I guess part of the puzzle is
       | missing - the external requirements / environment that give rise
       | to some number of the commits.
        
         | DSpinellis wrote:
         | There is a series of conferences MSR -- Mining Software
         | Repositories -- with research papers looking at such questions.
         | http://www.msrconf.org/ In fact, I presented this work in the
         | 2015 MSR conference.
        
       | vandahm wrote:
       | You don't see this every day:
       | 
       | https://github.com/dspinellis/unix-history-repo/blob/Researc...
       | 
       | Is this B, or is it BCPL? What would have compiled this code back
       | in the day?
        
         | marcodiego wrote:
         | They had "auto" vars in 1970. WG14, the ISO working group that
         | maintains the C programming language specification, has just
         | recently discussed acceptance of __auto_type.
         | 
         | EDIT: oops, the "auto" here means automatic allocation.
        
         | hoten wrote:
         | very weird that two characters - $( and $) - were used before {
         | and }
         | 
         | did old keyboards not have curly braces or what?
        
           | kps wrote:
           | {} were added to the 1967 revision of ASCII, along with `|~
           | and lower case. (EBCDIC never got them in the base character
           | set, only in alternate 'code pages'.)
        
         | pm215 wrote:
         | Wikipedia's article on B says that BCPL used := for assignment
         | and = for equality tests, whereas B used = for assignment and
         | == for equality. Assuming that's correct, this must be B code.
        
         | projektfu wrote:
         | It's B. BCPL has "LET MAIN() BE $(..." instead of "main $(...".
         | 
         | Running B was a challenge on the PDP-7 but easier on the
         | PDP-11, apparently, because of the increase of memory size. The
         | linked document has an interesting history about compiling B to
         | threaded code, a form of interpreted code, and then to machine
         | language. B never really made the jump to a full-fledged
         | citizen because it quickly got replaced by C, although BCPL was
         | popular for a long time.
         | 
         | https://www.bell-labs.com/usr/dmr/www/chist.html
        
         | Erlangen wrote:
         | So _auto_ is used as a keyword here. Maybe C inherits this
         | never-used auto from B?
        
           | veltas wrote:
           | auto stands for 'automatic', because such variables are
           | automatically allocated for each function invocation. In C it
           | became redundant because base types were added, and so the
           | base type could start the definition (auto was still
           | permitted with default base type of int until C99 I think).
           | auto in B is a bit like 'let', it starts a declaration, along
           | with 'extrn'.
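
To make the "automatic" meaning concrete, here is a minimal C sketch. The function names next_auto and next_static are hypothetical, chosen only for illustration. It assumes a compiler that accepts `auto` as a storage-class specifier next to a type, which holds for C through C17.

```c
/* `auto` as a storage-class specifier: the variable is allocated afresh
 * on every invocation. Both function names are hypothetical. */
int next_auto(void)
{
    auto int n = 0;   /* equivalent to plain `int n = 0;` in C */
    n = n + 1;
    return n;         /* n does not survive the call, so this is always 1 */
}

/* For contrast, a static variable keeps its value between calls. */
int next_static(void)
{
    static int n = 0; /* initialized once, persists across invocations */
    n = n + 1;
    return n;
}
```

Calling next_auto() twice returns 1 both times, while next_static() returns 1 and then 2.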
        
         | mftb wrote:
         | Yea, I have to say, to me, this is cool. Glad to see this sort
         | of history being preserved.
        
         | judge2020 wrote:
         | Is that truly from 1970? For example, that commit's grandparent
         | seems to have been specifically crafted to use "Date: Thu, 1
         | Jan 1970 00:00:00 +0000"
         | https://github.com/dspinellis/unix-history-repo/commit/185f8....
        
           | anyfoo wrote:
           | That's 0 in Unix epoch time (guess why!), so seems more like
           | a missing timestamp than a crafted one. The fact that the
           | linked file does not have a 0 timestamp, but a slightly later
           | one, suggests it's valid, or at least intended to be valid.
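
The claim that this date is simply timestamp 0 can be checked with a minimal C sketch. It assumes a platform where time_t counts seconds since the Unix epoch, as on POSIX systems, and the default "C" locale. The helper name format_unix_time is hypothetical and is not part of git or libc.

```c
#include <stddef.h>
#include <time.h>

/* Format a Unix timestamp as a UTC date string in an RFC 2822-like layout.
 * format_unix_time is a hypothetical helper, not a git or libc function.
 * Returns the number of characters written, or 0 on failure. */
size_t format_unix_time(time_t seconds, char *buf, size_t buflen)
{
    struct tm *utc = gmtime(&seconds);   /* broken-down time in UTC */
    if (utc == NULL)
        return 0;
    /* The zone is hard-coded because gmtime always yields UTC. */
    return strftime(buf, buflen, "%a, %d %b %Y %H:%M:%S +0000", utc);
}
```

With seconds set to 0 this produces "Thu, 01 Jan 1970 00:00:00 +0000", which matches the commit's date apart from the zero-padded day.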
        
             | Nition wrote:
             | I recall that in A Deepness in the Sky by Vernor Vinge, a
             | space sci-fi set in the far future, they're still using
             | Unix time underneath many many layers of abstractions, and
             | with their cultural context they guess that humanity must
             | have set it to start with the moment mankind first
             | travelled into space to land on the Moon.
        
               | anyfoo wrote:
               | Hah, plausible. Not far off timewise, and yet totally
               | wrong, but understandable how such a conclusion could be
               | made.
        
         | swatcoder wrote:
         | I don't know, but I love how clearly and concisely it expresses
         | what would later become ubiquitous as do-while and continue.
         | 
         | That's poetry. Nice find.
        
           | stingraycharles wrote:
           | I love how thin the layer above assembly is: without knowing
           | B, is my interpretation correct that this function
           | effectively "inherits" the stack of the calling function? In
           | other words, rather than passing function arguments and letting
           | the compiler deal with it, you're supposed to push the string
           | you want to lcase onto the top of the stack?
           | 
           | Reminds me a lot of writing my own compiler/assembler in
           | university, where it's expected that all this happens
           | automatically nowadays.
        
             | anyfoo wrote:
             | Hmm, don't think so. The function does not operate on a
             | string, it seems to read a character using read() and write
             | it back, transformed, using write(). Given that the
             | function is named main, it's probably the top level
             | function anyway (from the programmer's point of view, often
             | the OS actually calls into a different function that is
             | part of the language runtime, e.g. _start, which in turn
             | calls main eventually, but that is usually hidden from the
             | programmer).
        
             | messe wrote:
             | No, that's not correct. It reads the string from standard
             | input. A C translation would look like this:
             |     main()
             |     {
             |         int ch;
             |         while ((ch = read()) != 4) {
             |             if (ch > 0100 && ch < 0133)
             |                 ch = ch + 040;
             |             if (ch == 015) continue;
             |             if (ch == 014) continue;
             |             if (ch == 011) {
             |                 ch = 040040;
             |                 write(040040);
             |                 write(040040);
             |             }
             |             write(ch);
             |         }
             |     }
             | 
             | A more modern C version would look like:
             |     #include <stdio.h>
             | 
             |     int
             |     main(void)
             |     {
             |         int ch;
             |         while ((ch = getchar()) != -1) {
             |             if (ch > 0100 && ch < 0133)
             |                 ch = ch + 040;
             |             if (ch == 015) continue;
             |             if (ch == 014) continue;
             |             // No need to handle tabstop specially
             |             putchar(ch);
             |         }
             |     }
        
       | justsomeguy123 wrote:
       | Gource Visualization video which points to
       | https://www.youtube.com/watch?v=S7JB0mhrGCQ does not work
       | anymore.
       | 
       | > Video unavailable
       | > This video is no longer available because the YouTube account
       | > associated with this video has been terminated.
        
         | danuker wrote:
         | We need to solve this problem.
         | 
         | YouTube is free to delete any account, even just to cut costs.
        
           | wolverine876 wrote:
           | I assume Github, the host of the OP, can do the same. How
           | many people have entrusted their life's work to it?
        
           | cmeacham98 wrote:
           | I'm not sure what the problem to be solved here is. It
           | doesn't seem reasonable to force YouTube (or any other free
           | video host) to indefinitely store and host content.
           | 
           | If you want something to stay around on the internet it has
           | to take up space on somebody's drive and bandwidth on
           | somebody's network connection - and for sufficiently large
           | content like video you're going to have to do that yourself
           | or convince/pay someone you trust to do so on your behalf.
        
       | roansh wrote:
       | How would you feel if your commits become publicly available for
       | everyone to see forever?
        
         | pavon wrote:
         | That ship sailed nearly half a century ago. All of this source
         | code was previously licensed to research universities starting
         | in 1975. The earlier releases weren't under FLOSS licenses as
         | we know them today, but they were licensed with the intent that
         | researchers would be reading, learning from, and modifying the
         | code. And they did, creating later BSD Unix releases whose code
         | was shared more widely under more permissive licenses.
         | 
         | Finally, the people who created this repo are some of the
         | primary authors of the code. They wanted this to be in the
         | open.
        
         | jrochkind1 wrote:
         | Really proud to be a part of history.
        
         | e40 wrote:
         | Isn't it cool? I mean, being in the history of a project like
         | this... it could be around long after we are gone.
        
         | alar44 wrote:
         | Fine. You?
        
         | duxup wrote:
         | I hope everyone is ok with cursing....
        
         | ARandomerDude wrote:
         | This is the point of GitHub. Also Unix was(/is) a masterwork of
         | craftsmanship. Struggling to see a problem here.
        
       | projektfu wrote:
       | I love Spinellis' work on teaching reading of code.
        
         | PAPPPmAc wrote:
         | Diomidis Spinellis' "Code Reading: The Open Source Perspective"
         | is a thing I've wanted but didn't know existed, browsing it now
         | to hopefully recommend, thanks for the pointer.
         | 
         | I work with computer engineering students and often tell them
         | that reading more code would be good for them but have never
         | had a great generic but concrete suggestion for how to get
         | there.
         | 
         | The second best programming class I took in college was a
         | graduate elective and the _only_ code-reading-based course I
         | took or knew of being offered: a guided safari in the Linux
         | kernel sources where we had to make targeted changes for the
         | assignments. FTR, the best programming class was set up as "new
         | language in a different paradigm every few weeks, write one
         | small program that suits it and one small program that
         | doesn't," not incidentally taught by the same person (
         | https://en.wikipedia.org/wiki/Raphael_Finkel ).
        
       | dgrin91 wrote:
       | I like how Github shows it as infinity commits
        
         | deathanatos wrote:
         | What's up with that? There only seem to be 4, on HEAD?
        
           | caslon wrote:
           | Check the other branches.
        
             | deathanatos wrote:
             | I saw the other branches when I made the comment.
             | 
             | The commit count is -- usually -- the commit count from the
             | currently selected ref.
             | 
             | E.g., on a sample repo, "master" displays as 29,474
             | commits. "master^" displays as 29,473.
        
             | kevincox wrote:
             | I always expected that the commit count was for that
             | branch. I guess it is global?
        
         | ollien wrote:
         | Yeah, is that a bug? lol
        
           | mywittyname wrote:
           | Sounds like an overflow bug prevention mechanism.
           | 
           | There are an infinite number of infinities, so surely one of
           | them is the maximum possible commits in github.
        
             | kps wrote:
             | Git runs into problems with more than 2^160 commits in a
             | repository.
        
       | ChrisMarshallNY wrote:
       | That's a _lot_ of work!
       | 
       | A true labor of love.
       | 
       | Thanks!
        
       | ninefathom wrote:
       | Anybody feel brave enough to try merging in SVR4?
       | 
       | https://github.com/dspinellis/unix-history-repo/blob/Researc...
       | 
       | https://github.com/illumos/illumos-gate/blob/9ecd05bdc59e4a1...
        
       | mprovost wrote:
       | This repo has been super useful as I've been writing a book that
       | teaches Rust by rewriting classic Unix utilities. I settled on
       | using the 4.4 BSD source as a base but having the whole history
       | available has been really interesting. Recently I came across a
       | bug in the 4.4 version of cat that wasn't fixed until a few years
       | later (in FreeBSD).
        
       | sydthrowaway wrote:
       | Who holds the canonical unix repo?
        
         | kps wrote:
         | There is no canonical Unix repository.
         | 
         | Unix (1969) predates source version control (1972).
        
           | throw0101a wrote:
           | > _IBM's OS/360 IEBUPDTE software update tool dates back to
           | 1962, arguably a precursor to version control system tools. A
           | full system designed for source code control was started in
           | 1972, Source Code Control System for the same system
           | (OS/360). Source Code Control System's introduction, having
           | been published on December 4, 1975, historically implied it
           | was the first deliberate revision control system.[4] RCS
           | followed just after,[5] with its networked version Concurrent
           | Versions System. The next generation after Concurrent
           | Versions System was dominated by Subversion,[6] followed by
           | the rise of distributed revision control tools such as
           | Git.[7]_
           | 
           | * https://en.wikipedia.org/wiki/Version_control#History
        
           | sydthrowaway wrote:
           | Who owns the modern unix copyright?
        
       | ChrisArchitect wrote:
       | You don't see this every day.....
       | 
       | But you do see it every year for the last number of years
       | 
       | Some previous discussion from 3 years ago:
       | 
       | https://news.ycombinator.com/item?id=19429249
        
       ___________________________________________________________________
       (page generated 2022-06-16 23:00 UTC)