[HN Gopher] Continuous Unix commit history from 1970 until today
___________________________________________________________________
Continuous Unix commit history from 1970 until today

Author : FrankyHollywood
Score  : 188 points
Date   : 2022-06-16 14:04 UTC (8 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| danschuller wrote:
| We have all this commit data at scale, it really feels like there
| are interesting stories or lessons that could be extracted from
| them.
|
| There's kind of the obvious operational stuff like: What are the
| properties of commits that introduce bugs compared to those that
| don't. Which type of commits are rarely changed and which are
| more likely to be changed over time. But what I'd find even more
| interesting is some insight into how we solve problems and how
| well we're able to solve them. I guess part of the puzzle is
| missing - the external requirements / environment that give rise
| to some number of the commits.
| DSpinellis wrote:
| There is a series of conferences MSR -- Mining Software
| Repositories -- with research papers looking at such questions.
| http://www.msrconf.org/ In fact, I presented this work in the
| 2015 MSR conference.
| vandahm wrote:
| You don't see this every day:
|
| https://github.com/dspinellis/unix-history-repo/blob/Researc...
|
| Is this B, or is it BCPL? What would have compiled this code back
| in the day?
| marcodiego wrote:
| They had "auto" vars in 1970. WG14, the ISO work group that
| maintains the C programming language specification, has just
| recently discussed acceptance of __auto_type.
|
| EDIT: ops, the "auto" here means automatic allocation.
| hoten wrote:
| very weird that two characters - $( and $) - were used before {
| and }
|
| did old keyboards not have curly braces or what?
| kps wrote:
| {} were added to the 1967 revision of ASCII, along with `|~
| and lower case. (EBCDIC never got them in the base character
| set, only in alternate 'code pages'.)
| pm215 wrote:
| Wikipedia's article on B says that BCPL used := for assignment
| and = for equality tests, whereas B used = for assignment and
| == for equality. Assuming that's correct, this must be B code.
| projektfu wrote:
| It's B. BCPL has "LET MAIN() BE $(..." instead of "main $(...".
|
| Running B was a challenge on the PDP-7 but easier on the
| PDP-11, apparently, because of the increase of memory size. The
| linked document has an interesting history about compiling B to
| threaded code, a form of interpreted code, and then to machine
| language. B never really made the jump to a full-fledged
| citizen because it quickly got replaced by C, although BCPL was
| popular for a long time.
|
| https://www.bell-labs.com/usr/dmr/www/chist.html
| Erlangen wrote:
| So _auto_ is used as a keyword here. Maybe C inherits this
| never-used auto from B?
| veltas wrote:
| auto stands for 'automatic', because such variables are
| automatically allocated for each function invocation. In C it
| became redundant because base types were added, and so the
| base type could start the definition (auto was still
| permitted with default base type of int until C99 I think).
| auto in B is a bit like 'let', it starts a declaration, along
| with 'extrn'.
| mftb wrote:
| Yea, I have to say, to me, this is cool. Glad to see this sort
| of history being preserved.
| judge2020 wrote:
| Is that truly from 1970? For example, that commit's grandparent
| seems to have been specifically crafted to use "Date: Thu, 1
| Jan 1970 00:00:00 +0000"
| https://github.com/dspinellis/unix-history-repo/commit/185f8....
| anyfoo wrote:
| That's 0 in Unix epoch time (guess why!), so seems more like
| a missing timestamp than a crafted one. The fact that the
| linked file does not have a 0 timestamp, but a slightly later
| one, suggests it's valid, or at least intended to be valid.
| Nition wrote:
| I recall that in A Deepness in the Sky by Vernor Vinge, a
| space sci-fi set in the far future, they're still using
| Unix time underneath many many layers of abstractions, and
| with their cultural context they guess that humanity must
| have set it to start with the moment mankind first
| travelled into space to land on the Moon.
| anyfoo wrote:
| Hah, plausible. Not far off timewise, and yet totally
| wrong, but understandable how such a conclusion could be
| made.
| swatcoder wrote:
| I don't know, but I love how clearly and concisely it expresses
| what would later become ubiquitous as do-while and continue.
|
| That's poetry. Nice find.
| stingraycharles wrote:
| I love how thin the layer above assembly is: without knowing
| B, is my interpretation correct that this function
| effectively "inherits" the stack of the calling function? In
| other words, rather than passing function arguments and let
| the compiler deal with it, you're supposed to push the string
| you want to lcase onto the top of the stack?
|
| Reminds me a lot of writing my own compiler/assembler in
| university, where it's expected that all this happens
| automatically nowadays.
| anyfoo wrote:
| Hmm, don't think so. The function does not operate on a
| string, it seems to read a character using read() and write
| it back, transformed, using write(). Given that the
| function is named main, it's probably the top level
| function anyway (from the programmer's point of view, often
| the OS actually calls into a different function that is
| part of the language runtime, e.g. _start, which in turn
| calls main eventually, but that is usually hidden from the
| programmer).
| messe wrote:
| No, that's not correct. It reads the string from standard
| input.
| A C translation would look like this:
|
|     main() {
|         int ch;
|         while ((ch = read()) != 4) {
|             if (ch > 0100 && ch < 0133)
|                 ch = ch + 040;
|             if (ch == 015) continue;
|             if (ch == 014) continue;
|             if (ch == 011) {
|                 ch = 040040;
|                 write(040040);
|                 write(040040);
|             }
|             write(ch);
|         }
|     }
|
| A more modern C version would look like:
|
|     #include <stdio.h>
|     int main(void) {
|         int ch;
|         while ((ch = getchar()) != -1) {
|             if (ch > 0100 && ch < 0133)
|                 ch = ch + 040;
|             if (ch == 015) continue;
|             if (ch == 014) continue;
|             // No need to handle tabstop specially
|             putchar(ch);
|         }
|     }
| justsomeguy123 wrote:
| Gource Visualization video which points to
| https://www.youtube.com/watch?v=S7JB0mhrGCQ does not work
| anymore.
|
| > Video unavailable
| > This video is no longer available because
| the YouTube account associated with this video has been
| terminated.
| danuker wrote:
| We need to solve this problem.
|
| YouTube is free to delete any account, even just to cut costs.
| alar44 wrote:
| wolverine876 wrote:
| I assume Github, the host of the OP, can do the same. How
| many people have entrusted their life's work to it?
| cmeacham98 wrote:
| I'm not sure what the problem to be solved here is. It
| doesn't seem reasonable to force YouTube (or any other free
| video host) to indefinitely store and host content.
|
| If you want something to stay around on the internet it has
| to take up space on somebody's drive and bandwidth on
| somebody's network connection - and for sufficiently large
| content like video you're going to have to do that yourself
| or convince/pay someone you trust to do so on your behalf.
| roansh wrote:
| How would you feel if your commits become publicly available for
| everyone to see forever?
| pavon wrote:
| That ship sailed nearly half a century ago. All of this source
| code was previously licensed to research universities starting
| in 1975.
| The earlier releases weren't under FLOSS license like
| we know them today, but with the intent that researchers would
| be reading, learning from, and modifying the code. And they
| did! creating later BSD Unix releases with more open licenses
| whose code was shared more widely under more permissive
| licenses.
|
| Finally, the people who created this repo are some of the
| primary authors of the code. They wanted this to be in the
| open.
| jrochkind1 wrote:
| Really proud to be a part of history.
| e40 wrote:
| Isn't it cool? I mean, being in the history of a project like
| this... it could be around long after we are gone.
| alar44 wrote:
| Fine. You?
| duxup wrote:
| I hope everyone is ok with cursing....
| ARandomerDude wrote:
| This is the point of GitHub. Also Unix was(/is) a masterwork of
| craftsmanship. Struggling to see a problem here.
| projektfu wrote:
| I love Spinellis' work on teaching reading of code.
| PAPPPmAc wrote:
| Diomidis Spinellis' "Code Reading: The Open Source Perspective"
| is a thing I've wanted but didn't know existed, browsing it now
| to hopefully recommend, thanks for the pointer.
|
| I work with computer engineering students and often tell them
| that reading more code would be good for them but have never
| had a great generic but concrete suggestion for how to get
| there.
|
| The second best programming class I took in college was a
| graduate elective and the _only_ code-reading-based course I
| took or knew of being offered: a guided safari in the Linux
| kernel sources where we had to make targeted changes for the
| assignments. FTR, the best programming class was set up as "new
| language in a different paradigm every few weeks, write one
| small program that suits it and one small program that
| doesn't," not incidentally taught by the same person (
| https://en.wikipedia.org/wiki/Raphael_Finkel ).
| dgrin91 wrote:
| I like how Github shows it as infinity commits
| deathanatos wrote:
| What's up with that?
| There only seem to be 4, on HEAD?
| caslon wrote:
| Check the other branches.
| deathanatos wrote:
| I saw the other branches when I made the comment.
|
| The commit count is -- usually -- the commit count from the
| currently selected ref.
|
| E.g., on a sample repo, "master" displays as 29,474
| commits. "master^" displays as 29,473.
| kevincox wrote:
| I always expected that the commit count was for that
| branch. I guess it is global?
| [deleted]
| ollien wrote:
| Yeah, is that a bug? lol
| mywittyname wrote:
| Sounds like a overflow bug prevention mechanism.
|
| There are an infinite number of infinities, so surely one of
| them is the maximum possible commits in github.
| kps wrote:
| Git runs into problems with more than 2^160 commits in a
| repository.
| ChrisMarshallNY wrote:
| That's a _lot_ of work!
|
| A true labor of love.
|
| Thanks!
| ninefathom wrote:
| Anybody feel brave enough to try merging in SVR4?
|
| https://github.com/dspinellis/unix-history-repo/blob/Researc...
|
| https://github.com/illumos/illumos-gate/blob/9ecd05bdc59e4a1...
| mprovost wrote:
| This repo has been super useful as I've been writing a book that
| teaches Rust by rewriting classic Unix utilities. I settled on
| using the 4.4 BSD source as a base but having the whole history
| available has been really interesting. Recently I came across a
| bug in the 4.4 version of cat that wasn't fixed until a few years
| later (in FreeBSD).
| sydthrowaway wrote:
| Who holds the canonical unix repo?
| kps wrote:
| There is no canonical Unix repository.
|
| Unix (1969) predates source version control (1972).
| throw0101a wrote:
| > _IBM's OS/360 IEBUPDTE software update tool dates back to
| 1962, arguably a precursor to version control system tools. A
| full system designed for source code control was started in
| 1972, Source Code Control System for the same system
| (OS/360).
| Source Code Control System's introduction, having
| been published on December 4, 1975, historically implied it
| was the first deliberate revision control system.[4] RCS
| followed just after,[5] with its networked version Concurrent
| Versions System. The next generation after Concurrent
| Versions System was dominated by Subversion,[6] followed by
| the rise of distributed revision control tools such as
| Git.[7]_
|
| * https://en.wikipedia.org/wiki/Version_control#History
| sydthrowaway wrote:
| Who owns the modern unix copyright?
| ChrisArchitect wrote:
| You don't see this every day.....
|
| But you do see it every year for the last number of years
|
| Some previous discussion from 3 years ago:
|
| https://news.ycombinator.com/item?id=19429249
___________________________________________________________________
(page generated 2022-06-16 23:00 UTC)