[HN Gopher] Decoded: GNU coreutils (2019)
       ___________________________________________________________________
        
       Decoded: GNU coreutils (2019)
        
       Author : pcr910303
       Score  : 211 points
       Date   : 2021-03-10 15:04 UTC (7 hours ago)
        
 (HTM) web link (maizure.org)
 (TXT) w3m dump (maizure.org)
        
       | gautamcgoel wrote:
       | I'm blown away by this project. What a great way to learn about
       | the coreutils, and also see how C is written in the real world!
       | I'm curious how the author made the diagrams explaining each
       | utility - did he use Inkscape?
        
       | mraza007 wrote:
       | OMG, I'm so surprised. I was going to post a question on HN
       | yesterday asking how I could learn about GNU Coreutils, and
       | today I wake up and see this.
       | 
       | What a coincidence!!!
       | 
       | Truly an amazing resource on GNU coreutils
        
       | ufo wrote:
       | The biggest takeaway for me is that I learned about the existence
       | of some utilities that I had never known were there, especially
       | "factor" and "tsort".
        
         | rwmj wrote:
         | There's also "moreutils"[1] which is a set of useful additional
         | tools. "errno" is indispensable if you're a Linux programmer.
         | 
         | [1] https://joeyh.name/code/moreutils/
        
         | vram22 wrote:
         | There are many cool / useful less-known utilities in GNU /
         | Linux.
         | 
         | Check man7.org for good, though brief info on many of them.
         | 
         | I had explored many of them a while ago.
         | 
         | man7.org is maintained by Michael Kerrisk, author of The Linux
         | Programming Interface, a kind of reference bible for Linux APIs
         | and system calls.
         | 
         | Edit: many of which are used in making such utilities.
        
           | rustyminnow wrote:
           | Here's the list of coreutils pages for anybody else
           | interested:
           | https://man7.org/linux/man-pages/dir_by_project.html#coreuti...
        
           | MaxBarraclough wrote:
           | Don't forget _recutils_.
           | 
           | https://www.gnu.org/software/recutils/manual/A-Little-Exampl...
           | 
           | https://en.wikipedia.org/wiki/Recfiles
        
       | dang wrote:
       | Discussed at the time:
       | 
       |  _Decoded: GNU Coreutils_ -
       | https://news.ycombinator.com/item?id=20328650 - July 2019 (55
       | comments)
        
       | ojnabieoot wrote:
       | Very nice work and much easier than trawling through the
       | repository.
       | 
       | Some ignorant and probably cliched musing: when I look at small
       | utilities like these I am always struck by a seeming distinction
       | between best practices for little C programs versus best
       | practices for large C applications (the author of the post
       | touches on this as well).
       | 
       | In particular, the explicit flow (including goto) and "pedantic"
       | style is actually quite appropriate for something < 1000 lines
       | and where the expected behavior is extremely well understood. In
       | cases like pwd, mkdir, etc, trying to abstract too much is
       | arguably a mistake for maintainability and understanding.
       | 
       | I say all this as an immutable functional-first dev who hasn't
       | done much native code :) And I think the various type-safe /
       | memory-safe / etc versions of these tools are worth developing.
       | But there's something to be said about well-optimized native code
       | that clearly "does what it says on the box" in a way that's
       | accessible to anyone who understands basic Linux programming -
       | even if they can only contextually read C code.
       | 
       | (My only real gripe is typographic / linting related, mostly due
       | to being a whippersnapper).
        
         | kiwidrew wrote:
         | This is in keeping with the style of the original Unix
         | utilities.
         | 
         | Having a handful of global variables reduces the amount of
         | stuff being passed around from function to function; utilities
         | don't need to worry too much about free()ing dynamic
         | allocations, since that gets cleaned up on exit anyways; none
         | of the code has to be re-entrant, because each invocation of
         | the utility is running in its own process.
        
         | setpatchaddress wrote:
         | Could not disagree more about goto. Small programs always turn
         | into larger ones. And what you have at the end if you're not
         | from the beginning using practices appropriate for larger
         | programs is spaghetti code.
         | 
         | I'm not criticizing it in context -- a lot of this code dates
         | back to the mid-'80s, if I'm not mistaken. But always write new
         | code using scalable idioms.
        
           | overboard2 wrote:
           | If this program has remained small for 40 years, then maybe
           | not all small programs turn into larger ones.
        
           | ojnabieoot wrote:
           | I agree with you in general. But I think in this specific
           | case it's a bit more complicated: the downsides aren't as bad
           | as they normally would be, and the use of primitive flow
           | constructs arguably has an advantage in this domain:
           | 
           | POSIX and similarly stuffy requirements (even if "soft")
           | mean that this code is fairly static. While there is some
           | bloat in the pragmas, etc., these applications are
           | necessarily slow to change and I think it's reasonable to say
           | that they won't suffer from _feature_ bloat anytime soon. So
           | the normal software risk considerations are a bit different
           | here. Further, any changes to the code will be fiercely
           | reviewed, and the individual programs are small enough that
           | increases in complexity will be quickly spotted. Relatedly,
           | these programs are small enough that, if a refactor to more
           | structured code were necessary, the work would be quite
           | feasible. So while the risks of goto are real in any C
           | program, in practice I think they're quite minimal here.
           | 
           | And I do think you're missing an advantage. These are core
           | userspace functions that perform safety- and security-
           | critical kernel interactions. So I definitely agree there
           | is a strong argument to use safe code, modern abstractions,
           | and so on. This is especially true for modern PCs that really
           | can afford to spend a few extra cycles creating a folder.
           | 
           | But a modern code construct, correctly applied, is only as
           | safe as the compiler. This is not guaranteed! A common
           | "gotcha" with buggy C compilers is inappropriately pruning
           | instructions because the compiler optimizes away a loop or
           | else statement. It is hardly a frequent issue but similar
           | bugs have shown up in recent gcc/clang releases. And in
           | particular core developers who are working on operating
           | systems are more likely to be using shaky C compilers.
           | 
           | Using gotos and ugly global state has the distinct advantage
           | that generated assembly tends to have fewer "surprises." If
           | there is a bug in the compiler it will be less well-hidden;
           | if there is a bug in the program then there is less mental
           | work between analyzing the C and analyzing the disassembly
           | for debugging.
           | 
           | Again, in general I think you're correct and that my argument
           | is ultimately more of a judgment call.
           | 
           | EDIT: I didn't really want to address any _structural_
           | advantages of goto for, e.g. exception handling via breaking
           | loops earlier, etc. I am not enough of a domain expert to
           | comment appropriately, but it does seem there are cases where
           | properly abstracted cleanup code in C is more spaghettified
           | than a goto: https://lkml.org/lkml/2003/1/12/203
        
             | not2b wrote:
             | If the flow graph doesn't have a clean nested structure,
             | this impedes compiler optimization. It can be possible to
             | normalize it, but this may require the compiler to clone
             | the code. Compilers are pretty good these days; if you've
             | experienced a C compiler "inappropriately" optimizing
             | something away the most likely cause these days is not a
             | compiler bug, but a software developer who doesn't
             | understand rules related to aliasing or undefined behavior.
             | 
             | I do agree that the specific use of goto to jump cleanly
             | out of several loops is appropriate: the problem is that C
             | lacks clean constructs for exiting named blocks. That would
             | be preferable to general goto and doesn't harm
             | optimization, the flow graph is still easy to analyze,
             | convert to SSA form and the like.
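
The "jump cleanly out of several loops" case can be sketched roughly
as follows. This is only an illustration: grid_find and its
parameters are hypothetical names, not code from coreutils or the
kernel. The single goto stands in for the named-block exit that C
lacks, and the label sits immediately after the loops so the flow
graph stays simple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: find the first cell equal to `needle` in a
 * rows x cols grid stored in row-major order. On success, writes the
 * position to *row_out / *col_out and returns true. */
bool grid_find(const int *grid, size_t rows, size_t cols, int needle,
               size_t *row_out, size_t *col_out)
{
    size_t r, c;

    assert(grid != NULL && row_out != NULL && col_out != NULL);

    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            if (grid[r * cols + c] == needle)
                goto found;     /* leaves both loops in one step */
        }
    }
    return false;               /* searched everything, no match */

found:
    *row_out = r;
    *col_out = c;
    return true;
}
```

In a function this small, returning directly from the inner loop
would work just as well; the goto form tends to pay off when several
exits share the same post-loop code.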
        
           | monocasa wrote:
           | I'd like to see the use of goto.
           | 
           | There are two 'allowed' uses in C that are common and
           | represent good code even today: goto error cleanup stubs, and
           | goto in virtual machine dispatch loops.
           | 
           | The size of the codebase doesn't really matter for those
           | cases; they're largely considered the idiomatic way to go
           | about the problems they're trying to solve.
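
For readers who have not seen the first idiom, here is a minimal
sketch of goto error cleanup stubs. The struct table type and the
table_* functions are made up for illustration and are not from any
real codebase. Each failure jumps to the label that releases only
what has been acquired so far, so the cleanup code is written once,
in reverse order of acquisition.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type for illustration: a table that owns two buffers. */
struct table {
    size_t rows, cols;
    double *cells;     /* rows * cols values */
    double *row_sums;  /* one value per row  */
};

/* Returns a zero-initialised table, or NULL on failure. */
struct table *table_create(size_t rows, size_t cols)
{
    struct table *t;

    if (rows == 0 || cols == 0 || rows > SIZE_MAX / cols)
        return NULL;                /* nothing acquired yet */

    t = malloc(sizeof *t);
    if (t == NULL)
        goto fail;

    t->cells = calloc(rows * cols, sizeof *t->cells);
    if (t->cells == NULL)
        goto fail_free_table;

    t->row_sums = calloc(rows, sizeof *t->row_sums);
    if (t->row_sums == NULL)
        goto fail_free_cells;

    t->rows = rows;
    t->cols = cols;
    return t;                       /* success path skips the stubs */

fail_free_cells:
    free(t->cells);
fail_free_table:
    free(t);
fail:
    return NULL;
}

/* Bounds-checked access to one cell. */
double *table_at(struct table *t, size_t r, size_t c)
{
    assert(t != NULL && r < t->rows && c < t->cols);
    return &t->cells[r * t->cols + c];
}

void table_destroy(struct table *t)
{
    if (t == NULL)
        return;
    free(t->row_sums);
    free(t->cells);
    free(t);
}
```

A caller would pair table_create(2, 3) with table_destroy(t). The
alternative without goto usually means either nested if blocks or
repeating the free() calls at every failure point.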
        
             | not2b wrote:
             | The error cleanup role is handled in a number of other
             | languages (Ada, VHDL, Perl) by letting the programmer name
             | a block and having a statement that terminates that block
             | or (for a loop) goes to the next iteration, even if this
             | terminates multiple loops. The effect is similar to the C
             | goto way of doing that, but it's more controlled and easy
             | for compilers to deal with.
        
               | monocasa wrote:
               | Oh, for sure, other languages have different idiomatic
               | constructs that don't require such a heavy hammer as goto
               | to achieve a similar effect.
               | 
               | Even in C, if you're writing Microsoft-only code, SEH is
               | probably a better mechanism than goto error.
               | 
               | I'd argue that the defer statement in Go (and its
               | surprising side effects, like being function scoped
               | instead of block scoped as you might otherwise expect)
               | ultimately comes from trying to wrap this idiom in a
               | construct that's better supported by the language.
               | 
               | My point though is that in relatively standard, portable
               | C, there are valid, idiomatic use cases of goto, and it's
               | not quite so easy to say 'eww goto' in those very
               | specific circumstances.
        
             | robocat wrote:
             | I skimmed some Linux code the other day and noticed that
             | goto is used for more than those two situations. Maybe just
             | cruft...?
             | 
             | Search for retry: or handle_itb: in
             | https://github.com/torvalds/linux/blob/master/fs/ext4/resize...
             | 
             | Or fixleft: or copy: in
             | https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2...
        
               | monocasa wrote:
               | Yeah, the retry piece is a bit more controversial. Some
               | people think that it's cleaner for code that's probably
               | already nesting loops, but I tend to break it apart in
               | different ways. That one I generally don't push too hard
               | in review, but require more tests to shore up confidence.
               | 
               | And frankly, the fixleft-style code you see just isn't
               | modern idiomatic C, IMO. In a code review I'd have them
               | either write a block comment explaining why it needs to
               | be weird, plus a lot of test cases for when someone
               | inevitably tries to rewrite it, or just rewrite it in
               | the first place.
               | 
               | Some of the areas of the Linux kernel aren't exactly
               | known for being the best written C (unfortunate as that
               | is) and you're seeing some of that.
        
       | psychoslave wrote:
       | "How is GNU `yes` so fast?" was already discussed on this topic:
       | https://news.ycombinator.com/item?id=14542938
        
       | gbajson wrote:
       | I have just spent 5 minutes trying to find any useful use cases
       | for 'vdir'. Does anyone have any idea why another 'ls' was
       | created?
        
       | mshockwave wrote:
       | Very interesting way to visualize some of the most important
       | cornerstones in *nix systems
        
       | tyingq wrote:
       | Really lovely work. I'm curious if the png flowcharts are
       | generated from data, or hand drawn.
       | 
       | Edit: Also, some easter egg looking thing at the bottom right of
       | the page:
       | 
       | <div class="copyright col-md-6">
       | ##*#**##****#*#**/\##*###*****#**#*#*#**#******#**#*#*####*#*##*
       | 
       | </div>
       | 
       | Edit: Fixed asterisks, I think.
        
         | jedimastert wrote:
         | ##*#**##****#*#**/\##*###*****#**#*#*#**#******#**#*#*####*#*##*
         | 
         | HN's formatter is having a _time_ with that many asterisks
         | 
         | I don't recognize the format, can someone help me out?
        
           | tyingq wrote:
           | It's 64 characters. I'd guess binary if that /\ wasn't in the
           | middle.
        
             | bluesign wrote:
             | Seems like Morse code.
             | 
             | First part is "maizure".
             | 
             | PS: HN messed up the stars :)
        
               | [deleted]
        
               | tyingq wrote:
               | Ah, good call. HN ate your asterisks, but yeah, the first
               | bit before the / is "maizure" in morse. Though having no
               | spaces between the letters makes for some
               | ambiguity...hard to decode the rest.
               | 
               |     ## *# ** ##** **# *#* * == maizure
               | 
               | I can find some long words with a dictionary approach in
               | there, like:
               | 
               |     ARRIVE  -> .-.-..-......-.
               |     CLEVER  -> -.-..-......-..-.
               |     DESTINY -> -......-..-.-.--
               |     FENCED  -> ..-..-.-.-..-..
               |     MEMBER  -> --.---.....-.
               |     MISTER  -> --.....-..-.
               |     (etc)
               | 
               | But, too many variations that direction too.
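
To give a feel for how quickly the ambiguity grows, here is a rough
sketch that counts how many letter sequences a Morse string with no
gaps could stand for. It assumes only the letters A-Z are used and,
following the "maizure" decoding above, that '#' is a dash and '*'
is a dot. count_decodings and matches are hypothetical helper names.
It counts letter strings, not dictionary words, so it is an upper
bound on what a word-based search would have to consider.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* International Morse code for A..Z ('.' = dot, '-' = dash). */
static const char *const MORSE[26] = {
    ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..",
    ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.",
    "...", "-", "..-", "...-", ".--", "-..-", "-.--", "--.."
};

/* Does s[0..len) spell `code` once dots and dashes are mapped to the
 * caller's symbols? */
static int matches(const char *s, const char *code, size_t len,
                   char dot, char dash)
{
    size_t k;
    for (k = 0; k < len; k++) {
        char want = (code[k] == '.') ? dot : dash;
        if (s[k] != want)
            return 0;
    }
    return 1;
}

/* Number of ways to split `s` into Morse letters A..Z. Returns 0 if
 * `s` contains a character other than `dot` or `dash`. */
unsigned long long count_decodings(const char *s, char dot, char dash)
{
    /* ways[k % 5] = number of decodings of the first k symbols.
     * Letter codes are at most 4 symbols long, so 5 slots suffice. */
    unsigned long long ways[5] = {1, 0, 0, 0, 0};
    size_t n, i;
    int letter;

    assert(s != NULL);
    n = strlen(s);
    for (i = 1; i <= n; i++) {
        unsigned long long total = 0;
        if (s[i - 1] != dot && s[i - 1] != dash)
            return 0;
        for (letter = 0; letter < 26; letter++) {
            size_t len = strlen(MORSE[letter]);
            /* Try this letter as the last one in the first i symbols. */
            if (len <= i &&
                matches(s + i - len, MORSE[letter], len, dot, dash))
                total += ways[(i - len) % 5];
        }
        ways[i % 5] = total;
    }
    return ways[n % 5];
}
```

For example, count_decodings("**", '*', '#') is 2, because two dots
can be read as "EE" or as "I". Calling it on the 17-symbol "maizure"
prefix with the same mapping shows why guessing the rest by hand is
impractical.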
        
               | JdeBP wrote:
               | You have certainly got further than people did in
               | https://news.ycombinator.com/item?id=17116855 .
        
               | tyingq wrote:
               | update...pretty sure the ending is "FSF.ORG"
               | 
               | Maybe an email address?
        
               | bluesign wrote:
               | Yeah millions of combinations, I tried but was not
               | patient enough.
               | 
               | for the curious: https://www.jbowman.com/remorse/
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2021-03-10 23:00 UTC)