[HN Gopher] Decoded: GNU coreutils (2019) ___________________________________________________________________ Decoded: GNU coreutils (2019) Author : pcr910303 Score : 211 points Date : 2021-03-10 15:04 UTC (7 hours ago) (HTM) web link (maizure.org) (TXT) w3m dump (maizure.org) | gautamcgoel wrote: | I'm blown away by this project. What a great way to learn about | the coreutils, and also see how C is written in the real world! | I'm curious how the author made the diagrams explaining each | utility - did he use Inkscape? | mraza007 wrote: | OMG, I'm so surprised! Just yesterday I was going to post a question on HN | asking how I could learn about GNU coreutils, and today I | wake up and see this. | | What a coincidence! | | Truly an amazing resource on GNU coreutils. | ufo wrote: | The biggest takeaway for me is that I learned about the existence | of some utilities that I had never known were there, especially | "factor" and "tsort". | rwmj wrote: | There's also "moreutils"[1], which is a set of useful additional | tools. "errno" is indispensable if you're a Linux programmer. | | [1] https://joeyh.name/code/moreutils/ | vram22 wrote: | There are many cool, useful, lesser-known utilities in | GNU/Linux. | | Check man7.org for good, though brief, info on many of them. | | I had explored many of them a while ago. | | The site is maintained by Michael Kerrisk, author of The Linux Programming | Interface, a kind of reference bible for Linux APIs and system | calls. | | Edit: many of which are used in making such utilities. | rustyminnow wrote: | Here's the list of coreutils pages for anybody else | interested: https://man7.org/linux/man-pages/dir_by_project.html#coreuti... | MaxBarraclough wrote: | Don't forget _recutils_. | | https://www.gnu.org/software/recutils/manual/A-Little-Exampl...
| | https://en.wikipedia.org/wiki/Recfiles | dang wrote: | Discussed at the time: | | _Decoded: GNU Coreutils_ - | https://news.ycombinator.com/item?id=20328650 - July 2019 (55 | comments) | ojnabieoot wrote: | Very nice work and much easier than trawling through the | repository. | | Some ignorant and probably clichéd musing: when I look at small | utilities like these, I am always struck by a seeming distinction | between best practices for little C programs versus best | practices for large C applications (the author of the post | touches on this as well). | | In particular, the explicit flow (including goto) and "pedantic" | style is actually quite appropriate for something under 1000 lines | where the expected behavior is extremely well understood. In | cases like pwd, mkdir, etc., trying to abstract too much is | arguably a mistake for maintainability and understanding. | | I say all this as an immutable functional-first dev who hasn't | done much native code :) And I think the various type-safe / | memory-safe / etc. versions of these tools are worth developing. | But there's something to be said for well-optimized native code | that clearly "does what it says on the box" in a way that's | accessible to anyone who understands basic Linux programming, | even if they can only contextually read C code. | | (My only real gripe is typographic / linting related, mostly due | to being a whippersnapper.) | kiwidrew wrote: | This is in keeping with the style of the original Unix | utilities. | | Having a handful of global variables reduces the amount of | stuff being passed around from function to function; utilities | don't need to worry too much about free()ing dynamic | allocations, since that gets cleaned up on exit anyway; none | of the code has to be re-entrant, because each invocation of | the utility is running in its own process. | setpatchaddress wrote: | Could not disagree more about goto. Small programs always turn | into larger ones.
And if you don't use practices appropriate for larger | programs from the beginning, what you end up with is | spaghetti code. | | I'm not criticizing it in context; a lot of this code dates | back to the mid-80s, if I'm not mistaken. But always write new | code using scalable idioms. | overboard2 wrote: | If this program has remained small for 40 years, then maybe | not all small programs turn into larger ones. | ojnabieoot wrote: | I agree with you in general. But I think in this specific | case it's a bit more complicated: the downsides aren't as bad | as they normally would be, and the use of primitive flow | constructs arguably has an advantage in this domain: | | POSIX and similarly stuffy requirements (even if "soft") | mean that this code is fairly static. While there is some | bloat in the pragmas, etc., these applications are | necessarily slow to change, and I think it's reasonable to say | that they won't suffer from _feature_ bloat anytime soon. So | the normal software risk considerations are a bit different | here. Further, any changes to the code will be fiercely | reviewed, and the individual programs are small enough that | increases in complexity will be quickly spotted. Relatedly, | these programs are small enough that, if a refactor to more | structured code were necessary, the work would be quite | feasible. So while the risks of goto are real in any C | program, in practice I think they're quite minimal here. | | And I do think you're missing an advantage. These are core | userspace functions that perform safety- and security-critical | kernel interactions. So I definitely agree there | is a strong argument to use safe code, modern abstractions, | and so on. This is especially true for modern PCs that really | can afford to spend a few extra cycles creating a folder. | | But a modern code construct, correctly applied, is only as | safe as the compiler. This is not guaranteed!
A common | "gotcha" with buggy C compilers is inappropriately pruning | instructions because the compiler optimizes away a loop or | else statement. It is hardly a frequent issue, but similar | bugs have shown up in recent gcc/clang releases. In | particular, core developers who are working on operating | systems are more likely to be using shaky C compilers. | | Using gotos and ugly global state has the distinct advantage | that generated assembly tends to have fewer "surprises." If | there is a bug in the compiler, it will be less well hidden; | if there is a bug in the program, then there is less mental | work between analyzing the C and analyzing the disassembly | for debugging. | | Again, in general I think you're correct, and my argument | is ultimately more of a judgment call. | | EDIT: I didn't really want to address any _structural_ | advantages of goto for, e.g., exception handling via breaking | loops early, etc. I am not enough of a domain expert to | comment appropriately, but it does seem there are cases where | properly abstracted cleanup code in C is more spaghettified | than a goto: https://lkml.org/lkml/2003/1/12/203 | not2b wrote: | If the flow graph doesn't have a clean nested structure, | this impedes compiler optimization. It can be possible to | normalize it, but this may require the compiler to clone | the code. Compilers are pretty good these days; if you've | experienced a C compiler "inappropriately" optimizing | something away, the most likely cause these days is not a | compiler bug, but a software developer who doesn't | understand rules related to aliasing or undefined behavior. | | I do agree that the specific use of goto to jump cleanly | out of several loops is appropriate: the problem is that C | lacks clean constructs for exiting named blocks. Such a construct would | be preferable to general goto and doesn't harm | optimization: the flow graph is still easy to analyze, | convert to SSA form, and the like.
| monocasa wrote: | I'd like to see the use of goto. | | There are two 'allowed' uses in C that are common and represent | good code even today: goto error cleanup stubs, and goto in | virtual machine dispatch loops. | | The size of the codebase doesn't really matter for those | cases; they're largely considered the idiomatic way to go | about the problems they're trying to solve. | not2b wrote: | The error cleanup role is handled in a number of other | languages (Ada, VHDL, Perl) by letting the programmer name | a block and having a statement that terminates that block | or (for a loop) goes to the next iteration, even if this | terminates multiple loops. The effect is similar to the C | goto way of doing that, but it's more controlled and easier | for compilers to deal with. | monocasa wrote: | Oh, for sure, other languages have different idiomatic | constructs that don't require such a heavy hammer as goto | to achieve a similar effect. | | Even in C, if you're writing Microsoft-only code, SEH is | probably a better mechanism than goto error. | | I'd argue that the defer statement in Go (and its | surprising side effects, like being function-scoped | instead of block-scoped as you might otherwise expect) | ultimately comes from trying to wrap this idiom in a | construct that's better supported by the language. | | My point, though, is that in relatively standard, portable | C, there are valid, idiomatic use cases of goto, and it's | not quite so easy to say 'eww goto' in those very | specific circumstances. | robocat wrote: | I skimmed some Linux code the other day and noticed that | goto is used for more than those two situations. Maybe just | cruft...? | | Search for retry: or handle_itb: in https://github.com/torvalds/linux/blob/master/fs/ext4/resize... | | Or fixleft: or copy: in https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2... | monocasa wrote: | Yeah, the retry piece is a bit more controversial.
Some | people think that it's cleaner for code that's probably | already nesting loops, but I tend to break it apart in | different ways. I generally don't push too hard on that one | in review, but I require more tests to shore up confidence. | | And frankly, the fix_left style code you see just isn't | modern idiomatic C, IMO. In a code review I'd have them | either write a block comment explaining why it needs | to be weird, plus a lot of test cases for when | someone inevitably tries to rewrite it, or just rewrite | it in the first place. | | Some areas of the Linux kernel aren't exactly | known for being the best-written C (unfortunate as that | is), and you're seeing some of that. | psychoslave wrote: | "How is GNU `yes` so fast?" was discussed previously: | https://news.ycombinator.com/item?id=14542938 | gbajson wrote: | I have just spent 5 minutes trying to find any useful use cases | for 'vdir'. Does anyone have an idea why another 'ls' was | created? | mshockwave wrote: | Very interesting way to visualize some of the most important | cornerstones in *nix systems | tyingq wrote: | Really lovely work. I'm curious if the PNG flowcharts are | generated from data or hand-drawn. | | Edit: Also, some easter-egg-looking thing at the bottom right of | the page: | | <div class="copyright col-md-6"> | ##*#**##****#*#**/\##*###*****#**#*#*#**#******#**#*#*####*#*##* | | </div> | | Edit: Fixed asterisks, I think. | jedimastert wrote: | ##*#**##****#*#**/\##*###*****#**#*#*#**#******#**#*#*####*#*## | * | | HN's formatter is having a _time_ with that many asterisks | | I don't recognize the format; can someone help me out? | tyingq wrote: | It's 64 characters. I'd guess binary if that /\ wasn't in the | middle. | bluesign wrote: | Seems like Morse code. | | The first part is "maizure". | | PS: HN messed up the stars :) | [deleted] | tyingq wrote: | Ah, good call. HN ate your asterisks, but yeah, the first | bit before the / is "maizure" in Morse.
Though having no | spaces between the letters makes for some | ambiguity, so it's hard to decode the rest. | | ## *# ** ##** **# *#* * == maizure | | I can find some long words with a dictionary approach in | there, like: | | ARRIVE -> .-.-..-......-. | CLEVER -> -.-..-......-..-. | DESTINY -> -......-..-.-.-- | FENCED -> ..-..-.-.-..-.. | MEMBER -> --.---.....-. | MISTER -> --.....-..-. | (etc.) | | But there are too many variations in that direction too. | JdeBP wrote: | You have certainly got further than people did in | https://news.ycombinator.com/item?id=17116855 . | tyingq wrote: | Update: pretty sure the ending is "FSF.ORG". | | Maybe an email address? | bluesign wrote: | Yeah, millions of combinations. I tried but was not | patient enough. | | For the curious: https://www.jbowman.com/remorse/ | [deleted] ___________________________________________________________________ (page generated 2021-03-10 23:00 UTC)