[HN Gopher] A project with a single 11,000-line code file ___________________________________________________________________ A project with a single 11,000-line code file Author : todsacerdoti Score : 236 points Date : 2022-04-03 17:41 UTC (5 hours ago) (HTM) web link (austinhenley.com) (TXT) w3m dump (austinhenley.com) | coldcode wrote: | I worked on untangling a 29,000 line single .c file MacOS app in | the mid-1990s that featured a 14,000 line event loop function. | Fun. | saulrh wrote: | At my first job out of college, there was a "utils" file with | IIRC over 100k LoC. Nearly every file in the codebase imported | it. This was in _Perl_. That single import statement would | increase the time to start anything by upwards of three seconds. | One of the best things I did for my efficiency was to factor out | subsets of functionality that didn't need any of those utils, | since those subsets would run unit tests in a tenth of a second | instead of three to five seconds. | | All of which is to say, by all means argue about whether colossal | files are acceptable software engineering, but sometimes that fight | takes a back seat to "a double-digit percentage of the company's | CPU and memory are wasted on parsing and loading this file in | literally every new process". | cameronperot wrote: | Magboltz [1] is a single >95k line Fortran file (although, I'm | not sure if it's multiple files combined into one before upload). | | [1] https://magboltz.web.cern.ch/magboltz/ | voakbasda wrote: | It really speaks to the state of software engineering when there | are ample comments here defending this practice. This is | virtually indefensible in my book, as it screams technical debt | and strongly suggests that there are much deeper issues hiding in | that codebase. Personally I would not be willing to work on it | without first addressing those issues. | postalrat wrote: | And that's fine. Nobody is qualified to work on every piece of | software. 
| avar wrote: | It speaks to the state of software engineering that people are | complaining about file length, instead of something meaningful | like cyclomatic complexity. | | A 10k-line single-file program can be easier to understand and | better organized than an overly abstracted mess strewn across | multiple files, but which checks all the "best practice" | checkboxes. | dj_mc_merlin wrote: | Do you think the people who wrote this were measuring | cyclomatic complexity? Obviously you can have a >10k LOC file | that is well-maintained and useful, but there's ample | evidence in the post that this was _not_ one of those, but a | machine held together by duct tape, spit, and hope. | chillingeffect wrote: | This is more of a "rubber hits the road" than a theoretical forum. | There is a time and a place for stuff like this. It's not a | reflection of the bigger picture of sw. | pkrumins wrote: | All software projects should be a single, self-contained file. | bavell wrote: | And on a single line if you're a real pro! | pkrumins wrote: | Single line programs is my expertise. :) | Jtsummers wrote: | Most of the comments here seem to be focusing on Austin's | reference to it as an 11k line of code _file_, but if you read | the article it sounds a lot more like it's _both_ an 11k line of | code file _and_ an 11k line of code _procedure_. That is, that | there are no sub-procedures, just one straight execution through | the whole thing. If you've only encountered large-ish source | files like this but with reasonable sets of functions inside, | that's a far cry from finding an 11k line of code procedure | itself. The former is often justifiable (though perhaps a | stretch), the latter is almost never justifiable. It's just | garbage. | quickthrower2 wrote: | Think of all your code as a single file with line separators, | file separators and a special UI that considers both to present | these in a text editor. | | The existence of large files is mostly just a style issue. 
| | Text editors with different or let's say more semantic interfaces | to the code would not care about file size. | | You could then happily have 1M loc in a single file. | | You would care about it as much as you care about how the code | was laid out on sectors and pages on your HDD. | scoresmoke wrote: | TypeScript's checker.ts is a 2.65 MB file containing 44,932 lines | at the moment. | | https://github.com/microsoft/TypeScript/blob/main/src/compil... | | Does anyone know why and how they maintain it? | razh wrote: | Unlike the OP's file, there's a rather substantial test suite | and massive corpus of TypeScript code to work with, so at the | very least, you'd have some grumpy people knocking on your door | if you did something that negatively impacted the greater | ecosystem. | | Some documentation from Orta Therox on the checker: | | https://github.com/microsoft/TypeScript-Compiler-Notes/blob/... | leros wrote: | This reminds me of an app we had at work for static content, | written in JSP of course. It was made so designers and UI | developers could make mostly static content pages. | | Someone had a good idea to make a header.jsp template for common | header stuff. | | But it was hilarious. The file essentially became a giant if-else | condition with a few hundred conditions like "if path == | 'some_page'" followed by CSS and sometimes JavaScript for that | page. | | Absolutely horrendous. | yk wrote: | The most strangely maintainable code, though I should probably | put maintainable in scare quotes, I have ever seen was an | astrophysics code that calculated the changes to spectrum during | interactions with background fields. That thing had two loong | nested loops: in the outer loop it calculated local backgrounds, | and the inner loop was basically an Euler solver that used the | backgrounds from the outer loop. 
| | The outer loop was something like 4 kLOC, and consisted of blocks | where there were first 20 lines of loadTable(filename) calls, | then a call to calculateLosses( <all the tables just loaded> ) | and then freeTable( <just loaded tables> ) calls. The inner loop | was a little bit of setup and then a very long part where all | those losses would be subtracted from the spectra. | | The funny thing was, that once you got the structure, that code | was actually not that bad. However, I told my boss several times | that the second something comes along and doesn't exactly fit | into that pattern the entire thing will blow up, and was always | told that they had maintained that code for 15 years and that hadn't | happened yet. | whatever1 wrote: | Debugging a single file is much easier compared to debugging a | tangled mess of interconnected .h, .c, .tcc files with include | directives that only work in a specific sequence and with a | specific compiler. | | Fix your include/import systems before preaching for modularity. | rc_mob wrote: | man i really love the look and feel of that guy's website. it's | perfectly simple. | gwbas1c wrote: | > What is the moral of the story? | | Either figure out how to improve the situation, or leave. | kazinator wrote: | > _Once I dared to clean this up and reuse the authentication | response, but it broke everything._ | | Yet! Those non-programming people somehow managed to add their | little requirements over the years, without breaking the other | forms? | | There is probably more to the story. Like that, I suspect, users | who needed to produce a certain form probably had private, | years-old copies of the program that they used, impervious to | subsequent changes. | Jtsummers wrote: | I'd wager they _did_ break those other forms, the whole thing | had its own long list of bug reports per the writeup. They just | didn't break it for themselves. | [deleted] | tonyedgecombe wrote: | Guilty. 
I once wrote a win32 app in a single 30,000 line C++ | file. | | It was an interesting exercise but I'd never do it again. | thriftwy wrote: | An early 2000s web analytics tool awstats was a 500kb Perl file. | It was surprisingly easy to modify and hard to break - I spent a | lot of time adding SEO goodies to it. | lars-b2018 wrote: | Well, this is not necessarily a bad thing. If it was approachable | by a non-IT person in HR, and business rules could be updated | without contacting IT and waiting, then more power to them. I | have seen this sort of thing developed as a coping mechanism | because the official IT team could not be used, either due to | time, cost, priority, or whatever. Also, even being in IT, 1 | multi thousand line file can be a lot more manageable to work in | vs. dozens of smaller files where it's not clear where to look | without being in an IDE. | choletentent wrote: | If well done, single file projects are not bad. They save a lot | of boilerplate code. It is also easier to find things, since it is | all in the same file. | | EDIT: I'll go even further. Programmers who don't like long files | are probably using the scrollbar to navigate around the file. Vim | saves me from that bad habit. | bo0tzz wrote: | What programming language requires 'a lot' of boilerplate code | to use multiple files? That sounds awful. I don't think the | argument for things being easier to find holds up either, with a | tool like grep. | rwmj wrote: | Perl XS (the system used to interface with C) requires module | == file, so if you have a particularly large module then it | just has to live in a single file. Here's one: | $ wc -l perl/lib/Sys/Guestfs.xs 11930 | perl/lib/Sys/Guestfs.xs | | Worse still, this expands to C which can be large and takes a | noticeable time to compile: 30019 | perl/lib/Sys/Guestfs.c | choletentent wrote: | You don't need to go far. 
In C, function prototypes in header | files are boilerplate ;) | irrational wrote: | I once wrote a sql query that was (well, is, since it is still in | use) 5,000 lines long. It still ran in less than a millisecond, | but I was surprised how long it really was once it was finished | and tested. | lkrubner wrote: | I've never written a file with 11,000 lines of code, but I have | often built Clojure projects like this, with everything in one | file. I think I might have once had a file with 4,000 lines of | code. Maybe 5,000? A complete system might be 5 apps, working | together, each made of one large file. It does help with some | things. Especially if I try to on-board another programmer, if | they don't know Clojure very well, using one file means they | don't ever get tripped up by namespaces, instead, they just open | one file, and then they can load it into the REPL and start | working. I would not recommend this style for every project, but | it does offer a kind of simplicity for the projects I work on. | [deleted] | kevin_thibedeau wrote: | WolfSSL has this 50K SLOC doozy: | | https://github.com/wolfSSL/wolfssl/blob/master/src/ssl.c | neurocat123 wrote: | A brilliant tool I once worked with is TetGen; it takes a hollow | 3D shape and creates a volumetric, space-filling mesh of the | inside using tetrahedra. Most of what is TetGen is in one giant | C++ file, clocking in at 36,566 raw SLOC. | | https://github.com/libigl/tetgen/blob/master/tetgen.cxx | mtoddsmith wrote: | But it's a class with lots of small methods. Maybe not the same | as the single large VB script the OP described. | invalidname wrote: | We got many of these including a 13k LoC file in our OSS project. | Yes, it isn't ideal but sometimes for performance and practical | reasons these things grow over time. 
| cookiengineer wrote: | Back in the days at Zynga, there was this ritual that new members | of the STG (Shared Tech Group, which developed the game engine | stack) had to try to refactor the road logic code. | | Suffice it to say, it's a 28k LOC file that was so bad, it could | even hold up in court as evidence that a South American company | stole the code of Zynga's -ville games. We could reproduce each | and every single bug and its effects 1:1 in their games, with all | the crashing scenarios that were easy to reproduce, hard to | debug, and almost impossible to fix. | | Once you dig into the hole of depth sorting and being smart by | "just slicing" everything into squared ground tiles on the fly, | there's no way out of that spaghetti code anymore. | | Fun times, was always a joy seeing people give up to a single | code file. The first step to enlightenment was always resignation | :) | iaaan wrote: | To be fair, the first step towards refactoring is understanding | the existing code -- ideally, knowing everywhere it is used, | all of its behaviors, and importantly, its history, so that you | don't break anything, and so that you don't reintroduce bugs | that have already been fixed over the years. Or, in lieu of all | that, a robust automated test suite. | | This cannot be done with a file containing 28k lines of code. | That is an insurmountable task. They may as well have been | asked to start from scratch and build a new engine. | | I'm curious what the purpose of this ritual was. Was it just | hazing, or was the thought that someone might actually be able | to accomplish this? | lupire wrote: | A 28Kloc file can be modular. | WalterBright wrote: | It's impossible to refactor spaghetti code without a | comprehensive test suite. But you can do it with a test suite - | I've done it with large code bases. | lanstin wrote: | You sometimes can. Maybe for any legacy code base someone | could, but I have tried and failed on more than one occasion. 
| Some people's thought process is just perversely different to | mine and I keep feeling, oh, this is the layer where that | happens, but no, every time I have an aha moment I am | disillusioned. | WalterBright wrote: | If you don't have a test suite, you can't know if you're | making progress or making things worse. | | Learned repeatedly from painful experience. | manquer wrote: | To develop a comprehensive test suite can sometimes be | hard, especially for code that deals with say | concurrency, multi threaded code, locks, 2d/3d physics, | video, analog, hardware related, procedurally | generated or ML (meta-language) and the other ML | (machine learning) etc. | | A lot of edge cases and race conditions would easily slip | through, also a different set of edge cases or race | conditions you never considered and therefore never | tested for in your first version could pop up in your | rewrite. | WalterBright wrote: | Of course. But dealing with that is why we get paid the | big bucks. | | I've dealt with concurrency issues. grep is a handy tool | to find related synchronization code, then I try to | replace it with an encapsulation. In general, I look for | things I can replace with algorithms, and things I can | encapsulate. And so on. | klyrs wrote: | It's funny, I've got a bit of a penchant for big ugly | refactors, and I'd love to sink my teeth into this thing (for a | reasonable fee). But, it's Zynga. I can't picture a number that | would make it worth my while. | hahamrfunnyguy wrote: | I worked on a project like this once. If I remember correctly, it | was even larger than 11,000 lines. ASP classic project with | VBScript. We tried to get the company to do an overhaul, but it was | more cost effective to try building it into the existing system. | | After the better part of a week becoming acquainted with the | code, I found a suitable integration point. 
Luckily for me, the | new feature being requested didn't depend too much on the | existing code so I didn't have to make too many modifications to | the existing code. I added the entry point to the new section | along with some comments describing how things worked and some | ascii art of a dragon. In the end, the new feature worked great | and the customer was very happy with the results. | | Some years later, I was working for a different consulting firm | and that project surfaced again. This time it was being re- | written in ASP .NET after being passed around to a couple of | different off-shore development teams. My coworker was working on | it and asked me if I had written a specific piece of code in | index.asp. I took a look and we both had a laugh, because my | ascii survived after all those years! | seancoleman wrote: | Moral of the story: no engineer worth their salt _wants_ to work | on a "codebase" like this, so as an independent contractor, you | can virtually name your price if you're willing to wade through | the mess and solve acute problems. | rvnx wrote: | It depends how you see it: I see a project that is quite | successful since it's running in production for mission-critical | needs and the code is solid enough that even non-programmers can | do improvements to it | semitones wrote: | That's like running down the side of a very steep mountain and | calling it a successful endeavor before you've reached the | bottom | doubled112 wrote: | Depends on your measure of success. Given a steep enough | slope, you are almost guaranteed to reach the bottom one way | or another. | m1ckey wrote: | The .NET runtime GC is a 47k C++ file. | | https://github.com/dotnet/runtime/blob/main/src/coreclr/gc/g... | rahimiali wrote: | Python's main interpreter loop is a single 4k line function. | | https://github.com/python/cpython/blob/main/Python/ceval.c#L... | [deleted] | TimTheTinker wrote: | Someone please tell me this is transpiled from a separate | project. 
| liversage wrote: | It's originally written in LISP and this is why it's a single | C++ file. However, I believe that it's now being maintained | in its C++ form. | phyrex wrote: | If I remember correctly it was in fact written in Common | Lisp; the output was originally that file but it may have | been modified since. You can probably google the truth with | those breadcrumbs :) | [deleted] | antirez wrote: | Code quality, internal modularity and clean interfaces between | the parts don't have anything to do with how it is split into | files. | albertopv wrote: | In 2012 I worked on a jsp web app project, we replaced another | consultant, so the project was inherited. A monstrous jsp page, | of 13k lines IIRC, once transpiled to pure java hit the max length | for a class method, 65535 bytes. | throwaway71271 wrote: | i used to work on 40-50k line files with one function and a bunch | of gotos in perl, in a multi billion dollar company | | its fine | | you just binary search your way into it, put print "AAAA" in the | middle, see if its printed, then put it in the half of the half | and try again. | | emacs couldnt even find the bracket ending of the if condition | (not the block, the condition..), have you ever seen if | conditions (again, not the block) that span your whole screen? | | its not as bad as you think, it made me realize we take code very | seriously, but its actually ok, 10k line file 100k line file, | whatever.. its all the same | veltas wrote: | This is the right attitude, sometimes you really do need the | duct tape approach. | thenipper wrote: | I once refactored a 15k plus line sql file. Fun times | pfraze wrote: | I'm guessing / hoping there was a lot of data in there as | INSERT statements. | | My horror story was being asked to do maintenance on PHP sites | written in the early days of PHP. Hundreds - maybe thousands - | of PHP files with copy-pasted HTML and intermingled logic. 
As | far as I can tell, the idea of instantiating a whole MVC | framework from a single entrypoint file came after that | particular site was created, so every possible page was its own | entrypoint with its own boilerplate. Source control also seemed | to predate this project, so you had plenty of .old.php and | .old.v2.php files. | | Programming at webdev agencies is a challenging experience. | userbinator wrote: | I remember many years ago coming across a reimplementation of the | server side for a popular MMORPG of the time, reverse-engineered | from the client (which was Flash) by what was likely a teenager | --- it was over 100k lines in a single file, written in Visual | Basic. Global variables everywhere, short names, and not even | indentation. All the account data was stored in flat files, there | was no actual DB. No "best practices" at all. Yet, not | surprisingly, it worked pretty well and was actually not | difficult to modify --- Ctrl+F would easily get you to the right | place in the code. | | I guess the moral is, never underestimate what determination and | creativity can do, and always be skeptical when someone says | there's only one best way to do something. | gofreddygo wrote: | Funnily enough, I have recently had great success by reversing | the "best practices" on a distributed "micro services" | architecture application into a single big Java file. | | Best practices were the usual suspects DRY, IOC, SQL + NoSQL, | separation of concerns, config files over code, composition | over inheritance, unexplainable overlapping annotations, dozens | of oversimplified components doing their own thing, and some | $something_someone_read_on_a_medium_post | | The Single Java File was around 500 lines no db, lots of | globals, a dozen or so classes and some interfaces, Threads for | simulating event based concurrency, generous use of Java queues | and stacks but i specifically made it static with Zero dynamic | hashmaps. 
| | It actually runs in my IDE. I can understand what the hell the | product is supposed to do and which component is doing more than it | should, and, more valuably, I can predict what could break if I | change that value in the helm chart from 5.0 to 5.1. | | It is quite useful and pleasing, I can actually reason about | things and I have newfound use and appreciation for Type | Systems and compile errors. And I can write tests that run in | under 3 seconds. | teaearlgraycold wrote: | Having the whole project actually in one project is critical. | I think some of these "best practices" are actually very | useful when applied with caution. But you sometimes need to | break the rules. Everything should be optimized for developer | convenience. Convenience in deployment. Convenience in | debugging. Convenience in refactoring. Only do what HN and | FAANG say is "right" when you need to. | sillysaurusx wrote: | Ironically, this is a description of hacker news itself. | https://github.com/shawwn/arc/blob/arc3.1/news.arc | | (HN has indentation, though.) | | It's important to realize that this is good design. It's hard | to separate yourself from the time you live in, but the rewards | are worthwhile. | lupire wrote: | 2600 lines is not 100k lines. | exdsq wrote: | The only readability issue I have with that is the functions' | expected arguments. Add some types and I'd be very happy to | work on it. I believe Facebook uses a single directory of | files now as best practice? With the file names including | namespaces. That was an HN comment from ages ago so could be | wrong or misinterpreted. | sillysaurusx wrote: | pg's new lisp, Bel, has something close to typed arguments: | (def add1 (x|int) (+ x 1)) | | http://www.paulgraham.com/bel.html | | I've been implementing it for a couple years now, though | not seriously till the past couple months. There are some | interesting (and overlooked) ideas in Bel. | | Bel is sort of the limit case of generality. 
For example, | you might expect the "type" above to be a separate kind of | object, the way that types are separate kinds of things in | TypeScript. | | But in fact, it's simply a function that receives the | argument and can throw an error. So for example, you can do | something like: (def positive (x) (if (< x 0) (err 'negative) x)) (def sqrt (x|positive) ...) | | I just wish he'd solved keyword arguments as thoroughly as | every other kind of argument. There are hints that it was | always in the back of his mind. Though it's true he never | needed them, so that's probably why he never made them. | gfunk911 wrote: | This is a good point, but there's also a key difference. | | There's a big difference between "code being in one file" and | "code being in one function." It sounds like the OP had | something reasonably close to "one function," whereas the HN | code has a lot of (what appear to be) small well designed | methods. | wefarrell wrote: | You can get away with a lot on a single developer project and | best practices aren't in place solely to make code functional. | | That application would likely fall apart if multiple developers | with diverse backgrounds had to maintain it and add new | features. | suifbwish wrote: | To be fair even with using best practices, code can still | fall apart with multiple developers from diverse backgrounds. | userbinator wrote: | I don't know what you mean exactly by "diverse backgrounds" | and it doesn't matter in this case either, because there were | definitely multiple people working on it (although the | initial version was the work of one.) They effectively used a | forum thread as source control, and just attached their | modified versions to the posts. 
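For readers who don't read Lisp: the Bel snippet a few comments up, where the name after the bar in `x|positive` is an ordinary function that receives the argument and may raise an error, can be approximated in Python. This is only a rough sketch of the idea and not how Bel implements it; the `checked` decorator and the use of parameter annotations as validator functions are assumptions made purely for illustration.

```python
import inspect
from functools import wraps


def checked(fn):
    """Hypothetical decorator: treat each parameter annotation as a plain
    callable, pass the argument through it, and use whatever it returns."""
    sig = inspect.signature(fn)

    @wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        # Copy the items so we can safely reassign values while looping.
        for name, value in list(bound.arguments.items()):
            check = sig.parameters[name].annotation
            if check is not inspect.Parameter.empty:
                bound.arguments[name] = check(value)
        return fn(*bound.args, **bound.kwargs)

    return wrapper


def positive(x):
    """The 'type' is just a function: return the value or raise."""
    if x < 0:
        raise ValueError("negative")
    return x


@checked
def sqrt(x: positive):
    return x ** 0.5
```

Calling `sqrt(9)` runs `positive(9)` first and then the body, while `sqrt(-1)` raises before the body is reached, which is roughly the behaviour the comment describes for `(def sqrt (x|positive) ...)`.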
| formerly_proven wrote: | It always seemed surprising to me how some of the big Oblivion | and Skyrim mods would get by with fairly few bugs despite there | being no way to have automated tests and some of them having | 10k lines of scripting (or much more in some cases) spread | around dozens or hundreds of quests (quests in the CE engine | are not just the quests you as a player see, but also a huge | number of invisible quests because quest state machines and | associated scripting are how scripting works). | theobeers wrote: | I think it makes sense. Modders get a kind of obsessive | testing from their communities (including themselves) that | devs in most commercial contexts couldn't dream of. Skyrim | mods are, if anything, correcting for bugs in the game. | tokamak-teapot wrote: | Presumably VB.NET? Because the VB6 IDE wouldn't let you write | more than 65534 lines [1]. Don't ask how I discovered this. | | [1] https://docs.microsoft.com/en-us/previous-versions/visualstu... | vsareto wrote: | >Don't ask how I discovered this. | | Like most of the world at one point, it ran on Excel 2003 | | /s | Forge36 wrote: | I too learned this lesson the hard way. | [deleted] | swayvil wrote: | Ah, that takes me back. Commodore 64 freeware games written in | Basic. | | Ya, you could just go in there and mess with the code all over | the place. | ohCh6zos wrote: | My company has a REST client that is a single 9,000 line Go file | that I have nightmares about. By my estimate it is really a ~300 | line program written by someone who hated DRY. | ZiiS wrote: | More likely it was autogenerated from the API specs, if it | hasn't been edited too much you might be the hero who made | regenerating it part of the build. | ohCh6zos wrote: | That was my first thought too, but if there's a generator for | this it has been lost to time and retirements. | unfocussed_mike wrote: | One of the things I often tell people is that if you hear someone | say "Yes! It broke!" 
that person is probably an engineer. | | But the moral of the story in the article is that unfortunately | some things break much less easily than we'd like. | activitypea wrote: | I don't get the obsession with file length. What's the benefit of | having 100 files with one 50-line function per file, over having | a 5000-line file with 100 functions? Obviously not counting extreme | cases where the file size would break some editors' buffers | enneff wrote: | I think the problem in this case was that the entire file was | the script that ran top to bottom. It's not so much that the | file was big, but that the function was huge and impossible to | reason about. | | I agree that obsessing over file length is its own kind of | anti-pattern. I have had colleagues who insist on putting every | little thing in a different file and that is its own special | kind of hell. | perlgeek wrote: | Usually (but not always) a single, huge file points towards | missing structure, missing abstractions, missing boundaries | that aid with understanding. | | If it were a huge, single file, with very understandable | modularity within that file, likely nobody would've bothered to | write a blog post about it :-) | zshrdlu wrote: | Code could be well-modularized in one single file, of course. | But we don't have the tools to write code like that (editors | and languages basically). | honkycat wrote: | Ability to structure your code base hierarchically | | Ability to search through your codebase by file name | | Ability to hide irrelevant information and expose a higher | level API through private functions | CamperBob2 wrote: | None of those are necessarily driven by file-level | organization, though, except for the one about the file names | themselves. | | My mortgage is paid by a 50 kLoC C program with a single | 11000-line _function_. I'm always blown away by how many | so-called "code editors" can't give me a simple list of C | functions in the file, the way BRIEF could in 1987. 
| | Few things annoy me more than having to trudge through a | codebase with hundreds of .c files, inevitably all with | 8-character filenames. Any day when I have to break out | Eclipse to navigate an unfamiliar project is an official Bad | Day At Work. | chakkepolja wrote: | * Editor support for perfect semantic navigation may not be | taken for granted | | * Compiler support for function-level incremental compilation may not be | taken for granted | | * Editor shows a nice file tree (although you can do that with | symbols too) | | * Working with git is easier | | * Reading code on a site like GitHub is easier | montenegrohugo wrote: | Try debugging a single 10k loc file versus fifty small modules | where each takes care of a distinct part of the logic. | jokethrowaway wrote: | I'm not too bothered by the single 10k loc file (and I've | seen plenty of files with thousands of lines). I would aim at | files in the range of 200-300 LOC | | If you split it, it's crucial that you're splitting the logic | in the right way (if the modules are too small, they'll just | waste your time) and that you're making sure references can | be easily traced (e.g. if you have modules with some DI system | which prevents references from being recognised, as it | happens frequently in certain node.js enterprise | applications). | tedunangst wrote: | But what's the difference between file1.c ... file50.c vs cat | file*.c > onefile.c? | timando wrote: | I imagine the 50 files have meaningful names. Just kidding, | no-one knows how to name anything. | userbinator wrote: | As someone who did a bit of enterprise Java, I much prefer | the former. Jumping around between lots of tiny files and not | being able to see where the actual work happens because it's | spread everywhere is a debugging nightmare. | heavenlyblue wrote: | I find IDEs work better with multiple files (i.e. navigation | around if you want to have several windows open at the same | time), but agree that's not so well defined. 
| deergomoo wrote: | Personally I find it much more difficult to keep n places in | one giant file in my head than I do n individual files. | | We have a few multi-kloc legacy monsters where I work and I | quite often completely lose my place when working on them (and, | by association, my train of thought), even though they're | actually structured somewhat reasonably. | niccl wrote: | I had this problem until I found an editor that had outlining | as its core design paradigm. Now, with the outline always | visible, it's _really_ easy to navigate any length file. | | Unfortunately, at one point I got so used to navigating with | the outline that I ended up making a 1500 line function in C | (I was an even worse C programmer then than I am now). | Because of the outline, I could read and follow it easily, | but anyone with a different editor was royally screwed :-( | | If you're interested, the editor is LEO | (http://leoeditor.com/); it's been mentioned on HN a few times | rodgerd wrote: | At one point - and perhaps still today - Java would refuse to | JIT class files with more than a couple of thousand lines in | them, falling back to interpreted mode. So in that case, you | really, really wouldn't want the 5,000 line file. | [deleted] | shakna wrote: | I learned that not all text editors go to the effort of loading | the file data very carefully with careful underlying data | structures when I tried to open a 67K LOC COBOL file on a 32-bit | system, a while back. (Sidenote: COBOL has a 999,999 LOC hard | limit in the compiler spec.) | | So very many editors just couldn't open it. | | Some would use so much memory that the system would either | freeze, or the OS would kill them. | | Some would silently truncate at 65,535 lines. | | Some would produce a load error. | | Some would pop up with an error indicating the developer thought | it was an unreachable state. e.g. "If you're seeing this error... | Call me. And tell me how the fuck you broke it." 
| | Others would manage to open it, but were completely unusable. | Where moving the cursor would take literal minutes. | | There were exactly three editors I found at the time that worked | (none of which were graphical editors). And they worked without | any increased latency, letting you know that the developers just | thought through what they were doing: vim, emacs, nano. | | (A few details because people are probably curious - the vast | majority of that single file project was taxation formulae. It | was National Mutual's repository of all tax calculations for | every jurisdiction that they worked in, internationally, for the | entire several hundred years of the company. They just | transcribed all their tax calculation records into COBOL.) | kbrannigan wrote: | So afraid to write bad, spaghetti code, I ended up writing no | code at all. | | This thread made me realize that it's better to have a working | profitable project with bad code, than a perfect unfinished | project, with meticulously chosen design patterns. | | Afraid of being judged for bad code, I could not start until I | had the right architecture. | | I'm glad I read this. | | This is developers therapy. | knome wrote: | You sound like you should read this: [removed] | | apparently jwz decided not to be linked from here :/ | | there's an archive.org link below. | GranPC wrote: | Heads up: jwz.org redirects Hacker News visitors (via | "Referer") to an image of a slightly-hairy testicle sitting | inside an egg cup. So maybe don't click the link at work. | sillysaurusx wrote: | That's friggin' hilarious. What a boss. | AlexCoventry wrote: | Safe link: https://web.archive.org/web/20220318220659/https | ://www.jwz.o... | baud147258 wrote: | from the last time I saw that link on HN, opening it in a | private browsing window avoids the redirection. | Jtsummers wrote: | https://www.dreamsongs.com/RiseOfWorseIsBetter.html | | For future reference, a non-archive, non-jwz.org link.
| Straight from the source as that's the author's own site. | albertzeyer wrote: | Put it like this: www.jwz.org/doc/worse-is-better.html | | Then just copy & paste it. | wincy wrote: | Uhh you should know that domain redirects traffic from hacker | news to an offensive image. | [deleted] | [deleted] | 29athrowaway wrote: | Restaurant industry version: I was so afraid | to cook in a dirty kitchen, I ended up not cooking at all. | This thread made me realize that it's better to sell food prepared | on dirty surfaces with unrefrigerated ingredients half-eaten | by rodents and roaches that makes people sick, than fresh food | prepared on clean surfaces with clean utensils. | I'm glad I read this. This is a restaurant worker | story. | | Construction industry version: I was so | afraid of not using the right construction materials and not | building code-compliant structures, I ended up not building at | all. This thread made me realize that it's better to | sell houses with structural problems and low quality materials | that will be unsafe to live in, than houses built according to | code. I'm glad I read this. This is a | builder story. | | In any other industry, a person would go to jail for saying | that. You won't, because luckily for you, software development | is not a regulated activity, and people with your mindset can | make a happy living outside of jail. But hopefully one day some | types of neglect in software development become illegal. | | "Better is the enemy of the worse" is no excuse to have | spaghetti code, or 50,000 lines of code files. It means that | good is sometimes more convenient than perfect. Spaghetti code | is not good to begin with, so that is the wrong way of using the | proverb. | Oras wrote: | Been there. I worked in a company where we had a codebase like | the one mentioned in the article and over the years we started | developing microservices with 100% code coverage.
| | The new shiny services took much longer to identify bugs and | add new features due to the complexity of the design and | endless interfaces. | khazhoux wrote: | I've sadly come to realize (after witnessing on many projects) | that there's a pattern that goes like this: | | * Team A writes code quickly. Not bad code, really, but they | take shortcuts everywhere they can. They don't have the | strongest tests, they don't generalize for all the known use | cases, etc. Their code goes to beta and gets users and makes | progress. | | * Team B deliberates and deliberates. They try to avoid taking | shortcuts. But in the end, even _their_ code doesn 't have the | strongest tests, doesn't generalize for all the known use | cases, etc. Team B never gets users or gains momentum, and | their code+architecture was probably no better than Team A -- | they just took 3x the time to get there. | 411111111111111 wrote: | I kinda agree on this, but not for the implicit reason you're | probably thinking of. | | Just starting and doing it is just unreasonably effective | because very few projects actually need novel solutions - | most are just fine with off-the-shelf hacked together | solutions. | | Thinkers are required if the software is actually | groundbreaking new work. Almost everyone's work on this forum | probably isn't that however (mine included), which is why I | agree with your sentiment | wk_end wrote: | I don't think this dichotomy is helpful. I'm presently | working at a startup that's trying to dig itself out of a | hole created by the first CTO, who in doing things "quickly" | created an MVP so buggy, inefficient, crash-prone, and | unmaintainable that we can't retain customers _or_ engineers. | As always, there's a balance to be struck, and ways to | operate quickly that don't sacrifice quality too much. 
| notreallyserio wrote: | I'm kind of surprised that you can't find engineers | interested in creating a new implementation of an existing | application that is actually used by people. I think that | might be my dream role. | gridspy wrote: | Be Team C. | | Team C works like team A. However every time a feature ships, | someone who knows that feature well immediately refactors the | relevant code to remove the prototype scaffolding. When code | becomes static, an expert adds good quality comments. When a | bug is found, it is recreated in a regression test prior to | being fixed for good. | jokethrowaway wrote: | The sad reality of tech companies is that there is little | incentive or bonuses for improving the situation. You won't | get a bonus for cleaning up the code or for rewriting | rotten code. | | Hack at the code for 4 years, collect your options and | leave the mess to someone else. | | To be honest, a hacky codebase written fast is not the | worst codebase to deal with. The worst type is when someone | had the time to overarchitect and overengineer things. | | Following references across 200 different files, tracing | calls through hundreds of microservices. Graphql servers | with complex resolver logic. | gridspy wrote: | > The worst type is when someone had the time to | overarchitect and overengineer things. | | I concur. Refactoring should be as much about removing | unneeded abstraction and features as it is about adding | same. | | > there is little incentive or bonuses for improving the | situation. | | Yeah, I just can't seem to believe this in my soul. I | just want to fix ugly code and can't stop myself. I get | huge satisfaction from speeding up, tidying up or fixing | up bad code. | | It doesn't help when management wants to minimize time | spent on such tidy-up, especially when it's hurting our | productivity to maintain it without fixing it. | wsc981 wrote: | I try to sneak in refactoring with other tasks. 
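Aside on gridspy's "Team C" workflow above: the step "when a bug is found, it is recreated in a regression test prior to being fixed" might look roughly like the sketch below. Everything in it is hypothetical (the `parse_price` function, the imagined bug, the inputs); it only illustrates the order of work: write the test that reproduces the bug, watch it fail, apply the fix, and keep the test.

```python
import unittest


def parse_price(text: str) -> float:
    """Hypothetical helper used only for this illustration.

    Imagined bug report: "1,299.50" raised ValueError because the
    thousands separator was passed straight to float().
    """
    cleaned = text.strip().replace(",", "")  # the fix: drop separators first
    return float(cleaned)


class ParsePriceRegressionTest(unittest.TestCase):
    def test_thousands_separator(self):
        # Written before the fix to reproduce the reported failure,
        # then kept so the bug cannot silently come back.
        self.assertEqual(parse_price("1,299.50"), 1299.5)

    def test_plain_number_still_works(self):
        # Guards the behaviour that already worked before the fix.
        self.assertEqual(parse_price("42"), 42.0)


if __name__ == "__main__":
    unittest.main()
```

Saved as a file, this runs with `python -m unittest`. The point is less the test framework than the habit: the test exists before the fix does.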
| kbrannigan wrote: | There's a YouTube video about beginner musicians vs | intermediate vs advanced. | | The beginner uses simple chords | | The intermediate uses advanced chords, crazy fills and runs | and riffs. | | The advanced uses simple chords | userbinator wrote: | That's not far from a similar saying in software: expert | developers write code that looks like a beginner's, but | simpler and with fewer bugs. | jbmny wrote: | "First there is a mountain, then there is no mountain, then | there is." | farmin wrote: | Tests, ha | kolinko wrote: | +1 | | I had a lot of trouble trying to explain this to juniors. | | The most important thing is to have code that is easy to | refactor when you know what you're doing (i.e. everything is | working properly). Juniors I worked with had a nasty | definition of pretty code being split into a hundred files, | each no longer than a screen, and each function no longer | than 5 lines. The onboarding of new devs to such code was way | worse than into code that would be 10k lines in one file, | but with a flat structure and less interdependency. | lupire wrote: | > The most important thing is to have code that is easy to | refactor | | This whole post is about how refactoring doesn't matter | because your project's development lifetime isn't long and | wide enough for maintenance to matter. | gary_0 wrote: | "Flat is better than nested" - The Zen of Python | | I had an "everything should be broken into a hierarchy!" | stage back when I was learning to code, and boy was I off | track. In my defense, at the time (and this dates me) OOP | was all the rage. | BurningFrog wrote: | > _The most important thing is to have code that is easy | to refactor_ | | Very true. | | I'll just add that _another_ most important thing is to | actually _take time_ to refactor, even when things are | busy. | | I spend maybe 1/3 of my time refactoring, and that feels | good.
| Scarblac wrote: | Less interdependency is absolutely the key to everything. | | But... isn't the easiest way to _show_ that there is little | interdependency to put them in separate files that don't | import from each other? | lanstin wrote: | People misjudge where to draw the lines. You will have an | orchestration API call that does five things and each of | those five things, not used anywhere else, will get its | own class, interface, factory, and configuration, so to | read through the five things you have to open like twenty | files. And to notice that despite all this engineering | they have static credentials in the code itself, you have | to be alert across so many lines of code. The whole thing | can be one longer file that reads coherently and in fact | lessens the cross class importing. | kcb wrote: | Something this reminds me of that I've been doing lately when | stuck on a particular problem is just coding something. Even | if it's the shittiest, most inefficient and naive solution. | More often than not I either discover a more proper solution | along the way or just realize my shitty solution actually | wasn't all that bad to begin with. | baggy_trough wrote: | Start with the simplest idea that might work. | Mawr wrote: | https://news.ycombinator.com/item?id=19956614 | ICodeSometimes wrote: | The scariest part of this isn't the 11K lines of code, it's the | lack of automated tests. It IS impossible to make any sort of | substantial change without breaking another part of the darn | thing. | | My favorite quote has got to be: "Unit tests aren't meant for you | now, it's insurance against a future developer". | sargstuff wrote: | Obviously Sam from accounting did this before Excel was Turing | complete. | | https://www.felienne.com/archives/2974 | | https://www.microsoft.com/en-us/research/blog/lambda-the-ult... | | No mention of a cell-u-lite version.
| maxnoe wrote: | CORSIKA, current Version 7, is a program to simulate extensive | air showers, ie particle physics in the atmosphere. | | Its main file is 88kloc Fortran 77, started in the eighties, | still actively developed. | | https://www.iap.kit.edu/corsika/index.php | | Currently a rewrite in C++ is underway. | matsemann wrote: | That's me when I use languages that don't support circular | imports. Can't have circular imports if everything is in the same | file. Taps head. | tuyiown wrote: | Just in passing, generally you can break circular import by | isolating the coupling that triggers the circle chain in a file | dedicated to that. Bonus: each coupling use case gets to be | explicit | heavenlyblue wrote: | Python allows for circular imports as long as you don't | directly import things you use but instead their modules for | example. | | Is that the same for all languages? | _benj wrote: | For me the limitation of circular imports forced to drastically | rethink how I architected my software (golang, rewrote the | whole mvp 3 times because in the first two I was either completely | blocked by no-circular-imports or the structure felt so hacky | that I didn't even wanna touch it... then I learned about | interfaces!!) | henning wrote: | The number of lines in a file doesn't necessarily mean anything. | SQLite is compiled as a single file: | https://www.sqlite.org/amalgamation.html . It depends on the | structure that is in that file. | | If this app was factored into 300 different files, it would still | be an impossible mess. The redundant and buggy logic would just | be in different files. | bedatadriven wrote: | SQLite concatenates its sources into a single file to ease | distribution, but it is developed in many separate and well | organized .c and .h files. | [deleted] | beckingz wrote: | > What is the moral of the story? | | > I have no idea. | | Sometimes successful software just grows.
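Aside on the circular-import subthread above (matsemann, tuyiown, heavenlyblue): a minimal sketch of heavenlyblue's point that Python tolerates an import cycle when each module imports the other _module_ and only looks up its attributes at call time. The module names `mod_a` and `mod_b` are made up, and the two files are written to a temporary directory purely so the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module sources: each imports the other module as a whole
# and only touches its attributes inside a function body.
MOD_A_SRC = '''
import mod_b

def ping(n):
    return "ping" if n == 0 else mod_b.pong(n - 1)
'''

MOD_B_SRC = '''
import mod_a

def pong(n):
    return "pong" if n == 0 else mod_a.ping(n - 1)
'''


def run_demo(n: int = 3) -> str:
    """Write the two modules to a temp dir, import them, and call ping(n)."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "mod_a.py").write_text(MOD_A_SRC)
        Path(tmp, "mod_b.py").write_text(MOD_B_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            mod_a = importlib.import_module("mod_a")
            return mod_a.ping(n)
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            sys.modules.pop("mod_a", None)
            sys.modules.pop("mod_b", None)


if __name__ == "__main__":
    print(run_demo())
```

This works because `mod_b` only needs the (still half-initialized) `mod_a` module object to exist at import time; `mod_a.ping` is resolved later, when `pong` is called. If both files instead used `from mod_a import ping` / `from mod_b import pong` at the top level, the second import would be expected to fail with an ImportError about a partially initialized module. Whether other languages behave similarly, as heavenlyblue asks, is not something this sketch answers.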
| mgaunard wrote: | I don't understand why people complain about such things. | | A seasoned programmer should have no problem navigating a multi- | million-line codebase, that's just routine. | | There isn't anything that special about a 10k file. | User23 wrote: | Is it weird that I think that, armed with a modern editor, this | would be basically easy and practically pleasant compared to the | FooFactoryFactory "best practices" crap I've had to deal with? | prpl wrote: | I encountered the QGSJET-II model which we used for modeling | cosmic ray showers in the atmosphere. At one point I was asked | about finding a way to parallelize the code, but the 17k+ line | fortran file for the model, which I recall also included self | modifying code, was too deep for an undergrad to penetrate. | | https://gitlab.iap.kit.edu/AirShowerPhysics/crmc/-/blob/mast... | unnouinceput wrote: | Quote: "There was no test environment. If I made a change, I had | to test it in "production"." | | What year is this? 70's and mainframes? Because no way in hell | you cannot, on large organization that had "Jeff in marketing", | since 80's and PC's, duplicate the environment. Especially given | this was used by almost everybody in the organization. | | And once you're done duplicating the production, and create | proper test environment, you can start actually refactoring and | creating a beautiful app out of it instead of just "To this day I | sometimes lie in bed wondering what could have caused this". | | Conclusion - article's author is a whiner instead of a solver, no | better than the ones before him that "copy and pasted at some | point then later diverged" | honkycat wrote: | File length is a bit of a bike shed in my opinion. My main | concern here would be separation of concerns and code quality. | | I prefer many short files and folders structured hierarchically | and grouped semantically. I have no proof this is better so I | would probably just leave it to a vote with the team. 
| | In the end I think this is how a lot of this should be viewed | until we get proper research. How do you WANT to code? TDD? No | tests? One giant file? It should be a team and executive | decision. | | If you don't like the style on your team, and nobody wants to | change it, move on or adapt. | | Technical debt is like a superfund site. It renders the real | estate worthless and poisons the rest of the company. | | It does matter. My current gig is hemorrhaging money because we | can't keep devs even though the pay and benefits are great. We | cannot execute on mission critical initiatives. | | We cannot adapt our product to meet the needs of the market in an | agile way. | | This is due to people saying "a working product is more important | blah blah.." for years. I would argue there is a balance to | strike and you can do both with a good team and realistic | planning. But there is always the nay-sayer who is willing to | step in and say whatever product wants to hear. | | It is so bad we cannot train people to use the software anymore. | It is too poor quality and we can't on-board them before they | decide to go elsewhere. | | Everyone who knew anything has left and there is too much of it. | So the remaining devs get overwhelmed, they leave... It is a | vicious cycle. | | The funny thing is the money machine works, but it is so | frustrating to see all of the extra money we could be making and | having to leave it on the table. | thedanbob wrote: | I once inherited a mission-critical PHP project which had no | version control, no tests, and no development environment (all | edits were made directly on the server). It used a custom | framework of the original author's own devising which made | extensive use of global variables and iframes and mostly lived in | several enormous PHP files.
I was able to clean it up somewhat, | but there was one particularly important file that was so | dependent on global variables and runtime state that I never | dared touch it. | | When I was finally able to retire the project several years | later, I first replaced the home page with this picture: | http://2.bp.blogspot.com/-6OWKZqvvPh8/UjBJ6xPxwjI/AAAAAAAAOv... | _joel wrote: | It wasn't mission critical but my very first production | programming project (n.b. I'm not a programmer and never had | any classical training or education as one) was an abomination. | I'd like to think the realisation of how bad it was, despite it | just about working, was a call to arms to up my game a little. | I ended up learning a lot about data structures, writing | understandable code and comment, when not to write code, all | that OOP stuff and things like STI, Generics (still not sure | what they are), testing (TDD AND BDD!!! Yea, Cucumber!) and a | plethora of other useful things. | | I'm still not a programmer. | lucb1e wrote: | My first thought when reading this description was that step | one is to make a local copy, get a development environment | setup where you can toy around, see how things fit together. | The 'stupid'er the setup (like using plain old files instead of | a database), the easier that actually gets (apt install apache2 | php; rsync da:files /var/www). Wouldn't that have helped solve | this particularly important but untouchable file? | thedanbob wrote: | If I remember correctly, the file was processing global state | from other parts of the system, and it was such a Byzantine | bit of code that I had almost no hope of understanding what | it was actually doing without being able to observe state in | the production system as it was being used. Plus at the time | I wasn't a particularly competent programmer myself (this was | my first programming job). In the end I figured it wasn't | worth risking breaking it when its replacement was on the | way. 
| k__ wrote: | Haha, same! | | The first project I inherited was a PHP app that used a custom UI | framework created by an agency that didn't work with us | anymore. | | One file had 7000 LoC and it would generate hundreds of LoC of | with-sprinkled JS code and send it to the browser on every | click. | | Debugging that thing was a nightmare. | atoav wrote: | Writing long files is okay, but unless a language's module system | gets in your way, separating different aspects of your code into | different files does no harm. | | There is no need -- at all -- to be evangelical about it one way | or another. | briantkelley wrote: | I worked on Word for years. Office has thousands of files over | 10,000 lines with, uh, various degrees of test coverage and | comprehensibility. After some time and experience, your mental | model of the architecture ends up being way more important than | simple metrics on source code organization. | | IMO, organizing source code in files seems archaic. E.g. tracing | the history of a function moved across files can be tedious even | with the best tools. I'd like to see more discussion around | different types of source storage abstraction. | | There are benefits of large source files... When compiling | hundreds of thousands of files (like Office), the overhead of | spawning a compiler process, re-parsing or deserializing headers, | and orchestrating the build is non-trivial. Splitting large files | into smaller ones could add hours of just overhead to the full | build time. | woah wrote: | What's an alternative to files that doesn't just have all the | same attributes of files anyway? If it involves breaking code | into multiple chunks of related functions, and possibly having | these chunks act as namespaces, that sounds like what a file | does. | rc_mob wrote: | I love how much this questions the status quo. | holoduke wrote: | Better one good organized file than 100s of folders and | subfolders and files and symlinks.
I have worked on projects | where even after 2 years I didn't grasp the folder structure and | just used search to locate files. | civilized wrote: | People love to complain about things that are simple, fast, and | easy to complain about, without regard to whether the complaint | is insightful or useful. It's sort of the dark twin of | bikeshedding. | | If you divide the single 11k-line file into a thousand 11-line | files, it may become objectively much harder to understand, but | it'll also receive much less flak, guaranteed. | | I suspect this is also why Architecture Astronaut-ery can be so | successful within a company. If code is chock-full of | superficial signs of order and craftsmanship, such as | hierarchy, abstraction, and Design Patterns(TM), it takes a lot | of mental effort to criticize it, and most people won't. | smegsicle wrote: | > dark twin of bikeshedding | | is it different from regular bikeshedding? or are you saying | that the dark twin is the evolutionary process of eg. | architecture gaining complexity until it becomes difficult to | criticize.. | aaronchall wrote: | If you divide a single 11k-line file into 20 files averaging | 550 lines per each, by semantics and levels of abstraction, | your code will quite possibly be easier to read, maintain and | add to. Maybe. Perhaps. | civilized wrote: | I mostly agree, but it's often not that big a deal, and | some people and applications may favor bigger files. | | I have a 4000-line script in a single file that has served | me very well. It's perfectly organized and modular. I | thought about breaking it into more files but it seemed | pointless. | gameswithgo wrote: | lordnacho wrote: | No doubt dozens of devs will throw in their own 10k LOC story | here, and yes it's painful to watch so many people having | professional cramps over it. 
| | But don't forget society itself is governed by OOM larger bits of | text with no referential integrity, no machine to tell you if | it's inconsistent, and no way to test anything, other than making | humans write more text to each other and occasionally show up in | court. The law itself, even parts of it like the tax code, and | regulations on various areas, are a melange of text and cultural | understandings between lawyers, judges and government. We collect | the data for this machine in the form of contracts and receipts, | and it piles up in mountains. | | As with code, it's not just legal professionals who have to deal | with law. It spills into everyone's life, and there's nothing to | do about it other than either guess what to do or pay a pro to | tell you what to do. | Fwirt wrote: | The payroll check printer for my employer was once a couple | thousand lines that generated raw PCL to be sent to a LaserJet | that used magnetic toner to produce checks that had a working | MICR number. It was rendered into spaghetti by multiple GOTOs | that jumped to helpful labels like "666", and calls into other | helper programs to generate more PCL that did things like change | fonts and draw graphics. Of course none of it was commented, so | you had to have a copy of the PCL spec on hand to know what any | of it did. It was the product of a retired cowboy that had also | written the rest of our custom payroll system over a number of | years. | | I attacked it by printing out and taping together each program | into "scrolls" and tracing control flow with highlighters and | sharpies. Had them all taped up on my office wall so I could | refactor the whole thing from scratch, coworkers found that | entertaining. Got a much more readable replacement working | nicely. Then a couple years later HR bought a new system and we | stopped printing our own checks. I was not sorry to see the whole | thing go.
| Lev1a wrote: | Reading your process I have that stereotypical TV series image | in my mind of a person so deep into the subject matter that | plaster every wall with notes and pull string all across the | room at head height to hang up ever more notes kinda like that | one NCIS episode (S8 E6 "Cracked"): | https://img.sharetv.com/shows/episodes/standard/616591.jpg | (although that image is only a small part of that whole view). | alpb wrote: | Obligatory "Do not try to simplify this code" reference here | https://github.com/kubernetes/kubernetes/blob/ec2e767e593953... | (https://news.ycombinator.com/item?id=18772873) from Kubernetes | persistent volume manager code. | ChrisMarshallNY wrote: | I can beat that. | | One of my first jobs was as a maintenance programmer for a | 100KLoC (or more) single-file FORTRAN IV (1970s vintage) | application (a proto-email server). | | Three-character variable names, no documentation, and having been | stepped on by every junior programmer that went before me. | | My best debugger was a Ouija board. | | The original author was a ringer for Donald "Duck" Dunn. | Interesting chap. | | It taught me the value of writing good code. | | It sucked. | | It was great (because of the lesson learned). | ben_w wrote: | That beats the 1000 lines inside a single if {} block that I once | found. | | (The conditional in that if {} always evaluated to true). | jandrese wrote: | Yeah, but the function only had a single exit point, so it was | following best practices. | moralestapia wrote: | Lol, reminds me of the meme: | | var a = true; | | if(a == true) then return true; | | else if(a == false) then return false; | | Or something like that. | asddubs wrote: | i hate to say that I've definitely written code like that. | not any time recently, thankfully (at least I think) | tonyedgecombe wrote: | The thing is it's easy to imagine a situation that leads to | this. 
It's five o'clock on Friday, your partner is hassling | you to get home because you have visitors, you are | exhausted because you worked 60 hours this week and your | boss is breathing down your neck because they want their | pet feature finished right now. | | This is why I'm always loath to criticise stuff I see on | WTF. | [deleted] | Aardwolf wrote: | else throw "this should never happen"; | MBCook wrote: | I have seen that in real code. No kidding. | hibbelig wrote: | I once looked at 30 lines of code, analyzed them and found | out that they always computed true, never false. My | background knowledge told me that Zeit was the correct value. | But I did not dare to eliminate it in the main branch, only | in a refactoring branch that I think never got merged. I am | not proud. | [deleted] | foxhop wrote: | Just so you know awstats was a 25k+ line Perl script which I | maintained as an internal fork for an employer once. It was epic & | as hard to work with as you'd expect. | 11235813213455 wrote: | https://github.com/SheetJS/sheetjs/blob/master/xlsx.js with its | 24500 LOC, the author is cool | cosmiccatnap wrote: | I see people do this all the time with terraform and it's | madness. ___________________________________________________________________ (page generated 2022-04-03 23:00 UTC)