[HN Gopher] The Wrong Abstraction (2016) Author: mkchoi212 Score: 512 points Date: 2020-07-05 15:54 UTC (7 hours ago) Link: www.sandimetz.com | jack_h wrote: | I would say that if developers are hacking on an abstraction that | is ill-suited to the task until the code base is a nightmare, | they will take this advice and duplicate code until it's a | nightmare. | | The fact of the matter is every line of code that is written has | an associated cost. Developers all too often pay that cost by | incurring technical debt. | bob1029 wrote: | This whole thing exists on a normalized/de-normalized spectrum. | The problem is that both ends have pros/cons. | | On the normalized side, you have the benefit of single-point-of-touch | and enforcement of a standard implementation. This can make | code maintenance easier if used in the correct places. It can | make code maintenance a living nightmare if you try to normalize | too many contexts into one method. If you find yourself 10 layers | deep in a conditional statement trying to determine specific | context, you may be better off with some degree of de-normalization | (duplication). | | On the de-normalized side, you have the benefit of specific, | scoped implementations. Models and logic pertain more | specifically to a particular domain or function. This can make | reasoning with complex logic much easier as you are able to deal | with specific business processes in isolation. You will likely | see fewer conditionals in de-normalized code sites. Obvious | downsides are that if you need to fix a bug with some piece of | logic and 100 different features implement that separately, you | can wind up with a nasty code maintenance session. | | I find that a careful combination of both of these ideas results | in the most ideal application.
Stateless common code abstractions | which cross-cut stateful, feature-specific code abstractions | seem to be the Goldilocks zone for our most complicated software. | djhaskin987 wrote: | Mods: this article is old and should be labeled 2016. | tomphoolery wrote: | This again?? ;) | | I love this post. A lot of hours were wasted in the past | trying to use abstractions that no longer made sense, but Sandi | encouraged me to go back and rethink a lot of that and now my | code is way easier to read. Thanks Sandi! | gumby wrote: | Early deduplication is the equivalent of early optimization: a | bad idea that boxes you in. | | Duplicate code is a sign that there _could_ be a generalization | missing. | why-el wrote: | Rob Pike discusses similar points in this section of his talk on | Go Proverbs https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s. | haolez wrote: | That's mostly how I matured as a developer: I find myself | abstracting less and writing less code today than I did 10 years | ago, but I'm more productive today, my code is cheaper to | maintain and has fewer bugs. Sometimes, I will literally copy | paste a small amount of logic just to avoid making a future | reader of this code keep hunting around for where the business | logic is actually implemented. "It's right here, my dear future | reader!". | | Or maybe I was just a really bad programmer 10 years ago :) | sheeshkebab wrote: | I'm not sure why this is #1... but since it is, both of these - | duplication and wrong abstractions - are otherwise known as | technical debt. | dasil003 wrote: | Not necessarily. Technical debt is when you do something quick | and dirty to get a feature out in the short-term knowing that | it won't be maintainable, scalable, etc, but you do it anyway | with the expectation that you'll fix it later. Some duplication | and wrong abstractions are caused by this, but definitely not | all.
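bob1029's normalized/de-normalized spectrum, and the parameter-plus-conditional growth the article warns about, can be illustrated with a minimal Python sketch. Everything in it is hypothetical (the function names and formatting rules are invented for illustration, not taken from the article): one shared function that has absorbed each caller's needs as flags, next to de-normalized per-context functions that duplicate a little logic but read straight through.

```python
# Hypothetical "normalized" version: one function serves every context by
# accumulating flags, so each caller's needs leak into everyone's code path.
def format_amount(amount_cents: int, *, for_invoice: bool = False,
                  for_receipt: bool = False, include_currency: bool = True) -> str:
    dollars = amount_cents / 100
    if for_invoice:
        text = f"{dollars:,.2f}"    # thousands separator, two decimals
    elif for_receipt:
        text = f"{dollars:.2f}"     # two decimals, no separator
    else:
        text = f"{round(dollars)}"  # whole dollars for summaries
    # Hidden interaction: receipts never get a currency sign,
    # regardless of what include_currency says.
    if include_currency and not for_receipt:
        text = "$" + text
    return text


# Hypothetical "de-normalized" version: one small function per context.
# Each reads linearly, but a fix to shared logic must be repeated in each copy.
def format_invoice_amount(amount_cents: int) -> str:
    return f"${amount_cents / 100:,.2f}"


def format_receipt_amount(amount_cents: int) -> str:
    return f"{amount_cents / 100:.2f}"


def format_summary_amount(amount_cents: int) -> str:
    return f"${round(amount_cents / 100)}"
```

Both versions produce the same strings for these three contexts; the difference is where a fourth context's special case would have to go, and how many existing branches a reader must rule out to understand any single caller.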
| hrhrhrd wrote: | No, technical debt is a very general category that includes | deliberate hacks, structural flaws, and bugs from small mistakes. | It's anything that over time will damage the code base, | duplications and wrong abstractions very much included. | dasil003 wrote: | You're welcome to your own definitions, but personally I | keep bitrot, deferred maintenance, and "structural flaws" | (which can be subjective and dependent on use cases and | scale) out of the bucket of technical debt since it robs | the metaphor of a defining aspect: intentionality. Debt is | not something that happens passively as the world changes | around you; it's something which you sign up for. | quinnirill wrote: | If you unintentionally destroy property and have to pay | for it, you're in debt. | | We even have a concept of life debt. | | Some debt is intentional, some incidental. | | Most technical debt I've seen was not intentional, just a | well-meaning design that was created to serve a purpose | that eventually outgrew it, and that's when the interest | started to pile up. | | And happening passively is exactly what it does: interest | rates change, your ability to make down payments changes. | All part of the very well-functioning metaphor in this | context. | tarkin2 wrote: | "With C you can shoot your own foot. With C++ you can blow your | own leg off". I feel the same is true here. | | The abstraction may be right at the time of writing, yet further | on it often becomes not only wrong, but a massive hindrance. | | With time and effort, hacky code can be worked into shape. An | eventual wrong abstraction normally means a rewrite. | twirlock wrote: | cc my know-it-all coworker | random3 wrote: | This is so true, but so shallow too. I think the big mistake is | to treat the code as "the main thing" when in reality it's just a | model (a golem) mimicking some "other thing". | | We're missing an entire set of code characterizations.
Yes, we | have a "pattern language" but there's not much to characterize it | structurally wrt "code distance" from one part of the code to the | other (e.g. in call stack depth as well as in breadth). | | And again all of this needs to happen wrt the "abstraction", not | the code itself. Having 10 methods 90% duplicated in a single | file with a 10% difference is many times better than trying | to abstract it. | | Having the same "unit conversion" function duplicated in 3 parts | of the code can be disastrous. | | These two examples are very easy to see and understand, but in | reality you're always in a continuous state in between. And "code | smells" like passing too many parameters or doing "blast radius" | for certain code changes are only watching for side-effects of a | missing "code theory". An interesting book on the topic is "Your | Code as a Crime Scene". | | The bottom line is we're trying to fix these problems over and | over again without having a good understanding of what the real | problem is, and this leads to too many rules that are too easy to | misinterpret unless you are already a "senior artist". | ijidak wrote: | > Having the same "unit conversion" function duplicated in 3 | parts of the code can be disastrous. | | This. | | I feel like it's really about cognitive load to remember and | recognize the differences. | | Duplication in 3 distant files places a heavy load on the | developer to: | | 1. Discover the duplication | 2. Grasp the reason for the differences in the 3 different locations | 3. Remember these things | | Whereas when the duplication is in the SAME file, #1, #2, and | #3 can become very manageable cognitively. | | Now the question changes to: | | Is the cognitive load of dealing with the different special | cases in a single de-duplicated method GREATER than simply | leaving them in separate methods? | | Often the answer is that duplication WITHIN a file is less of a | cognitive load.
| | Whereas duplication ACROSS files is a heavy cognitive load. | | Minimizing cognitive load minimizes mistakes. And minimizes | developer fatigue. Thus boosting productivity. | | At least, that's my development philosophy, even though I've | never seen it in a design pattern or a book. | | It just seems to make sense. | bcrosby95 wrote: | I find it interesting that comments on these articles mainly | discuss one aspect of them, but rarely this part: | | > Don't get trapped by the sunk cost fallacy. | | In my experience, yes, programmers are hesitant to throw out an | abstraction. Why not work to change this, rather than telling | people not to abstract? | ben509 wrote: | I don't think it's a sunk cost fallacy. I think the hesitation | is more for social reasons, often not wanting to do a big pull | request that's going to be scrutinized. | Tainnor wrote: | "Big pull requests" that are unannounced are always | problematic because who wants to be the person saying "all of | this work you've done is wrong"? | | In such situations, it's good to get buy-in from other people | before attempting to do such a thing. Make a proposal for a | big change and discuss it. There's still a chance that, in | the implementation, it doesn't work as nicely as initially believed, | but at least now it's less likely that the idea | will be rejected wholesale during code review. | chiefalchemist wrote: | Why not simply duplicate the abstraction, refactor as needed, and | adjust the necessary caller(s)? | | Having to know, find and maintain the individual duplications | feels dirty and in its own way wrong. | | Choose your wrongs wisely? | dfischer wrote: | Reminds me of this discussion: | https://news.ycombinator.com/item?id=12120752 (John Carmack on | inlined code).
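The "unit conversion" case that random3 and ijidak discuss is easy to reproduce. In this hypothetical Python sketch (the function names and the bad factor are invented), two hand-copied conversions have drifted apart, which is exactly the kind of difference ijidak says a developer must first discover and then explain; a single shared definition leaves only one place to get it right.

```python
import math


# Two hypothetical copies of the same conversion living in different modules.
def report_inches_to_cm(inches: float) -> float:
    return inches * 2.54


def export_inches_to_cm(inches: float) -> float:
    return inches * 2.5  # copied from memory: the factor has drifted


# One shared definition instead. The inch is defined as exactly 2.54 cm.
CM_PER_INCH = 2.54


def inches_to_cm(inches: float) -> float:
    return inches * CM_PER_INCH


if __name__ == "__main__":
    # The copies disagree, and nothing in either module says which is right.
    print(report_inches_to_cm(10), export_inches_to_cm(10))
    print(math.isclose(inches_to_cm(10), 25.4))
```

A constant with a single correct definition sits at the easy end of random3's continuum; the 10 near-duplicate methods in one file sit at the other end, where a shared abstraction may cost more than it saves.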
| Pxtl wrote: | Every Line Of Business codebase I've worked on has been the worst | "there I fixed it" copypasta spaghetti, and has never made it to | the point where "maybe we shouldn't add a parameter to this | existing, cleanly abstracted method to handle this new similar-but-distinct | use-case" was anywhere near my radar for | abstraction. | | I would _love_ to have developers where my problem was "maybe | you piggybacked on existing code _too much_, in this case you | should've split out your own function". | misja111 wrote: | Every failed IT project that I have worked on in the last 20 | years (except those where the cause was non-technical, such as | bad planning/bad requirements) failed because it used too | many layers of abstraction. | pkolaczk wrote: | Counter: Every failed IT project that I have worked on in the | last 20 years had too much code. Code is bad. Delete code | mercilessly. | | Seriously though, the problem is bad abstractions, not just | abstractions. A total lack of abstractions is typically a | spaghetti you need to read fully to understand. | Ma8ee wrote: | I just had to debug code that had seven layers of classes on | top of Dapper to call a stored procedure in SQL Server. | IncRnd wrote: | There is truth in this! I've also seen that some of the most | successful projects with the highest performers are the most | full of duplicate code. | | The operating theory is to be first to market in order to | capture the largest market share and be the market leader. | Programs are just tools that can be rewritten later. That's | similar to any large tech company today that "innovates" then | apologizes later. | collyw wrote: | If a company has been running long enough to be making | money, chances are their codebase will be crap. | sagichmal wrote: | I have had exactly and overwhelmingly the opposite experience. | I wonder if it's a function of our fields, or what... | mlthoughts2018 wrote: | It's been the exact opposite for me.
The spaghetti code has | always come from poorly conceived abstractions and the massive | problem of inverting an API to reimplement functionality | _through_ the API that should be extensible _within_ the API | (but fails to be because of poor choices in abstraction or | abstracting prematurely). | | Later on that spaghetti code gets labeled as lacking | abstraction, similar to what you are saying, despite the actual | problem being _too much_ abstraction and poorly designed | abstraction that became load-bearing in a way where everyone | decides that living with API inversion as a reality is the | lesser evil and figures they'll probably quit the company and | move on to greener pastures before it becomes their headache to | deal with. | | https://en.m.wikipedia.org/wiki/Abstraction_inversion | hesdeadjim wrote: | Absolutely this. I'd rather look at 200 lines of linear, | inline-documented code than a spaghetti mess of "helper" | functions that do nothing better than obfuscate everything | going on. | | I've had a strict rule with my team of "1, 2, N". I don't | want to see an abstraction until we've solved a problem | similarly at least two times, and even then an abstraction | may still be a poor idea. | | Abstraction is an especially poor idea early in a project | because often you only half know what you're making (I'm in | games). Requirements change, or a special case needs to be | added, and all of a sudden you are trying to jam new behavior | into "generic" helpers without breaking the house of cards | built around them. | hackinthebochs wrote: | 200 lines of code means that you have to comprehend all 200 | lines simultaneously since any line could potentially | interact with any other line in that code block. Using | functions where the state is passed as parameters limits | the potential for code interactions through functional | boundaries. The point of abstractions is to limit | complexity by limiting potential interactions.
Helper | methods do a fine job of this. | hesdeadjim wrote: | That's a gross over-generalization to assume that 200 | lines is always a self-referential mess. Functions | fundamentally transform data, and often that | transformation is a linear process. If it's not, sure, | break it up in a more sensible manner. | | Regardless, helper methods have a significant cognitive | cost as well. It's nice to pretend that a four-word | function name can entirely communicate the state | transformation it does, but in reality you need to know | what it does and mentally substitute that when reading | the function using it. No free lunch. | gav wrote: | I worked on a webapp that our team inherited which had | 400-800 line controllers (and one that was a little over | 1200 lines). When I first started looking at the code I | was horrified, but then I realized that everything was | self-contained and, due to the linear flow, pretty easy to | understand. You just had to get used to scrolling a lot! | | The issue that we started having is that pull requests, | code reviews, and anything that involved looking at diffs | was a lot of work. There were two main issues: | | 1) Inadvertently creating a giant diff with a minor | change that affected indenting, such as adding or | removing an `if` statement. | | 2) Creating diffs that had insufficient context to | understand: if your function is large enough, changes can | be separated by enough other lines of code that the | diff is not standalone. You end up having to read a lot | of unchanged code to understand the significance of the | change (it would be an ideal way for a malicious | developer to sneak in security problems). | hackinthebochs wrote: | >That's a gross over-generalization to assume that 200 | lines is always a self-referential mess. | | The point is that you don't know this until you look. You | have to look at all 200 lines to understand the function | of even one line.
When you leverage functional boundaries | you generally can ignore the complexity behind the | abstraction. | barrkel wrote: | You're fooling yourself, in a mature codebase, if you | think you can modify code and not look past function | boundaries. | | That assertion would be more credible in a language that | captures side effects in the type system, but that's not | what most people use. | hackinthebochs wrote: | I'm not sure what point you're making. If you are just | assuming that functional boundaries tend to not be | maintained in practice then you're not contradicting | anything I have said. Whether or not functional | boundaries are easy/hard to observe depends on the | language and coding conventions. | nfw2 wrote: | I agree that over-engineered helper function hell can be a | real problem. | | I disagree strongly with strictly enforcing the 3x rule. | The right abstraction can be helpful even if it is used | only once. The right abstraction will communicate its | purpose clearly and make it easier to reason about the | program, not harder. Obfuscating implementation details is | a feature not a bug, as long as the boundaries of the | abstraction are obvious. Another benefit is it makes it | easier to test the logical units of your codebase. | | "It's nice to pretend that a four word function name can | entirely communicate the state transformation it does, but | in reality you need to know what it does." Are you | suggesting you are cognizant of every line of code of every | library you use in your work? | hesdeadjim wrote: | Actually yes, you should know to at least depth=1 what | your magic incantations are doing when you call them. | | And that's part of my point, if you go that one level of | depth and find an excessive amount of DRY, you'll find it | that much harder to know what the hell is going on. | nfw2 wrote: | Yes, you should understand what a function does when you | call it. 
Not everyone who looks at a codebase is | modifying the codebase or adding new function calls. The | person referencing the code may already be 1-level deep | in parsing the implementation. | | Not all abstractions will seem like magic incantations | when you use them. Something like "convertToCamelCase" | conveys its purpose clearly enough that the reader can | assume what the low-level operations are. They don't need | to look at these operations every time they need to | reference the code. | captainmuon wrote: | So much this. I've encountered many codebases (in science and | in tech) where the coder did not even use basic abstractions. | In one case there was a lot of | | plot('graph1') | plot('graph2') | .... | plot('graph100') | | because somebody didn't know how to create strings at runtime | in C++. Another codebase did complex vector calculations in | components; I was able to reduce a 500-line function to 50 | lines (including comments, and with bugs fixed). | | I can sympathize with this a bit; I started programming with | BASIC - you could not return structs, you could not use | indirect variables (no pointers/references)... but at least you | had the FOR loop :-P | | People often get called out for over-abstracting (rightly so), | but I've rarely seen somebody criticized for copypasta or for | overly stupid code. Probably because we're afraid of accidentally | implying somebody can't code. | alephnil wrote: | Code like you describe is often the result when a program | is written by someone who does not have programming as their | main profession. I have seen code like you describe in code | written by scientists (in other disciplines than computer | science). | | They may have very deep knowledge in their field, and have | written a program to solve some problem they have, but are | unfortunately not very good programmers. This often results | in quite naive code that still tries to solve an advanced | problem.
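captainmuon's plot('graph1') ... plot('graph100') example is the kind of repetition that a loop with runtime string formatting removes, and the loop also makes the range checkable at a glance. This is a Python sketch rather than the original C++, and `plot` here is a hypothetical stand-in that just records the names it receives.

```python
# Records every name passed to the stand-in plot() below.
plotted = []


def plot(name: str) -> None:
    """Hypothetical stand-in for the real plotting call; records each name."""
    plotted.append(name)


# One loop replaces 100 hand-written calls; the bounds are visible in one line,
# so nothing can be silently skipped or duplicated.
for i in range(1, 101):
    plot(f"graph{i}")
```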
| | In code written by professional programmers, I have seen the | pattern described in the article far more often than the | naive style you describe. After all, programmers are trained | to avoid duplication and find abstractions, and will often | add one abstraction too many rather than one too few. | klyrs wrote: | plot('graph1') | plot('graph2') | .... | plot('graph100') | | I've done a lot of that myself. What you might not be seeing | is the for loop in a scripting language that was used to | generate that text. It probably took less effort than looking | up and implementing it the "right" way. It might make _your_ | eyes bleed, but if you need to change "plot" to another | function, that's just a find-and-replace-all away. Most | importantly, the code _works fine_ and doesn't actually need | abstraction. | loopz wrote: | This is fine for code that belongs in the trash, i.e. just | testing stuff, prototypes, debugging, learning the | language/framework, etc. | humbledrone wrote: | > the code works fine and doesn't actually need abstraction | | Well, maybe it works fine. We didn't see the other 97 lines | to verify that they actually include all the integers from | 3-99 without skipping or duplicating any. (NB with a loop | this verification would be trivial.) | klyrs wrote: | Maybe they deleted 57 because it triggers an edge case. | Put it back if you dare. ;) | | (no, that's the bad kind of tech debt that's | unfortunately common and I actually hate) | Sharlin wrote: | Yes, writing a for loop in _another language_ to generate | code instead of just writing the same loop in the language | you're already using? Common technique, nothing wrong with | it whatsoever. | klyrs wrote: | Yes, a lot of scientists use their computers in ways that | horrify software developers.
For example, learning | exactly enough of a compiled language to do some wicked | fast integer / floating point arithmetic, and not | bothering to waste time on the mundane crap you find | obvious. And that might mean falling back to a familiar | language that makes string formatting easy. | | If it ain't broke, don't fix it. | zbentley wrote: | > If it ain't broke, don't fix it. | | But scientific programming is _deeply broken_. Code | presented along with publications often doesn't work, or | is an incomplete subpart/toy example that's supposed to | be invoked within some larger framework. That sounds | great until you realize that "some larger framework" | doesn't refer to a standardized tool, but some deeply | customized setup (a la the one you're responding to, that | uses e.g. ad hoc code generators across two--or sometimes | more--languages because the original authors didn't know | how to format a string in one of them). | | Even if you do get lucky enough to find a paper with all | requisite code included, in many cases it was only ever | invoked on extensively customized, hand-configured | environments. And that configuration was done by non-tech | folks with a "just get it to where I can run the damn | simulation" attitude, so configs are neither documented | nor automated. And when I say configs, I'm talking about | vital stuff--e.g. env vars that control whether real | arithmetic or floating point is used. | | As often as not, you hack your way to try to get something--anything--running, | and it either fails catastrophically | or produces the wrong result. Now you have to figure out | which of several situations you're in: is the research | bad? Were the authors just so non-technical they | accidentally omitted a vital piece of code? Was the | omission deliberate and profit-motivated (e.g. the PI | behind the paper plans on patenting some of the software | at some point, so didn't want to publish a special | sauce)?
Was the omission deliberate and shame-motivated | (i.e. researchers didn't want to publish their insane | pile of hacks written to backfill an incomplete | understanding of the tools being used)? Is it an | environment-dependent thing? | | And all of _that_ just pertains to code in | published work--usually the higher-quality stuff. | Assuming ownership of in-house code from other scientific | programmers is much, much worse. | | This isn't abstract moaning about best practices. The | failure of labs, companies, publications, and | universities to combat this phenomenon has direct, | significant, and negative effects on the quality of | research and scientific advancement in many fields. | | TL;DR it is "broke". When programmers complain about | reproducibility crises in soft-science fields, they're | throwing rocks from glass houses. | [deleted] | platz wrote: | > I've rarely seen somebody criticized for copypasta or for | overly stupid code. | | Do you think that is in the realm of what the article is | concerned with? | toastal wrote: | This comes up very often and is probably a big part of the | distaste many people have for jQuery. You see so much | copypasta $(selector) that queries the entire DOM over and | over again instead of storing the initial query result in a variable, | querying children based on a ParentNode, etc. This | duplication is wasteful at best, and can hurt performance at | worst. | | But as others noted, this is usually the sign that the | creator is either green, or puts little focus in furthering | their programming because they normally do other things--not | malice or carelessness. | jpxw wrote: | I saw a post on here recently about the "proportionality of | code" (I think this was the term used) - as in, how much | one line of code translates to in terms of work for the | machine. Python was used as an example, in contrast with Go | (list comprehensions vs Go's verbose syntax). | | I think a similar line of thinking is applicable here.
$ | hides a lot of work behind short syntax. The syntax isn't | "proportional" to the work. Not only that, but the amount | of work depends on the argument. Perhaps it's better that | we're forced to put the effort in and type out | "document.getElementById" - it makes us think about what | we're doing. | nick-garfield wrote: | wow, just reading that term "line of business" makes me | anxious. I used to work on a global payments platform that | supported "multiple LOBs", and it was a nightmare of ifs and | switch statements all the way down. The situation was made more | difficult by the fact that our org couldn't standardize the | LOBs into a common enum. | mrfredward wrote: | The business codebase I'm working on now was written by OOP | crazy people who thought inheritance was the solution to every | line of duplicated code. When they hit roadblocks, they filled | the base class with things like if(this.GetType() == | typeof(DerivedClass1)){... | | I would do anything to have the duplication instead. | fauigerzigerk wrote: | If you're truly OOP crazy you will always find ways to avoid | resorting to branching on types or even avoid branching | altogether (just on the language level of course). "There's a | design pattern for that" :-) | pierrebai wrote: | Checking for the type is the exact opposite of OO. | | The correct OO would be to think about what the check | represent, maybe abstract it in a base interface with pure | abstract methods and derive from that interface. | | What you describe is what people without understanding of OO | do when they come from a language without OO. | goatlover wrote: | > they filled the base class with things like | if(this.GetType() == typeof(DerivedClass1)){ | | That defeats the purpose of polymorphism. | raverbashing wrote: | Very relatable. And they even have the guts to call this code | "SOLID" | BurningFrog wrote: | Once you ask what the class is you're no longer even "OOP | crazy". 
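The difference between mrfredward's type-checking base class and the interface-based design pierrebai suggests can be shown in a short, hypothetical Python sketch. The shipping classes and prices are invented, and Python's `abc` module stands in for the C#-style abstract members of the original code.

```python
from abc import ABC, abstractmethod


# Anti-pattern: the base class asks "which subclass am I?" and branches.
# Every new subclass means editing the base class.
class TypeCheckedShipment:
    def cost(self, weight_kg: float) -> float:
        if type(self) is ExpressTypeCheckedShipment:
            return 10 + 3 * weight_kg
        return 5 + 1 * weight_kg


class ExpressTypeCheckedShipment(TypeCheckedShipment):
    pass


# Polymorphic alternative: the varying behavior is an abstract method,
# and each subclass supplies its own implementation.
class Shipment(ABC):
    @abstractmethod
    def cost(self, weight_kg: float) -> float:
        ...


class StandardShipment(Shipment):
    def cost(self, weight_kg: float) -> float:
        return 5 + 1 * weight_kg


class ExpressShipment(Shipment):
    def cost(self, weight_kg: float) -> float:
        return 10 + 3 * weight_kg


def total_cost(shipments: list, weight_kg: float) -> float:
    # Callers never inspect types; they just call cost().
    return sum(s.cost(weight_kg) for s in shipments)
```

Adding a third shipping speed to the polymorphic version means adding one class; in the type-checked version it means another branch inside the base class.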
| | You've just capitulated to the complexity and do whatever it | takes. | | I don't want to sound (too) condescending. I know how easy | the best intentions can lead a project there. This job is | _hard_. | isbvhodnvemrwvn wrote: | Then the very same people learn that inheritance bad, | composition good, and they'll create abstractions with no | meaning on their own, which call 10 vague other abstractions | (but hey, no inheritance!). Figuring out what happens there | is even worse than with inheritance. Some people grow out of | it, fortunately (mostly after having to deal with shit like | that once or twice). | grey-area wrote: | In contrast, every junior developer I've ever worked with has | wanted to abstract too early and often, and been slow to | recognise that abstraction has costs too (often far higher over | time than is initially obvious). | | There are costs to copying code, and costs to abstraction, and | there's a balance somewhere in between where the most resilient | and flexible code lives. The costs of both are paid later, | which makes it very hard to judge when starting out where that | balance lies, and hard to assign blame later on when problems | manifest. Was it too little abstraction, or too much, or the | wrong abstraction? | | Note that the article claims that duplication is cheaper than | the _wrong_ abstraction. The problem is not abstraction in | itself, but that abstraction is very hard to get right and is | better done _after_ code has been written and used. | Pxtl wrote: | What I run into with juniors is that yes, they want to | abstract the new problem, and that's good... But they show | disinterest in learning the existing abstractions and the | existing problems and how their new code would fit into that. | Given that approach, you end up with a million individual | "frameworks", each only solving a single specific case of a | series of overlapping similar problems. | | Because reading code is harder than writing it. 
And the only | thing worse than "there, I fixed it" code is "there, I fixed | it with this massive cool new framework I've built". | grey-area wrote: | _yes, they want to abstract the new problem, and that's | good..._ | | I'm not sure that is good. I started off this way too, but | now I like to think carefully about abstractions and avoid | introducing them till I'm sure they will not hinder | understanding, hide changes/bugs, bury the actual behaviour | several layers deep, or worst of all make things hard that | should be easy later (the problem in the article). | | Building abstractions is world-building; it's adding to the | complicated structure other developers (including your | future self) have to navigate and keep in their head before | they can understand the code. So perhaps because of your | second point (that people rarely like other people's | abstractions), it's better to keep abstractions simple and | limited. | dynamite-ready wrote: | Nothing I hate more than seeing two or more files | sharing 90% of the same code. No matter what justification one attempts to | use, there's a mistake somewhere in the design / development | process. | | I can see a case for what the OP is saying, but I feel it | should always be seen as a temporary measure. | naringas wrote: | sometimes it's better to copy and paste some code only to make | each copy diverge more and more over time (somewhat like a | starting template) as opposed to introducing an abstraction to | generalize some slightly different behaviors only to use said | abstraction twice. | | this makes even more sense when the code will live on in | different programs. | | there's a point when incurring the cognitive overhead costs of | the abstraction becomes worthwhile, probably after the 3rd time, | but my point is that it's also important to consider that the | abstraction introduces some coupling between the parts of the | code.
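naringas's copy-and-diverge approach, and the coupling that a shared helper introduces, can be sketched in Python. The names are hypothetical: `greeting` keeps using the shared `normalize_name`, while `username` began as a copy of the same logic and then diverged, instead of `normalize_name` growing a flag that would tie both features to one function.

```python
# Shared helper: fine while every caller wants exactly this behavior.
def normalize_name(name: str) -> str:
    # Collapse runs of whitespace, then capitalize each word.
    return " ".join(name.split()).title()


def greeting(name: str) -> str:
    return f"Hello, {normalize_name(name)}!"


# Copy-and-diverge: usernames started from the same logic but need lowercase
# and underscores. Copying keeps normalize_name free of a "for_username"
# flag, so changing usernames later cannot change greetings.
def username(name: str) -> str:
    return "_".join(name.split()).lower()
```

If a third feature later needs the same lowercase form, that is roughly the point at which naringas (and hesdeadjim's "1, 2, N" rule) would suggest extracting a shared function.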
| cjfd wrote: | If there is one single article about programming that I hate it | is this one. It is completely the wrong message. One should | instead be very eager to eliminate duplication. To avoid the | pitfalls that the article notes one should create abstractions | that are the minimal ones required to remove the duplication to | avoid over-engineering. Also one should keep improving the | abstractions. That way one can turn the abstraction that turned | out to be wrong into the right one. It is the attitude of | constant improvement that will make one succeed as opposed to the | attitude of fear of changing something that this article seems to | encourage. When one does things one learns. When one is afraid to | try things everything will just calcify until it is no longer | possible to add any new features. What one does need to make the | refactoring work is automated tests. | Ensorceled wrote: | In 30 years, I can count on the fingers of one hand the number | of times I've encountered projects that were in trouble because | there was copy/pasted code everywhere and the team was not | abstracting out of fear of breaking the existing code. | | What I have encountered is dozens of projects that had | essentially ground to a halt because of numerous deeply, and | incorrectly, abstracted systems, modules and libraries. | | Correcting projects in this state has almost always been | refactoring into fewer abstractions; less complex, more | cohesive and less coupling. | dragonwriter wrote: | > In 30 years, I can count on the fingers of one hand the | number of times I've encountered projects that were in | trouble because there was copy/pasted code everywhere and the | team was not abstracting out of fear of breaking the existing | code. 
| | I think the level of experience where _underabstraction_ is | common as opposed to _overabstraction_ is so low that it's | uncommon to find a team where that gets through, because even | if someone junior is at the level where it's common, they'll | get corrected before it becomes a widespread problem. | zbentley wrote: | I don't disagree and have seen the same thing. | | However, I've also noticed in those cases that it's very hard | to get people to agree on what the problem actually is. One | person's incorrect over-abstraction is another person's | incompletely-DRYed-up code. | adamkl wrote: | Sandi mentions this during a talk she gave on refactoring a few | years ago. [0] | | It's a great little video for showing junior developers how a | messy bit of code can be cleaned up with a few well-chosen OOP | patterns (and a set of unit tests to cover your ass). | | [0] https://youtu.be/8bZh5LMaSmE | voodoologic wrote: | I'm very partial to this talk about proper abstraction (and not | just for junior developers): | https://www.youtube.com/watch?v=OMPfEXIlTVE | layer8 wrote: | The main takeaway from the article is that abstractions which | have become inadequate should be corrected (removed and/or | replaced by adequate ones) as soon as possible. A corollary is | that abstractions should be designed such that they can be | replaced or removed without too much difficulty. A common problem | in legacy code bases is not just that they contain many | inadequate abstractions, but that the abstractions are entangled | with each other such that changing one requires changing a dozen | others. You start pulling at one end and eventually realize that | it's all one large Gordian knot. One thing that I learned the | hard way over the years is to design abstractions as loosely | coupled and as independent from each other as possible. Then it | becomes more practical to replace them when needed.
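One way to read layer8's advice about loosely coupled, replaceable abstractions, sketched in Python under assumptions: the caller depends only on a small `typing.Protocol`, so a concrete implementation can be swapped out or deleted without touching the caller. `Notifier`, `EmailNotifier`, `LogNotifier` and `close_ticket` are hypothetical names, and the "email" is just an in-memory list.

```python
from typing import List, Protocol


class Notifier(Protocol):
    """The only thing callers know about: one narrow method."""

    def notify(self, recipient: str, message: str) -> None: ...


class EmailNotifier:
    def __init__(self) -> None:
        self.outbox: List[str] = []  # stand-in for a real mail client

    def notify(self, recipient: str, message: str) -> None:
        self.outbox.append(f"To: {recipient}\n{message}")


class LogNotifier:
    def __init__(self) -> None:
        self.lines: List[str] = []

    def notify(self, recipient: str, message: str) -> None:
        self.lines.append(f"[{recipient}] {message}")


def close_ticket(ticket_id: int, owner: str, notifier: Notifier) -> None:
    # The caller is not entangled with any concrete notifier, so replacing
    # EmailNotifier with LogNotifier (or removing it) is a local change.
    notifier.notify(owner, f"Ticket {ticket_id} closed")
```

The narrower the interface, the fewer other abstractions have to change when one implementation is pulled out of the knot.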
| allenu wrote: | In a large organization, the other thing you notice with trying | to fix duplicated code is, if you take on refactoring it all, you | are now responsible for making sure everything still works AND that | you do not inhibit any future work. You are now responsible for | more than you may have bargained for. | | Coming up with the right abstraction takes some predicting of | future use-cases. It's more than just refactoring work to put it | all in one place. | goto11 wrote: | I'm skeptical because it is really easy to un-share code by | copying it into multiple places but it is very hard to unify | duplicated code. So I prefer to err on the side of sharing. | | But yes, you should be ready to change sharing into duplication | if you realize the code is just "accidentally similar" and needs | to evolve in separate directions. | | In practice I have seen a lot more pain due to duplicate code | compared to the issue of over-abstracting code, because the | latter is much easier to fix. | joeframbach wrote: | On the other hand, it's really difficult to know who is using | that shared code. If you make an innocuous change in a shared | method, it could affect someone else you don't know. | dtech wrote: | Not in any modern language or IDE. Not to mention that would | indicate a hole in the test suite. | bcrosby95 wrote: | It's a million times easier than figuring out if those minor | differences in duplicate code are accidental or on purpose. | | As bad as a flag-laden method might be, you know the intent | of all callers. | mcintyre1994 wrote: | I find it much easier to find the call sites for a function | than to find code that's duplicating or a variant of the code | I just fixed a bug in so we can figure out if the same bug is | latent in the duplicates too. | TheCoelacanth wrote: | It's very easy with proper tooling. | BoiledCabbage wrote: | Outside of publishing a public API, almost any modern language | and environment should make this easy.
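A small illustration of the point mcintyre1994 and bcrosby95 make, using hypothetical function names: when a snippet is duplicated, a fix applied to one copy leaves the bug latent in the other, whereas a fix to a shared function reaches every call site, and those call sites can be found by name.

```python
from typing import List


# Two duplicated copies of the same "average" snippet.
def average_order_value(totals: List[float]) -> float:
    # Fixed copy: guards against an empty list.
    if not totals:
        return 0.0
    return sum(totals) / len(totals)


def average_session_length(lengths: List[float]) -> float:
    # Unfixed copy: still raises ZeroDivisionError on an empty list,
    # because nobody knew this duplicate existed when the bug was fixed.
    return sum(lengths) / len(lengths)


# Shared alternative: one fix, and every caller is discoverable by name.
def mean_or_zero(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0
```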
| kolinko wrote: | Depends on the specific codebase? I found the exact opposite to be | true - very hard to reuse code that was abstracted too soon, | and abstracting copy&paste the right way is actually easier if | you have it in multiple cases and can see how it was used. | hackinthebochs wrote: | How is it harder to copy/paste the helper method and modify | as needed, vs tracking down and unifying multiple instances | of the same code written slightly differently? | amelius wrote: | Because the multiple instances are concrete while the | unified code is abstract. | | In general it is more difficult to read abstract code than | concrete code. | | Also code written using the wrong abstraction can get hairy | very quickly (lots of "if" statements for various cases). | fiddlerwoaroof wrote: | In Java, when I hit a bad abstraction, I hit the inline | shortcut (command-alt-n) and then evaluate the resulting | code with git diff. Other languages may be more manual, | but, at worst, you just use ripgrep or similar to find | all the relevant use sites and then manually expand the | abstraction: this is only really a problem if the | function is used hundreds of times: but, in that case, you | can always duplicate the abstraction and rename. | sethammons wrote: | My experience lines up with yours. Working in overly and | poorly abstracted codebases dramatically hurts productivity. | Poorly duplicated code increases the chance for missed | patches, but poor duplication has, in my experience, been | vastly easier to fix. One codebase comes to mind. Twisted | Python. Multiple layers of inheritance, multiple mixins, and | major overloading of methods. Just navigating the code was | a pain. | sagichmal wrote: | > it is really easy to un-share code by copying it into | multiple places but it is very hard to unify duplicated code | | Code that already exists has a gravity, a presumption of | correctness.
That presumption is very difficult to overcome, | especially for programmers new to the codebase. An abstraction | you think of as temporary will be, to those who come after you, | simply the way things are done; breaking it apart and re- | forming it is, for them, fraught with risk. It's good to keep | this in mind as you make commits. | goto11 wrote: | Then the same would be the case for code duplication which | really ought to be unified. | gorgoiler wrote: | Brilliant insight. Always remember: (1) make it work, (2) make it | right, (3) make it fast. 80% of projects get scrapped in between | (1) and (2) because you end up realizing you wanted something | completely different anyway. | nicoburns wrote: | On my projects code doesn't make it into the main branch until | it gets to at least (2). | willcipriano wrote: | > (1) make it work, (2) make it right, (3) make it fast. | | I've always disagreed with this. In my view you should make it | a habit to write optimized code. This isn't agonizing over | minor implementation details but keeping in mind the time | complexity of whatever you are writing and working towards an | optimal solution from the start. You should know what | abstractions in your language are expensive and avoid them. You | should know roughly the purpose of a database table you create | and add the indexes that make sense even if you don't intend to | use them right away. You should know that thousands of method | lookups in a tight loop will be slow. You should have a feel | for "this is a problem someone else probably solved; is there an | optimal implementation I can find somewhere?". You should know | when a value is used often and cache it from the start. Over | time the gap between writing unoptimized and mostly optimized | code gets smaller and smaller just like practice improves any | skill. | criddell wrote: | > You should know that thousands of method lookups in a tight | loop will be slow. | | That's not always the case.
Modern compilers do a lot of | things like inlining and unrolling. These days I mostly try | to write code that is easy to understand. | willcipriano wrote: | > Modern compilers do a lot of things like inlining and | unrolling | | Smart ones do; I've been writing Java lately and that | behavior tends to be unpredictable and rare[0]. I'd use an | inline keyword if I had one, or a preprocessor directive of | some kind if I had that, but I don't. I agree it's harder to | read but I feel like changing a JVM flag to get a behavior | that I want is more inscrutable than having a long method | with a comment noting that this is inlined for performance | reasons. With modern machines and the price of memory I | tend to lean hard toward the memory side of the time-memory | tradeoff. | | [0]"First, it uses counters to keep track of how many times | we invoke the method. When the method is called more than a | specific number of times, it becomes "hot". This threshold | is set to 10,000 by default, but we can configure it via | the JVM flag during Java startup. We definitely don't want | to inline everything since it would be time-consuming and | would produce a huge bytecode." | https://www.baeldung.com/jvm-method-inlining | sagichmal wrote: | > In my view you should make it a habit to write optimized | code. | | It depends on your domain. | | If you're writing for embedded, or games, or other things | where performance is table stakes, then sure. | | If you're writing code to meet (always changing) business | requirements in a team with other people, writing optimized | code first is actively harmful. It inhibits understandability | and maintainability, which are the most important virtues of | this type of programming. And this is true even if | performance is important: optimizations, i.e. any | implementation other than the most obvious and idiomatic, | must always be justified with profiling.
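In the spirit of sagichmal's "justified with profiling" and willcipriano's point about method lookups in tight loops, here is a minimal sketch that uses the standard-library `timeit` module to measure before choosing the less obvious form. Hoisting a bound method such as `result.append` out of a loop is a well-known CPython micro-optimization. Whether it matters in a given program is exactly what the measurement is for, and the timings will vary by interpreter and machine.

```python
import timeit


def squares_obvious(n: int) -> list:
    result = []
    for i in range(n):
        result.append(i * i)  # attribute lookup on every iteration
    return result


def squares_hoisted(n: int) -> list:
    result = []
    append = result.append  # look the bound method up once, outside the loop
    for i in range(n):
        append(i * i)
    return result


def compare(n: int = 10_000, repeat: int = 200) -> dict:
    # Time both versions; keep the obvious one unless the gap actually matters.
    return {
        "obvious": timeit.timeit(lambda: squares_obvious(n), number=repeat),
        "hoisted": timeit.timeit(lambda: squares_hoisted(n), number=repeat),
    }
```

For example, `print(compare())` shows the two timings side by side. The obvious version stays the default unless the profile says otherwise.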
| Tainnor wrote: | You're mostly right, but even in typical LOB applications, | there is some low-hanging fruit you should really pay | attention to. One common example is N+1 queries. | | And if you _do_ find yourself writing an algorithm | (something which happens more rarely in LOB applications, | but can still happen occasionally), it's probably still | good to create algorithms that are of a lower complexity | class, provided they are not much harder to understand | and don't have other significant drawbacks. I remember that | I once accidentally created an algorithm with a complexity | of O(n!). | ridaj wrote: | Previously discussed here: | https://news.ycombinator.com/item?id=17578714 | arendtio wrote: | I find that first comment particularly insightful. | | However, I am not sure about the order of state and coupling. | To me it seems to depend on the language, as for functional | languages, avoiding state is king and in object-oriented | environments, coupling could be a more important factor. | ulisesrmzroche wrote: | "Premature optimization is the root of all evil" | Xlurker wrote: | I'd rather ctrl-f and change code in multiple places than deal | with abstraction hell. | jbmsf wrote: | One of the reasons duplication is used badly is that it is one of | the easiest abstractions to recognize. | | One of the ways I've seen DRY go horribly wrong involves reusable | code units evolving into shared dependencies that often | interdepend in complex ways. Unfortunately, the problems of such | a system are observed much later than the original code | duplication and fewer people have the experience to see it | coming. | kolinko wrote: | I wish this article was available two years ago when I tried to | explain this to a bunch of juniors working for me... | nnutter wrote: | "Posted on January 20, 2016 by Sandi Metz."
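Tainnor's N+1 example, sketched with the standard-library `sqlite3` module and a hypothetical `authors`/`books` schema with made-up data. `Connection.set_trace_callback` is used only to count the SQL statements each version executes. The N+1 version issues one query for the authors plus one per author. The batched version gets the same result from a single JOIN.

```python
import sqlite3
from typing import Any, Callable, Dict, List, Tuple


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO books (author_id, title) VALUES (1, 'A1'), (1, 'A2'), (2, 'B1');
        """
    )
    return conn


def titles_n_plus_one(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        # One extra query per author: N+1 statements in total.
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_batched(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {}
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "LEFT JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id"
    ).fetchall()
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # LEFT JOIN yields NULL for authors with no books
            titles.append(title)
    return result


def count_queries(
    conn: sqlite3.Connection, fn: Callable[[sqlite3.Connection], Any]
) -> Tuple[Any, int]:
    # Record every SQL statement the connection executes while fn runs.
    statements: List[str] = []
    conn.set_trace_callback(statements.append)
    try:
        out = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return out, len(statements)
```

With three authors, the N+1 version runs four statements and the batched version runs one. The gap grows linearly with the number of authors.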
| kolinko wrote: | Damn, I wish I saw it back then :) | pierrebai wrote: | Counter: Refactoring is far, far, far cheaper than duplication or | wrong abstraction. | | Duplication means you lose the wisdom that was gained when the | abstraction was written. It means that any bug or weird cases | will now only be fixed in one place and stay incorrect for all | the places you duplicated the code. | | About the rule of three: I personally extract functions for | single-use cases all the time. The goal is to make the caller be | as close to pseudo-code as possible. Then if a slightly different | case comes up, I will write the slightly different case as | another function right next to the original one. Otherwise, the | fact that you have multiple similar cases will be lost. | fiddlerwoaroof wrote: | Yeah, the rule of three is misleading: having a name for three | lines of code that do "one thing" is almost always a win and | nothing prevents a future developer from either inlining that | function, if it was a bad idea, or duplicating and modifying | the function. | twirlock wrote: | Yes, everyone on the planet understands why duplication is not | ideal. Literally nobody entered the discussion not knowing | that. You clearly don't grasp what abstraction is fyi. | adrianmonk wrote: | Two questions (genuine, not rhetorical): | | (1) How much of this is because it's _actually hard_ to back out | of the wrong abstraction and pivot to the correct one, and how | much of it is other causes? | | The article hints at this with, "Programmer B feels honor-bound | to retain the existing abstraction." Why do they feel this way, | and is the feeling legitimate? Do they lack the deep | understanding to make the change, or are they not rewarded for | it, or are they unwilling to take ownership, or is it some other | reason? I could see it going either way, but the point is to | understand whether you're really stuck with that abstraction or | not. 
| | (2) How much of the wrong abstraction is because people lack up | front information to be able to know what the right abstraction | is, and how much of it is because choosing good abstractions (in | general and specifically ones that are resilient in the face of | changing requirements) is a skill that takes | work/time/experience/etc. to develop? | | If it's due to being unable to predict the future, then it makes | sense to avoid abstractions. If it's due to not being as good as | you could be at creating abstractions, then maybe improving your | ability to do so would allow a third option: instead of choosing | between duplication and a bad abstraction, maybe you can choose a | good abstraction. | zbentley wrote: | > Why do they feel this way, and is the feeling legitimate? | | In my experience, it's because the amount of diff (red or | green) in a change request is--consciously or subconsciously-- | correlated with risk. | | Even though we killed SLoC as a productivity metric years ago, | the idea that "change/risk is proportional to diff size" is | still pervasive. | | I'm totally into YAGNI/"code volume is liability" school of | thought. But equating _change_ volume with liability is a | subtly different and very harmful pattern. | | Adding a single conditional inside your typical 1200 line | mixed-concern business-critical horrorshow function may assume | a much greater liability (liability as in bug risk and | liability as in risk/difficulty of future changes) than e.g. | deleting a bunch of unused branches, or doing a function- | extraction refactor pass. Standard "change one thing at a time" | good engineering practices still apply of course. | nfw2 wrote: | 1.) I think political and interpersonal issues can play a role | here. People are often hesitant to suggest other people's code | needs to be rewritten. This is especially true if an | abstraction is heavily-used by the organization. 
If there are | many stakeholders using the abstraction, the motivation behind | the refactor (i.e. the perceived defects) would likely need to | be communicated widely to justify the effort the refactor | requires. | Tainnor wrote: | I feel some people here are misunderstanding the blog post. | | Sandi Metz IMHO doesn't claim that the problem occurs at step | 2 or 3. She doesn't claim that it's wrong to introduce abstraction | when there is duplication. | | What she is saying instead is that the problem occurs from step 6 | onwards: when you find yourself wanting to reuse an abstraction | that, regardless of whether it made sense in the first place or | not, has outlived its usefulness. | | I think this is in agreement with other points that she often | makes, about being bold, but methodical about refactorings. | | The whole discussion about "you should never abstract away code | before you see the third duplication" has little to do with the | article, and I'm also really not sure it's good advice. | BoiledCabbage wrote: | > What she is saying instead is that the problem occurs from | step 6 onwards: when you find yourself wanting to reuse an | abstraction that, regardless of whether it made sense in the | first place or not, has outlived its usefulness. | | You're 100% correct in this. And what's even more amazing to me | is that even after you explicitly called this out, the | majority of people replying to you (who presumably have read | the article) still think the problem is between 2 & 3. | | The argument she is making is not "don't make abstractions | until you're 100% certain they are correct". She is essentially | saying make abstractions where appropriate. Some of these | abstractions will be wrong. When you catch yourself exhibiting | certain behaviors, it's probably because it's the wrong | abstraction, so back it out and refactor.
| | Ultimately that abstraction seemed right based on the info | known at the time it was created; now that you know more, don't | try to cling to it because it was already made. Be OK with | backing it out and refactoring. | qznc wrote: | If you see an abstraction does not fit, you have the choice | to consider it incomplete or unsuitable. If incomplete, you | can fix it (assuming write access). If unsuitable, you should | "back it out" as you say. | | In my opinion this distinction is applicable and thus useful | in contrast to whining about leaky abstractions: | http://beza1e1.tuxen.de/leaky_abstractions.html | kwhitefoot wrote: | > Sandi Metz IMHO doesn't claim that the problem occurs at step | 2 or 3. | | But the headline does. | | I had to read quite a long way down the page to discover that | all she is advocating is what I have always done: deduplicate | when practical, undo the deduplication when new requirements make | it incorrect and push the unique parts into the callers. | DougBTX wrote: | > But the headline does. | | That's not really fair; it repeatedly says "wrong | abstraction", in the title and in the article. At steps 2 and | 3 it is still the right abstraction; duplication only becomes | better when it is the wrong abstraction. | tarsinge wrote: | That's not what I get from the article. The problem does indeed | occur at steps 2 and 3: leave duplication alone and don't | introduce abstraction if you are not sure about future | requirements. | Chris_Newton wrote: | Taken to its logical conclusion, doesn't that argument mean | we would almost never introduce any abstractions at all? That | doesn't seem very practical compared to the alternative of | introducing abstractions if they are useful at the time but | remaining willing to change or remove them again later if the | situation changes. | barrkel wrote: | I think you generally shouldn't create an abstraction until you | have at least three uses for it. | | That's very generally.
You might want to create abstractions | before then, but be prepared for them to be wrong, and don't | invest in e.g. lots of unit tests, because when you break the | abstraction you'll throw away that work. Some unit tests, yes, | but invest more in semi-integration tests that verify the stack | sandwiching the under-proven abstraction. | pkulak wrote: | Not to take this on a huge tangent, but I really _do_ think | it's good advice. Unrolling complicated abstractions is a lot | of work. Keeping two pieces of nearly identical code in sync is | work too, but I've never found it all that onerous. But there's | obviously a continuum; on one side it's obvious that it's a | shared concept, and on the other, code just happens to be | similar almost by chance, and not for much longer. But lately | duplication has been turned into a code smell to be linted out, | causing a lot of people to get rid of all of it, at all cost. | Tainnor wrote: | I think there are a couple of things at play here. | | One is the use of code quality tools like CodeClimate. It's | true that those can sometimes be extremely aggressive when it | comes to duplicate code, to the point that I find their | complaints to be uselessly beside the point. This is | especially true if you have typical "structural" duplication | like "many controllers start with the same sequence of steps" | etc., and it's even worse when you have to use configuration DSLs | etc. | | OTOH, it has been my personal experience that many people, if | they use CodeClimate etc., routinely just ignore them for the | most part, so I'm not always sure what the point of them is. | But maybe other people have different experiences and some | people really are routinely overabstracting the most | coincidental of duplication, in which case I agree that that | is not a very useful thing to do. | | As for the advice itself: it is definitely problematic if it | is used as some sort of hard "rule".
If it is taken as a | heuristic / "rule of thumb", then it might be OK as long as | you make sure people don't overemphasise it where | other/better rules of thumb would be appropriate. | | For example, if I were to write some billing code and | somebody else just duplicated that code somewhere instead of | using a shared abstraction, I would probably find that to be | a serious code health issue, as you really shouldn't perform | billing calculations in two separate places: this is | something that needs to be kept in sync across the code base; | one sibling (nephew?) comment is right in pointing out that | here you have to consider the cost of things that need to | stay in sync accidentally going out of sync. | | There are many more examples, which is why I think that if you | use "refactor on 3" as _one_ heuristic, it's fine, but if | it's the sole one, then less so. | lilyball wrote: | If you only have 2 instances but are spending effort keeping | them both in sync for changes, then that might be a good time | to abstract anyway. The fact that you're keeping them in sync | means they aren't just coincidentally the same. But this is | very situational. | guenthert wrote: | The cost of maintaining two (or three ...) copies might | well be less than creating an abstraction. The danger | rather lies in situations where not _everyone potentially | modifying that code (now or later)_ is aware that there are | N copies (and where they are) which need to be maintained. | dynamite-ready wrote: | That's a big part of it for me. The abstraction would end | up being the best way to document the duplication, imo. | Far better than the likes of /* See also... */. | | It depends, of course, but I personally feel the work of | 'simplifying an abstraction' is easier than the problem | of 'tracking down anything that might need to be edited'. | watwut wrote: | Then such a person needs a better IDE, because finding | callers is one shortcut away.
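A sketch of Tainnor's billing example. The flat VAT rate and the two callers (an invoice and a quote) are hypothetical, and the rate and rounding mode are placeholders rather than real tax rules. The point is that the billing rule lives in exactly one function, so the two features cannot silently drift out of sync. The same function also documents the dependency that dynamite-ready mentions, and "find callers" lists every place that uses it.

```python
from decimal import ROUND_HALF_UP, Decimal

VAT_RATE = Decimal("0.20")  # hypothetical flat rate, for illustration only


def calculate_vat(net: Decimal) -> Decimal:
    # Single source of truth for the billing rule: rate plus rounding to cents.
    return (net * VAT_RATE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def invoice_total(net: Decimal) -> Decimal:
    return net + calculate_vat(net)


def quote_total(net: Decimal) -> Decimal:
    # Quotes must agree with invoices, so they call the same rule instead of
    # re-implementing the arithmetic (and its rounding) a second time.
    return net + calculate_vat(net)
```

`Decimal` with an explicit rounding mode is used instead of `float` so that amounts such as 0.125 round predictably.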
| tuatoru wrote: | That's an amazingly mechanical view of code. | | If two blocks of code refer to exactly the same thing | (event, process, object, rule) _in the application domain_, | then the duplication can be eliminated. | | You can't eliminate duplication without asking, "what does | this code _mean_ in this context?" | qes wrote: | > you should never abstract away code before you see the third | duplication" has little to do with the article, and I'm also | really not sure it's good advice | | Absolutes like that are rarely good advice. | Tainnor wrote: | Sure, but some comments on here are literally saying that. | Not as a rule of thumb (although such comments can be found | here too, which is OK), but as an "as an engineer I always | enforce this rule" thing. | lucbocahut wrote: | Abstractions have other purposes than deduplication. They make | it easier to reason about your code as well. It might be the | smartest thing sometimes to abstract away the first | occurrence in a method imho. | majormajor wrote: | I'm not sure whether you apply "prefer duplication over | the wrong abstraction" at step 3 or step 6 matters nearly as much as | applying that advice _at some point_. | | I often consider "is this abstraction going to be prone to | misuse" (regardless of whether it's the second, third, fourth... | copy) and try to head it off with either strict typing or | comments or internal visibility - to try to do step 3 without | opening up as big of a door to step 6, but the important thing | is less when to do stuff like this but just _to try to avoid | things reaching step 7_. | foobiekr wrote: | I think it's fair to say that abstractions should have to prove | themselves as a necessity and that we make things abstract way | too early. Most really good abstractions in an app fall out of | well-written code to solve a specific problem.
| | In day-to-day life as an engineer, I find that we have very few | _enduring_ abstractions - there are very deep ones, like the | concept of streams, things like filesystems and related ideas, | the concept of a virtual machine in the process sense, and so | on - and a lot of faddish abstractions that have a pretty wide | blast radius when they start to go wrong. A lot of the good | ones (networking has a _lot_ of these, such as the abstractions | above and below the model of an interface in professionally- | written network device code) are focused on layering. | Tainnor wrote: | I disagree, and this is one of the things where it's really | hard to get to a shared understanding because I don't know | what kind of problems you've worked on and what kind of code | you've seen and vice versa. | | But in my daily work, I routinely see abstractions just come | up very naturally all the time. Sometimes they turn out to be | slightly wrong, but often also not. I need to perform some | calculation (e.g. for billing)? That can be abstracted away. | I need to parse some unstructured user request into something | structured? That's an abstraction. And so on. A lot of these | things are clear to me even before I start writing code. | | I also tend to use (at least some amount of) DDD, to write | small, composable functions with few side effects and to be | as declarative as possible. All of this might help with | coming up with lasting abstractions. | | But I'm not denying that I'm running into lots of situations | in my daily work where it turns out some abstraction was | wrong. Just that I find that many more of them actually turn | out to be correct or at least correct for the most part (it | might be that something needs to receive an additional | parameter or to return some slightly different structure to | account for error conditions or so, but it's still basically | the same abstraction).
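Tainnor's "parse some unstructured user request into something structured" example, as a minimal sketch. The input format (`"<qty>x <item>"`, e.g. `"3x widget"`) and the names `OrderLine` and `parse_order_line` are assumptions made up for illustration. Callers get back a small immutable value, or `None`, and never touch the string handling. That narrow boundary is what tends to make this kind of abstraction last.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "<digits>x <item name>", case-insensitive, tolerant of surrounding spaces.
_LINE = re.compile(r"^\s*(\d+)\s*x\s+(\S.*?)\s*$", re.IGNORECASE)


@dataclass(frozen=True)
class OrderLine:
    quantity: int
    item: str


def parse_order_line(text: str) -> Optional[OrderLine]:
    """Turn free text like '3x widget' into an OrderLine, or None if it doesn't fit."""
    match = _LINE.match(text)
    if match is None:
        return None
    quantity = int(match.group(1))
    if quantity == 0:
        return None  # an order line for nothing is treated as invalid
    return OrderLine(quantity=quantity, item=match.group(2).lower())
```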
| barrkel wrote: | I sometimes review PRs where people are encouraged (by | other reviewers) to create new methods because there's a | _single line of code_ duplicated between two other methods. | I don't think that rate of abstraction construction - and | every method is another abstraction - is helpful for the | health of the code, or its readability. | Tainnor wrote: | I can't judge that without more context. If this is just | accidental duplication, it's pointless to abstract it | away. But if it's a line of code that is necessary to | deal with some gotcha of a particular library etc., it's | probably good to extract and add an explanatory comment. | jiofih wrote: | If that line of code deals with a particular piece of | business logic that should be consistent within the | application, extracting it is a good thing. | Chris_Newton wrote: | Does it make sense to characterise abstractions as "right" | and "wrong" in the first place? This feels too absolute to | me. | | Abstraction is just hiding some complexity in | implementation details behind a simpler interface. It | offers benefits by reducing the need to deal with the | full complexity everywhere else. It also has costs. The | interface establishes a new concept, albeit a simpler one, | that must also be understood and maintained wherever client | code uses the abstraction. Moreover, if you need to | understand or modify the detailed implementation later, | there is now a barrier to doing so. | | When we define an abstraction, hopefully we do so because | the benefits outweigh the costs at that time. The simpler | the interface relative to the complexity of the | implementation it hides, the more likely this is to be | true. However, that balance is inevitably subject to change | as a program evolves and the relevance of the hidden | details to different parts of the system changes.
| | So it feels like abstractions might be better characterised | by whether they represent good value under the current | circumstances. It is perfectly reasonable for an | abstraction to be cost-effective at the time it is added, | but to become more or less so as the context evolves. If it | reaches a point where it is no longer cost-effective, it | should be removed. Either the relevant parts of its | implementation can then be inlined at each place that | previously used it, or some new abstraction(s) can be | defined that better reflect the relevance of different | implementation details at that time. | betenoire wrote: | Why are your two examples in the second paragraph | necessarily "abstractions"? | Tainnor wrote: | Because "calculate_vat" is more abstract than the exact | sequence of calculations performed? | dpc_pw wrote: | We humans just can't help inventing mental | shortcuts. Making a judgment "is this really a good abstraction | or am I just mindlessly deduplicating code" is context- | dependent, nuanced and requires some mental effort - much more | work than "do I have it repeating 2 or 3 times already", which | is mindless and mechanical. | hackinthebochs wrote: | I couldn't disagree more. There is no such thing as abstracting | too early (this does not go for structural abstractions like | factories, singletons, etc.). The best code is code you don't have | to read because of strong, well-named functional boundaries. | brandonmenc wrote: | Junior programmers duplicate everything. | | Intermediate programmers try to abstract away absolutely every | line that occurs more than once. | | Expert programmers know when to abstract and when to just let it | be and duplicate. | leafboi wrote: | The master never duplicates and all his abstractions are | intuitive, readable and flexible.
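Tainnor's argument that even a single line is worth extracting when it works around a library gotcha can be sketched with a real Python case. `datetime.utcnow()` returns a naive datetime with `tzinfo` set to `None`, and it is deprecated since Python 3.12. `datetime.now(timezone.utc)` returns an aware one instead. The helper names `utc_now` and `is_expired` are hypothetical. The explanatory comment is the part that makes this one-line abstraction pay for itself.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    # Gotcha: datetime.utcnow() returns a *naive* datetime (tzinfo is None).
    # Methods such as .timestamp() then interpret it as local time, and
    # ordering comparisons (<, >=) against an aware datetime raise TypeError.
    # Building the timestamp in one named place, as an aware UTC value,
    # keeps every caller consistent.
    return datetime.now(timezone.utc)


def is_expired(expires_at: datetime) -> bool:
    # expires_at is expected to be timezone-aware, like utc_now() itself.
    return utc_now() >= expires_at
```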
| kristo wrote: | There should be a code tool to re-inline code from an abstraction | kevsim wrote: | Relevant post from earlier today | https://news.ycombinator.com/item?id=23735991 | vxNsr wrote: | I want to thank everyone here, I've been stuck for about a week | now on an issue that is entirely germane to this topic and the | whole conversation here really helped me flesh out what was wrong | and allowed me to understand a path forward. I'm honestly holding | myself back from popping onto my computer right now to start | working on it. | leto_ii wrote: | As I gain more and more experience (I would now call myself more | or less a mid-level developer), I find that the distinction that | matters is not abstraction vs duplication, but the one between | developer mindsets. | | I have many times met/worked with people who think the main task | of the developer is to 'get shit done'. Regardless of their level | of experience, these developers will churn out code and close | tickets quite fast, with very little regard for abstraction, | design, code reuse etc. | | Conversely, the approach that I feel more and more is the correct | one is to treat development as primarily a mental task. Something | that you first think about for a while and try to design a | little. The actual typing will in this case be a secondary | activity. Of course, this doesn't mean you shouldn't iterate on | your design if during execution problems come up. Just that the | 'thinking' part should come before the 'doing'. | | My feeling is that with this second approach the | abstraction/duplication trade-off will not matter so much | anymore. With enough experience you will figure out what you can | duplicate and what you can design. And when you design you will | develop an understanding of how far you should go. | | Approaching development as a task of simple execution I think | inevitably leads to illegible spaghetti down the line. 
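What "re-inlining" a wrong abstraction looks like when done by hand, as a hypothetical sketch in Python. First comes a shared helper that has accumulated one flag per caller, the pattern the article describes. Then the same behaviour is inlined back into each call site, keeping only the branches that caller actually used. All names and flags are illustrative. Many IDEs automate this step as an "inline function/method" refactoring. Either way, the inlined code should preserve behaviour exactly, quirks included, before anyone starts cleaning it up.

```python
from typing import List


# Before: one "shared" helper that grew a flag per caller.
def render_items(items: List[str], numbered: bool = False, upper: bool = False,
                 skip_empty: bool = False) -> str:
    lines = []
    for i, item in enumerate(items, start=1):
        if skip_empty and not item:
            continue
        text = item.upper() if upper else item
        lines.append(f"{i}. {text}" if numbered else text)
    return "\n".join(lines)


def report_before(items: List[str]) -> str:
    return render_items(items, numbered=True, skip_empty=True)


def banner_before(items: List[str]) -> str:
    return render_items(items, upper=True)


# After: the helper is inlined into each caller and the dead flags disappear.
def report_after(items: List[str]) -> str:
    lines = []
    for i, item in enumerate(items, start=1):
        if not item:
            continue  # numbering still counts skipped items, as before
        lines.append(f"{i}. {item}")
    return "\n".join(lines)


def banner_after(items: List[str]) -> str:
    return "\n".join(item.upper() for item in items)
```

Once each caller owns its code, a change such as renumbering the report no longer risks breaking the banner.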
| Tainnor wrote: | I agree that many issues with bad code could really be avoided | by first thinking about the solution a bit, of which the code | is just an expression. | | I'm not advocating weeks of architecture astronauting without | code feedback - because practical considerations (e.g. the | compiler can't deal with this kind of code due to some | limitations) matter - but some people seem overeager to just | start writing some code "and see what happens". | amelius wrote: | A manager once asked me: please reuse as much code as you | possibly can. | | This reminded me of that. | klyrs wrote: | I use DRY in two ways. The first is that I'm happy to make 2 or 3 | copies of a snippet before promoting that to a new function. | | The second is when I find a bug in a duplicated snippet. I'll | mend the snippet and its duplicates, once or twice before | promoting it to a function. | | In the rarer (in my line of work) instance that a common snippet | gets used with several intrusive variations, I usually document | the pattern. It's tempting to use templates, lambda functions, | closures, coroutines, etc but far simpler to duplicate the code. | But again, if a bug (or refactor) crops up and I need to fix it | in many places, then I'll spend some time thinking about | abstraction and weigh the options with the benefit of hindsight. | worik wrote: | Really this is stating the obvious. | | The social problem at step 6, 7, and 8 is a social and economic | one. Having the time, resources, and skill to do a job properly | is very important. But there are social and economic pressures to | "just get it done". | | This is a specific formulation of a general problem. | jpswade wrote: | You can't plan for what you don't know. | | This is why I like the "Rule of three"[1]. Only once you've done | it three times will you truly begin to understand what the | abstraction might need to look like. | | 1. 
| https://wade.be/2019/12/10/rule-of-three.html | nbardy wrote: | The rule of three helped me get over my initial abstraction | issues, but I leaned much more towards a rule of 5 or 6. Around | three you finally find an abstraction, but around six uses | there is a good chance it breaks down. Making an abstraction | saves you from having to make the same change to the code you | copied multiple times. But the cost of repeating yourself is so | low. With good keyboard mechanics, repeating a change in four or | five places takes just a bit longer than doing it once, since most | of the upfront cost is in deciding on the correct change. It | does feel a bit like drudgery, but it's also very freeing to | not think about abstractions and just make progress at all | costs. This strategy can bite you if you don't take the time to | look back and make a refactor later, but I find the approach of | churning out code, letting the patterns emerge, then | restructuring with hindsight much more fruitful than pausing | frequently to think about abstractions. They are really two | different mindsets and best left for different sessions of | work. | geophile wrote: | Exactly. With experience, you learn not to abstract too soon. | hesdeadjim wrote: | Seems so counterintuitive, but it's one of the most important | lessons I've learned in 15 years of development experience. | [deleted] | ed312 wrote: | Any advice on teaching this to junior engineers? Seems like | folks with 3-5 years of experience keep trying to not only | over-abstract but also keep re-inventing the wheel with | abstractions (vs looking for existing libraries). | ben509 wrote: | I think there are two parts to it. First, you want to push | them to get into the habit of solving problems by expressing | the question clearly enough that the answer falls naturally | from it. That's so fundamental that every aspect of | engineering benefits from it, but it's particularly important | as a first step in writing code.
| | The second part is building the intuition for the | abstractions themselves. That's tricky as they have to teach | themselves. They need to build coherency in their internal | mental language of abstraction, and the only way to do that | is to work directly on real code, and work through the | consequences of doing it one way vs. another. | | That means you have to let them commit code you don't like. | By all means, explain what your concerns are, but then let | them see how it evolves, and as it becomes more untenable, | that's when you go back to rethinking it and trying to state | the problem clearly. | | Likewise, when they do it well, you can highlight that, | especially drawing attention to changes to their code that | worked nicely. | cjhanks wrote: | Teach them about cyclomatic complexity and then review their | work in these terms. It gives them something concrete to | target rather than trying to accomplish some ethereal notion | of "proper abstraction". | [deleted] | tarsinge wrote: | Bring the idea that abstraction has a cost, like technical | debt. It's not something to be proud of; on the contrary, it | must be justified and serve a true purpose, not just provide | intellectual satisfaction. | ozim wrote: | My favorite example of really bad abstraction is add/edit | crammed into a single popup/model. You know edit is basically a | copy-paste of add, so "ding ding ding here goes DRY!" in a | junior mind. But quickly enough it shows up that some | properties can be set in add, whereas in edit they have to be | read only. Quite often you also get other business rules that | can be applied only on edit or make sense only when adding a | new entity. But when you create the first version they look a lot | like the same code that should be reused. | | For me this is a really good example of how similar-looking | code is not the same because it has a different use case. | dgb23 wrote: | I just had a case of this last week in a web-app I'm | writing.
| | In the frontend code I decided to use an abstraction and | parametrization; in the backend code I kept the logic | separated. | | It really depends on context, specifically on the layer you | are operating on. | gridlockd wrote: | > But quickly enough it shows up that some properties can | be set in add, whereas in edit they have to be read only. | | So? Just put in some conditionals. | | What is the alternative? Duplicate most of the code with | minor, non-explicit differences? What's the benefit? You | just _moved_ complexity around, you didn't get rid of it. | | The drawback is that now anything you have to add, you have | to add _and_ maintain in two places. And since your | "add" and "edit" are probably 90% the same, it's going to | happen 90% of the time. It's very annoying during | development and you're likely to fuck it up at some point. | bonestormii_ wrote: | This is a good example of how this overall topic gets | reduced to "How much abstraction?" instead of "In what | ways should something be abstracted?" | | Obviously the Add/Edit fields are operating on the same | record in a hypothetical database, so it makes little | sense to duplicate the model. | | On the other hand, if the conditionals within the | abstracted version become too complex or keep referencing | some notion of a mode of operation (like, `if type(self) | == EditType && last_name != null` lines of thinking), | that is sometimes another type of smell. | | But say you make some kind of abstract base class that | validates all fields in memory before committing to the | database, and then place all of your checking logic in a | validate() method. That sounds like a pretty clean | abstraction to me. | | And moreover, this is probably provided by an ORM system | and documented by that system anyway--so that's a | publicly documented and likely very common abstraction | that you see even between different ORMs. That, I think,
That, I think, | is the very best kind of abstraction, at least assuming | you are already working in such an environment as a high- | level language and ORM. Making raw SQL queries from C | programs still contain their own levels of abstractions | of course without buying whole sale into the many-layered | abstraction that is a web framework or something. | | This question becomes more important when you aren't just | updating a database though. If you're writing some novel | method with a very detailed algorithm, over abstraction | through OOP can really obscure the algorithm. In such a | case, I try to identify logical tangents within the | algorithm, and prune/abstract them away into some | property or function call, but retain a single function | for the main algorithm itself. | | The main algorithm gets its definition moved to the base | class, and the logical tangents get some kind of | stub/virtual method thingy in the base class so that they | have to be defined by subclasses. The more nested | tangents are frequently where detailed differences | between use cases emerge, which makes logical sense. It's | not just that it's abstract, but the logic is | categorically separated. | | It's a very general pattern supported by many languages, | so you see it all over the place. That organization and | consistency in itself helps you to understand new code. | In that way, it also becomes a kind of "idiom" which in a | sense is one more layer of abstraction, helping you to | manage complexity. | | As a counter of that, you see code where `a + x * y - b` | becomes self.minus(self.xy_add(a), b). More abstract, but | not more logical; not categorically separating; not | conforming to common idioms; obscuring the algorithm; and | so on... | | And then there is performance! Let's not talk about the | performance of runtime abstractions. | vxNsr wrote: | I mean, aren't we just bikeshedding inheritance at this | point? | abraae wrote: | Each to his own. 
| If I found that a junior had created two | separate popups, one for add and one for edit, I'd want to | look into the code with them to understand if that was a | good choice, because usually it wouldn't be for anything | with more than one or two properties. | leafboi wrote: | It's largely because they're dealing with an area with no | theoretical tools. Any time you hit an area that is full of | people "Designing" solutions/abstractions rather than | "Calculating" an optimal solution/abstraction, you know you've | hit an area where there's very little theoretical knowledge | and most people are just sort of wandering chaotically in | circles trying to find an "optimal" solution/abstraction | without even a formal definition of what "optimal" is... I | mean, what is the exact definition of the "perfect | abstraction"? What is bad about duplication? What is a bad | over-abstraction, and what is this "cheaper" cost that the | title is talking about? It's all a bunch of words with fuzzy | meanings injected with people's biased opinions. | | That being said, theories of abstraction do exist. If you | learn them you'll be at the top of your game, but they're really, | really hard to master. If you do master them, you'll be part of | a select group of unrecognized elites in a world of | programmers that largely turn to "design" while eschewing | theory. | | Here are two resources to get you started: | | The Algebra of Programming: | https://themattchan.com/docs/algprog.pdf | | Program Design by Calculation: | http://www4.di.uminho.pt/~jno/ps/pdbc.pdf | | You will note that both of these resources talk about | functional programming at their core, which should indicate to | you that the path to the most optimal abstraction lies with | the functional style. | goto11 wrote: | I dislike any programming rule which includes a number. | | The issue is whether sections of similar code implement the | same idea or just happen to be accidentally similar.
| The number | of instances does not really matter. If you have 100 lines of | code which are almost the same in two places in the program, then | you should unify sooner rather than later, before they are | allowed to diverge. | jpswade wrote: | Rules are great because they can be broken, if you know when | to do so. | seanalltogether wrote: | This quote from John Carmack speaks very succinctly to the | problems that many abstractions in a code base can cause, and | it's a constant reminder for me when building out business logic. | | > "A large fraction of the flaws in software development are due | to programmers not fully understanding all the possible states | their code may execute in." | | https://www.gamasutra.com/view/news/169296/Indepth_Functiona... | hackinthebochs wrote: | But abstractions reduce possible state and allow you to | specify that state in obvious ways, e.g. on function | parameters. Do not underestimate the power of functional | boundaries. | ben509 wrote: | They also tend to impose a degree of discipline. I've often | found myself wanting to shove a parameter in somewhere and | realized I didn't _need_ the damned thing. | hesdeadjim wrote: | This is one reason I love working in the Unity ECS framework. | Your data is public and state can't hide. Your systems are | still free to contain a plethora of bugs, but they are easier | to track down due to the functional nature of a system. | | In the regular Unity OOP land, developers inevitably sprinkle | state everywhere. Coroutines are by far one of the worst | offenders. Good luck seeing the currently executing state of your | game when it's hidden in local variables inside a persistent | function body... | jonnycat wrote: | Reading that article and the context of the quote, it appears | that Carmack is using that statement to extol the benefits of | functional programming styles, not to comment on abstraction.
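The add/edit popup debate above (ozim, gridlockd, bonestormii_) has a middle ground that may be easier to judge in code: separate entry points for "add" and "edit" that share only the rule that really is common. This is a rough Python sketch. `User`, its fields, and the validation rule are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical entity; field names are illustrative only.
    username: str
    email: str


def validate_email(email: str) -> list[str]:
    # Shared rule: genuinely the same in both flows, so it is factored out.
    return [] if "@" in email else ["email must contain '@'"]


def create_user(username: str, email: str) -> User:
    # "Add" flow: the username may be set here.
    errors = validate_email(email)
    if not username:
        errors.append("username is required")
    if errors:
        raise ValueError("; ".join(errors))
    return User(username, email)


def update_user(user: User, email: str) -> User:
    # "Edit" flow: the username is read-only, so it is simply not a
    # parameter here, instead of being guarded by an `if mode == "edit"`.
    errors = validate_email(email)
    if errors:
        raise ValueError("; ".join(errors))
    return User(user.username, email)
```

Compared with one function taking a mode parameter, the read-only field is enforced by the signature of `update_user` rather than by a conditional. The cost, as gridlockd points out, is that a new shared field has to be added in both places.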
| seanalltogether wrote: | To me the quote speaks to the general problem of juggling | state in your head when writing code. If an abstraction is an | attempt to funnel a bunch of code through common logic, it | can be hard to know what the state of your app | will look like when someone else modifies that common logic. | thinkloop wrote: | A related problem: duplication is not equality. If two things | happen to be the same right now, it doesn't mean they are | intrinsically the same thing. If you have multiple products | selling for $59.99, they shouldn't share a function to generate | the "duplicate" price. Abstractions need to be driven by | conceptual equivalence, not value equivalence, where duplication | is a good hint of a potential candidate for abstraction, but not | the complete answer alone. | scrozart wrote: | DRY gets abused regularly in my experience. It doesn't stop at | method/class abstractions either; I've seen entire microservices | & plugins developed to ensure each app doesn't have that one | chunk of auth code, for instance, even though they each may have | subtly different requirements (those extra params again). The | logical end to this sort of thing is infinitely flexible/generic | multipurpose code, when the solution is really, probably | increased specificity. DRY is probably the lowest-hanging fruit | for practices/patterns, and I think this leads to a | disproportionate focus on it. | hesdeadjim wrote: | It's also easy compared to solving new problems, so it can be | an emotionally safe way of feeling productive. Failure is | difficult to measure until the abstraction falls flat on its | face months later, at which point it can be chalked up to the | demons of "changing requirements". | zbentley wrote: | That is a very, very important point; well put. | | The "of course it sucks: changing requirements!"
| boogeyman | means one of two things: "the code was written to do the | wrong thing because requirements changed/weren't | communicated" or "the code was _hard to change_ when it | needed to do a new thing". | | Figuring out which of those two is in play is very important. | [deleted] | pps43 wrote: | Related to http://yosefk.com/blog/redundancy-vs-dependencies- | which-is-w... | [deleted] | memexy wrote: | > Re-introduce duplication by inlining the abstracted code back | into every caller. | | Ideally this type of workflow would be supported by the code | editor. I've done this manually a few times and it's not fun. | avodonosov wrote: | > they alter the code to take a parameter, and then add logic to | conditionally do the right thing based on the value of that | parameter | | But that's a textbook example of bad code, competent coders don't | do this. | | Update: for example, see the Thinking Forth chapter "Factoring | Techniques", around the tip "Don't pass control flags downward.". | Page 174 in the onscreen PDF downloadable from SourceForge. | | And there is no need for duplication. The bigger function can be | split into several parts so that, instead of one call with a flag, | each caller calls the set of smaller functions it needs. | dragonwriter wrote: | Competent coders do suboptimal things all the time, especially | when there is delivery pressure; competent doesn't mean | infallible or perfect. | | There's also not a clear boundary between what is a single | appropriate abstraction and two (or _N_) distinct but | superficially related concepts. | zbentley wrote: | > that's a textbook example of bad code, competent coders don't | do this. | | That's reductive and dismissive. | | There's a ton of subtlety in even defining the terms for that | "best practice". What counts as a control flag versus a | necessary choice that must be made by callers? Are you still | passing control flags if you combine them into a settings | object?
| What if you use a builder pattern to configure flags | before invoking the business logic--is that better/worse/the | same? What if you capture settings inside a closure and pass | that around as a callback? How far "downward" is too far? How | far is not far enough (e.g. all callers are inlining every | decision point)? | | The answer to all of those is, of course, "it depends on a lot | of things". | | And that's before you even get into the reality (which a | sibling comment pointed out) that even if we grant that this is | inherently bad code, that doesn't imply anything about the | competence of the coder--some folks aren't put in positions | where they can do a good job. | | Unrelated aside: Thinking Forth is an excellent book! Easy to | jump into/out of in a "bite size" way, applicable to all sorts | of programming, not just Forth programming. | nbardy wrote: | This has been one of the hardest-fought lessons I've learned in | my programming career, but also one of the most fruitful. I aim to | make my abstractions too late rather than too early. My rule of | thumb tends to be: copy things six to seven times before you try | to build an abstraction for it. | SkyPuncher wrote: | I think there's a big cultural challenge with adopting | duplication. It goes against most people's career growth | objectives. | | Being able to effectively create clean, re-usable abstractions is | a measure of being a "senior" engineer at many places. In other | words, to be viewed as senior, you need to be able to effectively | write abstractions frequently. It's hard to measure an abstraction | in the moment, so a lot of people assume that the senior simply | knows better. | | I find this extends to a lot of programming. Seniors will often | use unnecessary tricks or paradigms simply because they can. It | can make it extremely difficult for junior developers to grok | code. Often this reinforces seniority. "If only the seniors can | work on a section of code, then they are senior".
| Likewise, there | are so many books on crazy architectures and patterns. It's | really neat to understand, but I've determined those books are | pretty much self-serving. | | ---- | | I've found that my work is often far more limited by the | domain/business logic than any sort of programming logic. I'll | happily write code that looks really basic, because I know | ANYBODY can come in and work with that code. If I write code that | a junior needs to ask me questions like "what is this pattern?" | or "what does this mean?", I've written bad code. | | ----- | | With all that being said, every single job interview I've ever | had expects me to write code at the level of complexity that my | title will be at. They'd much rather see me build some sort of | abstract/brittle concept than use some constants and switch | statements. The former looks cool; the latter looks normal. | leto_ii wrote: | > I think there's a big cultural challenge with adopting | duplication. It goes against most people's career growth | objectives. | | My experience is the complete opposite :D. What I've noticed is | that the people who 'deliver' quickly (without much regard for | what might be called code quality) and fulfill business | requirements without much questioning are perceived as more | valuable. | | > I've found that my work is often far more limited by the | domain/business logic than any sort of programming logic. | | I broadly agree with this statement. However, just like a good | carpenter knows how to properly build a bookcase, a table, a | roof, etc., a good developer should understand the programming | logic and know how to apply it. Business requirements need to | be fulfilled, but it's up to us to decide how to do that. More | so, I think it's up to us to push back when we feel business | requirements don't make sense from a technical point of view, | or even from a business point of view.
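To make avodonosov's Thinking Forth tip ("Don't pass control flags downward") concrete, here is a hedged Python sketch. The function names and data are hypothetical. It puts a flag-driven function next to the same behaviour split into small pieces that each caller composes.

```python
from __future__ import annotations

Rows = list[tuple[str, int]]


def format_rows_flagged(rows: Rows, as_csv: bool) -> str:
    """Control flag passed downward: one function, two behaviours."""
    if as_csv:
        lines = ["name,qty"] + [f"{name},{qty}" for name, qty in rows]
    else:
        lines = ["Report"] + [f"{name}: {qty}" for name, qty in rows]
    return "\n".join(lines)


# Factored alternative: small pieces, and each caller picks the ones it needs.
def csv_lines(rows: Rows) -> list[str]:
    return [f"{name},{qty}" for name, qty in rows]


def text_lines(rows: Rows) -> list[str]:
    return [f"{name}: {qty}" for name, qty in rows]


def with_header(header: str, lines: list[str]) -> str:
    return "\n".join([header, *lines])


def export_csv(rows: Rows) -> str:
    # Caller that used to pass as_csv=True.
    return with_header("name,qty", csv_lines(rows))


def text_summary(rows: Rows) -> str:
    # Caller that used to pass as_csv=False.
    return with_header("Report", text_lines(rows))
```

Whether the split is worth it is exactly what zbentley questions. With two callers and two branches, the flagged version is arguably just as readable. The factored form pays off once callers start needing combinations the flag never anticipated.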
| Tainnor wrote: | I find statements such as this to be profoundly anti- | intellectual. It suggests that we can't become better at what | we do and need to be stuck at the level of "a beginner can | understand that". | | Now, I agree that simplicity is a virtue and that some people | go overboard with crazy stuff just because they find it cool. | But, as Rich Hickey says, there is a difference between simple | and easy. If a junior dev doesn't understand "map", then we | should explain to them what "map" is, instead of going back to | writing everything with for loops. | kureikain wrote: | I think one of the cool things about pattern matching, in | languages (in my case, it's Elixir) that support it in function heads, | is that we can have the same method with different argument signatures. | So we don't have to duplicate or inherit whatever and still share | some common method. | hota_mazi wrote: | > prefer duplication over the wrong abstraction | | Such strange advice. | | If you're able to recognize the wrong abstraction right away, | surely you would not use it, right? | allenu wrote: | I think the intent was to communicate that abstractions aren't | always right. | | Some people might think that because there's duplicate code and | that the abstracted code maps to the duplicated code 1 to 1 and | leads to fewer lines in total, it's a good abstraction, not | realizing that there are costs to doing this that they may not be | aware of. | mrkeen wrote: | The reason is that you won't know it's the wrong abstraction | until it's time to modify it or add new features. | preommr wrote: | I strongly dislike this article because the title is much broader | than most of the substance of the article. | | Advising not to overextend an abstraction is inarguable. | | The actual title "Duplication is far cheaper than the wrong | abstraction", and the thing that people will really discuss, is a | loaded statement that's going to need a lot of caveats.
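kureikain's point about Elixir, where one function can have several clauses chosen by pattern matching on its arguments, has a rough Python analogue in `functools.singledispatch`. It is narrower: it dispatches only on the type of the first argument, not on arbitrary patterns. The shape classes below are hypothetical.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Circle:
    radius: float


@dataclass
class Rect:
    width: float
    height: float


@singledispatch
def describe(shape) -> str:
    # Fallback, loosely like a catch-all clause in Elixir.
    raise TypeError(f"unsupported shape: {shape!r}")


@describe.register
def _(shape: Circle) -> str:
    # Chosen when the first argument is a Circle.
    return f"circle r={shape.radius}"


@describe.register
def _(shape: Rect) -> str:
    # Chosen when the first argument is a Rect.
    return f"rect {shape.width}x{shape.height}"
```

Callers use one name, `describe`, and new shapes add a registration instead of another branch in a shared function. Matching on values rather than types would need something else, such as Python 3.10's `match` statement.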
| [deleted] | gm wrote: | This advice just _feels_ very wrong. After thinking about it and | seeing the other comments, some remarks: | | 1) It's fine to go back and duplicate code after you correct the | abstraction. But it should be the _first_ phase in doing a larger | pass to refactor code to fit the current business requirements. | If you forgo the _second_ step, which should be to search for | suitable abstractions again, you are absolutely guaranteed to be | left with shit code that breaks in this situation, but not that | other one, and no one knows why. I would absolutely only | duplicate code as the prequel to deduplicating it again with | updated abstractions. | | 2) If you do any of this without thorough unit tests you're | insane. Keep the wrongly-abstracted code unless you have time to | thoroughly fix the mess you will have made when you duplicate | code again and introduce bugs (you're human, after all). | | 2a) If you are going to do this and there are no unit tests, | create those unit tests before you touch the code initially | (before the duplication). | | 3) Some of the comments saying you should wait until you | implement something two or three times before creating an | abstraction seem like comp sci 101 rules of thumb. It's way too | simplistic a rule, way too general. Prematurely abstracted | (haha!). The type of project and the type of company/industry | will tell you what the right tradeoff is. | | That is all. | haolez wrote: | You are assuming that the code is a moving target. Not every | software project behaves that way. Sometimes, the software gets | done as is. | crazygringo wrote: | Another tip is: if you're duplicating, and they're not lines of | code that are visually obviously next to each other, then leave a | comment next to both instances mentioning the existence of the | other. | | There's nothing inherently wrong with duplication, except that if | you change or fix a bug in one, you need to not forget about the | other. 
Creating a single function solves this... but at the | potential cost of creating the wrong abstraction. | | When you're at only 1 or 2 extra instances of the code, just | maintaining a "pointer" to the other case(s) with a comment | serves the same purpose. | | (Of course, this requires discipline to always include the | comments, and to always follow them when making a change.) | stormdennis wrote: | Would the risk of forgetting to update the comments not be a | reason for creating a wrapper method that handled calls to both | and contained the relevant advice? | zarathustreal wrote: | I've seen this "hot take" a few times before and have even seen | developers that I would have considered very good agree with it. | Consider that all code is computation; this is the point of a | computer: to compute. Consider that abstraction doesn't seem | valuable -to you- for a multitude of reasons. Perhaps you're | using a flawed paradigm that emphasizes objects over computation. | This would obviously mean abstraction -increases- the difficulty | of reasoning about your code. Perhaps you don't have a mental map | of appropriate abstractions due to a lack of education or a | knowledge gap; this could lead you down the path of creating | abstractions which reduce duplicate characters or lines of text | but are not logically sound ("leaky abstractions"). All of these | things come together in a modern "enterprise" software | environment in just the right way such that abstraction starts to | seem like a bad idea. Do not fall into this line of thinking. | Study functional programming. Study algebraic structures. | Eventually the computer science will start to make sense. | [deleted] ___________________________________________________________________ (page generated 2020-07-05 23:00 UTC)