[HN Gopher] Why do so many developers get DRY wrong?
___________________________________________________________________
Why do so many developers get DRY wrong?

Author : jerodsanto
Score  : 33 points
Date   : 2020-02-14 19:57 UTC (3 hours ago)

(HTM) web link (changelog.com)
(TXT) w3m dump (changelog.com)

| crispinb wrote:
| The comments here suggest two things: (1) most people
| misunderstand DRY (i.e. they think it's about code rather than
| knowledge duplication), and (2) the article didn't do a great job
| of clearing the issue up.
|
| Though an alternative to (1) is that the meaning of DRY in common
| dev parlance has changed & has come to mean something different
| from Thomas & Hunt's intention.

| beaker52 wrote:
| I find DRY fascinating.
|
| It's the source of a large portion of the accidental complexity I
| find in code. "If I just create this abstraction, all this
| duplicated code goes away" - we've all heard it and many of us
| have told it, but few of us realise that it's the prequel to the
| most popular story of all: "all this code is such a mess, there
| are all these extra layers that don't really make sense and
| unpicking it is such a pain, I can't believe someone wrote this".
|
| The story in between is about a young, inexperienced developer who
| has 3 days to deliver the one-feature-to-rule-them-all, to
| appease the almighty project manager, necessitating an adventure
| into the labyrinth carefully crafted by the developer in the
| first story.

| Frost1x wrote:
| A lot of principles like DRY (as described out of correct
| context in this article) have cult-like followings that people
| follow mindlessly, leading to unnecessarily introduced complexity.
|
| I'm always amazed at how eager people are to over-engineer a
| solution that makes it a mess to deal with moving forward.
| Developers at large like to appear clever, tend to have
| (fragile) large egos, and don't seem to want to veer from
| established dogma--much of which is based on little evidence or
| evidence that doesn't apply to the case they're dealing with.

| dec0dedab0de wrote:
| _It's the source of a large portion of the accidental
| complexity I find in code. "If I just create this abstraction,
| all this duplicated code goes away"_
|
| For me it's usually, "Oh crap, that thing I changed I had to
| change here and here too, whoops, it's good now... Wait, no, I also
| had to change it here... and here... now that we're done with
| that we should be fine... DAMMIT!"

| rojobuffalo wrote:
| Having a couple lines that are similar or copied in several
| places shouldn't be considered such a bad thing. Repetition
| reveals similarity, and having clear signals of similarity is
| really important. It's often more expressive / easier to
| understand than a single method name.
|
| Premature abstractions are way worse than repetition. A poor or
| insufficient abstraction leads to obfuscation which leads to
| misunderstanding which leads to novel constructs for the same
| responsibility. Because a poor abstraction can be really, really
| difficult to backtrack, you end up with hacky work-arounds to
| get something done.
|
| I think encountering novelty in a codebase is the biggest thing
| that damages comprehension; and repetition actually enhances
| comprehensibility.

| jfengel wrote:
| I hadn't heard of the Rule of Three, but it parallels my own
| heuristic. The first time, I write the code to do the thing I
| need. The second time I encounter a similar thing, if I can't
| find the right abstraction to unify them, I go ahead and repeat
| myself, writing a second, similar round of code that does what it
| needs.
|
| If I encounter it a third time, then I've got enough data points
| to make a good guess about what the right abstraction will be.
| If I've done a good job so far, it shouldn't be too difficult to
| refactor it. (Strong, static typing helps.)
|
| This is, of course, just a heuristic, and it's not
| all-or-nothing. I'll take my best guess about what the right
| abstraction is going to be, and I'll try to get it right the first
| time. The second round also presents opportunities to take two
| points and extrapolate a line.
|
| It all comes down to experience: not just with the system, but
| with the domain that the system is about, and with the way
| systems change and grow. No one rule of thumb ever encapsulates
| all that.

| AnimalMuppet wrote:
| I use the same approach to automate processes. The first time,
| I do it manually. The second time, I still do it manually, but
| I think "Hey, I did this once before. This is looking like
| something I maybe ought to automate."
|
| The third time I automate it. By then, I understand it well
| enough to have good odds on being able to do the automation
| successfully.

| amelius wrote:
| How often have you automated something so far?
|
| If it's more than three times, you ought to automate the
| automation!

| keithnoizu wrote:
| skynet.gif

| saber6 wrote:
| DRY, like anything, can be properly used or misused. For example,
| you can normalize a database so much that any basic query comes
| with a massive overhead (recursion). There is a middle ground
| between "religion" (pure DRY) and "chaos" (no DRY).

| layer8 wrote:
| What they meant by DRY is otherwise known as SPOT -- Single Point
| Of Truth -- which is harder to misinterpret. The same "truth" --
| which can be data, values, behavior, policy, etc. -- should not
| be defined multiple times in separate places, because a future
| change would have to be applied to all the places, or else cause
| different parts of a program or datastore to have inconsistent
| views on what the "truth" is.
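
To make the "single point of truth" idea above concrete, here is a minimal Python sketch. It is only an illustration, not something from the thread: the limit `MAX_USERNAME_LENGTH` and both function names are made-up assumptions. The point is that one piece of policy is defined once, and everything that needs it derives from that definition.

```python
# Hypothetical policy value (an assumption for illustration): this is the
# single place where the "truth" about username length is defined.
MAX_USERNAME_LENGTH = 32


def validate_username(name: str) -> bool:
    """Validation reads the limit from the single definition above."""
    return 0 < len(name) <= MAX_USERNAME_LENGTH


def username_help_text() -> str:
    """User-facing text is derived from the same definition, so it cannot
    drift away from what validation actually enforces."""
    return f"Usernames may be up to {MAX_USERNAME_LENGTH} characters long."
```

If the limit were instead written as a literal `32` in both functions, a later change to one of them would leave the program with two inconsistent views of the same rule, which is the failure mode described above.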
|
| If you google for it, you will find the synonymous "Single Source
| Of Truth", which however makes for a worse acronym.

| lr4444lr wrote:
| I feel like this article is being critical about something
| without justly staking a clear claim about what the right
| approach is. In my experience, the benefit of DRY code is bug
| reduction and overall increased new development velocity. There
| is a whole class of bugs around similar behaviors that devs and
| product managers _expect_ to move in sync which _don't_, because
| features develop over time and it was just easier to code
| separate small bits than refactor into a common code path. Yes,
| it can make readability harder to unify into abstractions and
| create the right configs or import steps. But the time hunting
| down and fixing the bugs, plus the drag on overall feature
| development due to having to write updates in multiple places and
| test for them, is far worse to deal with for _not_ taking that
| preventative measure.

| ubu7737 wrote:
| DRY is just an admonition for beginners. Nobody who learns
| higher-level abstractions in a modern language needs to be
| reminded of DRY.

| AdriaanvRossum wrote:
| Like DRY is wrong? I don't really get the point of this article.

| aphextron wrote:
| >Like DRY is wrong? I don't really get the point of this
| article.
|
| More that it's a guideline, not a law. We should always use
| best judgement to decide when the tradeoff of readability and
| declarative code is worth a small amount of repetition, rather
| than religiously refactoring something for the sake of it.

| pkaye wrote:
| Sometimes "A little copying is better than a little
| dependency."

| ubu7737 wrote:
| Sometimes a little copying is better than 3 levels of
| complexity to create 3 different types of object.
|
| It's fine to hone your craft as you work by making use of
| abstractions that make sense at a larger scale of using that
| abstraction. I forgive you.
| But at 3+ levels in the class hierarchy to accomplish that
| unification of 3 different types, I have to object strenuously
| that you are straining purpose.

| Quekid5 wrote:
| I absolutely subscribe to that, but then again, I don't have
| a Rule of Three or similar...
|
| It's a bit difficult to get across in text, but the minimum
| number of repetitions of a piece of code to make it "worth"
| putting it in a function is... 1. (According to me, and Tony
| van Eerd of Postmodern C++ fame. I had come to this
| conclusion on my own, but his talk really articulated it
| well.)
|
| It's all about limiting the scope of side-effects, accidental
| reuse of variables, etc. etc. such that a human can do
| _chunking_ to understand the whole.
|
| I generally find that this is not an easy thing to capture in
| "metrics" or "rules". Guidelines with reasonable rationales,
| etc. etc. and when-not-to's, definitely, but that's a really
| hard thing to do and it doesn't get many clicks.
|
| EDIT: ... and just to get back to DRY. The acronym is far too
| absolutist, but
| Try-Not-To-Repeat-Yourself-Too-Much-Unless-You-Have-Good-Reason-To
| isn't quite as catchy, is it?

| wrmsr wrote:
| So many times I've seen people go all in on DRY without
| understanding that _coupling_ is just as dangerous as
| duplication - the inevitable result being some ungodly
| $COMPANY_NAME_common lib with a thousand dependencies, and
| usually only depped in a codebase for a config parser and a
| string helper. See also node_modules and left-pad.io.

| brentjanderson wrote:
| DRY was introduced in The Pragmatic Programmer, and Dave Thomas
| pointed out in a recent Changelog episode that DRY doesn't mean
| "Don't repeat code", it means "Don't repeat knowledge."
|
| One concrete example: If your software has to create really
| complex objects, would you rather describe _how_ to create
| those objects in 10 places or one place? That's a scenario
| where you don't want to repeat yourself.
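
A hedged sketch of the "describe how to create the object in one place" scenario, in Python. Everything here is hypothetical and chosen only for illustration: the `Invoice` type, the `REGION_RULES` table and its values, and the `new_invoice` function are assumptions, not anything from the thread or the article.

```python
from dataclasses import dataclass

# Hypothetical per-region rules (illustrative values only).
REGION_RULES = {
    "EU": {"currency": "EUR", "tax_rate": 0.20},
    "US": {"currency": "USD", "tax_rate": 0.00},
}


@dataclass(frozen=True)
class Invoice:
    customer_id: str
    currency: str
    tax_rate: float
    net_cents: int

    @property
    def gross_cents(self) -> int:
        # Gross amount is always derived from net amount and tax rate.
        return round(self.net_cents * (1 + self.tax_rate))


def new_invoice(customer_id: str, region: str, net_cents: int) -> Invoice:
    """The one place that knows how a valid Invoice is put together."""
    if net_cents < 0:
        raise ValueError("net_cents must be non-negative")
    rules = REGION_RULES[region]
    return Invoice(customer_id, rules["currency"], rules["tax_rate"], net_cents)
```

Callers would write `new_invoice("c-1", "EU", 1000)` rather than assembling the currency and tax rate themselves, so a change in how invoices are built is made once. Whether a given object is "complex enough" to deserve this is still a judgement call.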
|
| Dan Abramov [wrote about](https://overreacted.io/goodbye-clean-code/)
| this (linked in the OP), but in his example he's
| removing repetitive code. He's not removing multiple copies of
| the _knowledge_ about what the program is supposed to do.
|
| It's a subtle difference that seems more difficult to describe
| than I'd like, but it's an important one.

| crispinb wrote:
| No, they're not saying it's wrong, but rather it's commonly
| misunderstood. From the motto alone you might think the point
| is to abjure all code duplication. But Hunt & Thomas' intent
| was instead to warn against duplicating sources of
| truth/knowledge - that is, all knowledge embedded in your code
| needs to have a canonical source, and all other references
| should derive from that source or you risk divergence (or in
| the best case must always remember to make necessary changes in
| multiple places).
|
| So for example, documentation (truths about the code) should
| derive from the code (e.g. by doc generation). Otherwise the docs
| & code will drift apart. Or if you're passing domain
| information across the wire between client & server, you should
| derive the data structures at both ends from a common source.

| iSnow wrote:
| >all knowledge embedded in your code needs to have a
| canonical source
|
| I don't get it. Code /IS/ knowledge and whenever I copy-paste
| code around, I duplicate not only code but also knowledge.

| crispinb wrote:
| Well (and I'm elucidating, not necessarily defending) they
| mean knowledge somewhat specific to the project, not in an
| absolute philosophical sense.
|
| So you have an API that belongs to this project. When you
| change it, do so in one place, and then run your doc
| generator rather than change it in both function/method
| signatures and docs.
|
| Or you have domain knowledge embedded in classes, and a
| wire protocol between peers or client & servers using
| different languages.
| Derive the data structures in the two different languages from
| a common source (either one of the languages, or both from
| metadata).
|
| I think the distinction between this kind of project-specific
| 'knowledge' and more abstract 'everything is
| knowledge' issues is clear enough in practice. But it is
| just a rule of thumb rather than a deep philosophical
| principle, and like all such will break down in individual
| cases.
|
| _whenever I copy-paste code around_
|
| But that's just one source of code duplication. Another
| might be (for example) duplicated code deriving from code
| generation. DRY might advocate this (as there's a clear
| canonical source of knowledge), whereas a generic rule
| against all 'duplication' wouldn't.

| mrkeen wrote:
| The comment that it's about knowledge, not code, resonated with
| me.
|
| Like, if you saw Http.getClient(...).doGetRequest(...) a few
| times, it wouldn't be worth pulling them out into a
| myGetRequest(...) method. Your teammates already understand the
| existing, repeated statements, but they haven't seen
| myGetRequest(...) before, so you wouldn't be making the code
| any more readable to them.
|
| But if you had
| Http.getClient("auth.myservice.com:8443").doGetRequest(...) in
| a few places, then I would pull out the host and port (or maybe
| the whole line), since it contains knowledge of where/how to
| authenticate.
|
| Coming from the other direction: if I were reading the code, I
| can imagine myself looking for the one place where the auth
| happens, but I can't imagine myself needing to know the one
| place where Get requests happen (even if the 'Get' code is
| repeated much more than the 'auth' code).

| 3pt14159 wrote:
| Eh. Something I find pro devs do is just code the damn thing out
| quickly and wait for the right abstraction to emerge before
| stuffing it blindly into a function. If that means a bit of
| repetition, fine.
| If you push everything into tiny little methods
| or functions or abstract them into their own objects the first
| time you come across a couple of repeated lines of code then the
| clearer and better solution may not emerge as the requirements
| start to change. On the other hand, self documenting code is most
| easily done via method naming.
|
| This type of topic is hard to talk about. It's so nuanced that
| making any statement about how to do it sounds like a gutless
| generality. It also depends on the programming language and
| lifetime of the project. I've banged out some real ugly code when
| servers were on fire, but it was all stuff that was destined for
| an early death.

| networkimprov wrote:
| Documenting code via single-caller functions is usually a
| mistake, because to everyone else who looks, the set of
| functions is an API.
|
| Both internal and external APIs must be kept coherent.
|
| Also when readers are trying to understand exactly how some
| function changes the system state, having to refer to numerous
| other functions it calls is tedious.

| OJFord wrote:
| > Documenting code via single-caller functions is usually a
| mistake, because to everyone else who looks, the set of
| functions is an API.
|
| That's a really good way of putting it.
|
| I try to avoid it by not writing new functions, but gladly
| using existing ones, in my implementation of whatever single
| new one.
|
| For example if I'm writing a find_and_update_foobar function,
| I'll use find_foobar if it exists, and with the right
| signature, but I won't write it just to implement the one I
| actually care about; ditto update_foobar.
|
| But, professionally I've mainly only used Python; so I find
| it still deteriorates into a mess. (I type hint extensively,
| but still it only takes some missing hints, or something too
| loosely - or wrongly - typed.)
|
| I haven't used Rust professionally/enough/on something large
| enough to be sure, but my feeling is that just having a
| type checker prevents so much mis-refactoring.

| rileymat2 wrote:
| I don't disagree with this but it is more complicated because
| often the order statements are executed in is knowledge. Much
| duplicated code is duplicated knowledge.
|
| The question is whether it is a coincidence or the same concept.

| aazaa wrote:
| This article would benefit from some code examples.
|
| As it is, it left me with the same thought as those who claim to
| never need debuggers or object-oriented features: Fine, let's say
| you're right - _how_ do I implement your system?

| keeganjw wrote:
| I feel like this article ended before it should have. I'm still
| not exactly sure what the author means by Don't Repeat Knowledge.
| Should we not be refactoring or... just don't go overboard?

| mntmoss wrote:
| When you refactor you are taking shots at moving around where
| coupling occurs. If your code is maximally decoupled it is
| primitive copy-paste code that never calls functions and
| introduces unique variables for each section - and if it's
| maximally coupled it will look like Swiss cheese, trying to
| reuse the same functionality for everything with clever
| parameterization, recursion, indirection and globals.
| Intentionally coupled code is most common in memory-starved
| environments since implicit dependency helps reduce data
| overheads.
|
| And so "DRY", to the extent that it's useful, encourages you to
| find slack areas in the code where there's low potential for
| introducing coupling, and to factor those out so that you have
| code that is mostly-decoupled without also being redundant and
| hard to modify - the factoring reflects "knowledge" about the
| problem. And yet it's not always obvious when you have the
| knowledge or not.
| Sometimes redundant-looking code is a form of
| hardcoded data and a factoring would only push it towards being
| fully data-driven (which exacts a price in debugging). The Rule
| of Three is just a common way of making this decision about
| knowledge.

| krisroadruck wrote:
| Hah, I'm glad I'm not the only one. I finished reading it and
| thought that it felt like an intro and they entirely left out
| the meat of the article. Thought maybe it was just a lack of
| related knowledge as I just dabble in coding but yeah I figured
| at some point they would define what it was actually supposed
| to mean in depth. When that didn't happen I was left wondering
| what the point of the article was.

| crispinb wrote:
| _I'm still not exactly sure what the author means by Don't
| Repeat Knowledge_
|
| I think they're assuming that everyone has read The Pragmatic
| Programmer. To quote the original DRY principle from there:
|
| _Every piece of knowledge must have a single, unambiguous,
| authoritative representation within the system_
|
| Note that this has only an accidental relationship with code
| duplication, and in some cases could increase the latter.

| jschwartzi wrote:
| The actual problem here isn't to refactor or not refactor. The
| mistake a lot of developers make, including here on HN, is in
| thinking of it mechanistically, that the code is just a bunch
| of operators and data and we've just gotta push it around the
| page some without really understanding it. That's how a lot of
| people interpret DRY, as an end in itself to be accomplished by
| pushing the symbols around and sweeping them up into something
| pretty.
|
| The most important thing isn't to apply these heuristics, but
| to understand the problem space in which your code operates
| before you lay down your abstractions.
| That's a difficult thing
| to do without domain knowledge, and in a lot of enterprises you
| will never get access to the kind of domain knowledge you need
| to refactor effectively unless you're the lead or in
| management.
|
| To the extent that your code is a series of statements about
| how the system behaves in response to a particular data input
| it's easy to read and documents itself. And to the extent that
| your data structures and statements resemble statements that a
| domain expert might make (move gantry 30 meters to the left,
| then drop the crane) they become easy to change in response to
| changing requirements. Domain knowledge tells you what the
| fixed elements of the problem space are (is it always a gantry?
| Does the gantry move in any directions other than left? What
| does it mean to drop the crane and do we do it different
| ways?). That informs how you structure your code and what the
| most clear factoring is.
|
| It will not always be the smallest refactoring.
___________________________________________________________________
(page generated 2020-02-14 23:00 UTC)