[HN Gopher] Why do so many developers get DRY wrong?
       ___________________________________________________________________
        
       Why do so many developers get DRY wrong?
        
       Author : jerodsanto
       Score  : 33 points
       Date   : 2020-02-14 19:57 UTC (3 hours ago)
        
 (HTM) web link (changelog.com)
 (TXT) w3m dump (changelog.com)
        
       | crispinb wrote:
       | The comments here suggest two things: (1) most people
       | misunderstand DRY (ie. they think it's about code rather than
       | knowledge duplication), and (2) the article didn't do a great job
       | of clearing the issue up.
       | 
       | Though an alternative to (1) is that the meaning of DRY in common
       | dev parlance has changed & has come to mean something different
       | from Thomas & Hunt's intention.
        
       | beaker52 wrote:
       | I find DRY fascinating.
       | 
       | It's the source of a large portion of the accidental complexity I
       | find in code. "If I just create this abstraction, all this
       | duplicated code goes away" - we've all heard it and many of us
       | have told it, but few of us realise that it's the prequel to the
       | most popular story of all: "all this code is such a mess, there
       | are all these extra layers that don't really make sense and
       | unpicking it is such a pain, I can't believe someone wrote this".
       | 
       | The story in between is about a young, inexperienced developer who
       | has 3 days to deliver the one-feature-to-rule-them-all, to
       | appease the almighty project manager, necessitating an adventure
       | into the labyrinth carefully crafted by the developer in the
       | first story.
        
         | Frost1x wrote:
         | A lot of principles like DRY (as described out of correct
         | context in this article) have cult-like followings: people apply
         | them mindlessly, which introduces unnecessary complexity.
         | 
         | I'm always amazed at how eager people are to over-engineer a
         | solution that makes it a mess to deal with moving forward.
         | Developers at large like to appear clever, tend to have
         | (fragile) large egos, and don't seem to want to veer from
         | established dogma--much of which is based on little evidence, or
         | on evidence that doesn't apply to the case they're dealing with.
        
         | dec0dedab0de wrote:
         | _It's the source of a large portion of the accidental
         | complexity I find in code. "If I just create this abstraction,
         | all this duplicated code goes away"_
         | 
         | For me it's usually, "Oh crap that thing I changed I had to
         | change here and here too, whoops it's good now... Wait no I also
         | had to change it here... and here... now that we're done with
         | that we should be fine... DAMMIT!"
        
       | rojobuffalo wrote:
       | Having a couple lines that are similar or copied in several
       | places shouldn't be considered such a bad thing. Repetition
       | reveals similarity, and having clear signals of similarity is
       | really important. It's often more expressive / easier to
       | understand than a single method name.
       | 
       | Premature abstractions are way worse than repetition. A poor or
       | insufficient abstraction leads to obfuscation which leads to
       | misunderstanding which leads to novel constructs for the same
       | responsibility. Because a poor abstraction can be really really
       | difficult to backtrack, you end up with hacky work-arounds to
       | get something done.
       | 
       | I think encountering novelty in a codebase is the biggest thing
       | that damages comprehension; and repetition actually enhances
       | comprehensibility.
        
       | jfengel wrote:
       | I hadn't heard of the Rule of Three, but it parallels my own
       | heuristic. The first time, I write the code to do the thing I
       | need. The second time I encounter a similar thing, if I can't
       | find the right abstraction to unify them, I go ahead and repeat
       | myself, writing a second, similar round of code that does what it
       | needs.
       | 
       | If I encounter it a third time, then I've got enough data points
       | to make a good guess about what the right abstraction will be. If
       | I've done a good job so far, it shouldn't be too difficult to
       | refactor it. (Strong, static typing helps.)
       | 
       | This is, of course, just a heuristic, and it's not all-or-
       | nothing. I'll take my best guess about what the right abstraction
       | is going to be, and I'll try to get it right the first time. The
       | second round also presents opportunities to take two points and
       | extrapolate a line.
       | 
       | It all comes down to experience: not just with the system, but
       | with the domain that the system is about, and with the way
       | systems change and grow. No one rule of thumb ever encapsulates
       | all that.
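
The Rule of Three heuristic described above can be sketched roughly as follows. This is only an illustration: the function names and the "total" formatting task are hypothetical, chosen to show the first two occurrences staying separate and the abstraction appearing on the third.

```python
# Hypothetical example: all names here are made up for illustration.

# First occurrence: just write the code that does the thing.
def format_invoice_total(amount):
    return f"Invoice total: {amount:.2f} USD"


# Second occurrence: similar, but with only two data points the right
# abstraction is still a guess, so repeat rather than unify.
def format_refund_total(amount):
    return f"Refund total: {amount:.2f} USD"


# Third occurrence: three data points suggest what actually varies
# (the label, possibly the currency), so extract it now.
def format_total(label, amount, currency="USD"):
    return f"{label} total: {amount:.2f} {currency}"


print(format_total("Credit", 3))
```

Once `format_total` exists, the two earlier functions could be rewritten to call it, or removed. Whether that is worth doing still depends on whether they really encode the same rule.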
        
         | AnimalMuppet wrote:
         | I use the same approach to automate processes. The first time,
         | I do it manually. The second time, I still do it manually, but
         | I think "Hey, I did this once before. This is looking like
         | something I maybe ought to automate."
         | 
         | The third time I automate it. By then, I understand it well
         | enough to have good odds on being able to do the automation
         | successfully.
        
           | amelius wrote:
           | How often have you automated something so far?
           | 
           | If it's more than three times, you ought to automate the
           | automation!
        
             | keithnoizu wrote:
             | skynet.gif
        
       | saber6 wrote:
       | DRY, like anything, can be properly used or misused. For example,
       | you can normalize a database so much that any basic query comes
       | with a massive overhead (recursion). There is a middle ground
       | between "religion" (pure DRY) and "chaos" (no DRY).
        
       | layer8 wrote:
       | What they meant by DRY is otherwise known as SPOT -- Single Point
       | Of Truth -- which is harder to misinterpret. The same "truth" --
       | which can be data, values, behavior, policy, etc. -- should not
       | be defined multiple times in separate places, because a future
       | change would have to be applied to all the places, or else cause
       | different parts of a program or datastore to have inconsistent
       | views on what the "truth" is.
       | 
       | If you google for it, you will find the synonymous "Single Source
       | Of Truth", which however makes for a worse acronym.
        
       | lr4444lr wrote:
       | I feel like this article is being critical about something
       | without justly staking a clear claim about what the right
       | approach is. In my experience, the benefit of DRY code is bug
       | reduction and overall increased new development velocity. There
       | is a whole class of bugs around similar behaviors that devs and
       | product managers _expect_ to move in sync which _don't_, because
       | features develop over time and it was just easier to code
       | separate small bits than refactor into a common code path. Yes,
       | it can make readability harder to unify into abstractions and
       | create the right configs or import steps. But the time hunting
       | down and fixing the bugs, plus the drag on overall feature
       | development due to having to write updates in multiple places and
       | test for them is far worse to deal with for _not_ taking that
       | preventative measure.
        
       | ubu7737 wrote:
       | DRY is just an admonition for beginners. Nobody who learns
       | higher-level abstractions in a modern language needs to be
       | reminded of DRY.
        
       | AdriaanvRossum wrote:
       | Like DRY is wrong? I don't really get the point of this article.
        
         | aphextron wrote:
         | >Like DRY is wrong? I don't really get the point of this
         | article.
         | 
         | More that it's a guideline, not a law. We should always use
         | best judgement to decide when the tradeoff of readability and
         | declarative code is worth a small amount of repetition, rather
         | than religiously refactoring something for the sake of it.
        
         | pkaye wrote:
         | Sometimes "A little copying is better than a little
         | dependency."
        
           | ubu7737 wrote:
           | Sometimes a little copying is better than 3 levels of
           | complexity to create 3 different types of object.
           | 
           | It's fine to hone your craft as you work by making use of
           | abstractions that make sense at a larger scale of using that
           | abstraction. I forgive you. But at 3+ levels in the class
           | hierarchy to accomplish that unification of 3 different
           | types, I have to object strenuously that you are straining
           | purpose.
        
           | Quekid5 wrote:
           | I absolutely subscribe to that, but then again, I don't have
           | a Rule of Three or similar...
           | 
           | It's a bit difficult to get across in text, but the minimum
           | number of repetitions of a piece of code to make it "worth"
           | putting it in a function is... 1. (According to me, and Tony
           | van Eerd of Postmodern C++ fame. I had come to this
           | conclusion on my own, but his talk really articulated it
           | well.)
           | 
           | It's all about limiting the scope of side-effects, accidental
           | reuse of variables, etc. etc. such that a human can do
           | _chunking_ to understand the whole.
           | 
           | I generally find that this is not an easy thing to capture in
           | "metrics" or "rules". Guidelines with reasonable rationales,
           | etc. etc. and when-not-to's, definitely, but that's a really
           | hard thing to do and it doesn't get many clicks.
           | 
           | EDIT: ... and just to get back to DRY. The acronym is far too
           | absolutist, but Try-Not-To-Repeat-Yourself-Too-Much-Unless-
           | You-Have-Good-Reason-To isn't quite as catchy, is it?
        
           | wrmsr wrote:
           | I've seen so many times people going all in on DRY not
           | understanding that just as dangerous as duplication is
           | _coupling_ - the inevitable result being some ungodly
           | $COMPANY_NAME_common lib with a thousand dependencies, and
           | usually only depped in a codebase for a config parser and a
           | string helper. See also node_modules and left-pad.io.
        
         | brentjanderson wrote:
         | DRY was introduced in the Pragmatic Programmer, and Dave Thomas
         | pointed out in a recent Changelog episode that DRY doesn't mean
         | "Don't repeat code", it means "Don't repeat knowledge."
         | 
         | One concrete example: If your software has to create really
         | complex objects, would you rather describe _how_ to create
         | those objects in 10 places or one place? That's a scenario
         | where you don't want to repeat yourself.
         | 
         | Dan Abramov [wrote about](https://overreacted.io/goodbye-clean-code/)
         | this (linked in the OP), but in his example he's
         | removing repetitive code. He's not removing multiple copies of
         | the _knowledge_ about what the program is supposed to do.
         | 
         | It's a subtle difference that seems more difficult to describe
         | than I'd like, but it's an important one.
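
One possible reading of the "complex objects" example above, as a minimal sketch. `ReportJob`, `make_report_job` and the default values are hypothetical; the point is only that the knowledge of how a valid object is assembled lives in one function instead of at every call site.

```python
from dataclasses import dataclass, field


# Hypothetical domain object, invented for this sketch.
@dataclass
class ReportJob:
    owner: str
    retries: int
    timeout_s: float
    tags: list = field(default_factory=list)


def make_report_job(owner: str) -> ReportJob:
    """The single place that knows how a valid ReportJob is put together."""
    return ReportJob(owner=owner, retries=3, timeout_s=30.0, tags=["report", owner])


# Call sites repeat the call, not the construction rules.
nightly = make_report_job("billing")
adhoc = make_report_job("support")
print(nightly)
print(adhoc)
```

If the retry policy changes, only `make_report_job` needs editing. The repeated calls are code duplication of a harmless kind, since they carry no knowledge of their own.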
        
         | crispinb wrote:
         | No, they're not saying it's wrong, but rather it's commonly
         | misunderstood. From the motto alone you might think the point
         | is to abjure all code duplication. But Hunt & Thomas' intent
         | was instead to warn against duplicating sources of
         | truth/knowledge - that is, all knowledge embedded in your code
         | needs to have a canonical source, and all other references
         | should derive from that source or you risk divergence (or in
         | the best case must always remember to make necessary changes in
         | multiple places).
         | 
         | So for example, documentation (truths about the code) should
         | derive from the code (eg by doc generation). Otherwise the docs
         | & code will drift apart. Or if you're passing domain
         | information across the wire between client & server, you should
         | derive the data structures at both ends from a common source.
        
           | iSnow wrote:
           | >all knowledge embedded in your code needs to have a
           | canonical source
           | 
           | I don't get it. Code /IS/ knowledge and whenever I copy-paste
           | code around, I duplicate not only code but also knowledge.
        
             | crispinb wrote:
             | Well (and I'm elucidating, not necessarily defending) they
             | mean knowledge somewhat specific to the project, not in an
             | absolute philosophical sense.
             | 
             | So you have an API that belongs to this project. When you
             | change it, do so in one place, and then run your doc
             | generator rather than change it in both function/method
             | signatures and docs.
             | 
             | Or you have domain knowledge embedded in classes, and a
             | wire protocol between peers or client & servers using
             | different languages. Derive the data structures in the two
             | different languages from common source (either one of the
             | languages, or both from metadata).
             | 
             | I think the distinction between this kind of project-
             | specific 'knowledge' and more abstract 'everything is
             | knowledge' issues is clear enough in practice. But it is
             | just a rule of thumb rather than a deep philosophical
             | principle, and like all such will break down in individual
             | cases.
             | 
             |  _whenever I copy-paste code around_
             | 
             | But that's just one source of code duplication. Another
             | might be (for example) duplicated code deriving from code
             | generation. DRY might advocate this (as there's a clear
             | canonical source of knowledge), whereas a generic rule
             | against all 'duplication' wouldn't.
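
A rough sketch of deriving the data structures at both ends of the wire from common metadata, as described above. The schema, the type mappings and the renderer functions are all assumptions made for illustration, not any real tool's format.

```python
# Hypothetical shared metadata: the one canonical description of the record.
USER_FIELDS = {"id": "int", "name": "str", "active": "bool"}

# Per-language type names for the generated definitions.
PY_TYPES = {"int": "int", "str": "str", "bool": "bool"}
TS_TYPES = {"int": "number", "str": "string", "bool": "boolean"}


def render_python(name, fields):
    """Render the source text of a Python dataclass from the shared metadata."""
    lines = ["@dataclass", f"class {name}:"]
    lines += [f"    {fname}: {PY_TYPES[kind]}" for fname, kind in fields.items()]
    return "\n".join(lines)


def render_typescript(name, fields):
    """Render the source text of a TypeScript interface from the same metadata."""
    lines = [f"interface {name} {{"]
    lines += [f"  {fname}: {TS_TYPES[kind]};" for fname, kind in fields.items()]
    lines.append("}")
    return "\n".join(lines)


print(render_python("User", USER_FIELDS))
print(render_typescript("User", USER_FIELDS))
```

The two generated definitions duplicate each other as code, but neither is a second source of truth: adding a field means editing `USER_FIELDS` and regenerating.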
        
         | mrkeen wrote:
         | The comment that it's about knowledge, not code resonated with
         | me.
         | 
         | Like, if you saw Http.getClient(...).doGetRequest(...) a few
         | times, it wouldn't be worth pulling them out into a
         | myGetRequest(...) method. Your teammates already understand the
         | existing, repeated statements, but they haven't seen
         | myGetRequest(...) before, so you wouldn't be making the code
         | any more readable to them.
         | 
         | But if you had
         | Http.getClient("auth.myservice.com:8443").doGetRequest(...) in
         | a few places, then I would pull out the host and port (or maybe
         | the whole line), since it contains knowledge of where/how to
         | authenticate.
         | 
         | Coming from the other direction: if I were reading the code, I
         | can imagine myself looking for the one place where the auth
         | happens, but I can't imagine myself needing to know the one
         | place where Get requests happen (even if the 'Get' code is
         | repeated much more than the 'auth' code)
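
The distinction drawn above might look something like this in practice. The constant and function names are hypothetical, and the `https` scheme and the paths are assumptions; only the host and port come from the comment.

```python
# Hypothetical names: AUTH_HOST, AUTH_PORT and auth_url are illustrations,
# not part of any real library. The https scheme is assumed.
AUTH_HOST = "auth.myservice.com"
AUTH_PORT = 8443


def auth_url(path: str) -> str:
    """Build a URL on the auth service; the only place that knows host and port."""
    return f"https://{AUTH_HOST}:{AUTH_PORT}/{path.lstrip('/')}"


# Call sites can keep repeating the generic "do a GET request" shape.
# What they no longer repeat is the knowledge of where authentication lives.
login_url = auth_url("/login")
token_url = auth_url("token")
print(login_url)
print(token_url)
```

A reader looking for "the one place where the auth happens" now has one, while ordinary GET calls stay inline and familiar.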
        
       | 3pt14159 wrote:
       | Eh. Something I find pro devs do is just code the damn thing out
       | quickly and wait for the right abstraction to emerge before
       | stuffing it blindly into a function. If that means a bit of
       | repetition, fine. If you push everything into tiny little methods
       | or functions or abstract them into their own objects the first
       | time you come across a couple of repeated lines of code then the
       | clearer and better solution may not emerge as the requirements
       | start to change. On the other hand, self documenting code is most
       | easily done via method naming.
       | 
       | This type of topic is hard to talk about. It's so nuanced that
       | any statement about how to do it sounds like a gutless
       | generality. It also depends on the programming language and
       | lifetime of the project. I've banged out some real ugly code when
       | servers were on fire, but it was all stuff that was destined for
       | an early death.
        
         | networkimprov wrote:
         | Documenting code via single-caller functions is usually a
         | mistake, because to everyone else who looks, the set of
         | functions is an API.
         | 
         | Both internal and external APIs must be kept coherent.
         | 
         | Also when readers are trying to understand exactly how some
         | function changes the system state, having to refer to numerous
         | other functions it calls is tedious.
        
           | OJFord wrote:
           | > Documenting code via single-caller functions is usually a
           | mistake, because to everyone else who looks, the set of
           | functions is an API.
           | 
           | That's a really good way of putting it.
           | 
           | I try to avoid it by not writing new functions, but gladly
           | using existing ones, in my implementation of whatever single
           | new one.
           | 
           | For example if I'm writing a find_and_update_foobar function,
           | I'll use find_foobar if it exists, and with the right
           | signature, but I won't write it just to implement the one I
           | actually care about; ditto update_foobar.
           | 
           | But, professionally I've mainly only used Python; so I find
           | it still deteriorates into a mess. (I type hint extensively,
           | but still it only takes some missing hints, or something too
           | loosely - or wrongly - typed.)
           | 
           | I haven't used Rust professionally/enough/on something large
           | enough to be sure, but my feeling is that just having a
           | type checker prevents so much mis-refactoring.
        
       | rileymat2 wrote:
       | I don't disagree with this but it is more complicated because
       | often the order statements are executed in is knowledge. Much
       | duplicated code is duplicated knowledge.
       | 
       | The question is whether it is a coincidence or the same concept.
        
       | aazaa wrote:
       | This article would benefit from some code examples.
       | 
       | As it is, it left me with the same thought as those who claim to
       | never need debuggers or object-oriented features: Fine let's say
       | you're right - _how_ do I implement your system?
        
       | keeganjw wrote:
       | I feel like this article ended before it should have. I'm still
       | not exactly sure what the author means by Don't Repeat Knowledge.
       | Should we not be refactoring or... just don't go overboard?
        
         | mntmoss wrote:
         | When you refactor you are taking shots at moving around where
         | coupling occurs. If your code is maximally decoupled it is
         | primitive copy-paste code that never calls functions and
         | introduces unique variables for each section - and if it's
         | maximally coupled it will look like swiss cheese, trying to
         | reuse the same functionality for everything with clever
         | parameterization, recursion, indirection and globals.
         | Intentionally coupled code is most common in memory-starved
         | environments since implicit dependency helps reduce data
         | overheads.
         | 
         | And so "DRY", to the extent that it's useful, encourages you to
         | find slack areas in the code where there's low potential for
         | introducing coupling, and to factor those out so that you have
         | code that is mostly-decoupled without also being redundant and
         | hard to modify - the factoring reflects "knowledge" about the
         | problem. And yet it's not always obvious when you have the
         | knowledge or not. Sometimes redundant-looking code is a form of
         | hardcoded data and a factoring would only push it towards being
         | fully data-driven (which exacts a price in debugging). The Rule
         | of Three is just a common way of making this decision about
         | knowledge.
        
         | krisroadruck wrote:
         | Hah I'm glad I'm not the only one. I finished reading it and
         | thought that it felt like an intro and they entirely left out
         | the meat of the article. Thought maybe it was just a lack of
         | related knowledge as I just dabble in coding but yeah I figured
         | at some point they would define what it was actually supposed
         | to mean in depth. When that didn't happen I was left wondering
         | what the point of the article was.
        
         | crispinb wrote:
         | _I'm still not exactly sure what the author means by Don't
         | Repeat Knowledge_
         | 
         | I think they're assuming that everyone has read The Pragmatic
         | Programmer. To quote the original DRY principle from there:
         | 
         |  _Every piece of knowledge must have a single, unambiguous,
         | authoritative representation within the system_
         | 
         | Note that this has only an accidental relationship with code
         | duplication, and in some cases could increase the latter.
        
         | jschwartzi wrote:
         | The actual problem here isn't to refactor or not refactor. The
         | mistake a lot of developers make, including here on HN, is in
         | thinking of it mechanistically, that the code is just a bunch
         | of operators and data and we've just gotta push it around the
         | page some without really understanding it. That's how a lot of
         | people interpret DRY, as an end in itself to be accomplished by
         | pushing the symbols around and sweeping them up into something
         | pretty.
         | 
         | The most important thing isn't to apply these heuristics, but
         | to understand the problem space in which your code operates
         | before you lay down your abstractions. That's a difficult thing
         | to do without domain knowledge, and in a lot of enterprises you
         | will never get access to the kind of domain knowledge you need
         | to refactor effectively unless you're the lead or in
         | management.
         | 
         | To the extent that your code is a series of statements about
         | how the system behaves in response to a particular data input
         | it's easy to read and documents itself. And to the extent that
         | your data structures and statements resemble statements that a
         | domain expert might make (move gantry 30 meters to the left,
         | then drop the crane) they become easy to change in response to
         | changing requirements. Domain knowledge tells you what the
         | fixed elements of the problem space are (is it always a gantry?
         | Does the gantry move in any directions other than left? What
         | does it mean to drop the crane and do we do it different
         | ways?). That informs how you structure your code and what the
         | most clear factoring is.
         | 
         | It will not always be the smallest refactoring.
        
       ___________________________________________________________________
       (page generated 2020-02-14 23:00 UTC)