[HN Gopher] The most copied StackOverflow snippet of all time is...
       ___________________________________________________________________
        
       The most copied StackOverflow snippet of all time is flawed (2019)
        
       Author : vinnyglennon
       Score  : 128 points
       Date   : 2021-06-16 21:27 UTC (1 hour ago)
        
 (HTM) web link (programming.guide)
 (TXT) w3m dump (programming.guide)
        
       | bsaul wrote:
       | seems that most comments here missed the end of the article,
       | where he points to the "production ready" version of the
       | solution, that is indeed very close to the original one,
       | including a while loop.
        
         | t0astbread wrote:
         | It's especially ironic given that this is about a StackOverflow
         | code snippet that many people probably also copied without
         | reading.
        
       | eutectic wrote:
       | This is why it's a good idea to have a real integer type.
        
         | enriquto wrote:
         | Isn't it impossible? Integers go arbitrarily large but
         | computers don't.
        
           | asdf3243245q wrote:
           | Computers also go arbitrarily large. Not infinite, but
           | arbitrarily large.
           | 
           | A real number type could be bounded by the amount of RAM you
           | have.
        
       | jrockway wrote:
       | > Sebastian then reached out to me to straighten it out, which I
       | did: I had not yet started at Oracle when that commit was merged,
       | and I did not contribute that patch. Jokes on Oracle. Shortly
       | after, an issue was filed and the code was removed.
       | 
       | Good thing it wasn't a range check function. I hear those are
       | expensive.
        
       | dokem wrote:
       | Something about this comes off as amateurish. The obsession with
       | minimization. Just use a switch statement. Now where is the bug
       | going to hide? The solution doesn't need to generalize, there is
       | only a small handful of different solutions. Just break them all
       | out. It's more maintainable and readable and requires less
       | thinking.
        
       | stefan_ wrote:
       | Why are you writing code? The question was for a static method in
       | Apache Commons, not your "I'm so clever" implementation. Think
       | the reading comprehension is flawed.
       | 
       | (Of course, this static method exists in Apache Commons, going
       | back at least 20 years. But the fellow "code golfers" of the
       | author voted someone to the first answer who similarly had the
       | irresistible urge to _try to be very clever_. It's a scourge on
       | StackOverflow.)
        
       | [deleted]
        
       | unwind wrote:
       | I must admit I smiled at seeing that I edited the question, back
       | in the day. :) Can't say I remember the question, and didn't know
       | it has that epic feature of being the most-copied. Cool!
        
       | mweberxyz wrote:
       | Say what you want about the stability of the npm ecosystem, but
       | if this were JS, a new SemVer patch release could be cut, and it
       | would be fixed in thousands of code bases essentially instantly.
        
       | beermonster wrote:
       | > I wrote almost a decade ago was found to be the most copied
       | snippet on Stack Overflow. Ironically it happens to be buggy.
       | 
       | I don't find it ironic, I find it quite normal that even small
       | snippets of code contain bugs (given the daily review requests I
       | receive).
       | 
       | I think that when taking code from StackOverflow, what's more
       | important is understanding what the code does, and why, rather
       | than copying it verbatim by copy-pasting it into your
       | production code.
       | 
       | I also find on StackExchange et al. that quite often the
       | most upvoted is the one that 'fixes it' for 'most people' yet the
       | correct answer is down at number 3 or 4. Again, understanding the
       | answer and why it applies, helps give you the context to
       | understand if this is _actually_ the solution to _your_ problem
       | or just treats the symptom.
        
         | megalodon wrote:
         | One of the best tips I have gotten from the internet is to
         | never copy and paste code you have not written yourself. Even
         | rewriting it verbatim makes you think about what it is you are
         | actually copying.
         | 
         | It's a pretty neat rule to have in mind.
        
       | AceJohnny2 wrote:
       | > _Key Takeaways:_
       | 
       | > _[...]_
       | 
       | > _Floating-point arithmetic is hard._
       | 
       | I have successfully avoided FP code for most of my career. At
       | this point, I consider the domain sophisticated enough to be an
       | independent skill on someone's resume.
        
         | user3939382 wrote:
         | There are libraries that offer more appropriate ways of dealing
         | with it, but last time I ran into a FP-related bug (something
         | to do with parsing xlsx into MySQL) I fixed it quickly by
         | converting everything to strings and doing some unholy
         | procedure on them. It worked but it wasn't my proudest moment
         | as a programmer.
        
       | tasty_freeze wrote:
       | The thing that jumped out at me, as I've seen the same kind of
       | thing on the job, is the assumption that, e.g., log(1000)/log(10)
       | is _exactly_ 3. Does the standard guarantee that dividing the
       | rounded approximation of one transcendental number by the
       | rounded approximation of a related transcendental number will
       | give 3.0 and not 2.999999999?
        
         | remram wrote:
         | Yeah that seems like a serious flaw to me too. On my Python:
         | >>> math.log(1000)/math.log(10)
         | 2.9999999999999996
         | >>> int(math.log(1000)/math.log(10))
         | 2
         | 
         | But I don't know about the guarantees provided in the
         | JavaScript standard (or more importantly those offered by
         | actual browsers).
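
One way around the pitfall in this exchange is to treat the floating-point quotient only as an estimate and then correct it with exact integer comparisons. A minimal Python sketch, assuming positive integer inputs; `floor_log` is a hypothetical helper name, not something taken from the article or from any library:

```python
import math


def floor_log(n, base):
    """Return the largest e such that base**e <= n (hypothetical helper)."""
    if n < 1 or base < 2:
        raise ValueError("expects n >= 1 and base >= 2")
    # The float quotient may land just below or above the true value,
    # so it is used only as a starting guess.
    estimate = int(math.log(n) / math.log(base))
    # Correct the guess with exact integer arithmetic.
    while base ** estimate > n:
        estimate -= 1
    while base ** (estimate + 1) <= n:
        estimate += 1
    return estimate
```

With this sketch, `floor_log(1000, 10)` gives 3 whether or not the float quotient happens to come out slightly below 3 on a given platform.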
        
       | danellis wrote:
       | > almost no branches
       | 
       | I wonder whether the author is suggesting that (potentially) nine
       | branches is a small number, or they overlooked ternary
       | expressions and function calls and are just counting the if
       | statement.
        
       | axiosgunnar wrote:
       | So it's not flawed (it does compute the correct result).
       | 
       | The author just thinks a completely unreadable (but supposedly
       | faster) variant using logarithms is "better" than the simple loop
       | used in the original snippet?
       | 
       | Write your code for junior devs in their first week at your
       | company, not for academic journals.
        
         | hardwaregeek wrote:
         | I think you might have misread the post. His logarithm code
         | became the most used snippet and had the bug.
        
           | [deleted]
        
           | [deleted]
        
         | [deleted]
        
         | ascar wrote:
         | His code snippet had rounding errors on the boundaries towards
         | the next unit.
         | 
         | However he notes:
         | 
         | > FWIW, all 22 answers posted, including the ones using Apache
         | Commons and Android libraries, had this bug (or a variation of
         | it) at the time of writing this article.
        
         | phist_mcgee wrote:
         | You should almost _always_ focus on code readability and
         | simplicity over inventiveness and cleverness.
         | 
         | Very few people I have encountered have complained about code
         | being 'too simple' or 'too readable', but the opposite happens
         | on a near daily/weekly basis.
         | 
         | Write comments, use a for loop, avoid global state, keep your
         | nesting limited to 2-3 levels, be kind to your junior devs.
        
       | jka wrote:
       | There might be an opportunity somewhere around this area to
       | combine the versioning, continuous improvement, and dependency
       | management of package repositories with the Q&A format of
       | StackOverflow.
       | 
       | Something like "cherry pick this answer, with attribution, and
       | notifications when flaws and/or improvements are found".
       | 
       | Maybe that's a terrible idea (there's definitely risk involved,
       | and the potential to spread and create bad software), but equally
       | I don't know why it would be significantly worse than
       | unattributed code snippets and trends towards single-function
       | libraries.
        
         | fennecfoxen wrote:
         | NodeJS did something a lot like this by having packages that
         | are just short snippets, but half the ecosystem flipped out
         | when someone messed up `leftpad`.
        
         | [deleted]
        
         | DylanSp wrote:
         | Not sure if it's quite what you had in mind, but SO is starting
         | to address the issue of updating old answers with the Outdated
         | Answers Project:
         | https://meta.stackoverflow.com/questions/405302/introducing-...
        
       | pkaye wrote:
       | Now the new code is unreadable.
        
         | ape4 wrote:
         | It's as easy as "KMGTPE"
        
       | penteract wrote:
       | This is a bit of a tangent, but while it may be conventional to
       | round to the value with the smallest difference, is that
       | convention good? In a case such as this where it's fine for the
       | precision to vary with magnitude, then I'd argue it makes sense
       | to round to the value with the smallest ratio.
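
The idea of rounding by ratio rather than by difference can be made concrete. A minimal Python sketch under one reading of the comment: pick whichever neighbouring integer has the smaller ratio to the input. `round_by_ratio` is a hypothetical name, and the sketch is restricted to inputs of at least 1 so that no ratio involves zero:

```python
import math


def round_by_ratio(x):
    """Round x >= 1 to the neighbouring integer with the smaller ratio to x."""
    if x < 1:
        raise ValueError("this sketch only handles x >= 1")
    lower = math.floor(x)
    upper = lower + 1
    # x / lower <= upper / x  is equivalent to  x * x <= lower * upper,
    # which avoids any division.
    return lower if x * x <= lower * upper else upper
```

Under this rule the cut-over point between 1 and 2 is sqrt(2), roughly 1.414, rather than 1.5, so 1.45 goes up to 2 while conventional rounding would take it down to 1.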
        
       | bla3 wrote:
       | > At the very least, the loop based code could be cleaned up
       | significantly.
       | 
       | Seems like the loop based code wasn't so bad after all...
        
         | meetups323 wrote:
         | Loop code has the same bug.
        
           | bla3 wrote:
           | This is Java, not JavaScript. The exponents table was likely
           | of integer type. Then it works.
        
         | spkm wrote:
         | This! If I had to choose between the two snippets I would have
         | taken the loop based one without a second thought, because of
         | its simplicity. The second snippet is what usually happens when
         | people try to write "clever" code.
        
           | dataflow wrote:
           | The loop by itself isn't entirely clear on what it's doing.
           | Stuff like the direction of the > comparison and what to do
           | vs. >= and the byteCount / magnitudes[i] at the end really do
           | require you to pause & do mental analysis to check
           | correctness. I think the real solution here is to define an
           | integer log (ilog()?) function based on division and use that
           | in the same manner as the log(). That way you only do the
           | analysis the first time you write that function, and after
           | that you just call the function knowing that it's correct.
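
The integer log suggested here could look roughly like the following. This is a minimal Python sketch, not code from the article; the name `ilog` comes from the comment, while the signature and the error handling are assumptions:

```python
def ilog(n, base):
    """Integer logarithm by repeated division: largest e with base**e <= n."""
    if n < 1 or base < 2:
        raise ValueError("expects n >= 1 and base >= 2")
    exponent = 0
    # Each floor division by base strips one power of base from n.
    while n >= base:
        n //= base
        exponent += 1
    return exponent
```

Because only integer division is involved, the boundary cases (for example 999 versus 1000 in base 10) need to be reasoned about once, inside this function, and callers never touch floating point.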
        
         | twobitshifter wrote:
         | Premature optimization strikes again.
        
       | amelius wrote:
       | Wouldn't it be cool if you could call stack overflow answers
       | directly from your code?
        
       | hardwaregeek wrote:
       | Floating point is really really hard to get right, especially if
       | you want the numbers to be stable. Which begs the question, why
       | the heck does JavaScript, the most used language in the world,
       | not have an integer type? Sure, there's BigInt but that's quite
       | clunky to use. I know it's virtually impossible to add by now,
       | but I'd love an integer type for all my bit twiddling, byte
       | munching needs.
        
         | ascar wrote:
         | I just feel if you have bit twiddling, byte munching needs
         | JavaScript shouldn't be the language of choice. Doing that is a
         | rather rare edge case and if you're doing it for performance
         | reasons, working in JavaScript is the much bigger performance
         | problem.
        
       | colejohnson66 wrote:
       | What's wrong with a simple loop (like the one near the top)? Why
       | does it _have_ to be branchless? Wouldn't the IO take longer than
       | missed branches/pipeline flushes?
       | 
       | Not to mention that the fixed version now has branches as well...
        
         | MauranKilom wrote:
         | The irony is that a single log computation is going to take
         | longer than the loop. (No idea if implementing a log
         | approximation involves loops either.)
        
           | [deleted]
        
           | bottled_poe wrote:
           | Sounds like a textbook example of when theory is misaligned
           | with reality.
        
         | xxpor wrote:
         | The original version had branches too; in fact, a majority of
         | the lines had them! The ternary `?` is just shorthand for if.
        
           | enedil wrote:
           | This isn't true, this form of conditionals can be compiled
           | into cmov type of instructions, which is faster than regular
           | jump if condition.
        
             | dataflow wrote:
             | > This isn't true, this form of conditionals can be
             | compiled into cmov type of instructions, which is faster
             | than regular jump if condition.
             | 
             | IIRC cmov is actually quite slow. It's just faster than an
             | unpredictable branch. Most branches have predictability so
             | you generally don't want a cmov.
        
             | ncann wrote:
             | If the if/else is simple the compiler should be able to
             | optimize that anyway.
        
       | kmote00 wrote:
       | Update title: this is from 2019
        
       | mjevans wrote:
       | The author's lookup table is incorrect.
       | 
       | The question being answered clearly wanted base2 engineering
       | prefix units, rather than the standard base10 engineering prefix
       | units.
       | 
       | suffixes = [ "EB", "PB", "TB", "GB", "MB", "KB", "B" ]
       | 
       | magnitudes = [ 2^60, 2^50, 2^40, 2^30, 2^20, 2^10, 2^0 ] //
       | Pseudocode, also 64 bit integers required. (Compilers might
       | assume unsigned 32 for int)
        
         | returningfory2 wrote:
         | That code snippet is explicitly introduced in the article as
         | _not_ the author's.
        
         | asdf3243245q wrote:
         | That is not the author's code. That is pseudocode for one of
         | the example answers that he is improving on.
         | 
         | The author's code gives an option for the units:
         | 
         | int unit = si ? 1000 : 1024;
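
To tie the table-based loop and the `si` switch together, here is a minimal Python sketch. It is not the article's code: `format_bytes` and `SUFFIXES` are hypothetical names, the suffix labels are the ones from the table quoted in this sub-thread, and the `si` flag mirrors the quoted line. Python integers are arbitrary precision, so the 64-bit concern raised above does not arise here. All arithmetic is done on integers, and the value is re-checked after rounding so that it never prints as a full unit of the smaller suffix:

```python
SUFFIXES = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def format_bytes(byte_count, si=False):
    """Format a non-negative byte count with one decimal (hypothetical helper)."""
    if byte_count < 0:
        raise ValueError("expects a non-negative byte count")
    unit = 1000 if si else 1024
    last = len(SUFFIXES) - 1

    # Find the largest magnitude that does not exceed the byte count.
    magnitude, index = 1, 0
    while index < last and byte_count >= magnitude * unit:
        magnitude *= unit
        index += 1
    if index == 0:
        return f"{byte_count} B"

    def tenths_of(mag):
        # Value in tenths of the unit, rounded half up, integers only.
        return (byte_count * 10 + mag // 2) // mag

    tenths = tenths_of(magnitude)
    # If rounding pushed the value up to a full next unit, move up a suffix.
    if tenths >= unit * 10 and index < last:
        magnitude *= unit
        index += 1
        tenths = tenths_of(magnitude)
    return f"{tenths // 10}.{tenths % 10} {SUFFIXES[index]}"
```

For instance, `format_bytes(999_999, si=True)` yields "1.0 MB" rather than "1000.0 KB", which is the kind of boundary case the thread is discussing.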
        
       ___________________________________________________________________
       (page generated 2021-06-16 23:00 UTC)