[HN Gopher] Latency Numbers Every Programmer Should Know (2012)
       ___________________________________________________________________
        
       Latency Numbers Every Programmer Should Know (2012)
        
       Author : albertzeyer
       Score  : 80 points
       Date   : 2020-10-03 19:49 UTC (3 hours ago)
        
 (HTM) web link (gist.github.com)
 (TXT) w3m dump (gist.github.com)
        
       | pvg wrote:
       | Previously:
       | https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
        
         | dang wrote:
         | The threads with comments:
         | 
         | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
        
           | pvg wrote:
           | comments>2 might be a good default qualifier for the 'past'
           | link.
        
       | Waterluvian wrote:
       | Having a solid mental model for "how fast is fast" is, in my
       | opinion, critical to what makes an excellent engineer: knowing
       | when to care about performance up front.
       | 
       | And not even big O notation of algorithms or IO latency. But just
       | a general feel for the performance of higher level abstractions.
       | To look at a design that involves some input, data processing, a
       | transfer somewhere, rendering, presentation, or whatever, and to
       | instantly have an intuition on what parts to worry most about.
        
         | kevin_thibedeau wrote:
         | When I get onto an unfamiliar platform I do a test of
         | sequential search vs binary search to see where the crossover
         | point is on number of elements for small arrays. Then I know
         | roughly when not to bother with a better algorithm.
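          | That crossover test can be sketched as a quick benchmark;
          | this is a hypothetical Python version (the helper names are
          | illustrative, not from the thread):

```python
import bisect
import random
import timeit

def linear_search(arr, x):
    # Sequential scan: O(n), but branch-predictable and cache-friendly.
    for i, v in enumerate(arr):
        if v == x:
            return i
    return -1

def crossover_ratios(sizes=(4, 8, 16, 32, 64, 128)):
    """For each array size, return the linear/binary time ratio.

    Ratios below 1.0 mean the sequential scan is still winning;
    the crossover point is where the ratio passes 1.0.
    """
    ratios = {}
    for n in sizes:
        arr = sorted(random.sample(range(10 * n), n))
        keys = [random.choice(arr) for _ in range(64)]
        lin = timeit.timeit(
            lambda: [linear_search(arr, k) for k in keys], number=200)
        bis = timeit.timeit(
            lambda: [bisect.bisect_left(arr, k) for k in keys], number=200)
        ratios[n] = lin / bis
    return ratios
```

          | On a typical machine the scan tends to win for very small n;
          | where exactly it stops winning is what the test measures.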
        
       | meisel wrote:
        | The % of programmers that _actually_ need to know any of these
        | numbers is tiny. I've done plenty of optimizations for
       | performance-critical systems and have never known any of these
       | numbers. Certainly, I know general principles like "disk is a lot
       | slower than memory" and "don't block the UI on a network call".
       | But knowing whether an L1 cache read is 0.5ns or 5ns has never
        | been necessary. You can write so much of the "highly optimized"
        | code out there without those numbers. I'm sure there are _some_
        | people that have to know that stuff, maybe a couple of people
        | on an AAA video game title, or computer engineers at Intel. But
        | it's the exception and not the norm.
        
         | gameswithgo wrote:
          | I work in food ordering, and our volume is big enough now
          | that understanding CPU caches is definitely important for our
          | oft-hit endpoints.
          | 
          | There are also domains like realtime control systems, video
          | conferencing, image processing, devices where battery life is
          | at a premium, AI, video editing, decoding, encoding, stock
          | market trading...
        
         | pvg wrote:
         | This is one of the most popular answers on SO. The effects of
         | these latencies can easily become visible in pretty vanilla
         | programming.
         | 
         | https://stackoverflow.com/questions/11227809/why-is-processi...
        
           | chrisandchris wrote:
            | While I totally agree with the argument that everyone
            | should sometimes think a bit more about the impact of their
            | code (or the execution of it), the relationship between the
            | SO article you mention and the gist is close to zero.
            | 
            | There's no way to conclude from "memory latency" to "branch
            | prediction".
        
             | pvg wrote:
             | 'Branch mis-predict' is the second item in the Latency
             | Numbers thing.
        
         | jariel wrote:
            | I think having a ballpark intuition for disk/SSD reads and
            | long-haul network round trips is possibly useful to a lot
            | of devs. Much beyond that it's academic unless you really
            | need to know.
        
           | H8crilA wrote:
            | It's also good to know the relative costs of memory
            | systems. My rule of thumb, based on market prices:
           | My rule of thumb, based on market prices:
           | 
           | RAM:SSD:Magnetic - 100:10:1
           | 
           | For example 10 terabytes of RAM costs as much as 1 petabyte
           | of spinning disk magnetic storage.
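            | That rule of thumb as code (the 100:10:1 ratio and helper
            | are illustrative, not market data):

```python
# Relative cost per unit of capacity, per the 100:10:1 rule of thumb.
COST_PER_TB = {"ram": 100, "ssd": 10, "magnetic": 1}

def equal_cost_capacity(tier_a, capacity_tb, tier_b):
    """Capacity of tier_b purchasable for the price of capacity_tb
    of tier_a, in the same units (TB in, TB out)."""
    return capacity_tb * COST_PER_TB[tier_a] / COST_PER_TB[tier_b]
```

            | equal_cost_capacity("ram", 10, "magnetic") gives 1000 TB,
            | i.e. the 1 petabyte of spinning disk from the example.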
        
           | pvg wrote:
           | I think this is inadvertently a good critique of this popular
           | item - it doesn't really tell you why you might want to have
           | some exposure to these numbers.
           | 
            | For a lot of programming, the lowest of these latencies -
            | cache and memory access, branch mispredicts - are averaged
            | out and essentially unnoticeable, or not worth caring too
            | much about. That is to be expected: it is the design goal.
           | it's not too rare, even in regular, high-level language
           | programming for them to become 'macroscopic' and that is a
           | useful, practical thing to be aware of, rather than some
           | academic curiosity.
        
             | dragontamer wrote:
             | I think you underestimate the domination of large numbers.
             | 
             | Let's say you have code that is 50% main memory and 50% L1
             | cache. Let's say there are 1000 operations.
             | 
             | You have (500 x 100ns) + (500 x 1ns) or 50500ns total time.
             | 
             | Now let's say you optimize the code so that they are all L3
             | operations: 20ns x 1000 operations is 20000ns, or over
             | twice as fast.
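              | That arithmetic, checked in a few lines (the latencies
              | are the gist's round numbers, not measurements):

```python
def total_latency_ns(op_mix):
    """Total time for a workload given (op_count, latency_ns) pairs."""
    return sum(count * ns for count, ns in op_mix)

# 50% main memory (100ns) + 50% L1 (1ns), 1000 ops total:
mixed = total_latency_ns([(500, 100), (500, 1)])   # 50500 ns
# Same 1000 ops, all served from L3 (20ns):
all_l3 = total_latency_ns([(1000, 20)])            # 20000 ns
speedup = mixed / all_l3                           # ~2.5x
```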
        
               | pvg wrote:
               | I'm not sure I'm underestimating anything since I'm not
               | estimating things and we seem to be kind of saying the
               | same thing?
               | 
               | Edit: to clarify a bit further - people read this list
               | and think '1 nanosecond, 5 nanoseconds, not important to
               | me, academic, etc'. My point is that's a misunderstanding
               | but the list alone doesn't do a good job of disabusing
               | one of the misunderstanding.
        
         | barumi wrote:
         | > The % of programmers that actually need to know any of these
         | numbers is tiny.
         | 
         | If you mean the whole range of values I would agree.
         | 
          | However, I once interviewed a developer for a front-end
          | position who was entirely oblivious to the cost of making an
          | HTTP request, firmly believing that only large downloads had
          | any measurable impact on performance.
         | 
         | Even if you do not have to do back-of-the-napkin calculations
         | on cache latency, knowing the relative cost of each of these
          | operations does wonders for your decision process.
        
         | ncmncm wrote:
         | You may not feel you need to know them, but I use them every
         | day, and I will not hire anybody who doesn't know them.
        
       | wombatmobile wrote:
       | Consumer latency caused by cookie syncs
       | 
       | https://s6.io/consumer-latency-caused-by-cookie-syncs/
       | 
       | Latency in Digital Advertising: A Guide for Publishers
       | 
       | https://blog.ad-juster.com/latency-in-digital-advertising-a-...
       | 
       | Case Study: Ads Increase Page Load Times by 40 Percent
       | 
       | https://rigor.com/blog/ads-increase-latency-by-40-percent/
        
       | Sirupsen wrote:
       | More numbers with accompanying code:
       | https://github.com/sirupsen/napkin-math
        
       | formalsystem wrote:
       | NoBugsHare has an amazing chart
       | https://twitter.com/NoBugsHare/status/1022129373292445696?s=...
        
       | quietbritishjim wrote:
        | Mutex lock/unlock       25ns
        | Main memory reference  100ns
        | 
        | Honest question: how can this be right? Surely locking a mutex
        | requires synchronising across the cores of a CPU, which requires
        | at least as much time - probably quite a bit more - than an
        | uncached access to memory?
        
         | GeneralMayhem wrote:
         | Funnily enough, there is another version of the slides these
         | numbers come from in which mutex lock/unlock is listed as the
          | same 100ns as memory access:
          | http://static.googleusercontent.com/media/research.google.co....
          | I can't tell which came first, or if one is a typo, or what's
          | going on there.
         | 
          | In any case, assuming the mutex is in cache and appears not
          | to be currently locked, the core performing the atomic operation
         | needs to: (1) broadcast an invalidate/exclusive hold on the
         | cache line to the other cores, (2) update its own cache, (3)
         | eventually write back to memory. (1) is faster than main memory
         | access, since CPU cores are closer to each other, physically,
         | than they are to the memory bus, and have hard-wired access
         | specifically for this operation. The slowest part is (3), but
         | with a write-back cache the writer doesn't really pay the
         | latency cost, because it gets amortized into the next cache
         | flush.
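          | One way to get a feel for the uncontended cost is to time it
          | directly; here is a rough Python sketch (the interpreter adds
          | overhead on top of the underlying atomic, so this bounds the
          | hardware cost from above rather than reproducing ~25ns):

```python
import threading
import timeit

def time_uncontended_lock_ns(iterations=100_000):
    """Average cost of one uncontended lock/unlock pair, in ns.

    threading.Lock wraps the platform's lock primitive, so the
    number includes Python call overhead: treat it as an upper
    bound, not a measurement of the raw atomic operation.
    """
    lock = threading.Lock()

    def acquire_release():
        lock.acquire()
        lock.release()

    total_s = timeit.timeit(acquire_release, number=iterations)
    return total_s / iterations * 1e9
```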
        
           | quietbritishjim wrote:
           | Interesting, thanks. The idea of it just hitting the shared
           | cache had occurred to me, but somehow locking a mutex seemed
           | like a complex enough operation that it would surely take
           | more time overall than a full memory access. I'm glad to be
           | corrected about it.
        
         | dragontamer wrote:
         | Mutex lock/unlock is commonly over L3 cache on modern systems.
         | It won't hit main-memory.
        
         | chrisseaton wrote:
          | > Surely locking a mutex requires synchronising across the cores
         | of a CPU
         | 
          | Right, it's done over cache coherence, which doesn't need to
         | reach all the way out to main memory. And that's only if it's
         | contested.
        
           | dragontamer wrote:
           | A contested mutex lock / unlock is over L3 cache.
           | 
            | Uncontested mutex lock/unlock is just a swap instruction
            | followed (or in "unlock": preceded) by a memory barrier. The
            | flush pushes data to L3 cache, where it can be shared between
            | multiple cores. (L1 and L2 caches are local to a core.)
           | 
           | L3 cache is far closer than main-memory, and roughly 20ns
           | these days.
           | 
           | -------
           | 
            | In "actuality", it's more of a MESI message, but in the
            | abstract it's your L3 caches communicating with each other and
           | synchronizing.
        
             | chrisseaton wrote:
             | Not sure if you're agreeing with me or contradicting me?
             | Isn't that what I said?
        
       | lbacaj wrote:
       | This was just posted and already there are two types of comments:
       | 
       | 1. Most devs don't need this, it's not so helpful to know etc.
       | 
       | 2. These are critical numbers to know and in the very least devs
       | should know these numbers.
       | 
        | This sort of disagreement is common in our industry; it's not
        | just this - it's also Big O, algorithms and data structures,
        | and even OS fundamentals that people disagree on.
       | 
       | I'd love to take both groups of devs commenting and give each
       | group a set of programming tasks to complete. Judge them based on
       | correctness, speed of development, speed the tasks run, the
       | quality of the code etc, etc.
       | 
       | I think the results would be profound.
        
         | bJGVygG7MQVF8c wrote:
         | > set of programming tasks
         | 
         | You set off in the right direction but unfortunately haven't
         | engaged with the heart of the matter at all.
         | 
         | That there is no easily measurable "set of programming tasks"
         | in common between even highly proficient developers of
         | different types is the point. The core comms failure here is
         | that "software engineer" is too broad a term. We're actually
         | discussing superficially similar but essentially different
         | professions that aren't yet acknowledged as such.
        
       | [deleted]
        
       | stefan_ wrote:
       | Loading a webpage that is considered "lightning fast" - 1 second
       | 
        | Of course the common opinion expressed here is that people
        | couldn't care less about those numbers - it's hard to get them
        | to care that the website they are working on can't finish
        | rendering in the time it takes anyone to pour a cup of coffee.
        
       | pieterr wrote:
       | Reminds me of Dan Luu's various latency articles.
       | 
       | https://danluu.com/input-lag/
        
       | Remnant44 wrote:
       | This gets posted here frequently - I submitted it a year ago -
       | but imo that's not a problem because it's insightful and useful.
       | Should have a (2012) tag though.
       | 
       | I also ran across this version that updates the numbers to 2020
       | values:
       | 
       | https://colin-scott.github.io/personal_website/research/inte...
       | 
       | edit: it appears they're just estimating for a given time above,
       | not measuring...
        
       ___________________________________________________________________
       (page generated 2020-10-03 23:00 UTC)