[HN Gopher] Latency Numbers Every Programmer Should Know (2012)
___________________________________________________________________
Latency Numbers Every Programmer Should Know (2012)
Author : albertzeyer
Score : 80 points
Date : 2020-10-03 19:49 UTC (3 hours ago)
(HTM) web link (gist.github.com)
(TXT) w3m dump (gist.github.com)
  | pvg wrote:
  | Previously:
  | https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
  | dang wrote:
  | The threads with comments:
  |
  | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
  | pvg wrote:
  | comments>2 might be a good default qualifier for the 'past'
  | link.
  | Waterluvian wrote:
  | Having a solid mental model for "how fast is fast" is, in my
  | opinion, critical to what makes an excellent engineer: knowing
  | when to care about performance up front.
  |
  | And not even big O notation of algorithms or IO latency. But just
  | a general feel for the performance of higher-level abstractions.
  | To look at a design that involves some input, data processing, a
  | transfer somewhere, rendering, presentation, or whatever, and to
  | instantly have an intuition on what parts to worry most about.
  | kevin_thibedeau wrote:
  | When I get onto an unfamiliar platform I do a test of
  | sequential search vs binary search to see where the crossover
  | point is on number of elements for small arrays. Then I know
  | roughly when not to bother with a better algorithm.
  | meisel wrote:
  | The % of programmers that _actually_ need to know any of these
  | numbers is tiny. I've done plenty of optimizations for
  | performance-critical systems and have never known any of these
  | numbers. Certainly, I know general principles like "disk is a lot
  | slower than memory" and "don't block the UI on a network call".
  | But knowing whether an L1 cache read is 0.5ns or 5ns has never
  | been necessary. There is so much "highly optimized" code out
  | there that you can optimize without those numbers.
  | I'm sure there are _some_ people that have to know that stuff,
  | maybe a couple of people on an AAA video game title, or computer
  | engineers at Intel. But it's the exception and not the norm.
  | gameswithgo wrote:
  | i work in food ordering and our volume is big enough now that
  | understanding cpu caches is definitely important for our
  | oft-hit endpoints
  |
  | there are also domains like realtime control systems, video
  | conferencing, image processing, devices where battery life is
  | at a premium, ai, video editing, decoding, encoding, stock
  | market trading.....
  | pvg wrote:
  | This is one of the most popular answers on SO. The effects of
  | these latencies can easily become visible in pretty vanilla
  | programming.
  |
  | https://stackoverflow.com/questions/11227809/why-is-processi...
  | chrisandchris wrote:
  | While I totally agree with the argument that everyone should
  | sometimes think a bit more about the impact of their code (or
  | the execution of it), the relationship between your mentioned
  | SO article and the gist is close to zero.
  |
  | There's no way to get from "memory latency" to
  | "branch prediction".
  | pvg wrote:
  | 'Branch mis-predict' is the second item in the Latency
  | Numbers thing.
  | jariel wrote:
  | I think having a ballpark intuition for disk/SSD reads and long
  | network packets is possibly useful to a lot of devs. Much
  | beyond that it's academic unless you really need to know.
  | H8crilA wrote:
  | It's also good to know the relative costs of memory systems.
  | My rule of thumb, based on market prices:
  |
  | RAM:SSD:Magnetic - 100:10:1
  |
  | For example, 10 terabytes of RAM costs as much as 1 petabyte
  | of spinning-disk magnetic storage.
  | pvg wrote:
  | I think this is inadvertently a good critique of this popular
  | item - it doesn't really tell you why you might want to have
  | some exposure to these numbers.
  |
  | For a lot of programming, the lowest of these latencies -
  | cache and memory access and branch mispredicts - are averaged
  | out and essentially unnoticeable or not worth caring too much
  | about. Which is to be expected, that being the design goal. But
  | it's not too rare, even in regular, high-level-language
  | programming, for them to become 'macroscopic', and that is a
  | useful, practical thing to be aware of, rather than some
  | academic curiosity.
  | dragontamer wrote:
  | I think you underestimate the domination of large numbers.
  |
  | Let's say you have code that is 50% main memory and 50% L1
  | cache. Let's say there are 1000 operations.
  |
  | You have (500 x 100ns) + (500 x 1ns), or 50500ns total time.
  |
  | Now let's say you optimize the code so that they are all L3
  | operations: 20ns x 1000 operations is 20000ns, or over
  | twice as fast.
  | pvg wrote:
  | I'm not sure I'm underestimating anything, since I'm not
  | estimating things, and we seem to be kind of saying the
  | same thing?
  |
  | Edit: to clarify a bit further - people read this list
  | and think '1 nanosecond, 5 nanoseconds, not important to
  | me, academic, etc'. My point is that's a misunderstanding,
  | but the list alone doesn't do a good job of disabusing
  | one of that misunderstanding.
  | barumi wrote:
  | > The % of programmers that actually need to know any of these
  | numbers is tiny.
  |
  | If you mean the whole range of values, I would agree.
  |
  | However, I once interviewed a developer for a front-end
  | position who was entirely oblivious to the cost of making an
  | HTTP request, firmly believing that only large downloads had
  | any measurable impact on performance.
  |
  | Even if you do not have to do back-of-the-napkin calculations
  | on cache latency, knowing the relative cost of each of these
  | operations does wonders for your decision process.
  | ncmncm wrote:
  | You may not feel you need to know them, but I use them every
  | day, and I will not hire anybody who doesn't know them.
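[Editor's note: dragontamer's back-of-the-envelope arithmetic above can be checked in a few lines. A minimal sketch; the 1ns/20ns/100ns latency figures are the rough numbers quoted in the thread, not measurements.]

```python
# Check of the cache arithmetic in the subthread above. The latency
# figures (L1 = 1ns, L3 = 20ns, main memory = 100ns) are the rough
# numbers quoted in the thread, not measured values.
OPS = 1000
L1_NS, L3_NS, MEM_NS = 1, 20, 100

# Before: half the operations hit main memory, half hit L1.
before_ns = (OPS // 2) * MEM_NS + (OPS // 2) * L1_NS

# After: every operation is served from L3.
after_ns = OPS * L3_NS

print(before_ns, after_ns, round(before_ns / after_ns, 2))
# 50500 20000 2.52  -- "over twice as fast", as the comment says
```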
  | wombatmobile wrote:
  | Consumer latency caused by cookie syncs
  |
  | https://s6.io/consumer-latency-caused-by-cookie-syncs/
  |
  | Latency in Digital Advertising: A Guide for Publishers
  |
  | https://blog.ad-juster.com/latency-in-digital-advertising-a-...
  |
  | Case Study: Ads Increase Page Load Times by 40 Percent
  |
  | https://rigor.com/blog/ads-increase-latency-by-40-percent/
  | Sirupsen wrote:
  | More numbers with accompanying code:
  | https://github.com/sirupsen/napkin-math
  | formalsystem wrote:
  | NoBugsHare has an amazing chart
  | https://twitter.com/NoBugsHare/status/1022129373292445696?s=...
  | quietbritishjim wrote:
  | Mutex lock/unlock 25ns
  | Main memory reference 100ns
  |
  | Honest question: how can this be right? Surely locking a mutex
  | requires synchronising across the cores of a CPU, which requires
  | at least as much time - probably quite a bit more - than an
  | uncached access to memory?
  | GeneralMayhem wrote:
  | Funnily enough, there is another version of the slides these
  | numbers come from in which mutex lock/unlock is listed at the
  | same 100ns as memory access:
  | http://static.googleusercontent.com/media/research.google.co...
  | I can't tell which came first, or if one is a typo, or what's
  | going on there.
  |
  | In any case, assuming the mutex is in cache and appears not to
  | be currently locked, the core performing the atomic operation
  | needs to: (1) broadcast an invalidate/exclusive hold on the
  | cache line to the other cores, (2) update its own cache, (3)
  | eventually write back to memory. (1) is faster than a main
  | memory access, since CPU cores are physically closer to each
  | other than they are to the memory bus, and have hard-wired
  | access specifically for this operation. The slowest part is
  | (3), but with a write-back cache the writer doesn't really pay
  | the latency cost, because it gets amortized into the next cache
  | flush.
  | quietbritishjim wrote:
  | Interesting, thanks.
  | The idea of it just hitting the shared
  | cache had occurred to me, but somehow locking a mutex seemed
  | like a complex enough operation that it would surely take
  | more time overall than a full memory access. I'm glad to be
  | corrected about it.
  | dragontamer wrote:
  | Mutex lock/unlock is commonly over L3 cache on modern systems.
  | It won't hit main memory.
  | chrisseaton wrote:
  | > Surely locking a mutex requires synchronising across the
  | cores of a CPU
  |
  | Right, it's done over cache coherence, which doesn't need to
  | reach all the way out to main memory. And that's only if it's
  | contested.
  | dragontamer wrote:
  | A contested mutex lock/unlock is over L3 cache.
  |
  | An uncontested mutex lock/unlock is just a swap instruction
  | followed (or, in "unlock", preceded) by a memory barrier. The
  | flush pushes data to L3 cache, where it can be shared between
  | multiple cores. (L1 and L2 cache are local to a core.)
  |
  | L3 cache is far closer than main memory, and roughly 20ns
  | these days.
  |
  | -------
  |
  | In "actuality", it's more of a MESI message, but in the
  | abstract it's your L3 caches communicating with each other and
  | synchronizing.
  | chrisseaton wrote:
  | Not sure if you're agreeing with me or contradicting me?
  | Isn't that what I said?
  | lbacaj wrote:
  | This was just posted and already there are two types of
  | comments:
  |
  | 1. Most devs don't need this, it's not so helpful to know, etc.
  |
  | 2. These are critical numbers to know, and at the very least
  | devs should know these numbers.
  |
  | This sort of disagreement is common in our industry; it's not
  | just this, it's also Big O, algorithms and data structures, and
  | even OS fundamentals people disagree on.
  |
  | I'd love to take both groups of devs commenting and give each
  | group a set of programming tasks to complete. Judge them based
  | on correctness, speed of development, speed the tasks run, the
  | quality of the code, etc.
  |
  | I think the results would be profound.
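[Editor's note: the uncontended acquire/release cost debated in the mutex subthread above is easy to ballpark. A minimal sketch using Python's threading.Lock; interpreter overhead dominates, so the absolute number will be far above the ~25ns figure from the list. The point is only the measurement approach: an uncontended pair never blocks or touches main memory.]

```python
# Ballpark the cost of an uncontended lock/unlock pair, as discussed
# in the mutex subthread above. Python adds interpreter overhead, so
# this measures far more than the raw ~25ns figure; it illustrates
# the methodology, not the hardware cost.
import threading
import timeit

lock = threading.Lock()

def lock_unlock():
    # Uncontended: no other thread holds the lock, so this never blocks.
    lock.acquire()
    lock.release()

n = 100_000
total_s = timeit.timeit(lock_unlock, number=n)
print(f"uncontended lock/unlock: {total_s / n * 1e9:.0f} ns per pair")
```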
  | bJGVygG7MQVF8c wrote:
  | > set of programming tasks
  |
  | You set off in the right direction but unfortunately haven't
  | engaged with the heart of the matter at all.
  |
  | That there is no easily measurable "set of programming tasks"
  | in common between even highly proficient developers of
  | different types is the point. The core comms failure here is
  | that "software engineer" is too broad a term. We're actually
  | discussing superficially similar but essentially different
  | professions that aren't yet acknowledged as such.
  | [deleted]
  | stefan_ wrote:
  | Loading a webpage that is considered "lightning fast" - 1 second
  |
  | Of course the common opinion expressed here is that people
  | couldn't care less about those numbers - it's hard to get them
  | to care that the website they are working on can't finish
  | rendering in the time it takes anyone to pour a cup of coffee.
  | pieterr wrote:
  | Reminds me of Dan Luu's various latency articles.
  |
  | https://danluu.com/input-lag/
  | Remnant44 wrote:
  | This gets posted here frequently - I submitted it a year ago -
  | but imo that's not a problem, because it's insightful and
  | useful. It should have a (2012) tag, though.
  |
  | I also ran across this version that updates the numbers to 2020
  | values:
  |
  | https://colin-scott.github.io/personal_website/research/inte...
  |
  | edit: it appears they're just estimating for a given time
  | above, not measuring...
___________________________________________________________________
(page generated 2020-10-03 23:00 UTC)