[HN Gopher] How Jeff Dean's "Latency Numbers Everybody Should Kn...
___________________________________________________________________
How Jeff Dean's "Latency Numbers Everybody Should Know" decreased
from 1990-2020

Author : isaacimagine
Score  : 186 points
Date   : 2022-03-03 21:06 UTC (1 hour ago)

(HTM) web link (colin-scott.github.io)
(TXT) w3m dump (colin-scott.github.io)

| dustingetz wrote:
| How can network be faster than memory?
| zamadatix wrote:
| The memory number is measuring access time while the network
| number is measuring average bandwidth. The two values can't be
| compared even though they are presented using the same unit.
| the-dude wrote:
| The slider is very bad UX : I missed it too at first. It is not
| pronounced enough, partly because it is all the way to the right.
|
| A former boss would say : _make it red_.
| wolpoli wrote:
| It's really hard to notice the grey slider when the content is
| already red, green, blue, and black.
| the-dude wrote:
| Blinking Magenta then.
| nhoughto wrote:
| Oh right
|
| I was trying to see where the comparison was, totally missed
| the slider, thanks!
| greggsy wrote:
| I sympathise that the site probably wasn't designed with mobile
| in mind, but it's impossible to go beyond 2015 without hitting
| the GitHub link.
| ygra wrote:
| You can also drag on the main view instead of the slider.
| lamontcg wrote:
| How are people practically taking advantage of the increase in
| speed of SSDs these days compared to network latencies? It looks
| like disk caches directly at the edge with hot data would be the
| fastest way of doing things.
|
| I'm more familiar with the 2001-2006 era where redis-like RAM
| caches for really hot data made a lot of sense, but with
| spinning-rust disk drives it made more sense to go over the
| network to a microservice that was effectively a big sharded RAM
| cache than to go to disk.
|
| Seems like you could push more hot data to the very edge these
| days and utilize SSDs like a very large RAM cache (and how does
| that interact with containers)?
|
| I guess the cost there might still be prohibitive if you have a
| lot of edge servers and consolidation would still be a big price
| win even if you take the latency hit across the network.
| gameswithgo wrote:
| I don't know, I have observed in my workloads, booting, game
| load, and building programs, that super fast ssds make almost
| no difference compared to cheap slow ssds. But any ssd is
| miraculous compared to a spinny drive.
|
| Presumably video editing or something might get more of a win
| but I don't know.
| noizejoy wrote:
| When I got my first NVMe SSD, I was disappointed that it
| wasn't significantly faster than my SATA SSD.
|
| But soon I realized that it was Samsung's Magician software
| that made the SATA SSD competitive with an NVMe SSD via RAM
| caching.
| rvr_ wrote:
| 20 years without meaningful improvements on memory access?
| gameswithgo wrote:
| yep, got any ideas?
| not2b wrote:
| It takes at least one clock cycle to do anything, and clock
| frequency stopped increasing in the 2003-2005 time frame,
| mainly because of the horrible effects on power with very small
| feature size.
| aidenn0 wrote:
| Good news is that SSDs are only 160x slower for random reads,
| so maybe we should just beef up L3 or L4 cache and get rid of
| ram? /s
| ohazi wrote:
| Diminishing returns over the last decade, as expected. It would
| be interesting to look at the energy consumed by each of these
| operations across the same time periods.
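A quick back-of-the-envelope on zamadatix's point above: a bandwidth figure
and a latency figure can be printed in the same unit and still describe
different things. A rough Python sketch; the numbers are illustrative
assumptions, not values taken from the chart.

    # Illustrative, assumed figures (not taken from the chart).
    mem_ref_ns = 100               # one random main-memory reference (pure latency)
    link_bytes_per_s = 10e9 / 8    # a 10 Gbit/s link expressed as bytes/second (bandwidth)
    dc_rtt_ns = 50_000             # assumed same-datacenter round trip

    # Moving one 64-byte cache line's worth of data across the link is quick...
    transfer_ns = 64 / link_bytes_per_s * 1e9
    print(f"64 B of transfer time on the link: {transfer_ns:.0f} ns")   # ~51 ns

    # ...but each individual request still pays the round-trip latency, which is
    # the number to hold up against a memory reference.
    print(f"network round trip: {dc_rtt_ns} ns vs memory reference: {mem_ref_ns} ns")

So a fat pipe can stream bulk data quickly while every individual access over
it remains orders of magnitude slower than RAM, which is why the two rows are
not directly comparable.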
| swolchok wrote:
| The source displayed at the bottom of this page clearly shows
| it's just extrapolating from numbers that are older than 2020.
| gregwebs wrote:
| According to this, all latencies improved dramatically except for
| SSD random read (disk seek only improved by 10x as well). Reading
| 1 million bytes sequentially from SSD improved 1000x and is now
| only 2-3x slower than a random read, and for disk, reading 1
| million bytes is faster than a seek. Conclusion: avoid random IO
| where performance matters.
|
| CPU and RAM latencies stopped improving in 2005 but storage and
| network kept improving.
| csours wrote:
| It looks like almost everything is blazing fast now. I'm not sure
| how long the first X takes though - how long does it take to
| establish a TCP/IP connection? How long does it take an actual
| program to start reading from disk?
| bob1029 wrote:
| Latency is everything.
|
| I believe that sometime around 2010 we peaked on the best
| software solution for high performance, low-latency processing of
| business items when working with the style of computer
| architecture we have today.
|
| https://lmax-exchange.github.io/disruptor/disruptor.html
|
| I have been building systems using this kind of technique for a
| few years now and I still fail to wrap my brain around just how
| fast you can get 1 thread to go if you are able to get out of its
| way. I caught myself trying to micro-optimize a data import
| method the other day and made myself do it the "stupid" way
| first. Turns out I was definitely wasting my time. Being able to
| process and put to disk _millions of things per second_ is some
| kind of superpower.
| _pastel wrote:
| These numbers focus on reads. How does writing speed to cache,
| main memory, or disk compare? Anyone have some ballparks to help
| me build intuition?
| bmitc wrote:
| Today I learned that I don't know any of these numbers that
| "every" programmer should know. Where do I turn in my programmer
| card, Jeff Dean?
| morelisp wrote:
| You could take this instead as an opportunity to learn them,
| instead of reveling in your ignorance.
| bmitc wrote:
| There is plenty I don't know. It's not me reveling in my
| ignorance.
|
| My point is that programming is an incredibly diverse field
| and yet people, even people who supposedly should know
| better, are obsessed with making global laws of programming.
| I know relative comparisons of speeds that have been useful
| in my day jobs, but I'd wager that needing to know the
| details of these numbers, how they've evolved, etc. is a
| relatively niche area.
|
| Regarding learning, I try to constantly learn. This is driven
| by two things: (1) need, such as one finds in their day job
| or to complete some side project; (2) interest. If something
| hits either need or interest or hopefully both, I learn it.
| zachberger wrote:
| I don't think it's important to know the absolute numbers but
| rather the relative values and rough orders of magnitude.
|
| I can't tell you how many times I've had to explain to
| developers why their network-attached storage has higher
| latency than their locally attached NVME SSD.
| morelisp wrote:
| The absolute numbers are also important. I can't tell you how
| many times I've had someone coming from a front-end world
| tell me 5ms for some trivial task (e.g. sorting a 1000ish
| element list) is "fast" just because it happened faster than
| their reaction time.
| kragen wrote:
| Does anyone have a plot of these on a log-linear scale? Where
| does the data come from?
|
| http://worrydream.com/MagicInk/
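gregwebs's conclusion above ("avoid random IO where performance matters") is
easy to sanity-check. A minimal sketch, assuming a large pre-existing file at
a made-up path; absolute numbers depend heavily on the drive and the page
cache, so only the ratio between the two runs is meaningful.

    import os, random, time

    PATH = "testfile.bin"        # hypothetical multi-GB file on the drive under test
    BLOCK, COUNT = 4096, 25_000  # ~100 MB touched in 4 KiB blocks either way

    def timed_reads(fd, offsets):
        t0 = time.perf_counter()
        for off in offsets:
            os.pread(fd, BLOCK, off)   # positional read straight from the given offset
        return time.perf_counter() - t0

    fd = os.open(PATH, os.O_RDONLY)
    size = os.path.getsize(PATH)
    sequential = [i * BLOCK for i in range(COUNT)]
    scattered = [random.randrange(size // BLOCK) * BLOCK for _ in range(COUNT)]

    print("sequential:", timed_reads(fd, sequential), "s")
    print("random:    ", timed_reads(fd, scattered), "s")
    os.close(fd)

On spinning rust the gap is enormous; on an SSD it is far smaller but usually
still visible, which is roughly what the chart's seek-versus-sequential
numbers say.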
| ChuckMcM wrote:
| Not an intuitive thing but the data is fascinating. A couple of
| notes for people who are confused by it:
|
| 1) The 'ns' next to the box is a graph legend not a data label
| (normally that would be in a box labeled legend to distinguish it
| from graph data)
|
| 2) The weird box and rectangle thing on the top is a slider, I
| didn't notice that until I was looking at the code and said "what
| slider?"
|
| 3) The _only_ changes from 2005 to present are storage and
| networking speeds.
|
| What item #3 tells you is that any performance gains in the last
| decade and a half you've experienced have been driven by
| multi-core, not faster processors. And _that_ means Amdahl's Law
| is more important than Moore's Law these days.
| DonHopkins wrote:
| At what point did Joy's Law -- "2^(year-1984) MIPS" -- break
| with reality?
|
| https://medium.com/@donhopkins/bill-joys-law-2-year-1984-mil...
|
| https://en.wikipedia.org/wiki/Joy%27s_law_(computing)
| pjc50 wrote:
| It also tells us that the speed of light has not increased.
|
| (well, speed of signal on a PCB track is roughly 2/3 light and
| determined by the transmission line geometry and the dielectric
| constant, but you all knew that)
| thfuran wrote:
| Which latency are you suggesting is limited by the speed of
| light?
| not2b wrote:
| It wasn't the speed of light, it was the size of atoms that
| was the issue here. As old-style scaling (the kind used up
| until about 2003) continued, leakage power was increasing
| rapidly because charge carriers (electrons / holes) would
| tunnel through gates (I'm simplifying a bit here, other bad
| effects were also a factor). It was no longer possible to
| keep increasing clock frequency while scaling down feature
| size. Further reduction without exploding the power
| requirement meant that the clock frequency had to be left
| the same and transistors needed to change shape.
| chubot wrote:
| _What item #3 tells you is that any performance gains in the
| last decade and a half you've experienced have been driven by
| multi-core, not faster processors. And that means Amdahl's Law
| is more important than Moore's Law these days._
|
| Uh, or storage and networking? Not sure why you would leave that
| out, since they're the bottleneck in many programs.
|
| The slowest things are the first things you should optimize.
| horsawlarway wrote:
| Yeah... SSDs are so much faster than spinning disk it's not
| even funny.
|
| I literally refuse to run a machine that boots its main OS
| from spinning disk anymore. The 60 bucks to throw an SSD into
| it is so incredibly cheap for what you get.
|
| My wife's work gave her a (fairly basic but still fine)
| thinkpad - except they left the main drive as a fucking
| 5400rpm hdd. Then acted like assclowns when we repeatedly
| showed them that the machine is stalling on disk IO, while
| the rest of the system is doing diddly squat waiting around.
| I finally got tired of it and we "accidentally" spilled water
| on it, and somehow just the hdd stopped working (I left out
| the part where I'd removed it from the laptop first...). Then
| I just had her expense a new SSD and she no longer hates her
| work laptop.
|
| Long story short - storage speeds are incredible compared to
| what they were when I went to school (when 10k rpm was
| considered exorbitant).
| capitainenemo wrote:
| The living room media/gaming machine at home is an 8 terabyte
| spinning rust. I didn't bother with a separate SSD boot
| partition.
|
| It's currently been running for 23 days. Booting takes ~15
| seconds even on spinning rust for a reasonable linux distro, so
| I'm not going to stress about those 15 seconds every couple of
| months.
|
|                total   used   free   shared  buff/cache  available
|     Mem:        31Gi  4.6Gi   21Gi    158Mi       5.1Gi       25Gi
|     Swap:       37Gi  617Mi   36Gi
|
| 5.1 gigabytes mostly just file cache. As a result, everything
| opens essentially instantly. For a bit better experience, I did
| a:
|
|     find ~/minecraft/world -type f -exec cat {} > /dev/null \;
|
| to forcibly cache that, but that was all I did.
| horsawlarway wrote:
| Hah, if you can fit the whole OS plus running applications
| easily in RAM, and you don't boot often - fine. But you're
| basically doing the same thing but with extra steps :P
| capitainenemo wrote:
| Well, RAM is significantly faster than even SSD, and now I
| don't have to muck about w/ a 2nd drive :)
|
| Not to mention the spinning rust is cheaper.
| dekhn wrote:
| (your hard drive story is the story of my life, up to about
| 10 years ago. I have eliminated all but one hard drive from
| my house and that one doesn't spin most of the time)
|
| Lately my vendor discussions have centered around how much
| work you can get done with a machine that has half a
| gigabyte of RAM, 96 cores, and 8 NVME SSDs (it's a lot). My
| college box: 40MB disk, 4MB RAM, one 66MHz CPU.
| ianai wrote:
| "And that means Amdahl's Law is more important than Moore's Law
| these days."
|
| idk, sure seems like we could have 1-2 cores (permanently
| pegged?) at 5 ghz for UI/UX then ($money / $costPerCores)
| number of cores for showing off/"performance" by now. But the
| OEMs haven't gone that way.
| ChuckMcM wrote:
| We probably see things differently. As I understand it, this
| is exactly the use case for "big/little" microarchitectures.
| Take a number of big fast cores that are running full bore,
| and a bunch of little cores that can do things for them when
| they get tasked. So far they've been symmetric but with
| chiplets they needn't be.
| ianai wrote:
| Yes, for 'computational' loads. I've read, though, that UI/UX
| benefits the most from the fastest response times. I'm talking
| about the cores which actually draw the GUI the user sees/uses
| being optimized for the task at the highest possible rate. Then
| have a pool of cores for the rest of it.
| moonchild wrote:
| UI should be drawn on the GPU. Absent rendering, slow cores are
| more than sufficient to do layout/etc. interactively.
| ChuckMcM wrote:
| You are talking about the GPU? Okay, really random tidbit here:
| when I worked at Intel I was a validation engineer for the 82786
| (which most people haven't heard of), a graphics chip that
| focused on building responsive, windowed user interfaces by
| using hardware features to display separate windows (so moving
| windows moved no actual memory, just updated a couple of
| registers), to draw the mouse, and to handle character/font
| processing for faster updates. Intel killed it but if you find
| an old "Number9 video card" you might find one to play with. It
| had an embedded RISC engine that did bitblit and other UI type
| things on chip.
|
| EVERYTHING that chip did could in fact be done with a GPU today.
| It isn't, for the most part, because window systems evolved to
| be CPU driven, although a lot of phones these days do the UI in
| the GPU, not the CPU, for this same reason. There is a fun
| program for HW engineers called "glscopeclient" which basically
| renders its UI via the GPU.
|
| So I'm wondering if I misread what you said and you are
| advocating for a different GPU microarchitecture, or perhaps a
| more general integrated architecture on the chip that could
| also do UI, like APUs?
| bee_rider wrote:
| I would rather reserve the thermal headroom for actual
| computations, rather than having those cores pegged at 5GHz.
| stuartmscott wrote:
| > And that means Amdahl's Law is more important than Moore's
| Law these days.
|
| 100%, we can no longer rely on faster processors to make our
| code faster, and must instead write code that can take
| advantage of the hardware's parallelism.
|
| For those interested in learning more about Why Amdahl's Law is
| Important, my friend wrote an interesting article on this very
| topic - https://convey.earth/conversation?id=41
| gameswithgo wrote:
| There is some improvement from processors being faster, as more
| instructions are done at once and more instructions get down
| towards that 1ns latency that L1 caches provide. You see it
| happen in real life but the gains are small.
| [deleted]
| muh_gradle wrote:
| I would never have realized the slider functionality until I
| read this comment.
| raisedbyninjas wrote:
| I noticed the year was an editable field but didn't change
| the data before I noticed the slider.
| jll29 wrote:
| The time to open a Web browser seems roughly constant since 1993.
| bb123 wrote:
| This site is completely unusable on mobile.
| [deleted]
| dweez wrote:
| Okay, since we're not going to improve the speed of light any
| time soon, here's my idea for speeding up the CA to NL
| roundtrip: let's straight shot a cable through the center of
| the earth.
| DonHopkins wrote:
| We could really use some decent Mexican food here in the
| Netherlands.
|
| https://idlewords.com/2007/04/the_alameda_weehawken_burrito_...
| almog wrote:
| What's more, we can run that cable along a gravity train from
| CA to NL, saving the costs of digging another tunnel. :)
| Archelaos wrote:
| From CA you will end up off the coast of Madagascar, and from
| the NL somewhere near New Zealand. You do not have to go very
| deep inside the earth to get straight from CA to NL.
| banana_giraffe wrote:
| Assuming my math is right, it'd be a 10% faster trip, but I'd
| be all for seeing that tunnel!
| jeffbee wrote:
| I doubt that same-facility RTT has been fixed at 500us for 30
| years. In EC2 us-east-1 I see < 100us same-availability-zone RTT
| on TCP sockets, and those have a lot of very unoptimized software
| in the loop.
|
|     function getDCRTT() {
|       // Assume this doesn't change much?
|       return 500000; // ns
|     }
| genewitch wrote:
| I show 180-350us between various machines on my network, all of
| which have some fiber between them. Devices with only a switch
| and copper between them somehow perform worse, but this is
| anecdotal because i'm not running something like smokeping!
|
| Oh, additionally, between VMs i'm getting 180us, so that looks
| to be my lower bound, for whatever reason. My main switches are
| very old, so maybe that's why.
| jeffbee wrote:
| Are you measuring that with something like ICMP ping? I think
| the way to gauge the actual network speed is to look at the
| all-time minimum RTT on a long-established TCP socket. The
| Linux kernel maintains this stat for normal TCP connections.
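For anyone who wants to reproduce the kind of measurement jeffbee describes,
here is a minimal sketch: time many one-byte request/response exchanges on a
single established TCP connection and keep the minimum, which filters out
most scheduling noise. The host and port are placeholders for a peer on which
you run an echo service; the kernel's own per-connection RTT estimate can
also be inspected with "ss -ti" on reasonably recent Linux systems.

    import socket, time

    HOST, PORT, ROUNDS = "10.0.0.2", 7, 1000   # placeholder peer running an echo service

    s = socket.create_connection((HOST, PORT))
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # send each 1-byte probe immediately

    best = float("inf")
    for _ in range(ROUNDS):
        t0 = time.perf_counter()
        s.sendall(b"x")
        s.recv(1)
        best = min(best, time.perf_counter() - t0)
    s.close()

    print(f"minimum RTT over {ROUNDS} exchanges: {best * 1e6:.1f} us")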
| gameswithgo wrote:
| An instructive thing here is that a lot of stuff has not improved
| since ~2004 or so, and working around those things that have not
| improved (memory latency from RAM all the way down to L1 cache,
| really) requires fine control of memory layout and minimizing
| cache pollution, which is difficult to do with all of our popular
| garbage-collected languages, even harder with languages that
| don't offer memory layout controls, and JITs and interpreters add
| further difficulty.
|
| To get the most out of modern hardware you need to:
|
| * minimize memory usage/hopping to fully leverage the CPU caches
|
| * control data layout in memory to leverage the good throughput
| you can get when you access data sequentially
|
| * be able to fully utilize multiple cores without too much
| overhead and with minimal risk of error
|
| For programs to run faster on new hardware, you need to be able
| to do at least some of those things.
| greggsy wrote:
| It's interesting that L2 cache has basically been steady at
| 2MB/core since 2004 as well. It hasn't changed speed in that
| time, but is still an order of magnitude faster than memory
| across that whole timeframe. Does this suggest that the memory
| speed bottleneck means that there simply hasn't been a need to
| increase availability of that faster cache?
| gameswithgo wrote:
| the bigger the cache, the longer it takes to address it, and
| kinda fundamental physics prevents it being faster
| formerly_proven wrote:
| Some of these numbers are clearly wrong. Some of the old
| latency numbers seem somewhat optimistic (e.g. 100 ns main
| memory ref in 1999), some of the newer ones are pessimistic
| (e.g. 100 ns main memory ref in 2020). The bandwidth for
| disks is clearly wrong, as it claims ~1.2 GB/s for a hard
| drive in 2020. The seek time is also wrong. It crossed 10 ms
| in 2000 and has reduced to 5 ms in 2010 and is 2 ms for 2020.
| Seems like linear interpolation to me. It's also unclear what
| the SSD data is supposed to mean before ~2008 as they were
| not really a commercial product before then. Also, for 2020
| the SSD transfer rate is given as over 20 GB/s. Main memory
| bandwidth is given as 300+ GB/s.
|
| Cache performance has increased massively. Especially
| bandwidth, not reflected in a latency chart. Bandwidth and
| latency are of course related; just transferring a cache line
| over a PC66 memory bus takes a lot longer than 100 ns. The
| same transfer on DDR5 takes a nanosecond or so, which leaves
| almost all of the latency budget for existential latency.
|
| edit: https://github.com/colin-scott/interactive_latencies/blob/ma...
|
| The data on this page is simply extrapolated using formulas
| and guesses.
| throwawaylinux wrote:
| Bigger caches could help, but as a rule of thumb cache hit
| rate increases approximately with the square root of cache
| size, so it diminishes. Then the bigger you make a cache, the
| slower it tends to be, so at some point you could make your
| system slower by making your cache bigger and slower.
| TillE wrote:
| It's pretty remarkable that, for efficient data processing,
| it's super super important to care about memory layout / cache
| locality in intimate detail, and this will probably be true
| until something fundamental changes about our computing model.
|
| Yet somehow this is fairly obscure knowledge unless you're into
| serious game programming or a similar field.
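A small sketch of the locality point gameswithgo and TillE are making: the
same reduction over the same data, once walking memory in order and once
through a shuffled index. The array size is an arbitrary assumption chosen to
be much larger than any cache; absolute times will differ per machine, the
gap is the point.

    import time
    import numpy as np

    N = 20_000_000                        # ~160 MB of float64, far bigger than L3 cache
    data = np.random.rand(N)
    in_order = np.arange(N)               # walk memory sequentially
    shuffled = np.random.permutation(N)   # same indices, random order

    for name, idx in [("sequential", in_order), ("random", shuffled)]:
        t0 = time.perf_counter()
        total = data[idx].sum()           # gather through the index, then reduce
        print(f"{name}: {time.perf_counter() - t0:.3f} s (sum={total:.1f})")

Same instruction count, same arithmetic, very different cache hit rates.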
| efuquen wrote:
| > Yet somehow this is fairly obscure knowledge unless you're
| into serious game programming or a similar field.
|
| Because the impact of optimizing for the hardware like that is
| not so important in many applications. Getting the absolute
| most out of your hardware is very clearly important in game
| programming, but web apps where the scale being served is not
| huge (the vast majority)? Not so much. And in this context
| developer time is more valuable when you can throw hardware
| at the problem for less.
|
| In traditional game programming you had to run on the hardware
| people used to play; you are constrained by the client's
| abilities. Cloud gaming might(?) be changing some of that,
| but GPUs are super expensive too compared to the rest of the
| computing hardware. Even in that case, with the amounts of data
| you are pushing you need to be efficient within the context of
| the GPU; my feeling is it's not easily horizontally scaled.
| Gigachad wrote:
| TBH I don't think cloud gaming is a long term solution. It
| might be a medium term solution for people with cheap
| laptops, but eventually the chip in cheap laptops will be
| able to produce photo realistic graphics and there will be
| no point going any further than that.
| sigstoat wrote:
| the code appears to just do a smooth extrapolation from some past
| value.
|
| it claims that (magnetic) disk seek is 2ms these days. since when
| did we get sub-4ms average seek time drives?
|
| it also seems to think we're reading 1.115GiB/s off of drives
| now. transfer rate on even the largest drives hasn't exceeded
| 300MiB/s or so, last i looked.
|
| ("but sigstoat, nvme drives totally are that fast or faster!"
| yes, and i assume those fall under "SSD" on the page, not
| "drive".)
| CSSer wrote:
| Is a commodity network a local network?
| kens wrote:
| Amazing performance improvements, except no improvement at all on
| the packet roundtrip time to the Netherlands. Someone should
| really work on that.
| [deleted]
| skunkworker wrote:
| Maybe if hollow core fiber is deployed we could see a 50%
| reduction in latency (from .66c to .99c).
|
| Past that, physics takes over, and unfortunately the speed of
| light is pretty slow.
| aeyes wrote:
| Could LEO satellite networks like Starlink with inter-satellite
| links reduce the roundtrip time?
| genewitch wrote:
| The radius of the arc in low earth orbit (or whatever) is
| going to be larger than the arc across the Atlantic ocean.
|
| As no one has ever said: "I'll take glass over gas, thanks."
| toqy wrote:
| We really need to go back to 1 supercontinent.
| dsr_ wrote:
| Direct point-to-point conduits carrying fiber would reduce
| latency to a worst case of 21ms, but requires a fiber that
| doesn't melt at core temps (around 5200C).
| warmwaffles wrote:
| Now we just need to either beat the speed of light, or speed
| light up. (thanks futurama)
| mrfusion wrote:
| How did it decrease?
| csours wrote:
| There's a slider at the top. It took me 2 minutes to find it.
| bryanrasmussen wrote:
| It used to take 3 minutes, that's quite an improvement.
| genewitch wrote:
| 6x10^10 ns (that's a lot of zeros!)
| chairmanwow1 wrote:
| Yeah this link isn't that useful without a comparison.
| dekhn wrote:
| Latencies generally got smaller but spinning rust is still slow
| and the speed of light didn't change.
| tejtm wrote:
| Lots till 2005, then not much since.
| [deleted]
| jandrese wrote:
| The "commodity network" thing is kind of weird.
| I'd expect that to make a 10x jump when switches went from Fast
| Ethernet to Gigabit (mid-late 2000s?) and then nothing. I
| certainly don't feel like they've been smoothly increasing in
| speed year after year.
|
| I'm also curious about those slow 1990s SSDs.
___________________________________________________________________
(page generated 2022-03-03 23:00 UTC)