[HN Gopher] How Jeff Dean's "Latency Numbers Everybody Should Kn...
       ___________________________________________________________________
        
       How Jeff Dean's "Latency Numbers Everybody Should Know" decreased
       from 1990-2020
        
       Author : isaacimagine
       Score  : 186 points
       Date   : 2022-03-03 21:06 UTC (1 hour ago)
        
 (HTM) web link (colin-scott.github.io)
 (TXT) w3m dump (colin-scott.github.io)
        
       | dustingetz wrote:
       | How can network be faster than memory?
        
         | zamadatix wrote:
         | The memory number is measuring access time while the network
         | number is measuring average bandwidth. The two values can't be
         | compared even though they are presented using the same unit.
        
       | the-dude wrote:
       | The slider is very bad UX: I missed it too at first. It is not
       | pronounced enough, partly because it is all the way to the right.
       | 
       | A former boss would say: _make it red_.
        
         | wolpoli wrote:
         | It's really hard to notice the grey slider when the content is
         | already red, green, blue, and black.
        
           | the-dude wrote:
           | Blinking Magenta then.
        
         | nhoughto wrote:
         | Oh right
         | 
         | I was trying to see where the comparison was, totally missed
         | the slider, thanks!
        
         | greggsy wrote:
         | I sympathise that the site probably wasn't designed with mobile
         | in mind, but it's impossible to go beyond 2015 without hitting
         | the GitHub link.
        
           | ygra wrote:
           | You can also drag on the main view instead of the slider.
        
       | lamontcg wrote:
       | How are people practically taking advantage of the increase in
       | speed of SSDs these days compared to network latencies? It looks
       | like disk caches directly at the edge with hot data would be the
       | fastest way of doing things.
       | 
       | I'm more familiar with the 2001-2006 era where redis-like RAM
       | caches for really hot data made a lot of sense, but with spinning
       | rust disk drives, it made more sense to go over the network to
       | a microservice that was effectively a big sharded RAM cache than
       | to go to disk.
       | 
       | Seems like you could push more hot data to the very edge these
       | days and utilize SSDs like a very large RAM cache (and how does
       | that interact with containers)?
       | 
       | I guess the cost there might still be prohibitive if you have a
       | lot of edge servers and consolidation would still be a big price
       | win even if you take the latency hit across the network.
        
         | gameswithgo wrote:
         | I don't know, I have observed in my workloads, booting, game
         | load, and building programs, that super fast ssds make almost
         | no difference compared to cheap slow ssds. But any ssd is
         | miraculous compared to a spinny drive
         | 
         | Presumably video editing or something might get more of a win
         | but I don't know.
        
           | noizejoy wrote:
           | When I got my first NVMe SSD, I was disappointed that it
           | wasn't significantly faster than my SATA SSD.
           | 
           | But soon I realized that it was Samsung's Magician software
           | that made the SATA SSD competitive with an NVMe SSD via RAM
           | caching.
        
       | rvr_ wrote:
       | 20 years without meaningful improvements in memory access?
        
         | gameswithgo wrote:
         | yep, got any ideas?
        
         | not2b wrote:
         | It takes at least one clock cycle to do anything, and clock
         | frequency stopped increasing in the 2003-2005 time frame,
         | mainly because of the horrible effects on power with very small
         | feature size.
        
         | aidenn0 wrote:
         | Good news is that SSDs are only 160x slower for random reads,
         | so maybe we should just beef up L3 or L4 cache and get rid of
         | ram? /s
        
       | ohazi wrote:
       | Diminishing returns over the last decade, as expected. It would
       | be interesting to look at the energy consumed by each of these
       | operations across the same time periods.
        
       | swolchok wrote:
       | The source displayed at the bottom of this page clearly shows
       | it's just extrapolating from numbers that are older than 2020.
        
       | gregwebs wrote:
       | According to this all latencies improved dramatically except for
       | SSD random read (disk seek only improved by 10x as well). Reading
       | 1 million bytes sequentially from SSD improved 1000x and is now
       | only 2-3x slower than a random read; for disk, reading 1
       | million bytes is faster than a seek. Conclusion: avoid random IO
       | where performance matters.
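       | 
       | A rough back-of-the-envelope sketch of why (the latencies below
       | are made-up placeholders, not the site's numbers):
       | 
       |     # Fetching 1 MB from SSD as 4 KB random reads versus
       |     # one sequential read, order-of-magnitude values only.
       |     RANDOM_READ_US = 100   # assumed 4 KB random read
       |     SEQ_MB_READ_US = 200   # assumed 1 MB sequential read
       | 
       |     blocks = 1_000_000 // 4096 + 1
       |     random_total = blocks * RANDOM_READ_US
       |     print(f"random:     {random_total} us")
       |     print(f"sequential: {SEQ_MB_READ_US} us")
       |     print(f"penalty:    {random_total // SEQ_MB_READ_US}x")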
       | 
       | CPU and RAM latencies stopped improving in 2005 but storage and
       | network kept improving.
        
       | csours wrote:
       | It looks like almost everything is blazing fast now. I'm not sure
       | how long the first X takes though - how long does it take to
       | establish a TCP/IP connection? How long does it take an actual
       | program to start reading from disk?
        
       | bob1029 wrote:
       | Latency is everything.
       | 
       | I believe that sometime around 2010 we peaked on the best
       | software solution for high performance, low-latency processing of
       | business items when working with the style of computer
       | architecture we have today.
       | 
       | https://lmax-exchange.github.io/disruptor/disruptor.html
       | 
       | I have been building systems using this kind of technique for a
       | few years now and I still fail to wrap my brain around just how
       | fast you can get 1 thread to go if you are able to get out of its
       | way. I caught myself trying to micro-optimize a data import
       | method the other day and made myself do it the "stupid" way
       | first. Turns out I was definitely wasting my time. Being able to
       | process and put to disk _millions of things per second_ is some
       | kind of superpower.
        
       | _pastel wrote:
       | These numbers focus on reads. How does writing speed to cache,
       | main memory, or disk compare? Anyone have some ballparks to help
       | me build intuition?
        
       | bmitc wrote:
       | Today I learned that I don't know any of these numbers that
       | "every" programmer should know. Where do I turn in my programmer
       | card, Jeff Dean?
        
         | morelisp wrote:
         | You could take this instead as an opportunity to learn them,
         | instead of reveling in your ignorance.
        
           | bmitc wrote:
           | There is plenty I don't know. It's not me reveling in my
           | ignorance.
           | 
           | My point is that programming is an incredibly diverse field
           | and yet people, even people who supposedly should know
           | better, are obsessed with making global laws of programming.
           | I know relative comparisons of speeds that have been useful
           | in my day jobs, but I'd wager that needing to know the
           | details of these numbers, how they've evolved, etc. is a
           | relatively niche area.
           | 
           | Regarding learning, I try to constantly learn. This is driven
           | by two things: (1) need, such as one finds in their day job
           | or to complete some side project; (2) interest. If something
           | hits either need or interest or hopefully both, I learn it.
        
         | zachberger wrote:
         | I don't think it's important to know the absolute numbers but
         | rather the relative values and rough orders of magnitude.
         | 
         | I can't tell you how many times I've had to explain to
         | developers why their network-attached storage has higher
         | latency than their locally attached NVMe SSD.
        
           | morelisp wrote:
           | The absolute numbers are also important. I can't tell you how
           | many times I've had someone coming from a front-end world
           | tell me 5ms for some trivial task (e.g. sorting a 1000ish
           | element list) is "fast" just because it happened faster than
           | their reaction time.
        
       | kragen wrote:
       | Does anyone have a plot of these on a log-linear scale? Where
       | does the data come from?
       | 
       | http://worrydream.com/MagicInk/
        
       | ChuckMcM wrote:
       | Not an intuitive thing but the data is fascinating. A couple of
       | notes for people who are confused by it:
       | 
       | 1) The 'ns' next to the box is a graph legend, not a data label
       | (normally that would be in a box labeled "legend" to
       | distinguish it from graph data).
       | 
       | 2) The weird box and rectangle thing on the top is a slider, I
       | didn't notice that until I was looking at the code and said "what
       | slider?"
       | 
       | 3) The _only_ changes from 2005 to present are storage and
       | networking speeds.
       | 
       | What item #3 tells you is that any performance gains in the last
       | decade and a half you've experienced have been driven by multi-
       | core, not faster processors. And _that_ means Amdahl's Law is
       | more important than Moore's Law these days.
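       | 
       | For reference, Amdahl's Law is speedup = 1 / ((1 - p) + p / n),
       | where p is the parallelizable fraction and n the core count. A
       | quick sketch of how hard the serial part bites:
       | 
       |     # Amdahl's Law: speedup on n cores when only a
       |     # fraction p of the work can run in parallel.
       |     def amdahl(p, n):
       |         return 1.0 / ((1.0 - p) + p / n)
       | 
       |     for p in (0.5, 0.9, 0.99):
       |         for n in (16, 64):
       |             print(f"p={p}, n={n}: {amdahl(p, n):.1f}x")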
        
         | DonHopkins wrote:
         | At what point did Joy's Law -- "2^(year-1984) MIPS" -- break
         | with reality?
         | 
         | https://medium.com/@donhopkins/bill-joys-law-2-year-1984-mil...
         | 
         | https://en.wikipedia.org/wiki/Joy%27s_law_(computing)
        
         | pjc50 wrote:
         | It also tells us that the speed of light has not increased.
         | 
         | (well, speed of signal on a PCB track is roughly 2/3 light and
         | determined by the transmission line geometry and the dielectric
         | constant, but you all knew that)
        
           | thfuran wrote:
           | Which latency are you suggesting is limited by the speed of
           | light?
        
             | not2b wrote:
             | It wasn't the speed of light, it was the size of atoms that
             | was the issue here. As old-style scaling (the kind used up
             | until about 2003) continued, leakage power was increasing
             | rapidly because charge carriers (electrons / holes) would
             | tunnel through gates (I'm simplifying a bit here, other bad
             | effects were also a factor). It was no longer possible to
             | keep increasing clock frequency while scaling down feature
             | size. Further reduction without exploding the power
             | requirement meant that the clock frequency had to be left
             | the same and transistors needed to change shape.
        
         | chubot wrote:
         | _What item #3 tells you is that any performance gains in the
         | last decade and a half you've experienced have been driven by
         | multi-core, not faster processors. And that means Amdahl's Law
         | is more important than Moore's Law these days._
         | 
         | Uh or storage and networking? Not sure why you would leave that
         | out, since they're the bottleneck in many programs.
         | 
         | The slowest things are the first things you should optimize
        
           | horsawlarway wrote:
           | Yeah... SSDs are so much faster than spinning disk it's not
           | even funny.
           | 
           | I literally refuse to run a machine that boots its main OS
           | from spinning disk anymore. The 60 bucks to throw an SSD into
           | it is so incredibly cheap for what you get.
           | 
           | My wife's work gave her a (fairly basic but still fine)
           | thinkpad - except they left the main drive as a fucking
           | 5400rpm hdd. Then acted like assclowns when we repeatedly
           | showed them that the machine is stalling on disk IO, while
           | the rest of the system is doing diddly squat waiting around.
           | I finally got tired of it and we "accidentally" spilled water
           | on it, and somehow just the hdd stopped working (I left out
           | the part where I'd removed it from the laptop first...). Then
           | I just had her expense a new SSD and she no longer hates her
           | work laptop.
           | 
           | Long story short - Storage speeds are incredible compared to
           | what they were when I went to school (when 10k rpm was
           | considered exorbitant)
        
             | capitainenemo wrote:
             | The living room media/gaming machine at home is an 8
             | terabyte spinning rust. I didn't bother with a separate SSD
             | boot partition.
             | 
             | It's currently been running for 23 days. Booting takes ~15
             | seconds even on spinning rust for a reasonable linux
             | distro, so I'm not going to stress about those 15 seconds
             | every couple of months.
             | 
             |         total   used   free  shared  buff/cache  available
             | Mem:     31Gi  4.6Gi   21Gi   158Mi       5.1Gi       25Gi
             | Swap:    37Gi  617Mi   36Gi
             | 
             | 5.1 gigabytes mostly just file cache. As a result,
             | everything opens essentially instantly. For a bit better
             | experience, I did a:
             | 
             | find ~/minecraft/world -type f -exec cat {} > /dev/null \;
             | 
             | to forcibly cache that, but that was all I did.
        
               | horsawlarway wrote:
               | Hah, if you can fit the whole OS plus running
               | applications easily in RAM, and you don't boot often -
               | fine. But you're basically doing the same thing but with
               | extra steps :P
        
               | capitainenemo wrote:
               | Well, RAM is significantly faster than even SSD, and now
               | I don't have to muck about w/ a 2nd drive :)
               | 
               | Not to mention the spinning rust is cheaper.
        
             | dekhn wrote:
             | (your hard drive story is the story of my life, up to about
             | 10 years ago. I have eliminated all but one hard drive from
             | my house and that one doesn't spin most of the time)
             | 
             | Lately my vendor discussions have centered around how much
             | work you can get done with a machine that has half a
             | gigabyte of RAM, 96 cores, and 8 NVMe SSDs (it's a lot). My
             | college box: 40MB disk, 4MB RAM, one 66MHz CPU.
        
         | ianai wrote:
         | "And that means Amdahl's Law is more important than Moore's Law
         | these days."
         | 
         | idk, sure seems like we could have 1-2 cores (permanently
         | pegged?) at 5 GHz for UI/UX, then ($money / $costPerCore)
         | cores for showing off/"performance" by now. But the
         | OEMs haven't gone that way.
        
           | ChuckMcM wrote:
           | We probably see things differently. As I understand it, this
           | is exactly the use case for "big/little" microarchitectures.
           | Take a number of big fast cores that are running full bore,
           | and a bunch of little cores that can do things for them when
           | they get tasked. So far they've been symmetric but with
           | chiplets they needn't be.
        
             | ianai wrote:
             | Yes, for 'computational' loads. I've read, though, that
             | UI/UX benefits most from fast response times. I'm talking
             | about the cores which actually draw the GUI the user
             | sees/uses being optimized for the task at the highest
             | possible rate. Then have a pool of cores for the rest of
             | it.
        
               | moonchild wrote:
               | UI should be drawn on the GPU. Absent rendering, slow
               | cores are more than sufficient to do layout/etc.
               | interactively.
        
               | ChuckMcM wrote:
               | You are talking about the GPU? Okay, really random tidbit
               | here; When I worked at Intel I was a validation engineer
               | for the 82786 (which most people haven't heard of), a
               | graphics chip focused on building responsive, windowed
               | user interfaces: it used hardware features to display
               | separate windows (so moving windows moved no actual
               | memory, just updated a couple of registers), to draw
               | the mouse, and to handle character/font processing
               | for faster updates. Intel killed it, but if you find an
               | old "Number9 video card" you might find one to play with.
               | It had an embedded RISC engine that did bitblit and other
               | UI type things on chip.
               | 
               | EVERYTHING that chip did, could in fact be done with a
               | GPU today. It isn't, for the most part, because window
               | systems evolved to be CPU driven, although a lot of
               | phones these days do the UI in the GPU, not the CPU for
               | this same reason. There is a fun program for HW engineers
               | called "glscopeclient" which basically renders its UI via
               | the GPU.
               | 
               | So I'm wondering if I misread what you said and you are
               | advocating for a different GPU microarchitecture, or
               | perhaps a more general integrated architecture on the
               | chip that could also do UI, like APUs?
        
           | bee_rider wrote:
           | I would rather reserve the thermal headroom for actual
           | computations, rather than having those cores pegged at 5Ghz.
        
         | stuartmscott wrote:
         | > And that means Amdahl's Law is more important than Moore's
         | Law these days.
         | 
         | 100%, we can no longer rely on faster processors to make our
         | code faster, and must instead write code that can take
         | advantage of the hardware's parallelism.
         | 
         | For those interested in learning more about Why Amdahl's Law is
         | Important, my friend wrote an interesting article on this very
         | topic - https://convey.earth/conversation?id=41
        
         | gameswithgo wrote:
         | There is some improvement from processors being faster, as more
         | instructions are done at once and more instructions get down
         | towards that 1ns latency that l1 caches provide. You see it
         | happen in real life but the gains are small.
        
           | [deleted]
        
         | muh_gradle wrote:
         | I would never have noticed the slider if I hadn't read this
         | comment.
        
           | raisedbyninjas wrote:
           | I noticed the year was an editable field but didn't change
           | the data before I noticed the slider.
        
       | jll29 wrote:
       | The time to open a Web browser seems roughly constant since 1993.
        
       | bb123 wrote:
       | This site is completely unusable on mobile
        
       | [deleted]
        
       | dweez wrote:
       | Okay since we're not going to improve the speed of light any time
       | soon, here's my idea for speeding up CA to NL roundtrip: let's
       | straight shot a cable through the center of the earth.
        
         | DonHopkins wrote:
         | We could really use some decent Mexican food here in the
         | Netherlands.
         | 
         | https://idlewords.com/2007/04/the_alameda_weehawken_burrito_...
        
         | almog wrote:
         | What's more, we can run that cable along a gravity train from
         | CA to NL, saving the costs of digging another tunnel. :)
        
         | Archelaos wrote:
         | From CA you will end up off the coast of Madagascar, and from
         | the NL somewhere near New Zealand. You do not have to go very
         | deep inside the earth to get straight from CA to NL.
        
         | banana_giraffe wrote:
         | Assuming my math is right, it'd be a 10% faster trip, but I'd
         | be all for seeing that tunnel!
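         | 
         | Roughly, as a sketch (the route length and Earth radius are
         | assumptions):
         | 
         |     import math
         | 
         |     R = 6371.0         # Earth radius, km
         |     surface = 8800.0   # assumed CA-NL great-circle path, km
         |     theta = surface / R   # central angle, radians
         |     chord = 2 * R * math.sin(theta / 2)
         |     print(f"chord: {chord:.0f} km")
         |     print(f"shorter by: {1 - chord / surface:.1%}")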
        
       | jeffbee wrote:
       | I doubt that same-facility RTT has been fixed at 500us for 30
       | years. In EC2 us-east-1 I see < 100us same-availability-zone RTT
       | on TCP sockets, and those have a lot of very unoptimized software
       | in the loop.
       | 
       |     function getDCRTT() {
       |         // Assume this doesn't change much?
       |         return 500000; // ns
       |     }
        
         | genewitch wrote:
         | I show 180-350us between various machines on my network, all of
         | which have some fiber between them. devices with only a switch
         | and copper between them somehow perform worse, but this is
         | anecdotal because i'm not running something like smokeping!
         | 
         | Oh, additionally between VMs i'm getting 180us, so that looks
         | to be my lower bound, for whatever reason. my main switches are
         | very old, so maybe that's why.
        
           | jeffbee wrote:
           | Are you measuring that with something like ICMP ping? I think
           | the way to gauge the actual network speed is to look at the
           | all-time minimum RTT on a long-established TCP socket. The
           | Linux kernel maintains this stat for normal TCP connections.
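           | 
           | (A quick way to peek at those kernel estimates, assuming a
           | reasonably recent iproute2 whose "ss -ti" output includes
           | rtt/minrtt per socket:)
           | 
           |     # Print the kernel's per-socket TCP timing info.
           |     import subprocess
           | 
           |     out = subprocess.check_output(
           |         ["ss", "-ti"], text=True)
           |     for line in out.splitlines():
           |         if "rtt:" in line:
           |             print(line.strip())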
        
       | gameswithgo wrote:
       | An instructive thing here is that a lot of stuff has not improved
       | since ~2004 or so, and working around those things that have not
       | improved (memory latency from ram all the way down to l1 cache
       | really) requires fine control of memory layout and minimizing
       | cache pollution, which is difficult to do with all of our popular
       | garbage collected languages, even harder with languages that
       | don't offer memory layout controls, and jits and interpreters add
       | further difficulty.
       | 
       | To get the most out of modern hardware you need to:
       | 
       | * minimize memory usage/hopping to fully leverage the CPU caches
       | 
       | * control data layout in memory to leverage the good throughput
       | you can get when you access data sequentially
       | 
       | * be able to fully utilize multiple cores without too much
       | overhead and with minimal risk of error
       | 
       | For programs to run faster on new hardware, you need to be able
       | to do at least some of those things.
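       | 
       | A tiny illustration of that sequential-vs-random gap (pure
       | Python, so interpreter overhead blunts it, but the gap usually
       | still shows):
       | 
       |     import random, time
       |     from array import array
       | 
       |     n = 4_000_000
       |     data = array("q", range(n))   # ~32 MB, bigger than cache
       |     order = list(range(n))
       |     random.shuffle(order)
       | 
       |     def walk(indexes):
       |         start = time.perf_counter()
       |         total = 0
       |         for i in indexes:
       |             total += data[i]
       |         return time.perf_counter() - start
       | 
       |     print("sequential:", walk(range(n)))
       |     print("random:    ", walk(order))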
        
         | greggsy wrote:
         | It's interesting that L2 cache has basically been steady at
         | 2MB/core since 2004 as well. It hasn't changed speed in that
         | time, but is still an order of magnitude faster than memory
         | across that whole timeframe. Does this suggest that the memory
         | speed bottleneck means that there simply hasn't been a need to
         | increase availability of that faster cache?
        
           | gameswithgo wrote:
           | the bigger the cache the longer it takes to address it, and
           | kinda fundamental physics prevents it being faster
        
           | formerly_proven wrote:
           | Some of these numbers are clearly wrong. Some of the old
           | latency numbers seem somewhat optimistic (e.g. 100 ns main
           | memory ref in 1999), some of the newer ones are pessimistic
           | (e.g. 100 ns main memory ref in 2020). The bandwidth for
           | disks is clearly wrong, as it claims ~1.2 GB/s for a hard
           | drive in 2020. The seek time is also wrong: it drops below
           | 10 ms in 2000, to 5 ms in 2010, and to 2 ms by 2020.
           | Seems like linear interpolation to me. It's also unclear what
           | the SSD data is supposed to mean before ~2008 as they were
           | not really a commercial product before then. Also, for 2020
           | the SSD transfer rate is given as over 20 GB/s. Main memory
           | bandwidth is given as 300+ GB/s.
           | 
           | Cache performance has increased massively. Especially
           | bandwidth, not reflected in a latency chart. Bandwidth and
           | latency are of course related; just transferring a cache line
           | over a PC66 memory bus takes a lot longer than 100 ns. The
           | same transfer on DDR5 takes a nanosecond or so, which leaves
           | almost all of the latency budget for existential latency.
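           | 
           | The cache-line arithmetic, as a sketch (bus widths and
           | transfer rates are nominal, not measured):
           | 
           |     # Time to move one 64-byte line over the memory bus,
           |     # ignoring command/CAS overhead.
           |     LINE = 64  # bytes
           |     buses = {
           |         "PC66 SDRAM": 66e6 * 8,    # ~0.5 GB/s
           |         "DDR5-4800": 4800e6 * 8,   # ~38 GB/s
           |     }
           |     for name, rate in buses.items():
           |         print(f"{name}: {LINE / rate * 1e9:.1f} ns")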
           | 
           | edit: https://github.com/colin-
           | scott/interactive_latencies/blob/ma...
           | 
           | The data on this page is simply extrapolated using formulas
           | and guesses.
        
           | throwawaylinux wrote:
           | Bigger caches could help, but as a rule of thumb the miss
           | rate only falls roughly with the square root of cache size,
           | so the benefit diminishes. And the bigger you make a cache,
           | the slower it tends to be, so at some point you could make
           | your system slower by making your cache bigger and slower.
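           | 
           | A sketch of that rule of thumb (the starting miss rate is
           | made up; only the scaling matters):
           | 
           |     # "Square-root rule": miss rate ~ 1 / sqrt(size),
           |     # so each doubling of the cache buys less.
           |     import math
           | 
           |     base_miss = 0.10   # assumed miss rate at 1 MB
           |     for mb in (1, 2, 4, 8, 16, 32):
           |         miss = base_miss / math.sqrt(mb)
           |         print(f"{mb:2d} MB: {miss:.3f}")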
        
         | TillE wrote:
         | It's pretty remarkable that, for efficient data processing,
         | it's super super important to care about memory layout / cache
         | locality in intimate detail, and this will probably be true
         | until something fundamental changes about our computing model.
         | 
         | Yet somehow this is fairly obscure knowledge unless you're into
         | serious game programming or a similar field.
        
           | efuquen wrote:
           | > Yet somehow this is fairly obscure knowledge unless you're
           | into serious game programming or a similar field.
           | 
           | Because that kind of hardware-level optimization just isn't
           | that important in many applications. Getting the absolute
           | most out of your hardware is very clearly important in game
           | programming, but web apps where the scale being served is
           | not huge (the vast majority)? Not so much. And in this
           | context developer time is more valuable when you can throw
           | hardware at the problem for less.
           | 
           | In traditional game programming you had to run on the
           | hardware people played on; you are constrained by the
           | client's capabilities. Cloud gaming might(?) be changing
           | some of that, but GPUs are super expensive too compared to
           | the rest of the computing hardware. Even in that case, with
           | the amount of data you are pushing, you need to be efficient
           | within the GPU; my feeling is it's not easily scaled
           | horizontally.
        
             | Gigachad wrote:
             | TBH I don't think cloud gaming is a long term solution. It
             | might be a medium term solution for people with cheap
             | laptops but eventually the chip in cheap laptops will be
             | able to produce photo realistic graphics and there will be
             | no point going any further than that
        
       | sigstoat wrote:
       | the code appears to just do a smooth extrapolation from some past
       | value.
       | 
       | it claims that (magnetic) disk seek is 2ms these days. since when
       | did we get sub-4ms average seek time drives?
       | 
       | it also seems to think we're reading 1.115GiB/s off of drives
       | now. Transfer rate on even the largest drives hasn't exceeded
       | 300MiB/s or so, last i looked.
       | 
       | ("but sigstoat, nvme drives totally are that fast or faster!"
       | yes, and i assume those fall under "SSD" on the page, not
       | "drive".)
        
       | CSSer wrote:
       | Is a commodity network a local network?
        
       | kens wrote:
       | Amazing performance improvements, except no improvement at all on
       | the packet roundtrip time to Netherlands. Someone should really
       | work on that.
        
         | [deleted]
        
         | skunkworker wrote:
         | Maybe if hollow core fiber is deployed we could see latency
         | drop by roughly a third (going from ~0.66c to ~0.99c).
         | 
         | Past that physics take over, and unfortunately the speed of
         | light is pretty slow.
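         | 
         | A sketch of the ceiling (path length and speeds are
         | assumptions):
         | 
         |     # One-way CA -> NL propagation time at a few signal
         |     # speeds, over an assumed 8,800 km path.
         |     C = 299_792.458    # km/s
         |     path_km = 8_800
         |     for frac in (0.66, 0.99, 1.0):
         |         ms = path_km / (C * frac) * 1e3
         |         print(f"{frac:.2f}c: {ms:.1f} ms one-way")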
        
           | aeyes wrote:
           | Could LEO satellite networks like Starlink with inter-
           | satellite links reduce the roundtrip time?
        
             | genewitch wrote:
             | the arc at low-earth-orbit altitude (or wherever) is going
             | to be longer than the arc across the Atlantic Ocean.
             | 
             | As no one has ever said: "I'll take glass over gas,
             | thanks."
        
         | toqy wrote:
         | We really need to go back to 1 supercontinent
        
           | dsr_ wrote:
           | Direct point-to-point conduits carrying fiber would reduce
           | latency to a worst case of 21ms, but that requires a fiber
           | that doesn't melt at core temps (around 5200C).
        
       | warmwaffles wrote:
       | Now we just need to either beat the speed of light, or speed
       | light up. (thanks futurama)
        
       | mrfusion wrote:
       | How did it decrease?
        
         | csours wrote:
         | There's a slider at the top. It took me 2 minutes to find it.
        
           | bryanrasmussen wrote:
           | It used to take 3 minutes, that's quite an improvement.
        
             | genewitch wrote:
             | 6x10^10 ns (that's a lot of zeros!)
        
         | chairmanwow1 wrote:
         | Yeah this link isn't that useful without a comparison.
        
         | dekhn wrote:
         | latencies generally got smaller but spinning rust is still slow
         | and the speed of light didn't change
        
         | tejtm wrote:
         | lots till 2005, then not much since
        
           | [deleted]
        
       | jandrese wrote:
       | The "commodity network" thing is kind of weird. I'd expect that
       | to make a 10x jump when switches went from Fast Ethernet to
       | Gigabit (mid-late 2000s?) and then nothing. I certainly don't
       | feel like they've been smoothly increasing in speed year after
       | year.
       | 
       | I'm also curious about those slow 1990s SSDs.
        
       ___________________________________________________________________
       (page generated 2022-03-03 23:00 UTC)