[HN Gopher] When network is faster than browser cache (2020)
___________________________________________________________________

When network is faster than browser cache (2020)

Author : harporoeder
Score : 173 points
Date : 2022-06-29 16:55 UTC (6 hours ago)

(HTM) web link (simonhearne.com)
(TXT) w3m dump (simonhearne.com)

| hgazx wrote:
| When I had a very old and slow hard disk I ran my browsers
| without disk cache for precisely this reason.
| agumonkey wrote:
| What's the storage capacity of internet cables?
| moralestapia wrote:
| On the order of petabytes.
| arccy wrote:
| https://github.com/yarrick/pingfs
| LeonenTheDK wrote:
| Explored here: http://tom7.org/harder/
|
| There's a video and a paper linked on this page with the
| information on this absurdity.
| agumonkey wrote:
| oh of course, how could I forget this :)
| kreetx wrote:
| Much confusion in the comments.
|
| Tl;dr: the cache is "slow" because ongoing requests -- including
| those to the cache! -- are throttled by the browser. I.e. the
| cache isn't slow, but reading it is waited upon, and the network
| request might be ahead in the queue.
| neomantra wrote:
| Different anecdote, but similar vibe....
|
| In ~2010, I was benchmarking Solarflare (Xilinx/AMD now) cards
| and their OpenOnload kernel-bypass network stack. The results
| showed that two well-tuned systems could communicate faster
| (lower latency) than two CPU sockets within the same server that
| had to wait for the kernel to get involved (standard network
| stack). It was really illuminating and I started re-architecting
| based on that result.
|
| Backing out some of that history... in ~2008, we started using
| FPGAs to handle specific network loads (US equity market data).
| It was exotic and a lot of work, but it significantly benefited
| that use case, both because of DMA to user-land and its filtering
| capabilities.
|
| At that time our network was all 1 Gigabit. Soon thereafter,
| exchanges started offering 10G handoffs, so we upgraded our
| entire infrastructure to 10G cut-through switches (Arista) and
| 10G NICs (Myricom). This performed much better than the 1G FPGA
| and dramatically improved our entire infrastructure.
|
| We then ported our market data feed handlers to Myricom's
| user-space network stack, because loads were continually
| increasing and the trading world was continually more
| competitive... and again we had a narrow solution (this time in
| software) to a challenging problem.
|
| Then about a year later, Solarflare and its kernel-compatible
| OpenOnload arrived and we could then apply the power of kernel
| bypass to our entire infrastructure.
|
| After that, the industry returned to FPGAs again with 10G PHY and
| tons of space to put whole strategies... although I was never
| involved with that next generation of trading tech.
|
| I personally stayed with OpenOnload for all sorts of workloads,
| growing to use it with containerization and web stacks (Redis,
| Nginx). Nowadays you can use OpenOnload with XDP; again a narrow
| technology grows to fit broad applicability.
| throw6554 wrote:
| That reminds me of when the OpenWRT and other open source
| guys were complaining that the home gateways of the time did
| not have a big enough CPU to max out the uplink (10-100 Mbps
| at the time), and instead built in hardware accelerators. What
| they did not know was that the hw accelerator was merely an
| even smaller CPU running a proprietary network stack.
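
kreetx's tl;dr above is essentially what Firefox calls "race cache
with network" (RCWN). As a rough illustration of the idea -- a
minimal sketch assuming a service worker context, not Firefox's
actual implementation, with made-up helper names:

  // Race the HTTP cache against the network and serve whichever
  // responds first. Promise.any settles on the first *fulfilled*
  // promise, so a slow or missing cache entry simply loses the race.
  async function fromCache(request: Request): Promise<Response> {
    const hit = await caches.match(request);
    if (!hit) throw new Error('cache miss'); // reject so the network can win
    return hit;
  }

  function raceCacheWithNetwork(request: Request): Promise<Response> {
    return Promise.any([fromCache(request), fetch(request)]);
  }

  // Assumed to run in a ServiceWorkerGlobalScope.
  self.addEventListener('fetch', (event: any) => {
    event.respondWith(raceCacheWithNetwork(event.request));
  });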
| tomcam wrote:
| That was such a cool comment that I actually went to
| http://www.neomantra.com/ and briefly considered applying for a
| job there and un-retiring
| jxy wrote:
| Does containerization have any impact on performance in your
| use cases?
| neomantra wrote:
| I've not exhaustively characterized it, but Red Hat has this
| comparison [1]. It is not enough to matter for the scope/scale
| of my needs. I do still run bare metal workloads though.
|
| That said, I have had operational issues in migrating to
| Docker, which is another sort of performance impact! I
| reference some of my core isolation issues in that Redis gist
| and this GitHub issue [2].
|
| [1] https://www.redhat.com/cms/managed-files/201504-onload_conta...
| [2] https://github.com/moby/moby/issues/31086#issuecomment-30374...
| oogali wrote:
| I went down a similar path (Solarflare, Myricom, and Chelsio;
| Arista, BNT, and 10G Twinax) and found we could get lower
| latency internally by confining two OpenOnload-enabled
| processes that needed to communicate with each other to the
| same NUMA domain.
|
| We architected our applications around that as well. The firm
| continued on to the FPGA path, though that was after my time
| there.
|
| I do still pick up Solarflare cards off of eBay for home lab
| purposes.
| neomantra wrote:
| Nice! We deploy like that too. I document how to do that with
| Redis here [1].
|
| [1] https://gist.github.com/neomantra/3c9b89887d19be6fa5708bf401...
| oogali wrote:
| Yup, all the same techniques we used (minus Docker, but with
| the addition of disabling Intel Turbo Boost and hyper-
| threading).
|
| A few years ago I met with a cloud computing company that
| was following similar techniques to reduce the noisy
| neighbor problem and increase performance consistency.
|
| Frankly, it's good to see that there still are bare metal
| performance specialists out there doing their thing.
| scott_s wrote:
| I discovered something similar in the early to mid 2010s:
| processes on different Linux systems communicated over TCP
| faster than processes on the same Linux system. That is, going
| over the network was faster than staying on the same machine.
| The reason was simple: there is a global lock per system for the
| localhost pseudo-device.
|
| Processes communicating on different systems had actual
| parallelism, because they each had their own network device.
| Processes communicating on the same system were essentially
| serialized, because they were competing with each other for the
| same pseudo-device. At the time, Linux kernel developers
| basically said "Yeah, don't do that" when people brought up the
| performance problem.
| nine_k wrote:
| I wonder if creating more pseudo-devices for in-host
| networking would help. Doesn't Docker do that already, for
| other purposes?
| kcexn wrote:
| That makes sense; Linux has many highly performant IPC
| options that don't involve the network device. Just the time
| it takes to correctly construct an Ethernet frame is not
| negligible.
| btdmaster wrote:
| For me, the cache wins the race against the network at about
| 100:1. Are there greatly different results for others in
| about:networking#rcwn?
| sattoshi wrote:
| > This seemed odd to me, surely using a cached response would
| always be faster than making the whole request again! Well it
| turns out that in some cases, the network is faster than the
| cache.
|
| Did I miss a follow-up on this, or did it remain unanswered as to
| what the benefit of racing against the network is?
|
| The post basically says that sometimes the cache is slower
| because of throttling or bugs, but mostly bugs.
|
| Why is Firefox sending an extra request instead of figuring out
| what is slowing down the cache? It seems like an overly expensive
| mitigation...
| philsnow wrote:
| > Concatenating / bundling your assets is probably still a good
| practice, even on H/2 connections. Obviously this comes on
| balance with cache eviction costs and splitting pre- and post-
| load bundles.
|
| I guess this latter part refers to the trade-off between
| compiling all your assets into a single file, and then requiring
| clients to re-download the entire bundle if you change a single
| CSS color. The other extreme is to not bundle anything (which, I
| gather from the article, is the standard practice since all major
| browsers support HTTP/2), but this leads to the described issue.
|
| What about aggressively bundling, but also keeping track at
| compile time of diffs between historical bundles and the new
| bundle? Re-connecting clients could grab a manifest that names
| the newest mega-bundle as well as a map from historical versions
| to the patches needed to bring them up to date. A lot more work
| on the server side, but maybe it could be a good compromise?
|
| Of course that's the easy version, but it has a huge flaw, which
| is that all first-time clients have to download the entire huge
| mega-bundle before the browser can render anything, so to make it
| workable it would have to compile things into a few bootstrap
| stages instead of a single mega-bundle.
|
| I am _clearly_ not a frontend dev. If you're going to throw
| tomatoes please also tell me why ;)
|
| * edit: found the repo that made me think of this idea,
| https://github.com/msolo/msolo/blob/master/vfl/vfl.py - it's
| from antiquity and probably predates node and babel/webpack, but
| the idea is you name individual asset versions with a SHA or tag
| and let them be cached forever, and to update the web app you
| just change a reference in the root to make clients download a
| different resource dependency tree root (and they re-use
| unchanged ones) *
| staticassertion wrote:
| There's probably some balance here. Since there's a limit of 9
| concurrent requests before throttling occurs, you can group your
| assets into 9 buckets, concatenating related objects into each.
| So if you have a bunch of static content, concat that into one
| bucket. If you have another object that changes a lot, keep that
| separate. If you have two other objects that change together,
| bucket those, etc.
|
| Seems like a huge pain to think about tbh. Seems like part of
| the problem would be solved by compiling everything into a
| single file that supported streaming execution.
| Klathmon wrote:
| There's a good middle ground of bundling your SPA into chunks
| of related files (I prefer to name them by the SHA hash of the
| content) and giving them good cache lifetimes.
|
| You can have a "vendor" chunk (or a few) which just holds all
| 3rd-party dependencies, a "core components" chunk which holds
| components that are likely used on most pages, and then
| individual chunks for the rest of the app, broken down by page
| or something.
|
| It speeds up compilation, gives better caching, needs no
| stateful mapping file outside of the HTML file loaded (which is
| kind of the point of the <link> tag anyway!), and has lots of
| knobs to tune if you want to really squeeze out the best load
| times.
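
Klathmon's chunking scheme maps fairly directly onto bundler
configuration. A minimal sketch of what it might look like,
assuming webpack 5 (the split points here are illustrative, not a
recommendation):

  // webpack.config.ts -- content-hashed filenames plus a long-lived
  // vendor chunk, so unchanged chunks stay cached across deploys.
  import type { Configuration } from 'webpack';

  const config: Configuration = {
    output: {
      filename: '[name].[contenthash].js', // cache-bust only what changed
    },
    optimization: {
      splitChunks: {
        chunks: 'all',
        cacheGroups: {
          // All third-party code in one (or a few) "vendor" chunk(s).
          vendor: {
            test: /[\\/]node_modules[\\/]/,
            name: 'vendor',
          },
        },
      },
    },
  };

  export default config;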
| philsnow wrote:
| Another tool that bears on this idea is Courgette [0], in that
| it operates on one of the intermediate representations of the
| final bundle in order to achieve better compression.
|
| [0] https://blog.chromium.org/2009/07/smaller-is-faster-and-safe...
| oblak wrote:
| Well, yeah. Disk cache can take hundreds of MS to retrieve, even
| on modern SSDs. I had a handful of oddly heated discussions with
| an architect about this exact thing at my previous job. Showing
| him the network tab did not help, because he had read articles
| and was well informed about these things.
| OJFord wrote:
| Why are you measuring latency in mega Siemens?
| kccqzy wrote:
| At a previous job I worked on serving data from SSDs. I wasn't
| really involved in configuring the hardware, but I believe they
| were good quality enterprise-grade SSDs. My experience was that
| a random read (which could be a small number of underlying
| reads) from mmap()'ed files on those SSDs took between 100
| and 200 microseconds. That's far from your figure of hundreds
| of milliseconds.
|
| Of course 200 microseconds still isn't fast. That translates to
| serving 5000 requests per second, leaving the CPU almost
| completely idle.
|
| Another odd fact was that we did in fact have to implement our
| own semaphores and throttling to limit concurrent reads from
| the SSDs.
| TheDudeMan wrote:
| According to this, it can take 10ms (not 100s of ms), and only
| for a very large read (like 32MB):
| https://i.gzn.jp/img/2021/06/07/reading-from-external-memory...
| Full article:
| https://gigazine.net/gsc_news/en/20210607-reading-from-exter...
| [deleted]
| Retr0id wrote:
| This assumes it only takes a single disk read to locate and
| retrieve an object from the cache (which is unlikely to be
| the case).
| TheDudeMan wrote:
| OK, one small read plus one huge read would top out at
| about 10.1ms, according to that graph.
| Retr0id wrote:
| Also a big assumption (which we could verify by looking
| at the relevant implementations, but I'm not going to)
| TheDudeMan wrote:
| (Anyone who is not running their OS and temp space on NVMe
| should not expect good performance. Such a configuration has
| been very cheap for several years now.)
| zinekeller wrote:
| > Such a configuration has been very cheap for several
| years now.
|
| This is a very weird comment, considering that a) it's
| cheap _er_ than yesteryear, but SATA SSDs (or even modern
| magnetic HDDs) are still sold and are in active use, and b)
| it ignores phones completely, where a large number of sites
| would have mobile-dominated visitors and can't just switch
| to NVMe-like performance even for those with large
| disposable incomes (because at the end of the day, even with
| UFS, phones are still slower than NVMe latency-wise).
| staticassertion wrote:
| The issue has nothing to do with disk speed. If you had read
| the article you'd see a very nice chart that shows the vast
| majority of cache hits returning in under 2 or 3ms.
| divbzero wrote:
| I wish I had a clearer memory or record of this, but I think
| I've also seen ~100ms for browser cache retrieval on an SSD. Has
| anyone else observed this and have an explanation? A sibling
| comment points out that SSD read latency should be ~10ms at
| most, so the limitation must be in the software?
|
| OP mentioned specifically that "there have been bugs in
| Chromium with request prioritisation, where cached resources
| were delayed while the browser fetched higher priority requests
| over the network" and that "Chrome actively throttles requests,
| including those to cached resources, to reduce I/O contention".
| I wonder if there are also other limitations with how browsers
| retrieve from cache.
| staticassertion wrote:
| > Has anyone else observed this and have an explanation?
|
| Yes, that is the subject of this post.
|
| https://simonhearne.com/2020/network-faster-than-cache/#the-...
| divbzero wrote:
| The graphs in OP show that cache latency is mostly ~10ms
| for desktop browsers. ~100ms would still be an outlier.
| staticassertion wrote:
| The Y axis of the chart that I linked, entitled 'Cache
| Retrieval Time by Count of Cached Assets', shows latency
| well above 100ms.
| divbzero wrote:
| Thanks. Switching the metric from _Average_ to _Max_ does
| show that the cache retrieval time can reach ~100ms even
| when the cached resource count is low.
| dblohm7 wrote:
| Firefox's HTTP cache races with the network for precisely this
| reason.
| [deleted]
| cwoolfe wrote:
| So is the takeaway that data in the RAM of some server connected
| by a fast network is sometimes "closer" in retrieval time than
| that same data on a local SSD?
| philsnow wrote:
| Back in ~2003 I had bought a new motherboard + CPU (a Duron
| 800MHz IIRC) but, as a poor college kid, only had enough money
| left over for 128MB of RAM... but the system I was replacing had
| ~768MB. I made a ~640MB ramdisk on the old system and mounted
| it on the new system as a network block device [0], and the
| result was much, much faster than local swap (this was before
| consumer SSDs though).
|
| [0] "nbd" / https://en.wikipedia.org/wiki/Network_block_device
| ; this driver is of course still in the kernel; you could do
| this today with an anemic Raspberry Pi if you wanted
| adamius wrote:
| Now I'm imagining a rack of raspis acting as one giant RAM
| swap drive over nbd. This could work for a given value of
| "work": cost of a Pi vs. a stick of RAM. A KV storage layer
| as well, perhaps.
|
| Then again, what's a TB worth on just one Xeon server?
| Probably cheaper... or not?
| jhartwig wrote:
| Have you seen the cost of Pis lately :)
| staticassertion wrote:
| Not really. The issue is with throttling.
| r1ch wrote:
| For me this is very noticeable whenever I open a new Chrome tab.
| It takes 3+ seconds for the icons of the recently visited sites
| to appear; whatever cache is used for the favicons is extremely
| slow. Thankfully the disk cache for other resources runs at a
| more normal speed.
| TheDudeMan wrote:
| Seems like a badly-designed cache.
| charcircuit wrote:
| When you make the request, the server will have to look up
| the image from its own "cache" before it can send it back to
| you. The client's cache would have to be not only slower than
| its ping, but slower than its ping + the server's "cache."
| staticassertion wrote:
| The issue has nothing to do with the cache itself. It's about
| the throttling behavior.
| gowld wrote:
| The throttling is part of the cache design. It's a browser
| cache, not a multi-client cache in the OS.
| staticassertion wrote:
| That doesn't sound right based on my reading of this
| document. If I'm wrong please do correct me; I don't know a
| ton about the internals here.
|
| https://docs.google.com/document/d/1Aa7OKFRdtmn4IFzgHYfqeqk5...
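
For anyone who wants to reproduce the kind of measurement divbzero
and staticassertion are discussing, the Resource Timing API exposes
enough to approximate it from the console. A rough sketch -- the
transferSize === 0 heuristic is imperfect (it misses 304
revalidations, and cross-origin entries need a Timing-Allow-Origin
header for real detail):

  // Classify loaded resources as cache hits vs. network fetches and
  // log approximate retrieval times.
  const entries = performance.getEntriesByType(
    'resource'
  ) as PerformanceResourceTiming[];
  for (const e of entries) {
    const fromCache = e.transferSize === 0 && e.decodedBodySize > 0;
    const ms = (e.responseEnd - e.startTime).toFixed(1);
    console.log(`${ms} ms ${fromCache ? 'cache  ' : 'network'} ${e.name}`);
  }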
| gowld wrote:
| The article shows a lot of data about cache speed, but I don't
| see a comparison to the cacheless network case.
| dllthomas wrote:
| ... oh, _that_ cache.
| femiagbabiaka wrote:
| I had the same reaction. It's a good title.
| staticassertion wrote:
| Since apparently no one is willing to read this excellent
| article, which even comes with fun sliders and charts...
|
| > It turns out that Chrome actively throttles requests, including
| those to cached resources, to reduce I/O contention. This
| generally improves performance, but will mean that pages with a
| large number of cached resources will see a slower retrieval time
| for each resource.
| pclmulqdq wrote:
| I am a systems engineer. I read the article title, then started
| reading the article, and realized it was a bait and switch. It
| is not about "network" vs "cache" in computer systems terms,
| which is what you might expect. It is about "network" vs "the
| (usually antiquated) file-backed database your browser calls a
| cache." The former would have been a compelling article. The
| latter is kind of self-evident: the browser cache is there to
| save bandwidth, not to be faster.
| staticassertion wrote:
| I find it odd to call it a bait and switch when the first
| thing in the article is an outline.
|
| > It is about "network" vs "the (usually antiquated) file-
| backed database your browser calls a cache."
|
| It actually has nothing to do with the design of the cache
| itself, as far as I can tell. If you finish reading the
| article you'll see that it's about a throttling behavior that
| interacts poorly with the common optimization advice for
| HTTP/1.1+, exposed by caching.
|
| > The latter is kind of self-evident: the browser cache is
| there to save bandwidth, not to be faster.
|
| I don't think that's something you can just state
| definitively. I suspect most people do in fact view the cache
| as an optimization for latency. Especially since right at the
| start of the article, the first sentence, the "race the
| cache" optimization is introduced - an optimization that is
| clearly for latency and not bandwidth purposes.
| karmakaze wrote:
| It is a bait and switch. The issue is with local files and
| throttling, neither of which appears in the title or outline.
|
| Edit: I didn't need this post to tell me about the "waiting
| for cache" message I used to see with Chrome.
| dchftcs wrote:
| When I see "browser cache" I can't think of anything
| other than local file storage. Maybe it confused you,
| but there's nothing deliberately misleading.
| karmakaze wrote:
| But it's not even general: it's about Chrome's specific
| implementation.
| hinkley wrote:
| Memcached exists for two reasons: popular languages that
| hit inflection points when in-memory caches exceeded a
| certain size, and network cards becoming lower latency than
| SATA hard drives.
|
| The latter is a well known and documented phenomenon. The
| moment popular drives are consistently faster than the
| network again, expect to see a bunch of people writing
| articles about using local or disk cache, recycling old
| arguments from two decades ago.
| staticassertion wrote:
| OK, but that's got nothing to do with the post.
| pclmulqdq wrote:
| In the distributed systems I have worked on, there are
| often several layers of caches that have nothing to do with
| latency: the point of the cache is to reduce load on your
| backend.
| Often, these caches are designed with the principle that they
| should not hurt any metric (i.e., a well-designed cache in a
| distributed system should not have worse latency than the
| backend). This, in turn, improves average latency and systemic
| throughput, and lets you serve more QPS for less money.
|
| CPU caches are such a common thing to think about now that
| we have associated the word "cache" with latency improvements,
| since that is one of the most obvious benefits of CPU caches.
| It is not a required feature of caches in general, though.
|
| The browser cache was built for a time when bandwidth was
| expensive, often paid per byte, the WAN had high latency,
| and disk was cheap (but slow). I don't know exactly when
| the browser cache was invented, but it was exempted from
| the DMCA in 1998. Today, bandwidth is cheap and internet
| latency is a lot lower than it used to be. From first
| principles, it makes sense that the browser cache, designed
| to save you bandwidth, does not help your website's latency.
|
| Edit: In light of the changes in the characteristics of
| computers and the web, this article seems to mainly be an
| argument for disabling caching on high-bandwidth links on
| the browser side, rather than suggesting "performance
| optimizations" that might silently cost money for your
| customers on low-bandwidth links.
| kreetx wrote:
| Did you read the article?
|
| The cache is "slow" because ongoing requests -- including
| those to the cache! -- are throttled by the browser. I.e.
| the cache isn't slow, but reading it is waited upon, and the
| network request might be ahead in the queue.
| citrin_ru wrote:
| > "network" vs "the (usually antiquated) file-backed database
| your browser calls a cache."
|
| Firefox uses SQLite for its on-disk cache backend. Not the
| latest buzzword, but not exactly antiquated. I expect the
| cache backend in Chrome to be at least as fast.
|
| > cache is there to save bandwidth, not to be faster
|
| In most cases a cache saves bandwidth and reduces page load
| time at the same time. An internet connection that is faster
| than a local SSD/HDD is a rare case.
| cuteboy19 wrote:
| It is not immediately obvious why a local filesystem would be
| slower than the network
| bee_rider wrote:
| Well. I mean, get a slow enough hard drive and a nice
| enough network and we can get there, haha.
| pclmulqdq wrote:
| If you live in a rich country and are on fiber internet
| rather than a cable modem (or if your cable modem is new
| enough), you likely have better latency to your nearest
| CDN than you do to the average byte on an HDD in your
| computer. An SSD will still win, though.
|
| The browser cache is kind of like a database and also
| tends to hold a lot of cold files, so it may take
| multiple seeks to retrieve one of them. Apparently it has
| throughput problems too, thanks to some bottlenecks that
| are created by the abstractions involved.
| roywashere wrote:
| For 'recent' files your operating system will most typically
| not even touch the disk, because it will aggressively
| cache those files in memory
| cogman10 wrote:
| Seems like a shortcut that shouldn't be.
|
| I can understand throttling network requests, but disk
| requests? The only reason to do that would be for power savings
| (you don't want to push the CPU into a higher state as it loads
| up the data).
| fnordpiglet wrote:
| Depends on whether you value latency for the user.
Saying I try | both and choose the one coming first hurts no one but the | servers not being protected by a client cache. But there's | absolutely no reason to believe a client has a cache that | masks requests. AFAIK there's no standard that says clients | use caches for parsimony and not exclusively latency. As a | matter of fact I think this is a good idea if it ever takes | time to consult the cache, and the trade off is more | bandwidth consumption which we are awash in. If you care that | much run a caching proxy and use that and you'll get the same | effect of the client side cache masking requests. But I would | say it's superior because it always uses the local cache | first and doesn't waste user time on the edge condition in | their cache coherency. It comes from Netscape which famously | convinced everyone that it's one of the hardest problems. | That leads to the final benefit, the cache doesn't have to | cohere. If it's too expensive at that moment to cohere and | query then I can use the network source. Again the only | downside is the network bandwidth is more consistently user. | I would be hard pressed to believe most Firefox users already | are grossly bandwidth over provisioned, and the amount of a | fraction of a cable line a web browser loading from the cache | no one could even notice that. | cogman10 wrote: | > Depends on if you value latency for the user. | | I do. Which is why it's silly to throttle the cache IO. | | The read latency for disks is measured in microseconds. Is | it possible for the server to be able to respond faster? | Sure. However, if you aren't within 20 miles of the server | then I can't see how it could be faster (speed of light and | everything). | | These design considerations will depend greatly on where | you are at. It MIGHT be the case that eschewing a client | cache in a server to server talk is the right move because | your servers are likely physically close together. That'd | mean the server can do a better job making multiple client | requests faster through caching saving the memory/disk | space required for the clients. | | There is also the power consideration. It take a lot more | power for a cell phone to handle a network request than it | does to handle a disk request. Shouting into the ether | isn't cheap. | Spooky23 wrote: | What happens when your 200 browser tabs are all looking | for updates? | citrin_ru wrote: | It makes sense to apply some throttling (and reduce CPU | priority) for inactive tabs. | | But the active tab when browser window is an active one | should work without any throttling for lower response | time. | cogman10 wrote: | If your 200 browser tabs are saturating your 8gbps link | to your m.2 ssd, what do you think they'll do to your | 10mbps connection to the internet? | sieabahlpark wrote: | staticassertion wrote: | I imagine it's just that the cache is in an awkward place | where the throttling logic is unaware of the cache and the | cache is not able to preempt the throttling logic. | cogman10 wrote: | I'd guess the same thing is going on. Definitely FEELS like | a code structure problem. | cyounkins wrote: | Worth noting that around the end of 2020, Chrome and Firefox | enabled cache partitioning by eTLD+1 to prevent scripts from | gaining info from a shared cache. This kills the expected high | hit rate from CDN-hosted libraries. | https://developer.chrome.com/blog/http-cache-partitioning/ | https://developer.mozilla.org/en-US/docs/Web/Privacy/State_P... 
___________________________________________________________________
(page generated 2022-06-29 23:00 UTC)