[HN Gopher] New HTTP standards for caching on the modern web
___________________________________________________________________
New HTTP standards for caching on the modern web

Author : pimterry
Score  : 38 points
Date   : 2021-10-20 14:19 UTC (1 days ago)

(HTM) web link (httptoolkit.tech)
(TXT) w3m dump (httptoolkit.tech)

| simonw wrote:
| Really good article - I hadn't heard about either of these
| headers and I really appreciated the clear explanation of both.

| forgotmypw17 wrote:
| One challenge I've experienced recently is I can't figure out how
| to hint to the browser that it should refresh a particular cached
| page. (Without appending ?time=1634851491 to the URL.)
|
| For example, let's say I've already cached the page /new.html
|
| Now, I click a button which triggers a change to the page, and I
| am redirected back to it.
|
| Even though the page has changed, and the browser should see a
| new timestamp in the header if pinging the server, it just
| doesn't seem to happen.
|
| Has anyone dealt with this before? I tried to ask on
| StackOverflow, but lately my questions don't seem to get any
| attention, and I've run out of reputation to spend on bounties.

| toast0 wrote:
| There's no standard way for one page to invalidate another.
| I've seen some private patches to do it in squid, but that
| doesn't help because you want to do it for browsers.
|
| Your options are probably:
|
| a) redirect to a different URL, as you've done by appending
| stuff to it
|
| b) require revalidation on each request; recipes shown by
| other posters
|
| c) POST to the URL you want refreshed; POST isn't cacheable.
| Note that you can't redirect to POST somewhere else, but you
| can do it with JavaScript.
|
| d) use XHR to force a request, as another poster mentioned.

| ryanpetrich wrote:
| This is what ETags are for. Upon a user's first visit the
| server should return an ETag uniquely representing the current
| version of the page.
| The browser will cache both the page and the tag. Upon subsequent
| page visits the browser will send an If-None-Match header
| containing the tag for the version of the page it has cached. The
| server should compare the incoming tag with the tag for the
| current version and return a "304 Not Modified" response if the
| tags match, or a full response with the newer tag in the ETag
| header if they don't.

| tshaddox wrote:
| Yeah, and it works the same way with If-Modified-Since and
| Last-Modified.

| tyingq wrote:
| It's a combination of different headers that's hard to sum up
| in a short comment. A good article on the subject should talk
| about all these headers: Expires, Cache-Control, ETag, Pragma,
| Vary, Last-Modified
|
| Key CDN has an article on it. They certainly would have
| experience and expertise there. I didn't read the whole thing,
| but it seems to have it covered:
| https://www.keycdn.com/blog/http-cache-headers
|
| There are also some interesting exceptions where rules aren't
| followed. Like browsers typically have a completely separate
| cache for favicons. I suppose because they use the icons in
| funny/different ways, like bookmarks.
|
| There are also sometimes proxies (especially corporate MITM
| ones) that don't follow the rules. Hence the popularity of
| cache-busting parameters like you described.

| bandie91 wrote:
| As I understand the original design of HTTP, each HTTP resource
| may state in its response headers how long it can be cached, and
| the client (browser, proxy, etc.) does not have to re-request the
| resource before the expiry. This is the standard, so you cannot
| hint that a resource has to be revalidated, at least not in a
| standard way.
| Obviously, since then, several tricks have emerged, like the
| timestamped URL approach you mentioned - however, I'm not sure to
| what extent it is standardized in clients to understand that
| "/path?query" is somehow related to "/path", because originally
| the request string (path and URL parameters) was opaque to the
| HTTP client, so they should be cached independently. Things
| obviously changed since then. The method I use is to fire an
| Ajax (XHR) request, with a Cache-Control header (yes, it is a
| request header too), at the URL which has to be refreshed, then
| display the response content or redirect to it.

| scottlamb wrote:
| > however, I'm not sure to what extent it is standardized in
| > clients to understand that "/path?query" is somehow related
| > to "/path", because originally the request string (path and
| > URL parameters) was opaque to the HTTP client, so they should
| > be cached independently. Things obviously changed since then.
|
| It hasn't changed. Those are still cached completely
| independently by the user agent. The ?time=... cache busting
| trick is meant to produce a cache key that's never been used
| before, thus requiring a fresh request. The new request
| doesn't clean up the cache entries for the old URLs; it just
| doesn't use them. That's one reason it's better to use ETag
| and such to make the caches work properly, rather than fight
| them with this trick.
|
| On many servers, if new.html is a static file, the same
| entity is produced regardless of parameters. But the user
| agent doesn't know this.

| bawolff wrote:
| Cache-Control: max-age=0, must-revalidate
|
| Sounds like what you want (presuming your server handles 304
| logic correctly)

| forgotmypw17 wrote:
| I do want caching to happen, however -- until something
| changes the page.

| scottlamb wrote:
| I appreciate that the Cache-Status: header they describe uses RFC
| 8941 structured fields and thus ";" to separate items within each
| cache and "," between caches.
| It's like someone put effort into making it easy to parse.
|
| Rant time: I just finished writing a state machine parser for
| "WWW-Authenticate:" and "Proxy-Authenticate:". Those headers use
| comma both to separate challenges and to separate parameters
| within a challenge, which just seems mean-spirited. Other things
| about HTTP authentication that seem mean, dumb, annoying, or all
| of the above: both the RFC 2069 example response and the RFC 7616
| SHA-512-256 example response are calculated incorrectly; RFC
| 7616's userhash field seems to require the server to do
| O(users_in_database) hashes to know what user to operate on; RFC
| 7235's challenge grammar describes a token68 syntax that really
| only is used for the credentials in basic, never a challenge; RFC
| 7616 drops backwards compatibility for RFC 2069 even though I
| bought a product this year that still uses RFC 2069-style
| calculations; and it's based on old standards that followed "be
| conservative in what you do, be liberal in what you accept from
| others" so RFC 7230 section 7 has separate grammars for what
| lists you must send and what lists you must accept, which further
| complicates parsing the nested lists.
___________________________________________________________________
(page generated 2021-10-21 23:00 UTC)
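
To make scottlamb's point about Cache-Status concrete, here is a minimal Python sketch of the split he describes: "," between caches and ";" between the items reported by each cache. The function names and the sample header value are made up for illustration and do not come from the article or the thread. This is a simplified reading, not a full RFC 8941 parser: it only takes care not to split inside double-quoted strings, and it keeps every parameter value as the raw string it found.

```python
def split_outside_quotes(text, sep):
    """Split text on sep, ignoring separators inside double-quoted strings."""
    parts, current = [], []
    in_quotes = False
    escaped = False
    for ch in text:
        if escaped:
            # Character following a backslash inside a quoted string.
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_cache_status(value):
    """Return a list of (cache_name, params) tuples, one per cache.

    Hypothetical helper: parameters without "=" become True,
    all other values are left as unparsed strings.
    """
    caches = []
    for member in split_outside_quotes(value, ","):
        pieces = [p.strip() for p in split_outside_quotes(member, ";")]
        name = pieces[0]
        if not name:
            continue  # skip empty list members
        params = {}
        for piece in pieces[1:]:
            if not piece:
                continue
            key, eq, raw = piece.partition("=")
            params[key.strip()] = raw.strip() if eq else True
        caches.append((name, params))
    return caches


# Illustrative header value (assumed, not taken from the article).
sample = "ExampleCache; hit; ttl=376, OriginCache; fwd=uri-miss; stored"
for cache_name, cache_params in parse_cache_status(sample):
    print(cache_name, cache_params)
```

Because the two separators sit at different nesting levels, two passes of the same splitting helper are enough here, which is the contrast with the WWW-Authenticate grammar described in the rant above.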