[HN Gopher] New HTTP standards for caching on the modern web
       ___________________________________________________________________
        
       New HTTP standards for caching on the modern web
        
       Author : pimterry
       Score  : 38 points
       Date   : 2021-10-20 14:19 UTC (1 day ago)
        
 (HTM) web link (httptoolkit.tech)
 (TXT) w3m dump (httptoolkit.tech)
        
       | simonw wrote:
       | Really good article - I hadn't heard about either of these
       | headers and I really appreciated the clear explanation of both.
        
       | forgotmypw17 wrote:
       | One challenge I've experienced recently is I can't figure out how
       | to hint to the browser that it should refresh a particular cached
       | page. (Without appending ?time=1634851491 to the URL.)
       | 
       | For example, let's say I've already cached the page /new.html
       | 
       | Now, I click a button which triggers a change to the page, and I
       | am redirected back to it.
       | 
       | Even though the page has changed, and the browser should see a
       | new timestamp in the header if pinging the server, it just
       | doesn't seem to happen.
       | 
       | Has anyone dealt with this before? I tried to ask on
       | StackOverflow, but lately my questions don't seem to get any
       | attention, and I've run out of reputation to spend on bounties.
        
         | toast0 wrote:
         | There's no standard way for one page to invalidate another.
         | I've seen some private patches to do it in squid, but that
         | doesn't help because you want to do it for browsers.
         | 
         | Your options are probably:
         | 
         | a) redirect to a different URL as you've done by appending
         | stuff to it
         | 
         | b) require revalidation on each request, recipes shown by
         | other posters
         | 
         | c) POST to the URL you want refreshed; POST isn't cacheable.
         | Note that you can't redirect to POST somewhere else, but you
         | can do it with JavaScript.
         | 
         | d) use XHR to force a request as another poster mentioned.
        
         | ryanpetrich wrote:
         | This is what ETags are for. Upon a user's first visit the
         | server should return an ETag uniquely representing the current
         | version of the page. The browser will cache both the page and
         | the tag. Upon subsequent page visits the browser will send an
         | If-None-Match header containing the tag for the version of the
         | page it has cached. The server should compare the incoming tag
         | with the tag for the current version and return a "304 Not
         | Modified" response if the tags match or a full response with
         | the newer tag in the ETag header if they don't.
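
The ETag exchange described above can be sketched on the server side roughly as follows. This is a minimal illustration, not any particular framework's API: `make_etag` and `respond` are hypothetical names, hashing the body is only one common way to derive a tag, and splitting If-None-Match on commas assumes the tags themselves contain no commas.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the body (assumption: a content hash is used)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET of a page whose current content is `body`."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match:
        # The client may send several tags; compare ignoring the weak "W/" prefix.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
        if "*" in candidates or etag in candidates:
            # The cached copy is still current: send headers only.
            return 304, headers, b""
    # No tag sent, or the page changed: send the full page with the new tag.
    return 200, headers, body
```

A first visit passes no tag and gets a 200 with an ETag; a later visit echoes that tag in If-None-Match and gets a 304 until the body changes.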
        
           | tshaddox wrote:
           | Yeah, and it works the same way with If-Modified-Since and
           | Last-Modified.
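
The date-based variant mentioned above might look like this. Again a sketch under assumptions: `respond_by_date` is a hypothetical helper, and the caller is assumed to pass a timezone-aware UTC datetime for the page's modification time.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_by_date(body: bytes, last_modified: datetime, if_modified_since=None):
    """Return (status, headers, payload) using Last-Modified / If-Modified-Since."""
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: treat it as absent
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers, b""
    return 200, headers, body
```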
        
         | tyingq wrote:
         | It's a combination of different headers that's hard to sum up
         | in a short comment. A good article on the subject should talk
         | about all these headers: Expires, Cache-Control, ETag, Pragma,
         | Vary, Last-Modified
         | 
         | Key CDN has an article on it. They certainly would have
         | experience and expertise there. I didn't read the whole thing,
         | but it seems to have it covered:
         | https://www.keycdn.com/blog/http-cache-headers
         | 
         | There are also some interesting exceptions where rules aren't
         | followed. Like browsers typically have a completely separate
         | cache for favicons. I suppose because they use the icons in
         | funny/different ways, like bookmarks.
         | 
         | There are also sometimes proxies (especially corporate MITM
         | ones) that don't follow the rules. Hence the popularity of
         | cache-busting parameters like you described.
        
         | bandie91 wrote:
         | As I understand the original design of HTTP, each resource
         | may state in its response headers how long it can be cached,
         | and the client (browser, proxy, etc.) does not have to
         | re-request the resource before it expires. This is the
         | standard, so you cannot hint, in a standard way, that a
         | resource has to be revalidated. Several tricks have emerged
         | since then, like the timestamped URL approach you mentioned.
         | However, I'm not sure to what extent it is standardized in
         | clients to understand that "/path?query" is somehow related to
         | "/path", because originally the request string (path and URL
         | parameters) was opaque to the HTTP client, so they should be
         | cached independently. Things have obviously changed since
         | then. The method I use is to fire an Ajax (XHR) request with a
         | Cache-Control header (yes, it is a request header too) at the
         | URL that has to be refreshed, then display the response
         | content or redirect to it.
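
The idea of sending Cache-Control as a request header can be sketched outside the browser too. The commenter does this with XHR; the Python below is only an analogue, `build_revalidating_request` is a hypothetical helper, and the URL in the usage note is a placeholder. The `no-cache` request directive asks any cache on the path to revalidate with the origin before reusing a stored response.

```python
from urllib.request import Request


def build_revalidating_request(url: str) -> Request:
    """Build a GET request that asks caches to revalidate before answering."""
    # Cache-Control is valid on requests as well as responses.
    return Request(url, headers={"Cache-Control": "no-cache"})
```

Passing the result to `urllib.request.urlopen` would send it, for example `urlopen(build_revalidating_request("http://example.test/new.html"))`. Note that urllib keeps no cache of its own, so here the header only affects intermediaries.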
        
           | scottlamb wrote:
           | > However, I'm not sure to what extent it is standardized in
           | clients to understand that "/path?query" is somehow related
           | to "/path", because originally the request string (path and
           | URL parameters) was opaque to the HTTP client, so they should
           | be cached independently. Things have obviously changed since
           | then.
           | 
           | It hasn't changed. Those are still cached completely
           | independently by the user agent. The ?time=... cache busting
           | trick is meant to produce a cache key that's never been used
           | before, thus requiring a fresh request. The new request
           | doesn't clean up the cache entries for the old URLs; it just
           | doesn't use them. That's one reason it's better to use ETag
           | and such to make the caches work properly, rather than fight
           | them with this trick.
           | 
           | On many servers, if new.html is a static file, the same
           | entity is produced regardless of parameters. But the user
           | agent doesn't know this.
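
A toy model of the point above, that the full URL is the cache key. `cache_bust` and `ToyCache` are illustrative names only; a real browser cache is far more involved, but the keying behaviour is the part being shown.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_bust(url: str, stamp=None) -> str:
    """Append (or replace) a ?time=... parameter so the URL has never been seen before."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "time"]
    query.append(("time", str(int(time.time()) if stamp is None else stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


class ToyCache:
    """A cache keyed on the exact URL string, query included."""

    def __init__(self):
        self.entries = {}

    def get(self, url):
        return self.entries.get(url)

    def put(self, url, body):
        self.entries[url] = body
```

After storing `/new.html`, a lookup for `cache_bust("/new.html")` misses and forces a fresh request, while the old `/new.html` entry stays in the cache untouched.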
        
         | bawolff wrote:
         | Cache-Control: max-age=0, must-revalidate
         | 
         | Sounds like what you want (presuming your server handles 304
         | logic correctly)
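
How a cache might read that header can be sketched as follows. With `max-age=0` the stored copy is kept but is never fresh, so every visit triggers a conditional request, which the server can answer with a 304 when nothing changed. `parse_cache_control` and `is_fresh` are hypothetical helpers; the sketch ignores Expires, heuristic freshness, and quoted values containing commas.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """True if a stored response may be reused without contacting the server."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False  # simplification: no other freshness source is considered
    # Fresh only while the response's age is below its freshness lifetime.
    return age_seconds < int(max_age)
```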
        
           | forgotmypw17 wrote:
           | I do want caching to happen, however -- until something
           | changes the page.
        
       | scottlamb wrote:
       | I appreciate that the Cache-Status: header they describe uses RFC
       | 8941 structured fields and thus ";" to separate items within each
       | cache and "," between caches. It's like someone put effort into
       | making it easy to parse.
       | 
       | Rant time: I just finished writing a state machine parser for
       | "WWW-Authenticate:" and "Proxy-Authenticate:". Those headers use
       | comma both to separate challenges and to separate parameters
       | within a challenge, which just seems mean-spirited. Other things
       | about HTTP authentication that seem mean, dumb, annoying, or all
       | of the above: both the RFC 2069 example response and the RFC 7616
       | SHA-512-256 example response are calculated incorrectly; RFC
       | 7616's userhash field seems to require the server to do
       | O(users_in_database) hashes to know what user to operate on; RFC
       | 7235's challenge grammar describes a token68 syntax that really
       | only is used for the credentials in basic, never a challenge; RFC
       | 7616 drops backwards compatibility for RFC 2069 even though I
       | bought a product this year that still uses RFC 2069-style
       | calculations; and it's based on old standards that followed "be
       | conservative in what you do, be liberal in what you accept from
       | others" so RFC 7230 section 7 has separate grammars for what
       | lists you must send and what lists you must accept, which further
       | complicates parsing the nested lists.
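
To illustrate why the ";" and "," split in Cache-Status is pleasant to work with, here is a deliberately simplified reader. It is not a full RFC 8941 parser: it assumes cache names are plain tokens and that no quoted string contains "," or ";". `parse_cache_status` and the cache names in the usage note are made up for the example.

```python
def parse_cache_status(value: str) -> list:
    """Return [(cache_name, {param: value})], one entry per cache in the header."""
    caches = []
    for member in value.split(","):  # "," separates caches
        name, *params = [piece.strip() for piece in member.split(";")]  # ";" separates items
        if not name:
            continue
        parsed = {}
        for param in params:
            if not param:
                continue
            key, sep, val = param.partition("=")
            # A bare parameter such as "hit" is a boolean true in structured fields.
            parsed[key] = val if sep else True
        caches.append((name, parsed))
    return caches
```

For a value such as `OriginCache; hit; ttl=1100, EdgeCache; fwd=uri-miss`, this yields one entry per cache with its parameters, and no state machine is needed to tell the two separators apart.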
        
       ___________________________________________________________________
       (page generated 2021-10-21 23:00 UTC)