[HN Gopher] HTTPWTF ___________________________________________________________________ HTTPWTF Author : pimterry Score : 477 points Date : 2021-03-04 15:27 UTC (7 hours ago) (HTM) web link (httptoolkit.tech) (TXT) w3m dump (httptoolkit.tech) | [deleted] | bombcar wrote: | Referer being spelled wrong - I KNEW something was wrong about it | every time I saw it but it never actually clicked. | indentit wrote: | I just figured it was one of those words spelt differently in | American English, which most RFCs etc are written in. (British | English native here.) | sophacles wrote: | > (British English native here.) | | That's why you spelled 'spelled': _spelt_ :D | grishka wrote: | It's a bit infuriating when English isn't your native language | because I could never remember the right spelling. | Zash wrote: | Since we're sharing our own WTFs: | | You can include the same header multiple times in an HTTP message, | and this is equivalent to having one such header with a comma- | separated list of values. | | Then there's WWW-Authenticate (the one telling you to re-try with | credentials). It has a comma-separated list of parameters. | | The combination of those two leads to brokenness, like how | recently an API thing would not get Firefox to ask for username | and password, because it happened to have put "Bearer" before | "Basic" in the list. | | https://tools.ietf.org/html/rfc7235#section-4.1 | superhawk610 wrote: | This article [1] is a really great read on some of the pitfalls | you encounter due to the way duplicate headers are parsed in | different browsers (skip to "Let's talk about HTTP headers" if | you want to jump right into the code). | | [1]: https://fasterthanli.me/articles/aiming-for-correctness- | with... | richdougherty wrote: | And some headers have their own exceptions to this. | | The Set-Cookie header (sent by the server) should always be | sent as multiple headers, not comma separated, as user agents | may follow Netscape's original spec.
| https://stackoverflow.com/questions/2880047/is-it-possible-to- | set-more-than-one-cookie-with-a-single-set-cookie | https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set- | Cookie | | On the other hand in HTTP/1.1 the Cookie header should always | be sent as a single header, not multiple. In HTTP/2, they may | be sent as separate headers to improve compression. :) | https://stackoverflow.com/questions/16305814/are-multiple- | cookie-headers-allowed-in-an-http-request | wincy wrote: | I'm guessing based on the username OP is the original author, | caught a typo that could trip a novice up if they're reading : | | This becomes useful though if you send a request including a | Except: 100-continue header. That header tells the server you | expect a 100 response, and you're not going to send the full | request body until you receive it. | | I'm guessing that should be Expect? | | Overall interesting article, thanks for writing it! | plmpsu wrote: | In the same section, there's also a reference to the 101 status | instead of 100. | pimterry wrote: | Another good spot, now also fixed, thanks! | pimterry wrote: | Good catch! Thanks for that, now fixed. | wrboyce wrote: | Oh, hey Tim! Hope life is treating you well! | pimterry wrote: | Haha, hey Will! The internet is a small world :-) | 0xy wrote: | This is a little bit of a tangent but I sure love working with | Websockets. It really feels like Websockets are what HTTP | should've been. Asynchronous realtime communication. | | When things happen on the site and it's shown to the customer | immediately via WS, it's just a delightful experience. | Jach wrote: | Web goes round and round. ActiveX, Java applets, and Flash all | supported sockets. It is nice that we can have them now without | such things, but it's not like they're only a boon with no | tradeoffs. | gmfawcett wrote: | In its day, I don't think HTTP would ever have escaped orbit if | it had been designed as a stateful protocol, like Websockets. 
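The Set-Cookie exception discussed earlier in the thread is easy to demonstrate. Below is a minimal Python sketch (the header names and cookie values are made up for illustration) of why the usual comma-folding rule for repeated headers is lossy for Set-Cookie:

```python
# Sketch: why naive comma-folding of repeated headers breaks Set-Cookie.
# Repeated headers are generally equivalent to one comma-joined header,
# but Set-Cookie values can themselves contain commas (in Expires dates),
# so folding them makes the individual cookies impossible to split apart.

def fold_headers(pairs):
    """Naively fold repeated headers into comma-separated values."""
    folded = {}
    for name, value in pairs:
        key = name.lower()
        folded[key] = f"{folded[key]}, {value}" if key in folded else value
    return folded

# Fine for most headers:
folded = fold_headers([("Accept", "text/html"), ("Accept", "application/json")])
print(folded["accept"])  # text/html, application/json

# Lossy for Set-Cookie: the Expires date contains a comma, so after
# folding there is no unambiguous way to recover the two cookies.
cookies = [
    ("Set-Cookie", "a=1; Expires=Thu, 01 Jan 2026 00:00:00 GMT"),
    ("Set-Cookie", "b=2"),
]
print(fold_headers(cookies)["set-cookie"])
# a=1; Expires=Thu, 01 Jan 2026 00:00:00 GMT, b=2  -- ambiguous
```

This is why Set-Cookie must stay as separate header lines, while most other headers can be folded either way.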
| luhn wrote: | imo, Server-Sent Events are the better solution for realtime | updates. Sometimes you need the stateful bidirectional protocol | WebSockets offer, but most of the time HTTP for RPC and SSE for | streaming updates gets you where you need to go with standard | HTTP, no special protocols. | jayd16 wrote: | They screwed this all up. | | Private? No. Cache! | tpetry wrote: | Hmm, 'no-cache' meaning 'please cache it' may in reality be the reason why, | in the past, those damn Internet Explorers cached | ajax responses and the only way to solve it was to append a | random query parameter? | s3cur3 wrote: | Reading that, I finally understood so many hours of debugging | throughout my life. | deathanatos wrote: | I'd add to this list: | | Chunk extensions. Most people know HTTP/1.1 can return a | "chunked" response body: it breaks the body up into chunks, so | that we can send a response whose length we don't know in | advance, but also it allows us to keep the connection open after | we're done. What most people _don't_ know is that chunks can | carry key-value metadata. The spec technically requires an | implementation to at least parse them, though I think it is | permitted to ignore them. I've never seen anything ever use this, | and I hope that never changes. They're gone in HTTP/2. (So, also, | if you thought HTTP/2 was backwards compatible: not technically!) | | The "Authorization" header: like "Referer", this header is | misspelled. (It should be spelled "Authentication".) Same | applies to "401 Unauthorized", which really ought to be "401 | Unauthenticated". ("Unauthorized" is "403 Forbidden", or | sometimes "404 Not Found".) | | Also, header values: they basically require implementing a | custom string type to handle correctly; they're a baroque mix of | characters & "opaque octets". | varajelle wrote: | Also the ridiculous User-Agent header which everyone spoofs.
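The chunk metadata mentioned above lives on the chunk-size line of each chunk. A minimal Python sketch of parsing one such line (the "trace" extension name is invented for illustration; real traffic almost never carries extensions):

```python
# Sketch: parsing an HTTP/1.1 chunk-size line, including the optional
# chunk extensions (";name=value" metadata after the hex size).
# Implementations must parse past extensions but may ignore them.

def parse_chunk_header(line: bytes):
    """Return (size, extensions) from a raw chunk-size line."""
    line = line.rstrip(b"\r\n")
    size_part, _, ext_part = line.partition(b";")
    size = int(size_part, 16)  # chunk size is hexadecimal
    extensions = {}
    if ext_part:
        for ext in ext_part.split(b";"):
            name, _, value = ext.strip().partition(b"=")
            extensions[name.decode()] = value.decode()
    return size, extensions

print(parse_chunk_header(b"1a;trace=abc123\r\n"))  # (26, {'trace': 'abc123'})
print(parse_chunk_header(b"4\r\n"))                # (4, {})
```

A chunked body is then just a sequence of these size lines, each followed by that many bytes of data and a CRLF, terminated by a zero-size chunk.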
| [deleted] | [deleted] | jsmith45 wrote: | Those key value pairs in chunked encoding ("chunk extensions") | are spec'ed to only be hop-by-hop, which makes them more or | less completely unsuitable for actual use by end | applications. Any proxy or reverse proxy is allowed to strip | them. Indeed it can be argued that a conformant proxy is | required to strip them, due to the MUST-ignore requirement for | unknown extension values. (I suspect most do not strip them, and there | is an argument to be made that blindly passing them through if | not changing encoding could be considered ignoring them, but | I'm not certain that is actually a conforming interpretation). | | Plus surely there are many crusty middleboxes that will break | if anybody tried to use that feature. Remember all the hoops | websockets had to jump through to have much of a chance of working | for most people because of those? Many break badly if anything | they were not programmed to handle tries to pass through. | anticristi wrote: | I used this tons for MJPEG streams. | bluesmoon wrote: | Another thing to note about custom headers is that when used in | an XHR (eg: X-Requested-With), they will force a preflight | request (with the OPTIONS method). If your webserver isn't | configured to handle OPTIONS and return the correct CORS headers, | that will effectively break clients. | | Best to just never use custom headers. | | I've written more about this here: | https://developer.akamai.com/blog/2015/08/17/solving-options... | airza wrote: | I see this a lot as an anti-CSRF technique in AJAX-based SPAs. | bluesmoon wrote: | yeah, those techniques predate CORS, but even back then, | you'd typically add your anti-csrf token to the payload | rather than the header. CSRF is application-level logic | rather than protocol-level. | uuidgen wrote: | > they will force a preflight request | | That's why they're so great. Use a custom header and never | worry about CSRF issues.
| | Use a custom header and be sure that if a request comes from the | browser, it was made by legitimate code from your origin. | pimterry wrote: | Yep, you've got to be careful with browser HTTP requests! | Conveniently on this very same site I built a CORS tool that | knows all those rules and can tell you how they work for every | case: https://httptoolkit.tech/will-it-cors/ | xg15 wrote: | > _HTTP 103: When the server receives a request that takes a | little processing, it often can't fully send the response | headers until that processing completes. HTTP 103 allows the | server to immediately nudge the client to download other content | in parallel, without waiting for the requested resource data to | be ready._ | | Could someone explain why this needs a new status code at all? At | the point where the new status code sends "early headers", the | client was expecting the regular status code and headers anyway. | Why could the server not simply do: | | 1) Receive request | | 2) Send 200 OK and early headers, but only send a single trailing | newline (i.e., terminate the status line and last early header | field, but don't terminate the header list as a whole) | | 3) Do the actual request processing, heavy lifting, etc | | 4) Send remaining headers, double-newline and response body, if | any. | | On the client side, a client could simply start to preload link | headers as soon as it receives them, without waiting for the | whole response. | | This seems like it would lead to pretty much the same latency | characteristics without needing to extend the protocol. | | The only major new ability I see is to send headers before the | (final) status code. But what would be the use-case for that? | | Edit: | | The RFC[1] sheds some light on this: The point seems to be that | the headers sent in a 103 are only "canon" if they are repeated | in the final response.
So a server could send a link header as an | early hint, then effectively say "whoops, disregard that, I | changed my mind" by _not_ sending the header again in the final | response. | | I still don't see a lot of ways a client could meaningfully | respond to that, but I guess it could at least abort preloading | to save bandwidth or purge the resource from the cache if it was | already preloaded. | | [1] https://tools.ietf.org/html/rfc8297#section-2 | derefr wrote: | As with a 100, a 103 is _tentative_ -- it doesn't guarantee | that the final result will be 2xx. This can happen if e.g. your | web server is responsible for sending the early hints, before | proxying to your app server. | staplung wrote: | I used to encourage back-end web developers to write a web server | from scratch as a learning exercise. With HTTP/1.1 it was | actually pretty easy to write one in C (plus Berkeley sockets); | the idea being that you learn a lot about how things actually | work at the lowest level without spending an inordinate amount of | time. It's not really practical with HTTP/2 anymore, but in any | case, having done my own exercise I had no idea about many of | these quirks. | jventura wrote: | I teach web development and distributed systems in a local | university, and one of my lab exercises is building an HTTP/1.0 | server in Python with sockets. I do have a blog post [1] that | shows how to do it if someone's interested. | | [1] https://joaoventura.net/blog/2017/python-webserver/ | smoldesu wrote: | https://doc.rust-lang.org/book/ch20-00-final-project-a-web-s... | | The Rust Book has an awesome "final project" where it walks you | through building a multi-threaded web server. If you're a | battle-hardened C/C++ dev looking for an inroad to Rust, this | is a great place to start. | cosmodisk wrote: | Never touched Rust but having skimmed through this, looks | like a fantastic tutorial. | steveklabnik wrote: | Thank you!
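For anyone tempted by the write-a-server-from-scratch exercise mentioned above, here is a deliberately naive HTTP/1.0 sketch in Python with raw sockets -- single connection, no concurrency, no error handling, just enough to see the protocol on the wire:

```python
# Minimal HTTP/1.0 server sketch: one blocking accept, one request,
# one response. Naive on purpose -- a teaching toy, not production code.
import socket

def build_response(body: bytes) -> bytes:
    """Assemble a bare-bones HTTP/1.0 response with the given body."""
    return (b"HTTP/1.0 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: " + str(len(body)).encode() + b"\r\n"
            b"\r\n" + body)

def serve_once(host="127.0.0.1", port=8080):
    """Accept a single connection, answer it, and exit."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(65536)      # naive: assume one read suffices
            path = request.split(b" ")[1]   # request line: "GET /path HTTP/1.0"
            conn.sendall(build_response(b"You asked for " + path))

# Run serve_once() and try it with e.g.:  curl -0 http://127.0.0.1:8080/hello
```

Even at this size it surfaces the wire details the thread is about: CRLF line endings, the blank line terminating the header block, and Content-Length framing.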
| cmehdy wrote: | Seems like you are the author of the book. Just wanted to | say that this book makes me want to pick up Rust even | though I have no specific goal for it, because the book | is appealing in writing and appearance, layout and | illustrations, ideas and execution.. basically good job | and thank you! | steveklabnik wrote: | One of two authors. I'll share this with my co-author, | thanks a ton :) | smoldesu wrote: | I'm also here to worship your work! The Rust book is one | of my favorite documentations around, and just the other | day I sent it to a colleague who was interested in | learning Rust. Even though he only had experience in | Typescript and Java, he made a working chess engine less | than three days later. | chucke wrote: | This reminded me of a post I wrote, a couple of years ago: | https://honeyryderchuck.gitlab.io/httpx/2019/02/10/falacies-... | banana_giraffe wrote: | My current favorite is chunked encoding. | | Does amazon.com really make its page more performant by sending | 25 chunks less than 2k, some less than 50 bytes, while I'm trying | to grab 115k for a page? | | It's all so weird to me. | marcosdumay wrote: | It may make a huge difference for their servers, which can | generate 2kB of data and send it to you right away, instead of | generating the entire 115kB before they can send it. | | Those 2kB are a bit too small for top network performance, so | you may see a negative impact. But if they increase it to | something like 10kB, it's harmless. | banana_giraffe wrote: | I can get it sometimes, but some sites are just bizarre. | twitter.com sends out a few dozen 74 byte chunks. I can't | find it now, but I've seen pages composed of chunks of 10 or | 20 bytes big. So much overhead. | marcosdumay wrote: | Some frameworks make it really easy to create reusable code | that calculates something, pushes it into the network, and | returns to the rest of the page. | | You are right that it's not a great thing to do.
A little | bit of buffering on the sender can improve things a lot. | But it's an easy thing to do, so people do it. | halter73 wrote: | Usually it's because the app doesn't know the length of the | entire response body up front and wants to start sending the | response before buffering the whole thing. The 50 byte chunks | probably aren't that useful, but that can happen as a | consequence. Something like nagling can prevent those small | chunks, but then there would likely be higher latency. | zlynx wrote: | From my experience (not with amazon) these strange chunk sizes | come from non-blocking IO. When a source gets some data and | triggers select/poll/epoll/whatever, the callback (or | equivalent) immediately writes it out as a chunk. | | This works even better in HTTP/2 or HTTP/3 / QUIC. A Go server | reading from a lot of microservices can produce pretty weird | output on HTTP/2 because now not only is it in odd sizes | determined by network timing, it doesn't even need to be in | order. | derefr wrote: | I've been searching for a while for a good way to know whether a | client has disconnected in the middle of a long-running HTTP | request. (We do heavyweight SQL queries of indeterminate length | in response to those requests, and we'd like to be able to cancel | them, rather than wasting DB-server CPU cycles calculating | reports nobody's going to consume.) | | You can't actually know whether the outgoing side of a TCP socket | is closed, unless you write something to it. But it's hard to | come up with something to write to an HTTP/1.1-over-TCP socket | _before_ you respond with anything, that would be a valid NOP | according to all the protocol layers in play. (TCP keepalives | _would_ be perfect for this... if routers didn't silently drop | them.) | | But I guess sending an HTTP 102 every second or two could be used | for exactly this: prodding the socket with something that | middleboxes _will_ be sure to pass back to the client.
| | If so, that's awesome! ...and also something I wish could be | handled for me automatically by web frameworks, because getting | that working sounds kind of ridiculous :) | mikl wrote: | Lampooning the Cache-Control header is all fun and games, but | remember it was designed in a time when the Internet in big | organisations was often behind a caching proxy like Squid. With | that in mind, the explanations at | https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ca... | make good sense. | djrogers wrote: | No, not really - those explanations are nothing more than | what's in TFA, and they don't help it make any more sense. | There's no reason in the world why no-cache should mean 'cache | this please but with caveats', when we have all of the English | language available to come up with an alternative header. | cbsmith wrote: | Let me try this... | | The "no-cache" directive was a hint not about caching the _content_ but | about caching subsequent _requests_, and could optionally | specify specific fields that would indicate that a new | candidate request needed to be sent to the server as the | content might be different. There's this reality that just to | render content, the browser effectively _must_ have a cached | copy of the content, so the notion that the response wouldn't | be cached wasn't really even in the cards. Whether you | used the cache or not was a decision made at the time you | were sending a request, not when you were consuming the | response. | | The "no-cache" directive meant, "hey, don't check for a | cached copy of the content, just go fetch new content". It | was often used by analytics pieces so that the server could | count how often content was looked at. | | Back in the day you had terrible latencies (particularly over | dialup).
You also had issues with horribly asymmetric | bandwidth that meant the data you _sent_ could become the | bandwidth bottleneck (outbound bandwidth constraints would | mean ACK packets would get queued up, delaying downloads even | when you had plenty of download bandwidth), and of course | HTTP requests weren't terribly compact, so this could really | make a big difference. | | Caching requests was a big deal. Performance could be | improved significantly by "cheating" and just not sending a | new request, and this led to some very aggressive caching | strategies. The "check if the content really _is_ different, | and just use the original copy if they aren't" hack was a pretty | common one. If nothing else, it saved the browser the | overhead of re-rendering the page and the accompanying | annoying user experience of seeing the re-render. | | The original protocol didn't have any notion of no-store, and | specifically mentioned that "private" didn't really provide | privacy, but more that the content should be "private" in the | sense that only the browser itself should store the content. | Again, there's an assumption that the browser is going to put | everything it gets into a "cache", because it has to. | | You could use "max-age", but a lot of caches would still | shove the object in their cache and only expire it on a FIFO | basis or when a new request was to be sent (and it was | vulnerable to clock skew problems). Sounds dumb, but it was | the kind of dumb that kept code simple and worked pretty | well. | | So now that the practices were in place, you needed a _new_ | directive to say, "hold up, that old approach is NOT a good | idea here". So they came up with "no-store" as a way to say, | "don't even put it in the cache in the first place". | the_duke wrote: | Well, I can see how someone fixated on a "store" vs "cache" | terminology might arrive at that name. | | Browsers store, proxies cache, so it should be no-cache, | obviously!
| | Sure, it's stupid, but naming is hard and these things happen | all the time. | markdog12 wrote: | I always liked this http caching article, done in a | conversational tone: https://jakearchibald.com/2016/caching- | best-practices/ | mikl wrote: | Caching in this context means "no need to ask the server for | a new copy of this within the cache lifetime". no-cache then | does what it says: You can store it if you like, but you need | to check with the server before reusing it. | | That might be a little counter-intuitive, but if you read the | definitions of the words, it does make sense. | tinyhitman wrote: | NotOnly-cache? | have_faith wrote: | @author. Every time I click anything in the page the whole page | flashes, assumedly React is re-rendering for some reason. As | someone who highlights text as I read it was quite an interesting | experience :) | pimterry wrote: | Hmm, that's very weird. I don't see it myself in the latest | Firefox or Chrome. What browser & OS? | have_faith wrote: | FireFox 85 on MacOS. It happens anywhere I click on the page, | the body text very quickly flashes off and back on again. | pimterry wrote: | Ok, thanks, I'll look into it. | ensignavenger wrote: | Happens for me too- Firefox 86 on Kubuntu 18.04 | jedberg wrote: | I've been running public web servers for decades, and almost all | of this was new information. Excellent article! | | Fun fact, reddit used to have 'X-Bender: Bite my shiny metal ass' | on every response. Sadly they seem to have removed it. | ketralnis wrote: | You're thinking robots.txt https://www.reddit.com/robots.txt | and it's still there | jedberg wrote: | Ah yes, bender was in the robots file. But we also had a | funny X-header. Maybe you can find it in GitHub in the | haproxy config. | grishka wrote: | Not there, but it does have "x-moose: majestic" | nitrogen wrote: | _> User-Agent: bender | | > Disallow: /my_shiny_metal_ass_ | | This was there when I checked just now; was it removed and | re-added? 
| grishka wrote: | Yes I saw that, but the parent comment said it was a | header ;) | richdougherty wrote: | Some HTTP headers support extended parameters, parameters with a | "*" after them, which allow character encoding of the header | value, e.g. in UTF-8. Confusingly, they also support sending both | regular and extended parameters in the same header. | https://tools.ietf.org/html/rfc5987 | | E.g. sending the file "naïve.txt" using the Content-Disposition | header: Content-Disposition: attachment; | filename=naive.txt; filename*=utf8''na%C3%AFve.txt | | https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Co... | The parameters filename and filename* differ only in that | filename* uses the encoding defined in RFC 5987. When both | filename and filename* are present in a single header field | value, filename* is preferred over filename when both are | understood. | aparks517 wrote: | What a delight! I implemented an HTTP server from scratch (well, | from RFC) in Objective-C some years ago. Many of these hit pretty | close to home. Lots of plot twists in those RFCs. | rank0 wrote: | > X-Requested-With: XMLHttpRequest - appended by various JS | frameworks including jQuery, to clearly differentiate AJAX | requests from resource requests (which can't include custom | headers like this). | | What does the author mean by this? Why can't a "resource request" | include custom headers? I am assuming that a "resource request" | is just a non-AJAX request. Any HTTP client should be able to | include whatever headers they want no matter the source. | abahlo wrote: | I think they mean requests from a browser. | achillean wrote: | Woah, I had no idea about these 100 responses. Looks like there | are quite a few of them on the Internet: | | https://beta.shodan.io/search/report?query=http.status%3A%3E...
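The filename/filename* pairing quoted above can be generated mechanically. A minimal Python sketch (the crude "?" ASCII fallback is a simplification; real implementations usually pick a nicer approximation for the plain filename parameter):

```python
# Sketch: building a Content-Disposition header with both the plain
# "filename" fallback and the RFC 5987 "filename*" extended form.
from urllib.parse import quote

def content_disposition(filename: str) -> str:
    """Return an attachment Content-Disposition value for `filename`."""
    fallback = filename.encode("ascii", "replace").decode()  # crude ASCII fallback
    encoded = quote(filename, safe="")                       # percent-encoded UTF-8
    return f"attachment; filename={fallback}; filename*=UTF-8''{encoded}"

print(content_disposition("naïve.txt"))
# attachment; filename=na?ve.txt; filename*=UTF-8''na%C3%AFve.txt
```

Old user agents that don't understand RFC 5987 fall back to the plain parameter, while newer ones prefer the starred one, which is exactly the behaviour the MDN excerpt above describes.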
| avolcano wrote: | This is both a great post and an effective ad - I've been looking | for a lighter-weight Postman alternative (and HTTPie, while nice, | is no substitute for a graphical UI for such a thing). Will check | HTTP Toolkit out! | pimterry wrote: | Thanks! It's a difficult balance to strike, I've taken to just | trying to write great HTTP articles and ignoring the | advertising angle entirely, seems to be working OK. | | Do try out HTTP Toolkit and let me know what you think, but | it's not a general-purpose HTTP client like Postman or HTTPie. | It's actually an HTTP debugger, more like | Fiddler/Charles/mitmproxy, for debugging & testing. A | convenient HTTP client is definitely planned as part of that | eventually, but not today. | avolcano wrote: | Ah, gotcha. I actually do have a good use case for that as | well (and do think they could go together nicely someday), so | I'll still check it out! | johns wrote: | Insomnia.rest | RMPR wrote: | If I'm not mistaken, Insomnia is also using Electron. I | wouldn't really put it as a lightweight alternative to | Postman. | grishka wrote: | In case anyone wants a native (Cocoa) REST client for macOS, | there's https://paw.cloud. It's paid, but they sometimes give | away free licenses for retweets, which is how I got mine. | notatoad wrote: | not that effective - I've also been on the lookout for this, | but without your comment wouldn't have realized that's what | this website was offering. | joeraut wrote: | Seconded. This was a great blog post, I think a more visible | plug at the end is more than justified. | anaphor wrote: | Another one is that it's technically valid to have a request | target of '*' for the HTTP OPTIONS request type. It's supposed to | return general information about the whole server. You can try it | out with e.g.
`curl -XOPTIONS http://google.com --request-target | '*'` | | Nginx gives you a 400 Bad Request response, Apache does nothing, | and other servers vary in whether they return a non-error code. | | https://curl.se/mail/lib-2016-08/0167.html | | https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html | bch wrote: | I'm trying this out of curiosity, but getting (e.g.): | HTTP/1.0 400 Invalid HTTP Request | | My mistake, or are there other working end-points out there (I | tried google, yahoo, and cbc.ca)? | PebblesRox wrote: | "X-Clacks-Overhead: GNU Terry Pratchett - a tribute to Terry | Pratchett, based on the message protocols within his own books." | | I'll enjoy knowing this next time I reread Going Postal :) | Twirrim wrote: | With nginx it's as easy as: add_header | X-Clacks-Overhead "GNU Terry Pratchett"; | | in your server{} block. | sophacles wrote: | Here's a site about it: http://www.gnuterrypratchett.com/ with | snippets (etc) to configure it into your servers and apps. ___________________________________________________________________ (page generated 2021-03-04 23:00 UTC)