[HN Gopher] Show HN: Micro HTTP server in 22 lines of C
___________________________________________________________________
Show HN: Micro HTTP server in 22 lines of C
Author : jpegqs
Score : 182 points
Date : 2021-07-31 11:07 UTC (11 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| milansuk wrote:
| That's the beauty of original HTTP - simplicity. Same as parsing
| HTML (in the 90s). With HTTPS (S as in Secure) it's a whole
| different story and most programmers use some library.
| Arnavion wrote:
| But HTTPS just adds TLS. You can use "some library" to do the
| TLS handshake and subsequent encryption, and end up with a
| readable-writable stream that you can then parse HTTP from
| yourself. Your code is the same as when it was dealing with a
| TCP stream directly.
| secondcoming wrote:
| There's nothing simple about HTTP. It looks like it should be
| simple, but it isn't.
| darnir wrote:
| HTTP/1.x is anything but simple. The specs were underdefined and
| overly complex in many ways. The original RFC was so complex
| that when reworked, they split it into 6 documents.
|
| I've worked heavily on some HTTP implementations and it's
| ridiculously hard to get them right.
|
| Not to mention, this "server" only responds to a simple,
| well-formed GET request, without handling about 90% of what the
| HTTP specifications talk about. It's a nice project, but it
| doesn't speak to the simplicity of HTTP.
| kevinoid wrote:
| I agree. As with many things, it's only simple as long as you
| ignore the complexities. As they say, the devil's in the
| details.
|
| > this "server" only responds to a simple well formed GET
| request.
|
| And not even that. The Request-URI in a Simple-Request line
| (inherited from HTTP/0.9) may contain escape characters
| (e.g. `GET /my%20file.txt` to get `my file.txt`). HTTP/1.0
| states "The origin server must decode the Request-URI in
| order to properly interpret the request."[1] This server does
| not.
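A minimal sketch of the decoding step described above, assuming the request path sits in a writable, NUL-terminated buffer. The names `hex_value` and `percent_decode` are hypothetical and are not part of the posted server. A real server would also have to run its path checks (such as the "/." filter) after decoding, not before.

```c
#include <stddef.h>

/* Hypothetical helper: numeric value of one hex digit, or -1 if not hex. */
static int hex_value(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode %XX escapes in place.
 * Returns the decoded length, or -1 on a malformed escape or on %00
 * (an embedded NUL would silently truncate the path later on).
 * On failure the buffer may be partially rewritten. */
static int percent_decode(char *s) {
    const char *in = s;
    char *out = s;
    while (*in) {
        if (*in == '%') {
            int hi = hex_value((unsigned char)in[1]);
            /* Only look at in[2] if in[1] was a valid digit (not the NUL). */
            int lo = hi < 0 ? -1 : hex_value((unsigned char)in[2]);
            if (hi < 0 || lo < 0) return -1;
            if (hi == 0 && lo == 0) return -1;
            *out++ = (char)(hi * 16 + lo);
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return (int)(out - s);
}
```

Decoding in place works because the output is never longer than the input, so the write pointer cannot overtake the read pointer.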
|
| Which is not to say that this server isn't interesting. Just
| that it's not a demonstration of how easy HTTP/1 is to parse.
|
| [1]: https://www.w3.org/Protocols/HTTP/1.0/spec.html#Request-URI
| nly wrote:
| No, parsing HTTP/1.x is a nightmare and definitely not simple.
| It wasn't even particularly well defined until 2014 when the
| original RFCs were modernized, and even now there are bugs
| reported in HTTP parsers all the time.
|
| Node.js came out in 2009, a full ten years after HTTP/1.1 (RFC
| 2068) and its original http-parser is rather hard to follow,
| doesn't conform to the RFCs for performance reasons, and is
| considered unmaintainable by the author of its replacement[0]
|
| As for parsing HTML, well go look at how Cloudflare have
| stumbled[1]
|
| [0] https://github.com/nodejs/llhttp
|
| [1] https://blog.cloudflare.com/incident-report-on-memory-leak-c...
| ibraheemdev wrote:
| > Node.js came out in 2009, a full ten years after HTTP/1.1
| (RFC 2068) and its original http-parser is full-on spaghetti
| code, doesn't conform to the RFCs for performance reasons,
| and is considered unmaintainable by the author of its
| replacement
|
| That's because of the way the parser is written. There are
| other simpler parsers that are much more readable.
| na85 wrote:
| Seems like it's yet another example of the node ecosystem
| being amateur hour, rather than a problem with HTTP.
| jart wrote:
| I'm the author of the fastest open source HTTP server.
| Parsing HTTP 0.9, 1.0, and 1.1 is trivial. It's a walk in the
| park. It only takes about a hundred lines of code to create a
| proper O(n) parser.
| https://github.com/jart/cosmopolitan/blob/0b317523a0875d83d6...
|
| The Joyent HTTP parser used by Node is very good but it's
| implemented in a way that makes the problem much more
| complicated than it needs to be. The biggest obstacle with
| high-performance HTTP message parsing is the case-insensitive
| string comparison of header field names.
| Some servers like thttpd do the naive thing and just use a long
| sequence of strcasecmp() statements. Joyent goes "fast" because
| it uses callbacks, which effectively punts the problem to the
| caller, and, for a few select headers which it handles itself,
| like Content-Length, it uses this really complicated internal
| "h_matching" thing for doing painstakingly written out
| hardcoded character compares. Redbean solves the problem by
| using better computer science: perfect hash tables, thanks to
| the gperf command. That makes the API itself much more elegant
| since the parser can not only go faster but return a
| hash-table-like structure where individual headers can be
| indexed without performing string comparisons.
| ysleepy wrote:
| I consider header value parsing and URL parsing part of
| HTTP; those are certainly not trivial.
|
| The charset problems alone are a nightmare.
|
| Parsing the wire format is pretty breezy. (Don't forget
| trailers!)
| jart wrote:
| Trailers can be parsed by invoking the function using
| something along the lines of ParseHttpMessage((struct
| HttpMessage){.t = kHttpStateName}, p, n) where you just
| tell the parser to skip the first-line states. Charset
| isn't a nightmare either. Headers are ISO-8859-1 so you
| just say {0300 | c >> 6, 0200 | c & 077} to turn them
| into UTF-8. It's not difficult. It might be if you want
| to support MIME. But this is HTTP we're talking about. It
| was made to be simple! We're talking Internet engineering
| on the lowest difficulty setting. Implement a TCP or SIP
| stack if you want hard.
| mariusor wrote:
| I think that implementing a proper state machine for the
| header parsing with ragel would give a more comprehensive
| result than using gperf or even the handmade one from your
| code.
|
| I think there are already some versions of the ragel code
| online, but they might be for other target programming
| languages.
| jart wrote:
| I'm one of the authors of Ragel and I disagree with you.
| HTTP is trivial enough that you'd be better served
| writing the state machine yourself using a switch
| statement. See my GitHub link above for an example. The
| code easily ports to other languages, like Java. Lastly,
| when it comes to Ragel and gperf, they do two
| completely different things. Ragel would generate a
| prefix trie search in generated code which would have
| enormous code size compared to what gperf is doing, which
| is much faster. With gperf, you only need to consider
| exactly O(3) octets total to tell which header it is.
| After that, it does a single quick string compare to
| confirm it's one of the predetermined headers rather than
| some unknowable value.
| mariusor wrote:
| I apparently never paid enough attention to how in the
| spec there's a clearly defined list and I always assumed
| that a "parser" should handle all valid header*ish
| looking pairs.
|
| Based on this consideration I was thinking that the ragel
| state machine would generate faster code for the non
| happy path (invalid non-ascii, or other types of error)
| at least in the GOTO version.
|
| When working on the full list it makes perfect sense to
| check the minimum amount of bytes for identifying
| headers, so thank you for the clarifications, very
| informative. :)
| giancarlostoro wrote:
| Somewhat related but in the Python space of things: I love
| that Python has a standard for web frameworks so much so that
| you can build your own web framework that targets said
| standard and it can be deployed anywhere without getting lost
| in the weeds of parsing HTTP. For example FastAPI is directly
| an ASGI compliant framework, and it is known as one of the
| fastest Python web frameworks out there. Bottle I think is
| also a raw WSGI framework and it's all in one file. (ASGI is
| what became the natural progression for WSGI, think of it
| like the http package Rust wants to standardize).
| strictfp wrote:
| The whole idea behind Node.js was to write a super-efficient
| completely nonblocking http server in C, while keeping all
| the business logic in a simple scripting language.
|
| You should not expect the Node.js parser to be simple.
| hdjjhhvvhga wrote:
| The fact that someone wrote a parser that's hard to follow
| doesn't mean that parsing HTTP/1.x is extremely difficult.
| What is really hard is to construct a parser that is at the
| same time (1) fast, (2) complete, (3) secure. It is much
| easier to choose just two, compare e.g. the one based on
| Nginx[0] vs picohttpparser [1].
|
| [0] https://github.com/Samsung/http-parser/blob/master/http_pars...
|
| [1] https://github.com/h2o/picohttpparser/blob/master/picohttppa...
| a-dub wrote:
| for the basics, however, http/1.x is pretty simple. you can
| test webserver health by literally typing in the request.
|
| i suspect the complexity you speak of is similar to MIME.
| where SMTP/POP/IMAP are pretty simple, things got pretty
| hairy with the introduction of MIME, SASL and friends.
|
| i think, though, that most of the complicated stuff in http
| is optional, is it not? like if you don't send a header
| that compression is supported, the server won't compress...
| or am I misremembering?
|
| either way, simpler to understand from a packet capture
| than a grpc stream or spdy/http2 stream.
| jart wrote:
| Pretty much everything is optional if you stick to
| http/1.0. If you implement http/1.1 then you're required
| to do a lot of non-essential stuff like chunk encoding,
| pipelining, and provisionals which themselves are
| reasonably trivial too but they make the server code less
| elegant. If you want a protocol that's actually hard,
| implement SIP.
| jsjohnst wrote:
| I once heard that it's impossible to build a "spec
| compliant" IMAP4 library as the spec itself is
| contradictory. Don't have a reference to prove it, so I
| could be wrong.
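To give a feel for the "chunk encoding" jart lists among the HTTP/1.1 requirements, here is a rough sketch of framing one chunk of a `Transfer-Encoding: chunked` body: the payload size in hexadecimal, CRLF, the payload, CRLF. The function name `format_chunk` and its buffer-based interface are assumptions made for illustration and come from none of the servers discussed in the thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one chunk ("<hex length>\r\n<data>\r\n")
 * into out. Returns the number of bytes written, or -1 if out is too
 * small. The result is raw bytes and is NOT NUL-terminated. */
static long format_chunk(char *out, size_t out_size,
                         const char *data, size_t len) {
    /* The chunk size line is hexadecimal, without a "0x" prefix. */
    int head = snprintf(out, out_size, "%zx\r\n", len);
    if (head < 0 || (size_t)head >= out_size) return -1;

    size_t room = out_size - (size_t)head;
    if (room < 2 || len > room - 2) return -1;

    memcpy(out + head, data, len);
    memcpy(out + head + len, "\r\n", 2);
    return (long)((size_t)head + len + 2);
}
```

Calling it with a length of zero yields `0\r\n\r\n`, which is the zero-size last chunk followed by the blank line that ends a body with no trailers.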
| gberger wrote:
| I know this is just for fun and not intended for production use.
| But what could be potential exploits and vulnerabilities in this
| server?
| sneak wrote:
| GET ../../../etc/passwd
| asah wrote:
| sandbox it? e.g. docker, OpenBSD chroot?
| junon wrote:
| That's an environmental thing, the program itself can't
| protect against the class of attacks those sorts of
| environmental setups protect against.
| jcelerier wrote:
| that's not cross-platform tho. it should still be secure
| even if it was running on MS-DOS 5.0
| NieDzejkob wrote:
| Looks like you didn't even bother to test your claim.
| jpegqs wrote:
| I have provided protection against this.
| Someone wrote:
| I don't think it compiles on windows (netdb.h doesn't exist
| there, I think), so you're fine there, too, from a security
| viewpoint.
|
| However, if somebody did a quick and dirty "make it
| compile" port (include winsock2.h instead and, possibly,
| replace some functions/argument types), I think that would
| create security vulnerabilities because the _fopen_ on
| Windows might support using backslashes as path separators.
|
| Even if it doesn't, there's UNC paths
| (https://en.wikipedia.org/wiki/Path_(computing)#Universal_Nam...)
| to worry about.
|
| That made me wonder whether other OSes might have similar
| features. Reading
| https://pubs.opengroup.org/onlinepubs/007904975/basedefs/xbd...,
| I'm not sure that forbids Unix from doing something similar.
| It says
|
| _"A pathname that begins with two successive slashes may
| be interpreted in an implementation-defined manner,
| although more than two leading slashes shall be treated as
| a single slash."_
|
| That opens the door for doing special things for paths that
| start with //, for example by supporting
| "//machine:foo/bar/baz" on clusters.
| jart wrote:
| GET %C0%AE%C0%AE/%C0%AE%C0%AE/%C0%AE%C0%AE/etc/passwd
| Matthias247 wrote:
| Regarding availability: It only handles a single connection,
| and has no timeouts.
| If someone just connects, and does nothing else, the server
| will be unavailable.
| SahAssar wrote:
| For the interested: this is called (or at least similar to) a
| slow loris attack:
| https://en.wikipedia.org/wiki/Slowloris_(computer_security)
| jpegqs wrote:
| I tried to make it secure and protect from such things. If
| someone finds vulnerabilities, please let me know.
| jijji wrote:
| use strncpy() instead of strcpy()
| jpegqs wrote:
| It's calculated that strcpy() should never cause a buffer
| overflow here.
| throwaway984393 wrote:
| You should still never use functions which have well
| known security flaws if there is a widely available
| alternative which avoids the flaws. Secure programming
| isn't just about calculating whether your current code
| has a bug, it's also about writing code that avoids bugs.
| astrobe_ wrote:
| Thank you Mr Weekend Secure Programming Expert.
| _strncpy()_ has equally dangerous semantics, though.
| jart wrote:
| strncpy() isn't dangerous. People have their heads so
| twisted around muh security that they don't even know
| what the function was intended to do. The purpose of
| strncpy() is to prepare a static search buffer so you can
| do things like perform binary search:
|
|     #include <string.h>  /* strncpy */
|
|     /* ARRAYLEN() and READ64BE() are assumed to be helper macros
|        defined elsewhere (element count; big-endian 64-bit load);
|        they are not standard C. */
|     static const struct People {
|       char name[8];
|       int age;
|     } kPeople[] = {
|         {"alice", 29},  //
|         {"bob", 42},    //
|     };
|
|     int GetAge(const char *name) {
|       char k[8];
|       int m, l, r;
|       l = 0;
|       r = ARRAYLEN(kPeople) - 1;
|       strncpy(k, name, 8);  /* zero-pads k to exactly 8 bytes */
|       while (l <= r) {
|         m = (l + r) >> 1;
|         if (READ64BE(kPeople[m].name) < READ64BE(k)) {
|           l = m + 1;
|         } else if (READ64BE(kPeople[m].name) > READ64BE(k)) {
|           r = m - 1;
|         } else {
|           return kPeople[m].age;
|         }
|       }
|       return -1;
|     }
|
| It was a really common practice back in the 70's and 80's
| when the function was designed for databases to use
| string fields of a specific fixed length.
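The two behaviours being argued over can be seen in a short sketch: strncpy() zero-pads a short source out to the full field width, and writes no terminating NUL when the source is at least as long as the field. The names `FIELD_WIDTH`, `fill_field`, `field_equal` and `copy_string` are hypothetical and chosen only for this illustration.

```c
#include <string.h>

enum { FIELD_WIDTH = 8 };  /* hypothetical field size, as in name[8] above */

/* Fill a fixed-width field the way strncpy() was designed to:
 * short input is padded with zero bytes up to FIELD_WIDTH;
 * input of FIELD_WIDTH or more bytes leaves NO terminating NUL. */
static void fill_field(char field[FIELD_WIDTH], const char *src) {
    strncpy(field, src, FIELD_WIDTH);
}

/* Fixed-width fields are compared as raw bytes, never as C strings. */
static int field_equal(const char *a, const char *b) {
    return memcmp(a, b, FIELD_WIDTH) == 0;
}

/* When an ordinary C string is wanted instead: a bounded copy that
 * always terminates. Returns 0 on success, -1 if src was truncated
 * (or if dst_size is 0 and nothing could be written). */
static int copy_string(char *dst, size_t dst_size, const char *src) {
    size_t n = strlen(src);
    int truncated = 0;
    if (dst_size == 0) return -1;
    if (n >= dst_size) {
        n = dst_size - 1;
        truncated = 1;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return truncated ? -1 : 0;
}
```

The padding is what makes whole-field comparison (memcmp here, a 64-bit load in the comment above) well defined. The missing terminator is what bites when the result is later passed to code expecting a C string, which is the case `copy_string` is meant to cover.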
| jancsika wrote:
| > strncpy() isn't dangerous
|
| Suppose the C specification said that string constants
| are automatically null terminated _unless_ they are a
| certain size that is platform-dependent. At that given
| size the null is not added. (And let's say above that
| size there's a compiler error. Let's also say there's a
| pragma for telling the compiler you want a bigger limit
| on the maximum string constant size.)
|
| Would that behavior be dangerous in your opinion?
| arp242 wrote:
| If I look at the code as posted then "it uses strcpy
| instead of strncpy" is very low on the list of
| "problems".
|
| "Problems" in quotes because, you know, this is an IOCCC
| entry. You're taking a joke way too seriously.
| throwaway984393 wrote:
| The author literally _asked for security advice_, and
| then ignored it. I'm trying to explain why one should not
| just ignore it. There's a lot of novice programmers who
| read these threads and might think it's perfectly fine to
| use strcpy (outside of IOCCC submissions). And by the
| way, who the hell cares about security vulns in IOCCC
| submissions anyway? It's not supposed to be secure, it's
| supposed to be obfuscated.
| jart wrote:
| I don't think anyone asked for free advice from a
| foul-mouthed anonymous throwaway on how to secure their
| computer. If I was building a website I'd want to secure
| it _from_ you not with you.
| [deleted]
| LinAGKar wrote:
| It's only that short because they've shoved a bunch of statements
| onto the same line.
| phoe-krk wrote:
| That's the whole point of IOCCC. The way code is formatted is
| as important as the way it functions.
| lmilcin wrote:
| It says "22 lines of C", not "22 statements of C".
|
| For this type of exercise it is assumed that some readability
| is going to be lost... just look at Perl golf competition.
| These tend to be written in a single line and it is not always
| given you are going to even be able to tell where statements
| start.
| 34qlgkaer wrote:
| Man I wish I could just read some software articles without
|
| covid covid covid covid covid covid covid covid covid
| lifthrasiir wrote:
| Reminds me of 2001/cheong [1].
|
|     #include <stdio.h>
|     int l;int main(int o,char **O,
|     int I){char c,*D=O[1];if(o>0){
|     for(l=0;D[l ];D[l
|     ++]-=10){D [l++]-=120;D[l]-=
|     110;while (!main(0,O,l))D[l]
|     += 20; putchar((D[l]+1032)
|     /20 ) ;}putchar(10);}else{
|     c=o+ (D[I]+82)%10-(I>l/2)*
|     (D[I-l+I]+72)/10-9;D[I]+=I<0?0
|     :!(o=main(c/10,O,I-1))*((c+999
|     )%10-(D[I]+92)%10);}return o;}
|
| [1] https://www.ioccc.org/2001/cheong.hint
| ducktective wrote:
| `curl -s https://www.ioccc.org/2001/cheong.hint | nc termbin.com 9999`
|
| https://termbin.com/5yaq
|
| In short: for a 2n-digit input, returns the integer part of its
| square root (n-digits)
| codetrotter wrote:
| The ASCII-art formatted version is pretty nice looking.
|
| I was going to say that I don't however get why the "almost
| readable version" is weirdly formatted.
| But then I ran it through
| clang-format and it looks the same still and I saw that indeed
| it's because it's made to do lots of things on the same line and
| so it is not for lack of white space that it looks so messy.
|
| In conclusion, the "almost readable version" is exactly what it
| should be in this case.
| snet0 wrote:
| I was surprised to read that this is actually a totally valid
| HTTP/1.1 application, according to the RFC. The only thing you
| _need_ is the status line (http version, status code, status
| message, CRLF) and then the message body.
|
| Things sure have come a long way.
| kevinoid wrote:
| It's neat, but I don't believe it is a compliant implementation
| of HTTP/1.1 (or 1.0). For example, it does not handle
| percent-encoded characters in the request URI.[1][2]
|
| [1]: https://datatracker.ietf.org/doc/html/rfc7230#section-3.1.1
|
| [2]: https://www.w3.org/Protocols/HTTP/1.0/spec.html#Request-URI
| deathanatos wrote:
| _Two_ CRLF pairs (one to terminate the status line, one to
| terminate the (empty) headers), which this is one CR short of.
| Trivially fixable, though it'd mess up the P slightly...
| coderzx wrote:
| Great
| Galanwe wrote:
| Not sure what's so amazing here that it deserves to be on HN
| front-page.
|
| So basically it's a C program that reads "GET /<something>" from
| a socket and replies with the content of file <something> (with
| some random error handling). Is it really that amazing that it
| fits in 22 lines of funky formatted C code...?
| bruce343434 wrote:
| Sure, IOCCC exercises are ones in futility, in the same way
| that breaking a speed running record achieves nothing
| real-world useful. But that doesn't mean it isn't spectacular
| and damn impressive.
| shric wrote:
| Aside from using small variable names and odd whitespace, it
| isn't particularly obfuscated.
| snet0 wrote:
| That's what I was going to say.
| If nondescript variable
| names and poor use of whitespace is obfuscation, a few of
| my friends could submit code they write every day.
| jpegqs wrote:
| This is what I am arguing about with another IOCCC winner:
| what can be called obfuscation, and where are its
| boundaries.
| Galanwe wrote:
| Come on there is no obfuscation here, you can literally read
| the code without issue. The only attempt seems to be 80*101
| for 8080.
| jpegqs wrote:
| It's cool if you can read code like this without issue. I'm
| chasing Kolmogorov complexity, rather than obfuscation. I
| add things like this to fill gaps in a specific shape.
| cpach wrote:
| Beauty is in the eye of the beholder
| Tempest1981 wrote:
| Or beautifully formatted:
|
| https://github.com/ilyakurdyukov/ioccc/blob/main/practice/20...
| 0des wrote:
| Come on man, it's Saturday. It's fine.
| exDM69 wrote:
| And it's a "Show HN" post.
|
| Pretty nice obfuscated C too. It's art, not serious.
| nuclearnice1 wrote:
| Everything is just dirt.
| cpach wrote:
| And anyone who ever played a part
|
| Oh, they wouldn't turn around and hate it
| [deleted]
| SV_BubbleTime wrote:
| On the line after the printf where it looks like they're getting
| status strings for returns... it looks like there are three
| ternary options. Is that right? How does that work?
|
| https://pbs.twimg.com/media/E7mllyLXoAQmjbT?format=png&name=...
| jpegqs wrote:
| I can explain it:
|
|     m = n ?  /* if (n != 0) */
|       /* adds index.html if path ends with "/" (means
|          the filename is omitted), otherwise copies zero */
|       strcpy(b+i-1,b[i-2]-'/'?"":"index.html"),
|       /* log the requested filename to stdout */
|       printf("%s\n",b+5),
|       /* if "/." is in the path or an error occurred while
|          opening the file */
|       strstr(b,"/.")||!(f=fopen(b+5,"rb"))
|         ? "404 Not Found" : "200 OK"
|       : "501 Not Implemented";  /* if (n == 0) */
|
| By filtering filenames with "/." I prevent exploits with ".."
| and also don't allow reading files starting with a dot; these
| are hidden files in Unix-like OSes.
| SV_BubbleTime wrote:
| Ah. I see, figured it might be that but it was tough to read.
|
| Ok, so sometimes I think I know C pretty well, then I'll see
| lunatic code like this and realize I Do Not! Thanks for the
| answer and reformat.
| formerly_proven wrote:
| What about "GET //etc/passwd"?
| NieDzejkob wrote:
| Just fired up the server and that does indeed break it. I
| suppose openat2 with RESOLVE_BENEATH and AT_FDCWD would be
| a bullet-proof fix, but that's not very codegolf.
| NieDzejkob wrote:
| Huh, any reason to use printf("%s\n",...) instead of puts?
| mianos wrote:
| Also, if you #include <microhttpd.h> you can
| write a useful, safe (well tested), http server in a similar
| number of lines.
| secondcoming wrote:
| That would defeat the whole point of the post!
|
| But microhttpd is fine if you want a minimal server; its way of
| handling POST bodies is weird though.
| rijoja wrote:
| Why not #include<stdlib.h> and just run system("apache2")
| Koshkin wrote:
| Tried it, didn't work. (Now I want to try system(argv[0]) for
| some reason...)
| [deleted]
| [deleted]
___________________________________________________________________
(page generated 2021-07-31 23:00 UTC)