[HN Gopher] Nginx gracefully upgrades executable on the fly
___________________________________________________________________
Nginx gracefully upgrades executable on the fly
Author : pantuza
Score  : 119 points
Date   : 2022-01-04 17:40 UTC (5 hours ago)
(HTM) web link (nginx.org)
(TXT) w3m dump (nginx.org)

| bragr wrote:
| I've implemented this a few times in a few languages based on exactly what nginx does. It works well, and it is pretty straightforward if you are comfortable with POSIX-style signals, sockets, and daemons.
|
| I'm not sure it is super critical in the age of containerized workloads with rolling deploys, but at the very least the connection draining is a good pattern to implement to prevent deploy/scaling related error spikes.
| wereHamster wrote:
| Even with containerized workloads, you still have an ingress, or SPOF (or multiple, when using multicast), and the seamless restart is meant for exactly those processes. Nginx is often used (https://kubernetes.github.io/ingress-nginx/), or when you use AWS, GCS, etc., they provide such a service for you.
|
| Not sure how the cloud providers do it though, maybe a combination of low DNS TTL and rolling restarts, since they often have huge fleets of servers which handle ingress?
| nullify88 wrote:
| A container though should be immutable and ideally shouldn't have changes made to it. If the container were to die, it'd revert back to the old version? It looks to me like these seamless upgrades would be an anti-pattern for containers.
|
| With ingress you'd have a load balancer in front or have it routed in the network layer using BGP.
| wereHamster wrote:
| How do you restart the load balancer though, without dropping traffic?
| cbb330 wrote:
| Two nginx load balancers: reroute to the secondary via DNS, restart the primary.
| nullify88 wrote:
| You would need more than one to do a rolling restart. Alternatively, doing it with one instance of a software load balancer is a bit more work: spin another instance up and update DNS. Wait for traffic to the old one to die as TTLs expire, then decommission.
|
| But I agree it isn't as easy as an in-place upgrade.
| krab wrote:
| I think the parent was talking more about the fact that at some point, you have a component that should be available as much as possible. In the case you mention, that would be the load balancer. Being able to upgrade it in place might be easier than other ways.
| Grollicus wrote:
| If you really want to have no SPOF you'd probably build something like this:
|
|     Multihomed IP <-> Loadbalancer <-> Application
|
| By having the same setup running in multiple locations you can replace the load balancers by taking one location offline (stop announcing the corresponding route). Application instances can be replaced by taking the application instance out of the load balancer.
| nullify88 wrote:
| The SysV init script for nginx had an upgrade operation (in addition to start/stop/reload etc.) which would send the signal. Worked like a charm.
| moderation wrote:
| See Envoy Proxy's Hot Restart [0]
|
| 0. https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overv...
| monroeclinton wrote:
| Also HAProxy, they both use UNIX sockets via ancillary messages+SCM_RIGHTS, I believe.
|
| https://www.haproxy.com/blog/truly-seamless-reloads-with-hap...
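The "ancillary messages+SCM_RIGHTS" trick mentioned above is how a running proxy can hand its listening sockets to a freshly started replacement over a UNIX domain socket. A minimal C sketch of the sending side (the function name is illustrative, conn_fd is assumed to be an already-connected AF_UNIX socket, and error handling is omitted):

    /* Pass a listening socket to another process over an AF_UNIX
     * connection using SCM_RIGHTS ancillary data. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_listen_fd(int conn_fd, int listen_fd)
    {
        char dummy = 'F';                  /* at least one byte of real data */
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

        union {                            /* correctly aligned control buffer */
            struct cmsghdr hdr;
            char buf[CMSG_SPACE(sizeof(int))];
        } control;
        memset(&control, 0, sizeof(control));

        struct msghdr msg = { 0 };
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control.buf;
        msg.msg_controllen = sizeof(control.buf);

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;      /* "this message carries fds" */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &listen_fd, sizeof(int));

        /* The receiver gets its own descriptor for the same open socket. */
        return sendmsg(conn_fd, &msg, 0);
    }

The receiving process ends up with its own file descriptor referring to the same open listening socket, so it can start calling accept() while the old process drains its in-flight connections.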
| jabedude wrote:
| Seems like a useful feature for a service manager like systemd to have for its managed services. It is already able to perform inetd style socket activation; I imagine this would be a welcome feature.
| jasonjayr wrote:
| inetd style socket activation (iirc) forks a process for every connection.
|
| So, simply replacing the binary on disk will cause all new connections going forward to use the new binary, while existing held connections (with in-memory references to the old binary's inode) will finish their operations. Once they are done and all references to that inode are gone, the blocks referencing the binary will be removed.
| wahern wrote:
| > inetd style socket activation (iirc) forks a process for every connection.
|
| inetd supports both process-per-connection and single process/multiple connections using the "nowait" and "wait" declarations, respectively. The former passes an accept'd socket, the latter passes the listening socket.
| TimWolla wrote:
| This is already possible. You can configure whether you want inetd style socket activation (where systemd calls accept() and passes you the client socket), or just systemd listening to the socket (where systemd passes you the listen socket and your binary calls accept()).
|
| https://www.freedesktop.org/software/systemd/man/systemd.soc...
| secondcoming wrote:
| I've always found the multi-process approach taken by both nginx and Apache to be nothing but a hindrance when you have to write a custom module. It means that you may have to use shared memory, which is a PITA.
|
| I don't know why they haven't moved on from it; it only really made sense when single-core processors were the norm.
| politelemon wrote:
| So if I understood correctly, would it be like this
|
|     cp new/nginx /path/to/nginx
|     kill -SIGUSR2 <processid>
|
| That does sound pretty neat if you're not running nginx in a container. I wonder if they've built a Windows equivalent for that.
| majke wrote:
| Just a shout out: it's super hard to do it for UDP / QUIC / H3. Beware.
|
| (but I don't think nginx supports h3 out of the box yet)
| krab wrote:
| Why so? I thought UDP was stateless, making that process even easier. But I never implemented it.
| TimWolla wrote:
| UDP itself is stateless, but QUIC is stateful. Without knowing the background, I would assume the issue to be that the incoming UDP packets will be routed to the new process after the reload, and that new process is not aware of the existing QUIC connections, because the state resides in the old process. Thus it is not able to decrypt the packets, for example.
| petters wrote:
| How are QUIC/HTTP3 servers usually upgraded? As you say, it seems tricky.
| monroeclinton wrote:
| I've been working on something similar in a load balancer I've been writing in Rust. It's still a work in progress.
|
| Basically the parent executes the new binary after it receives a USR1 signal. Once the child is healthy it kills the parent via SIGTERM. The listener socket file descriptor is passed over an environment variable.
|
| https://github.com/monroeclinton/- (this is the proper url, it's called dash)
| mholt wrote:
| We did this for Caddy 1 too [1]. It was really cool. I am not sure how many people used this feature, so I haven't implemented it for Caddy 2 yet, and in the ~two years that Caddy 2 has been released, I've only had the request once. It's a bit tricky/tedious to do properly, but I'm willing to bring it over to Caddy 2 with a sufficient sponsorship.
|
| [1]: https://github.com/caddyserver/caddy/blob/v1/upgrade.go
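To make the systemd socket-activation variant TimWolla describes above concrete: with Accept=no in the .socket unit, systemd binds the port and hands the listening descriptor to the service, which calls accept() itself. A rough sketch of the service side using libsystemd (link with -lsystemd; error handling is trimmed):

    /* Service side of systemd socket activation with Accept=no:
     * systemd has already bound the socket and passes it to the
     * service starting at fd 3 (SD_LISTEN_FDS_START). */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <systemd/sd-daemon.h>

    int main(void)
    {
        int n = sd_listen_fds(0);            /* how many fds systemd passed us */
        if (n < 1) {
            fprintf(stderr, "not started via socket activation\n");
            return 1;
        }

        int listen_fd = SD_LISTEN_FDS_START; /* first passed fd, normally 3 */
        for (;;) {
            int client = accept(listen_fd, NULL, NULL); /* our code accepts */
            if (client < 0)
                continue;
            /* ... serve the connection, then close(client) ... */
        }
    }

With Accept=yes, systemd instead accept()s each connection and spawns one service instance per client, which is the classic inetd "nowait" behaviour mentioned above.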
| eliaspro wrote:
| I'm torn on this feature.
|
| On the one hand, an application should never be able to replace itself with "random code" to be executed. I want my systems to be immutable. I want my services to be run with the smallest set of privileges required.
|
| On the other hand, it encourages "consumer level" users to keep their software up-to-date, even when it wasn't installed from a distribution's repository etc.
|
| So I think in general it's a good feature to have, as advanced users/distributions will restrict what a service/process is able to do anyway and won't have any downsides from not using this feature.
|
| It should be optional, that's all!
| blibble wrote:
| If you can log into the machine and replace the nginx executable, you are probably capable of running it too.
| mholt wrote:
| > an application should never be able to replace itself with "random code" to be executed.
|
| To clarify: it doesn't, nor has it ever worked that way. You have to be the one to do that (or someone with privileges to write to that file on disk). Most production setups don't give Caddy that permission. And you have to trigger the upgrade too.
| zimbatm wrote:
| If Caddy were to support systemd socket activation, this self-restart dance is not necessary as the parent process (systemd) is holding the socket for you. And for other systems, they can use https://github.com/zimbatm/socketmaster instead. I believe this to be more elegant and robust than the nginx approach as there are no PID re-parenting issues.
|
| But I suspect that most Caddy deployments are done via docker, and that requires a whole container restart anyway.
| tyingq wrote:
| It's kind of fun to watch things go out of fashion and back in. We used to use inetd, mostly because memory was expensive, so it could spawn a service only when a request came in; then the spawned process would exit and give the memory back to the OS. Then someone decided tcpd should sit between inetd and servers, for security and logging. Then every service just ran as its own daemon. Now I'm occasionally seeing posts like this reviving inetd.
| mholt wrote:
| Good point, and I'm not sure which deployment method is more popular.
|
| In general I am personally not a fan of Docker due to added complexities (often unnecessary for static binaries like Caddy) and technical limitations such as this. All my Caddy deployments use systemd (which I don't love either, sigh).
| JesseObrien wrote:
| Can you explain any of the technical details around this perchance? I'm super curious. I know that SO_REUSEPORT[1] exists, but is that the only little trick to make this work? From what I've read, with SO_REUSEPORT it can open up that port to hijacking by rogue processes, so is that fine to rely on?
|
| [1] https://lwn.net/Articles/542629/
| fragmede wrote:
| If an attacker is already running rogue processes on your box, the minor details surrounding SO_REUSEPORT are the least of your worries. An attacker could just restart nginx, and won't care about lost requests.
| duskwuff wrote:
| You don't even need that. If the old server process exec()s the new one, it can pass on its file descriptors -- including the listening socket -- when that happens.
| mholt wrote:
| Yep, we don't use SO_REUSEPORT. We just pass it from the old process to the new one.
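For the "just pass it from the old process to the new one" approach, the usual trick is to leave the listening descriptor open across exec() and tell the new binary which fd number to reuse. A minimal sketch follows; the LISTEN_FD environment variable name is made up for this example (nginx, if memory serves, uses its own NGINX variable for the same purpose, and the Rust load balancer mentioned above likewise passes the fd number via the environment):

    /* The old process starts the new binary with the listening socket
     * left open across exec() and its fd number advertised in an
     * environment variable. Error handling is omitted. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    void spawn_new_binary(const char *path, int listen_fd)
    {
        if (fork() == 0) {
            char buf[16];
            snprintf(buf, sizeof(buf), "%d", listen_fd);
            setenv("LISTEN_FD", buf, 1);   /* tell the child which fd to reuse */
            execl(path, path, (char *)NULL);
            _exit(1);                      /* only reached if exec failed */
        }
        /* The parent keeps serving in-flight requests, then drains and exits. */
    }

    int inherited_listener(void)
    {
        const char *v = getenv("LISTEN_FD"); /* set by the old process */
        return v ? atoi(v) : -1;             /* -1: started fresh, bind() instead */
    }

Because the descriptor stays open in the child (as long as FD_CLOEXEC isn't set on it), the kernel keeps the listen queue intact and connection attempts made during the handover are not dropped.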
| tyingq wrote:
| You could also be fancy and pass open sockets over a unix domain socket with sendmsg().
| tyingq wrote:
| > it can open up that port to hijacking by rogue processes
|
| That seems relevant if the process is using a non-privileged port that's >= 1024. If we're talking about privileged ports (<= 1023), though, only another root process could hijack that, and those can already hijack you many other ways.
| bogomipz wrote:
| I am curious: does anyone know why Nginx uses SIGWINCH for this? I know Apache uses WINCH as well, which makes me wonder if there was some historical reason a server process wound up using a signal meant for a TTY?
| bob1029 wrote:
| I've considered building something like this to allow us to update customer software while it's serving users.
|
| In my proposals, there would be a simple application-aware HTTP proxy process that we'd maintain and install on all environments. It would handle relaying public traffic to the appropriate final process on an alternate port. There would be a special pause command we could invoke on the proxy that would buy us time to swap the processes out from under the TCP requests. A second resume command would be issued once the process is running and stable. Ideally, the whole deal completes in ~5 seconds. Rapid test rollbacks would be double that. You can do most of the work ahead of time by toggling between an A and B install path for the binaries, with a third common data path maintained in the middle (databases, config, etc.)
|
| With the above proposal, the user experience would be a brief delay at time of interaction, but we already have some UX contexts where delays of up to 30 seconds are anticipated. Absolutely no user request would be expected to drop with this approach, even in a rollback scenario. Our product is broad enough that entire sections of it can be a flaming wasteland while other pockets of users are perfectly happy, so keeping the happy users unbroken is key.
| kayodelycaon wrote:
| https://en.wikipedia.org/wiki/Blue-green_deployment
|
| DNS not required. You can use a load balancer to do the same thing. If you don't want a full second setup, do a rolling restart of application servers instead.
|
| Edit: I forgot... you can do this with containers too.
| rootlocus wrote:
| How do the two processes listen to the same port?
| nullify88 wrote:
| Once the USR2 signal is received, the master process forks and the child process inherits the parent's file descriptors, including the listening sockets. One process stops accepting connections, so new connections queue in the kernel; the new process takes over and starts accepting connections.
|
| You can follow the trail by searching for ngx_exec_new_binary in the nginx repo.
| krab wrote:
| Just to add - Nginx normally spawns several worker processes that all process connections to the same port.
| nullify88 wrote:
| Correct, but to clarify, only the master process binds to the ports. The master process creates socketpairs to the workers for interprocess communication. The workers accept connections over the shared socket.
|
| https://www.nginx.com/blog/socket-sharding-nginx-release-1-9...
|
| Page also has an example of how SO_REUSEPORT affects flow.
| loeg wrote:
| There's a socket option for this on FreeBSD and Linux -- SO_REUSEPORT. You could also just leave the listening socket open when exec'ing the new httpd, or send it with a unix domain socket.
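And for completeness, the SO_REUSEPORT route mentioned in this sub-thread: instead of handing a descriptor over, each process opens and binds its own socket to the same address and port, which the kernel allows once the option is set on both (Linux 3.9+, also FreeBSD). A minimal sketch, error handling omitted:

    /* With SO_REUSEPORT, the old and the new process can each bind their
     * own listening socket to the same address/port, so the new one can
     * start accepting before the old one exits. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>

    int listen_reuseport(uint16_t port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);              /* network byte order */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);

        bind(fd, (struct sockaddr *)&addr, sizeof(addr));
        listen(fd, 128);
        return fd;   /* another process can bind the same port the same way */
    }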
| ctrlrsf wrote:
| Using the socket option SO_REUSEPORT allows multiple processes to bind to the same port.
| VWWHFSfQ wrote:
| Is this what it's actually doing though? It doesn't say the reuseport option to the listen directive is required for this.
| markbnj wrote:
| This article on how HAProxy uses SO_REUSEPORT goes into some more detail: https://www.haproxy.com/blog/truly-seamless-reloads-with-hap...
___________________________________________________________________
(page generated 2022-01-04 23:00 UTC)