[HN Gopher] Nginx gracefully upgrades executable on the fly
       ___________________________________________________________________
        
       Nginx gracefully upgrades executable on the fly
        
       Author : pantuza
       Score  : 119 points
       Date   : 2022-01-04 17:40 UTC (5 hours ago)
        
 (HTM) web link (nginx.org)
 (TXT) w3m dump (nginx.org)
        
       | bragr wrote:
        | I've implemented this a few times in a few languages based on
        | exactly what nginx does. It works well, and it is pretty
        | straightforward if you are comfortable with POSIX-style signals,
        | sockets, and daemons.
       | 
        | I'm not sure it is super critical in the age of containerized
        | workloads with rolling deploys, but at the very least the
        | connection draining is a good pattern to implement to prevent
        | deploy/scaling-related error spikes.
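        | 
        | The draining part is small enough to sketch. A minimal,
        | hypothetical version (not the code I shipped): a SIGTERM flag
        | stops the accept() loop, in-flight connections finish, then the
        | process exits.
        | 
        |     #include <signal.h>
        |     #include <sys/socket.h>
        |     #include <unistd.h>
        | 
        |     static volatile sig_atomic_t draining = 0;
        |     static void on_sigterm(int sig) { (void)sig; draining = 1; }
        | 
        |     /* hypothetical application hooks */
        |     void handle_connection(int fd);
        |     void wait_for_inflight(void);
        | 
        |     void serve(int listen_fd) {
        |         struct sigaction sa = {0};
        |         sa.sa_handler = on_sigterm; /* no SA_RESTART: a blocked
        |                                        accept() fails with EINTR */
        |         sigaction(SIGTERM, &sa, NULL);
        | 
        |         while (!draining) {
        |             int c = accept(listen_fd, NULL, NULL);
        |             if (c >= 0) handle_connection(c);
        |         }
        |         close(listen_fd);    /* stop taking new work */
        |         wait_for_inflight(); /* let live requests complete */
        |     }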
        
         | wereHamster wrote:
          | Even with containerized workloads, you still have an ingress,
          | i.e. a SPOF (or multiple of them, when using anycast), and the
          | seamless restart is meant for exactly those processes. Nginx is
          | often used (https://kubernetes.github.io/ingress-nginx/), or
          | when you use AWS, GCP, etc., they provide such a service for
          | you.
          | 
          | Not sure how the cloud providers do it though; maybe a
          | combination of low DNS TTLs and rolling restarts, since they
          | often have huge fleets of servers handling ingress?
        
           | nullify88 wrote:
            | A container, though, should be immutable and ideally
            | shouldn't have changes made to it. If the container were to
            | die, it'd come back up with the old version? These seamless
            | upgrades look to me like an anti-pattern for containers.
           | 
           | With ingress you'd have a load balancer in front or have it
           | routed in the network layer using BGP.
        
             | wereHamster wrote:
             | How do you restart the load balancer though, without
             | dropping traffic?
        
               | cbb330 wrote:
                | Two nginx load balancers: reroute to the secondary via
                | DNS, restart the primary.
        
               | nullify88 wrote:
                | You would need more than one to do a rolling restart.
                | Alternatively, doing it with one instance of a software
                | load balancer is a bit more work: spin another instance
                | up and update DNS, wait for traffic to the old one to die
                | as TTLs expire, then decommission it.
                | 
                | But I agree it isn't as easy as an in-place upgrade.
        
             | krab wrote:
             | I think the parent was talking more about the fact that at
             | some point, you have a component that should be available
             | as much as possible. In the case you mention, that would be
             | the load balancer. Being able to upgrade it in place might
             | be easier than other ways.
        
           | Grollicus wrote:
           | If you really want to have no SPOF you'd probably build
           | something like this:
           | 
           | Multihomed IP <-> Loadbalancer <-> Application
           | 
            | By having the same setup running in multiple locations you
            | can replace the load balancers by taking one location offline
            | (stop announcing the corresponding route). Application
            | instances can be replaced by taking them out of the load
            | balancer.
        
       | nullify88 wrote:
        | The SysV init script for nginx had an upgrade operation (in
        | addition to start/stop/reload etc.) which would send the signal.
        | Worked like a charm.
        
       | moderation wrote:
       | See Envoy Proxy's Hot Restart [0]
       | 
       | 0.
       | https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overv...
        
         | monroeclinton wrote:
          | Also HAProxy; I believe they both pass sockets over UNIX
          | domain sockets via ancillary messages (SCM_RIGHTS).
         | 
         | https://www.haproxy.com/blog/truly-seamless-reloads-with-hap...
        
       | jabedude wrote:
        | Seems like a useful feature for a service manager like systemd to
        | have for its managed services. It is already able to perform
        | inetd-style socket activation; I imagine this would be a welcome
        | feature.
        
         | jasonjayr wrote:
         | inetd style socket activation (iirc) forks a process for every
         | connection.
         | 
          | So simply replacing the binary on disk will cause all new
          | connections going forward to use the new binary, while
          | processes serving existing connections (holding in-memory
          | references to the old binary's inode) finish their work. Once
          | they are done and all references to that inode are gone, the
          | blocks holding the old binary are freed.
        
           | wahern wrote:
           | > inetd style socket activation (iirc) forks a process for
           | every connection.
           | 
           | inetd supports both process-per-connection and single
           | process/multiple connections using the "nowait" and "wait"
           | declarations, respectively. The former passes an accept'd
           | socket, the latter passes the listening socket.
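            | 
            | In inetd.conf terms (illustrative entries, not from any real
            | config):
            | 
            |     # nowait: inetd accept()s; the child gets the
            |     # connected socket
            |     ftp   stream tcp  nowait root /usr/libexec/ftpd  ftpd -l
            |     # wait: the server is handed the listening socket
            |     tftp  dgram  udp  wait   root /usr/libexec/tftpd tftpd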
        
         | TimWolla wrote:
          | This is already possible. You can configure whether you want
          | inetd-style socket activation (where systemd calls accept() and
          | passes you the client socket), or just systemd listening on the
          | socket (where systemd passes you the listening socket and your
          | binary calls accept()).
         | 
         | https://www.freedesktop.org/software/systemd/man/systemd.soc...
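          | 
          | A minimal .socket unit sketch (hypothetical unit name) showing
          | the toggle:
          | 
          |     # myapp.socket
          |     [Socket]
          |     ListenStream=8080
          |     # Accept=yes: inetd style; systemd accept()s and spawns
          |     # a myapp@.service instance per connection, passing it
          |     # the client socket.
          |     # Accept=no (the default): myapp.service receives the
          |     # listening socket and calls accept() itself.
          |     Accept=no
          | 
          |     [Install]
          |     WantedBy=sockets.target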
        
       | secondcoming wrote:
       | I've always found the multi-process approach taken by both nginx
       | and apache to be nothing but a hindrance when you have to write a
       | custom module. It means that you may have to use shared memory,
       | which is a PITA.
       | 
        | I don't know why they haven't moved on from it; it only really
        | made sense when single-core processors were the norm.
        
       | politelemon wrote:
        | So if I understood correctly, would it be like this:
        | 
        |     cp new/nginx /path/to/nginx
        |     kill -SIGUSR2 <processid>
       | 
       | That does sound pretty neat if you're not running nginx in a
       | container. I wonder if they've built a Windows equivalent for
       | that.
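        | 
        | Per the linked docs there's one more step: on USR2 the old
        | master renames its pid file with an .oldbin suffix and keeps
        | running, so once the new binary looks healthy you finish with
        | 
        |     kill -QUIT $(cat /path/to/nginx.pid.oldbin)
        | 
        | to gracefully shut the old master down.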
        
       | majke wrote:
       | Just a shout out: it's super hard to do it for UDP / QUIC / H3.
       | Beware.
       | 
       | (but I don't think nginx supports h3 out of the box yet)
        
         | krab wrote:
         | Why so? I thought UDP was stateless, making that process even
         | easier. But I never implemented it.
        
           | TimWolla wrote:
            | UDP itself is stateless, but QUIC is stateful. Without
            | knowing the background, I would assume the issue is that
            | incoming UDP packets get routed to the new process after the
            | reload, and that the new process is not aware of the existing
            | QUIC connections because their state resides in the old
            | process. Thus it cannot, for example, decrypt the packets.
        
         | petters wrote:
          | How are QUIC/HTTP3 servers usually upgraded? As you say, it
          | seems tricky.
        
       | monroeclinton wrote:
       | I've been working on something similar in a load balancer I've
       | been writing in Rust. It's still a work in progress.
       | 
       | Basically the parent executes the new binary after it receives a
       | USR1 signal. Once the child is healthy it kills the parent via
       | SIGTERM. The listener socket file descriptor is passed over an
       | environment variable.
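        | 
        | Roughly like this (a simplified sketch with made-up names, not
        | the actual dash source):
        | 
        |     #include <fcntl.h>
        |     #include <stdio.h>
        |     #include <stdlib.h>
        |     #include <unistd.h>
        | 
        |     void spawn_new_binary(int listen_fd, const char *path) {
        |         /* the fd must survive exec: clear FD_CLOEXEC */
        |         int flags = fcntl(listen_fd, F_GETFD);
        |         fcntl(listen_fd, F_SETFD, flags & ~FD_CLOEXEC);
        |         if (fork() == 0) {
        |             char buf[16];
        |             snprintf(buf, sizeof buf, "%d", listen_fd);
        |             setenv("LISTEN_FD", buf, 1);
        |             execl(path, path, (char *)NULL);
        |             _exit(1); /* exec failed */
        |         }
        |         /* parent keeps serving; the healthy child later
        |            does kill(getppid(), SIGTERM) */
        |     }
        | 
        |     int recover_listen_fd(void) {
        |         const char *s = getenv("LISTEN_FD");
        |         return s ? atoi(s) : -1; /* -1: fresh start, bind */
        |     }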
       | 
       | https://github.com/monroeclinton/- (this is the proper url, it's
       | called dash)
        
       | mholt wrote:
        | We did this for Caddy 1 too [1]. It was really cool. I am not
        | sure how many people used this feature, so I haven't implemented
        | it for Caddy 2 yet, and in the ~two years since Caddy 2 was
        | released, I've only had the request once. It's a bit
        | tricky/tedious to do properly, but I'm willing to bring it over
        | to Caddy 2 with sufficient sponsorship.
       | 
       | [1]: https://github.com/caddyserver/caddy/blob/v1/upgrade.go
        
         | eliaspro wrote:
         | I'm torn on this feature.
         | 
          | On the one hand, an application should never be able to replace
          | itself with "random code" to be executed. I want my systems to
          | be immutable. I want my services to run with the smallest set
          | of privileges required.
          | 
          | On the other hand, it encourages "consumer level" users to keep
          | their software up to date, even when it wasn't installed from a
          | distribution's repository etc.
          | 
          | So I think in general it's a good feature to have, as advanced
          | users/distributions will restrict what a service/process is
          | able to do anyway, and they won't see any downside from not
          | using this feature.
          | 
          | It should be optional, that's all!
        
           | blibble wrote:
           | if you can log into the machine and replace the nginx
           | executable you are probably capable of running it too
        
           | mholt wrote:
           | > an application should never be able to replace itself with
           | "random code" to be executed.
           | 
           | To clarify: it doesn't, nor has it ever worked that way. You
           | have to be the one to do that (or someone with privileges to
           | write to that file on disk). Most production setups don't
           | give Caddy that permission. And you have to trigger the
           | upgrade too.
        
         | zimbatm wrote:
          | If Caddy were to support systemd socket activation, this self-
          | restart dance would not be necessary, as the parent process
          | (systemd) holds the socket for you. Other systems can use
          | https://github.com/zimbatm/socketmaster instead. I believe this
          | to be more elegant and robust than the nginx approach, as there
          | are no PID re-parenting issues.
          | 
          | But I suspect that most Caddy deployments are done via Docker,
          | and that requires a whole container restart anyway.
        
           | tyingq wrote:
            | It's kind of fun to watch things go out of fashion and back
            | in. We used to use inetd, mostly because memory was
            | expensive, so it could spawn a service only when a request
            | came in; the spawned process would then exit and give the
            | memory back to the OS. Then someone decided tcpd should sit
            | between inetd and servers, for security and logging. Then
            | every service just ran as its own daemon. Now I'm
            | occasionally seeing posts like this reviving inetd.
        
           | mholt wrote:
           | Good point, and I'm not sure which deployment method is more
           | popular.
           | 
           | In general I am personally not a fan of Docker due to added
           | complexities (often unnecessary for static binaries like
           | Caddy) and technical limitations such as this. All my Caddy
           | deployments use systemd (which I don't love either, sigh).
        
         | JesseObrien wrote:
          | Can you explain any of the technical details around this,
          | perchance? I'm super curious. I know that SO_REUSEPORT [1]
          | exists, but is that the only little trick needed to make this
          | work? From what I've read, SO_REUSEPORT can open up the port to
          | hijacking by rogue processes, so is it fine to rely on?
         | 
         | [1] https://lwn.net/Articles/542629/
        
           | fragmede wrote:
            | If an attacker is already running rogue processes on your
            | box, the minor details surrounding SO_REUSEPORT are the least
            | of your worries. An attacker could just restart nginx, and
            | won't care about lost requests.
        
           | duskwuff wrote:
           | You don't even need that. If the old server process exec()s
           | the new one, it can pass on its file descriptors -- including
           | the listening socket -- when that happens.
        
             | mholt wrote:
             | Yep, we don't use SO_REUSEPORT. We just pass it from the
             | old process to the new one.
        
             | tyingq wrote:
             | You could also be fancy and pass open sockets over a unix
             | domain socket with sendmsg().
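              | 
              | A bare-bones sketch of the sender side:
              | 
              |     #include <string.h>
              |     #include <sys/socket.h>
              |     #include <sys/uio.h>
              | 
              |     /* pass fd to the peer of unix socket chan */
              |     int send_fd(int chan, int fd) {
              |         char b = 0;
              |         struct iovec iov = {
              |             .iov_base = &b, .iov_len = 1
              |         };
              |         union {
              |             struct cmsghdr h;
              |             char buf[CMSG_SPACE(sizeof(int))];
              |         } u;
              |         struct msghdr m = {0};
              |         m.msg_iov = &iov;
              |         m.msg_iovlen = 1;
              |         m.msg_control = u.buf;
              |         m.msg_controllen = sizeof u.buf;
              |         struct cmsghdr *c = CMSG_FIRSTHDR(&m);
              |         c->cmsg_level = SOL_SOCKET;
              |         c->cmsg_type = SCM_RIGHTS;
              |         c->cmsg_len = CMSG_LEN(sizeof(int));
              |         memcpy(CMSG_DATA(c), &fd, sizeof(int));
              |         return sendmsg(chan, &m, 0) < 0 ? -1 : 0;
              |     }
              | 
              | The receiver does the mirror-image recvmsg() and gets a
              | fresh descriptor for the same open socket.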
        
           | tyingq wrote:
           | >it can open up that port to hijacking by rogue processes
           | 
            | That seems relevant if the process is using a non-privileged
            | port (>= 1024). If we're talking about privileged ports
            | (<= 1023), though, only another root process could hijack
            | them, and root processes can already hijack you in many other
            | ways.
        
       | bogomipz wrote:
        | I am curious: does anyone know why nginx uses SIGWINCH for this?
        | I know Apache uses WINCH as well, which makes me wonder if there
        | was some historical reason a server process wound up using a
        | signal meant for a TTY?
        
       | bob1029 wrote:
        | I've considered building something like this to allow us to
        | update customer software while it's serving users.
        | 
        | In my proposals, there would be a simple application-aware HTTP
        | proxy process that we'd maintain and install on all environments.
        | It would handle relaying public traffic to the appropriate final
        | process on an alternate port. There would be a special pause
        | command we could invoke on the proxy that would buy us time to
        | swap the processes out from under the TCP requests. A second
        | resume command would be issued once the new process is running
        | and stable. Ideally, the whole deal completes in ~5 seconds;
        | rapid test rollbacks would be double that. You can do most of the
        | work ahead of time by toggling between an A and a B install path
        | for the binaries, with a third common data path maintained in the
        | middle (databases, config, etc.).
       | 
        | With the above proposal, the user experience would be a brief
        | delay at the time of interaction, but we already have some UX
        | contexts where delays of up to 30 seconds are anticipated.
       | Absolutely no user request would be expected to drop with this
       | approach, even in a rollback scenario. Our product is broad
       | enough that entire sections of it can be a flaming wasteland
       | while other pockets of users are perfectly happy, so keeping the
       | happy users unbroken is key.
        
         | kayodelycaon wrote:
         | https://en.wikipedia.org/wiki/Blue-green_deployment
         | 
         | DNS not required. You can use a load balancer to do the same
         | thing. If you don't want a full second setup, do a rolling
         | restart of application servers instead.
         | 
         | Edit: I forgot... you can do this with containers too.
        
       | rootlocus wrote:
        | How do the two processes listen on the same port?
        
         | nullify88 wrote:
          | Once the USR2 signal is received, the master process forks and
          | the child execs the new binary, inheriting the parent's file
          | descriptors, including the listening sockets. When the old
          | process stops accepting connections, completed connections
          | queue in the kernel's listen backlog; the new process takes
          | over and starts accepting them.
         | 
         | You can follow the trail by searching for ngx_exec_new_binary
         | in the nginx repo.
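          | 
          | The core of the handoff fits in a few lines (a sketch, not
          | nginx's actual code):
          | 
          |     #include <unistd.h>
          | 
          |     /* called from the old master's USR2 path */
          |     void exec_new_binary(char *const argv[]) {
          |         if (fork() == 0) {
          |             /* listen fds stay open across exec (they have
          |                no FD_CLOEXEC), so connections the kernel
          |                completes wait in the same listen backlog
          |                until one of the processes accept()s them */
          |             execv(argv[0], argv);
          |             _exit(1);
          |         }
          |         /* old master and workers run on until signalled */
          |     }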
        
           | krab wrote:
            | Just to add: nginx normally spawns several worker processes
            | that all handle connections on the same port.
        
             | nullify88 wrote:
              | Correct, but to clarify: only the master process binds to
              | the ports. The master process creates socketpairs to the
              | workers for interprocess communication, and the workers
              | accept connections over the shared socket.
             | 
             | https://www.nginx.com/blog/socket-sharding-nginx-
             | release-1-9...
             | 
              | The page also has an example of how SO_REUSEPORT affects
              | the flow.
        
         | loeg wrote:
          | There's a socket option for this on FreeBSD and Linux --
          | SO_REUSEPORT. You could also just leave the listening socket
          | open when exec'ing the new httpd, or send it over a unix domain
          | socket.
        
         | ctrlrsf wrote:
          | Using the socket option SO_REUSEPORT allows multiple processes
          | to bind to the same port.
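          | 
          | For example (a sketch with error checks omitted; note that on
          | Linux every binder must set the option and run under the same
          | effective UID, which limits the hijacking concern raised
          | elsewhere in the thread):
          | 
          |     #include <arpa/inet.h>
          |     #include <netinet/in.h>
          |     #include <sys/socket.h>
          | 
          |     int listen_reuseport(unsigned short port) {
          |         int fd = socket(AF_INET, SOCK_STREAM, 0);
          |         int one = 1;
          |         setsockopt(fd, SOL_SOCKET, SO_REUSEPORT,
          |                    &one, sizeof one);
          |         struct sockaddr_in a = {0};
          |         a.sin_family = AF_INET;
          |         a.sin_port = htons(port);
          |         bind(fd, (struct sockaddr *)&a, sizeof a);
          |         listen(fd, 128);
          |         return fd; /* a second process can do the same */
          |     }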
        
           | VWWHFSfQ wrote:
            | Is this what it's actually doing, though? It doesn't say the
            | reuseport option to the listen directive is required for
            | this.
        
         | markbnj wrote:
         | This article on how haproxy uses SO_REUSEPORT goes into some
         | more detail: https://www.haproxy.com/blog/truly-seamless-
         | reloads-with-hap...
        
       ___________________________________________________________________
       (page generated 2022-01-04 23:00 UTC)