[HN Gopher] Optimizing Docker image size and why it matters
       ___________________________________________________________________
        
       Optimizing Docker image size and why it matters
        
       Author : swazzy
       Score  : 113 points
       Date   : 2022-01-06 19:13 UTC (3 hours ago)
        
 (HTM) web link (contains.dev)
 (TXT) w3m dump (contains.dev)
        
       | bravetraveler wrote:
        | The article doesn't seem to do much... in the 'why'. I'm
        | inundated with _how_, though.
       | 
       | I've been on both sides of this argument, and I really think it's
       | a case-by-case thing.
       | 
       | A highly compliant environment? As minimal as possible. A
       | hobbyist/developer that wants to debug? Go as big of an image as
       | you want.
       | 
       | It shouldn't be an expensive operation to update your image base
       | and deploy a new one, regardless of size.
       | 
        | Network/resource constraints should be becoming less of an
        | issue. In a lot of cases, a local registry cache is all you need.
       | 
       | I worry partly about how much time is spent on this quest, or
       | secondary effects.
       | 
       | Has the situation with name resolution been dealt with in musl?
       | 
        | For example, something like /etc/hosts overrides not taking
        | proper precedence (or working at all). To be sure, that's not a
        | great thing to use - but it _does_ get used, and leads to a lot
        | of head scratching.
        
         | 3np wrote:
         | I mean on one hand, yeah, but comparing Debian (124 MB) with
         | Ubuntu (73 MB) shows that with some effort you can eat your
         | cake and have it too.
        
         | yjftsjthsd-h wrote:
         | > A highly compliant environment? As minimal as possible. A
         | hobbyist/developer that wants to debug? Go as big of an image
         | as you want.
         | 
         | Hah, I go the other way; at work hardware is cheap and the
         | company wants me to ship yesterday, so sure I'll ship the big
         | image now and hope to optimize later. At home, I'm on a slow
         | internet connection and old hardware and I have no deadlines,
         | so I'm going to carefully cut down what I pull and what I
         | build.
        
       | no_wizard wrote:
        | I like this article, and there is a ton of nuance in the image
        | and how you should choose the appropriate one. I also like how
        | they cover only copying the files you actually need; with things
        | like vendor or node_modules, you might be better off just doing
        | a volume mount instead of copying them into the image.
       | 
        | The only thing they didn't seem to cover is considering your
        | target. My general policy is dev images are almost always going
        | to be whatever lets me do one of the following:
       | 
       | - Easily install the tool I need
       | 
        | - All things being equal, if multiple image base OS's satisfy the
        | above, I go with alpine, 'cause it's smallest
       | 
        | One thing I've noticed is that simple, purpose-built images are
        | faster, even when there are a lot of them (big docker-compose
        | user myself for this reason), rather than stuffing a lot of
        | services inside of a single container or even "fewer" containers
       | 
       | EDIT: spelling, nuisance -> nuance
        
         | Sebb767 wrote:
         | > I also like how they cover only copying the files you
         | actually need, particularly with things like vendor or
         | node_modules, you might be better off just doing a volume mount
         | instead of copying it over to the entire image.
         | 
         | I'd highly suggest not to do that. If you do this, you directly
         | throw away reproducibility, since you can't simply revert back
         | to an older image if something stops working - you need to also
         | check the node_modules directory. You also can't simply run old
         | images or be sure that you have the same setup on your local
         | machine as in production, since you also need to copy the
         | state. Not to mention problems that might appear when your
         | servers have differing versions of the folder or the headaches
         | when needing to upgrade it together with your image.
         | 
          | Reducing your image size _is_ important, but this way you'll
          | lose a lot of what Docker actually offers. It might make sense
         | in some specific cases, but you should be very aware of the
         | drawbacks.
        
         | dvtrn wrote:
         | _I like this article, and there is a ton of nuisance in the
         | image and how you should choose the appropriate one._
         | 
          | By chance, did you mean _nuance_? Because while I can agree
          | that you can quickly get into some messy weeds optimizing an
          | image... hearing someone call it a "nuisance" made me chuckle
          | this afternoon
        
           | no_wizard wrote:
           | I did! Edited for clarification, though it definitely can be
           | both!
        
       | somehnacct3757 wrote:
       | The analyzer product this post is content marketing for looks
       | interesting, but I would want to run it locally rather than
       | connect my image repo to it.
       | 
       | Am I being paranoid? Is it reasonable to connect my images to a
       | random third party service like this?
        
       | adamgordonbell wrote:
       | You might not need to care about image size at all if your image
       | can be packaged as stargz.
       | 
       | stargz is a gamechanger for startup time.
       | 
       | kubernetes and podman support it, and docker support is likely
       | coming. It lazy loads the filesystem on start-up, making network
       | requests for things as needed and therefore can often start up
       | large images very fast.
       | 
       | Take a look at the startup graph here:
       | 
       | https://github.com/containerd/stargz-snapshotter
        
       | yjftsjthsd-h wrote:
       | > 1. Pick an appropriate base image
       | 
       | Starting with: Use the ones that are supposed to be small. Ubuntu
       | does this by default, I think, but debian:stable-slim is 30 MB
       | (down from the non-slim 52MB), node has slim and alpine tags,
       | etc. If you want to do more intensive changes that's fine, but
       | start with the nearly-zero-effort one first.
       | 
       | EDIT: Also, where is the author getting these numbers? They've
       | got a chart that shows Debian at 124MB, but just clicking that
       | link lands you at a page listing it at 52MB.
        
       | alanwreath wrote:
        | I always feel helpless with python containers - it seems there
        | aren't ever many savings eked out of multi-stage builds and the
        | other strategies that are typically suggested. Docker container
        | size really has made compiled languages more attractive to me
        
       | bingohbangoh wrote:
        | For my two cents, if your image requires anything not vanilla,
        | you may be better off stomaching the larger Ubuntu image.
       | 
       | Lots of edge cases around specific libraries come up that you
       | don't expect. I spent hours tearing my hair out trying to get
       | Selenium and python working on an alpine image that worked out-
       | of-the-box on the Ubuntu image.
        
         | aledalgrande wrote:
         | I would rather install the needed libraries myself and not have
         | to deal with tons of security fixes of libraries I don't use.
        
           | CJefferson wrote:
            | Do libraries just sitting there on disk do any damage?
           | 
           | Also, are you going to update those libraries as soon as a
           | security issue arises? Debian/Ubuntu and friends have teams
           | dedicated to that type of thing.
        
             | postalrat wrote:
              | Can they be used somehow? Then, perhaps.
             | 
             | Depending where you work you might also need to pass some
             | sort of imaging scan that will look at the versions of
             | everything installed.
        
           | erik_seaberg wrote:
           | That's rolling your own distro. We could do that but it's not
           | really our job. It also prevents the libraries from being
           | shared between images, unless you build one base layer and
           | use it for everything in your org (third parties won't).
        
           | curiousgal wrote:
            | I mean honestly if you're _that_ paranoid then you shouldn't
            | be using Docker in the first place.
        
             | aledalgrande wrote:
             | What does docker have to do with patching security fixes?
             | If you have an EC2 box it's going to be the same. I don't
             | consider that paranoid.
        
           | pas wrote:
           | musl DNS stub resolver is "broken" unfortunately (it doesn't
           | do TCP, which is a problem usually when you want to deploy
           | something into a highly dynamic DNS-configured environment,
           | eg. k8s)
        
           | coredog64 wrote:
           | Once you start adding stuff, I think Alpine gets worse. For
           | example, there's a libgmp issue that's in the latest Alpine
           | versions since November. It's fixed upstream but hasn't been
           | pulled into Alpine.
        
         | FinalBriefing wrote:
         | I generally agree.
         | 
         | I start all my projects based on Alpine (alpine-node, for
         | example). I'll sometimes need to install a few libraries like
          | ImageMagick, but if that list starts to grow, I'll just use
         | Ubuntu.
        
       | qbasic_forever wrote:
       | There's some more to consider with the latest buildkit frontend
       | for docker, check it out here:
       | https://hub.docker.com/r/docker/dockerfile
       | 
        | In particular cache mounts (RUN --mount=type=cache) can help the
        | package manager cache size issue, and heredocs are a game-changer
        | for inline scripts. Forget doing all that && nonsense, write
        | clean multiline run commands:
        | 
        |     RUN <<EOF
        |     apt-get update
        |     apt-get install -y foo bar baz
        |     etc...
        |     EOF
       | 
       | All of this works right now in plain old desktop docker you have
       | installed right now, you just need to use the buildx command
       | (buildkit engine) and reference the docker labs buildkit frontend
       | image above. Unfortunately it's barely mentioned in docs or
       | anywhere else other than their blog right now.
        
       | no_wizard wrote:
       | Somewhat tangentially related to the topic of this post: does
       | anyone know any good tech for keeping an image "warm". For
       | instance, I like to spin up separate containers for my tests vs
       | development so they can be "one single process" focused, but it
       | is not always practical (due to system resources on my local dev
       | machine) to just keep my test runner in "watch" mode, so I spin
       | it down and have to spin it back up, and there's always some
       | delay - even when cached. Is there a way to keep this "hot" but
        | not run a process as a result? I generally try to do watch mode
        | for tests, but with webdev I've got a lot of file watchers
        | running, and this can cause a lot of overhead with my containers
        | (on macOS for what it's worth)
       | 
       | Is there anything one can do to help this issue?
        
         | pas wrote:
          | You could launch the container itself with sleep. (docker run
          | --entrypoint /bin/sh [image] -c 'sleep inf') Then start the dev
         | watch thing with 'docker exec', and when you don't need it
         | anymore you can kill it. (Eg. via htop)
         | 
         | With uwsgi you can control which file to watch. I usually just
         | set it to watch the index.py so when I want to restart it, I
         | just switch to that and save the file.
         | 
         | Similarly you could do this with "entr"
         | https://github.com/eradman/entr
        
         | PhilippGille wrote:
         | > keeping an image "warm"
         | 
         | Do you mean container? So you'd like to have your long running
         | dev container, and a separate test container that keeps running
         | but you only use it every now and then, right? Because you
         | neither want to include the test stuff in your dev container,
         | nor use file watchers for the tests?
         | 
         | Then while I don't know your exact environment and flow, could
         | you start the container with `docker run ... sh -c "while true;
         | do sleep 1; done"` to "keep it warm" and then `docker exec ...`
         | to run the tests?
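          | 
          | Something along these lines, in other words (the names and the
          | test command are placeholders):

```shell
# Keep a "warm" test container alive doing nothing; sleep is PID 1.
docker run -d --name tests myapp-test-image sleep infinity

# Run the suite on demand inside the already-running container.
docker exec tests npm test

# Remove it once you're done iterating.
docker rm -f tests
```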
        
       | 2OEH8eoCRo0 wrote:
       | I also liked this one:
       | 
       | https://fedoramagazine.org/build-smaller-containers/
       | 
       | I don't avoid large images because of their size, I avoid them
       | because it's an indicator that I'm packaging much more than is
       | necessary. If I package a lot more than is necessary then perhaps
       | I do not understand my dependencies well enough or my container
       | is doing too much.
        
       | nodesocket wrote:
        | A very common mistake I see (though not related to image size
        | per se) when running Node apps is to do CMD ["npm", "run",
        | "start"]. First, this is wasteful of memory, as npm runs as the
        | parent process and forks node to run the main script. The bigger
        | problem is that the npm process does not send signals down to
        | its child, so SIGINT and SIGTERM are not passed from npm into
        | node, which means your server may not be gracefully closing
        | connections.
        
         | Ramiro wrote:
          | I never really thought about this, it's a good point. What do
          | you suggest be used instead of ["npm", "run", "start"]?
        
           | nicholasjarnold wrote:
            | This is a great use case for tini[0]. Try this, after
            | installing the tini binary to /sbin:
            | 
            |     ENTRYPOINT ["/sbin/tini", "--"]
            |     CMD ["node", "/path/to/main/process.js"]
           | 
           | [0]: https://github.com/krallin/tini
           | 
           | edit: formatting, sorry.
        
             | remram wrote:
             | I think this is built into docker now:
             | https://docs.docker.com/engine/reference/run/#specify-an-
             | ini...
             | 
             | If you use Kubernetes then you have to add tini for now
             | (https://github.com/kubernetes/kubernetes/issues/84210)
        
           | bravetraveler wrote:
           | I'm not a Node/NPM person, but I imagine they had in mind the
           | equivalent of whatever is expected from npm. I expect some
           | nodejs command to invoke the service directly
           | 
           | Edit: Consequently this should make the container logs a bit
           | more useful, beyond better signal handling/respect
        
           | davidjfelix wrote:
           | ["node", "/path/to/your/entrypoint.js"]
        
           | pineconewarrior wrote:
           | I assume it'd be better to execute index.js directly with
           | node
        
         | j1elo wrote:
         | Node.js has both a Best Practices [0] and a tutorial [1] that
          | instruct to use _CMD ["node", "main.js"]_. In short: do not
         | run NPM as main process; instead, run Node directly.
         | 
         | This way, the Node process itself will run as PID 1 of the
         | container (instead of just being a child process of NPM).
         | 
         | The same can be found in other collections of best practices
         | such as [2].
         | 
         | What I do is a bit more complex: an entrypoint.sh which ends up
         | running                   exec node main.js "$*"
         | 
          | Docs then tell users to use "_docker run --init_"; this flag
         | will tell Docker to use the Tini minimal init system as PID 1,
         | which handles system SIGnals appropriately.
         | 
         | [0]: https://github.com/nodejs/docker-
         | node/blob/main/docs/BestPra...
         | 
         | [1]: https://nodejs.org/en/docs/guides/nodejs-docker-webapp/
         | 
         | [2]: https://dev.to/nodepractices/docker-best-practices-with-
         | node...
         | 
         | Edit: corrected the part about using --init for proper handling
         | of signals.
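          | 
          | The entrypoint script ends up having a shape like this (the
          | setup steps are placeholders; the exec is the important part,
          | so node replaces the shell and receives signals directly):

```shell
#!/bin/sh
# entrypoint.sh - run setup, then hand the process over to node.
set -e

# ...setup steps (config templating, migrations, etc.) go here...

# exec replaces this shell with node, so no sh parent lingers and
# signals reach node directly. "$@" forwards the container arguments.
exec node main.js "$@"
```

          | (Using "$@" rather than "$*" keeps each forwarded argument as
          | a separate word.)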
        
       | [deleted]
        
       | miyuru wrote:
        | There are other base images, from Google, that are smaller than
        | the usual base images and come in handy when deploying
        | applications that run as a single binary.
       | 
       | > Distroless images are very small. The smallest distroless
       | image, gcr.io/distroless/static-debian11, is around 2 MiB. That's
       | about 50% of the size of alpine (~5 MiB), and less than 2% of the
       | size of debian (124 MiB).
       | 
       | https://github.com/GoogleContainerTools/distroless
        
         | yjftsjthsd-h wrote:
         | So the percentage makes it look impressive, but... you're
         | saving no more than 5MB. Don't get me wrong, I like smaller
         | images, but I feel like "smaller than Alpine" is getting into
         | -funroll-loops territory of over-optimizing.
        
         | ImJasonH wrote:
         | It got removed from the README at some point, but the smallest
         | distroless image, gcr.io/distroless/static is 786KB compressed
         | -- 1/3 the size of this image of shipping containers[0], and
         | small enough to fit on a 3.5" floppy disk.
         | 
         | 0: https://unsplash.com/photos/bukjsECgmeU
        
         | Ramiro wrote:
          | Distroless images are tiny, but sometimes the fact that they
          | don't have anything on them other than the application binary
          | makes them harder to interact with, especially when
          | troubleshooting or profiling. We recently moved a lot of our
          | stuff back to vanilla
         | debian for this reason. We figured that the extra 100MB
         | wouldn't make that big of a difference when pulling for our
         | Kubernetes clusters. YMMV.
        
           | jrockway wrote:
           | I've found myself exec-ing into containers a lot less often
           | recently. Kubernetes has ephemeral containers for debugging.
           | This is of limited use to me; the problem is usually lower
           | level (container engine or networking malfunctioning) or
           | higher level (app is broke, and there is no command "fix-app"
           | included in Debian). For the problems that are lower level,
           | it's simplest to resolve by just ssh-ing to the node (great
           | for a targeted tcpdump). For the problems that are higher
           | level, it's easier to just integrate things into your app (I
           | would die without net/http/pprof in Go apps, for example).
           | 
           | I was an early adopter of distroless, though, so I'm probably
           | just used to not having a shell in the container. If you use
           | it everyday I'm sure it must be helpful in some way. My
           | philosophy is as soon as you start having a shell on your
           | cattle, it becomes a pet, though. Easy to leave one-off fixes
           | around that are auto-reverted when you reschedule your
           | deployment or whatever. This has never happened to me but I
           | do worry about it. I'd also say that if you are uncomfortable
           | about how "exec" lets people do anything in a container,
           | you'd probably be even more uncomfortable giving them root on
           | the node itself. And of course it's very easy to break things
           | at that level as well.
        
           | gravypod wrote:
            | There are some tools that allow you to copy debug tools into
            | a container when needed. I think all that needs to be in the
            | container is tar; the tool runs `kubectl exec ... tar` in the
            | container. This allows you to get in when needed but still
            | keep your production attack surface low.
           | 
            | Either way, as long as all your containers share the same
            | base layer it doesn't really matter, since they will be
            | deduplicated.
        
             | theptip wrote:
             | I believe "Ephemeral containers" are intended to resolve
             | this issue; you can attach a "debug container" to your pod
             | with a shell and other tools.
             | 
             | https://kubernetes.io/docs/concepts/workloads/pods/ephemera
             | l...
             | 
             | Still beta, I haven't tried it yet myself. Looks
             | interesting though.
        
           | theptip wrote:
           | Also if you are running k8s, and use the same base image for
           | your app containers, you amortize this cost as you only need
           | to pull the base layers once per node. So in practice you
           | won't pull that 100mb many times.
           | 
           | (This benefit compounds the more frequently you rebuild your
           | app containers.)
        
             | PaulKeeble wrote:
             | Base images like alpine/debian/ubuntu get used by a lot of
             | third party containers too so if you have multiple
             | containers running on the same device they may in practice
             | be very small until the base image gets an upgrade.
        
               | erik_seaberg wrote:
               | This. The article talks about
               | 
               | > Each layer in your image might have a leaner version
               | that is sufficient for your needs.
               | 
               | when reusing a huge layer is cheaper than choosing a
               | small layer that is _not_ reused.
        
             | yjftsjthsd-h wrote:
             | Doesn't that only work if you used the _exact_ same base?
             | If I build 2 images from debian:11 but one of them used
             | debian:11 last month and one uses debian:11 today, I
              | thought they end up not sharing a base layer because
              | they're resolving debian:11 to different hashes and actually
             | using the base image by exact image ID.
        
           | podge wrote:
           | I found this to be an issue as well, but there are a few ways
           | around this for when you need to debug something. The most
           | useful approach I found was to launch a new container from a
           | standard image (like Ubuntu) which shares the same process
           | namespace, for example:
           | 
           | docker run --rm -it --pid=container:distroless-app
           | ubuntu:20.04
           | 
           | You can then see processes in the 'distroless-app' container
           | from the new container, and then you can install as many
           | debugging tools as you like without affecting the original
           | container.
           | 
           | Alternatively distroless have debug images you could use as a
           | base instead which are probably still smaller than many other
           | base images:
           | 
           | https://github.com/GoogleContainerTools/distroless#debug-
           | ima...
        
           | staticassertion wrote:
           | The way I imagine this is best solved is by keeping a
           | compressed set of tools on your host and then mounting those
           | tools into a volume for your container.
           | 
           | So if you have N containers on a host you only end up with
           | one set of tooling across all of them, and it's compressed
           | until you need it.
           | 
           | You can decouple your test tooling from your
           | images/containers, which has a number of benefits. One that's
           | perhaps understated is reducing attacker capabilities in the
           | container.
           | 
           | With log4j some of the payloads were essentially just calling
           | out to various binaries on Linux. If you don't have those
           | they die instantly.
        
       | tonymet wrote:
       | This app is great for discovering waste
       | 
       | https://github.com/wagoodman/dive
       | 
       | I've found 100MB fonts and other waste.
       | 
       | All the tips are good, but until you actually inspect your
       | images, you won't know why they are so bloated.
        
         | Twirrim wrote:
         | Every now and then I break out dive and take a look at
         | container images. Almost without fail I'll find something we
         | can improve.
         | 
         | The UX is great for the tool, gives me absolutely everything I
         | need to see, in such a clear fashion, and with virtually no
         | learning curve at all for using it.
        
       | jasonpeacock wrote:
       | A common mistake that's not covered in this article is the need
       | to perform your add & remove operations in the same RUN command.
       | Doing them separately creates two separate layers which inflates
       | the image size.
       | 
       | This creates two image layers - the first layer has all the added
       | foo, including any intermediate artifacts. Then the second layer
       | removes the intermediate artifacts, but that's saved as a diff
        | against the previous layer:
        | 
        |     RUN ./install-foo
        |     RUN ./cleanup-foo
       | 
        | Instead, you need to do them in the same RUN command:
        | 
        |     RUN ./install-foo && ./cleanup-foo
       | 
       | This creates a single layer which has only the foo artifacts you
       | need.
       | 
        | This is why the official Dockerfile best practices show[1] the
        | apt cache being cleaned up in the same RUN command:
        | 
        |     RUN apt-get update && apt-get install -y \
        |         package-bar \
        |         package-baz \
        |         package-foo \
        |         && rm -rf /var/lib/apt/lists/*
       | 
       | [1] https://docs.docker.com/develop/develop-
       | images/dockerfile_be...
        
         | gavinray wrote:
         | You can use "--squash" to remove all intermediate layers
         | 
         | https://docs.docker.com/engine/reference/commandline/build/#...
         | 
         | The downside of trying to jam all of your commands into a
         | gigantic single RUN invocation is that if it isn't correct/you
         | need to troubleshoot it, you can wind up waiting 10-20 minutes
         | between each single line change just waiting for your build to
         | finish.
         | 
         | You lose all the layer caching benefits and it has to re-do the
         | entire build.
         | 
         | Just a heads up for anyone that's not suffered through this
         | before.
        
           | yjftsjthsd-h wrote:
           | But then you end up with just one layer, so you lose out on
           | any caching and sharing you might have gotten. Whether this
           | matters is of course _very_ context dependent, but there are
            | times when it'll cost you space.
        
           | imglorp wrote:
           | This is huge, thanks for the lead. Others should note it's
           | still experimental and your build command may fail with
           | 
           | > "--squash" is only supported on a Docker daemon with
           | experimental features enabled
           | 
           | Up til now, our biggest improvement was with "FROM SCRATCH".
        
             | selfup wrote:
             | Good to know. `FROM scratch` is such a breath of fresh air
             | for compiled apps. No need for Alpine if I just need to run
             | a binary!
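                | 
                | A typical shape for that, sketched for a Go app (paths
                | and versions are placeholders):

```dockerfile
# Build stage: produce a fully static binary (CGO disabled so there
# is no libc dependency at runtime).
FROM golang:1.17 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# Final stage: the image contains nothing but the binary.
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```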
        
               | jrockway wrote:
               | Do keep in mind that you might want a set of trusted TLS
               | certificates and the timezone database. Both will be
               | annoying runtime errors when you don't trust
               | https://api.example.com or try to return a time to a user
               | in their preferred time zone. Distroless includes these.
        
             | gavinray wrote:
             | No problem.                 > Others should note it's still
             | experimental and your build command may fail with
             | 
              | You might try _"docker buildx build"_, to use the BuildKit
             | client -- squash isn't experimental in that one I believe
             | =)
             | 
             | https://docs.docker.com/engine/reference/commandline/buildx
             | _...
        
           | selfup wrote:
           | Had no idea about squash. Using cached layers can really save
           | time, especially when you already have OS deps/project deps
           | installed. Thanks!
        
           | [deleted]
        
         | qbasic_forever wrote:
         | You don't have to do this anymore, the buildkit frontend for
         | docker has a new feature that supports multiline heredoc
         | strings for commands: https://www.docker.com/blog/introduction-
         | to-heredocs-in-dock... It's a game changer but unfortunately
         | barely mentioned anywhere.
        
       | epberry wrote:
       | If you click 'Pricing' on the main site an error occurs just FYI.
        
       ___________________________________________________________________
       (page generated 2022-01-06 23:00 UTC)