[HN Gopher] Optimizing Docker image size and why it matters ___________________________________________________________________ Optimizing Docker image size and why it matters Author : swazzy Score : 113 points Date : 2022-01-06 19:13 UTC (3 hours ago) (HTM) web link (contains.dev) (TXT) w3m dump (contains.dev) | bravetraveler wrote: | The article doesn't seem to do much... in the 'why'. I'm | inundated with _how_, though. | | I've been on both sides of this argument, and I really think it's | a case-by-case thing. | | A highly compliant environment? As minimal as possible. A | hobbyist/developer that wants to debug? Go as big of an image as | you want. | | It shouldn't be an expensive operation to update your image base | and deploy a new one, regardless of size. | | Network/resource constraints (should) be becoming less of an | issue. In a lot of cases, a local registry cache is all you need. | | I worry partly about how much time is spent on this quest, or | secondary effects. | | Has the situation with name resolution been dealt with in musl? | | For example, something like /etc/hosts overrides not taking | proper precedence (or working at all). To be sure, that's not a | great thing to use - but it _does_, and leads to a lot of head | scratching. | 3np wrote: | I mean on one hand, yeah, but comparing Debian (124 MB) with | Ubuntu (73 MB) shows that with some effort you can eat your | cake and have it too. | yjftsjthsd-h wrote: | > A highly compliant environment? As minimal as possible. A | hobbyist/developer that wants to debug? Go as big of an image | as you want. | | Hah, I go the other way; at work hardware is cheap and the | company wants me to ship yesterday, so sure I'll ship the big | image now and hope to optimize later. At home, I'm on a slow | internet connection and old hardware and I have no deadlines, | so I'm going to carefully cut down what I pull and what I | build.
| no_wizard wrote: | I like this article, and there is a ton of nuance in the image | and how you should choose the appropriate one. I also like how | they cover only copying the files you actually need, particularly | with things like vendor or node_modules, you might be better off | just doing a volume mount instead of copying it over to the | entire image. | | The only thing they didn't seem to cover is consider your target. | My general policy is dev images are almost always going to be | whatever lets me do one of the following: | | - Easily install the tool I need | | - All things being equal, if multiple image base OS's satisfy the | above, I go with alpine, cause it's smallest | | One thing I've noticed is simple purpose built images are faster, | even when there are a lot of them (big docker-compose user myself | for this reason) rather than stuffing a lot of services inside of | a single container or even "fewer" containers | | EDIT: spelling, nuisance -> nuance | Sebb767 wrote: | > I also like how they cover only copying the files you | actually need, particularly with things like vendor or | node_modules, you might be better off just doing a volume mount | instead of copying it over to the | entire image. | | I'd highly suggest not to do that. If you do this, you directly | throw away reproducibility, since you can't simply revert back | to an older image if something stops working - you need to also | check the node_modules directory. You also can't simply run old | images or be sure that you have the same setup on your local | machine as in production, since you also need to copy the | state. Not to mention problems that might appear when your | servers have differing versions of the folder or the headaches | when needing to upgrade it together with your image. | | Reducing your image size _is_ important, but this way you'll | lose a lot of what Docker actually offers.
It might make sense | in some specific cases, but you should be very aware of the | drawbacks. | dvtrn wrote: | _I like this article, and there is a ton of nuisance in the | image and how you should choose the appropriate one._ | | By chance, did you mean _nuance_? Because while I can agree that | you can quickly get into some messy weeds optimizing an | image...hearing someone call it a "nuisance" made me chuckle | this afternoon | no_wizard wrote: | I did! Edited for clarification, though it definitely can be | both! | somehnacct3757 wrote: | The analyzer product this post is content marketing for looks | interesting, but I would want to run it locally rather than | connect my image repo to it. | | Am I being paranoid? Is it reasonable to connect my images to a | random third party service like this? | adamgordonbell wrote: | You might not need to care about image size at all if your image | can be packaged as stargz. | | stargz is a gamechanger for startup time. | | kubernetes and podman support it, and docker support is likely | coming. It lazy loads the filesystem on start-up, making network | requests for things as needed and therefore can often start up | large images very fast. | | Take a look at the startup graph here: | | https://github.com/containerd/stargz-snapshotter | yjftsjthsd-h wrote: | > 1. Pick an appropriate base image | | Starting with: Use the ones that are supposed to be small. Ubuntu | does this by default, I think, but debian:stable-slim is 30 MB | (down from the non-slim 52 MB), node has slim and alpine tags, | etc. If you want to do more intensive changes that's fine, but | start with the nearly-zero-effort one first. | | EDIT: Also, where is the author getting these numbers? They've | got a chart that shows Debian at 124 MB, but just clicking that | link lands you at a page listing it at 52 MB.
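The nearly-zero-effort switch described above can be sketched as a Dockerfile; the package list and the `myapp` binary are placeholders, and exact sizes vary by tag and compression:

```dockerfile
# Swapping the full debian:stable base for debian:stable-slim keeps apt
# and glibc while dropping docs, locales, and other extras.
FROM debian:stable-slim

# Install only what the app needs, and clear the apt cache in the same layer.
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Hypothetical prebuilt binary; copy only what has to ship.
COPY myapp /usr/local/bin/myapp
CMD ["myapp"]
```

The same one-line change applies to language images, e.g. `node:16-slim` or `node:16-alpine` instead of `node:16`.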
| alanwreath wrote: | I always feel helpless with python containers - it seems there | aren't many savings ever eked out of multi-stage and other | strategies that typically are suggested. Docker container size | really has made compiled languages more attractive to me | bingohbangoh wrote: | For my two cents, if your image requires anything not vanilla, | you may be better off stomaching the larger Ubuntu image. | | Lots of edge cases around specific libraries come up that you | don't expect. I spent hours tearing my hair out trying to get | Selenium and python working on an alpine image that worked out- | of-the-box on the Ubuntu image. | aledalgrande wrote: | I would rather install the needed libraries myself and not have | to deal with tons of security fixes of libraries I don't use. | CJefferson wrote: | Do libraries just sitting there on disk do any damage? | | Also, are you going to update those libraries as soon as a | security issue arises? Debian/Ubuntu and friends have teams | dedicated to that type of thing. | postalrat wrote: | Can they be used somehow? Then perhaps. | | Depending on where you work you might also need to pass some | sort of imaging scan that will look at the versions of | everything installed. | erik_seaberg wrote: | That's rolling your own distro. We could do that but it's not | really our job. It also prevents the libraries from being | shared between images, unless you build one base layer and | use it for everything in your org (third parties won't). | curiousgal wrote: | I mean honestly if you're _that_ paranoid then you shouldn't | be using Docker in the first place. | aledalgrande wrote: | What does docker have to do with patching security fixes? | If you have an EC2 box it's going to be the same. I don't | consider that paranoid. | pas wrote: | musl's DNS stub resolver is "broken" unfortunately (it doesn't | do TCP, which is a problem usually when you want to deploy | something into a highly dynamic DNS-configured environment, | eg.
k8s) | coredog64 wrote: | Once you start adding stuff, I think Alpine gets worse. For | example, there's a libgmp issue that's been in the latest Alpine | versions since November. It's fixed upstream but hasn't been | pulled into Alpine. | FinalBriefing wrote: | I generally agree. | | I start all my projects based on Alpine (alpine-node, for | example). I'll sometimes need to install a few libraries like | ImageMagick, but if that list starts to grow, I'll just use | Ubuntu. | qbasic_forever wrote: | There's some more to consider with the latest buildkit frontend | for docker, check it out here: | https://hub.docker.com/r/docker/dockerfile | | In particular cache mounts (RUN --mount=type=cache) can help the | package manager cache size issue, and heredocs are a game-changer | for inline scripts. Forget doing all that && nonsense, write | clean multiline run commands: RUN <<EOF | apt-get update apt-get install -y foo bar baz | etc... EOF | | All of this works in plain old desktop docker you have | installed right now, you just need to use the buildx command | (buildkit engine) and reference the docker labs buildkit frontend | image above. Unfortunately it's barely mentioned in docs or | anywhere else other than their blog right now. | no_wizard wrote: | Somewhat tangentially related to the topic of this post: does | anyone know any good tech for keeping an image "warm"? For | instance, I like to spin up separate containers for my tests vs | development so they can be "one single process" focused, but it | is not always practical (due to system resources on my local dev | machine) to just keep my test runner in "watch" mode, so I spin | it down and have to spin it back up, and there's always some | delay - even when cached. Is there a way to keep this "hot" but | not run a process as a result?
I generally try to do watch mode | for tests, but with webdev I got a lot of file watchers running, | and this can cause a lot of overhead with my containers (on macOS | for what it's worth) | | Is there anything one can do to help this issue? | pas wrote: | You could launch the container itself with sleep. (docker run | --entrypoint /bin/sh [image] -c 'sleep inf') Then start the dev | watch thing with 'docker exec', and when you don't need it | anymore you can kill it. (Eg. via htop) | | With uwsgi you can control which file to watch. I usually just | set it to watch the index.py so when I want to restart it, I | just switch to that and save the file. | | Similarly you could do this with "entr" | https://github.com/eradman/entr | PhilippGille wrote: | > keeping an image "warm" | | Do you mean container? So you'd like to have your long running | dev container, and a separate test container that keeps running | but you only use it every now and then, right? Because you | neither want to include the test stuff in your dev container, | nor use file watchers for the tests? | | Then while I don't know your exact environment and flow, could | you start the container with `docker run ... sh -c "while true; | do sleep 1; done"` to "keep it warm" and then `docker exec ...` | to run the tests? | 2OEH8eoCRo0 wrote: | I also liked this one: | | https://fedoramagazine.org/build-smaller-containers/ | | I don't avoid large images because of their size, I avoid them | because it's an indicator that I'm packaging much more than is | necessary. If I package a lot more than is necessary then perhaps | I do not understand my dependencies well enough or my container | is doing too much. | nodesocket wrote: | A very common mistake I see (though not related to image size | per se) when running Node apps is to do CMD ["npm", "run", | "start"]. This is, first, memory-wasteful, as npm is running as the | parent process and forking node to run the main script.
Also, the | bigger problem is that the npm process does not send signals down | to its child, so SIGINT and SIGTERM are not passed from npm into | node, which means your server may not be gracefully closing | connections. | Ramiro wrote: | I never really thought about this, it's a good point. What do | you suggest using instead of ["npm", "run", "start"]? | nicholasjarnold wrote: | This is a great use case for tini[0]. Try this, after | installing the tini binary to /sbin: | ENTRYPOINT ["/sbin/tini", "--"] CMD ["node", | "/path/to/main/process.js"] | | [0]: https://github.com/krallin/tini | | edit: formatting, sorry. | remram wrote: | I think this is built into docker now: | https://docs.docker.com/engine/reference/run/#specify-an- | ini... | | If you use Kubernetes then you have to add tini for now | (https://github.com/kubernetes/kubernetes/issues/84210) | bravetraveler wrote: | I'm not a Node/NPM person, but I imagine they had in mind the | equivalent of whatever is expected from npm. I expect some | nodejs command to invoke the service directly | | Edit: Consequently this should make the container logs a bit | more useful, beyond better signal handling/respect | davidjfelix wrote: | ["node", "/path/to/your/entrypoint.js"] | pineconewarrior wrote: | I assume it'd be better to execute index.js directly with | node | j1elo wrote: | Node.js has both a Best Practices [0] and a tutorial [1] that | instruct you to use _CMD ["node", "main.js"]_. In short: do not | run NPM as the main process; instead, run Node directly. | | This way, the Node process itself will run as PID 1 of the | container (instead of just being a child process of NPM). | | The same can be found in other collections of best practices | such as [2].
| | What I do is a bit more complex: an entrypoint.sh which ends up | running exec node main.js "$*" | | Docs then tell users to use "_docker run --init_"; this flag | will tell Docker to use the Tini minimal init system as PID 1, | which handles system SIGnals appropriately. | | [0]: https://github.com/nodejs/docker- | node/blob/main/docs/BestPra... | | [1]: https://nodejs.org/en/docs/guides/nodejs-docker-webapp/ | | [2]: https://dev.to/nodepractices/docker-best-practices-with- | node... | | Edit: corrected the part about using --init for proper handling | of signals. | [deleted] | miyuru wrote: | There are other base images from Google that are smaller than | the usual base images and come in handy when deploying applications | that run as a single binary. | | > Distroless images are very small. The smallest distroless | image, gcr.io/distroless/static-debian11, is around 2 MiB. That's | about 50% of the size of alpine (~5 MiB), and less than 2% of the | size of debian (124 MiB). | | https://github.com/GoogleContainerTools/distroless | yjftsjthsd-h wrote: | So the percentage makes it look impressive, but... you're | saving no more than 5 MB. Don't get me wrong, I like smaller | images, but I feel like "smaller than Alpine" is getting into | -funroll-loops territory of over-optimizing. | ImJasonH wrote: | It got removed from the README at some point, but the smallest | distroless image, gcr.io/distroless/static is 786KB compressed | -- 1/3 the size of this image of shipping containers[0], and | small enough to fit on a 3.5" floppy disk. | | 0: https://unsplash.com/photos/bukjsECgmeU | Ramiro wrote: | Distroless are tiny, but sometimes the fact that they don't have | anything on them other than the application binary makes them | harder to interact with, especially when troubleshooting or | profiling. We recently moved a lot of our stuff back to vanilla | debian for this reason.
We figured that the extra 100 MB | wouldn't make that big of a difference when pulling for our | Kubernetes clusters. YMMV. | jrockway wrote: | I've found myself exec-ing into containers a lot less often | recently. Kubernetes has ephemeral containers for debugging. | This is of limited use to me; the problem is usually lower | level (container engine or networking malfunctioning) or | higher level (app is broke, and there is no command "fix-app" | included in Debian). For the problems that are lower level, | it's simplest to resolve by just ssh-ing to the node (great | for a targeted tcpdump). For the problems that are higher | level, it's easier to just integrate things into your app (I | would die without net/http/pprof in Go apps, for example). | | I was an early adopter of distroless, though, so I'm probably | just used to not having a shell in the container. If you use | it every day I'm sure it must be helpful in some way. My | philosophy is as soon as you start having a shell on your | cattle, it becomes a pet, though. Easy to leave one-off fixes | around that are auto-reverted when you reschedule your | deployment or whatever. This has never happened to me but I | do worry about it. I'd also say that if you are uncomfortable | about how "exec" lets people do anything in a container, | you'd probably be even more uncomfortable giving them root on | the node itself. And of course it's very easy to break things | at that level as well. | gravypod wrote: | There are some tools that allow you to copy debug tools into | a container when needed. I think all that needs to be in the | container is tar and it runs `kubectl exec ... tar` in the | container. This allows you to get in when needed but still | keep your production attack surface low. | | Either way as long as all your containers share the same base | layer it doesn't really matter since they will be | deduplicated.
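The base-layer deduplication mentioned above can be illustrated with two hypothetical Dockerfiles (service names are made up); because both pin the identical base tag, a node running both images downloads and stores the `debian:11` layers only once:

```dockerfile
# service-a/Dockerfile
FROM debian:11
COPY service-a /usr/local/bin/service-a
CMD ["service-a"]

# service-b/Dockerfile -- a separate image, shown here in the same listing.
# It reuses every debian:11 layer already on the node; only its own COPY
# layer is pulled and stored separately.
FROM debian:11
COPY service-b /usr/local/bin/service-b
CMD ["service-b"]
```

One caveat, also raised elsewhere in the thread: the sharing only holds if both images resolve the tag to the same digest, so images built against `debian:11` on different days may reference different base hashes and share nothing.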
| theptip wrote: | I believe "Ephemeral containers" are intended to resolve | this issue; you can attach a "debug container" to your pod | with a shell and other tools. | | https://kubernetes.io/docs/concepts/workloads/pods/ephemera | l... | | Still beta, I haven't tried it yet myself. Looks | interesting though. | theptip wrote: | Also if you are running k8s, and use the same base image for | your app containers, you amortize this cost as you only need | to pull the base layers once per node. So in practice you | won't pull that 100 MB many times. | | (This benefit compounds the more frequently you rebuild your | app containers.) | PaulKeeble wrote: | Base images like alpine/debian/ubuntu get used by a lot of | third party containers too so if you have multiple | containers running on the same device they may in practice | be very small until the base image gets an upgrade. | erik_seaberg wrote: | This. The article talks about | | > Each layer in your image might have a leaner version | that is sufficient for your needs. | | when reusing a huge layer is cheaper than choosing a | small layer that is _not_ reused. | yjftsjthsd-h wrote: | Doesn't that only work if you used the _exact_ same base? | If I build 2 images from debian:11 but one of them used | debian:11 last month and one uses debian:11 today, I | thought they end up not sharing a base layer because | they're resolving debian:11 to different hashes and actually | using the base image by exact image ID. | podge wrote: | I found this to be an issue as well, but there are a few ways | around this for when you need to debug something.
The most | useful approach I found was to launch a new container from a | standard image (like Ubuntu) which shares the same process | namespace, for example: | | docker run --rm -it --pid=container:distroless-app | ubuntu:20.04 | | You can then see processes in the 'distroless-app' container | from the new container, and then you can install as many | debugging tools as you like without affecting the original | container. | | Alternatively distroless have debug images you could use as a | base instead which are probably still smaller than many other | base images: | | https://github.com/GoogleContainerTools/distroless#debug- | ima... | staticassertion wrote: | The way I imagine this is best solved is by keeping a | compressed set of tools on your host and then mounting those | tools into a volume for your container. | | So if you have N containers on a host you only end up with | one set of tooling across all of them, and it's compressed | until you need it. | | You can decouple your test tooling from your | images/containers, which has a number of benefits. One that's | perhaps understated is reducing attacker capabilities in the | container. | | With log4j some of the payloads were essentially just calling | out to various binaries on Linux. If you don't have those | they die instantly. | tonymet wrote: | This app is great for discovering waste | | https://github.com/wagoodman/dive | | I've found 100MB fonts and other waste. | | All the tips are good, but until you actually inspect your | images, you won't know why they are so bloated. | Twirrim wrote: | Every now and then I break out dive and take a look at | container images. Almost without fail I'll find something we | can improve. | | The UX is great for the tool, gives me absolutely everything I | need to see, in such a clear fashion, and with virtually no | learning curve at all for using it. 
| jasonpeacock wrote: | A common mistake that's not covered in this article is the need | to perform your add & remove operations in the same RUN command. | Doing them separately creates two separate layers which inflates | the image size. | | This creates two image layers - the first layer has all the added | foo, including any intermediate artifacts. Then the second layer | removes the intermediate artifacts, but that's saved as a diff | against the previous layer: RUN ./install-foo | RUN ./cleanup-foo | | Instead, you need to do them in the same RUN command: | RUN ./install-foo && ./cleanup-foo | | This creates a single layer which has only the foo artifacts you | need. | | This is why the official Dockerfile best practices show[1] the apt | cache being cleaned up in the same RUN command: | RUN apt-get update && apt-get install -y \ | package-bar \ package-baz \ package-foo \ | && rm -rf /var/lib/apt/lists/* | | [1] https://docs.docker.com/develop/develop- | images/dockerfile_be... | gavinray wrote: | You can use "--squash" to remove all intermediate layers | | https://docs.docker.com/engine/reference/commandline/build/#... | | The downside of trying to jam all of your commands into a | gigantic single RUN invocation is that if it isn't correct/you | need to troubleshoot it, you can wind up waiting 10-20 minutes | between each single line change just waiting for your build to | finish. | | You lose all the layer caching benefits and it has to re-do the | entire build. | | Just a heads up for anyone that's not suffered through this | before. | yjftsjthsd-h wrote: | But then you end up with just one layer, so you lose out on | any caching and sharing you might have gotten. Whether this | matters is of course _very_ context dependent, but there are | times when it'll cost you space. | imglorp wrote: | This is huge, thanks for the lead.
Others should note it's | still experimental and your build command may fail with | | > "--squash" is only supported on a Docker daemon with | experimental features enabled | | Up til now, our biggest improvement was with "FROM SCRATCH". | selfup wrote: | Good to know. `FROM scratch` is such a breath of fresh air | for compiled apps. No need for Alpine if I just need to run | a binary! | jrockway wrote: | Do keep in mind that you might want a set of trusted TLS | certificates and the timezone database. Both will be | annoying runtime errors when you don't trust | https://api.example.com or try to return a time to a user | in their preferred time zone. Distroless includes these. | gavinray wrote: | No problem. > Others should note it's still | experimental and your build command may fail with | | You might try _"docker buildx build"_, to use the BuildKit | client -- squash isn't experimental in that one I believe | =) | | https://docs.docker.com/engine/reference/commandline/buildx | _... | selfup wrote: | Had no idea about squash. Using cached layers can really save | time, especially when you already have OS deps/project deps | installed. Thanks! | [deleted] | qbasic_forever wrote: | You don't have to do this anymore, the buildkit frontend for | docker has a new feature that supports multiline heredoc | strings for commands: https://www.docker.com/blog/introduction- | to-heredocs-in-dock... It's a game changer but unfortunately | barely mentioned anywhere. | epberry wrote: | If you click 'Pricing' on the main site an error occurs just FYI. ___________________________________________________________________ (page generated 2022-01-06 23:00 UTC)