[HN Gopher] Minify your container
       ___________________________________________________________________
        
       Minify your container
        
       Author : JordanTenn
       Score  : 113 points
       Date   : 2022-08-03 17:42 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | CameronNemo wrote:
       | Is this an official Docker project? How is this not trademark
       | infringement?
        
         | bigpod wrote:
         | docker-slim started at a Docker hackathon in 2015, and
         | Slim.AI, the company behind it, has an extension for Docker
         | Desktop in its marketplace.
        
         | gtirloni wrote:
         | It doesn't seem to be associated with Docker Inc.
         | 
         | Docker's trademark guidelines say that products, services and
         | technology that are not their own shouldn't use the Docker name
         | so it seems it's a matter of time before they get a nice letter
         | from some lawyer.
        
           | JordanTenn wrote:
           | The product is a loose partner of Docker. They have an
           | official Docker Desktop Extension. DockerSlim is an open
           | source tool that led to the creation of Slim.AI, and now
           | Slim.AI (and therefore DockerSlim) is a loose partner.
           | 
           | https://hub.docker.com/extensions/slimdotai/dd-ext
        
             | gtirloni wrote:
             | What's a "loose" partner?
             | 
             | Does Docker Inc promise to overlook trademark infringements
             | if you're a "loose" partner?
        
           | CameronNemo wrote:
           | I'm not cheering for that, but I also think impersonation is
           | dishonest.
        
       | alberth wrote:
       | Off topic: wish there was a slim variant of FreeBSD.
       | 
       | Seems like all past attempts have stalled and/or are dependent
       | upon FreeBSD creating a standard for what's in a minimal
       | userspace.
        
         | cperciva wrote:
         | Once pkgbase lands we'll probably see more progress there.
        
           | alberth wrote:
           | Given that it's been in the works for what seems like a
           | decade, do you think pkgbase will be finalized anytime soon?
           | 
           | Just curious (please don't take my comments as being
           | negative)
           | 
           | https://wiki.freebsd.org/PkgBase
        
             | cperciva wrote:
             | You're quite right, and to be honest at this point I've
             | given up trying to keep track of where it's at. It's
             | definitely something I'd like to see completed, but I've
             | been too busy with FreeBSD/EC2 and speeding up the boot
             | process to spend time looking at pkgbase too.
        
         | CameronNemo wrote:
         | Even further off topic but perhaps relevant: Chimera Linux,
         | which consists of the FreeBSD user land ported to Linux. I
         | wonder if q66 has OCI images published...
         | 
         | https://chimera-linux.org/
        
       | sockmeistr wrote:
       | Isn't this incredibly dangerous? I know everyone likes to pretend
       | they have perfect code coverage, but just ripping stuff out that
       | wasn't called during 'probing' feels like the perfect way to make
       | rare code paths even more dangerous.
        
         | OMGWTF wrote:
         | kkrieger (https://www.pouet.net/prod.php?which=12036) is an
         | impressive 3D shooter in only 96 KiloBytes. As one of their
         | optimization techniques they recorded all code paths and
         | discarded unused parts. At least in the first version this was
         | why you could only use CursorDown in the menus and CursorUp did
         | not work.
        
         | lapser wrote:
         | If you have a good pipeline to prod, should be okay. You should
         | hopefully have plenty of automated tests to ensure it doesn't
         | get to prod if there are errors.
        
           | axelthegerman wrote:
           | _should be okay_ is definitely not enough for me to ship
           | things to production.
           | 
           | And while I do have automated tests, they might sometimes
           | stub system calls as I'm mostly testing my code to keep
           | things stable and fast.
           | 
           | I'd rather explicitly declare my dependencies and use the
           | same container for development, test and production to feel
           | much more confident that it includes actually everything
           | that's needed.
        
             | bigpod wrote:
             | With a good pipeline and knowledge of your app, you should
             | be able to ensure it works without much of a problem.
        
           | Volundr wrote:
           | I think a "good pipeline to prod" with sufficient automated
           | tests to ensure nothing is broken is the exception not the
           | rule. Even in places that think/say they have a "good
           | pipeline to prod". It's something that takes a shocking
           | amount of engineering effort to do well, and tons of
           | discipline to maintain.
        
             | EddySchauHai wrote:
             | Hire a test engineer to manage all of that - it's a full
             | time job but an important one!
        
         | killingtime74 wrote:
         | If you have good integration tests is it still a problem?
        
           | fwip wrote:
           | It depends on how comprehensive they are, and how important
           | it is that your container operates correctly.
           | 
           | For example, even the best integration tests (for small/mid-
           | size companies) don't always include tests that exercise
           | weird paths around dates/times - leap years, leap seconds,
           | daylight saving time, etc. We often trust that our datetime
           | library or code will handle these for us, but what if the
           | configuration is stored in a file that isn't accessed during
           | your integration tests?
           | 
           | Best case scenario is you hit the error-path soon in
           | production and your code either crashes or does something
           | correct-enough with a fallback path, but a worse scenario is
           | you start losing critical information and don't realize
           | it/fix it until it's gone on for a while.
        
           | mplewis wrote:
           | In a non-trivial app, you can never guarantee that your "good
           | integration tests" cover every edge case. If you could, we
           | wouldn't have outages in production.
        
         | mplewis wrote:
         | docker-slim is incredibly dangerous and should never be used
         | for a production app.
        
           | nicce wrote:
           | I guess the question is: dangerous in which way? It might
           | lead to a crash for sure, but is that crash controlled? If it
           | is, then it is just a crash. Stability vs. minimal attack
           | surface.
           | 
           | But I agree, this is just a band-aid for lazy bois. Better to
           | use Bazel etc. for distroless builds.
        
             | mplewis wrote:
             | This is dangerous in that it strips assets, resources, and
             | files from your app without understanding how they are
             | used.
             | 
             | If you forget a critical code path when you build using
             | Docker-Slim, and a resource file is not used, that resource
             | will be stripped. The feature which depends on it will be
             | broken in production.
        
           | bigpod wrote:
           | I would disagree. I use it in production apps: I configured
           | it and it works. If you do it blindly, things sometimes
           | break, but if you configure it, it will work.
        
             | mplewis wrote:
             | There is no guarantee that a blind code shaker will leave
             | in everything important while stripping out everything that
             | isn't. How could it possibly know?
             | 
             | If Docker-Slim is working for you in production apps, you
             | are either getting lucky or your app is trivial enough to
             | lack unseen code paths.
        
         | bigpod wrote:
         | That's why you should test. If there is stuff that needs to be
         | included but isn't, and you know the app won't work without it,
         | fail the test and add --include-path to your docker-slim
         | command to ensure it is added.
        
         | CubsFan1060 wrote:
         | I guess it only seems dangerous to me if you blindly follow
         | its recommendations. Feels like it could generate a list of
         | "things you may want to consider", that you'd then be able to
         | use to take a look at your container.
        
           | bigpod wrote:
           | It sometimes doesn't work, sure, but that's why we have tests.
           | I minify all my containers nowadays and in most cases it
           | works. For those where it doesn't, I figured out the pattern
           | of when and why for my apps, and I use include flags to
           | ensure things remain inside.
        
         | jzelinskie wrote:
         | Even prior to docker-slim there were tools like Quay.io that
         | "did the right thing" by squashing images to just the contents
         | of the final image layer.
         | 
         | The best thing you can do is use minimal images and multi-stage
         | builds. This should help you immensely to reduce your attack
         | surface and produce a standard software bill of materials, too.
        
           | fwip wrote:
           | The quay.io squashing optimization is a lot safer though,
           | right, as it doesn't remove anything that should be visible
           | to the container?
           | 
           | I agree that the multi-stage builds are the best option, but
           | it can be hard to know if you've included everything that is
           | required or if you've accidentally excluded something that is
           | important in rare cases.
        
       | rockemsockem wrote:
       | I've had great success with reducing image size by running
       | docker-show-context
       | (https://github.com/pwaller/docker-show-context) and eliminating
       | big and unnecessary files that it reports. This seems to go just
       | a bit further than that with what seems like more complexity. I
       | got timeouts when following their instructions to run it on two
       | different containers, one of which is just a very simple web
       | server.
        
         | CameronNemo wrote:
         | This is interesting for optimizing build time. But I think it
         | works a bit differently from docker-slim, which is focused on
         | the final resulting image size.
         | 
         | Dive is a good tool for the latter IME.
         | https://github.com/wagoodman/dive
         | 
         | It doesn't do the work for you, but it does single out the big
         | layers in your image.
        
       | siddontang wrote:
       | We build our binary first with one image as the builder image,
       | then use `COPY` to copy the binary from the builder to the final
       | executable image like alpine.
       | 
       | An example Dockerfile looks like:
       | 
       |     FROM golang:1.18.1-alpine as builder
       |     # RUN apk add, wget, etc, and build the binary
       | 
       |     FROM alpine
       |     # or FROM scratch
       |     COPY --from=builder builder/binary /binary
       |     ENTRYPOINT ["/binary"]
        
         | U1F984 wrote:
         | For Go you can use FROM scratch and save a couple more
         | megabytes.
        
           | lrvick wrote:
           | This works on any language. I only use scratch in prod. Even
           | for nodejs or python... compile a static interpreter binary
           | and truck on.
           | 
           | Dev tools like bash, ls, grep, etc, have no place in
           | production and only increase attack surface.
        
         | fwip wrote:
         | Out of curiosity, what does alpine provide for your container
         | that you need? (I assume otherwise you'd be using `FROM
         | scratch`.)
        
           | maccard wrote:
           | I use wget from it for health checks [0]
           | 
           | [0] https://stackoverflow.com/questions/47722898/how-to-do-a-
           | doc...
        
           | siddontang wrote:
           | Yes, `FROM scratch` may be better most of the time. I have
           | just used `alpine` for many years and have not tried
           | `scratch` before.
        
       | jollyllama wrote:
       | Wow, I remember when not including debug symbols was a slim
       | image.
        
       | jewayne wrote:
       | I have a minor in math, and I don't know what "shrinking by 30X"
       | means. To me, decreases always start from 100%. So I think we are
       | talking about a ~97% decrease in size?
        
         | AtNightWeCode wrote:
         | For compression or similar, a ratio is used.
        
         | JordanTenn wrote:
         | Thanks for this note. I'm part of the DockerSlim and Slim.AI
         | ecosystem. Will take this feedback and rework the way we phrase
         | things. Thank you!
        
         | reilly3000 wrote:
         | it means... a lot!
        
         | rr888 wrote:
         | Thanks I hate this, but seems to be everywhere now. "This
         | product is now 3 times cheaper!", WTF. They still haven't got
         | to percentages yet, like 200% off!!.
        
           | bigpod wrote:
           | It's more about phrasing it the way people talk: "smaller by
           | 200%" isn't as understandable as "30 times smaller".
        
             | SomeBoolshit wrote:
             | Neither of those makes any sense.
        
         | bigpod wrote:
         | This is not strictly valid, but it's what people say: "30
         | times smaller".
        
           | jewayne wrote:
           | I know and it drives me crazy. "Bigger" and "smaller" express
           | _differences_, not fractions or multiples.
        
         | fb03 wrote:
         | I don't have a minor in math, and I instinctively thought
         | "shrinking by 30x" means 1/30 of size.
        
           | jonas21 wrote:
           | Yeah. "growing" = numerator. "shrinking" = denominator.
           | 
           | It's nice because they're inverses - if you shrink by 30x,
           | then grow by 30x, you're back where you started, whereas a
           | 97% decrease in size followed by a 97% increase in size
           | leaves you at ~6% of the original size.
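
The arithmetic in the comment above can be checked with a short Python sketch. The 300 MB starting size is a made-up figure for illustration, and the variable names are not from the thread; only the 30x and 97% numbers come from the discussion.

```python
# Compare "shrink by a factor" with "decrease by a percentage".
original = 300.0  # hypothetical image size in MB (assumption, not from the thread)

# Factor-based: shrinking by 30x and then growing by 30x is a round trip.
shrunk = original / 30
restored = shrunk * 30

# Percentage-based: a 97% decrease followed by a 97% increase is not.
decreased = original * (1 - 0.97)
increased = decreased * (1 + 0.97)

# The percentage decrease that corresponds exactly to 30x is 1 - 1/30.
equivalent_decrease = 1 - 1 / 30

print(f"30x down, 30x up: {restored / original:.0%} of the original")
print(f"97% down, 97% up: {increased / original:.1%} of the original")
print(f"30x smaller, expressed as a decrease: {equivalent_decrease:.1%}")
```

Under these assumptions the factor-based round trip returns to the starting size, while the percentage-based one lands near 6% of it, which is the asymmetry the comment describes.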
        
             | jewayne wrote:
             | I think if you want to say "1/30th of the size", you should
             | say that. Growth is usually measured as a difference. For
             | example, a 200% increase means the value has tripled.
        
             | bigpod wrote:
             | Essentially, yes.
        
             | Karellen wrote:
             | But "30x" is just another way of saying "3000%". Or,
             | "3000%" is just another way of saying "30x". "Shrinking by
             | 30x" means the same thing as "shrinking by 3000%".
        
         | OJFord wrote:
         | Do you know what 'two times smaller' means?
        
           | jewayne wrote:
           | No I don't. That's my point. To me, the number that's two
           | times smaller than x is -x. (x-2x)
        
             | OJFord wrote:
             | I'm no mathematician and it wouldn't often be my choice of
             | phrasing, but it seems unambiguous and clear to me; your
             | definition is much stranger/less intuitive to me.
             | 
             | We're clear on 'x is two times larger than y', right?
             | x = 2*y
             | 
             | An equivalent statement is 'y is two times smaller than x',
             | but it conveys a construction more like:
             | y = x/2
             | 
             | Which, since we're speaking English sentences, might change
             | the emphasis/implication.
        
         | Fnoord wrote:
         | > I have a minor in math, and I don't know what "shrinking by
         | 30X" means.
         | 
         | X is input
         | 
         | Y is output
         | 
         | X / 30 = Y
         | 
         | All you need to know, no minor required, as it's taught in
         | elementary school (age 11/12 or so?).
        
           | Karellen wrote:
           | So, shrinking by 2x means dividing by 2?
           | 
           | But... doesn't shrinking by 1/2 also mean dividing by 2?
           | 
           | Therefore, 1/2 == 2x ??
           | 
           | I feel like my elementary school math is letting me down
           | somewhere.
        
         | inopinatus wrote:
         | If you try pronouncing that aloud as "...by a factor of 30"
         | it'll seem less ungrammatical.
        
       | sequoia wrote:
       | Here's my less magical, more manual post on the subject of
       | reducing docker image sizes:
       | https://sequoia.makes.software/reducing-docker-image-size-pa...
        
       | aejnsn wrote:
       | > "Find SSL Certs"
       | 
       | So we're promoting secrets being saved within a container image
       | artifact? Ummmm?
        
         | frenchman99 wrote:
         | The cert is usually the public key. The private key is usually
         | named key. So it doesn't promote secrets being saved within a
         | container as far as I can see.
        
           | Karellen wrote:
           | Wait, I thought the cert was the CA's signature of the public
           | key.
        
       | kodah wrote:
       | This kind of looks like a tool that does the reverse of what
       | scratch does. Instead of _only_ including the binary and any
       | dynamically linked dependencies, it tries to figure out a minimum
       | set of dependencies based on access.
       | 
       | In practice, I'm curious how error prone the result is.
        
         | bigpod wrote:
         | It is somewhat error prone, but it has flags that let you
         | fine-tune what gets added back in. The great thing is that you
         | can work with any base image and language, including those
         | that won't work with scratch.
        
       | saidinesh5 wrote:
       | Why would someone want to use this instead of say base images
       | made specifically for containers? like alpine for eg.?
       | 
       | And for languages like golang (in their examples) - why/how would
       | anyone get such huge container images in the first place? Doesn't
       | go give a neat statically linked binary?
        
         | xtracto wrote:
         | Right, Go binaries would be able to go with a "scratch" image,
         | which only contains the kernel.
        
           | piperswe wrote:
           | They don't even contain a kernel - Docker containers use the
           | host kernel. Container runtimes based on VMs like
           | Firecracker's firecracker-containerd typically supply the
           | kernel themselves.
        
             | FridgeSeal wrote:
             | Do you happen to know how the scratch containers differ
             | from Google's distroless containers?
             | 
             | I've been using them (distroless) with great success for my
             | Rust applications.
        
         | bigpod wrote:
         | You can use whatever base image you want, let's say
         | ubuntu:latest (I don't like alpine). Base images normally
         | include a lot of stuff that doesn't have any place in a
         | container. Why would I need a tool for ext4 management inside
         | a container? It makes no sense, so for production, throw it
         | out. That's what docker-slim does: it gets rid of
         | vulnerabilities in programs your app doesn't use by simply
         | getting rid of those programs.
        
           | saidinesh5 wrote:
           | Ubuntu-minimal
           | (https://canonical.com/blog/minimal-ubuntu-released) doesn't
           | have any of those binaries though.
           | 
           | And that's also why you have multistage docker builds. To
           | make sure your production container doesn't have all the
           | unneeded files from your development container.
           | https://docs.docker.com/develop/develop-images/multistage-
           | bu... .
        
             | bigpod wrote:
             | This removes far more than a multistage Docker build ever
             | would. Do you need bash, dash, passwd, or the many other
             | binaries and files that are in the image by default? No,
             | you don't. The only way to do anything similar to what
             | docker-slim does is with a scratch image, which doesn't
             | work if you don't copy in everything you need.
        
               | saidinesh5 wrote:
               | The problem is not about removing though. The problem is
               | what/who guarantees that nothing broke after all these
               | files are removed? Especially in obscure code paths in
               | nested dependencies?
               | 
               | With something like alpine linux/ubuntu minimal, you
               | trust the package maintainers to make sure that if you
               | use python in your docker image it would work like it
               | worked for them. Out here, it just says "Yes (it is
               | safe)! Either way, you should test your Docker images.".
               | 
               | As a bad example, if a library used by your application
               | uses a different "theme" requiring different files at
               | night and different files during the day, you might still
               | say "it worked during my tests" but things definitely
               | broke and the only thing you can blame is this
               | overzealous tool.
               | 
               | That bad example was from back when I was trying to make
               | AppImages for an application we used. At first all we did
               | was recursively collect all the libraries reported by
               | ldd. Then it turned out some libraries were only being
               | dlopen'ed by other libraries under specific circumstances
               | and we missed them. So we manually added those libraries.
               | Then it turned out that we missed the config files and
               | other resources used by those libraries. Eventually we
               | shipped all the files belonging to all the distro
               | packages used by the libraries we used and left it at
               | that.
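
The first step of the AppImage story above (collecting what `ldd` reports) can be sketched in Python roughly as follows. This is an illustration only, not the commenter's actual tooling: `parse_ldd_output` and the sample text are hypothetical, and the parser assumes the usual glibc `ldd` line shapes.

```python
import re

# Matches "libname => /path (0xaddr)" and bare "/path (0xaddr)" lines.
# Lines such as "linux-vdso.so.1 (0x...)" or "libfoo.so => not found"
# have no resolved path and are deliberately skipped.
_LDD_LINE = re.compile(r"(?:\S+\s+=>\s+)?(/\S+)\s+\(0x[0-9a-fA-F]+\)")


def parse_ldd_output(text):
    """Return the set of resolved library paths found in ldd-style output."""
    paths = set()
    for line in text.splitlines():
        match = _LDD_LINE.fullmatch(line.strip())
        if match:
            paths.add(match.group(1))
    return paths


# Hypothetical sample of what `ldd ./app` might print.
SAMPLE_LDD = """
    linux-vdso.so.1 (0x00007ffd3a5f2000)
    libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007f1c2a400000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a000000)
    libtheme.so => not found
    /lib64/ld-linux-x86-64.so.2 (0x00007f1c2a6d1000)
"""

libs = parse_ldd_output(SAMPLE_LDD)
for path in sorted(libs):
    print(path)
```

A list built this way only covers link-time dependencies. Libraries loaded later with dlopen, and the config files and resources they read, never show up in it, which is the gap the comment ran into.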
        
               | bigpod wrote:
               | Your tests and your application knowledge should.
               | 
               | In some cases I essentially ensure my whole app remains by
               | using --include-path flags, so that what gets removed is
               | only the things I absolutely don't need.
        
               | rcoveson wrote:
               | Still seems kind of silly. If you base everything on
               | ubuntu minimal, you'll only have the one copy of that
               | base image, which is a fraction of the size of the
               | `docker` and `dockerd` binaries added together. No server
               | running docker will have a problem keeping one or two
               | versions of ubuntu minimal on it.
               | 
               | But if you go around "minifying" all your applications
               | independently, you won't have that shared base layer. One
               | application needs `sh` and another doesn't? Now you get
               | two entire base layers, one with it and one without.
               | Sure, each image's total size will be less, but the size
               | of all your different images added up will be greater
               | because you killed the sharing.
               | 
               | If for some reason the 29 megs of ubuntu minimal (or even
               | fewer for alpine) are a problem (which they aren't on
               | your server that already has over a hundred megs of
               | `docker` binaries), then the right solution is to better
               | control layer _sharing_. Ensure that you _don't_ have
               | different base layers between your applications. And then
               | --strictly for kicks and giggles--you could minify that
               | base layer to the minimal set of what _all_ your images
               | require. To save a 51K `passwd` binary (woohoo!).
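
The layer-sharing argument above comes down to counting each distinct layer once. A rough Python sketch, with every size except the 29 MB base invented for illustration (the image names, app layer sizes, and 12 MB minified remainders are all assumptions):

```python
def disk_usage_mb(images):
    """Sum each distinct layer once, since a host stores shared layers once."""
    unique_layers = {}
    for layers in images.values():
        for layer_id, size_mb in layers:
            unique_layers[layer_id] = size_mb
    return sum(unique_layers.values())


# Three hypothetical apps built on one shared base layer.
shared = {
    "api": [("ubuntu-minimal", 29), ("api-app", 10)],
    "worker": [("ubuntu-minimal", 29), ("worker-app", 15)],
    "web": [("ubuntu-minimal", 29), ("web-app", 20)],
}

# The same apps minified independently: each keeps its own, smaller,
# but unshared remainder of the base.
minified = {
    "api": [("api-base-min", 12), ("api-app", 10)],
    "worker": [("worker-base-min", 12), ("worker-app", 15)],
    "web": [("web-base-min", 12), ("web-app", 20)],
}

shared_total = disk_usage_mb(shared)
minified_total = disk_usage_mb(minified)
print(f"shared base: {shared_total} MB, minified separately: {minified_total} MB")
```

With these made-up numbers the shared setup is smaller in total. The comparison flips whenever the per-image remainders add up to less than the shared base, so the outcome depends on how many images there are and how much each one really keeps.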
        
               | bigpod wrote:
               | One question: is it possible, in any way, that passwd or
               | some other leftover binary you don't need has a security
               | vulnerability that could cause trouble on the host if
               | someone got into the container one way or another (most
               | likely through your app)?
               | 
               | Hint: yes, it is, and that could be a giant problem.
        
         | ufmace wrote:
         | Good question. Alpine is already small enough that it seems a
         | little odd to go to elaborate measures to reduce image size
         | further. Seems better to me to start with a minimalist image
         | and only add what you need to make your app work than to start
         | with a huge image with everything, install your app, and rely
         | on something like this to find only the things you don't need
         | to remove and not make any mistakes.
        
       | mplewis wrote:
       | docker-slim is the _wrong_ solution for container optimization.
       | You can't just have a program rm -rf files that it didn't think
       | were in use. What if you missed a code path?
       | 
       | https://twitter.com/ariadneconill/status/1506482425458798593
        
         | fwip wrote:
         | I know your question was rhetorical, but probably depends on
         | how mission-critical the code path is (and the consequences of
         | hitting those missing files).
         | 
         | If you're running a website and the removed dependency is
         | related to a feature that is uncommon enough that it isn't
         | covered by your automated tests, maybe .1% of your users
         | experience a broken page.
         | 
         | If you're running critical infrastructure and the removed
         | dependency has to do with leap-second handling, maybe eight
         | months from now, everything crashes and you lose millions of
         | dollars.
        
         | kylequest wrote:
         | Please explain how it's wrong without simply saying you prefer
         | other solutions, which was the case with Ariadne :) Dead code
         | elimination is a common construct in software engineering.
         | There's nothing magical about apps and their dependencies. They
         | are relatively straightforward to identify, and for web apps
         | with static assets there are helpers to ensure you got
         | everything.
        
         | bigpod wrote:
         | Use --include-path to ensure it's in.
        
         | hnarn wrote:
         | Why did you link a tweet to someone saying they "feel like
         | [they] should do a blog about docker-slim at some point"? What
         | does this contribute?
        
           | mplewis wrote:
           | This is a Twitter thread that continues below at the
           | following tweet:
           | 
           | https://twitter.com/ariadneconill/status/1506483943352250371
           | 
           | Sorry - I forgot that the Twitter UI doesn't always lend
           | itself to proper threading.
        
       | jmercan wrote:
       | Personally I feel like shrinking images by guessing unused parts
       | is a good way to have an image explode in your face randomly
       | in the future. (Probes and heuristics missing critical but rarely
       | used parts and more) Also wouldn't it hurt reproducibility?
       | Temporary runtime monitoring doesn't exactly sound like a
       | deterministic metric.
       | 
       | A containerizable project probably has its requirements known and
       | well-specified? I think building on top of a base with a smaller
       | unused surface is a better idea than using analysis that might
       | backfire. These days I am using apko + melange for my personal
       | images and they are super neat.
        
         | davidtpate wrote:
         | Some form of tree-shaking type of thing would probably be quite
         | handy for images, but yeah I'm a bit wary here as well. First
         | thought would be what happens when it hits Out-of-Memory, DNS
         | timeout, or loses network connectivity or another edge case
         | that totally happens in Production.
         | 
         | Removing those code paths would not be a good thing, but I
         | guess if you build your apps right you could just have your
         | container orchestration system recover by replacing the Pod.
        
           | game-of-throws wrote:
           | I wouldn't want anything killing pods every time there's a
           | network timeout. That sounds like a quick way to turn a tiny
           | problem into a huge problem.
        
       | HowardStark wrote:
       | Is there an equivalent tool for a normal running Linux system?
        
         | OJFord wrote:
         | Not the same, but potentially similar in intent, I use aconfmgr
         | to track what's installed, changes to configuration files, and
         | any potential changes/whole files left behind by some quick
         | test or since uninstalled software.
         | 
         | (Also, even primarily to me, but less relevantly, it's great
         | for gitting configuration & its reasons, and syncing across
         | machines.)
        
       | tbabej wrote:
       | For workloads where the image size was critical, I have achieved
       | a similar result with using strace to collect the required files
       | and then limiting the image to only those files in the build
       | process.
       | 
       | It's a neat approach, but it ultimately brings a non-negligible
       | amount of uncertainty, as you can never be 100% sure your test
       | set of inputs did not miss a particular edge case that requires
       | a file in the container that no other input does.
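
The strace-based approach described above might look roughly like this in Python. It is a sketch under assumptions, not the commenter's script: `opened_files` and the sample log are hypothetical, and the regex only covers simple `open`/`openat` lines as strace typically prints them.

```python
import re

# Captures the path and return value of open()/openat() calls in strace output.
# Assumes one call per line and no escaped quotes inside the path.
_OPEN_CALL = re.compile(r'\bopen(?:at)?\((?:\w+, )?"([^"]+)".*\)\s*=\s*(-?\d+)')


def opened_files(strace_log):
    """Return paths that were opened successfully (non-negative return value)."""
    files = set()
    for line in strace_log.splitlines():
        match = _OPEN_CALL.search(line)
        if match and int(match.group(2)) >= 0:
            files.add(match.group(1))
    return files


# Hypothetical excerpt of a trace captured while running the test inputs.
SAMPLE_LOG = '''\
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/share/zoneinfo/UTC", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/etc/app/extra.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid  4242] open("/etc/passwd", O_RDONLY) = 5
'''

keep = opened_files(SAMPLE_LOG)
for path in sorted(keep):
    print(path)
```

The resulting list reflects only the inputs that were traced. A file that a single untested input would need never appears in the log, which is exactly the uncertainty the comment points out.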
        
         | bigpod wrote:
         | Yes, that tends to be the problem with docker-slim as well.
         | That is why it includes flags like --include-path, with which
         | you can easily achieve such fixes.
         | 
         | Personally I highly recommend it, as it works in most cases and
         | gets rid of the vulnerabilities that come with things like bash
         | or passwd that you don't need in prod apps.
        
       | viraptor wrote:
       | Like others here, I wasn't very happy about / trusting automatic
       | coverage, so I made this instead
       | https://github.com/viraptor/cruftspy
       | 
       | Instead of going extreme with coverage analysis, it shows places
       | that can be manually cleaned during the build process. Maybe
       | someone will find it useful. Smaller space gains, but gives more
       | confidence.
        
         | bigpod wrote:
         | I have docker-slim completely automated in CI/CD and it doesn't
         | seem to have a problem, as I have configured it per pipeline.
         | Maybe check out the examples for docker-slim.
        
       ___________________________________________________________________
       (page generated 2022-08-03 23:00 UTC)