[HN Gopher] Nixos-unstable's ISO_minimal.x86_64-Linux is 100% re...
       ___________________________________________________________________
        
       Nixos-unstable's ISO_minimal.x86_64-Linux is 100% reproducible
        
       Author : todsacerdoti
       Score  : 222 points
       Date   : 2021-06-20 20:01 UTC (2 hours ago)
        
 (HTM) web link (discourse.nixos.org)
 (TXT) w3m dump (discourse.nixos.org)
        
       | avalys wrote:
       | Can anyone comment on the significance of this accomplishment,
       | and why it was hard to achieve before?
       | 
       | I (naively, apparently) assumed this had been possible with open-
       | source toolchains for a long time.
        
         | peterkelly wrote:
         | For some reason, many compilers and build scripts have
         | traditionally been written in a way that's not referentially
         | transparent (a pure function from input to output). Unnecessary
         | information like the time of the build, absolute path names of
         | sources and intermediate files, usernames and hostnames often
         | would find their way into build outputs. Compiling the same
         | source on different machines or at different times would yield
         | different results.
         | 
         | Reproducible builds avoid all this and always produce the same
         | outputs given the same inputs. There's no good reason (that I
         | can think of) why this shouldn't have been the case all along,
         | but for a long time I guess it just wasn't seen as a priority.
         | 
         | The benefit of reproducible builds is that it's possible to
         | verify that a distributed binary was definitely compiled from
         | known source files and hasn't been tampered with, because you
         | can recompile the program yourself and check that the result
         | matches the binary distribution.
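         | 
         | A minimal sketch of that check, assuming you already have a
         | locally rebuilt artifact and the distributed one on disk. The
         | function names are made up for illustration and are not part
         | of any Nix tooling:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_distribution(local_build: Path, distributed: Path) -> bool:
    """True only when the rebuilt artifact is bit-for-bit identical."""
    return sha256_of(local_build) == sha256_of(distributed)
```

         | If the build is reproducible, a mismatch means the inputs
         | differed or the distributed file was altered.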
        
         | xyzzy_plugh wrote:
         | There's a lot of problems with reproducible builds. Filesystem
         | paths, timestamps, deterministic build order to say the least.
         | This is a pretty great achievement and I'm looking forward to a
         | non-minimal stable ISO.
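         | 
         | To make those three concrete, here is a rough sketch of packing
         | a directory into a .tar.gz deterministically: relative names
         | only, pinned timestamps, sorted traversal. The function names
         | are hypothetical; SOURCE_DATE_EPOCH is the environment variable
         | convention from reproducible-builds.org. The compressed bytes
         | can still differ between zlib versions, so treat this as an
         | illustration rather than a complete recipe:

```python
import gzip
import io
import os
import tarfile
from pathlib import Path


def build_epoch() -> int:
    """Timestamp to stamp into outputs: SOURCE_DATE_EPOCH if set, else 0."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", "0"))


def _normalize(info: tarfile.TarInfo, epoch: int) -> tarfile.TarInfo:
    """Erase metadata that varies between machines and build times."""
    info.mtime = epoch
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    executable = info.isdir() or bool(info.mode & 0o100)
    info.mode = 0o755 if executable else 0o644
    return info


def deterministic_tar_gz(src_dir: Path, epoch: int = 0) -> bytes:
    """Pack src_dir so the bytes depend only on relative names and contents."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # Sorted traversal: directory listing order differs across filesystems.
        for path in sorted(src_dir.rglob("*")):
            tar.add(
                str(path),
                arcname=path.relative_to(src_dir).as_posix(),  # no absolute paths
                recursive=False,
                filter=lambda info: _normalize(info, epoch),
            )
    out = io.BytesIO()
    # mtime is pinned because gzip otherwise writes the current time in its header.
    with gzip.GzipFile(filename="", mode="wb", fileobj=out, mtime=epoch) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()
```

         | Calling deterministic_tar_gz(some_dir, build_epoch()) twice, or
         | on two checkouts with different file mtimes, should give the
         | same bytes as long as names and contents match.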
        
           | bombcar wrote:
           | Yeah even the "gcc compiled Jan 23, 2021 at 11:23AM" messages
           | you often see break deterministic builds.
        
         | twisrkrr wrote:
         | The code has to be changed so that things like system specific
         | paths, time of compilation, hardware, etc. don't cause the
         | compiled program to be unique to that computer (meaning
         | compiling the same code on a different computer will give you a
         | file that still works but has a different md5 hash)
         | 
         | By being able to reproduce the file completely, down to
         | identical md5 hashes, you know you have the same file the
         | creator has, and know with certainty that the file has not been
         | tampered with
        
           | secondcoming wrote:
           | Does this mean that the code cannot be built with CPU-
           | specific optimisations (gcc's -march option)?
        
             | Avamander wrote:
             | Pretty much. But hopefully x86_64 feature levels will
             | provide the benefits of native builds to a reasonable
             | extent.
        
             | Denvercoder9 wrote:
             | The software doesn't suddenly become incompatible with CPU-
             | specific optimisations (or many other compiler flags that
             | change its output), but if you do so, you won't be able to
             | reproduce the distribution binaries. Distributions don't
             | enable CPU-specific optimisations anyway, since they want
             | to be usable on more than one CPU model.
        
             | pas wrote:
             | Likely it means that with the same input arguments the end
             | result is bit-by-bit identical. (As I understand it, the
             | problem was output elements that were hard to control. So it
             | was not enough to set the same args, set the same time, and
             | use the same path and filesystem, because some things
             | happened at different speeds and so finished at different
             | relative elapsed times, which put different timestamps into
             | the outputs, etc.)
        
             | clhodapp wrote:
             | No, just that you need to avoid naively conflating the
             | machine that is doing the compilation with the one that
             | optimization is being performed for.
             | 
             | Concretely, you would need to keep track of and reproduce
             | e.g. the march flag value as a part of your build input. If
             | you wanted to optimize for multiple architectures, that
             | would mean separate builds or a larger binary with function
             | multi-versioning.
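             | 
             | A toy sketch of "flags as build input": hash everything
             | that is allowed to influence the output, so two builds
             | with different -march values are simply different builds.
             | The function and the recipe layout are hypothetical and
             | are not how Nix actually computes its store hashes:

```python
import hashlib
import json
from typing import Sequence


def build_input_hash(source_sha256: str, compiler: str, flags: Sequence[str]) -> str:
    """Digest of the full build recipe: source, toolchain, and flags."""
    recipe = {
        "source": source_sha256,
        "compiler": compiler,
        "flags": list(flags),  # order is kept, since flag order can matter
    }
    # Canonical JSON so the same recipe always serializes to the same bytes.
    canonical = json.dumps(recipe, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

             | Anyone who rebuilds with the same recorded recipe should
             | get the same output; change -march and you get a different
             | recipe hash, not a "non-reproducible" build.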
        
             | maartenh wrote:
             | Nixpkgs contains the build / patch instructions for any
             | packages in NixOS.
             | 
             | If you want to compile any piece of software available in
             | Nixpkgs, you can override its attributes (inputs used to
             | build it).
             | 
             | One can trivially have an almost identical operating system
             | to a colleague's install, but override just one package
             | to enable optimisations for a certain cpu. This would
             | however imply that you'd lose the transparent binary cache
             | that you could otherwise use.
             | 
             | Exactly this method is used to configure the entire
             | operating system install! Your OS install is just another
             | package that has some custom inputs set.
        
         | danbst wrote:
         | Just recently, there were large non-reproducible projects:
         | python, gcc. Not sure where the history of non-r13y is recorded.
         | 
         | ---
         | 
         | There is a Debian initiative to create bit-to-bit reproducible
         | builds for all their software (well, all critical).
         | 
         | https://reproducible-builds.org/
         | 
         | R13y is akin to "computer proofs" in math -- if you don't have
         | it, that's fine, but if you have it, that's awesome.
         | 
         | There are practical reasons to favor reproducibility too, but
         | those are more for distro maintainers.
         | 
         | The fact that NixOS (not Debian) got this 100% is mostly
         | because
         | 
         | - minimal image has a small subset of packages
         | (https://hydra.nixos.org/build/146009592#tabs-build-deps)
         | 
         | - Nix tooling was created 15 years ago *exactly* for this, Nix
         | is made to make packages bit-to-bit rebuildable from scratch.
         | 
         | - Nix/Nixpkgs is growing in number of maintainers and got more
         | funds
         | 
         | - Nix has fewer Docker/Snap pragmatics
        
           | Foxboron wrote:
           | >- Nix tooling was created 15 years ago _exactly_ for this,
           | Nix is made to make packages bit-to-bit rebuildable from
           | scratch.
           | 
           | I don't think this is accurate?
           | 
           | Nix is about reproducing system behaviour, largely by
           | capturing the dependency graph and replaying the build. But
           | this doesn't entail bit-for-bit identical binaries. It very
           | much sits in the same group as Docker and similar
           | technologies. This is also how I read the original thesis
           | from Eelco[0].
           | 
           | And well, claims like this always rub me the wrong way since
           | nixos only really started using the term "reproducible
           | builds" after Debian started their efforts in 2015-2016[1],
           | and started their reproducible builds effort later. It also
           | muddies the language since people are now talking about
           | "reproducible builds" in terms of system behavior as well as
           | bit-for-bit identical builds. The result has been that people
           | talk about "verifiable builds" instead.
           | 
           | [0]: https://edolstra.github.io/pubs/phd-thesis.pdf
           | 
           | [1]: https://github.com/NixOS/nixpkgs/issues/9731
        
           | infogulch wrote:
           | Being bit-for-bit reproducible means you could do fun things
           | like distribute packages as just sources and a big blob of
           | signatures, and you can still run only signed binaries.
        
         | mananaysiempre wrote:
         | The GCC developers in particular were hostile to such efforts
         | for a long time, IIRC. (This is a non-trivial issue because
         | randomized data structures exist and can be a good idea to use:
         | treaps, universal hashes, etc. I'd guess it also pays for
         | compiler heuristics to be randomized sometimes. Incremental
         | compilation is much harder to achieve when you require bit-for-
         | bit identical output. Even just stripping your compile paths
         | from debug info is not entirely straightforward.)
        
           | pas wrote:
           | How/why was the randomness part not "solveable" via using
           | fixed seeds?
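           | 
           | For reference, a minimal sketch of what fixed seeding means
           | (Python for brevity, helper name hypothetical). It only shows
           | that a privately seeded generator makes a "randomized"
           | choice repeatable; it says nothing about GCC's internals,
           | though GCC does offer a -frandom-seed option for a related
           | purpose:

```python
import random


def randomized_layout(symbols, seed):
    """Shuffle symbols using a private generator seeded explicitly."""
    rng = random.Random(seed)  # no dependence on global state or the clock
    order = list(symbols)
    rng.shuffle(order)
    return order
```

           | Two runs with the same seed and the same input list give the
           | same order, so the output is a pure function of its inputs.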
        
           | moonchild wrote:
           | > Incremental compilation is much harder to achieve when you
           | require bit-for-bit identical output
           | 
           | Presumably, incremental compilation is only for development.
           | For release, you would do a clean build, which would be
           | reproducible.
           | 
           | > Even just stripping your compile paths from debug info is
           | not entirely straightforward
           | 
           | Just use the same paths.
        
       | Arnavion wrote:
       | Is there a list of the 1486 packages in the minimal ISO?
        
         | danbst wrote:
         | https://hydra.nixos.org/build/146009592#tabs-build-deps
        
       | jnxx wrote:
       | A good sign that the friendly competition by Guix has a positive
       | influence :)
       | 
       | https://guix.gnu.org/manual/en/html_node/Bootstrapping.html
       | 
       | https://guix.gnu.org/en/blog/2020/guix-further-reduces-boots...
        
         | delroth wrote:
         | This smaller bootstrap seed thing is a different problem from
         | reproducible builds. nixpkgs does still have a pretty big
         | initial TCB (aka. stage0) compared to Guix. But as far as I can
         | tell NixOS has the upper hand in terms of how much can be built
         | reproducibly (aka. the output hash matches across separate
         | builds).
        
           | jnxx wrote:
           | Bootstrapping from a very small binary core (I think 512
           | bytes) with an initial C compiler written in Scheme also has
           | the advantage that the system can easily be ported to
           | different hardware. Which is one major strength of the GNU
           | projects and tools.
        
             | delroth wrote:
             | Not necessarily. Usually these very small cores end up
             | being more architecture specific binaries than a stage0
             | consisting of gcc + some other core packages. A good
             | illustration of this is that Guix's work on bootstrap seed
             | reduction has been so far mostly applied to i686/amd64 and
             | not even other architectures they support (at least, not
             | fully).
        
       | rejectedandsad wrote:
       | I really want to adopt Nix and NixOS for my systems but the cost
       | of wrapping packages is just a little too high for me right now
       | (or perhaps I'm out of date and a new cool tool that does it
       | automatically is out). IMHO, a dependency graph-based build
       | system that builds a hermetically sealed transitive closure of an
       | app's dependencies that can be plopped into a rootfs via Nix [0]
       | is far superior security wise to the traditional practice of
       | writing docker files.
       | 
       | [0] https://yann.hodique.info/blog/using-nix-to-build-docker-
       | ima...
        
       | koolba wrote:
       | There's something very poetic about "unstable" being
       | "reproducible". It's like controlled chaos.
        
       | fouronnes3 wrote:
       | Are there synergies with the Debian reproducible build project
       | that this can benefit from?
        
         | Denvercoder9 wrote:
         | In general, Debian aims to upstream the changes they make to
         | software. That allows all other distributions, including Nix,
         | to profit from their work making software reproducible.
        
       | amelius wrote:
       | Hopefully this will one day also work with NVidia's software
       | packages.
        
         | jeroenhd wrote:
         | The trick with nvidia on Linux is to not expect that they will
         | ever work on anything. If you want to be sure that stuff works,
         | either don't buy Nvidia or use Windows.
        
           | amelius wrote:
           | What would you recommend instead of NVidia's Jetson embedded
           | platform?
        
             | jeroenhd wrote:
             | I'm not familiar with the market the Jetson is in and what
             | purposes it serves. From a quick Google, it seems to build
             | boards for machine learning? If that's true, I'm pretty
             | sure Google and Intel have products in that space, and I'm
             | sure there's other brands I don't know of.
             | 
             | If Nvidia has its own distribution, it might well work for
             | as long as it's willing to maintain the software because
             | then they can tune their open source stuff to make it work
             | with their proprietary drivers, the same way Apple is
             | hiding their tensorflow code. I still would be hesitant to
             | rely on Nvidia in that case given their history.
        
       | solarkraft wrote:
       | That is a pretty big deal.
       | 
       | This means everyone building NixOS will get the _exact_ same
       | binary, meaning you can now trust _any_ source for it because you
       | can verify the hash.
       | 
       | It's a huge win compared to the current default distribution
       | model of "just trust these 30 American entities that the software
       | does what they say it does".
       | 
       | Big congratulations to the team.
        
       | groodt wrote:
       | This is a big deal. Congratulations to all involved.
       | 
       | In Software, complexity naturally increases over time and
       | dependencies and interactions between components become
       | impossible to reason about. Eventually this complexity causes the
       | Software to collapse under its own weight.
       | 
       | Truly reproducible builds (such as NixOS and Nixpkgs) provide us
       | with islands of "determinism" which can be taken as true
       | invariants. This enables us to build more Systems and Software on
       | top of deterministic foundations that can be reproduced by
       | others.
       | 
       | This reproducibility also enables powerful things like
       | decentralized / distributed trust. Different third-parties can
       | build the same software and compare the results. If they differ,
       | it could indicate one of the sources has been compromised. See
       | Trustix https://github.com/tweag/trustix
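       | 
       | A rough sketch of the comparison step, assuming each independent
       | builder publishes the hash of its output. The builder names and
       | the function are hypothetical, and this is not Trustix's actual
       | API or algorithm, just a simple majority check:

```python
from collections import Counter


def find_outliers(reports):
    """Given {builder_name: output_hash}, return (majority_hash, dissenters)."""
    if not reports:
        raise ValueError("need at least one build report")
    counts = Counter(reports.values())
    # On a tie, the hash that was reported first wins; a real system
    # would need a more careful policy than this.
    majority_hash, _ = counts.most_common(1)[0]
    dissenters = sorted(
        name for name, output_hash in reports.items() if output_hash != majority_hash
    )
    return majority_hash, dissenters
```

       | A non-empty list of dissenters does not say who is compromised,
       | only that someone's inputs or toolchain differ and deserve a look.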
        
       | dcposch wrote:
       | This really deserves more love.
       | 
       | Who remembers Ken Thompson's "Reflections on Trusting Trust"?
       | 
       | The norm today is auto-updating, pre-built software.
       | 
       | This places a ton of trust in the publisher. Even for open-
       | source, well-vetted software, we all collectively cross our
       | fingers and hope that whoever is building these binaries and
       | running the servers that disseminate them, is honest and good at
       | security.
       | 
       | So far this has mostly worked out due to altruism (for open
       | source maintainers) and self interest (companies do not want to
       | attack their own users). But the failure modes are very serious.
       | 
       | I predict that everyone's imagination on this topic will expand
       | once there's a big enough incident in the news. Say some package
       | manager gets compromised, nobody finds out, and 6mo later every
       | computer on earth running `postgres:latest` from docker hub gets
       | ransomwared.
       | 
       | There are only two ways around this:
       | 
       | - Build from source. This will always be a deeply niche thing to
       | do. It's slow, inconvenient, and inaccessible except to nerds.
       | 
       | - Reproducible builds.
       | 
       | Reproducible builds are way more important than is currently
       | widely appreciated.
       | 
       | I'm grateful to the nixos team for beating a trail through the
       | jungle here. Retrofitting reproducibility onto a big software
       | project that grew without it, is hard work.
        
         | zamadatix wrote:
         | Unless you are going to be the equivalent of a full time
         | maintainer doing code review for every piece of software you
         | use, you need to trust other software maintainers,
         | reproducible builds or not. Considering this is Linux, and not
         | even Linus can deeply review every change in just the kernel
         | anymore, that philosophy can't apply to meaningfully large
         | software like Nixos.
        
           | Taek wrote:
           | You can't solve this problem without having a full history of
           | code to inspect (unless you are decompiling), reproducibility
           | is the first step and bootstrapability is the second step.
           | Then we refine the toolchains and review processes to ensure
           | high impact code is properly scrutinized.
           | 
           | What we can't do is throw our hands up and say anyone who
           | compromises the toolchain deep enough is just allowed to win.
           | It will happen at some point if we don't put the right
           | barriers in place.
           | 
           | It's the first step of a long journey, but it is a step we
           | should be taking.
        
           | jnxx wrote:
           | That's too black-and-white. Being able to reproduce stuff
           | makes some kind of attacks entirely uninteresting because
           | malicious changes can be traced back. Which is what many
           | types of attackers do not want. Debian, or the Linux kernel,
           | for example, are not fool-proof, but both are in practice
           | quite safe to work with.
        
             | zamadatix wrote:
             | Who are you going to trace it back to if not the maintainer
             | anyways? If the delivery method then why is the delivery of
             | the source from the maintainer inherently any safer?
        
               | jnxx wrote:
               | No, it is not always the maintainer. Imagine you download
               | a binary software package via HTTPS. In theory, the
               | integrity of the download is protected by the server
               | certificate. However, it is possible that certificates
               | get hacked, get stolen, or that nation states force CAs
               | to give out back doors. In that case, your download could
               | have been changed on the fly with arbitrary alterations.
               | Reproducible builds make it possible to detect such
               | changes.
        
               | zamadatix wrote:
               | Same as when you download the source instead of the
               | binary and see it reproducibly builds the backdoored
               | binary. And at this point we're back to "Build from
               | source. This will always be a deeply niche thing to do.
               | It's slow, inconvenient, and inaccessible except to
               | nerds." anyways.
               | 
               | It's not that reproducible builds provide 0 value it's
               | that they don't truly solve the trust problem as
               | initially stated. They also have non-security value to
               | boot which is often understated compared to the security
               | value IMO.
        
               | eptcyka wrote:
               | Even if the original attack happened upstream, if the
               | upstreamed piece of software was pinned via git, then
               | it'd be trivial to bisect the upstream project to find
               | the culprit.
        
               | dragonsky67 wrote:
               | This is great if you are looking at attributing blame.
               | Not so great if you are trying to prevent all the world's
               | computers getting owned....
               | 
               | I'd imagine that if I were looking at causing world wide
               | chaos, I'd love nothing better than getting into the tool
               | chain in a way that I could later on utilise on a wide
               | spread basis.
               | 
               | At that point I would have achieved my aims and if that
               | means I've burnt a few people along the way, so be it,
               | I'm a bad guy, the damage has been done, the objective
               | met.
        
           | radicalcentrist wrote:
           | Reproducibility is what allows you to rely on other
           | maintainers' reviews. Without reproducibility, you can't be
           | certain that what you're running has been audited at all.
           | 
           | It's true that no single person can audit their entire
           | dependency tree. But many eyes make all bugs shallow.
        
         | jnxx wrote:
         | This is great! The one fly in the ointment, pardon, is that Nix
         | is a bit lax about trusting proprietary and binary-only stuff.
         | It would be great if there were a FLOSS-only core system for
         | NixOS which would be fully transparent.
        
           | rejectedandsad wrote:
           | > It would be great if there were a FLOSS-only core system
           | for NixOS
           | 
           | Might be wrong but isn't this part of the premise for
           | Guix/GuixSD?
        
           | quarantine wrote:
           | Nix/Nixpkgs blocks unfree packages by default, so I presume
           | it would be relatively easy to disable packages with the
           | `unfree` attribute.
        
             | jnxx wrote:
             | I totally believe it is possible, it is perhaps more of a
             | cultural thing.
        
               | eptcyka wrote:
               | It's the pragmatic thing. I wouldn't use nixOS if I
               | wasn't able to use it on a 16 core modern desktop. I
               | don't think there's a performant and 100% FLOSS
               | compatible computer that wouldn't make me want to gouge
               | my eyes out with a rusty spoon when building stuff for
               | ARM.
        
               | zamadatix wrote:
               | Talos has 44 core/176 thread server options which can
               | take 2 TBs of DDR4 that are FSF certified. The board
               | firmware is also open and has reproducible builds.
        
               | eptcyka wrote:
               | Thanks, I was legitimately unaware of this option. That
               | does smash my argument, but I'm not likely to be using a
               | system like that anytime soon due to cost concerns
               | mostly.
        
               | tadfisher wrote:
               | That is way more expensive than a 16-core desktop,
               | though. Workstations are a class above consumer-grade
               | desktops and that's reflected in the price.
        
         | 0xbadcafebee wrote:
         | Supply chain attacks are definitely important to deal with, but
         | defense-in-depth saves us in the end. Even if a postgres
         | container is backdoored, if the admins put postgres by itself
         | in a network with no ingress or egress except the webserver
         | querying it, an attack on the database itself would be very
         | difficult. If on the other hand, the database is run on
         | untrusted networks, and sensitive data kept on it... yeah,
         | they're boned.
        
         | radicalcentrist wrote:
         | Reproducibility is necessary, but unfortunately not sufficient,
         | to stop a "Trusting Trust" attack. Nixpkgs still relies on a
         | bootstrap tarball containing e.g. gcc and binutils, so
         | theoretically such an attack could trace its lineage back to
         | the original bootstrap tarball, if it was built with a
         | compromised toolchain.
        
           | mjg59 wrote:
           | Diverse double compilation should allow a demonstration that
           | the toolchain is trustworthy.
        
         | hsbauauvhabzb wrote:
         | I don't have the resources to audit every component of my
         | system. I favour enterprise distros who audit code which ends
         | up in their repos and avoid pip, npm, etc. but there are some
         | glaring trade offs on both productivity and scalability.
         | 
         | The problem is unmaintainability, I can't imagine it'd be
         | easier for medium sized teams where security isn't a priority,
         | either.
        
         | initplus wrote:
         | Building from source doesn't have to be inaccessible, if the
         | build tooling around it is strong. Modern compiled languages
         | like Go (or modern toolchains on legacy languages like vcpkg)
         | have a convention of building everything possible from source.
         | 
         | So at least for software libraries building from source is
         | definitely viable. For end user applications it's another story
         | though, doubt we will ever be at a point where building your
         | own browser from source makes sense...
        
           | garmaine wrote:
           | Binary reproducible builds are still pretty inaccessible
           | though.
        
       | georgyo wrote:
       | Mandatory link to the Debian single purpose site:
       | https://isdebianreproducibleyet.com/
       | 
       | However that is for everything in Debian, not just the iso. It is
       | truly remarkable to see all the Linux distributions move the
       | needle forward.
        
         | Foxboron wrote:
         | And Arch Linux :)
         | 
         | https://reproducible.archlinux.org/
        
       ___________________________________________________________________
       (page generated 2021-06-20 23:00 UTC)