[HN Gopher] Nixos-unstable's ISO_minimal.x86_64-Linux is 100% re...
___________________________________________________________________
Nixos-unstable's ISO_minimal.x86_64-Linux is 100% reproducible

Author : todsacerdoti
Score : 222 points
Date : 2021-06-20 20:01 UTC (2 hours ago)

(HTM) web link (discourse.nixos.org)
(TXT) w3m dump (discourse.nixos.org)

| avalys wrote:
| Can anyone comment on the significance of this accomplishment,
| and why it was hard to achieve before?
|
| I (naively, apparently) assumed this had been possible with open-
| source toolchains for a long time.
| peterkelly wrote:
| For some reason, many compilers and build scripts have
| traditionally been written in a way that's not referentially
| transparent (a pure function from input to output). Unnecessary
| information like the time of the build, absolute path names of
| sources and intermediate files, usernames and hostnames often
| would find their way into build outputs. Compiling the same
| source on different machines or at different times would yield
| different results.
|
| Reproducible builds avoid all this and always produce the same
| outputs given the same inputs. There's no good reason (that I
| can think of) why this shouldn't have been the case all along,
| but for a long time I guess it just wasn't seen as a priority.
|
| The benefit of reproducible builds is that it's possible to
| verify that a distributed binary was definitely compiled from
| known source files and hasn't been tampered with, because you
| can recompile the program yourself and check that the result
| matches the binary distribution.
| xyzzy_plugh wrote:
| There are a lot of problems with reproducible builds. Filesystem
| paths, timestamps, deterministic build order to say the least.
| This is a pretty great achievement and I'm looking forward to a
| non-minimal stable ISO.
| bombcar wrote:
| Yeah even the "gcc compiled Jan 23, 2021 at 11:23AM" messages
| you often see break deterministic builds.
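
The timestamp problem described in the comments above can be sketched in a few lines of Python. This is only a toy illustration, not how any real compiler works: `build` and `digest` are hypothetical helpers, and the only real-world convention assumed is the `SOURCE_DATE_EPOCH` environment variable from reproducible-builds.org, which build tools can honour in place of the wall clock.

```python
import hashlib
import os
import time
from typing import Optional


def build(source: bytes, now: Optional[int] = None) -> bytes:
    """Toy stand-in for a compiler: appends a build-time banner to the input.

    Hypothetical helper for illustration only; real toolchains embed
    timestamps in many other places (archive headers, debug info, ...).
    """
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        # Pinned timestamp: the same on every machine and every run.
        stamp = int(epoch)
    elif now is not None:
        # Caller-supplied clock value, used here to simulate two build times.
        stamp = now
    else:
        # Wall-clock time: differs between runs, so the output differs too.
        stamp = int(time.time())
    return source + b"\n# built at %d\n" % stamp


def digest(artifact: bytes) -> str:
    """SHA-256 of the build output, as a hex string."""
    return hashlib.sha256(artifact).hexdigest()
```

With `SOURCE_DATE_EPOCH` unset, `digest(build(src, now=1))` and `digest(build(src, now=2))` differ even though the source is identical; exporting the same `SOURCE_DATE_EPOCH` before both builds makes the two digests match.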
| twisrkrr wrote:
| The code has to be changed so that things like system-specific
| paths, time of compilation, hardware, etc. don't cause the
| compiled program to be unique to that computer (meaning
| compiling the same code on a different computer will give you a
| file that still works but has a different md5 hash).
|
| By being able to reproduce the file completely, down to
| identical md5 hashes, you know you have the same file the
| creator has, and know with certainty that the file has not been
| tampered with.
| secondcoming wrote:
| Does this mean that the code cannot be built with CPU-
| specific optimisations (the -march option with gcc)?
| Avamander wrote:
| Pretty much. But hopefully x86_64 feature levels will
| provide the benefits of native builds to a reasonable
| extent.
| Denvercoder9 wrote:
| The software doesn't suddenly become incompatible with CPU-
| specific optimisations (or many other compiler flags that
| change its output), but if you enable them, you won't be able
| to reproduce the distribution binaries. Distributions don't
| enable CPU-specific optimisations anyway, since they want
| to be usable on more than one CPU model.
| pas wrote:
| Likely it means that with the same input arguments the end
| result is bit-by-bit identical. (As I understand it, the
| problems were hard-to-control output elements. So it was
| not enough to set the same args, set the same time, and use
| the same path and filesystem, because there were things
| that happened at different speeds, so they ended up
| happening at relatively different elapsed times, so the
| outputs contained different timestamps, etc.)
| clhodapp wrote:
| No, just that you need to avoid naively conflating the
| machine that is doing the compilation with the one that
| optimization is being performed for.
|
| Concretely, you would need to keep track of and reproduce
| e.g. the -march flag value as a part of your build input.
| If you wanted to optimize for multiple architectures, that
| would mean separate builds or a larger binary with function
| multi-versioning.
| maartenh wrote:
| Nixpkgs contains the build / patch instructions for any
| packages in NixOS.
|
| If you want to compile any piece of software available in
| Nixpkgs, you can override its attributes (inputs used to
| build it).
|
| One can trivially have an almost identical operating system
| to your colleague's install, but override just one package
| to enable optimisations for a certain CPU. This would
| however imply that you'd lose the transparent binary cache
| that you could otherwise use.
|
| Exactly this method is used to configure the entire
| operating system install! Your OS install is just another
| package that has some custom inputs set.
| danbst wrote:
| Just recently, there were large non-reproducible projects:
| python, gcc. Not sure where the history of non-r13y is.
|
| ---
|
| There is a Debian initiative to create bit-to-bit reproducible
| builds for all their software (well, all critical).
|
| https://reproducible-builds.org/
|
| R13y is akin to "computer proofs" in math -- if you don't have
| it, that's fine, but if you have it, that's awesome.
|
| There are practical reasons to favor reproducibility too, but
| those are more for distro maintainers.
|
| The fact that NixOS (not Debian) got this 100% is mostly
| because
|
| - minimal image has a small subset of packages
| (https://hydra.nixos.org/build/146009592#tabs-build-deps)
|
| - Nix tooling was created 15 years ago *exactly* for this, Nix
| is made to make packages bit-to-bit rebuildable from scratch.
|
| - Nix/Nixpkgs is growing in number of maintainers and got more
| funds
|
| - Nix has fewer Docker/Snap pragmatics
| Foxboron wrote:
| >- Nix tooling was created 15 years ago _exactly_ for this,
| Nix is made to make packages bit-to-bit rebuildable from
| scratch.
|
| I don't think this is accurate?
|
| Nix is about reproducing system behaviour, largely by
| capturing the dependency graph and replaying the build. But
| this doesn't entail bit-for-bit identical binaries. It very
| much sits in the same group as Docker and similar
| technologies. This is also how I read the original thesis
| from Eelco[0].
|
| And well, claims like this always rub me the wrong way since
| nixos only really started using the term "reproducible
| builds" after Debian started their efforts in 2015-2016[1],
| and started their reproducible builds effort later. It also
| muddies the language since people are now talking about
| "reproducible builds" in terms of system behavior as well as
| bit-for-bit identical builds. The result has been that people
| talk about "verifiable builds" instead.
|
| [0]: https://edolstra.github.io/pubs/phd-thesis.pdf
|
| [1]: https://github.com/NixOS/nixpkgs/issues/9731
| infogulch wrote:
| Being bit-for-bit reproducible means you could do fun things
| like distribute packages as just sources and a big blob of
| signatures, and you can still run only signed binaries.
| mananaysiempre wrote:
| The GCC developers in particular were hostile to such efforts
| for a long time, IIRC. (This is a non-trivial issue because
| randomized data structures exist and can be a good idea to use:
| treaps, universal hashes, etc. I'd guess it also pays for
| compiler heuristics to be randomized sometimes. Incremental
| compilation is much harder to achieve when you require bit-for-
| bit identical output. Even just stripping your compile paths
| from debug info is not entirely straightforward.)
| pas wrote:
| How/why was the randomness part not "solvable" by using
| fixed seeds?
| moonchild wrote:
| > Incremental compilation is much harder to achieve when you
| require bit-for-bit identical output
|
| Presumably, incremental compilation is only for development.
| For release, you would do a clean build, which would be
| reproducible.
|
| > Even just stripping your compile paths from debug info is
| not entirely straightforward
|
| Just use the same paths.
| [deleted]
| Arnavion wrote:
| Is there a list of the 1486 packages in the minimal ISO?
| danbst wrote:
| https://hydra.nixos.org/build/146009592#tabs-build-deps
| jnxx wrote:
| A good sign that the friendly competition by Guix has a positive
| influence :)
|
| https://guix.gnu.org/manual/en/html_node/Bootstrapping.html
|
| https://guix.gnu.org/en/blog/2020/guix-further-reduces-boots...
| delroth wrote:
| This smaller bootstrap seed thing is a different problem from
| reproducible builds. nixpkgs does still have a pretty big
| initial TCB (aka. stage0) compared to Guix. But as far as I can
| tell NixOS has the upper hand in terms of how much can be built
| reproducibly (aka. the output hash matches across separate
| builds).
| jnxx wrote:
| Bootstrapping from a very small binary core (I think 512
| bytes) with an initial C compiler written in Scheme also has
| the advantage that the system can easily be ported to
| different hardware. Which is one major strength of the GNU
| projects and tools.
| delroth wrote:
| Not necessarily. Usually these very small cores end up
| being more architecture specific binaries than a stage0
| consisting of gcc + some other core packages. A good
| illustration of this is that Guix's work on bootstrap seed
| reduction has been so far mostly applied to i686/amd64 and
| not even other architectures they support (at least, not
| fully).
| rejectedandsad wrote:
| I really want to adopt Nix and NixOS for my systems but the cost
| of wrapping packages is just a little too high for me right now
| (or perhaps I'm out of date and a new cool tool that does it
| automatically is out).
| IMHO, a dependency graph-based build
| system that builds a hermetically sealed transitive closure of an
| app's dependencies that can be plopped into a rootfs via Nix [0]
| is far superior security-wise to the traditional practice of
| writing Dockerfiles.
|
| [0] https://yann.hodique.info/blog/using-nix-to-build-docker-ima...
| koolba wrote:
| There's something very poetic about "unstable" being
| "reproducible". It's like controlled chaos.
| fouronnes3 wrote:
| Are there synergies with the Debian reproducible build project
| that this can benefit from?
| Denvercoder9 wrote:
| In general, Debian aims to upstream the changes they make to
| software. That allows all other distributions, including Nix,
| to profit from their work making software reproducible.
| amelius wrote:
| Hopefully this will one day also work with NVidia's software
| packages.
| jeroenhd wrote:
| The trick with nvidia on Linux is to not expect that they will
| ever work on anything. If you want to be sure that stuff works,
| either don't buy Nvidia or use Windows.
| amelius wrote:
| What would you recommend instead of NVidia's Jetson embedded
| platform?
| jeroenhd wrote:
| I'm not familiar with the market the Jetson is in and what
| purposes it serves. From a quick Google, it seems to build
| boards for machine learning? If that's true, I'm pretty
| sure Google and Intel have products in that space, and I'm
| sure there are other brands I don't know of.
|
| If Nvidia has its own distribution, it might well work for
| as long as it's willing to maintain the software because
| then they can tune their open source stuff to make it work
| with their proprietary drivers, the same way Apple is
| hiding their tensorflow code. I still would be hesitant to
| rely on Nvidia in that case given their history.
| solarkraft wrote:
| That is a pretty big deal.
|
| This means everyone building NixOS will get the _exact_ same
| binary, meaning you can now trust _any_ source for it because you
| can verify the hash.
|
| It's a huge win compared to the current default distribution
| model of "just trust these 30 American entities that the software
| does what they say it does".
|
| Big congratulations to the team.
| groodt wrote:
| This is a big deal. Congratulations to all involved.
|
| In Software, complexity naturally increases over time and
| dependencies and interactions between components become
| impossible to reason about. Eventually this complexity causes the
| Software to collapse under its own weight.
|
| Truly reproducible builds (such as NixOS and Nixpkgs) provide us
| with islands of "determinism" which can be taken as true
| invariants. This enables us to build more Systems and Software on
| top of deterministic foundations that can be reproduced by
| others.
|
| This reproducibility also enables powerful things like
| decentralized / distributed trust. Different third parties can
| build the same software and compare the results. If they differ,
| it could indicate one of the sources has been compromised. See
| Trustix https://github.com/tweag/trustix
| dcposch wrote:
| This really deserves more love.
|
| Who remembers Ken Thompson's "Reflections on Trusting Trust"?
|
| The norm today is auto-updating, pre-built software.
|
| This places a ton of trust in the publisher. Even for open-
| source, well-vetted software, we all collectively cross our
| fingers and hope that whoever is building these binaries and
| running the servers that disseminate them is honest and good at
| security.
|
| So far this has mostly worked out due to altruism (for open
| source maintainers) and self-interest (companies do not want to
| attack their own users). But the failure modes are very serious.
|
| I predict that everyone's imagination on this topic will expand
| once there's a big enough incident in the news.
| Say some package
| manager gets compromised, nobody finds out, and six months later
| every computer on earth running `postgres:latest` from Docker Hub
| gets ransomwared.
|
| There are only two ways around this:
|
| - Build from source. This will always be a deeply niche thing to
| do. It's slow, inconvenient, and inaccessible except to nerds.
|
| - Reproducible builds.
|
| Reproducible builds are way more important than is currently
| widely appreciated.
|
| I'm grateful to the NixOS team for beating a trail through the
| jungle here. Retrofitting reproducibility onto a big software
| project that grew without it is hard work.
| zamadatix wrote:
| Unless you are going to be the equivalent of a full-time
| maintainer doing code review for every piece of software you
| use, you need to trust other software maintainers, reproducible
| builds or not. Considering this is Linux and not even Linus can
| deeply review every change in just the kernel anymore, that
| philosophy can't apply to meaningfully large software like
| NixOS.
| Taek wrote:
| You can't solve this problem without having a full history of
| code to inspect (unless you are decompiling). Reproducibility
| is the first step and bootstrappability is the second step.
| Then we refine the toolchains and review processes to ensure
| high-impact code is properly scrutinized.
|
| What we can't do is throw our hands up and say anyone who
| compromises the toolchain deeply enough is just allowed to win.
| It will happen at some point if we don't put the right
| barriers in place.
|
| It's the first step of a long journey, but it is a step we
| should be taking.
| jnxx wrote:
| That's too black-and-white. Being able to reproduce stuff
| makes some kinds of attacks entirely uninteresting because
| malicious changes can be traced back. Which is what many
| types of attackers do not want. Debian, or the Linux kernel,
| for example, are not fool-proof, but both are in practice
| quite safe to work with.
| zamadatix wrote:
| Who are you going to trace it back to if not the maintainer
| anyways? If the delivery method, then why is the delivery of
| the source from the maintainer inherently any safer?
| jnxx wrote:
| No, it is not always the maintainer. Imagine you download
| a binary software package via HTTPS. In theory, the
| integrity of the download is protected by the server
| certificate. However, it is possible that certificates
| get hacked, get stolen, or that nation states force CAs
| to give out back doors. In that case, your download could
| have been changed on the fly with arbitrary alterations.
| Reproducible builds make it possible to detect such
| changes.
| zamadatix wrote:
| Same as when you download the source instead of the
| binary and see it reproducibly builds the backdoored
| binary. And at this point we're back to "Build from
| source. This will always be a deeply niche thing to do.
| It's slow, inconvenient, and inaccessible except to
| nerds." anyways.
|
| It's not that reproducible builds provide 0 value; it's
| that they don't truly solve the trust problem as
| initially stated. They also have non-security value to
| boot which is often understated compared to the security
| value IMO.
| eptcyka wrote:
| Even if the original attack happened upstream, if the
| upstream piece of software was pinned via git, then
| it'd be trivial to bisect the upstream project to find
| the culprit.
| dragonsky67 wrote:
| This is great if you are looking at attributing blame.
| Not so great if you are trying to prevent all the world's
| computers getting owned....
|
| I'd imagine that if I were looking at causing worldwide
| chaos, I'd love nothing better than getting into the
| toolchain in a way that I could later on utilise on a
| widespread basis.
|
| At that point I would have achieved my aims and if that
| means I've burnt a few people along the way, so be it,
| I'm a bad guy, the damage has been done, the objective
| met.
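
The detection step discussed in this subthread (a download altered in transit no longer matches what independent parties get when they rebuild from source) can be sketched roughly as follows. This is a hedged illustration only: `sha256_hex` and `mismatching_builders` are hypothetical names, the builder labels are made up, and nothing here reflects the actual API of Nix, Trustix, or any distribution's tooling.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """SHA-256 of an artifact, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def mismatching_builders(downloaded: bytes, rebuilt: Dict[str, bytes]) -> List[str]:
    """Return the names of independent rebuilds that disagree with the download.

    `rebuilt` maps a (hypothetical) builder name to the artifact that builder
    produced from the same source. With a reproducible build, every entry
    should hash identically to the downloaded binary, so an empty list means
    the download matches all rebuilds.
    """
    expected = sha256_hex(downloaded)
    return sorted(
        name for name, artifact in rebuilt.items()
        if sha256_hex(artifact) != expected
    )
```

An empty result only says that the binary corresponds to the source everyone built from; as the comments above point out, it says nothing about whether that source itself is trustworthy.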
| radicalcentrist wrote:
| Reproducibility is what allows you to rely on other
| maintainers' reviews. Without reproducibility, you can't be
| certain that what you're running has been audited at all.
|
| It's true that no single person can audit their entire
| dependency tree. But many eyes make all bugs shallow.
| jnxx wrote:
| This is great! The one fly in the ointment, pardon, is that Nix
| is a bit lax about trusting proprietary and binary-only stuff.
| It would be great if there were a FLOSS-only core system for
| NixOS which would be fully transparent.
| rejectedandsad wrote:
| > It would be great if there were a FLOSS-only core system
| for NixOS
|
| Might be wrong but isn't this part of the premise for
| Guix/GuixSD?
| quarantine wrote:
| Nix/Nixpkgs blocks unfree packages by default, so I presume
| it would be relatively easy to disable packages with the
| `unFree` attribute.
| jnxx wrote:
| I totally believe it is possible, it is perhaps more of a
| cultural thing.
| eptcyka wrote:
| It's the pragmatic thing. I wouldn't use nixOS if I
| wasn't able to use it on a 16 core modern desktop. I
| don't think there's a performant and 100% FLOSS
| compatible computer that wouldn't make me want to gouge
| my eyes out with a rusty spoon when building stuff for
| ARM.
| zamadatix wrote:
| Talos has 44 core/176 thread server options which can
| take 2 TBs of DDR4 that are FSF certified. The board
| firmware is also open and has reproducible builds.
| eptcyka wrote:
| Thanks, I was legitimately unaware of this option. That
| does smash my argument, but I'm not likely to be using a
| system like that anytime soon due to cost concerns
| mostly.
| tadfisher wrote:
| That is way more expensive than a 16-core desktop,
| though. Workstations are a class above consumer-grade
| desktops and that's reflected in the price.
| 0xbadcafebee wrote:
| Supply chain attacks are definitely important to deal with, but
| defense-in-depth saves us in the end.
| Even if a postgres
| container is backdoored, if the admins put postgres by itself
| in a network with no ingress or egress except the webserver
| querying it, an attack on the database itself would be very
| difficult. If, on the other hand, the database is run on
| untrusted networks, and sensitive data is kept on it... yeah,
| they're boned.
| radicalcentrist wrote:
| Reproducibility is necessary, but unfortunately not sufficient,
| to stop a "Trusting Trust" attack. Nixpkgs still relies on a
| bootstrap tarball containing e.g. gcc and binutils, so
| theoretically such an attack could trace its lineage back to
| the original bootstrap tarball, if it was built with a
| compromised toolchain.
| mjg59 wrote:
| Diverse double compilation should allow a demonstration that
| the toolchain is trustworthy.
| hsbauauvhabzb wrote:
| I don't have the resources to audit every component of my
| system. I favour enterprise distros who audit code which ends
| up in their repos and avoid pip, npm, etc., but there are some
| glaring trade-offs on both productivity and scalability.
|
| The problem is unmaintainability, I can't imagine it'd be
| easier for medium-sized teams where security isn't a priority,
| either.
| initplus wrote:
| Building from source doesn't have to be inaccessible, if the
| build tooling around it is strong. Modern compiled languages
| like Go (or modern toolchains on legacy languages like vcpkg)
| have a convention of building everything possible from source.
|
| So at least for software libraries building from source is
| definitely viable. For end-user applications it's another story
| though, doubt we will ever be at a point where building your
| own browser from source makes sense...
| garmaine wrote:
| Binary reproducible builds are still pretty inaccessible
| though.
| georgyo wrote:
| Mandatory link to the Debian single-purpose site:
| https://isdebianreproducibleyet.com/
|
| However that is for everything in Debian, not just the ISO.
| It is truly remarkable to see all the Linux distributions move the
| needle forward.
| Foxboron wrote:
| And Arch Linux :)
|
| https://reproducible.archlinux.org/
___________________________________________________________________
(page generated 2021-06-20 23:00 UTC)