Is NixOS truly reproducible?
123 comments
February 9, 2025 · vlovich123
mikepurvis
What's even crazier is that Nix builds are this reproducible for free. Like, joe random developer can:
nix build nixpkgs#vim
nix build nixpkgs#vim --rebuild
The first invocation will substitute binaries, and the second will rebuild those locally and validate the bit-for-bit reproducibility of the results. In Debian there is significant ceremony and special tools/wrappers required to set up the reproducible environment, so no one would bother to use it unless they were specifically working on the https://wiki.debian.org/ReproducibleBuilds initiative.
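If the local rebuild ever does differ, --keep-failed should leave the mismatching output next to the original (as a ".check" store path, if memory serves), and a tool like diffoscope is handy for inspecting the delta. A rough sketch, with placeholder store paths:

    nix build nixpkgs#vim --rebuild --keep-failed
    # on a mismatch, compare the two trees (diffoscope is a separate tool):
    diffoscope /nix/store/<hash>-vim-<version> /nix/store/<hash>-vim-<version>.check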
sublimefire
Some interesting related stats from Debian also show good reproducibility progress:
https://tests.reproducible-builds.org/debian/reproducible.ht...
jchw
I think this debate comes down to exactly what "reproducible" means. Nix doesn't give bit-exact reproducibility, but it does give reproducible environments, by ensuring that the inputs are always bit-exact. It is closer to being fully reproducible than most other build systems (including Bazel) -- but because it can only reasonably ensure that the inputs are exact, it's still necessary for the build processes themselves to be fully deterministic to get end-to-end bit-exactness.
Nix on its own doesn't fully resolve supply chain concerns about binaries, but it can provide answers to a myriad of other problems. I think most people like Nix reproducibility, and it is marketed as such, for the sake of development: life is much easier when you know for sure you have the exact same version of each dependency, in the exact same configuration. A build on one machine may not be bit-exact to a build on another machine, but it will be exactly the same source code all the way down.
The quest to get every build process to be deterministic is definitely a bigger problem and it will never be solved for all of Nixpkgs. NixOS does have a reproducibility project[1], and some non-trivial amount of NixOS actually is properly reproducible, but the observation that Nixpkgs is too vast is definitely spot-on, especially because in most cases the real issues lie upstream. (and carrying patches for reproducibility is possible, but it adds even more maintainer burden.)
matrss
> The quest to get every build process to be deterministic [...] will never be solved for all of Nixpkgs.
Not least because of unfree and/or binary-blob packages that can't be reproducible because they don't even build anything. As much as Guix' strict FOSS and build-from-source policy can be an annoyance, it is a necessary precondition to achieve full reproducibility from source, i.e. the full-source bootstrap.
jchw
Nixpkgs provides license[1] and source provenance[2] information. For legal reasons, Nix also defaults to not evaluating unfree packages. Not packaging them at all, though, doesn't seem useful from any technical standpoint; I think that is purely ideological.
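That metadata is queryable straight from the CLI, e.g. (package picked arbitrarily):

    nix eval --json nixpkgs#vim.meta.license
    # packages that ship prebuilt blobs likewise declare it via meta.sourceProvenance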
In any case, it's all a bit imperfect anyway, since it's from the perspective of the package manager, which can't be absolutely sure there's no blobs. Anyone who follows Linux-libre releases can see how hard it really is to find all of those needles in the haystack. (And yeah, it would be fantastic if we could have machines with zero unfree code and no blobs, but the majority of computers sold today can't meaningfully operate like that.)
I actually believe there's plenty of value in the builds still being reproducible even when blobs are present: you can still verify that the supply chain is not compromised outside of the blobs. For practical reasons, most users will need to stick to limiting the amount of blobs rather than fully eliminating them.
[1]: https://nixos.org/manual/nixpkgs/stable/#sec-meta-license
[2]: https://nixos.org/manual/nixpkgs/stable/#sec-meta-sourceProv...
0x69420
you can slap a hash on a binary distribution and it becomes "reproducible" in the same trivial sense as any source tarball. after that, the reproducibility of whatever "build process" takes place to extract archives and shuffle assets around is no more or less fraught than for any other package (probably less, considering how much compilers have historically had to be brought to heel, especially before reproducibility was fashionable enough for it to enter much into compiler authors' consideration!)
colordrops
I'm curious, why couldn't packages that are fully reproducible be marked with metadata, so that in your config you can set a flag to only allow reproducible packages? Similar to the nonfree tag.
Then you'd have a 100% reproducible OS if you have the flag set (assuming the required base packages are reproducible).
jchw
You could definitely do that; I think the main thing stopping anyone is simply lack of demand for that specific feature. That, and it might be hard to keep track of which things are properly reproducible; you can really only ever prove that a package is not reproducible. It could be non-deterministic but only produce differences on different CPUs, or in an infinitesimally small fraction of runs. Actually being able to assure determinism would be pretty amazing, although I don't know how that could be achieved.
colordrops
I assume it would be somewhat of a judgement call. I mean that is the case with nonfree packages as well - licenses and whatnot have to be evaluated. I assume that there are no cases of non-trivially large software packages in the wild that have been formally proven to be reproducible, but I could be wrong.
sa46
> It is closer to being fully reproducible than most other build systems (including Bazel).
How so? Bazel produces the same results for the same inputs.
jchw
Bazel doesn't guarantee bit-exact outputs, but also Bazel doesn't guarantee pure builds. It does have a sandbox that prevents some impurities, but for example it doesn't prevent things from going out to the network, or even accessing files from anywhere in the filesystem, if you use absolute paths. (Although, on Linux at least, Bazel does prevent you from modifying files outside of the sandbox directory.)
The Nix sandbox, by contrast, completely hides the host filesystem and only allows network access to fixed-output derivations, i.e. builds whose output must match a declared hash bit for bit.
(Bazel also obviously uses the system compilers and headers. Nix does not.)
paulddraper
> Bazel also obviously uses the system compilers and headers. Nix does not.
Bazel allows hermetic toolchains, and uses them for most languages: Java, Python, Go, Rust, Node.js, etc. You can do the same for C++, but Bazel doesn't provide that out of the box. [1]
Bazel sandboxing can restrict system access on Linux with --experimental_use_hermetic_linux_sandbox and --sandbox_add_mount_pair. [2]
Every "reproducible builds" discussion requires an understand of what is permitted to vary. E.g. Neither Nix nor Bazel attempts to make build products the same for x86 host environments vs ARM host environments. Bazel is less aggressive than Nix in that it does not (by default) attempt to make build products the same for different host C++ compilers.
[1] https://github.com/bazelbuild/bazel/discussions/18332
[2] https://bazel.build/reference/command-line-reference#flag--e...
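A concrete invocation using the flags from [2] (the target label and mount path are just illustrative):

    bazel build //my:target \
      --experimental_use_hermetic_linux_sandbox \
      --sandbox_add_mount_pair=/etc/resolv.conf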
dijit
Uh, either my understanding of Bazel is wrong, or everything you wrote is wrong.
Bazel absolutely prevents network access and filesystem access (reads) from builds. (only permitting explicit network includes from the WORKSPACE file, and access to files explicitly depended on in the BUILD files).
Maybe you can write some “rules_” for languages that violate this, but it is purposely designed to be hermetic and bit-for-bit reproducible.
EDIT:
From the FAQ[0]:
> Will Bazel make my builds reproducible automatically?
> For Java and C++ binaries, yes, assuming you do not change the toolchain.
The issues with Docker's style of "reproducible" (meaning a consistent environment) are also outlined in the same FAQ[1]:
> Doesn’t Docker solve the reproducibility problems?
> Docker does not address reproducibility with regard to changes in the source code. Running Make with an imperfectly written Makefile inside a Docker container can still yield unpredictable results.
[0]: https://bazel.build/about/faq#will_bazel_make_my_builds_repr...
[1]: https://bazel.build/about/faq#doesn’t_docker_solve_the_repro...
gf000
I think talking about sandboxes is missing the point a bit.
It's an important constituent, but only complete OS emulation with deterministic scheduling could (at a huge overhead) actually result in bit-by-bit reproducible artifacts for arbitrary build steps.
There are endless sources of impurities/randomness, and most compilers haven't historically cared much about this.
gf000
No, most compilers are not themselves reproducible, even within very restrictive sandboxes. For example, they may do some work concurrently and collect the results in whatever order it completes, then build on top of that; unless they add a timing-insensitive sorting step, the resulting binary will (assuming no bugs) be functionally equivalent but may not be bit-by-bit identical. A build tool can only do so much.
k__
What are the common issues besides timestamps?
colejohnson66
The compiler executing internal work concurrently and merging it at the end. Thread-scheduling differences will cause a different output ordering.
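A toy illustration of that ordering effect in plain bash (obviously not a compiler):

    # four parallel jobs finish in whatever order the scheduler picks,
    # so the collected output ordering can differ between runs
    for f in a b c d; do ( sleep "0.0$((RANDOM % 9))"; echo "$f" ) & done; wait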
IHLayman
That this article discusses reproducibility in NixOS yet declines to even mention the intensional model, or the efforts to implement it, is surprising to me, since the authors appear to have done a lot of research into the matter.
If you don't know, the intensional model is an alternative way to structure the Nix store so that components are content-addressed (the store hash is computed from the built outputs themselves) as opposed to being addressed by their build instructions and dependencies. IIUC, the entire purpose of the intensional model is to make Nix stores shareable, so that you could just depend on Cachix and such without worrying about a supply-chain attack. This approach was an entire chapter of the Nix thesis (chapter 6) and has seen recent work (see https://github.com/NixOS/rfcs/pull/62 and https://github.com/NixOS/rfcs/pull/17 for current progress).
mschwaig
I think it would have been a good thing to mention, but difficult to do well in more than a quick reference or sidenote, and it could easily turn into an extensive detour. I'm saying this as someone who's working on exactly that topic. There is a little bit of overlap between the kind of quantitative work that they do and this design aspect: the extensional model leaves the identity of direct dependencies not entirely certain. In practice that means we don't know if they built direct dependencies from source or substituted them from cache.nixos.org, but this exact concern also applies to cache.nixos.org itself.
The intensional store makes the store shareable without also sharing trust relationships ('kind of trustless' in that sense), but only because it moves trust relationships out of the store, not because it gets rid of them. You still need to trust signatures which map a hash of inputs to a hash of the output, just like in the extensional model. You can however get really powerful properties for supply chain security from the intensional store model (and a few extra things). You can read about that in this recent paper of mine: https://dl.acm.org/doi/10.1145/3689944.3696169. I'm still working on this stuff and trying to find ways to get that work funded (see https://groundry.org/).
mikepurvis
You still need to trust something though. It's just that instead of trusting the signing of the binaries themselves, you trust the metadata that maps input hashes (computed locally) to content hashes (unknown until a build occurs).
The real win with content addressing in Nix is being able to proactively dedupe the store and also cut off rebuild cascades, like if you have dependency chain A -> B -> C, and A changes, but you can demonstrate that the result of B is identical, then there's no longer a need to also rebuild C. With input addressing, you have to rebuild everything downtree of A when it changes, no exceptions.
im3w1l
Is B remaining the same something that happens often enough for it to matter?
mikepurvis
I haven’t studied it, but yes, I would imagine so. For example, if a Python build macro changes but the Sphinx output remains unchanged, you get out of rebuilding thousands of packages that generate Sphinx docs as part of their build.
CamouflagedKiwi
> Our most important finding is that the reproducibility rate in nixpkgs has increased steadily from 69% in 2017 to about 91% in April 2023. The high reproducibility rate in our most recent revision is quite impressive, given both the size of the package set and the absence of systematic monitoring in nixpkgs.
That's one way to read the statistic. Another way you could read the graph is that they still have about the same number (~5k) of non-reproducible builds, which has been pretty constant over the time period. Adding a bunch of easily reproducible additional builds maybe doesn't make me believe it's solving the original issues.
> We knew that it was possible to achieve very good reproducibility rate in smaller package sets like Debian, but this shows that achieving very high bitwise reproducibility is possible at scale, something that was believed impossible by practitioners.
Maybe I'm missing some nuance here, but why is Debian written off as being so much smaller in scale? The top end of the graph here suggests a bit over 70k packages, and Debian apparently also currently has 74k packages available (https://www.debian.org/doc/manuals/debian-reference/ch02.en....); I guess there's maybe a bit of time lag here, but I'm not sure that is enough to claim Debian is somehow not "at scale".
vlovich123
According to https://tests.reproducible-builds.org/debian/reproducible.ht... (which is what the article links to, btw) there are ~37k packages tracked for reproducible builds, which is ~2.7x smaller than Nix's 100k packages.
gf000
This is not really a Nix-issue to begin with.
It's a bit like asking what percentage of Nix-packaged programs have a Hungarian translation -- if Nix packages more stuff the rate might decrease, but it's not Nix's task to add that to the programs that lack it.
Nix does everything in its power to provide a sandboxed environment in which builds can happen. Given the way hardware works, there are still sources of non-determinism that are impossible to prevent, most importantly timing. Most programs depend on it, even compilers, and extra care should be taken by them to change that. The only way to prevent it would be to go full-on CPU and OS emulation, but that would be prohibitively expensive.
genewitch
> The only way to prevent it would be to go full-on CPU and OS emulation, but that would be prohibitively expensive.
How so?
For reference, I used a Gentoo distcc chroot inside a Devuan VM to bootstrap Gentoo on a 2009 netbook. It worked fine. I did this around Halloween.
gf000
A compiler invoked twice on the same source file is not mandated to produce the same binary, but it should produce a binary with the same functionality.
There are an infinite number of binaries that do the same thing (e.g. just padding zeros in certain places wouldn't cause a functional problem).
Nix is very good at doing functionally reproducible builds, that's its whole thing. But there are build steps which are simply not deterministic, and they might produce correct, but not always the same outputs.
orbital-decay
OS scheduling is non-deterministic, and there are quite a few things that are sensitive to the order of operations (simplest example: floating point addition). If you want to guarantee determinism, not just provide it on a best-effort basis for things that are willing to cooperate, the only way to do that is to put everything into a fully deterministic emulator, which is terribly slow.
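The floating-point example in one line (assuming python3 is handy):

    python3 -c 'print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))'
    # 0.6000000000000001 0.6 -- same operands, different grouping, different bits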
dartos
Are they mostly the same 5k packages as 2017?
That seems to be the crux of it.
opan
Although I'm aware many distros care somewhat about reproducible builds these days, I tend to associate it primarily with Guix System; I never really considered it a feature of NixOS, having used both (though I've spent much more time on Guix System now).
For the record, even in the land of Guix I semi-regularly see reports on the bug-guix mailing list that some package isn't reproducible. It seems to get treated as a bug and then fixed. With that in mind, and personally considering Guix kind of the flagship of these efforts, it doesn't surprise me if anyone else doesn't have perfectly reproducible builds yet either. Especially Nix, with the huge number of things in nixpkgs. It's probably easier for stuff to fall through the cracks with that many packages to manage.
rssoconnor
I'll repeat my comment from last time this came up.[0]
I could be wrong (and I probably am), but I feel like the term "reproducible build" has shifted/solidified since 2006, when Dolstra's thesis was first written (the thesis itself doesn't really use that term all that much). As evidence, the first Wikipedia page on "Reproducible builds" seems to have appeared in 2016, a decade after Dolstra's thesis, and even that stub from 2016 appears to prefer the term "deterministic compilation".
Anyhow, when the Nix project originally spoke about "reproducible builds", what I understood was meant by that term was "being able to repeat the same build steps with the same inputs". Because of the lack of deterministic compilation, this doesn't always yield bit-by-bit identical outputs; they are simply presumed to be "functionally identical". There is, of course, no reason to believe they will necessarily be functionally identical, but it is what developers take for granted every day, and anything else would be considered a bug somewhere in the package.
With Nix, when some software "doesn't work for me, but works for you", we can indeed recursively compare the Nix derivation files, locating and eliminating potential differences, a debugging process I have used on occasion.
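Concretely, something like this (nix-diff is a third-party tool, and the result-* symlinks are just stand-ins for the two builds being compared):

    # compare the .drv files behind "my" build and "your" build
    nix-diff "$(nix-store --query --deriver ./result-mine)" \
             "$(nix-store --query --deriver ./result-theirs)"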
I agree that "reproducible builds" now means something different, but that isn't exactly the fault of Nix advocates. I guess a new term for "being able to repeat the same build steps with the same inputs" is needed.
naasking
> There is, of course, no reason to believe that they will necessarily be functionally identical, but it is what developers take for granted every day, and if otherwise would be considered a bug somewhere in the package.
Yes, the only possible differences result from either a compiler bug or a program bug that depends on undefined behaviour, in which case "anything can happen" as they say. As others have noted, parallel compilation depends on non-deterministic thread-scheduling, so this non-determinism can't be solved unless you restrict all compilation to be single-threaded. It's still not the only possible source of non-determinism though.
tracnar
> I agree that "reproducible builds" now means something different, but that isn't exactly the fault of Nix advocates. I guess a new term for "being able to repeat the same build steps with the same inputs" is needed.
I've usually seen "repeatable builds" used for that.
steve-chavez
I think you want to link to https://news.ycombinator.com/item?id=41956044.
pmarreck
It's massively closer than any other solution in this regard (nods to Guix, Lix, and the other Nix-inspired projects).
Honestly, I believe every software developer owes it to themselves to read the original Nix paper. It's quite digestible and lays out a lot of what it brings to the table. I came away from it wondering why it took so long to realize it... which is a property I've found true of every new important discovery.
https://edolstra.github.io/pubs/nspfssd-lisa2004-final.pdf
If you want, you can even ask an LLM to sum up its main points for you. Or to sell it to you. =)
https://chatgpt.com/share/67ae1a08-7354-8004-8200-e956cb6b59...
I would like to say one thing about using Docker to "solve" this problem though: Once you think of builds in terms of functions, you realize that a Docker image is basically just the cached artifacts of a build that "just so happened" to work correctly. Consider a function that only occasionally produces a correct value: A Docker image is one of those values.
tlavoie
Thanks for the paper, will check it out! (Still skeptical that we should encourage LLM summarization though, suspect people would gain more and actually learn things from reading papers.)
pmarreck
By all means, read the paper then! It's quite readable, and one of the best papers in software development IMHO
jonhohle
I work on a matching decomp project that has tooling to recompile C into binaries matching a 28-year-old game.
In the final binaries, compiled with gcc 2.6.3 and assembled with a custom assembler, there appears to be unused, uninitialized data that is whatever was in RAM when whoever compiled the game created the release build.
Since the goal is a matching (reproducible) binary, we have tools to restore that random data at specific offsets. Fortunately our targets are fixed.
fngjdflmdflg
What even causes this to happen? I.e., what dev tool would add random data from RAM to a binary? Is this likely a bug, or is there some reason for it, like needing to reach a specific file size somewhere?
aidenn0
Simply calling write() on a C struct can do that, if there is any padding in the struct. Then, of course, there are bugs.
dezgeg
By accidentally writing out uninitialized memory contents to the file, with the game still working. It's even worse in the DOS era, where there is no memory protection, so uninitialized memory can contain data used by other processes; for example, parts of source code can get leaked that way. There's a big list of those at https://tcrf.net/Category:Games_with_uncompiled_source_code
jonhohle
Yeah, this was originally all DOS and Windows 3.1 utilities for writing programs that would run on MIPS. The data is small enough that it isn’t relevant, just not reproducible through standard build tools because it was never meant to be bitwise reproducible.
tuananh
please do write more about it.
jonhohle
We use a tool named dirt-patcher[0], which was written for the project. It lets you write arbitrary bytes at specified offsets[1].
As far as we know at this time, they’re just uninitialized bytes that would have been padding for alignment or other reasons anyway. Maybe if we move to an official build toolchain we’ll find they are deterministic, but for now, we believe they are garbage that happened to make it into the final binary.
0 - https://github.com/Xeeynamo/sotn-decomp/blob/master/tools/di...
1 - https://github.com/Xeeynamo/sotn-decomp/blob/master/config/d...
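(Not the project's actual tool, but the same trick can be sketched with plain dd; the bytes and offset here are made up:)

    # write four specific bytes at a fixed offset without truncating the file
    printf '\xde\xad\xbe\xef' | dd of=game.bin bs=1 seek=$((0x1a2b)) conv=notrunc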
lrvick
I would note that stagex is 100% reproducible and full-source bootstrapped.
Every artifact is reproduced and signed by multiple maintainers on independently controlled hardware and this has been the case since our first release around this time last year.
advisedwang
Is anyone actually implementing the concept of checking hashes with trusted builders? This is all wasted effort if that isn't needed.
I've seen it pointed out (by mjg59, perhaps?) that if you have a trusted builder, why don't you just use their build? That seems to be the actual model in practice.
Reproducibility seems only to be useful if you have a pool of mostly trustworthy builders and somehow want to build a consensus out of that. Which I suppose is useful for a distributed community but does seem like a stretch for the amount of work going in to reproducible builds.
arccy
The superior distro Arch Linux does it: https://reproducible.archlinux.org/
maintainers build the packages, other people check: https://wiki.archlinux.org/title/Rebuilderd#Package_rebuilde...
__MatrixMan__
> if you have a trusted builder, why don't you just use their build
Pardon my tinfoil hat, but doing this would make them a high-value target. If I like them enough to trust their builds, I probably also like them enough to avoid focusing the attentions of the bad guys on them.
Better would be to have a lot of trusted builders all comparing hashes... like, every NixOS user you know (and also the ones they know) so that there's nobody in particular to target.
Timber-6539
That's no different from how NixOS does it. You are still comparing hashes from the first build done by the distribution. A more pure approach would be to use the source code files (simple sha256sum will suffice) as the first independent variable in the chain of trust.
__MatrixMan__
I'm not sure what you mean. It's your machine that calculates the hashes when it encounters the code.
If you build the directed graph made by the symlinks in the nix store and walk it backwards, a sha256 of the source files is what you'll find, both in the form of a nix store path and possibly in a derivation that relies on a remote resource but provides a hash of that resource, so we can know it's unchanged when downloaded later.
The missing piece is that they're not gossipped between users. So if I find some code in a dark alley somewhere and it has a nix flake to make building it easy, I've got no way to take the hashes and determine who else has experience with the same code and can help me decide if it's trustworthy.
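Concretely, those edges are easy to poke at with the standard query tools (./result being the symlink from a prior nix build):

    nix-store --query --deriver ./result      # the .drv that produced it
    nix-store --query --requisites ./result   # its full closure
    nix-store --query --tree ./result         # the dependency graph as a tree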
sublimefire
> is useful for a distributed community but does seem like a stretch for the amount of work going in to reproducible builds
Good point, but even in the case of a larger monolithic system you want to be sure it is possible to forensically analyze your source, to audit it. Once you can trust that one hash relates to this specific thing, you can sign it, etc. This can then be "sold" with some added value of trust downstream. Tracking hashes also becomes easier once they are reproducible, because they mean much more than just a "version".
whazor
There is also an additional benefit to reproducible builds: getting the same output every time can help avoid certain regressions. For instance, if GitHub Actions performs extensive testing on a particular executable, then you want to be able to get the exact same executable in the future, not one that is slightly different.
mschwaig
Yes. Reproducibility also makes it possible to aggregate information about the links in dependency trees and distribute trust on that basis.
That stuff is useful to humans, but it is also really useful for cold hard automated logical reasoning about dependency trees.
c0balt
> is anyone actually implementing [..]
Not for NixOS as far as I can tell. You only have this for source derivations, where a hash is submitted (usually in a PR) and must be reproducible in CI. This specific example, however, has the problem that linkrot can be hard to detect unless you regularly check upstream sources.
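(Those pinned hashes usually come from a prefetch step; the URL below is just a stand-in:)

    nix-prefetch-url https://example.org/foo-1.0.tar.gz
    # prints the sha256 that then gets pinned in the corresponding fetchurl call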
mschwaig
You also couldn't feasibly do that for derivations that actually build packages (as opposed to fixed-output derivations only), because if you update the package set to include a newer version of the compiler, which would often produce a different output, then in addition to having to rebuild everything, you would have to update all of the affected hashes.
What you should be able to do in the future with a system like nix plus a few changes is use nix as a common underlying mechanism for precisely describing build steps, and then use whatever policy you like to determine who you trust.
One policy can be about having an attestation for every build step, another one can be about two different builders being in agreement about the output of a specific build step.
That way you can construct a policy that expresses reproducibility, and reproducibility strengthens any other verification mechanism you have, because it makes it so that you can aggregate evidence from different sources and different build hosts.
c0balt
> You also couldn't feasibly do that for derivations that actually build packages, [..] you would have to update all of the affected hashes.
You can, actually; changes to stdenv are possible and "just" a lot of work. You will regularly see them between releases or on unstable, and they cause mass rebuilds. This doesn't just affect a compiler but also all stdenv tooling, as these changes tend to cause rebuilds across nixpkgs. This would be verifiable, but it obviously multiplies the amount of compute spent.
Hint: if you look at PRs for nixpkgs you will notice labels indicating the required amount of rebuilds, e.g. rebuild-darwin:1-10. See for example https://github.com/NixOS/nixpkgs/pull/377186 with the rebuild-darwin:5001+ label.
est31
Note that NixOS's "build" step often doesn't actually do any compilation. Often it's just downloading a binary from GitHub releases and running Nix-specific binary tools on it to make it look for libraries in the right places.
So if that process is reproducible, it's a different statement from a Debian package being reproducible, which requires build inputs in the preferred form of modification (source code).
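Concretely, that mostly means running patchelf over the prebuilt binary (autoPatchelfHook automates this; the store paths below are placeholders):

    patchelf --set-interpreter /nix/store/<hash>-glibc-<ver>/lib/ld-linux-x86-64.so.2 ./vendor-app
    patchelf --set-rpath /nix/store/<hash>-openssl-<ver>/lib:/nix/store/<hash>-zlib-<ver>/lib ./vendor-app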
IngoBlechschmid
You are right, but I would not agree with this appearing "often". I get the impression that the nixpkgs community tries quite hard to truly compile from source even quite complex projects like Firefox and LibreOffice.
max-privatevoid
Completely false. Building from the actual source code is strongly preferred and usually easier than patching a binary that wasn't built for such an environment.
beebaween
I've genuinely never really understood the appeal of Nix. I even attempted to use it to build and "maintain" the machines we used at an offsite factory, and even then, with just a basic Electron app, a Python installation, and very basic Mac configs, Nix proved to be a complete nightmare.
> Our most important finding is that the reproducibility rate in nixpkgs has increased steadily from 69% in 2017 to about 91% in April 2023. The high reproducibility rate in our most recent revision is quite impressive, given both the size of the package set and the absence of systematic monitoring in nixpkgs. We knew that it was possible to achieve very good reproducibility rate in smaller package sets like Debian, but this shows that achieving very high bitwise reproducibility is possible at scale, something that was believed impossible by practitioners4
I think people in this thread are focusing on the wrong thing. Sure, not all packages are reproducible, but the project is systematically increasing the percentage of projects that are reproducible while ALSO adding new projects and demonstrating conclusively that what was considered infeasible is actually readily achievable.
> The interesting aspect of these causes is that they show that even if nixpkgs already achieves great reproducibility rates, there still exists some low hanging fruits towards improving reproducibility that could be tackled by the Nix community and the whole FOSS ecosystem.
This work is helpful I think for the community to tackle the sources of unreproducible builds to push the percentage up even further. I think it also highlights the need for automation to validate that there aren't systematic regressions or regressions in particularly popular packages (doing individual regressions for all packages is a futile effort unless a lot of people volunteer to be part of a distributed check effort).