Fifty Years of Open Source Software Supply Chain Security

ndiddy

> The OpenSSH project is careful about not taking on unnecessary dependencies, but Debian was not as careful. That distribution patched sshd to link against libsystemd, which in turn linked against a variety of compression packages, including xz's liblzma. Debian's relaxing of sshd's dependency posture was a key enabler for the attack, as well as the reason its impact was limited to Debian-based systems such as Debian, Ubuntu, and Fedora, avoiding other distributions such as Arch, Gentoo, and NixOS.

Does Fedora use Debian's patch set for sshd, or a similar patch set that adds libsystemd?

Edit: It looks like Fedora wasn't affected because the backdoor triggered a valgrind test failure, so they shipped it with a flag that disabled the functionality that was backdoored. Seems like they lucked out. https://lists.fedoraproject.org/archives/list/devel@lists.fe...

aragilar

I'm not sure how Fedora is derived from Debian...

If I recall correctly, the backdoor was set up to only activate on rpm- and deb-based systems, so it wouldn't have been triggered on Arch, Gentoo, or NixOS, even if they linked systemd into ssh.

aadhavans

A very well-written piece. The section on funding open source is as relevant as it's ever been, and I don't think we've learnt much since last year.

As the proportion of younger engineers contributing to open-source decreases (a reasonable choice, given the state of the economy), I see only two future possibilities:

1. Big corporations take ownership of key open-source libraries in an effort to continue their development.

2. Said key open-source libraries die, and corporations develop proprietary replacements for their own use. The open source scene remains alive, but with a much smaller influence.

bluGill

Unfortunately I have no clue how to get a company to put money into the open source we use. Not just my current company, but any company. I've sometimes been able to get permission to contribute something I build on company time, but often what I really want is for someone on the project to spend a year or two maintaining it. Do the boring work of creating a release. Write that complex feature everyone (including me) wants.

In decades past, companies used to pay for my license for Visual Studio (I think via an MSDN subscription), ClearCase, and a dozen different issue/work trackers. However, as soon as an open source alternative is used, I don't know how to get the money that would have been spent to go to the project instead.

Come to think of it, I'm the maintainer of a couple of open source projects that I don't use anymore, and I don't normally bother even looking at them either. Either someone needs to pay me to continue maintaining them (remember, I don't find them useful myself, so I'm not doing it to scratch an itch), or someone needs to take them over from me - but given the xz attack, I'm no longer sure how to hand maintenance over.

johngossman

In my prior career I talked to many companies about open source usage. If you tell them they are running an unsupported database or operating system in production, they will often see the value of buying support. But it is much harder to get them to pay for non-production stuff, especially development tools. And even if you find an enlightened manager, getting budget to pay a maintainer for a project is very difficult to even explain.

“We’re paying for contract development? But it’s not one of our products and we’ll have no rights to the software? They’ll fix all the bugs we find, right? Right?” This is a hard conversation at most companies, even tech companies.

ghaff

Development tools were almost always a tough standalone business, even before open source became so prevalent.

akoboldfrying

"They’ll fix all the bugs we find, right?" -- that sounds to me like a reasonable requirement on the maintainer, if they are going to be paid a non negligible amount.

ndiddy

At companies where I've worked, all of the money we've put into open source has been in contracting the developer(s) to add a feature we needed to the upstream version. Of course, this means that we didn't really fund ongoing maintenance on anything we used that had all the features we needed.

RossBencina

As an independent maintainer I don't really know where to start trying to organise an ongoing income stream from users to support maintenance.

I thought a funding manifest to advertise funding requests was a good idea: https://floss.fund/funding-manifest/ No idea if it works.

bluGill

Most open source projects don't really need an income stream. It only becomes an issue when the project is large enough that there is a desire for someone to work on it half time or more. Smaller projects can still be done as a hobbyist thing. (The project I "maintain" only needs a few hours of my time per year, but since I no longer use it I can't be bothered - which is a problem for those who still use it.) Of course it is hard to say - curl seems like it should be a small project, but in fact it is large enough to support someone full time.

throw83848484

Sadly, as an OSS dev, I see a third way: development behind closed doors.

With AI and CV reference hunting, the number of contributions is higher than ever. Open-source projects are basically spammed with low-quality contributions.

A public page is just a liability. I am considering closing the public Bugzilla, git repo, and discussions. I would just take bug reports and patches from a very small circle of customers and power users. Everything except the release source tarball and a short changelog would be private!

Open source means you get the source code, not free customer and dev support!

pabs3

The FOSSjobs wiki has a bunch of resources on this topic:

https://github.com/fossjobs/fossjobs/wiki/resources

lrvick

My company, Distrust, exists to produce, support, and fund our open source security tools.

So far our core full time team of 3 gets to spend about half our time consulting/auditing and half our time contributing to our open projects that most of our clients use and depend on.

The key is for companies to have visibility into the current funding status of the software they depend on, and relationships with maintainers, so they can offer to fund features or fixes they need instead of being blocked.

https://distrust.co

ozim

I think big corporations will take ownership - well, not directly, but by paying foundations - and that is already the case.

The second thing is there are a bunch of things corporations need to use but don't want to develop on their own, like SSH.

There is already too much internal tooling rotting inside big corporations, and a lot of the time it would be much better if they gave it to a foundation - like the Apache Foundation, where projects go to die or limp along.

transpute

From Linux Security Summit 2019, a retrospective on mandatory access control and bounding "damage that can be caused by flawed or malicious applications" in Android, iOS, macOS, Linux, FreeBSD and Zephyr, https://static.sched.com/hosted_files/lssna19/e5/LSS2019-Ret...

  For the past 26 years, the speaker has been engaged in the design, implementation, technology transfer, and application of flexible Mandatory Access Control (MAC). In this talk, he describes the history and lessons learned from this body of work. The background and motivation for MAC is first presented, followed by a discussion of how a flexible MAC architecture was created and matured through a series of research systems. The work to bring this architecture to mainstream systems is then described, along with how the architecture and implementation evolved. The experience with applying this architecture to mobile platforms is examined. The role of MAC in a larger system architecture is reviewed in the context of a secure virtualization system. The state of MAC in mainstream systems is compared before and after our work. Work to bring MAC to emerging operating systems is discussed.
video: https://www.youtube.com/watch?v=AKWFbxbsU3o

edoceo

One of my struggles is to get Docker to lock down which images it loads. I'd like to only pull from my own blessed registry, and it seems Docker always wants to go back to theirs.

For other "package" managers (eg: CPAN, Debian) I can point to my own archive and be sure everything I manage down stream gets the blessed bits.

I basically have a huge archive/mirror of the supply chain for my Perl, PHP, JavaScript, etc.

If anyone has pro tips on how to "lock" docker to one registry that would be cool.

dgl

Don't use Docker, use podman (which has a registries.conf for this, with many settings). You can then use podman-docker for command-line Docker compatibility. Podman is more secure than Docker too: by default it runs as your user, rather than as root.
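
For example, to allow only an internal registry, /etc/containers/registries.conf can look roughly like this (a minimal sketch; registry.example.internal is a placeholder for your own registry):

    # /etc/containers/registries.conf
    # Resolve unqualified image names only against the blessed registry.
    unqualified-search-registries = ["registry.example.internal"]

    # Refuse to pull anything from Docker Hub.
    [[registry]]
    prefix = "docker.io"
    blocked = true

See containers-registries.conf(5) for the full set of options.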

edoceo

Thanks, podman has moved up on my "to eval" list.

amiga386

A lovely article, but one section definitely needs a [citation needed]

> (OpenSSL is written in C, so this mistake was incredibly easy to make and miss; in a memory-safe language with proper bounds checking, it would have been nearly impossible.)

    package main

    import "fmt"

    type CmdType int

    const (
        WriteMsg CmdType = iota
        ReadMsg
    )

    type Cmd struct {
        t CmdType
        d []byte
        l int
    }

    var buffer [256]byte // one shared buffer, reused for every command

    var cmds = []Cmd{
        Cmd{WriteMsg, []byte("Rain. And a little ice. It's a damn good thing he doesn't know how much I hate his guts."), 88},
        Cmd{WriteMsg, []byte("Rain. And a little ice."), 23},
        Cmd{ReadMsg, nil, 23},
        Cmd{ReadMsg, nil, 88}, // oops!
    }

    func main() {
        for c := range cmds {
            if cmds[c].t == WriteMsg {
                // Copy only as many bytes as the payload actually contains.
                copy(buffer[:], cmds[c].d[:cmds[c].l])
            } else if cmds[c].t == ReadMsg {
                // Print however many bytes the command *claims*, which can
                // include stale data left over from an earlier, longer write.
                fmt.Println(string(buffer[:cmds[c].l]))
            }
        }
    }
The Heartbleed problem was that user-controlled input could say how long it was, separately from how long it actually was. OpenSSL then copied the (short) thing into a buffer, but returned the (long) thing, thus revealing all sorts of other data it was keeping in the same buffer.

It wasn't caught because OpenSSL had built its own buffer/memory management routines on top of the actual ones provided by the language (malloc, memcpy, realloc, free), and all sorts of unsafe manipulations were happening inside one big buffer. That buffer could be in a language with perfect memory safety; the same flaw would still be there.

EVa5I7bHFq9mnYK

Related: https://news.ycombinator.com/item?id=43617352 - North Korean IT workers have infiltrated the Fortune 500

mlinksva

Good article for what it covers, but sadly does not cover isolation/sandboxing/least privilege.

Alive-in-2025

Yes. The crucial issue to me is the increasing frequency of attacks where some piece of open source gets a malicious update - leading to endless hidden supply chain attacks.

I don't see anything that is going to block this from getting worse and worse. It became a pretty common issue that I first heard about with npm or Node.js and their variants, maybe because people update software so much there and have lots of dependencies. I don't see a solution. A single program can have huge numbers of dependencies, even C++ or Java programs now.

It's not new; here's one from 6 years ago on C++: https://www.trendmicro.com/en_us/research/19/d/analyzing-c-c...

Don't forget log4j - https://www.infoworld.com/article/3850718/developers-apply-t..., which points to this recent paper: https://arxiv.org/pdf/2503.12192

bitwize

Indeed. In the 2020s, if you're not sandboxing each thing, and then sandboxing each library the thing depends on, you're running with way too many opportunities for vulnerability.

LtWorf

Well said! How?

zzo38computer

I have some ideas about operating system design (and related CPU design) to help with this and other issues (e.g. network transparency, resisting fingerprinting, better user programmability and interoperability, etc). In this design, the system is fully deterministic except for I/O, and all I/O uses capabilities, which may be proxied, etc. Libraries may run in separate processes if desired (but this is not always required). However, other differences compared with existing systems are also necessary for improved security (and other improvements); merely doing other things the way existing systems do has problems. For example, USB will not be used, and Unicode also will not be used. Atomic locking/transactions of multiple objects at once will be necessary too (this can avoid many kinds of race conditions present in existing systems, as well as other problems). File access is not done by names (files do not have names). And then, a specific implementation and distribution may have requirements and checking for the packages provided in the package manager and in the default installation (and the specification will include recommendations). These things alone still will not solve everything, but they are a start.

mlinksva

I don't really know, because I haven't put in the work to investigate, but some things in that direction seem to be (possibly in order of some combination of maturity and comprehensiveness):

  - CHERI compartmentalisation
  - LavaMoat (js)
  - Scala "capture checking"
  - Java "integrity by default"

bitwize

I have no freaking idea. Needless to say, I don't think our current operating systems are up to the task of actually being secure. You have to be able to somehow dynamically link in a library whilst only giving calls into that library certain permissions/capabilities... which I don't think even Windows can do.

lrvick

Great coverage; however, it fails to mention code review and artifact signing, as well as full source bootstrapping, which are fundamental defenses most distros skip.

In our distro, Stagex, our threat model assumes at least one maintainer, sysadmin, or computer is compromised at all times.

This has resulted in some specific design choices and practices:

- 100% deterministic, hermetic, reproducible

- full source bootstrapped from 180 bytes of human-auditable machine code

- all commits signed by authors

- all reviews signed by reviewers

- all released artifacts are multi-party reproduced and signed

- fully OCI (container) native all the way down "FROM scratch"

- All packages easily hash-locked, giving downstream software easy determinism as well

This all goes well beyond the tactics used in Nix and Guix.

As far as we know, Stagex is the only distro designed to strictly distrust maintainers.

https://stagex.tools

AstralStorm

Good step.

It doesn't distrust the developers of the software though, so it does not fix the biggest hole. Multi-party reproduction does not fix it either; that only distrusts the build system.

The bigger the project, the higher the chance something slips through, even if just an exploitable bug. Maybe it's the developer themselves being compromised, or their maintainer.

Reviews are done on what, you have someone reviewing clang code? Binutils?

pabs3

The code review problem is solvable by something like CREV, where the developer community at large publishes the reviews they have done, and eventually there is good coverage of most things.

https://github.com/crev-dev/

lrvick

As the other (dead, but correct) commenter pointed out, job one is proving that the released binary artifacts even match the source code, as that is the spot most opaque to the public, where vulns can most easily be injected (and have been in the past over and over and over).

Only with this problem solved can we prove that the code we ideally want humans to spend a lot more time reviewing (we're working on it) is actually the code that ships in compiled artifacts.

charcircuit

>can most easily be injected (and have been in the past over and over and over).

In practice this is much rarer than a user downloading and running malware or visiting a site that exploits their browser. Compare the number of 0-days Chrome has had over the years versus the number of times bad actors have hacked Google and replaced download links with links to malware.

TacticalCoder

> Reviews are done on what, you have someone reviewing clang code? Binutils?

There aren't random developers pushing commits to these codebases: these are used by virtually every Linux distro out there (OK, maybe not the Kubernetes one that ships only 12 binaries, forgot its name).

It seems obvious to me that GP is talking about protection against rogue distro maintainers, not fundamental packages being backdoored.

You're basically saying: "GP's work is pointless because Linus could insert a backdoor in the Linux kernel".

In addition to that, determinism and 100% reproducibility bring another gigantic benefit: should a backdoor ever be found in clang or one of the binutils tools, it's going to be 100% reproducible. And that is a big thing: being able to reproduce a backdoor is a godsend for security.

lrvick

> OK, maybe not the Kubernetes one that ships only 12 binaries, forgot its name

You are likely thinking of Talos Linux, which incidentally also builds itself with stagex.

floxy

>full source bootstrapped from 180 bytes of human-auditable machine code

What does this mean? You have a C-like compiler in 180 bytes of assembler that can compile a C compiler that can then compile GCC?

mananaysiempre

That’s normally what this means, yes, with a few more intermediate steps. There’s only one bootstrap chain like this that I know of [1,2,3], maintained by Jeremiah Orians and the Guix project; judging from the reference to 180 bytes, that’s what the distro GP describes uses as well.

> This is a set of manually created hex programs in a Cthulhu Path to madness fashion. Which only have the goal of creating a bootstrapping path to a C compiler capable of compiling GCC, with only the explicit requirement of a single 1 KByte binary or less.

[1] https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...

[2] https://savannah.nongnu.org/projects/stage0/

[3] https://github.com/oriansj/bootstrap-seeds

floxy

That's pretty awesome

skulk

As per their landing page, yes.

> stage0: < 190 byte x86 assembly seed is reproduced on multiple distros

> stage1: seed builds up to a tiny c compiler, and ultimately x86 gcc

> stage2: x86 gcc bootstraps target architecture cross toolchains

very impressive, I want to try this out now.

pabs3

The LWN article is a good place to start:

https://lwn.net/Articles/985739/

no-dr-onboard

100% reproducible? That's amazing. I'll be honest, I don't really believe you (which I suppose is the point, right?).

Do you all document how you got around system level sources of non-determinism? Filesystems, metadata, timestamps, tempfiles, etc? This would be a great thing to document for people aiming for the same thing.

What are you all using to verify commits? Are you guys verifying signatures against a public PKI?

Super interested as I manage the reproducibility program for a large software company.

lrvick

Indeed you do not have to believe me.

> git clone https://codeberg.org/stagex/stagex

> cd stagex

> make

Several hours later your "out" directory will contain locally built OCI images for every package in the tree, and the index.json for each should contain the exact same digests we commit in the "digests" folder, and the same ones multiple maintainers sign in the OCI standard "signatures" folder.

We build with only a light make wrapper around Docker today, though it assumes you have Docker configured to use the containerd image store backend, which allows getting deterministic local digests without uploading to a registry.
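
If you haven't enabled that backend yet, it's a one-line daemon setting (a minimal sketch of /etc/docker/daemon.json; merge into any existing config and restart the daemon):

    {
      "features": {
        "containerd-snapshotter": true
      }
    }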

There's no reason you couldn't build with podman or kaniko etc. with some tweaks (which we hope to support officially).

> Do you all document how you got around system level sources of non-determinism? Filesystems, metadata, timestamps, tempfiles, etc? This would be a great thing to document for people aiming for the same thing.

We try to keep our package definitions to "FROM scratch", in "Linux From Scratch" style, with no magic, so they are self-documenting and easy to audit or reference. By all means crib any of our tactics. We use no global env, so each package has only the determinism tweaks it needs (if any). We heavily referenced Alpine, Arch, Mirage, Guix, Nix, and Debian to arrive at our current patterns.

> What are you all using to verify commits? Are you guys verifying signatures against a public PKI?

We all sign commits, reviews, and releases with well-published PGP keys kept on smartcards, with the expected public keys in the MAINTAINERS file. Most of us have Keyoxide profiles as well, making it easy to prove all our online presences agree with our expected fingerprints.
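
Once the expected keys are imported locally, spot-checking is standard git (a sketch; importing and trusting the MAINTAINERS keys is up to you, and the tag name is hypothetical):

    # show and verify the signature on the latest commit
    git log --show-signature -1
    git verify-commit HEAD

    # verify a signed release tag
    git verify-tag v1.0.0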

> Super interested as I manage the reproducibility program for a large software company.

By all means drop into our Matrix room, #stagex:matrix.org. Not many people are working on these problems. The more we can all collaborate to unblock each other, the better!

neuroelectron

Very suspicious article. Sounds like the "nothing to see here folks, move along" school of security.

Reproducibility is more like a security smell: a symptom that you're doing things right. Determinism is the correct target, and subtly different.

The focus on supply chain is a distraction. A variant of the "trusting trust" attack Ken Thompson described in 1984 is still among the most elegant and devastating: infected development toolchains can spread horizontally to “secure” builds.

Just because it’s open doesn’t mean anyone’s been watching closely. "50 years of security"? Important pillars of OSS have been touched by thousands of contributors with varying levels of oversight. Many commits predate strong code-signing or provenance tracking. If a compiler was compromised at any point, everything it compiled - including future versions of itself - could carry that compromise forward invisibly. This includes even "cleanroom" rebuilds.

TZubiri

I agree that it's handwavy. My take on supply chain vulns is that the only way to fight them is to reduce dependencies, massively.

Additionally, the few dependencies you do have should be well compensated, to avoid 'alternative monetization'.

You can't have the cake (massive amounts of gratis software) and eat it too (security and quality warranties).

The 100 layers of signing and layer-4 package managers are a huge coping mechanism by those not ready to accept the tradeoff.

pabs3

The amount of software depended on is always going to be massive; it's not like every developer is going to write a BIOS, kernel, drivers, networking stack, compilers, interpreters, and so on for every project. So there will always be a massive iceberg of other people's code underneath what each developer writes.

TZubiri

Sure, but all of those you mentioned are part of a base OS.

I'm not sure what the fallacy is called, but you say we have an excess of X and then the fallacy is "we can't live without X".

Modern projects, especially in the JavaScript realm, have like 10K dependencies. Having one dependency on an operating system (even though it may itself have its own dependencies) is a huuuuuuuuuge difference.

You can pay cash money to Microsoft or Red Hat and have either a company that owns all of the deps, or a company that vets all of the dependencies, distributes some cash through donations, and provides a sensible base package.

It may sound extreme, but you don't need much more than a base OS. If you reaaallly want something else, you can check the OS's official package repository. Downloading some third-party code is what's extreme to me.

lrvick

The best defense we have against the Trusting Trust attack is full source bootstrapping, now done by two distros: Guix and Stagex.

AstralStorm

No, you do not. If you have not actually validated each and every source package, your trust only extends to the generated binaries corresponding to the sources you had. The trusting trust attack was deployed against the source code of the compiler, poisoning specific binaries. Do you know whether GCC 6.99 or 7.0 doesn't insert a backdoor under some specific condition?

There's no static or dynamic analysis deployed to enhance this level of trust.

The initial attempts are simulated execution like in Valgrind, all the sanitizer work, and perhaps diffing at the functional level, beyond the text of the source code, where it's too easy to smuggle things through... (Like on an abstracted conditional graph.)

We cannot even compare binaries or executables properly given differing compiler revisions.

lrvick

So for example, Google uses a Goobuntu/Bazel-based toolchain to build their Go compiler binaries.

The full-source-bootstrapped Go compiler binaries in Stagex exactly match the hashes of the ones Google releases, giving us as much confidence as we can get in the source->binary chain, which until very recently had no solution at all.
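
Concretely, that check is just comparing digests (the local path below is a hypothetical placeholder; Google publishes the official hashes at https://go.dev/dl/):

    # hash the locally bootstrapped toolchain archive...
    sha256sum out/go/go.tar.gz
    # ...then compare against the sha256 listed for the same
    # version/platform on https://go.dev/dl/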

Go has unique compiler design choices that make it very self-contained, which makes this possible, though we can also deterministically build Rust, or any other language, from any OCI-compatible toolchain.

You are talking about one layer down from that, the source code itself, which is our next goal as well.

Our plan is this:

1. Be able to prove all released artifacts came from hash locked source code (done)

2. Develop a universal normalized identifier for all source code regardless of origin (a treehash of all source regardless of git, tar file, etc., ignoring/removing generated files, docs, examples, or anything not needed to build) (in progress)

3. Build a distributed code review system to coordinate the work of getting multiple signed reviews from reputable security researchers for every source package, keyed by its universal identifier (planning stages)

We are the first distro to reach step 1, and we have a reasonably clear path to steps 2 and 3.

We feel step 2 would be a big leap forward on its own, as it would have fully eliminated the xz attack, where the attack hid in the tar archive but not in the actual git tree.
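
As a minimal sketch of the idea behind step 2 (the real Stagex identifier scheme is still being designed; the skip rules here are illustrative assumptions), a normalized treehash in Go might look like:

    // treehash.go: hash a source tree into one normalized identifier,
    // independent of whether it came from git, a tarball, etc.
    package main

    import (
        "crypto/sha256"
        "fmt"
        "io"
        "io/fs"
        "os"
        "path/filepath"
        "sort"
    )

    func treeHash(root string) (string, error) {
        var paths []string
        err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
            if err != nil {
                return err
            }
            if d.IsDir() {
                // Skip VCS metadata so a git checkout and an unpacked
                // tarball of the same source hash identically.
                if d.Name() == ".git" {
                    return filepath.SkipDir
                }
                return nil
            }
            paths = append(paths, p)
            return nil
        })
        if err != nil {
            return "", err
        }
        // Fixed ordering, independent of filesystem iteration order.
        sort.Strings(paths)
        h := sha256.New()
        for _, p := range paths {
            rel, err := filepath.Rel(root, p)
            if err != nil {
                return "", err
            }
            // Hash the normalized relative path, then the file content.
            // Timestamps, owners, and permissions are deliberately ignored.
            fmt.Fprintf(h, "%s\x00", filepath.ToSlash(rel))
            f, err := os.Open(p)
            if err != nil {
                return "", err
            }
            if _, err := io.Copy(h, f); err != nil {
                f.Close()
                return "", err
            }
            f.Close()
        }
        return fmt.Sprintf("%x", h.Sum(nil)), nil
    }

    func main() {
        sum, err := treeHash(os.Args[1])
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Println(sum)
    }

A real scheme would also need rules for stripping generated files, docs, and examples, as described above.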

Pointing out these classes of problem is easy. I know; I did it for years. Actually dramatically removing attack surface is a lot more rewarding.

Help welcome!

rcxdude

That's a different problem. The threat in Trusting Trust is that the backdoor may not ever appear in public source code.

neuroelectron

Besides full source bootstrapping, which could adopt progressive verification of hardware features and an assumption of untrusted hardware, integration of formal verification into the lowest levels of bootstrapping is a must. Bootstrap security with the compiler.

This won't protect against more complex attacks like ROP or unverified state. For that we need to implement simple artifacts that are verifiable and mapped. Return to simpler return states (pass/error). Do error handling external to the compiled binaries. Automate state mapping and combine it with targeted fuzzing. systemd is a perfect example of this kind of thing - of what not to do: internal logs and error states handled by a web of interdependent systems.

pabs3

Code review systems like CREV are the solution to backdoors being present in public source code.

https://github.com/crev-dev/

egberts1

Gentoo is a full source bootstrap if you include the build of GRUB2 and create the initramfs file as well as the kernel.

lrvick

Full source bootstrapping means you build from 100% human-auditable source code or machine code. The only path to do this today that I am aware of is via hex0, building up to Mes and tinycc, on up to a modern C compiler: https://github.com/fosslinux/live-bootstrap/blob/master/part...

As far as I know, Gentoo, even from their "stage0", still assumes you bring your own bootstrap compiler toolchain, and thus is not self-bootstrapping.

arkh

> Infected development toolchains can spread horizontally to “secure” builds.

Nowadays there are so many microcontrollers in your PC that a hardware vendor could simply infect your SSD, HDD, motherboard, or part of the processor. Good luck bootstrapping from hand-rolled NAND.

successful23

[flagged]

LeoPanthera

Please don't use an LLM to write comments.