
Rust’s dependencies are starting to worry me

kion

IMO any system where taking a dependency is "easy" and there is no penalty for size or cost is going to eventually lead to a dependency problem. That's essentially where we are today both in language repositories for OSS languages and private monorepos.

This is partly due to how we've distributed software over the last 40 years. In the 80s the idea of a library of functionality was something you paid for, and painstakingly included parts of into your size constrained environment (fit it on a floppy). You probably picked apart that library and pulled the bits you needed, integrating them into your builds to be as small as possible.

Today we pile libraries on top of libraries on top of libraries. It's super easy to say `import foolib`, then call `foolib.do_thing()` and just start running. Who knows or cares what all `foolib` contains?

At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.

In some cases the languages make this worse. Go and Rust, for example, encourage everything for a single package/mod to go in the same file. Adding optional functionality can get ugly when it would require creating new modules, but if you only want to use a tiny part of the module, what do you do?

The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library. You end up with the minimal set of code for the functionality you need.
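A minimal sketch of that idea, assuming a hand-written symbol graph for a hypothetical `foolib` (the symbol names `format_number`, `pad_left`, `parse_date` and `tz_table` are invented for illustration). Given each symbol's declared needs, the set to keep is just the transitive closure from the one symbol you depend on:

```rust
use std::collections::{BTreeSet, HashMap};

/// Returns every symbol reachable from `root`, given a map from each symbol
/// to the symbols it directly needs. Symbols missing from the map are
/// treated as leaves.
fn reachable<'a>(deps: &HashMap<&'a str, Vec<&'a str>>, root: &'a str) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(sym) = stack.pop() {
        // `insert` returns false if the symbol was already visited.
        if seen.insert(sym) {
            if let Some(children) = deps.get(sym) {
                stack.extend(children.iter().copied());
            }
        }
    }
    seen
}

/// Hypothetical `foolib` with four symbols; `format_number` needs only one
/// of the other three.
fn example_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("format_number", vec!["pad_left"]),
        ("pad_left", vec![]),
        ("parse_date", vec!["tz_table"]),
        ("tz_table", vec![]),
    ])
}
```

Calling `reachable(&example_graph(), "format_number")` keeps `format_number` and `pad_left` and drops the date-handling symbols. This is the same reachability computation a linker's dead-code pass performs, just applied before the build instead of after it.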

It's a terrible idea and I'd hate it, but how else do you address the current setup of effectively building the whole universe of code branching from your dependencies and then dragging it around like a boat anchor of dead code?

nicoburns

As far as I'm aware, LTO completely solves this from a binary size perspective. It will optimise out anything unused. You can still get hit from a build time perspective though.

samus

It's certainly better than in Java where LTO is simply not possible due to reflection. The more interesting question is which code effectively gets compiled so you know what has to be audited. That is, without disassembling the binary. Maybe debug information can help?

0x696C6961

In Go, the symbol table contains enough information to figure this out. This is how https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck is able to limit vulnerabilities to those that are actually reachable in your code.

pjmlp

Not only is it possible, it has been available for decades on commercial AOT compilers like Aonix, Excelsior JET, PTC, Aicas.

It is also done on the cousin Android, and available as free beer on GraalVM and OpenJ9.

metaltyphoon

Doesn’t Java offer some sort of trimming like C#? I know it won’t remove everything but at least it can trim down a lot of things.

kion

LTO only gets you so far, but IMO it's more like kicking the can down the road.

The analogy I use is cooking a huge dinner, then throwing out everything but the one side dish you wanted. If you want just the side-dish you should be able to cook just the side-dish.

floating-io

I see it more as having a sizable array of ingredients in the pantry, and using only what you need or want for a given meal.

01HNNWZ0MV43FF

Then another group of armchair programmers will bitch you out for using small dependencies

I just don't listen. Things should be easy. Rust is easy. Don't overthink it

dietr1ch

I don't think libraries are the problem, but we don't have a lot of visibility after we add a new dependency. You either take the time to look into it, or just add it and then forget about the problem (which is kind of the point of having small libraries).

It should be easy to build and deploy profiling-aware builds (PGO/BOLT) and to get good feedback around time/instructions spent per package, as well as a measure of the ratio of each library that's cold or thrown away at build time.

xlii

> Go and Rust, for example, encourage everything for a single package/mod to go in the same file.

Clarification: Go allows for very simple multi-file packages. It’s one feature I really like, because it allows splitting an otherwise coherent module into logical parts.

dcow

Further: I’ve never seen rust encourage anything of the sort. Module directory with a mod.rs and any number of files works just fine.

kion

I probably mischaracterized this as it's been a while since I did more than trivial Rust. AFAIK it's not possible to depend on only a part of a module in Rust though, right? (At least without an external build system)

For example, you can't split up a module into foo.rs containing `Foo` and bar.rs containing `Bar`, both in module 'mymod', in such a way that you can `use mymod::Bar` and foo.rs is never built/linked.

My point is the granularity of the package/mod encourages coarse-grained deps, which I argue is a problem.

kibwen

> In some cases the languages make this worse. Go and Rust, for example, encourage everything for a single package/mod to go in the same file.

What? I don't know about Go, but this certainly isn't true in Rust. Rust has great support for fine-grained imports via Cargo's ability to split up an API via crate features.
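As a rough sketch of what feature-gating looks like, assume a hypothetical crate `mymod` whose `Cargo.toml` declares two features, `foo` and `bar` (the crate, feature and type names are invented for illustration). Items behind a disabled feature are not compiled at all:

```rust
// Assumed Cargo.toml for the hypothetical crate `mymod`:
//
//   [features]
//   default = []
//   foo = []   # enables `Foo`
//   bar = []   # enables `Bar`

/// Only compiled when a dependent enables the `foo` feature.
#[cfg(feature = "foo")]
pub struct Foo;

/// Only compiled when a dependent enables the `bar` feature.
#[cfg(feature = "bar")]
pub struct Bar;

/// Always compiled; reports which optional parts were built in.
pub fn enabled_parts() -> Vec<&'static str> {
    let mut parts = Vec::new();
    if cfg!(feature = "foo") {
        parts.push("foo");
    }
    if cfg!(feature = "bar") {
        parts.push("bar");
    }
    parts
}
```

A dependent that only wants `Bar` would then ask for `features = ["bar"]` in its own manifest. The same attribute can be placed on a `mod foo;` declaration so that a whole file such as foo.rs is skipped. The granularity is whatever the crate author chose to expose as features, not individual symbols.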

throwaway462663

> It's a terrible idea...

It's a terrible idea because you're trying to reinvent section splitting + `--gc-sections` at link time, which rust (which the article is about) already does by default.

kion

The article is about Rust, but I was commenting on dependencies in general.

Things like --gc-sections feel like a band-aid, a very practical and useful band-aid, but a band-aid nonetheless. You're building a bunch of things you don't need, then selectively throwing away parts (or selectively keeping parts).

IMO it all boils down to the granularity. The granularity of text source files, the granularity of units of distribution for libraries. It all contributes to a problem of large unwieldy dependency growth.

I don't have any great solutions here, it's just observations of the general problem from the horrifying things that happen when dependencies grow uncontrolled.

zaptheimpaler

This is just a modern problem in all software development, regardless of language. We are doing more complex things, we have a much bigger library of existing code to draw from and there are many reasons to use it. Ultimately a dependency is untrusted code, and there's a long road to go in hardening entire systems to make running arbitrary dependencies safe (if it's even possible).

In the absence of a technical solution, all others basically involve someone else having to audit and constantly maintain all that code and social/legal systems of trust. If it was pulled into Rust stdlib, that team would be stuck handling it, and making changes to any of that code becomes more difficult.

harha_

Regardless of language, really? I highly doubt that, you don't generally see such problems with C or even C++ because dependencies are more cumbersome to add, especially in a way that's cross-platform.

zaptheimpaler

Because most dependencies are either manually installed by the user, or are dynamic libraries that are provided and audited by the distro maintainers. The dependencies are there, they're just harder to see - https://wiki.alopex.li/LetsBeRealAboutDependencies

jerf

A true enough statement, but "Rust" is unnecessarily specific. Dependencies are getting scary in general. Supply chain attacks are no longer hypothetical, they're here and have been for a while.

If I were designing a new language I think I'd be very interested in putting some sort of capability system in so I can confine entire library trees safely, and libraries can volunteer somehow what capabilities they need/offer. I think it would need to be a new language if for no other reason than ecosystems will need to be written with the concept in them from the beginning.

For instance, consider an "image loading library". In most modern languages such libraries almost invariably support loading images from a file, directly, for convenience if nothing else. In a language that supported this concept of capabilities it would be necessary to support loading them from a stream, so either the image library would need you to supply it a stream unconditionally, or if the capability support is more rich, you could say "I don't want you to be able to load files" in your manifest or something and the compiler would block the "LoadFromFile(filename)" function at compile time. Multiply that out over an entire ecosystem and I think this would be hard to retrofit. It's hugely backwards incompatible if it is done correctly, it would be a de facto fork of the entire ecosystem.
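To make the stream-only shape concrete, here is a small sketch in today's Rust, assuming a made-up 8-byte image header format and a hypothetical `load_header` function. Note that nothing in current Rust stops a library from calling `std::fs` anyway; the signature only expresses the intent that a capability-aware compiler would enforce:

```rust
use std::io::{self, Read};

/// Hypothetical image header: just the dimensions.
#[derive(Debug, PartialEq)]
pub struct ImageHeader {
    pub width: u32,
    pub height: u32,
}

/// Parses a made-up 8-byte header (big-endian width, then height) from any
/// byte stream. The signature asks only for bytes; the caller decides
/// whether they come from a file, a socket, or memory.
pub fn load_header<R: Read>(mut source: R) -> io::Result<ImageHeader> {
    let mut buf = [0u8; 8];
    source.read_exact(&mut buf)?;
    Ok(ImageHeader {
        width: u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]),
        height: u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]),
    })
}
```

The application, which does hold file access, opens the file itself and passes the handle in, for example `load_header(std::fs::File::open(path)?)`. The library never needs a `LoadFromFile(filename)` equivalent.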

I honestly don't see any other solution to this in the long term, except to create a world where the vast majority of libraries become untargetable in supply chain attacks because they can't open sockets or read files and are thus useless to attackers, and we can reduce our attack surface to just the libraries that truly need the deep access. And I think if a language came out with this design, you'd be surprised at how few things need the dangerous permissions.

Even a culture of minimizing dependencies is just delaying the inevitable. We've been seeing Go packages getting supply-chain-attacked and it getting into people's real code bases, and that community is about as hostile to large dependency trees as any can be and still function. It's not good enough.

assassinator42

Java and the .NET Framework had partial trust/capabilities mechanisms decades ago. No one really used them and they were deprecated/removed.

voxgen

I don't think retrofitting existing languages/ecosystems is necessarily a lost cause. Static enforcement requires rewrites, but runtime enforcement gets you most of the benefit at a much lower cost.

As long as all library code is compiled/run from source, a compiler/runtime can replace system calls with wrappers that check caller-specific permissions, and it can refuse to compile or insert runtime panics if the language's escape hatches would be used. It can be as safe as the language is safe, so long as you're ok with panics when the rules are broken.
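A toy sketch of such a wrapper, with loud assumptions: the `Policy` type, the `Capability` enum and `guarded_read` are all hypothetical, the caller's identity is passed in by hand (a real runtime would have to derive it itself), and the sketch returns an error where the comment above describes a panic:

```rust
use std::collections::{HashMap, HashSet};
use std::fs;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Capability {
    ReadFile,
    OpenSocket,
}

/// Hypothetical per-library grants, e.g. loaded from a project manifest.
pub struct Policy {
    grants: HashMap<String, HashSet<Capability>>,
}

impl Policy {
    pub fn new() -> Self {
        Policy { grants: HashMap::new() }
    }

    /// Records that `library` may use `cap`.
    pub fn grant(&mut self, library: &str, cap: Capability) {
        self.grants.entry(library.to_string()).or_default().insert(cap);
    }

    /// True only if `library` was explicitly granted `cap`.
    pub fn allows(&self, library: &str, cap: Capability) -> bool {
        self.grants.get(library).map_or(false, |caps| caps.contains(&cap))
    }
}

/// Stand-in for the wrapper a runtime would substitute for a raw file read.
/// The permission check happens before any I/O is attempted.
pub fn guarded_read(policy: &Policy, library: &str, path: &str) -> Result<Vec<u8>, String> {
    if !policy.allows(library, Capability::ReadFile) {
        return Err(format!("{library} lacks the ReadFile capability"));
    }
    fs::read(path).map_err(|e| e.to_string())
}
```

With only `config_loader` granted `ReadFile`, a call such as `guarded_read(&policy, "image_lib", path)` is refused before the filesystem is touched. The hard part the sketch skips is making the `library` argument unforgeable.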

It'd take some work to document and distribute capability profiles for libraries that don't care to support it, but a similar effort was proven possible with TypeScript.

0cf8612b2e1e

Is there anything in existence which has a version of this idea? It makes a ton of sense to me, but you are right that it would be practically impossible to do in a current language.

kibwen

Yes, but you can't enforce this at the language level if your objective is security (at least not for natively-compiled languages). You need OS-level support for capabilities, which some OSes do provide (seL4, Fuchsia). But if you're in a VM rather than native code then you can enforce capabilities, which is what Wasm does with WASI.

metaltyphoon

.NET Framework (Windows only), as opposed to modern .NET, formerly .NET Core.

wofo

I've thought about this (albeit not for that long) and it seems like you'd need a non-trivial revamp of how we communicate with the operating system. For instance, allowing a library to "read from a stream" sounds safe until you realize they might be using the same syscalls as reading from a file!

neilv

In the past (not in Rust, but other languages), for important systems, I've instituted policies of minimizing dependencies from these language-specific package repositories, and for the ones you do use, having to copy them to our own repos and audit each update before use.

But that's not practical for all situations. For example, Web frontend developer culture might be the worst environment, to the point you often can't get many things done in feasible time, if you don't adopt the same reckless practices.

I'm also seeing it now with the cargo-culting of opaque self-hosted AI tools and models. For learning and experimenting, I'd spend more time sufficiently compartmentalizing an individual tool than with using it.

This weekend, I'm dusting off my Rust skills, for a small open source employability project (so I can't invest in expensive dependency management on this one). The main thing bothering me isn't allocation management, but the sinking feeling when I watch the cast-of-thousands explosion of transitive dependencies for the UI and async libraries that I want to use. It's only a matter of time before one of those is compromised, if not already, and one is all it takes.

pjmlp

Best way is to have CI/CD systems only connected to the official internal repos.

Devs can add whatever they feel like on their workstations but it will be a sad build server if they get pushed without permission.

MeetingsBrowser

> Devs can add whatever they feel like on their workstations

A compromised dev machine is also a problem.

dsr_

s/Best way/The only safe way/

Anything else will get abused in the name of expediency and just-this-one-time.

Also, the process for adding a crate/gem/module/library needs to be the same as anything else: license review, code review, subscription to the appropriate mailing list or other announce channel, and assignment of responsibility. All of these except code review can be really, really fast once you have the process going.

All problems are, at least in part, dependency chain management problems.

sunrunner

I agree that some amount of friction when including third party dependencies is a vital thing to push people to consider the value versus cost of dependencies (and license review, code review, channel subscriptions are all incredibly important and almost always overlooked). However, how should this work for transitive dependencies? And the dependencies of _those_ dependencies?

The dependency trees for most interpreted or source-distributed languages are ridiculous, and review of even a few of those seems practically impossible in a lot of development environments.

eddd-ddde

The cool thing about rust is you can implement async yourself. You aren't tied to any specific implementation.
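For a sense of scale, a single-future executor fits in a couple dozen lines of standard-library code. This is only a sketch: `block_on` and `ThreadWaker` are illustrative names, and there is no I/O reactor, timer, or task spawning, which is where runtimes like tokio spend most of their code:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Polls `fut` on the current thread until it completes, parking the
/// thread whenever the future reports that it is not ready yet.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

Usage is `block_on(async { 1 + 2 })`. Anything that actually waits on sockets or timers still needs something to call `wake`, so the dependency question comes back as soon as the program does real I/O.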

dboreham

Or not use async at all.

wofo

There are some voices trying to address this security risk (e.g. the proponents of this new RFC: https://github.com/rust-lang/rfcs/pull/3810). However, for some reason (probably culture) there isn't much momentum yet to change the status quo.

cogman10

The rust RFC process has, frankly, become somewhat of a CF.

There's literally 1000s of RFCs for rust with only a small handful that are integrated. Having this forest, IMO, makes it hard for any given proposal to really stand out. Further, it makes duplicate effort almost inevitable.

Rust's RFC process is effectively a dead letter box for most.

geodel

I think they could set up a committee for the RFC review process (in case there is none today) and, based on its recommendations, create multiple domain-specific teams/groups to review RFCs in a timely manner.

palata

Similar feeling here.

Cargo makes it so simple to add tons of dependencies that it is really hard not to do it. But that does not stop here: even if I try to be careful with adding dependencies, a couple dependencies are likely to pull tens of transitive dependencies each.

"Then don't depend on them", you say. Sure, but that means I won't write my project, because I won't write those things from scratch. I could probably audit the dependency (if it wasn't pulling 50 packages itself), but I can't reasonably write it myself.

It is different with C++: I can often find dependencies that don't pull tens of transitive dependencies in C++. Maybe because it's harder to add dependencies, maybe because the ecosystem is more mature, I don't know.

But it feels like the philosophy in Rust is to pull many small packages, so it doesn't seem like it will change. And that's a pity, because I like Rust-the-language better than C++-the-language. It just feels like I trade "it's not memory-safe" for "you have to pull tons of random code from the Internet".

X0Refraction

This was linked from the top comment on the Rust subreddit: https://wiki.alopex.li/LetsBeRealAboutDependencies

I think it makes a good point that some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded. To some degree that is a plus though as you likely trust the maintainers of your OS distribution to provide stable, supported libraries.

As other commenters have said, perhaps this is an area where the Rust maintainers could provide some kind of extended standard library where they don't guarantee backwards compatibility forever, but do provide guarantees about ongoing fixes for security issues.

jampekka

I'll take slightly less stable dependencies over the total mess of C++ dependencies with CMake, shared libraries, version conflicts, etc. any time. There's probably also a bit of an illusion about C++ transitive dependencies due to them usually being precompiled (because compiling them is such a pain).

ChocolateGod

The whole pkgconfig, cmake, autotools etc ecosystem is insane compared to how Rust and Go do things.

It's part of the reason why software distribution on Linux has been pushed to using containers, removing the point of having shared libraries. I think Google with its C++ replacement (Carbon) plans on doing its own system.

skydhash

From my point of view, the issue stems from developers wanting to control distribution. Fine if it's for your own usage, not really if you're planning for others to use it. You will find the most convoluted build system just because they have a pet platform they want to specially support making it hell to do anything on others.

It could be better, but the current solutions (npm, go, python,...) favor only the developers, not the maintainers and packagers.

antonvs

> Sure, but that means I won't write my project, because I won't write those things from scratch.

You need to think a bit harder about that, to help you decide whether your position is rational.

MeetingsBrowser

This confuses me as well. Is the implied solution to choose a language where you are forced to write those things from scratch?

perrygeo

> the philosophy in Rust is to pull many small packages

I'm not sure it's a philosophy, more a pragmatic consideration for compilation speeds. Anyone who's done a non-trivial amount of Rust knows that moment when the project gets too big and needs to split into separate crates. It's kinda sad that you can't organize code according to proper abstractions, many times I feel forced to refactor for compiler performance.

imtringued

I wasted 6 hours yesterday on getting the bullet examples to compile outside of bullet itself with no success. It's more likely that a lot of software simply doesn't get written because C++ and CMake are a pain in the ass.

rs186

I once wanted to contribute to the popular swc project (https://github.com/swc-project/swc). I cloned the repo, ran build, and a whopping 20GB was gone from my disk. The parser itself (https://github.com/swc-project/swc/blob/main/crates/swc_ecma...) has over a dozen dependencies, including serde.

Meanwhile, the heaviest JavaScript parser implemented in JavaScript is more lightweight.

I decided that I should leave this project alone and spend my time elsewhere.

MeetingsBrowser

I agree that relying on unknown dependencies is a risk, but this misses the point IMO. Number of dependencies and disk space are kind of arbitrary.

> Meanwhile, the heaviest JavaScript parser implemented in JavaScript is more lightweight.

The lightest weight javascript program relies on V8 to run, which has multiple orders of magnitude more dependencies. Most of which you have never heard of.

At least cargo makes it easier to get a clearer picture of what the dependencies are for a program.

pixl97

Number of dependencies isn't exactly arbitrary...

If you have one huge dep it's easier to keep track of whether you're on the latest update; it's also much less likely you'll fat-finger it and import something typosquatting.

Also if you're in enterprise you'll have fewer 100-page SBOM reports.

munificent

> relies on V8 to run, which has multiple orders of magnitude more dependencies.

Actually, this isn't true. (Or at least wasn't a while back.) I used to work with a bunch of ex-V8 folks and they really despised third-party dependencies and didn't trust any code they didn't write. They used a few third-party libs but for the most part, they tried to own everything themselves.

schmichael

We need a term like “Mature” or similar for dependencies that are done. Mature dependencies have two characteristics:

1. Well defined scope

2. Infrequent changes

Nomad has many of these (msgpack, envparse, cli, etc). These dependencies go years without changing so the dependency management burden rapidly approaches zero. This is an especially useful property for “leaf” dependencies with no dependencies of their own.

I wish libraries could advertise their intent to be Mature. I’d choose a Mature protobuf library over one that constantly tweaked its ergonomics and performance. Continual iterative improvement is often a boon, but sometimes it’s not worth the cost.

procaryote

Java did this sometimes by essentially adding slightly tidied up versions of whatever was the de facto standard to the standard library. Java 1.3 didn't have regexes but most people were using the same Apache Commons thing, so Java 1.4 added regexes that looked exactly like that. Java's date handling was a pain so people mostly used Joda-Time; a later Java version added something that mostly works like Joda-Time. Etc.

It is an easy way to get a somewhat OK standard library as the things you add became popular on their own merits at some point.

Once added, the lowest friction path is to just use the standard library; and as it is the standard library you have a slightly better hope someone will care to maintain it. You can still build a better one if needed for your use-case, but the batteries are included for basic usage

delusional

I have a lot of sympathy for this viewpoint, but I also ask that we try to remind ourselves: we are asking for professionalism from hobby projects.

If you want a mature protobuf implementation you should probably buy one. Expecting some guy/gal on the internet to maintain one for you for free seems ill-advised.

procaryote

Isn't that an argument _for_ having a "mature" label? To avoid the hobbyists who have no intention to maintain their thing?

Also there are lots of lovely projects maintained at high levels by hobbyists, and plenty of abandonware that was at some point paid for

schmichael

A great point! All of the libraries I mentioned are created and maintained by corporations. Hobbyists, as always, are free to do as they please without judgement from me. :)

I will say I get great satisfaction from the little envparse library I wrote needing near-0 maintenance. It’s a rare treat to be able to consider any project truly done.

Orangeair

I think that https://blessed.rs does a pretty good job of providing recommendations for things that probably can't be crammed into the standard library, but which you'll almost certainly end up needing at one point or another. I honestly like that system a lot, it makes it so that the only packages you need to worry much about are usually doing something rather specific.

conradludgate

As a fellow rust developer, I love our dependencies but I put a lot of effort into pruning the ones I want to use. If I see a crate using too many I might contribute to it or find a replacement.

If you want to use dependencies, I wouldn't be surprised when you realise they also want to use dependencies. But you can put your money/time in the right places. Invest in the dependencies that do things well.

XxiXx

I think it's a "cultural" thing. With Go you often find developers/projects proudly mentioning that no, or just a few, non-std dependencies are used. Coming from Go it really feels strange when you see pages of dependencies scrolling over your screen when you build a Rust project.

sophacles

I have yet to come across a go project that doesn't pull in tons of 3rd party code as well. It seems like maybe you're over-stating the "culture" a bit.

hu3

> I have yet to come across a go project that doesn't pull in tons of 3rd party code as well.

These have Zero dependencies. It's not rare in Go land.

- https://github.com/go-chi/chi 19k stars

- https://github.com/julienschmidt/httprouter 16k stars

- https://github.com/gorilla/mux 21k stars

- https://github.com/spf13/pflag 2.6k stars

- https://github.com/google/uuid 5.6k stars

Many others have just a few dependencies.

meling

Yeah, while I’ve seen some great libraries that follow the practice of minimizing their dependencies, I’m a bit annoyed with the amount of dependencies that docker will bring along [1]. I’ve been on the lookout for alternatives for my docker needs, but the state of podman, buildah and some others that I checked is similar. They all bring in roughly the same number of dependencies… if anyone knows of a stripped down Go lib that can be used to build from a Dockerfile, pull, and run a container, I would be grateful for any suggestions. Heck docker / moby isn’t even using go.mod proper.

[1] https://github.com/moby/moby/blob/master/vendor.mod

klooney

> dotenv is unmaintained.

How much maintenance could you possibly need to load secrets from .env into the environment?

shepmaster

I agree with your general point, but for this specific functionality, I’ll point out that setting environment variables of the current process is unsafe. It took us a long time to realize it so the function wasn’t actually marked as unsafe until the Rust 2024 edition.

What this means in practice is that the call to invoke dotenv should also be marked as unsafe so that the invoker can ensure safety by placing it at the right place.

If no one is maintaining the crate, that won’t happen and someone might try to load environment variables at a bad time.
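A small sketch of what propagating that contract could look like, assuming a hypothetical `apply_env` helper (this is not the dotenv crate's API). The loader is itself an `unsafe fn`, so the caller has to acknowledge the threading requirement:

```rust
use std::env;

/// Hypothetical loader: applies already-parsed KEY=VALUE pairs to the
/// process environment.
///
/// # Safety
/// Call this only while no other thread may be reading or writing the
/// environment, typically at the very start of `main` before any threads
/// are spawned.
pub unsafe fn apply_env(pairs: &[(&str, &str)]) {
    for (key, value) in pairs {
        // `env::set_var` must be called inside an `unsafe` block in the
        // 2024 edition; in earlier editions the call also compiles
        // without one.
        unsafe { env::set_var(key, value) };
    }
}
```

A caller would write `unsafe { apply_env(&[("DATABASE_URL", "...")]) }` as the first statement in `main`, which is exactly the "right place" decision that a safe wrapper hides from the invoker.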

andy_xor_andrew

ok, I'm hooked - how is setting an env var in the current process unsafe? My gut says it's not unsafe in a memory-ownership sense, but rather in a race condition sense?

whatever the issue is, "setting an env var is unsafe" is so interesting to me that I'm now craving a blog post explaining this

Orangeair

It's a long standing bug, setenv and unsetenv are not thread-safe

https://www.evanjones.ca/setenv-is-not-thread-safe.html

iammrpayments

I find it hilarious when people judge the quality of a repository by how many commits it has, as if 10,000 commits means the code is better.

prophesi

The maintainers themselves give this warning in the repo's README, so even if it were maintained, it still wouldn't be production ready.

> Achtung! This is a v0.* version! Expect bugs and issues all around. Submitting pull requests and issues is highly encouraged!

https://github.com/dotenv-rs/dotenv

0cf8612b2e1e

That is an escape hatch that is seemingly used everywhere. Nobody wants to release a 1.0 with backwards compatibility guarantees.

ZeroVer https://0ver.org/

csomar

On the other hand, loading .env into the environment is critical (since you are usually passing secrets through .env). I wouldn't want to maintain that myself and not share it with xxK other projects in case there is a vulnerability.

nemothekid

I feel like leftpad has given package managers a very bad name. I understand the OP's hesitation, but it feels a little ridiculous to me.

tokio is a work-stealing, asynchronous runtime. This is functionality that, elsewhere, is built into the language runtime itself. Does OP consider it reasonable to audit the entire Go language? Or the V8 engine for Node? V8 is ~10x more lines than tokio.

If Cloudflare uses Node, would you expect Cloudflare to audit v8 quarterly?

conradludgate

And for what it's worth, people do audit tokio. I have audited tokio. Many times in fact. Sure, not everyone will, but someone will :)

timewizard

If two different dependencies use a different version of some other dependency between them does cargo still include both versions by default?

This is something I've only ever seen cargo do.