
Rust’s dependencies are starting to worry me

kion

IMO any system where taking a dependency is "easy" and there is no penalty for size or cost is going to eventually lead to a dependency problem. That's essentially where we are today both in language repositories for OSS languages and private monorepos.

This is partly due to how we've distributed software over the last 40 years. In the 80s, the idea of a library of functionality was something you paid for, and you painstakingly included parts of it in your size-constrained environment (it had to fit on a floppy). You probably picked apart that library and pulled out the bits you needed, integrating them into your builds to be as small as possible.

Today we pile libraries on top of libraries on top of libraries. It's super easy to say `import foolib`, then call `foolib.do_thing()` and just start running. Who knows or cares what all 'foolib' contains.

At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.

In some cases the languages make this worse. Go and Rust, for example, encourage everything for a single package/mod to go in the same file. Adding optional functionality can get ugly when it would require creating new modules, but if you only want to use a tiny part of the module, what do you do?

The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library. You end up with the minimal set of code for the functionality you need.
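
To illustrate the idea (purely hypothetical syntax: no attribute like this `#[needs]` exists today, and the symbol names are made up), a per-function declaration might look something like:

```
// Hypothetical: each top-level item declares exactly what it needs, so a
// consumer of `format_number` pulls in only that subgraph of symbols and
// the toolchain can discard the rest of the library.
#[needs(symbols = ["core::fmt::Display", "alloc::string::String"])]
pub fn format_number(n: u64) -> String {
    format!("{n}")
}
```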

It's a terrible idea and I'd hate it, but how else do you address the current setup of effectively building the whole universe of code branching from your dependencies and then dragging it around like a boat anchor of dead code?

WuxiFingerHold

> IMO any system where taking a dependency is "easy" and there is no penalty for size or cost is going to eventually lead to a dependency problem.

Go and C# (.NET) are counterexamples. They both have great ecosystems and package management just as simple and effective as Rust's or JS's (Node). But neither Go nor C# has issues with dependency hell like Rust, or even more so JavaScript, because they have exceptional std libs and even large frameworks like ASP.NET or EF Core.

A great std lib is obviously the solution. Some Rust defenders are talking it down by giving Python as a counterexample. But again, Go and C# are proving them wrong. A great std lib is a solution, but one that comes with huge efforts that can only be made by large organisations like Google (Go) or Microsoft (C#).

athrowaway3z

No it doesn't.

A large stdlib solves the problems the language is focused on. For C# and Go that is web hosts.

Try using them outside that scope and the dependencies start to pile in (games, desktop), or they are essentially unused (embedded, phones, wasm).

Xeoncross

> A large stdlib solves the problems the language is focused on

That's part of it, but it also solves the problem of vetting. When I use a Go stdlib I don't have to personally spend time vetting it like I do when looking at a crate or npm package.

In general, Go & Rust packages on GitHub are high quality to begin with, but there is still a pronounced difference between OSS packages and what is approved to be part of the language's own stdlib.

It's nice to know thousands of different companies already found the issues for me or objected to them in reviews before the library was published.

dgb23

“Web server” is a pretty big use case though.

But I agree that graphics is often overlooked in std libs. However that’s a bit of a different beast. Std libs typically deal with what the OS provides. Graphics is its own world so to speak.

As for Wasm: first, that’s a runtime issue and not a language issue. I think GC is on the roadmap for Wasm. Second, Go and C# obviously predate Wasm.

In the end, not every language should be concerned with every use case. The bigger question is whether it provides a std lib for the category of programs it targets.

To take a specific example: JS isn’t great at efficiently and conveniently generating dynamic HTML. You can go far with no (or minimal) dependencies and some clever patterns. But a lot of pain and work hours would have been saved if it had something that people want to use out of the box.

merb

Actually, dotnet also doesn't need too many dependencies for games and desktop apps.

zahlman

>A great std lib is obviously the solution. Some Rust defenders are talking it down by giving Python as counter example.

Python's standard library is big. I wouldn't call it great, because Python is over 30 years old and it's hard to add things to a standard library and even harder to remove them.

notpushkin

There are things added from time to time, but yeah, some stuff in there just feels dated at this point.

I’m still hoping we can get a decently typed argparse with a modern API though (so much better for tiny scripts without deps!)

kokada

While not everything in Python's stdlib is great (I am looking at you urllib), I would say most of it is good enough. Python is still my favorite language to get stuff done exactly because of that.

nicce

Maybe Python 4 will just remove stuff.

fiedzia

> but neither Go or C# have issues with dependency hell like Rust or even more JavaScript, because they have exceptional std libs

They also have a much narrower scope of use, which means it is easier to create a stdlib usable for most people. You can't do that with a more generic language.

pjmlp

I would say C# gets used for almost everything at Microsoft: GUIs, backends, DirectX tooling (the new PIX UI; Managed DirectX and XNA back in the Creative Arcade days), Azure, ..., alongside C++, and even if Microsoft <3 Rust, in much bigger numbers.

djfivyvusn

I didn't understand the embedded systems argument. Just because a standard lib is large doesn't mean it all ends up in the compilation target.

slashdev

I don’t want a large std lib. It stifles competition and slows the pace of development. Let libraries rise and fall on their own merits. The std lib should limit itself to the basics.

iTokio

I think this is partially true, but more nuanced than just saying that Rust std lib is lacking.

Compared to Go and C#, Rust's std lib is mostly lacking:

- a powerful http lib

- serialization

But Rust's approach (no runtime, no GC, no reflection) makes it very hard to provide those libraries.

Within these constraints, some high quality solutions emerged, Tokio, Serde. But they pioneered some novel approaches which would have been hard to try in the std lib.
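
For example, Serde's derive macros generate all the (de)serialization code at compile time, which is how it fits the no-runtime/no-reflection constraints. A minimal sketch, assuming the serde crate (with its derive feature) plus serde_json:

```rust
use serde::{Deserialize, Serialize};

// The derives expand into hand-rolled-style (de)serialization code at
// compile time; nothing is discovered via reflection at run time.
#[derive(Serialize, Deserialize)]
struct Point {
    x: i32,
    y: i32,
}

fn main() -> serde_json::Result<()> {
    let p: Point = serde_json::from_str(r#"{"x":1,"y":2}"#)?;
    println!("{}", serde_json::to_string(&p)?);
    Ok(())
}
```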

The whole async ecosystem still has a beta vibe, giving the feeling of programming in a different language. Procedural macros are often synonymous with slow compile times and code bloat.

But what we gained is fewer runtime errors, more efficiency, and a more robust language.

TLDR: trade-offs everywhere, it is unfair to compare to Go/C# as they are languages with a different set of constraints.

whstl

I would say compared to other languages Rust feels even more lacking.

All those AFAIR need 3rd party packages:

Regex, DateTime, base64, argument parsing, url parsing, hashing, random number generation, UUIDs, JSON

I'm not saying it's mandatory, but I would expect all those to be in the standard library before there is any http functionality.

nicce

> Procedural macros are often synonymous with slow compile times and code bloat.

In theory they should reduce it, because you wouldn't write proc macros to generate code you don't need… right? How much coding time do you save with macros compared to implementing things manually?

neonsunset

To be fair, I think Rust has a very healthy selection of options for both, with Serde and Reqwest/Hyper being the de-facto standards.

Rust has other challenges it needs to overcome but this isn't one.

I'd put Go behind both C#/F# and Rust in this area. It has spartan tooling in odd areas it's expected to be strong at, like gRPC, and the serialization story in Go is quite a bit more painful and bare-bones compared to what you get out of System.Text.Json and Serde.

The difference is especially stark with regex, where Go ships with a slow engine (because the language does not allow writing sufficiently fast code in this area at the moment), whereas Rust and C# each have top-of-the-line implementations which beat every other engine save for Intel Hyperscan [0].

[0]: https://github.com/BurntSushi/rebar?tab=readme-ov-file#summa... (note this is without .NET 9 or 10 preview updates)

PoignardAzur

> At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.

I'm not convinced that happens that often.

As someone working on a Rust library with a fairly heavy dependency tree (Xilem), I've tried a few times to see if we could trim it by tweaking feature flags, and most of the time it turned out that they were downstream of things we needed: Vulkan support, PNG decoding, unicode shaping, etc.

When I did manage to find a superfluous dependency, it was often something small and inconsequential like once_cell. The one exception was serde_json, which we could remove after a small refactor (though we expect most of our users to depend on serde anyway).

We're looking to remove or at least decouple larger dependencies like winit and wgpu, but that requires some major architectural changes, it's not just "remove this runtime option and win 500MB".

nullc

I was very 'impressed' to see multiple SSL libraries pulled into Rust software that never makes a network connection.

andybak

This is where a) a strong stdlib and b) community consensus on common packages tends to help at least mitigate the problem.

My feeling is that Python scores fairly well in this regard. At least it used to. I haven't been following closely in recent years.

detaro

Did you dig any deeper into which paths pulled that in?

sseagull

Not in Rust, but I've seen it with Python in scientific computing. Someone needs to do some minor matrix math, so they install numpy. Numpy isn't so bad, but if installing it via conda it pulls in MKL, which sits at 171MB right now (although I have memories of it being bigger in the past). It also pulls in intel-openmp, which is 17MB.

Just so you can multiply matrices or something.

fn-mote

> Someone needs to do some minor matrix math, so they install numpy

I’m just not convinced that it’s worth the pain to avoid installing these packages.

You want speedy matrix math. Why would you install some second rate package just because it has a lighter footprint on disk? I want my dependencies rock solid so I don’t have to screw with debugging them. They’re not my core business - if (when) they don’t “just work” it’s a massive time sink.

NumPy isn’t “left pad” so this argument doesn’t seem strong to me.

galangalalgol

Because Rust is already paying the price of compiling everything from scratch on a release build, you can pay a little extra to turn on link-time optimization and turn off codegen parallelism on release builds, and absolutely nothing you don't use gets compiled in, and nothing gets repeated. Add symbol stripping on top and something with tokio, clap, serde, and nalgebra (matrix stuff) can still be a 2-5 MB binary. That is still huge to me because I'm old, but you can get it smaller if you're willing to recompile std along with your other dependencies.
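
For reference, those knobs live in Cargo's release profile. A minimal sketch (the exact choices are a matter of taste; the last two lines are optional extras that trade speed for size):

```toml
[profile.release]
lto = true           # whole-program link-time optimization
codegen-units = 1    # give up parallel codegen for better optimization
strip = "symbols"    # strip symbols from the final binary
opt-level = "z"      # optional: optimize for size rather than speed
panic = "abort"      # optional: drop the unwinding machinery
```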

cbarrick

MKL is usually what you want if you are doing matrix math on an Intel CPU.

A better design is to make it easy for you to choose or hotswap your BLAS/LAPACK implementation, e.g. OpenBLAS for AMD.

Edit: To be clear, Netlib (the reference implementation) is almost always NOT what you want. It's designed to be readable, not optimized for modern CPUs.

bilbo-b-baggins

Symbol culling and dead code removal is already a thing in modern compilers and linkers, and rust can do it too: https://github.com/johnthagen/min-sized-rust

kion

Others have made similar comments, but tree-shaking, symbol culling, and anything else that removes dead code after it's already been distributed and/or compiled is too late IMO. It's a band-aid on the problem. A useful and pragmatic band-aid today, for sure, but it fundamentally bothers me that we have to spend time compiling code and then spend more time analyzing it and ripping it back out.

Part of the issue I have with the dependency bloat is how much effort we currently go through to download, distribute, compile, lint, typecheck, whatever 1000s of lines of code we don't want or need. I want software that allows me to build exactly as much as I need and never have to touch the things I don't want.

zahlman

> Others have made similar comments, but tree-shaking, symbol culling and anything else that removes dead code after its already been distributed and/or compiled is too late IMO.

Why, in principle, wouldn't the same algorithms work before distribution?

For that matter, check out the `auditwheel` tool in the Python ecosystem.

nosianu

As others have pointed out elsewhere, that only removes static dependencies. If you have code paths that are used depending on dynamic function arguments static analysis is unable to catch those.

For example, you have a function calling XML or PDF or JSON output functions depending on some output format parameter. That's three very different paths and includes, but if you don't know which values that parameter can take during runtime you will have to include all three paths, even if in reality only XML (for example) is ever used.
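A minimal Rust sketch of that situation (names are illustrative): because `format` is only known at run time, dead-code elimination has to keep all three branches, and everything they transitively call, even if callers only ever pass "xml".

```rust
struct Data;

fn write_xml(_: &Data) { /* imagine a whole XML stack behind this */ }
fn write_pdf(_: &Data) { /* imagine a whole PDF stack behind this */ }
fn write_json(_: &Data) { /* imagine a whole JSON stack behind this */ }

// `format` comes from run-time input, so no branch can be proven dead.
fn write_output(format: &str, data: &Data) {
    match format {
        "xml" => write_xml(data),
        "pdf" => write_pdf(data),
        _ => write_json(data),
    }
}

fn main() {
    let format = std::env::args().nth(1).unwrap_or_else(|| "xml".into());
    write_output(&format, &Data);
}
```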

Or there may be higher level causes outside of any analysis, even if you managed a dynamic one. In a GUI, for example, it could be functionality only ever seen by a few with certain roles, but if there is only one app everything will have to be bundled. Similar scenarios are possible with all kinds of software, for example an analysis application that supports various input and output scenarios. It's a variation of the first example where the parameter is internal, but now it is external data not available for an analysis because it will be known only when the software is actually used.

quotemstr

The situation isn't quite as dire as you portray. Compilers these days can also do devirtualization. The consequent static calls can become input to tree shaking in the whole program case. While it's true that we can't solve the problem in general, there's hope for specific cases.

sitkack

Way back when, I used to vendor all the libraries for a project (Java/Cpp/Python) into a monorepo and integrate building everything into the project's build files, so anyone could rebuild the entire app stack with whatever compiler flags they wanted.

It worked great, but it took diligence, it also forces you to interact with your deps in ways that adding a line to a deps file does not.

saagarjha

One nice thing about cargo is that it builds all your code together, which means you can pass a unified set of flags to everything. The feature of building everything all the time as a whole has a bunch of downsides, many of which are mentioned elsewhere, but the specific problem of not being able to build dependencies the way you want isn't one.

cmrdporcupine

This is the default way of doing things in the monorepo(s) at Google.

It feels like torture until you see the benefits, and the opposite ... the tangled mess of multiple versions and giant transitive dependency chains... agony.

I would prefer to work in shops that manage their dependencies this way. It's hard to find.

I_AM_A_SMURF

I've never seen a place that does it quite like Google. Is there one? It only works if you have one product or are a giant company as it's really expensive to do.

Being able to change a dependency very deep and recompile the entire thing is just magic though. I don't know if I can ever go back from that.

bonzini

It's the same that we're doing for external crates in QEMU's experiments with Rust. Each new dependency is added to the build by hand.

ArthurStacks

[flagged]

silon42

I would absolutely do this for any non-toy project.

Alternatively, for some project it might be enough to only depend on stuff provided by Debian stable or some other LTS distro.

ardit33

Maven was the one that started the downfall into dependency hell. (Ant as well, but it was harder to blindly include things into it.)

Kids today don't know how to do that anymore...

deepsun

Yet the Maven repository is still not that bloated, even after 20+ years of Java et al. being among the most popular languages.

Compare that to Rust, where my experience with a protobuf lib some time ago was that there was a choice of not one but three different libraries, one of which didn't support services, another didn't support the syntax we had to support, and the third was unmaintained. So out of three choices, not a single one worked.

Compare that to Maven, where you have only one officially supported choice that works well and is well maintained.

boguscoder

This works very well until different parts of the deps tree start pulling the same Foo with slightly different flags/settings. Often for wrong reasons but sometimes for right ones, and then it's a new kind of "fun". Sometimes the build system is there to help you, but sometimes you are on your own. Native languages like C++ bring a special kind of joy called ODR violations to the mix…

tester756

>At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.

So what is the compiler doing that it doesn't remove unused code?

ak_111

"dependency" here I guess means something higher-level that your compiler can't make the assumption you will never use.

For example you know you will never use one of the main functions in the parsing library with one of the arguments set to "XML", because you know for sure you don't use XML in your domain (for example you have a solid project constraint that says XML is out of scope).

Unfortunately the code dealing with XML in the library is 95% of the code, and you can't tell your compiler I won't need this, I promise never to call that function with argument set to XML.

espoal

Why can't the compiler detect that it will not be used? Tree shaking is well implemented in JavaScript compilers, an ecosystem which extensively suffers from this problem. It should be possible to build a dependency graph and analyze which functions might actually end up in scope. After all, the same is already done for closures.

sph

A surprising amount of code might be executed in rarely-used or undocumented code paths (for example, if the DEBUG environment variable is 1 or because a plugin is enabled even if not actually used) and thus not shaken out by the compiler.

tester756

What makes you think that a lot of code is hidden behind a DEBUG env variable instead of, e.g., a debug build?

amiga386

> Its super easy to say `import foolib`, then call `foolib.do_thing()` and just start running.

It's effectively an end-run around the linker.

It used to be that you'd create a library by having each function in its own compilation unit, you'd create a ".o" file, then you'd bunch them together in a ".a" archive. When someone else is compiling their code, and they need the do_thing() function, the linker sees it's unfulfiled, and plucks it out of the foolib.a archive. For namespacing you'd probably call the functions foolib_do_thing(), etc.

However, object-orientism with a "god object" is a disease. We go in through a top-level object like "foolib" that holds pointers to all its member functions like do_thing(), do_this(), do_that(), then the only reference the other person's code has is to "foolib"... and then "foolib" brings in everything else in the library.

It's not possible for the linker to know if, for example, foolib needed the reference to do_that() just to initialise its members, and then nobody else ever needed it, so it could be eliminated, or if either foolib or the user's code will somehow need it.

> Go and Rust, for example, encourage everything for a single package/mod to go in the same file.

I can say that, at least for Go, it has excellent dead code elimination. If you don't call it, it's removed. If you even have a `const feature_flag = false` and an `if feature_flag { foobar() }` in the code, it will eliminate foobar().

immibis

foolib is the name of the library, not an object.

It also happens to be an object, but that's just because python is a dynamic language and libraries are objects. The C++ equivalent is foolib::do_thing(); where foolib is not an object.

xlii

> Go and Rust, for example, encourage everything for a single package/mod to go in the same file.

Clarification: Go allows for very simple multi-file packages. It's one feature I really like, because it allows splitting an otherwise coherent module into logical parts.

dcow

Further: I’ve never seen rust encourage anything of the sort. Module directory with a mod.rs and any number of files works just fine.

kion

I probably mischaracterized this, as it's been a while since I did more than trivial Rust. AFAIK it's not possible to depend on only a part of a module in Rust though, right? (At least without an external build system.)

For example, you can't split up a module into foo.rs containing `Foo` and bar.rs containing `Bar`, both in module 'mymod', in such a way that you can `use mymod::Bar` and foo.rs is never built/linked.

My point is that the granularity of the package/mod encourages coarse-grained deps, which I argue is a problem.

tialaramex

Yeah, likewise Rust is completely fine after you say `mod foo` and have a file named foo.rs; if you also make a foo/ directory and put foo/whatever.rs and foo/something_else.rs in it, those are all part of the foo module.

Historically Rust wanted that foo.rs to be renamed foo/mod.rs but that's no longer idiomatic although of course it still works if you do that.
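
Concretely, the modern layout looks like this (file names from the example above):

```
src/
├── lib.rs   // declares `mod foo;`
├── foo.rs   // body of the foo module; declares `mod whatever;`
│            // and `mod something_else;`
└── foo/
    ├── whatever.rs
    └── something_else.rs
```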

dathinab

to extend on this:

In Rust, crates are semantically one compilation unit (in C, oversimplified, it's a .h/.c pair), though in practice rustc will try to split a crate into more units to speed up build time.

The reason I'm pointing this out is that many calls for "splitting a module across files" come from situations where one file is one compilation unit, so you needed a way to split it (for organization) without splitting it (for compilation).

mseepgood

Not just multiple files, but multiple directories. One versioned dependency (module) usually consists of dozens of directories (packages) and dozens to hundreds of files. Only newcomers from other languages create too many go.mod files when they shouldn't.

Aurornis

> Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call,

It’s getting hard to take these conversations seriously with all of the hyperbole about things that don’t happen. Nobody is producing Rust binaries that hit 500MB or even 50MB from adding a couple simple dependencies.

You’re also not ending up with mountains of code that never gets called in Rust.

Even if my Rust binaries end up being 10MB instead of 1MB, it doesn’t really matter these days. It’s either going on a server platform where that amount of data is trivial or it’s going into an embedded device where the few extra megabytes aren’t really a big deal relative to all the other content that ends up on devices these days.

For truly space-constrained systems there's no-std and an entire, albeit small, separate universe of packages that operate in that space.

For all the doom-saying, in Rust I haven’t encountered this excessive bloat problem some people fret about, even in projects with liberal use of dependencies.

Every time I read these threads I feel like the conversations get hijacked by the people at the intersection of “not invented here” and nostalgia for the good old days. Comments like this that yearn for the days of buying paid libraries and then picking them apart anyway really reinforce that idea. There’s also a lot of the usual disdain for async and even Rust itself throughout this comment section. Meanwhile it feels like there’s an entire other world of Rust developers who have just moved on and get work done, not caring for endless discussions about function coloring or rewriting libraries themselves to shave a few hundred kB off of their binaries.

galangalalgol

I agree on the bloat: considering my Rust projects typically don't use any shared libraries other than a libc, a few MB for a binary including hundreds of crates in dependencies (most of which are part of rustc or cargo itself) doesn't seem so bad. I do get the async thing, though. It just isn't the right tool for most of my needs. Unless you are in the situation where you need to wait faster (usually for connections), threads are better than async for trying to compute faster.

null

[deleted]

socalgal2

I'm curious if Rust has this problem. The problem I notice in npm land is many developers have no taste. Example: there's a library for globbing called glob. You'd think it would just be a function that does globbing, but no, the author decided it should ALSO be a standalone commandline executable, and so it includes a large commandline option parser. They could have easily made a separate commandline tool that includes a library that does the glob, but no, this is a common and shit pattern in npm. I'd say easily 25% or more of all "your dependencies are out of date" messages are related to the argument parsing for the commandline tool in these libraries. That's just one example.

Also, there's arguably design. Should a 'glob' library actually read the file system and give you filenames, or should it just tell you if a string matches a glob and leave the rest to you? I think it's better design to do the latter, the simplest thing. This means fewer dependencies and more flexibility. I don't have to hack it or add an option to use my own file system (like for testing). I can use it with a change monitoring system, etc...

And, I'm sure there are tons of devs that like the glob is a "Do everything for me" library instead of a "do one specific thing" library which makes it worse because you get more "internet points" the more your library doesn't require the person using it to be a good dev.

I can't imagine it's any different in Rust land, except maybe for the executable thing. There are just too many devs, and all of them, including myself, don't always make the best choices.

kazinator

> Should a 'glob' library actually read the file system and give you filenames

The POSIX glob function after which these things are named traverses the filesystem and matches directory entries.

The pure matching function which matches a glob pattern against a filename-like string is fnmatch.

But yes, the equivalent of fnmatch should be a separate module and that could be a dependency of glob.

Nobody should be trying to implement glob from scratch using a fnmatch-like function and directory traversal. It is not so trivial.

glob performs a traversal that is guided by the pattern. It has to break the pattern into path components. It knows that "*/*/*" has three components and so the traversal will only go three levels deep. Also "dir/*" has a component which is a fixed match, and so it just has to open "dir" without scanning the current directory; if that fails, glob has failed.

If the double star ** is supported, which matches multiple components, that's also best if it is likewise integrated into glob.

If brace expansion is supported, that adds another difficulty because different branches of a brace can have different numbers of components, like {*/x,*/*/x,*/*/*/x}. To implement glob, it would greatly help us to have brace expansion as a separate function which expands the braces, producing multiple glob patterns, which we can then break into path components and traverse.
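
A rough Rust sketch of that pattern-guided traversal (illustrative only: the `fnmatch` stub below handles just a bare `*`, where a real single-component matcher would handle `?`, `[...]`, and so on):

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Deliberately tiny stand-in for a real single-component matcher.
fn fnmatch(pattern: &str, name: &str) -> bool {
    pattern == "*" || pattern == name
}

fn glob_walk(dir: &Path, components: &[&str], out: &mut Vec<PathBuf>) {
    let Some((head, rest)) = components.split_first() else {
        out.push(dir.to_path_buf()); // pattern exhausted: this path matches
        return;
    };
    if !head.chars().any(|c| matches!(c, '*' | '?' | '[')) {
        // Fixed component like "dir": open it directly, no directory scan.
        let next = dir.join(head);
        if next.exists() {
            glob_walk(&next, rest, out);
        }
        return;
    }
    // Wildcard component: scan exactly one directory level. "*/*/*" thus
    // drives the traversal exactly three levels deep.
    if let Ok(entries) = fs::read_dir(dir) {
        for entry in entries.flatten() {
            if fnmatch(head, &entry.file_name().to_string_lossy()) {
                glob_walk(&entry.path(), rest, out);
            }
        }
    }
}

fn main() {
    let components: Vec<&str> = "*/*/*".split('/').collect();
    let mut matches = Vec::new();
    glob_walk(Path::new("."), &components, &mut matches);
    println!("{matches:?}");
}
```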

hinkley

They eventually fixed it, but grunt once upon a time used a glob implementation that could not short-circuit on wildcards in ignore patterns. So I caught it scanning the node_modules directory and then dropping every file it found because it matched on "node_modules/**". Builds got a lot faster when I pushed that update out.

There’s a lot of stupid ways to implement glob and only a couple of smart ones.

Olreich

> But yes, the equivalent of fnmatch should be a separate module and that could be a dependency of glob.

Interesting, let's look at fnmatch: https://pubs.opengroup.org/onlinepubs/9699919799/functions/f...

Well, fnmatch really does two things: it parses the pattern and then applies that to a string. So really, there should be a "ptnparse" library that handles the pattern matching, which fnmatch has as a dependency.

Though, thinking it through, the "ptnparse" library is responsible for patterns matching single characters and multiple characters. We should split that up into "singleptn" and "multiptn" libraries that ptnparse can take as dependencies.

Oh, and those flags that fnmatch takes makes fnmatch work in several different ways, let's decompose those into three libraries so that we only have to pull in the matcher we care about: pthmatch, nscmatch, and prdmatch. Then we can compose those libraries based on what we want in fnmatch.

This is perfect, now if we don't care about part of the fnmatch functionality, we don't have to include it!

/s

This decomposition is how we wind up with the notorious leftpad situation. Knowing when to stop decomposing is important. fnmatch is a single function that does less than most syscalls. We can probably bundle that with a few more string functions without actually costing us a ton. Glob matching at a string level probably belongs with all the other string manipulation functions in the average "strings" library.

Importantly, my suggestion that fnmatch belongs in a "strings" library does align with your suggestion that fnmatch shouldn't be locked into a "glob" library that also includes the filesystem traversal components.

jll29

> I can't imagine it's any different in [R]ust land

Taste is important; programmers with good architectural taste tend to use languages that support them in their endeavour (like Rust or Zig) or at least get out of the way (C).

So I would argue the problems you list are statistically less often the case than in certain other languages (from COBOL to JavaScript).

> There's just too many devs and all of them, including myself, don't always make the best choices.

This point you raise is important: I think an uncoordinated crowd of developers will create a "pile of crates" ("bazaar" approach, in Eric Raymond's terminology), and a single language designer with experience will create a more uniform class library ("cathedral" approach).

Personally, I wish Rust had more of a "batteries included" standard library with systematically named and namespaced official crates (e.g. including all major data structures) - why not "stdlib::data_structures::automata::weighted_finite_state_transducer" instead of a confusing set of choices named "rustfst-ffi", "wfst", ... ?

Ideally, such a standard library should come with the language at release. But the good news is it could still be devised later, because the Rust language designers were smart enough to build versioning with full backwards compatibility (but not technical debt) into the language itself. My wish for Rust 2030 would be such a stdlib (it could even be implemented using the bazaar of present-day crates, as long as that is hidden from us).

kibwen

We don't need to speak in hypotheticals, we can just look at the glob crate: https://crates.io/crates/glob

213M downloads, depends on zero external crates, one source file (a third of which is devoted to unit tests), and developed by the rust-lang organization itself (along with a lot of crates, which is something that people tend to miss in this discussion).

hombre_fatal

Which glob crate? https://crates.io/search?q=glob

I went to page 8 and there were still glob libraries.

kibwen

The one that shows up first, which is to say, the one with 200 million downloads, which is to say, the one whose name is the exact match for the search query.

timeinput

That's much more a statement about the search function on crates.io than it is the number of glob crates. I think if you have the standard glob crate as a dependency you show up in that search.

buzzerbetrayed

Finding a single library that avoids the problem is pretty useless. You can find great libraries in Node as well but everyone would agree that Node has a dependency problem.

kibwen

And yet it's telling that, when the author mused about library quality and unknowingly suggested an arbitrary library as an example, the Rust version turned out to be high quality.

hu3

glob was just an example. They weren't asking about a specific crate.

Also this crate is from official rust lang repo, so much less prone to individualistic misbehaving. A bad example all around.

kibwen

> Also this crate is from official rust lang repo, so much less prone to individualistic misbehaving.

To reiterate, lots of things that people in this thread are asking the language to provide are in fact provided by the rust-lang organization: regex, serde, etc. The goalposts are retreating over the horizon.

Rust's primary sin here is that it makes dependency usage transparent to the end-user. Nobody wants to think about how many libraries they depend upon and how many faceless people it takes to maintain those libraries, so they're uncomfortable when Rust shows you. This isn't a Rust problem, it's a software complexity problem.

timeinput

I think the parent was suggesting comparing and contrasting the glob dependency in rust, and npm. The one off isn't useful, but picking ten random, but heavily used packages probably is. The parent didn't really mention what the node version looked like though.

The npm glob package has 6 dependencies (those dependencies have 3+ dependencies, those sub dependencies have 6+ dependencies, ...)

As you point out the rust crate is from the official repo, so while it's not part of the standard library, it is maintained by the language maintenance organization.

Maybe that could make it a bad example, but the npm one is maintained by the inventor of npm, who describes himself as "I wrote npm and a pretty considerable portion of other node related JavaScript that you might use." I would say that makes it a great example, because the people I would expect to care most about the language are the maintainers of these packages, who are (hopefully) implementing what they think are the best practices for the languages and their ecosystems.

null

[deleted]

hedora

Historically, the borrow checker has been a good shield against developers that have no taste.

Not sure how long that’ll last.

lionkor

Not really, there are plenty of large libraries today that were designed by complete boneheads. Granted, you only notice if you know that domain very well.

devnullbrain

Dynamically typed languages do the opposite.

bsaul

Macros seem to compensate for that. There's definitely a "C++ template programming" vibe in some libraries.

throwitaway1123

It's worth pointing out that Node has a built in globbing function: https://nodejs.org/docs/latest-v24.x/api/fs.html#fspromisesg...

> Also there's arguably design. Should a 'glob' library actually read the file system and give you filenames or should it just tell you if a string matches a glob and leave the reset to you?

There's a function in Node's stdlib that does this as well (albeit it's marked as experimental): https://nodejs.org/docs/latest-v24.x/api/path.html#pathmatch...

blub

What you’re describing regarding glob is not lack of taste, it’s an architectural “bug”.

Taste is what Steve Jobs was referring to when he said Microsoft had none. In software it’s defined by a humane, pleasant design that almost(?) anybody can appreciate.

Programming languages cannot be tasteful, because they require time and effort to learn and understand. Python has some degree of elegance and Golang's simplicity has a certain je ne sais quoi… but they don't really fit the definition.

Still, some technologies such as git, Linux or Rust stand out as particularly obscure even for the average developer, not just average human.

IshKebab

Yeah that's one huge advantage Rust has over NPM - Rust developers are a lot more skilled and crates are generally much higher quality.

bsaul

Random remark: I've noticed the quality of Rust libraries too, which made me really surprised to see the overengineered mess that is the async-openai crate.

How can one take an API as simple as OpenAI's and turn it into this steaming pile of manure? In the end, I used reqwest and created my queries manually. I guess that's what everyone does...
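
For what it's worth, the manual approach is only a few lines. A sketch assuming reqwest (with its json feature), an async runtime like tokio, and serde_json; the endpoint and fields follow OpenAI's documented chat completions API, the model name is a placeholder, and error handling is elided:

```rust
use serde_json::json;

async fn chat(api_key: &str, prompt: &str) -> Result<String, reqwest::Error> {
    let resp: serde_json::Value = reqwest::Client::new()
        .post("https://api.openai.com/v1/chat/completions")
        .bearer_auth(api_key)
        .json(&json!({
            "model": "gpt-4o-mini", // placeholder model name
            "messages": [{ "role": "user", "content": prompt }]
        }))
        .send()
        .await?
        .json()
        .await?;
    // Missing fields just yield Null here; real code would check for errors.
    Ok(resp["choices"][0]["message"]["content"]
        .as_str()
        .unwrap_or_default()
        .to_string())
}
```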

wizzwizz4

The kinds of people who think OpenAI's tech is worth touching with a bargepole are generally not the kinds of people who develop and maintain high-quality Rust libraries.

hombre_fatal

Maybe that were true back when Rust wasn't mainstream on social media nor across tech influencer videos, but it's not true anymore.

https://crates.io/search?q=is-even

detaro

Oh no, people in the Rust community make jokes, how unprofessional!!111

You'll notice these packages are not actually used by anything.

IshKebab

That's a joke. Leftpad wasn't.

jerf

A true enough statement, but "Rust" is unnecessarily specific. Dependencies are getting scary in general. Supply chain attacks are no longer hypothetical, they're here and have been for a while.

If I were designing a new language I think I'd be very interested in putting some sort of capability system in so I can confine entire library trees safely, and libraries can volunteer somehow what capabilities they need/offer. I think it would need to be a new language if for no other reason than ecosystems will need to be written with the concept in them from the beginning.

For instance, consider an "image loading library". In most modern languages such libraries almost invariably support loading images from a file, directly, for convenience if nothing else. In a language that supported this concept of capabilities it would be necessary to support loading them from a stream, so either the image library would need you to supply it a stream unconditionally, or if the capability support is more rich, you could say "I don't want you to be able to load files" in your manifest or something and the compiler would block the "LoadFromFile(filename)" function at compile time. Multiply that out over an entire ecosystem and I think this would be hard to retrofit. It's hugely backwards incompatible if it is done correctly, it would be a de facto fork of the entire ecosystem.
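
A sketch of what that looks like at the API level (names are illustrative; note that today's Rust can't actually stop the library from also calling std::fs internally, which is exactly why this needs language-level enforcement):

```rust
use std::io::Read;

pub struct Image; // pixels, dimensions, ... elided

// The library never sees a file name; the caller decides what the stream is
// backed by, so the "open files" capability stays with the caller.
pub fn load_image(mut source: impl Read) -> std::io::Result<Image> {
    let mut bytes = Vec::new();
    source.read_to_end(&mut bytes)?;
    // ... decode `bytes` here ...
    Ok(Image)
}

fn main() -> std::io::Result<()> {
    // Could be File::open(...)?, a TcpStream, or an in-memory buffer:
    let _img = load_image([0x89u8, b'P', b'N', b'G'].as_slice())?;
    Ok(())
}
```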

I honestly don't see any other solution to this in the long term, except to create a world where the vast majority of libraries become untargetable in supply chain attacks because they can't open sockets or read files and are thus useless to attackers, and we can reduce our attack surface to just the libraries that truly need the deep access. And I think if a language came out with this design, you'd be surprised at how few things need the dangerous permissions.

Even a culture of minimizing dependencies is just delaying the inevitable. We've been seeing Go packages getting supply-chain-attacked and it getting into people's real code bases, and that community is about as hostile to large dependency trees as any can be and still function. It's not good enough.

tialaramex

You want a special purpose language.

In your particular example of image loading, you want WUFFS. https://github.com/google/wuffs

In WUFFS most programs are impossible. Their "Hello, world" doesn't print hello world because it literally can't do that. It doesn't even have a string type, and it has no idea how to do I/O, so that's both elements of the task ruled out. It can, however, Wrangle Untrusted File Formats Safely, which is its sole purpose.

I believe there should be more special purpose languages like this, as opposed to the General Purpose languages most of us learn. If your work needs six, sixteen or sixty WUFFS libraries to load different image formats, that's all fine because categorically they don't do anything outside their box. Yet, they're extremely fast because since they can't do anything bad by definition they don't need those routine "Better not do anything bad" checks you'd write in a language like C or the compiler would add in a language like Rust, and because they vectorize very nicely.

assassinator42

Java and the .NET Framework had partial trust/capabilities mechanisms decades ago. No one really used them and they were deprecated/removed.

lenkite

It was not bad, but without memory/cpu isolates, it was pretty useless. The JSR for isolation got abandoned when Sun went belly up.

pjmlp

It was more like no one used them correctly.

eikenberry

Wouldn't that mean they were poorly implemented? If no one uses something correctly, it seems like that isn't a problem with the people but with the thing.

j1elo

Maybe we need a stronger culture of sans-IO dependencies in general, to the point of pointing out and criticising violations, like happens with bad practices and dark patterns. A new lib (which shouldn't use its own file access code) is announced on HN, and the first comment: "why do you do your own IO?"

Edit - note it's just tongue in cheek. Obviously, libraries being developed to chase public approval wouldn't be much of a good metric. Although I do agree that a bit more of a common culture of the sans-IO principles would be a good thing.

voxgen

I don't think retrofitting existing languages/ecosystems is necessarily a lost cause. Static enforcement requires rewrites, but runtime enforcement gets you most of the benefit at a much lower cost.

As long as all library code is compiled/run from source, a compiler/runtime can replace system calls with wrappers that check caller-specific permissions, and it can refuse to compile or insert runtime panics if the language's escape hatches would be used. It can be as safe as the language is safe, so long as you're ok with panics when the rules are broken.

It'd take some work to document and distribute capability profiles for libraries that don't care to support it, but a similar effort was proven possible with TypeScript.

zavec

I actually started working on a tool like that for fun, at each syscall it would walk back up the stack and check which shared object a function was from and compare that to a policy until it found something explicitly allowed or denied. I don't think it would necessarily be bulletproof enough to trust fully but it was fun to write.

mleonhard

I love this idea and I hope I get to work on it someday. I've wanted this ever since I was a starry-eyed teenager on IRC listening to Darius Bacon explain his capability-based OS idea, aptly called "Vapor".

I think it could be possible in Rust with a linter, something like https://github.com/geiger-rs/cargo-geiger . The Rust compiler has some unsoundness issues such as https://github.com/rust-lang/rust/issues/84366 . Those would need fixing or linter coverage.

wofo

I've thought about this (albeit not for that long) and it seems like you'd need a non-trivial revamp of how we communicate with the operating system. For instance, allowing a library to "read from a stream" sounds safe until you realize they might be using the same syscalls as reading from a file!

simonask

I love this idea. There is some reminiscence of this in Rust, but it's opt-in and based on convention, and only for `unsafe` code. Specifically, there's a trend of libraries using `#![deny(unsafe_code)]` (which will cause a compilation error if there is any `unsafe` code in the current crate), and then advertising this to their users. But there's no enforcement, and the library can still add `#[allow(unsafe_code)]` to specific functions.
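
A minimal illustration of that convention, and of the escape hatch:

```rust
// Crate root: refuse to compile any `unsafe` code in this crate...
#![deny(unsafe_code)]

// ...except that `deny` (unlike `forbid`) can be overridden per item.
#[allow(unsafe_code)]
fn read_first_byte(ptr: *const u8) -> u8 {
    unsafe { *ptr }
}

fn main() {
    let x = 42u8;
    println!("{}", read_first_byte(&x));
}
```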

Perhaps a capability system could work like the current "feature" flags, but for the standard library, which would mean they could be computed transitively.

QuartzLibrary

FYI: `#[forbid(_)]` cannot be bypassed by the affected code (without a never-to-be-stabilised nightly feature meant to be used only in `std` macros).

https://doc.rust-lang.org/rustc/lints/levels.html

simonask

Ah right, forgot about forbid!

notnullorvoid

I don't think you need to get very complex to design a language that protects libraries from having implicit system access. If the only place that can import system APIs is in the entry program, then by design libraries need to use dependency injection to facilitate explicit passing of capabilities.

One can take just about any existing language and add this constraint; the problem, however, is that it would break the existing ecosystem of libraries.
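
A sketch of that constraint in today's Rust (the trait and names are hypothetical, and nothing currently stops a library from calling std::fs directly, which is why it would need to be enforced by the language):

```rust
use std::io;

// The only file-reading capability a library can use is whatever the entry
// program chooses to hand it.
pub trait FileRead {
    fn read(&self, path: &str) -> io::Result<Vec<u8>>;
}

// Library code: can touch files only through the injected capability.
pub fn load_config(fs: &dyn FileRead) -> io::Result<Vec<u8>> {
    fs.read("config.toml")
}

// Entry program: the sole place allowed to construct real system access.
struct RealFs;
impl FileRead for RealFs {
    fn read(&self, path: &str) -> io::Result<Vec<u8>> {
        std::fs::read(path)
    }
}

fn main() {
    match load_config(&RealFs) {
        Ok(bytes) => println!("read {} bytes", bytes.len()),
        Err(e) => eprintln!("no config: {e}"),
    }
}
```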

loglog

If you want this today, Haskell might be the only choice.

tome

Yes, there is a sense in which Haskell's "effect systems" are "capability systems". My effect system, Bluefin, models capabilities as values that you explicitly pass around. You can't do I/O unless you have the "IOE" capability, for example.

https://hackage.haskell.org/package/bluefin

zaptheimpaler

This is just a modern problem in all software development, regardless of language. We are doing more complex things, we have a much bigger library of existing code to draw from and there are many reasons to use it. Ultimately a dependency is untrusted code, and there's a long road to go in hardening entire systems to make running arbitrary dependencies safe (if its even possible).

In the absence of a technical solution, all others basically involve someone else having to audit and constantly maintain all that code and social/legal systems of trust. If it was pulled into Rust stdlib, that team would be stuck handling it, and making changes to any of that code becomes more difficult.

kdps

I'd argue that the severity varies between languages, despite the core problem being universal. Languages with comprehensive standard libraries have an advantage over those with minimal built-in functionality, where people rely on external dependencies even for the most basic things (e.g. see Java/.NET vs JS/Node). Lightweight is not always better.

RossBencina

> Languages with comprehensive standard libraries have an advantage

I don't see the advantage. Just a different axis of disadvantage. Take Python, for example. It has a crazy big standard library full of stuff I will never use. Some people want C++ to go in that direction too -- even though developers are fully capable of rolling their own. Similar problem with kitchen-sink libraries like Qt. "Batteries included" languages lead to a higher maintenance burden for the core team, and hence various costs that all users pay: dollars, slow evolution, design overhead, use of lowest-common-denominator non-specialised implementations, loss of core mission focus, etc.

zaptheimpaler

It's a tradeoff. Those languages also have a very difficult time evolving anything in that standard library, because the entire ecosystem relies on it and expects non-breaking changes. I think Rust gets sort of the best of both worlds: dependencies are so easy to install that it's almost as good as native, but there's a diversity of options and design choices, easy evolution, and winners naturally emerge. These become as high quality as a stdlib component because they attract people/money to work on them, but with more flexibility to change or be replaced.

zeroxfe

> If it was pulled into Rust stdlib, that team would be stuck handling it, and making changes to any of that code becomes more difficult.

I think Rust really needs to do more of this. I work with both Go and Rust daily at work, Go has its library game down -- the standard library is fantastic. With Rust it's really painful to find the right library and keep up for a lot of simple things (web, tls, x509, base64 encoding, heck even generating random numbers.)

f_devd

I disagree. As I see it, Rust's core lib should be for interacting with abstract features (intrinsics, registers, memory, the borrow checker, etc.), and the std lib for interacting with OS features (net, io, threads). Anything else is what Rust excels at implementing as libraries, and putting them into the stdlib would restrict the adoption of different implementations.

For example there are currently 3, QUIC (HTTP/3) implementations for rust: Quiche (Cloudflare), Quinn, S2N-QUIC (AWS). They are all spec compliant, but may use different SSL & I/O backends and support different options. 2 of them support C/C++ bindings. 2 are async, 1 is sync.

Having QUIC integrated into the stdlib would mean that all these choices would be made beforehand and be stuck in place permanently, and likely no bindings for other languages would be possible.

smj-edison

Gilad Bracha has a really interesting approach to sandboxing third party libraries: Remove imports, and do everything with dependency injection. That way if you never inject say the IO subsystem, the third party code won't be able to break out. And there's no overhead, since it's all based on capabilities.

Even cooler, if you want to only expose read operations, you can wrap the IO library in another library that only exposes certain commands (or custom filtering, etc).

EDIT: I should say this doesn't work with systems programming, since there's always unsafe or UB code.

zavec

That sounds neat, is that newspeak?

smj-edison

Yep! One of the many cool concepts packed in that language :)

jongjong

Yes, but a lot of the complexity is unnecessary bloat. Almost every project I've ever seen or worked on was full of unnecessary complexity. People naturally tend to over-complicate things; all the programming books, including software design books, focus on unimportant aspects and miss all the important ones. It's incredibly frustrating.

Yet, if someone were to write a book which explained things properly (probably a 3000 word article would suffice to turn anyone into a 10x dev), nobody would buy it. This industry is cooked.

loglog

Do you mean this article?: https://grugbrain.dev/

jongjong

No. Me write better article.

xnickb

To quote one famous developer: "Talk is cheap. Show me code!"

udev4096

Maybe we should have a way to run every single library we use in an isolated environment and have a structure like QubesOS. Your main code is dom0 and you can create bunch of TemplateVMs which are your libraries and then create AppVMs for using those libraries. Use network namespaces for communicating between these processes. For sensitive workloads (finance, healthcare, etc), it makes sense to deploy something like that

harha_

Regardless of language, really? I highly doubt that; you don't generally see such problems with C or even C++, because dependencies are more cumbersome to add, especially in a way that's cross-platform.

tialaramex

With C++ it's hilarious because the C++ community is so allergic to proper dependency management and also so desperate for stuff from third party libraries that the committee spends large amounts of its time basically doing dependency management for the community by baking in large features you'd ordinarily take as a dependency into the mandatory standard library.

I'm sure I'll miss some, but IIRC C++ 26 is getting the entire BLAS, two distinct delayed reclamation systems and all of the accompanying infrastructure, new container types, and a very complicated universal system of units.

All of these things are cool, but it's doubtful whether any of them could make sense in a standard library, however for C++ programers that's the easiest way to use them...

It's bedlam in there and of course the same C++ programmers who claim to be "worried" that maybe somebody hid something awful in Rust's crates.io are magically unconcerned that copy-pasting tens of millions of lines of untested code from a third party into absolutely every C++ program to be written in the future could be a bad idea.

zahlman

> copy-pasting tens of millions of lines of untested code from a third party into absolutely every C++ program to be written in the future could be a bad idea.

Is it really that bad? (By my count, as a point of reference, the Python 3.13 standard library is just under 900k lines for the .py files.)

blub

If something is in the standard library, then it’s written and vetted by the standard library provider, not by a random third party like you make it sound.

With Rust, it’s literally a random third party.

zaptheimpaler

Because most dependencies are either manually installed by the user, or are dynamic libraries that are provided and audited by the distro maintainers. The dependencies are there, they're just harder to see - https://wiki.alopex.li/LetsBeRealAboutDependencies

harha_

Sure, there are various dependencies, but it's nothing like "cargo add crate-name". Cargo makes it so effortless to joink the dumbest dependency for the simplest thing.

pjmlp

Kind of true, when not using vcpkg/conan.

hedora

Don’t forget cmake. (It makes adding dependencies easy, and everything else basically impossible)

monkeyelite

> We are doing more complex things

In my experience we have more complex methodologies to the same things, but the goals are not more complex.

palata

Similar feeling here.

Cargo makes it so simple to add tons of dependencies that it is really hard not to do it. But that does not stop here: even if I try to be careful with adding dependencies, a couple dependencies are likely to pull tens of transitive dependencies each.

"Then don't depend on them", you say. Sure, but that means I won't write my project, because I won't write those things from scratch. I could probably audit the dependency (if it wasn't pulling 50 packages itself), but I can't reasonably write it myself.

It is different with C++: I can often find dependencies that don't pull tens of transitive dependencies in C++. Maybe because it's harder to add dependencies, maybe because the ecosystem is more mature, I don't know.

But it feels like the philosophy in Rust is to pull many small packages, so it doesn't seem like it will change. And that's a pity, because I like Rust-the-language better than C++-the-language. It just feels like I trade "it's not memory-safe" for "you have to pull tons of random code from the Internet".

X0Refraction

This was linked from the top comment on the Rust subreddit: https://wiki.alopex.li/LetsBeRealAboutDependencies

I think it makes a good point that some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded. To some degree that is a plus though as you likely trust the maintainers of your OS distribution to provide stable, supported libraries.

As other commenters have said, perhaps this is an area where the Rust maintainers could provide some kind of extended standard library where they don't guarantee backwards compatibility forever, but do provide guarantees about ongoing fixes for security issues.

zahlman

> This was linked from the top comment on the Rust subreddit: https://wiki.alopex.li/LetsBeRealAboutDependencies

It was also posted here, shortly before this thread: https://news.ycombinator.com/item?id=43934343

(And several times in the past, too.)

> I think it makes a good point that some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded.

The point wasn't so much about the loading mechanism, but about the fact that the system (especially on Linux) provides them for you; a good amount come pre-installed, and the rest go through a system package manager so you don't have to worry about the language failing to have a good package system.

palata

> some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded.

Not in my case. I manually compile all the dependencies (either because I need to cross-compile, or because I may need to patch them, etc). So I clearly see all the transitive dependencies I need in C++. And I need a lot less than in Rust, by a long shot.

elabajaba

Part of the Rust dependency issue is that the compiler currently only multithreads at the crate level (this is slowly being improved on nightly, but there are still some bugs to fix before the parallel compiler can roll out), so most libraries split themselves up into a ton of small crates because otherwise they just take too long to compile.
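
For illustration, a minimal sketch of the workspace layout this pushes you towards (crate names hypothetical):

    # Workspace root Cargo.toml: each member crate is a separate
    # compilation unit, so cargo can build them in parallel and
    # recompile only the members that changed.
    [workspace]
    members = ["mylib-core", "mylib-parser", "mylib-codegen"]
    resolver = "2"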

edit: Also, `cargo-vet` is useful for distributed auditing of crates. There's also `cargo-crev`, but afaik it doesn't have buy-in from the megacorps like cargo-vet does, and last I checked it didn't have as many (or as consistent) reviews.

https://github.com/mozilla/cargo-vet

https://github.com/crev-dev/cargo-crev

jampekka

I'll take somewhat less stable dependencies over the total mess of C++ dependencies (CMake, shared libraries, version conflicts, etc.) any time. There's probably also a bit of an illusion about C++ transitive dependencies due to them usually being precompiled (because compiling them is such a pain).

ChocolateGod

The whole pkgconfig, cmake, autotools etc ecosystem is insane compared to how Rust and Go do things.

It's part of the reason why software distribution on Linux has been pushed to using containers, removing the point of having shared libraries. I think Google, with its C++ replacement (Carbon), plans on doing its own system.

skydhash

From my point of view, the issue stems from developers wanting to control distribution. Fine if it's for your own usage; not really if you're planning for others to use it. You will find the most convoluted build systems just because developers have a pet platform they want to support specially, making it hell to do anything on the others.

It could be better, but the current solutions (npm, go, python,...) favor only the developers, not the maintainers and packagers.

palata

> It's part of the reason why software distribution on Linux has been pushed to using containers

My understanding of people distributing their software in containers is that they can't be arsed to learn how to do it properly. They would install their software and ship the entire computer if that was cost effective.

perrygeo

> the philosophy in Rust is to pull many small packages

I'm not sure it's a philosophy, more a pragmatic consideration for compilation speeds. Anyone who's done a non-trivial amount of Rust knows that moment when the project gets too big and needs to be split into separate crates. It's kinda sad that you can't organize code according to proper abstractions; many times I feel forced to refactor for compiler performance.

antonvs

> Sure, but that means I won't write my project, because I won't write those things from scratch.

You need to think a bit harder about that, to help you decide whether your position is rational.

MeetingsBrowser

This confuses me as well. Is the implied solution to choose a language where you are forced to write those things from scratch?

palata

My point is that if everybody in the language's ecosystem were incentivised to use fewer dependencies, then a random library that I would not write myself (because it is an entire project in itself) would have fewer dependencies. Because that is not the case, either I take that library and accept its transitive dependencies, or I don't have a library at all.

In Rust, I'm sometimes actually tempted to wrap a C/C++ library (and its few dependencies) instead of getting the Rust alternative (and its gazillion dependencies).

palata

And you need to think a bit about that (probably not very hard), to help you decide whether I'm irrational or whether you may not have totally understood my point.

imtringued

I wasted 6 hours yesterday on getting the Bullet examples to compile outside of Bullet itself, with no success. It's more likely that a lot of software simply doesn't get written because C++ and CMake are a pain in the ass.

palata

I find CMake pretty easy, and I only use a few core features of it. Usually the pain comes from completely wrong setups by people who didn't learn the basics. But that's true of everything, I think.

Orangeair

I think that https://blessed.rs does a pretty good job of providing recommendations for things that probably can't be crammed into the standard library, but which you'll almost certainly end up needing at one point or another. I honestly like that system a lot, it makes it so that the only packages you need to worry much about are usually doing something rather specific.

bjackman

Also shout out to cargo-vet.

It lets you track what packages you "trust". Then you can choose to transitively trust the packages trusted by entities you trust.

This lets you have a policy like "importing a new 3rd party package requires a signoff from our dependency czar, but packages that Google claims to have carefully reviewed are fine".

You can also export varying definitions of "trust". E.g. Google exports statements like:

- "this package has unsafe code, one of our unsafe experts audited it and thinks it looks OK"

- "this package doesn't do any crypto"

- "this is a crypto library, one of our crypto experts audited it and thinks it looks ok"

https://github.com/google/rust-crate-audits/blob/main/auditi...

Basically it's a slightly more formal and detailed version of blessed.rs where you can easily identify all the "it's not stdlib, but, it's kinda stdlib" stuff and make it easily available to your team without going full YOLO mode.

It can also give you a "semi-YOLO" approach, it supports rationales like "this package is owned by a tokio maintainer, those folks know what they're doing, it's probably fine". I think this is a nice balance for personal projects.
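
For the curious, a minimal sketch of what this looks like on disk, assuming cargo-vet's supply-chain/config.toml layout (the import URL follows cargo-vet's documented pattern but should be checked against the linked repo; the exemption entry is made up):

    # supply-chain/config.toml
    # Transitively trust audits published by an organization you trust.
    [imports.google]
    url = "https://raw.githubusercontent.com/google/rust-crate-audits/main/audits.toml"

    # Crates nobody has reviewed yet sit in an exemptions list until audited.
    [[exemptions.some-small-crate]]
    version = "1.2.3"
    criteria = "safe-to-run"

Running `cargo vet` in CI then fails whenever a new or updated dependency is covered by neither an audit nor an exemption.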

zahlman

Would love to see something like this for Python.

the__alchemist

Did a review; this is solid!

nemothekid

I feel like leftpad has given package managers a very bad name. I understand the OP's hesitation, but it feels a little ridiculous to me.

tokio is a work-stealing, asynchronous runtime. This is a feature that, in other languages, would be part of the language itself. Does OP consider it reasonable to audit the entire Go runtime? Or the V8 engine for Node? V8 is ~10x more lines than tokio.

If Cloudflare uses Node, would you expect Cloudflare to audit V8 quarterly?

conradludgate

And for what it's worth, people do audit tokio. I have audited tokio. Many times in fact. Sure, not everyone will, but someone will :)

j1elo

How does one approach doing so? Do you open the main.rs file (or whichever one is the entry point) and start reading the code and referenced functions in a breadth-first-search (BFS) manner?

timewizard

If two different dependencies use a different version of some other dependency between them does cargo still include both versions by default?

This is something I've only ever seen cargo do.

rcxdude

It'll do that if there isn't a single version that meets both requirements. Which is a great thing, because most other languages will just fail the build in that case. (There are still cases where it won't work even in Rust, e.g. if types from those sub-dependencies are passed between the two closer dependencies.)

metaltyphoon

> If two different dependencies use a different version of some other dependency between them does cargo still include both versions by default?

Not if the requirements are semver-compatible: cargo will resolve them and pick the best single version. NuGet, for C#, does something very similar.
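
A sketch of the rule, with hypothetical crate names: if dep-a internally requires rand = "0.8.3" and dep-b requires rand = "0.8.5", cargo unifies them to a single rand 0.8.x. If dep-a instead required rand = "0.7", both 0.7.x and 0.8.x would end up in the build graph, as separate and mutually incompatible crates.

    # Cargo.toml of the application (dep-a and dep-b are made up)
    [dependencies]
    dep-a = "1"   # internally depends on rand = "0.8.3"
    dep-b = "1"   # internally depends on rand = "0.8.5" -> unified to one 0.8.x

`cargo tree --duplicates` lists every crate that appears at more than one version in the graph.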

pas

> This is something I've only ever seen cargo do.

npm does this (which causes [caused?] the node_modules directory to have a megazillion of files usually, but sometimes "hoisting" common dependencies helps, and there's Yarn's PnP [which hooks into Node's require() and keeps packages as ZIPs], and pnpm uses symlinks/hardlinks)

neilv

In the past (not in Rust, but other languages), for important systems, I've instituted policies of minimizing dependencies from these language-specific package repositories, and for the ones you do use, having to copy it to our own repos and audit each update before use.

But that's not practical for all situations. For example, Web frontend developer culture might be the worst environment, to the point you often can't get many things done in feasible time, if you don't adopt the same reckless practices.

I'm also seeing it now with the cargo-culting of opaque self-hosted AI tools and models. For learning and experimenting, I'd spend more time sufficiently compartmentalizing an individual tool than with using it.

This weekend, I'm dusting off my Rust skills, for a small open source employability project (so I can't invest in expensive dependency management on this one). The main thing bothering me isn't allocation management, but the sinking feeling when I watch the cast-of-thousands explosion of transitive dependencies for the UI and async libraries that I want to use. It's only a matter of time before one of those is compromised, if not already, and one is all it takes.
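
For Rust specifically, that copy-and-audit policy maps fairly directly onto `cargo vendor`, which checks every dependency into your own tree so each update shows up as a reviewable diff. It prints a source-replacement snippet along these lines for .cargo/config.toml:

    # .cargo/config.toml: redirect all crates.io downloads to the
    # checked-in vendor/ directory.
    [source.crates-io]
    replace-with = "vendored-sources"

    [source.vendored-sources]
    directory = "vendor"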

pjmlp

Best way is to have CI/CD systems only connected to the official internal repos.

Devs can add whatever they feel like on their workstations but it will be a sad build server if they get pushed without permission.

dsr_

s/Best way/The only safe way/

Anything else will get abused in the name of expediency and just-this-one-time.

Also, the process for adding a crate/gem/module/library needs to be the same as anything else: license review, code review, subscription to the appropriate mailing list or other announce channel, and assignment of responsibility. All of these except code review can be really, really fast once you have the process going.

All problems are, at least in part, dependency chain management problems.
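
In Rust, one way to mechanize the license-review part of that process is cargo-deny (not mentioned above, so consider this a hedged suggestion); a minimal deny.toml sketch, with an example allow-list:

    # deny.toml -- enforced by running `cargo deny check` in CI
    [licenses]
    # Fail the build if any crate in the tree uses a license not listed here.
    allow = ["MIT", "Apache-2.0", "BSD-3-Clause"]

    [bans]
    # Surface accidental duplicate versions of the same crate.
    multiple-versions = "warn"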

sunrunner

I agree that some amount of friction when including third-party dependencies is vital to push people to consider the value versus the cost of a dependency (and license review, code review, and channel subscriptions are all incredibly important and almost always overlooked). However, how should this work for transitive dependencies? And the dependencies of _those_ dependencies?

The dependency trees for most interpreted or source-distributed languages are ridiculous, and review of even a few of those seems practically impossible in a lot of development environments.

MeetingsBrowser

> Devs can add whatever they feel like on their workstations

A compromised dev machine is also a problem.

pjmlp

True, hence we can go to the next level and also deal with limited accounts for developers, and I can tell you most folks on HN would hate to work in such corporate environments.

wofo

There are some voices trying to address this security risk (e.g. the proponents of this new RFC: https://github.com/rust-lang/rfcs/pull/3810). However, for some reason (probably culture) there isn't much momentum yet to change the status quo.

dathinab

> isn't much momentum yet to change the status quo.

it's a complex problem with tons of partial solutions, each of which has tons of possible implementations, and often there's no clear winner

i.e. it's the kind of problem that is hard to solve by consensus

e.g. the idea of an extended standard library is old (around since the beginning of Rust), but for years it was believed it's probably best to make it a separate, independent project/library, for various reasons. One being that the saying "the standard library is the place where code goes to die" has been quite true for multiple ecosystems (most noticeably Python)

as a side note, an extended standard library wouldn't reduce the LOC count, it would increase it, as long as you fully measure LOCs and don't "skip" over some dependencies

cogman10

The Rust RFC process has, frankly, become somewhat of a CF.

There are literally thousands of RFCs for Rust, with only a small handful that have been integrated. Having this forest, IMO, makes it hard for any given proposal to really stand out. Further, it makes duplicated effort almost inevitable.

Rust's RFC process is effectively a dead letter box for most.

geodel

I think they could constitute a committee for the RFC review process (in case there is none today), and based on its recommendations, multiple domain-specific teams/groups could be created to review RFCs in a timely manner.

eddd-ddde

The cool thing about Rust is that you can implement async yourself. You aren't tied to any specific implementation.
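
Concretely: Future is just a trait in std, and the polling contract is small enough that a toy executor fits in a screenful. A minimal sketch using only std (no external crates):

    use std::future::Future;
    use std::pin::pin;
    use std::sync::{Arc, Condvar, Mutex};
    use std::task::{Context, Poll, Wake, Waker};

    // One-shot "parker": wake() flips a flag and notifies the blocked thread.
    struct Parker {
        woken: Mutex<bool>,
        condvar: Condvar,
    }

    impl Wake for Parker {
        fn wake(self: Arc<Self>) {
            *self.woken.lock().unwrap() = true;
            self.condvar.notify_one();
        }
    }

    // Drive a future to completion on the current thread, sleeping between polls.
    fn block_on<F: Future>(fut: F) -> F::Output {
        let mut fut = pin!(fut);
        let parker = Arc::new(Parker { woken: Mutex::new(false), condvar: Condvar::new() });
        let waker = Waker::from(parker.clone());
        let mut cx = Context::from_waker(&waker);
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(out) => return out,
                Poll::Pending => {
                    // Block until some wake() signals that progress is possible.
                    let mut woken = parker.woken.lock().unwrap();
                    while !*woken {
                        woken = parker.condvar.wait(woken).unwrap();
                    }
                    *woken = false;
                }
            }
        }
    }

    fn main() {
        // async blocks are inert values until polled; no runtime is implied.
        println!("{}", block_on(async { 40 + 2 }));
    }

Real executors add task queues, timers, and an IO reactor on top (which is where the dependency weight comes from), but nothing in the language ties you to tokio specifically.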

elcritch

Except that libraries using different async libraries in Rust seem generally incompatible.

pjmlp

Same in C++, partially true in .NET/C# and F#.

dboreham

Or not use async at all.

schmichael

We need a term like “Mature” or similar for dependencies that are done. Mature dependencies have two characteristics:

1. Well defined scope

2. Infrequent changes

Nomad has many of these (msgpack, envparse, cli, etc). These dependencies go years without changing so the dependency management burden rapidly approaches zero. This is an especially useful property for “leaf” dependencies with no dependencies of their own.

I wish libraries could advertise their intent to be Mature. I’d choose a Mature protobuf library over one that constantly tweaked its ergonomics and performance. Continual iterative improvement is often a boon, but sometimes it’s not worth the cost.

procaryote

Java did this sometimes by essentially adding slightly tidied-up versions of whatever was the de-facto standard to the standard library. Java 1.3 didn't have regexes, but most people were using the same Apache library, so Java 1.4 added regexes that looked exactly like that. Java's date handling was a pain, so people mostly used Joda-Time; a later Java version added something that mostly works like Joda-Time. Etc.

It is an easy way to get a somewhat OK standard library as the things you add became popular on their own merits at some point.

Once added, the lowest-friction path is to just use the standard library, and as it is the standard library, you have a slightly better hope that someone will care to maintain it. You can still build a better one if needed for your use case, but the batteries are included for basic usage.

delusional

I have a lot of sympathy for this viewpoint, but I also ask that we remind ourselves: we are asking for professionalism from hobby projects.

If you want a mature protobuf implementation, you should probably buy one. Expecting some guy/gal on the internet to maintain one for you for free seems ill-advised.

pclmulqdq

> I have a lot of sympathy for this viewpoint, but I also ask that we try to remind ourselves. We are asking for professionalism from hobby projects.

Nobody is asking for professional quality standards from hobby projects. At best, they are asking for hobby projects to advertise themselves as such, and not as "this is a library for [x] that you can use in your stuff with the expectations of [maintenance/performance/compatibility/etc.]."

Resume-driven development seems to cause people to oversell their hobby projects as software that is ready to have external users.

> If you want a mature protobuf implementation you should probably buy one

No software is ever developed this way. For some reason, libraries are always free. Approximately nobody will buy paid libraries.

delusional

> At best, they are asking for hobby projects to advertise themselves as such

That's also work. You don't get to ask the hobby programmer to do your work of vetting serious/maintained projects for you. As the professional with a job, you have to do that. If some rando on GitHub writes in their readme that it's maintained but lies, you're the idiot for believing him. He's probably 12 years old, and you're supposedly a professional.

> No software is ever developed this way.

That's just inaccurate. In my day job we pay for at least 3-4 third-party libraries that we either have support contracts on or that were developed for us along with a support contract. Besides those, there's also the myriad of software products we pay for: databases, editors, Prometheus, Grafana.

Software people really underestimate how much business guys are willing to pay for having somebody to call. It's not "infinitely scalable" in the way VC's love, but it's definitely a huge business opportunity.

jeffparsons

> For some reason, libraries are always free. Approximately nobody will buy paid libraries.

I suspect this is in no small part because figuring out a licensing (edit: pricing!) model that is both appealing to consumers and sustainable for authors is damn near impossible.

procaryote

Isn't that an argument _for_ having a "mature" label? To avoid the hobbyists who have no intention to maintain their thing?

Also there are lots of lovely projects maintained at high levels by hobbyists, and plenty of abandonware that was at some point paid for

delusional

> Also there are lots of lovely projects maintained at high levels by hobbyists, and plenty of abandonware that was at some point paid for

There certainly are. I would never say to disregard something because it was a hobby project. You just don't get to expect it to be that way.

My basic point is that a hobby project can never take responsibility. If you have a support contract you are allowed to have some expectation of support. If you do not, then no expectation is warranted and everything you get is a gift.

A "mature" label carries the same problem. You are expecting the author to label something for you. That's work. If you're pulling from the commons, you must respect that people can label stuff whatever they like, and unmotivated blanket lies are not illegal.

schmichael

A great point! All of the libraries I mentioned are created and maintained by corporations. Hobbyists, as always, are free to do as they please without judgement from me. :)

I will say I get great satisfaction from the little envparse library I wrote needing near-0 maintenance. It’s a rare treat to be able to consider any project truly done.

sethops1

I feel like the Go ecosystem almost serendipitously has this built in - modules marked v0.X.Y being immature and under development, and v1 or greater being mature, keeping changes mostly down to bug fixes. I think some folks may even follow this convention!

Charon77

One of the good things about cargo packages is feature flags. If a crate pulls in too many dependencies, it's time to open an issue or PR to hide them behind feature flags. I do that a lot with packages that require std even though they could do with core and alloc.

cargo tree helps a lot with viewing the dependency tree. I forget whether it does a LoC count or not.
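
A minimal sketch of that gating pattern (the crate and feature names are illustrative); the library's lib.rs would pair this with #![cfg_attr(not(feature = "std"), no_std)]:

    # Library Cargo.toml: heavy or std-only dependencies are opt-in.
    [dependencies]
    serde = { version = "1", optional = true, default-features = false }

    [features]
    default = ["std"]
    # no_std users disable this and build against core + alloc only.
    std = []
    serde = ["dep:serde"]

`cargo tree -e features` shows which features each dependency edge enables, which helps when deciding what to gate.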

> to see what lines ACTUALLY get compiled into the final binary,

This doesn't really make much sense, as many of the functions that make it into the binary get inlined so heavily that they often become part of the 'main' function.

ramon156

100%, I still miss feature flags in npm. Is there a package manager that can do this already? I'd love to expand our internal libs with framework-specific code

rs186

I once wanted to contribute to the popular swc project (https://github.com/swc-project/swc). I cloned the repo, ran the build, and a whopping 20GB was gone from my disk. The parser itself (https://github.com/swc-project/swc/blob/main/crates/swc_ecma...) has over a dozen dependencies, including serde.

Meanwhile, the heaviest JavaScript parser implemented in JavaScript is more lightweight.

I decided that I should leave this project alone and spend my time elsewhere.

EasyMark

I just built it with "git clone --depth 1 ..." and the build from cargo build --release is 2.9GB (2.3GB in the target folder)?

pornel

Rust generates absurd amounts of debug info, so the default debug builds are much much larger.

Zero-cost abstractions don't have zero-cost debug info. In fact, all of the optimized-away stuff is intentionally preserved with full fidelity in the debug info.
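
If the debug-info weight is the problem, it can be dialed down per profile; a sketch using standard Cargo profile options:

    # Cargo.toml: keep only file/line tables in dev builds instead of
    # full debug info (or set debug = false to drop it entirely).
    [profile.dev]
    debug = "line-tables-only"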

eviks

You should also add the dev build

constantcrying

I count 13 dependencies; the rest are internal ones. Are any of these superfluous, or only needed for small edge cases? Serde seems like exactly the kind of case where you absolutely should use an external dependency.

Also, repository size seems an extremely irrelevant metric.

rs186

13 > 12, so over a dozen dependencies. If you look at acorn or babel/parser, they barely have any dependencies.

Repository size is directly related to how long it takes to run a build, which is extremely important if I'm going to contribute to the project.

> Serde seems exactly a case where you absolutely should use an external dependency.

I can't see any reason a parser has a hard dependency on a serialization library.

constantcrying

>13 > 12 so over a dozen dependencies. If you look at acorn or babel/parser, they barely have any dependency.

Which ones are superfluous?

There are good reasons to use dependencies. If someone has solved a problem you need to solve as well it is pointless to duplicate the effort.

>Repository size is directly related to how long it takes to run a build, which is extremely important if I were to contribute to the project.

Totally false. There is zero inherent relation.

>I can't see any reason a parser has a hard dependency on a serialization library.

And because you can't see a reason there is none?

It is totally meaningless to talk about any of this if you cannot point out why it is superfluous.

MeetingsBrowser

I agree that relying on unknown dependencies is a risk, but this misses the point IMO. Number of dependencies and disk space are kind of arbitrary.

> Meanwhile, the heaviest JavaScript parser implemented in JavaScript is more lightweight.

The lightest-weight JavaScript program relies on V8 to run, which has multiple orders of magnitude more dependencies, most of which you have never heard of.

At least cargo makes it easier to get a clearer picture of what the dependencies are for a program.

pixl97

Number of dependencies isn't exactly arbitrary...

If you have one huge dep, it's easier to keep track that you're on the latest update, and it's much less likely you'll fat-finger it and import something typosquatting.

Also, if you're in enterprise, you'll have fewer 100-page SBOM reports.

MeetingsBrowser

What is more likely to be vulnerable: a 100k LoC project developed by ten people, or ten 10k LoC single-maintainer projects?

Keeping track of the latest version is trivial with cargo.

rs186

No, it has very little to do with V8 or any runtime. Those parsers run on any decent, recent-enough runtime, including browsers and Node.js. If you look at the actual code, they use basic APIs in the JavaScript language that you can find in almost any other language.

munificent

> relies on V8 to run, which has multiple orders of magnitude more dependencies.

Actually, this isn't true. (Or at least wasn't a while back.) I used to work with a bunch of ex-V8 folks and they really despised third-party dependencies and didn't trust any code they didn't write. They used a few third-party libs, but for the most part they tried to own everything themselves.

dathinab

they are also Google

.. as in they can afford to rewrite everything

.. can afford to suffer from not invented here syndrome

.. and are under _massive_ threat of people doing supply chain attacks compared to most other projects (as they end up running on nearly any desktop computer and half the phones out there)

this just isn't viable for most projects, and not just resource/time-investment-wise: rewriting everything isn't exactly good for reducing bugs either, unless you have reliable access to both resources _and_ expertise. Most companies have to live with having many very average developers, and very tight resource limits.

XxiXx

I think it's a "cultural" thing. With Go you often find developers/projects proudly mentioning that they use no, or just a few, non-std dependencies. Coming from Go, it really feels strange when you see pages of dependencies scrolling over your screen as you build a Rust project.

api

Go has a fatter standard library and a "fat" runtime with built-in green threads (an asynchronous runtime basically) and garbage collection, so you get more out of the box and thus end up using fewer dependencies.

sophacles

I have yet to come across a go project that doesn't pull in tons of 3rd party code as well. It seems like maybe you're over-stating the "culture" a bit.

hu3

> I have yet to come across a go project that doesn't pull in tons of 3rd party code as well.

These have zero dependencies. It's not rare in Go land.

- https://github.com/go-chi/chi 19k stars

- https://github.com/julienschmidt/httprouter 16k stars

- https://github.com/gorilla/mux 21k stars

- https://github.com/spf13/pflag 2.6k stars

- https://github.com/google/uuid 5.6k stars

Many others have just a few dependencies.

meling

Yeah, while I've seen some great libraries that follow the practice of minimizing their dependencies, I'm a bit annoyed by the number of dependencies that docker brings along [1]. I've been on the lookout for alternatives for my docker needs, but the state of podman, buildah, and some others that I checked is similar: they all bring in roughly the same number of dependencies… If anyone knows of a stripped-down Go lib that can be used to build from a Dockerfile, pull, and run a container, I would be grateful for any suggestions. Heck, docker/moby isn't even using go.mod proper.

[1] https://github.com/moby/moby/blob/master/vendor.mod

lantastic

Wow, that's massive. I guess it's inevitable that a popular piece of open-source software for end-users will be compelled to accrue dependencies due to popular demand for features that require them.

I feel Telegraf made a good compromise: out of the box it comes with a _ton_ of stuff[1] to monitor everything, but they make it possible to build with only the pieces you need via build tags, and they even provide a tool to extract said tags from your Telegraf config[2]. But lots of supply-chain security tooling assumes everything in go.mod is used, so that can result in a lot of noise.

[1] https://github.com/influxdata/telegraf/blob/master/go.mod [2] https://github.com/influxdata/telegraf/tree/master/tools/cus...
