
Fat Rand: How many lines do you need to generate a random number?

BoardsOfCanada

One reason to care about this is that random number generators are a very lucrative target for security services to attack. See for example Dual_EC_DRBG https://eprint.iacr.org/2015/767.pdf or Linus Torvalds' answer about the NSA approaching him about a backdoor: https://www.youtube.com/watch?v=wwRYyWn7BEo

isoprophlex

> So let's vendor it. How much code is there? After removing all tests, we end up with 29 individual crates vendored taking up 62MB disk space. Tokei reports 209,150 lines of code.

> Now this is a bit misleading, because like many times most of this is within windows-*. But how much of windows-* does getrandom need? A single function.

See also the Azure CLI. There's a github issue, still open, from 2018 complaining about their 350 MB monstrosity bloating up a docker container. By now the thing weighs more than 1.5 GB. Fucking MS.

https://github.com/Azure/azure-cli/issues/7387

buildbot

I assume, as an ex-Microsoft person, that’s due to shipping the org chart at a team level. Each subpackage of the Azure CLI probably has its own team that hates talking to anyone else, so it implements its own versions of just what it needs in its subpackage. They probably have more than one Python included at this point because some team needed Python 3.12 and another is still on 3.6 and claims updating would take a year with a team of 10.

(Disclaimer, I didn’t work anywhere near this, and am just making up a guess).

isoprophlex

beautiful, because this is probably exactly what is happening here... all the subcommands bring their own jungle of crap along.

the_mitsuhiko

The reason why the windows-sys (and below the windows-targets) crate are so beefy is that they are basically a bunch of binary blobs that are needed to link stuff together due to how import libs work on Windows. https://kennykerr.ca/rust-getting-started/understanding-wind...

In theory that would not be necessary any more on more modern rustc versions, but if you want to target rustc < 1.70 you still need that.

adolph

Any system that doesn’t ruthlessly self edit winds up like a genomic katamari.

Systems are like babies: once you get one, you have it.[a.] They don’t go away. On the contrary, they display the most remarkable persistence. They not only persist; they grow. And as they grow, they encroach. The growth potential of Systems was explored in a tentative, preliminary way by Parkinson, who concluded that Administrative Systems maintain an average rate of growth of five to six percent per annum (corrected for inflation) regardless of the work to be done.

- from Systemantics by John Gall

0. https://en.wikipedia.org/wiki/Junk_DNA (Well aware of how “junk” DNA is linked to functions elsewhere, and that the subtleties involved represent an evolutionary aggregation equivalent to why acli is 1.5G.)

1. https://en.wikipedia.org/wiki/Katamari_Damacy

0. https://en.wikipedia.org/wiki/Systemantics

vlovich123

I think that the rand crate is much bigger than it needs to be and is conflating unrelated concepts *. This is a sore spot for the stdlib: it should standardize one PRNG and one CSPRNG so that they’re available on all platforms by default, along with the types that everyone can use, so that you can properly plug in whatever PRNG / CSPRNG you want. It should also standardize what interfaces random distributions should conform to and implement really common ones like Normal & Uniform. Those two changes alone would remove the need for the vast majority of dependencies, especially if a crate wants to delegate selection of the RNG to its users.

That being said, I simultaneously think the concerns are slightly overblown on the safety part. Having stable pillar crates that everyone builds around is a good thing not a bad thing. The build issues for things like that should be solved at the language/tooling level (e.g. pulling in a crate for 1 function should be trivially cheap) while relying on the network effects of auditing the components (i.e. it’s OK to rely on a crate with a stronger chain of trust than you have yourself).

* To be fair, they call out alternatives that you might find more appealing, but the type and module system being what it is (+ the name rand being so concise and appealing when you come at it with a first glance), it becomes the de facto standard.
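
To make the proposal above concrete, here is a minimal sketch of what such standardized interfaces might look like. Everything in it is an assumption for illustration: the names RandomSource, CryptoSecure, Distribution, UniformU64 and roll_die are hypothetical and are not part of the Rust stdlib or of the rand crate's actual API. SplitMix64 stands in for "the one default PRNG"; it is not cryptographically secure.

```rust
/// Hypothetical core trait: anything that can produce random 64-bit words.
pub trait RandomSource {
    fn next_u64(&mut self) -> u64;
}

/// Hypothetical marker trait a CSPRNG would implement in addition.
pub trait CryptoSecure: RandomSource {}

/// Hypothetical interface that every distribution conforms to.
pub trait Distribution<T> {
    fn sample<R: RandomSource + ?Sized>(&self, rng: &mut R) -> T;
}

/// SplitMix64: small and fast, NOT cryptographically secure.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }
}

impl RandomSource for SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Uniform integers in the half-open range [low, high).
pub struct UniformU64 {
    low: u64,
    span: u64,
}

impl UniformU64 {
    pub fn new(low: u64, high: u64) -> Self {
        assert!(low < high, "range must not be empty");
        Self { low, span: high - low }
    }
}

impl Distribution<u64> for UniformU64 {
    fn sample<R: RandomSource + ?Sized>(&self, rng: &mut R) -> u64 {
        // Rejection sampling: discard the lowest (2^64 mod span) values so
        // the accepted values map evenly onto the span (no modulo bias).
        let threshold = self.span.wrapping_neg() % self.span;
        loop {
            let x = rng.next_u64();
            if x >= threshold {
                return self.low + x % self.span;
            }
        }
    }
}

/// A library function that leaves the choice of generator to its caller.
pub fn roll_die<R: RandomSource + ?Sized>(rng: &mut R) -> u64 {
    UniformU64::new(1, 7).sample(rng)
}
```

A caller would write something like `roll_die(&mut SplitMix64::new(42))`. The point of the design is that roll_die only names the trait, so a crate exposing it would not need to depend on any particular generator.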

dwattttt

I think it's more a conscious decision to default to a CSPRNG unless you know better. If you don't know the difference, a safer default means fewer problems.

EDIT: it's a similar situation to general purpose allocators. If you know you can use a simple one, it's orders of magnitude less code and complexity than a general one.

vlovich123

The Rust stdlib doesn’t come with a CSPRNG so the point is moot. On the other hand, if you’re rolling your own crypto you deserve what you get if you don’t know the difference between CSPRNG and PRNG & you can name them differently even (e.g. rand(), insecure_rand()).

glitchc

I'm not sure if this is possible. Not all systems have access to an entropy source of the same quality.

vlovich123

Then the CSPRNG API just wouldn’t be available (just the traits). There are plenty of platform-specific APIs available within the stdlib. The PRNG would always be available though.
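
As a rough sketch of the "just wouldn't be available" idea, conditional compilation can make a function exist only on targets that have a suitable entropy source. The names secure_random_bytes, secure_u64 and HAS_SECURE_RNG are hypothetical, and reading /dev/urandom is only an assumption about what a Unix backend could do, not how the stdlib or getrandom actually implement it.

```rust
/// True only on targets where the secure API below is compiled in.
pub const HAS_SECURE_RNG: bool = cfg!(unix);

/// Fill `buf` from the OS entropy source.
/// Only compiled on Unix-like targets; elsewhere the symbol does not exist,
/// so code that calls it fails at compile time instead of at runtime.
#[cfg(unix)]
pub fn secure_random_bytes(buf: &mut [u8]) -> std::io::Result<()> {
    use std::fs::File;
    use std::io::Read;
    File::open("/dev/urandom")?.read_exact(buf)
}

/// Convenience wrapper, gated the same way as the function it relies on.
#[cfg(unix)]
pub fn secure_u64() -> std::io::Result<u64> {
    let mut bytes = [0u8; 8];
    secure_random_bytes(&mut bytes)?;
    Ok(u64::from_ne_bytes(bytes))
}
```

Portable code would then either gate its own callers with the same `#[cfg(unix)]` attribute or fall back to a non-secure PRNG under a clearly different name.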

portaltonowhere

I agree with his sentiments in the article. I love Rust as a PL, but the situation with certain crates and dependency trees is a bit of a nightmare IMO. It's certainly a trade off.

I recently ripped out the rand crate and replaced it with some much simpler code ported from a C++ codebase. Still does what I need it to do but way fewer LOC and way less complexity. Is it as flexible as what rand and related crates offer? Maybe not, but that flexibility comes at a cost.
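
For a sense of scale, a hand-rolled replacement of this kind can be quite small. The following is not the code described above (which isn't shown in the thread), just a sketch assuming a xorshift64* generator is good enough for the use case. It is not cryptographically secure, and the shuffle uses a plain modulo, which carries a tiny bias that the rand crate's range sampling avoids.

```rust
/// xorshift64*: a few lines of state mixing, NOT cryptographically secure.
pub struct XorShift64Star {
    state: u64,
}

impl XorShift64Star {
    pub fn new(seed: u64) -> Self {
        // The state must never be zero, or the generator gets stuck at zero.
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        Self { state }
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.state = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Float in [0, 1), built from the top 53 bits of the next output.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
    }

    /// Fisher-Yates shuffle. The modulo introduces a very small bias.
    pub fn shuffle<T>(&mut self, items: &mut [T]) {
        for i in (1..items.len()).rev() {
            let j = (self.next_u64() % (i as u64 + 1)) as usize;
            items.swap(i, j);
        }
    }
}
```

What it gives up compared to rand is exactly the flexibility mentioned above: no pluggable generators, no distributions beyond what you write yourself, and no OS-backed seeding.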

Ygg2

I also disagree. First off, rand is working on simplifying this. Plus, out of those dependencies, it's hard to see something I'd rather do myself than trust other people with.

windows-sys is necessary for the Windows OS kernel, libc is a similar thing for *nix, and cfg-if is necessary for specializing targets per OS, arch, or SIMD capabilities.

Biggest offender is honestly zerocopy-derive. Which pulls in most dependencies.

kstrauser

I love Armin’s blog. I don’t always agree with everything he says, but always come away with a new appreciation for his POV.

the_mitsuhiko

Thank you for that. I appreciate this.

dvh

    function rand() {
      return 4;
    }

cwmoore

    function rand() {
      return ++4++;
    }

malcolmgreaves

Does --release do tree shaking to remove unused code in the final executable?

the_mitsuhiko

Rust's compiler is very good at removing most unused code. You are however going to pay a lot for the compilation. With some of those common dependencies you frequently end up with multiple versions in your dependency tree, since not all libraries move up to the latest version. For instance, today I have three different versions of windows-sys in my dependency tree and three zerocopy versions.

malcolmgreaves

I see — thank you for the information!

Do you happen to also know if there is any ongoing work in rustc to make compilation faster for this situation?

I’m imagining that one could typecheck and then see if ASTs are used / unused and eliminate them before generating code. Maybe that would speed up compilation? Perhaps this is already being done.

the_mitsuhiko

Rust's compilation unit is an entire crate. There is not much that can be done here as far as I can tell without changing the compilation model. It's not like C++ where you can just compile individual object files. You really are hoping that the linker cleans it up.