
Fat Rand: How many lines do you need to generate a random number?

isoprophlex

> So let's vendor it. How much code is there? After removing all tests, we end up with 29 individual crates vendored taking up 62MB disk space. Tokei reports 209,150 lines of code.

> Now this is a bit misleading, because like many times most of this is within windows-\*. But how much of windows-\* does getrandom need? A single function.

See also the Azure CLI. There's a github issue, still open, from 2018 complaining about their 350 MB monstrosity bloating up a docker container. By now the thing weighs more than 1.5 GB. Fucking MS.

https://github.com/Azure/azure-cli/issues/7387

buildbot

I assume, as an ex-Microsoft person, that’s due to shipping the org chart at a team level. Each subpackage of the Azure CLI probably has its own team that hates talking to anyone else, so it implements its own versions of just what it needs in its subpackage. They probably have more than one Python included at this point because some team needed Python 3.12 and another is still on 3.6 and claims updating would take a year with a team of 10.

(Disclaimer, I didn’t work anywhere near this, and am just making up a guess).

mmh0000

There's a great meme of the team structure at Microsoft:

https://www.reddit.com/r/ProgrammerHumor/comments/6jw33z/int...

_blk

Nice! Any idea how Tesla would look?

db48x

quietbritishjim

Maybe you could have just told us what the "unbreakable law" is, so we can all see in 2 seconds? Instead of posting a link to video that doesn't seem very keen to tell us even in the first couple of minutes?

isoprophlex

beautiful, because this is probably exactly what is happening here... all the subcommands bring their own jungle of crap along.

adolph

Any system that doesn’t ruthlessly self edit winds up like a genomic katamari.

Systems are like babies: once you get one, you have it.[a.] They don’t go away. On the contrary, they display the most remarkable persistence. They not only persist; they grow. And as they grow, they encroach. The growth potential of Systems was explored in a tentative, preliminary way by Parkinson, who concluded that Administrative Systems maintain an average rate of growth of five to six percent per annum (corrected for inflation) regardless of the work to be done.

- from Systemantics by John Gall

0. https://en.wikipedia.org/wiki/Junk_DNA (Well aware of how “junk” DNA is linked to functions elsewhere, and that the subtleties involved represent an evolutionary aggregation equivalent to why acli is 1.5G.)

1. https://en.wikipedia.org/wiki/Katamari_Damacy

0. https://en.wikipedia.org/wiki/Systemantics

the_mitsuhiko

The reason why the windows-sys crate (and, below it, windows-targets) is so beefy is that these crates are basically a bunch of binary blobs that are needed to link stuff together due to how import libs work on Windows. https://kennykerr.ca/rust-getting-started/understanding-wind...

In theory that would not be necessary any more on more modern rustc versions, but if you want to target rustc < 1.70 you still need that.

akx

That's my issue! <3

EDIT: It's gotten even worse in the last 6 years!

portaltonowhere

I agree with his sentiments in the article. I love Rust as a PL, but the situation with certain crates and dependency trees is a bit of a nightmare IMO. It's certainly a trade off.

I recently ripped out the rand crate and replaced it with some much simpler code ported from a C++ codebase. Still does what I need it to do but way fewer LOC and way less complexity. Is it as flexible as what rand and related crates offer? Maybe not, but that flexibility comes at a cost.
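For a sense of scale, here is a minimal sketch of how small a non-cryptographic generator can be. It assumes xorshift64 (Marsaglia's 13/7/17 shifts) as a stand-in; it is not the code described above, and the `XorShift64` name and the zero-seed fallback constant are made up for illustration. It is not suitable for anything security-related.

```rust
/// Tiny non-cryptographic PRNG (xorshift64). Hypothetical stand-in for
/// "much simpler code"; not taken from the thread or from the rand crate.
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    /// The state must never be zero (zero maps to zero forever), so a zero
    /// seed is swapped for an arbitrary nonzero constant.
    pub fn new(seed: u64) -> Self {
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        Self { state }
    }

    /// Advance the state and return the next 64-bit value.
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Float in [0, 1), built from the top 53 bits of the next value.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}
```

Usage would be along the lines of `let mut rng = XorShift64::new(42); let x = rng.next_f64();`. The trade-off is the one named above: no distributions, no OS entropy, no seeding helpers, just a few lines to audit.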

Ygg2

I also disagree. First off, rand is working on simplifying this. Plus, out of those dependencies it's hard to see something I'd rather do myself than trust other people with.

windows-sys is necessary for the Windows OS kernel, libc is a similar thing for *nix, and cfg-if is necessary for specializing targets per OS, arch, or SIMD capabilities.

The biggest offender is honestly zerocopy-derive, which pulls in the most dependencies.

jicea

I really like Armin's "food for thought" articles and I'm also concerned with the attitude toward dependencies in Rust. I like the language a lot, but I cringe when I clone some project and see the number of dependencies... It's really a balance: when you add a dependency you get a lot of code for "free", but nothing is free: now you potentially have new bugs, security failures, and things to update.

On the article, two random thoughts:

- I like that the Rust standard lib is "tiny" and a lot of stuff is delegated to third-party crates. I wish crates supported namespaces (for instance std) so it's easier to see which dependencies are blessed. For the moment, you can obtain a good name, like http, and squat it forever (http being just an example, don't know if there is an actual crate)

- when you vendor a Rust project, does it also vendor the optional (feature-flag) dependencies? For instance, Rand depends on Serde because of an optional flag for serializing a random generator. Serde's LOC should be ignored in that case if we count the code lines.

vlovich123

I think that the rand crate is much bigger than it needs to be and is conflating unrelated concepts *. This is a sore spot for the stdlib - it should standardize 1 PRNG and a CSPRNG so that they’re available on all platforms as a default and the types that everyone can use so that you can properly plug in whatever PRNG / CSPRNG that you want. It should also standardize what interfaces random distributions should conform to and implement really common ones like Norm & Uniform. Those two changes alone would remove the need for the vast majority of dependencies, especially if a crate wants to delegate selection of the RNG to their users.
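A rough sketch of what that split could look like, to make the idea concrete. Everything here is hypothetical: `RandomSource`, `Distribution`, `Uniform`, `Counter`, and `jitter` are illustrative names, not anything in the Rust stdlib or a proposal that exists, and `Counter` is a toy source rather than a real PRNG.

```rust
/// Hypothetical core trait: anything that can produce raw 64-bit words.
pub trait RandomSource {
    fn next_u64(&mut self) -> u64;
}

/// Hypothetical distribution trait: turns raw words into values of type T.
pub trait Distribution<T> {
    fn sample<R: RandomSource>(&self, rng: &mut R) -> T;
}

/// Uniform floats in [low, high), up to floating-point rounding.
pub struct Uniform {
    pub low: f64,
    pub high: f64,
}

impl Distribution<f64> for Uniform {
    fn sample<R: RandomSource>(&self, rng: &mut R) -> f64 {
        // The top 53 bits give a float in [0, 1).
        let unit = (rng.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        self.low + unit * (self.high - self.low)
    }
}

/// Toy source used only to exercise the traits: a strided counter,
/// not a real PRNG and certainly not a CSPRNG.
pub struct Counter(pub u64);

impl RandomSource for Counter {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        self.0
    }
}

/// A library function that leaves the choice of generator to its caller.
pub fn jitter<R: RandomSource>(rng: &mut R) -> f64 {
    Uniform { low: -1.0, high: 1.0 }.sample(rng)
}
```

The point of the sketch is the last function: a crate that only names the trait does not need to pull in any particular generator, so the caller decides whether a PRNG or a CSPRNG sits behind it.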

That being said, I simultaneously think the concerns are slightly overblown on the safety part. Having stable pillar crates that everyone builds around is a good thing not a bad thing. The build issues for things like that should be solved at the language/tooling level (e.g. pulling in a crate for 1 function should be trivially cheap) while relying on the network effects of auditing the components (i.e. it’s OK to rely on a crate with a stronger chain of trust than you have yourself).

* To be fair, they call out alternatives that you might find more appealing, but the type and module system being what it is (+ the name rand being so concise and appealing when you come at it with a first glance), it becomes the de facto standard.

dwattttt

I think it's more a conscious decision to default to a CSPRNG unless you know better. If you don't know the difference, a safer default means fewer problems.

EDIT: it's a similar situation to general purpose allocators. If you know you can use a simple one, it's orders of magnitude less code and complexity than a general one.

vlovich123

The Rust stdlib doesn’t come with a CSPRNG so the point is moot. On the other hand, if you’re rolling your own crypto you deserve what you get if you don’t know the difference between CSPRNG and PRNG & you can name them differently even (e.g. rand(), insecure_rand()).

dwattttt

This was in response to the size of rand & dependencies, trying to be the safe default for every use. I don't have a strong opinion on whether it belongs in the stdlib.

glitchc

I'm not sure if this is possible. Not all systems have access to an entropy source of the same quality.

vlovich123

Then the CSPRNG API just wouldn’t be available (just the traits). There are plenty of platform-specific APIs available within the stdlib. The PRNG would always be available though.

koakuma-chan

> All of these are great crates, but do I need all of this just to generate a random number?

You can use getrandom directly.

    getrandom v0.3.1
    ├── cfg-if v1.0.0
    └── libc v0.2.169

koakuma-chan

I personally use aws_lc_rs's random since I have it for TLS anyway, https://docs.rs/aws-lc-rs/latest/aws_lc_rs/rand/index.html

kstrauser

I love Armin’s blog. I don’t always agree with everything he says, but always come away with a new appreciation for his POV.

the_mitsuhiko

Thank you for that. I appreciate this.

malcolmgreaves

Does --release do tree shaking to remove unused code in the final executable?

the_mitsuhiko

Rust's compiler is very good at removing most unused code. You are however going to pay a lot for the compilation. In case of some of those common dependencies you are not infrequently ending up with them multiple times in your dependency tree since not all libraries move up to the latest version. For instance today I have three different versions of windows-sys in my dependency tree and three zerocopy versions.

malcolmgreaves

I see — thank you for the information!

Do you happen to also know if there is any ongoing work in rustc to make compilation faster for this situation?

I’m imagining that one could typecheck and then see if ASTs are used / unused and eliminate them before generating code. Maybe that would speed up compilation? Perhaps this is already being done.

the_mitsuhiko

Rust's compilation unit is an entire crate. There is not much that can be done here as far as I can tell without changing the compilation model. It's not like C++ where you can just compile individual object files. You really are hoping that the linker cleans it up.

BoardsOfCanada

One reason to care about this is that random number generators are a very lucrative target for security services to attack. See for example Dual EC https://eprint.iacr.org/2015/767.pdf or Linus Torvalds' answer about NSA approaching him about a backdoor: https://www.youtube.com/watch?v=wwRYyWn7BEo

graycat

Random number generation?

There still is:

     R. R. Coveyou, R. D. MacPherson,
     "Fourier Analysis of Uniform Random
     Number Generators", Journal of the
     ACM, Volume 14, Issue 1, Jan. 1967,
     Pages 100-119.

From memory (might check the paper):

     i -- a positive integer
     ip1 -- a positive integer
     Set i = 1
     Do Forever
       Set ip1 = (i * 5^15 + 1) mod 2^47
       Return(ip1)
       Set i = ip1
     End
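
The recurrence above is a linear congruential generator, and it translates to Rust in a handful of lines. This is a sketch that takes the constants exactly as recalled in the comment (multiplier 5^15, increment 1, modulus 2^47) without checking them against the paper; the `Lcg47` name and its methods are made up for illustration, and a generator like this is not suitable for cryptographic use.

```rust
/// Multiplier 5^15, as recalled in the comment (unverified against the paper).
const MULTIPLIER: u64 = 30_517_578_125;
/// Modulus 2^47, applied as a bit mask: x & MASK == x mod 2^47.
const MASK: u64 = (1 << 47) - 1;

/// Hypothetical name for the generator sketched in the pseudocode.
pub struct Lcg47 {
    state: u64,
}

impl Lcg47 {
    /// The pseudocode starts from i = 1; any seed is reduced mod 2^47.
    pub fn new(seed: u64) -> Self {
        Self { state: seed & MASK }
    }

    /// Compute (state * 5^15 + 1) mod 2^47 and return the new state.
    pub fn next_value(&mut self) -> u64 {
        // Wrapping arithmetic works mod 2^64. Because 2^47 divides 2^64,
        // masking afterwards still yields the exact result mod 2^47.
        self.state = self.state.wrapping_mul(MULTIPLIER).wrapping_add(1) & MASK;
        self.state
    }
}
```

Starting from `Lcg47::new(1)`, the first call to `next_value()` returns 5^15 + 1 = 30517578126, since that is already below 2^47.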

some1else

Here's one I use for cryptographic purposes:

  let invocations = 0;

  export function rand3000() {
    invocations += 1;

    const timestamp = new Date().getTime() + invocations * 1000;
    const masked = (timestamp ^ (timestamp >> 8)) & 0xFF;
    const result = masked / 255;

    return result;
  }

stephc_int13

1/ What?

If these kinds of insane dependencies for such a simple thing are common practice in the Rust ecosystem, then the language is in a much worse place than I imagined.

koakuma-chan

Most of those dependencies are for procedural macros (compile time only), don't be misled by this silly article.

kibwen

It's not insane, the author has been bitten by their poor experiences with dependencies in other languages and is misapplying that experience to Rust out of hand.

Listen, I'd be as happy as anyone to have random numbers in the Rust standard library. Compared to the Rust developers, I'm a believer in stdlib maximalism, downsides be damned. But all this recent hand-wringing about dependencies is a tiresome moral panic.

burntsushi

"moral panic" is a bit of a reach don't you think? Increasing dependencies is a real problem with real downsides. There are plenty of characters expressing unreasonable things, but that doesn't mean everyone expressing concern about dependencies is indulging in a moral panic. There is nuance!

If there weren't real costs to dependencies then I personally never would have published regex-lite.

kibwen

> If there weren't real costs to dependencies

The OP isn't addressing the real costs of dependencies, the moral panic in question is the automatic assertion that more dependencies is worse than fewer dependencies, which implies that e.g. all the work you have done to cleanly separate regex out into reusable regex-syntax and regex-automata crates has done a disservice to your users. There are real arguments to be made about wrangling one's trusted computing base, but this isn't making that argument, and by throwing the baby out with the bathwater it sets us back as a profession.

dvh

    function rand() {
      return 4;
    }

cwmoore

    function rand() {
      return ++4++;
    }