Rust’s worst feature

147 comments

January 27, 2025

DannyBee

This seems not even close to the worst feature of rust - this seems like it needs some more design work and baking. Like lots of things.

The amount of hyperbole in this article makes it a bit hard to take the author all that seriously.

Is there evidence more baking won't happen? While I have my loves and hates about Rust, it definitely always felt like they had a pretty thorough/careful process for additions like this. If you go constructively into the threads and offer some concerns, you will usually get some reasonable response.

(All processes, of course, fail, so this is not always true, but it's mostly true.)

While I think it's fine to write rants on blogs, and don't feel like everyone has a responsibility to file bugs or whatever before they write a rant about it, if you actually want to see this "worst feature" fixed, this probably won't help very much.

(I.e., you don't have to be constructive, or even helpful, but if you want to be constructive or helpful, this ain't how you do it.)

dathinab

> Is there evidence more baking won't happen?

No, actually there is a lot of evidence that it will still be worked on.

Normally I would just say look at the issue linked in the nightly docs, but due to an overlap of the tracking PR and the PR moving it from std to core, it's not super useful.

Tracking Issue: https://github.com/rust-lang/rust/issues/78485

If the author has constructive critique they should probably mention it there (after skimming through the discussion to make sure this wasn't already considered and not done due to subtleties they overlooked in the blog post (like e.g. that it's a standard/core feature which has to work across all targets and as such can't rely on anything being initialized to 0 by the OS, or that depending on the global allocator used you definitely can't rely on things being zeroed even if the OS only hands out zeroed memory, etc. etc.))

Xunjin

I'm not as much into low-level development as the author seems to be, but the hyperbole made me think “you have a point, which is/can be valid, but aren't you stretching the reasons to fit your sentiment/PoV?”

To be honest, plenty of times we throw out “ergonomics” as an argument. However, isn't ergonomics more a feeling about how good the API is to use than something actually proven with examples and design choices?

thayne

Well, as they said in the conclusion, they don't really have any constructive suggestions. It is mostly a lament that this wasn't better designed from the beginning, because backwards compatibility puts significant constraints on the design.

queuebert

The more pertinent question to me is can we implement some new static analysis that understands buffer re-use and can hoist buffer initialization outside the loop? Rather than make the programmer write obfuscated code for efficiency, it is usually better to have the compiler do the heavy lifting.

P.S. Also, folks, don't re-use buffers without zeroing unless you absolutely need the performance and know what you're doing.

mrpf1ster

Why would re-using a buffer be bad? Assuming you write to it with the contents of the file/stream before it is read.

kohbo

You just answered your own question

fnordpiglet

I think they implied you would prevent that.

rendaw

Why is it particularly more dangerous or likely than other logic errors?

phkahler

And that's not something you should be depending on a compiler to verify.

vlovich123

I like that direction better but it requires the ability to declare data-flow based contracts whereas Rust’s tools are only lifetime and type contracts. Is there a language that has data-flow based contracts?

queuebert

That would be easier but is not required. There are no compiler hints these days to unroll loops or hoist invariants, even though if done incorrectly it could change the result. It would take some complicated analysis, but I think it could be done safely in some cases.

gpm

I was going to make this argument, but I actually don't think it's true in almost any case.

Most functions could be inferred, but the ultimate source of basically all of these write-only APIs is FFI functions, which in turn make system calls.

You're at least going to need a way to annotate the FFI calls and system calls to describe to the compiler how they access data.

vlovich123

The loop unrolling & invariant hoisting is a static transformation. What the “read” function does semantically isn’t captured today within that and the compiler wouldn’t be able to automatically infer it. It would have to be told that information and there would need to be unsafe annotations for things like syscalls and FFI boundaries. The other approach is to change the API which is what BorrowedBuf is.

If you can think of a different approach of how the compiler can figure out automatically what memory has become initialized by a random function call I’m all ears.

ijustlovemath

what do you mean by this?

vlovich123

There would need to be contractual declarations on the read method that the compiler is able to enforce that tells it that the input &mut slice has N elements clobbered based on the returned length. That’s basically what BorrowedBuf is accomplishing via the type system and runtime enforcement of the contract. Using a non-existent syntax:

    fn read<T, N: size_t>(&mut self, buf: &mut [MaybeUninit<T>] becomes &[T; N] after call) -> N {
        … enforces the body initializes N elements out of buf
    }

and then rules that &mut [T] can also be supplied to such functions that today could only accept a &mut [MaybeUninit<T>] transparently.

A more likely interface you could write today would look like:

    fn read_uninit<T>(&mut self, buf: &mut [MaybeUninit<T>]) -> (&[T], &[MaybeUninit<T>]) {
        … enforces the body initializes N elements out of buf
    }

You still have to cast &[T] into &[MaybeUninit<T>] somehow.
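A rough sketch of that second signature on stable Rust, specialized to bytes. Everything here is hypothetical and only for illustration: the name `read_uninit`, the in-memory `src` standing in for a real reader, and the explicit lifetime `'a` (needed so the returned slices borrow from `buf` rather than from the reader). It is not an existing std API.

```rust
use std::mem::MaybeUninit;

/// Hypothetical reader: copies as many bytes as fit from `src` into `buf`
/// and returns (initialized prefix, still-uninitialized tail).
fn read_uninit<'a>(
    src: &mut &[u8],
    buf: &'a mut [MaybeUninit<u8>],
) -> (&'a [u8], &'a mut [MaybeUninit<u8>]) {
    let n = src.len().min(buf.len());
    let (head, tail) = buf.split_at_mut(n);

    // Write-only access: every element of `head` is initialized here.
    for (dst, &byte) in head.iter_mut().zip(src.iter()) {
        dst.write(byte);
    }
    // Advance the source past the bytes that were consumed.
    *src = &src[n..];

    // SAFETY: the loop above initialized all `head.len()` elements, and
    // MaybeUninit<u8> has the same layout as u8.
    let init = unsafe { std::slice::from_raw_parts(head.as_ptr().cast::<u8>(), head.len()) };
    (init, tail)
}
```

The caller only ever gets a `&[u8]` over bytes that were actually written, so the "how many bytes are valid" contract lives in the return type instead of in a separate length.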

jvanderbot

Fair, but note there is a significant subset of Rust-targeted programmers who dislike the compiler doing things like that. They also dislike the compiler doing things like auto-initializing every loop iteration, but two wrongs wouldn't make it right, just less wrong.

Zagitta

Maybe Rust needs another type of reference that's exclusive write only? Right now there's RO (&T) and exclusive RW (&mut T) but WO is missing.

Having a WO reference would allow these read_buf APIs to express they only write and never read so the uninitialized memory is safe to pass directly.

gpm

In some sense that's exactly what a `&mut MaybeUninit<T>` is?
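A minimal sketch of what that looks like in practice (the helper name `init_slot` is made up for illustration). Safe code holding a `&mut MaybeUninit<T>` can write to the slot but cannot read the old contents without `unsafe`, which is roughly the write-only behaviour being asked for:

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: initializes the slot and hands back a normal
/// `&mut u32` to the freshly written value.
fn init_slot(slot: &mut MaybeUninit<u32>, value: u32) -> &mut u32 {
    // `MaybeUninit::write` never reads or drops the previous contents.
    slot.write(value)
}
```

Reading before writing would require an `unsafe` call such as `assume_init_ref`, so the "write first" rule is at least visible at the call site.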

mmastrac

Probably more once https://doc.rust-lang.org/beta/std/mem/union.MaybeUninit.htm... is no longer nightly-only.

1oooqooq

Everyone just tells you to use mpsr in this case.

0x1ceb00da

> Even an obvious optimisation of moving the buffer declaration outside of the loop isn’t available to the compiler.

Why? Can't the programmer just do this himself?

Arch-TK

The compiler cannot assume that the read call won't read from the mutable reference (well, it might be able to given a sufficiently sophisticated optimizer and/or aggressive inlining).

The programmer, on the other hand, can do this, but the point is to make this implicitly possible by making it more explicit that read does not read from the buffer (and therefore allowing it to accept uninitialized memory).

dietr1ch

And I don't think it can ensure that all the bits were written. I've been bit by people trying to reuse buffers/objects like this that were not fully rewritten in one of the possible re-uses. It's a bit puzzling how a change that just adds a new continue leads to memory corruption.

hansvm

Zig has a couple features to help with that. I assume Rust should too (probably not directly applicable to BorrowedBuf, but for the case of a reusable pool of objects)?

It comes down to a piece of syntactic sugar, plus "result location semantics" guaranteeing that you won't have a copy. E.g.:

  my_ptr.* = .{
    .x = 42,
    .y = 53
  };

No matter how you choose to construct the intermediate fields (x and y in this example) with continues or other control flow, the very last step should be something that sets every field at once. If you miss one, the compiler will yell at you. If it compiles, the assembly is as if you filled in each field by hand.
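For comparison, a rough Rust analogue of the "set every field at once" part, as a sketch with made-up names (`Point`, `reset`). A struct literal must name every field, so forgetting one is a compile error. Unlike Zig's result location semantics, Rust does not guarantee that no intermediate copy is made; that part is left to the optimizer.

```rust
struct Point {
    x: i32,
    y: i32,
}

/// Hypothetical reuse of a pooled object: compute the parts with whatever
/// control flow is needed, then overwrite the whole value in one step.
fn reset(slot: &mut Point, flag: bool) {
    let x = if flag { 42 } else { 0 };
    let y = 53;
    // Omitting `x` or `y` here fails to compile (missing field).
    *slot = Point { x, y };
}
```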

vlovich123

Because the compiler doesn’t know what read/write are doing to the buffer. And since it’s declared as [0; 4096], the compiler wouldn’t be able to do anything other than zeroing the entire 4 KiB region on every read, instead of just what was dirtied, if it attempted to hoist automatically. BorrowedBuf is an attempt to let you declare [MaybeUninit::uninit(); 4096], which the compiler could hoist, although there it doesn’t matter either, since the allocation of the uninit array is just an adjustment of the stack pointer.

lmm

> Why?

Because as far as the compiler is concerned it appears to change the behaviour, unless the compiler gets very fancy.

> Can't the programmer just do this himself?

Yes, but it's not really desirable for them to have to (and would arguably make the code less maintainable if they did). Doing the right thing should be easy.

Waterluvian

Arguably the buffer belongs in the loop scope because it is only relevant there. It’s probably also safer from wrong use.

This feels like exactly what you want the compiler to think about: a case where optimization comes at the cost of organization.

immibis

Organization is basically completely irrelevant for a small piece of code like this, which fits in your head all at once.

auggierose

Not if they are using Rust ... which is why I am not.

dgacmu

This is incorrect. It's trivial and compiles just fine. The argument here is that maybe for reasons the programmer doesn't want to - such as not wanting the buffer to outlive its use inside the loop, and they don't want to have to double-nest:

    { 
        let mut buf = [0; 4096];
        loop {
            ...
        }
    }

That accomplishes exactly the same goal but there's an argument -- not well made in the blog post -- that the compiler should be able to do some form of this hoisting automatically. In C, it would be automatic, because C doesn't make a zero-initialized promise for stack-allocated variables. In Rust it's not because the array is specified as zero-initialized. Of course, C's behavior comes with certain drawbacks of its own. ;)

Rust's behavior isn't unreasonable. It's just a potential missed optimization, but automating it is challenging.
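For completeness, a self-contained version of the manually hoisted form, as a sketch. The function name `count_bytes` and the byte-counting body are made up for illustration; only the buffer placement matters.

```rust
use std::io::{self, Read};

/// Hypothetical example: counts the bytes a reader yields, zeroing the
/// 4096-byte buffer once instead of once per iteration.
fn count_bytes<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut total = 0;
    {
        // Zero-initialized a single time, outside the loop.
        let mut buf = [0u8; 4096];
        loop {
            let n = reader.read(&mut buf)?;
            if n == 0 {
                break; // end of input
            }
            // Only buf[..n] holds data from this call; the rest is stale.
            total += n;
        }
    } // `buf` goes out of scope here, right after the loop.
    Ok(total)
}
```

The extra block keeps `buf` from outliving the loop, at the cost of one more level of nesting.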

jeroenhd

Adding an extra scope here is slightly annoying, but it's not always possible. I think the example in the blog post was poorly chosen, because the complexity of BorrowedBuf together with MaybeUninit doesn't make much sense when your fix makes for much more readable code.

Out of all problems I have encountered with Rust, this is a particularly minor one.

auggierose

I see. It is a very bad example indeed. Terrible, terrible example.

Switching off Trump mode for a moment, I don't see why you would want to declare the buffer inside the loop, given that keeping it alive for the entire time of the loop is actually the semantics you want.

vlovich123

What language would automatically be able to hoist the array outside the loop in that kind of code?

jhasse

C, because reading uninitialized memory is undefined behavior so the compiler can assume it never happens.

pnathan

I think the answer is that in a case when you need that speed, you hoist the stack allocation & zeroing and unsafe that buffer in the loop if need be. Test well. I am a huge Rust fan but also it is actually possible to write correct unsafe code.

If I am interacting with IO space, I would much rather write the interaction code myself for the machine at hand than farm it out to an array of third party crates. ::shrug::

Getting the machinery to let it properly be hoisted smoothly and safely would be nice, but it isn't required.

Personally I think Rust macros are very painful and the "worst feature", but that's speaking as someone who did a fair bit of Common Lisp.

tcfhgj

what's so bad about macros?

pnathan

Have you ever used Common Lisp macros? It's absolutely eye-poppingly easier.

umanwizard

> While replacing the array of zeros by an array of uninitialised values may work in specific circumstances, the code is unsound. Change to the compiler, its options, modification of unrelated parts of the code or using the function for a different Read trait implementation may break the program in unpredictable ways.

Why? It seems the only thing on that list that will cause UB is using the function with a different reader (one that inspects the uninitialized bytes). Why would any of the other listed possible changes break it?

GrantMoyer

Follow-up to my earlier comment: the docs say creating a reference to uninitialized memory is undefined behavior, but I couldn't think of any cases where it would go wrong if the data is never actually accessed. It turns out that's because there currently aren't any. See comment [1] and the threads linked there.

Rust calls references to uninitialized memory undefined behavior because it wants to reserve the right to implement some kinds of optimizations using that fact in the future, however currently no optimizations do, and in fact the standard library depends on the compiler being well behaved here.

It's possible at some point in the future some optimization will be added which would cause problems for references to uninitialized memory, but it seems more likely that the requirements on references will eventually be relaxed instead.

[1]: https://github.com/rust-lang/rust/issues/119241#issuecomment...

GrantMoyer

In Rust, creating a reference to uninitialized data is undefined behavior, even if you don't access the data through it[1]. The optimizer assumes all references are initialized.

[1]: https://doc.rust-lang.org/std/ptr/#pointer-to-reference-conv...
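A small sketch of how to stay clear of that rule: initialize through raw pointers so that no reference to uninitialized data is ever created. The function name `make_pair` is made up; `MaybeUninit::as_mut_ptr` and `std::ptr::addr_of_mut!` are the standard tools for this.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

/// Hypothetical example: builds a tuple field by field without ever
/// forming a `&mut` to the not-yet-initialized value.
fn make_pair() -> (u32, u32) {
    let mut slot = MaybeUninit::<(u32, u32)>::uninit();
    let p = slot.as_mut_ptr(); // raw pointer, not a reference

    // SAFETY: `p` points to valid, properly aligned storage for the tuple,
    // and both fields are written before `assume_init` is called.
    unsafe {
        addr_of_mut!((*p).0).write(1);
        addr_of_mut!((*p).1).write(2);
        slot.assume_init()
    }
}
```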

mmastrac

This API has basically been adopted from Tokio. Like most of Rust buffer types, it's "not bad" to use as a caller and "awkward" to use as a consumer.

The pain of paying for buffer init is real, however. The last two projects have both seen perf hits from it.

Matthias247

If this is mainly useful for working with plain/uninterpreted byte arrays, then I wonder why we can't just do `[u8; N]::with_noinit()` method instead of doing the multi-line plus unsafe things listed in the article.

Is the main point that things like `slice_freeze_mut` could also be used for slices of e.g. `struct Coordinate { x: u32, y: u32, z: u32 }`?

It would obviously not work for f64 things, since not all bit patterns are valid there either.

orlp

All f64 bit patterns are valid.

lmm

Why is it that the frozen semantics are actually needed? Is there no way to represent what people actually want here - memory that is entirely uninitialised, for which tautology might be false, until written? I.e. something that's a bit like MaybeUninit but more so?

api

There's probably no way for the compiler to prove safety. Rust is designed to allow 100% safe bare metal development, like a perfectly safe C that still allows you to get close to the hardware, and that's tough.

lmm

> There's probably no way for the compiler to prove safety.

That's already the case for the AnyBitPattern stuff though. (Indeed according to the docs AnyBitPattern traits already get cast from uninitialized bytes, which in C/LLVM semantics are not necessarily frozen, even if in practice Linux would not be remapping the pages they're in).

> Rust is designed to allow 100% safe bare metal development

I wouldn't say that since bare metal rust always needs some unsafe; rather it's designed to allow managed, contained use of unsafe constructs in code that's say 98% safe. The whole purpose of this BorrowedBuf is already something like that.

ajross

I'm failing to understand the correlation to "safety" here. Reading a byte for which you don't know the value isn't "unsafe". It's literally (!) the desired behavior of foreign data being read from an external source, which is in fact the use case in the article.

There's no safety problem as long as the arbitrary value is deterministic, which it is, being process RAM. The "uninitialized data read" bugs reported from instrumentation tools in C code are because the code is assuming the value has some semantics. The read itself has no value and is presumably an artifact of the bug, but it is safe.

gpm

> There's no safety problem as long as the arbitrary value is deterministic, which it is, being process RAM.

The article discusses how it is in fact, on Linux with memory returned from at least one very common allocator, not deterministic. Ctrl-f tautology.

MrBuddyCasino

What does safe mean here? Everything can be interpreted as a [u8], right?

gpm

[u8] guarantees to the compiler that two reads through the array at the same location without any intervening writes return the same value.

Turns out that's not the case on freshly returned uninitialized allocations. The first read could return old data (say "1"), and the second read could return a freshly zeroed page ("0").

lidavidm

https://www.ralfj.de/blog/2019/07/14/uninit.html perhaps (the OP also talks about this when linking to a talk about jemalloc)

leni536

Do some Rust types have invalid object representation or trap representation? On SysV x86_64 bool only has two valid representations in memory, the rest are trap representations.

So for an array of bools (if Rust matches SysV) freeze wouldn't be sound, even without the madvise problem.

gpm

Yes.

That's what "// SAFETY: u8 has no invalid bit patterns." is discussing. While types in general can have invalid bit patterns, u8 specifically does not (none of the u*/i* integer types do), so freezing a buffer of u8s is sound.
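A quick illustration of the difference, as a sketch (the helper names are made up). Any byte is a valid `u8`, and any eight bytes form a valid `f64`, but a `bool` may only be 0 or 1 in memory, so arbitrary bytes have to be checked before they are treated as one:

```rust
/// Hypothetical checked conversion: only 0 and 1 are valid `bool` values,
/// so every other byte has to be rejected rather than reinterpreted.
fn bool_from_byte(b: u8) -> Option<bool> {
    match b {
        0 => Some(false),
        1 => Some(true),
        _ => None, // transmuting such a byte to `bool` would be UB
    }
}

/// Every 64-bit pattern is some `f64` (possibly a NaN), so no check is needed.
fn f64_from_bytes(bytes: [u8; 8]) -> f64 {
    f64::from_le_bytes(bytes)
}
```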

kmeisthax

Yes; and furthermore LLVM also has undef, which is sort of a trap representation, but it only exists in the optimizer. (There's also poison for overflow, which is a strictly less defined value than undef.)

I suspect even reading an array of uninitialized u8s would cause havoc just from LLVM miscompiles alone.

jasonthorsness

I am not yet a Rust programmer but - is it not typical to have a small collection of unsafe functions, carefully reviewed, that in this case seem like they might be easier to maintain than some of these convoluted type-based workarounds?

steveklabnik

It's a tradeoff! You have to explore both options to know which side of the tradeoff to take.
