Skip to content(if available)orjump to list(if available)

Writing into Uninitialized Buffers in Rust

vgatherps

I wish that there was a useful “freeze” intrinsic exposed, even if only for primitive types and not for generic user types, where the values of the frozen region become unspecified instead of undefined. I believe llvm has one now?

Iirc the work on safe transmute also involves a sort of “any bit pattern” trait?

I’ve also dealt with pain implementing similar interfaces in Rust, and it really feels like you end up jumping through a ton of hoops (and in some of my cases, hurting performance) all to satisfy the abstract machine, at no benefit to programmer or application. It’s really a case where the abstract machine cart is leading the horse

CJefferson

I agree, I'd go further and say I wonder why primitive types aren't "frozen" by default.

I totally understand not wanting to promise things get zeroed, but I don't really understand why full UB, instead of just "they have whatever value is initially in memory / the register / the compiler chose" is so much better.

Has anyone ever done a performance comparison between UB and freezing I wonder? I can't find one.

vgatherps

Uninitialized memory being UB isn’t an insane default imo (although it makes masked simd hard), nor is most UB. But the lack of escape hatches can be frustrating

wrs

That assumes the compiler reserves one continuous place for the value, which isn’t always true (hardly ever true in the case of registers). If the compiler is required to make all code paths result in the same uninitialized value, that can limit code generation options, which might reduce performance (and performance is the whole reason to use uninitialized values!).

Also, an uninitialized value might be in a memory page that gets reclaimed and then mapped in again, in which case (because it hasn’t been written to) the OS doesn’t guarantee it will have the same value the second time. There was recently a bug discovered in one of the few algorithms that uses uninitialized values, because of this effect.

90s_dev

Write into uninit'd buffers was one of the pain points of Rust for the creator of the new open source "edit" program for Windows[1]. I wonder what he thinks of this article.

> Another thing is the difficulty of using uninitialized data in Rust. I do understand that this involves an attribute in clang which can then perform quite drastic optimizations based on it, but this makes my life as a programmer kind of difficult at times. When it comes to `MaybeUninit`, or the previous `mem::uninit()`, I feel like the complexity of compiler engineering is leaking into the programming language itself and I'd like to be shielded from that if possible. At the end of the day, what I'd love to do is declare an array in Rust, assign it no value, `read()` into it, and magically reading from said array is safe. That's roughly how it works in C, and I know that it's also UB there if you do it wrong, but one thing is different: It doesn't really ever occupy my mind as a problem. In Rust it does. [https://news.ycombinator.com/item?id=44036021]

electrograv

> That's roughly how it works in C, and I know that it's also UB there if you do it wrong, but one thing is different: It doesn't really ever occupy my mind as a problem. In Rust it does.

UB doesn’t occupy the author’s mind when writing C, when it really should. This kind of lazy attitude to memory safety is precisely why so much C code is notoriously riddled with memory bugs and security vulnerabilities.

mk12

There is an important difference for this case though. It C it’s fine to have pointers into uninitialized memory as you as you don’t read them until after initializing. You can write through those pointers the same way you always do. In Rust it’s UB as soon as you “produce” an invalid value, which includes references to uninitialized memory. Everything uses references in Rust but when dealing with uninitialized memory you have to scrupulously avoid them, and instead write through raw pointers. This means you can’t reuse any code that writes through &mut. Also, the rules change over time. At one point I had unsafe code that had a Vec of uninitialized elements, which was ok because I never produced a reference to any element until after I had written them (through raw pointers). But they later changed the Vec docs to say that’s UB, I guess because they want to reserve the right to use references even if you never call a method that returns a reference.

nemothekid

Bizarre. I think I've been writing broken Rust code for a couple years. If I understand you correctly something like:

    let mut data = Vec::with_capacity(sz);
    unsafe { data.set_len(sz) };
    buf.copy_to_slice(data.as_mut_slice());
is UB?

Arnavion

This stopped being much of a problem when MaybeUninit was stabilized. Now you can stick to using &MaybeUninit<T> / &mut MaybeUninit<T> instead of needing to juggle *T / *mut T and carefully track converting that to &T / &mut T only when it's known to be initialized, and you can't accidentally use a MaybeUninit<T> where you meant to use a T because the types are different.

It's not as painless as it could be though, because many of the MaybeUninit<T> -> T conversion fns are unstable. Eg the code in TFA needs `&mut [MaybeUninit<T>] -> &mut [T]` but `[T]::assume_init_mut()` is unstable. But reimplementing them is just a matter of copying the libstd impl, that in turn is usually just a straightforward reinterpret-cast one-liner.

usefulcat

I suspect that the main reason it doesn't really occupy the author's mind is that even though it's possible to misuse read(), it's really not that hard to actually use it safely.

It sounds like the more difficult problem here has to do with explaining to the compiler that read() is not being used unsafely.

o11c

The reason this particular UB doesn't need mindspace for C programmers is because it's not even meaningful to do anything with the parts of the buffer beyond the written length.

Most other UBs related to datums that you think you can do something with.

ii41

I think this solves his problem. He said he wants a read function that turns the unsafe buffer into a safe buffer, and this API does that.

IIRC it's not that hard to convince the compiler to give you a safe buffer from a MaybeUninit. However, this type has really lengthy docs and makes you question everything you do with it. Thinking through all this is painful but it's not like you don't have to it with C.