C stdlib isn't threadsafe and even safe Rust didn't save us
128 comments
· January 22, 2025 · mmastrac
usefulcat
But that won't actually fix the underlying problem, namely that getenv and setenv (or unsetenv, probably) cannot safely be called from different threads.
It seems like the only reliable way to fix this is to change these functions so that they exclusively acquire a mutex.
eqvinox
I have a different perspective: the underlying problem is calling setenv(). As far as I'm concerned, the environment is a read-only input parameter set on process creation like argv. It's not a mechanism for exchanging information within a process, as used here with SSL_CERT_FILE.
And remember that the exec* family of calls has a version with an envp argument, which is what should be used if a child process is to be started with a different environment — build a completely new structure, don't touch the existing one. Same for posix_spawn.
And, lastly, compatibility with ancient systems strikes again: the environment is also accessible through this:
extern char **environ;
Which is, of course, best described as bullshit.
diroussel
Indeed, environment variables should be used to configure child processes, not to configure the current process, for non-shell programs, IMHO.
Note that Java, and the JVM, doesn't allow changing environment variables. It was the right choice, even if painful at times.
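A minimal Rust sketch of the approach described in the comments above: hand the child its own environment instead of mutating the parent's. `std::process::Command::env` and `get_envs` are standard library methods; the helper name `child_with_cert_file` and the certificate path are assumptions made up for illustration.

```rust
use std::process::Command;

/// Build a command whose child sees SSL_CERT_FILE, without touching
/// the current process's environment. (Hypothetical helper name.)
fn child_with_cert_file(program: &str, cert_file: &str) -> Command {
    let mut cmd = Command::new(program);
    // Stored inside `cmd` only; the parent's environment is not modified.
    cmd.env("SSL_CERT_FILE", cert_file);
    cmd
}

fn main() {
    // Illustrative path, not taken from the thread.
    let mut cmd = child_with_cert_file("env", "/tmp/example-ca.pem");

    // Inspect the overrides that will be handed to the child at spawn time.
    for (key, value) in cmd.get_envs() {
        println!("override: {:?} = {:?}", key, value);
    }

    // Running `env` assumes a Unix-like system; failures are reported, not fatal.
    match cmd.output() {
        Ok(out) => print!("{}", String::from_utf8_lossy(&out.stdout)),
        Err(err) => eprintln!("could not run child: {err}"),
    }
}
```

Because the override lives in the `Command` value, no other thread in the parent can observe a half-updated environment.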
Joker_vD
> As far as I'm concerned, the environment is a read-only input parameter set on process creation like argv.
Mutating argv is actually quite popular, or at least it used to be.
pshc
The underlying problem is that setenv is mutable global state and should never have existed
Joker_vD
The process's current directory is mutable global state as well, and yet chdir(2) is thread-safe.
josefx
Welcome to the C standard library, the application of mutable global state to literally everything in it has to be the most consistent and predictable feature of the language standard.
ModernMech
It's the same problem with global vars, but at a machine scope. The real solution here would be for the OS to have a better interface to read and write env vars, more like a file where you have to get rw permission (whether that's implemented as a mutex or what).
eqvinox
This is neither an OS nor a machine scope problem. The environment is provided by the OS at startup. What the process does with it from there on is its own concern.
benatkin
People get trained to ignore the ____UNSAFE_payattention__nevermindthatthisappears50timesinthisfile___ blocks and prefixes
This also shows up in web frameworks, where Vue has the v-html directive and React has dangerouslySetInnerHTML. Vue definitely has it better.
crooked-v
In the React world, the only times I've seen dangerouslySetInnerHTML consistently used are for outputting string literal CSS content (and this one is increasingly rare as build tools need less handholding), string literal JSON content (for JSON+LD), and string literal premade scripts (i.e. pixel tags from the marketing content). That's not to say there's no danger surface there, but it's not broadly used as a tool outside of code that's either really bad or really exhaustively hand-tuned.
rerdavies
Code syntax highlighting libraries for React use dangerouslySetInnerHTML.
javier2
I've only really seen dangerouslySetInnerHTML used while transitioning from certain kinds of server-side rendering to React. There are still lots of really old internal tools in ancient HTML out there.
benatkin
React doesn't have a tag and attribute sanitizer built in, so having non-JS programmers edit JSX isn't especially safe anyway, as an img or an a href could exfiltrate data. If it did, they could just block out an innerHTML attribute. A JS programmer can get around it by setting up a ref and then using the reference to set innerHTML without the word dangerously appearing.
ChrisSD
In the Rust std, `set_var` and `remove_var` will correctly require using an `unsafe {}` block in the next edition (2024). The documentation does now mention the safety issue but obviously it was a mistake to make these functions safe originally (albeit a mistake even higher level languages have made).
https://doc.rust-lang.org/stable/std/env/fn.set_var.html
There is a patch for glibc which makes `getenv` safe in more cases where the environment is modified but C still allows direct access to the environ so it can't be completely safe in the face of modification https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...
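A short sketch of what a call site might look like once `set_var` requires `unsafe`. The helper name `configure_at_startup` and the variable `EXAMPLE_SETTING` are hypothetical; the `allow` attribute is only there so the same code also builds cleanly on editions where `set_var` is still a safe function.

```rust
use std::env;

/// Set a variable during startup. (Hypothetical helper.)
///
/// # Safety
/// Must only be called while the process is still single-threaded, so that
/// nothing (Rust or C) can be reading the environment concurrently.
#[allow(unused_unsafe)]
unsafe fn configure_at_startup(key: &str, value: &str) {
    // `set_var` is an `unsafe fn` in the 2024 edition.
    unsafe { env::set_var(key, value) }
}

fn main() {
    // Mutate the environment before any threads are spawned...
    unsafe { configure_at_startup("EXAMPLE_SETTING", "1") };

    // ...and only read it afterwards.
    let worker = std::thread::spawn(|| env::var("EXAMPLE_SETTING").ok());
    println!("{:?}", worker.join().unwrap());
}
```

The `unsafe` keyword does not make the call safe; it only moves the "no concurrent access" obligation into a place where the caller has to acknowledge it.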
Thaxll
Why require unsafe when the std implementation could take care of the synchronisation?
masklinn
Because the std implementation can not force synchronisation on the libc, so any call into a C library which uses getenv will break... which is exactly what happened in TFA: `openssl-probe` called env::set_var on the Rust side, and the Python interpreter called getenv(3) directly.
rerdavies
But the standard implementation could copy the environment at startup and only use its copy.
And the library's use of setenv is clearly a bug as setenv is documented to be not threadsafe in the C standard library. So that would take care of that problem.
miohtama
Is it possible to skip libc completely or would this introduce too many portability concerns?
ChrisSD
It can only synchronize if everything is using Rust's functions. But that's not a given. People can use C libraries (especially libc) which won't be aware of Rust's locks. Or they could even use a high-level runtime with its own locking, but then those locks will be distinct from Rust's.
The only way to coordinate locking would be to do so in libc itself.
wahern
libc does do locking, but it's insufficient. The semantics of getenv/setenv/putenv just aren't safe for multi-threaded mutation, period, because the addresses are exposed. It's not really even a C language issue; were you to design a thread-safe env API, for C or Rust, it would look much different, likely relying on string copying even on reads rather than passing strings by reference (reference counted immutable strings would work, too, but is probably too heavy handed), and definitely not exposing the environ array.
The closest libc can get to MT safety is to never deallocate an environment string or an environ array. Solaris does this--if you continually add new variables with setenv it just leaks environ array memory, or if you continually overwrite a key it just leaks the old value. (IIRC, glibc is halfway there.) But even then it still requires the application to abstain from doing crazy stuff, like modifying the strings you get back from getenv. NetBSD tried adding safer interfaces, like getenv_r, but it's ultimately insufficient to meaningfully address the problem.
The right answer for safe, portable programs is to not mutate the environment once you go multi-threaded, or even better just treat process environment as immutable once you enter your main loop or otherwise finish with initial process setup. glibc could (and maybe should) fully adopt the Solaris solution (currently, IIRC, glibc leaks env strings but not environ arrays), but if applications are using the environment variable table as a global, shared, mutable key-value store, then leaking memory probably isn't what they want, either. Either way, the best solution is to stop treating it as mutable.
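One way to follow the "treat the environment as immutable after startup" advice on the Rust side is to copy it once and read only the copy, roughly as sketched here. The names `ENV_SNAPSHOT`, `env_snapshot`, and `snapshot_var` are hypothetical. This only protects code that reads through the snapshot; C code that calls getenv directly is unaffected.

```rust
use std::collections::HashMap;
use std::env;
use std::ffi::{OsStr, OsString};
use std::sync::OnceLock;

/// Process-wide, read-only copy of the environment (hypothetical name).
static ENV_SNAPSHOT: OnceLock<HashMap<OsString, OsString>> = OnceLock::new();

/// Copy the environment once; later calls return the same snapshot.
/// Intended to be called first from `main`, before threads exist.
fn env_snapshot() -> &'static HashMap<OsString, OsString> {
    ENV_SNAPSHOT.get_or_init(|| env::vars_os().collect())
}

/// Look up a variable in the snapshot, returning an owned copy.
fn snapshot_var(key: &str) -> Option<OsString> {
    env_snapshot().get(OsStr::new(key)).cloned()
}

fn main() {
    env_snapshot(); // take the copy during startup

    // Worker threads read the snapshot, never the live environment.
    let handles: Vec<_> = (0..4)
        .map(|_| std::thread::spawn(|| snapshot_var("PATH")))
        .collect();
    for handle in handles {
        println!("{:?}", handle.join().unwrap());
    }
}
```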
demurgos
It can't ensure synchronization because any code using libc could bypass the sync wrapper. In particular, Rust lets you link C libs which wouldn't use the Rust stdlib.
msully4321
Because it can still race with C code using the standard library. getenv calls are common in C libraries; the call to getenv in this post was inside of strerror.
fsckboy
you've gotten a lot of answers which say the same thing, but which I don't think answer your question:
synchronization methods impose various complexity and performance penalties, and single threaded applications which don't need that would pay those penalties and get no benefit.
Unix was designed around a lightweight ethos that allowed simple combining of functions by the user on the command line. See "worse is better", but tl;dr that way of doing things proved better, and that's why you find yourself confronting what it doesn't do.
davidt84
The real problem is that getenv() and setenv() were created before threads were really a thing.
sunshowers
Well it was better in the short term but is worse in the long term. In particular, the error handling situation is generally atrocious, which is fine for interactive/sysadmin use but much worse for serious production use.
rikthevik
Great article about digging into a non-obvious bug. This one had it all! Intermittent bug, architecture-specific, hidden in a dependency, rust, the python GIL, gettext. Fantastic stuff.
These kinds of detailed troubleshooting reports are the closest thing you can get to having to do it yourself. Thanks to the authors. It's easy to say "don't use X duh" until a dependency relies on it, and how were you supposed to know?
vlovich123
Even if C stdlib maintainers are resistant to making setenv multi-thread safe, at a minimum there should be a new thread-safe alternative API, whether defined within POSIX or established as a de facto standard that POSIX is pushed to adopt over time. If the time spent explaining why nothing could be done had instead been spent fixing this problem, a new thread-safe API could have replaced the old setenv, which could then have been deprecated and removed from many software projects.
I'm also not convinced by Musl's maintainer that it can't be fixed within Musl considering glibc is making changes to make this a non-issue.
usefulcat
The biggest problem is not the absence of a thread safe API, it's the existence of this:
extern char **environ;
As long as environ is publicly accessible, there's no guarantee that setenv and getenv will be used at all, since they're not necessary.
If you're willing to get rid of environ, it's pretty trivial to make setenv and getenv thread safe. If not, then it's impossible, although one could still argue that making setenv and getenv thread safe is at least an improvement, even if it's not a complete solution (aka don't let the perfect be the enemy of the good).
panzi
Guess that would also require some locking for all the exec() functions that don't take the environment as a parameter or that search PATH for the executable.
StillBored
It's like a rite of passage to be hit by an environment related bug on linux, which is mysteriously less a problem on other unix's. Which is sorta funny given how pragmatic Linus and the kernel are about fixing POSIX bugs by making them not happen, while glibc is still lagging here decades after people tried to at least make the problem better. Sure, there is all the crap around TZ etc., but simply providing getenv_r(), synchronizing it with setenv(), and warning at compile/link time on getenv() would have killed much of the problem. Never mind actually doing a COW-style system where the env pointer(s) are read-only. Instead the problem is pushed to the individual application, which is a huge mistake, because application writers are rarely aware of what their dependencies are doing. That is the situation I found myself in many, many years ago. The closed-source library vendor, at the time, told us to stop using that toy unix clone (linux).
kelnos
> environment related bug on linux, which is mysteriously less a problem on other unix's.
How do you figure? The problem isn't the implementation, it's the API. setenv(), unsetenv(), putenv(), and especially environ, are inherently unsafe in a multithreaded program. Even getenv_r() can't really save you, since another thread may be calling setenv() while the (old) value of an env var is being copied into the provided buffer. Sure, a getenv_r() fixes the case where you get something back from getenv(), and then another thread calls setenv() and makes that memory invalid, but there's no way to protect the other calls breaking the API.
There are ways to mitigate some of the issues, like having libc hold a mutex when inside getenv()/setenv()/putenv()/unsetenv(), but there's still no way for libc to guarantee that something returned by getenv() remains valid long enough for the calling code to use it (which, right, can be fixed by getenv_r(), which could also be protected by that mutex). But there's no good way to make direct access to environ safe. I suppose you could make environ a thread-local, but then different threads' views of the environment could become out of sync, permanently (and you could get different results between calling getenv_r() and examining environ directly).
Back-compat here is just really hard to do. Even adding a mutex to protect those functions could change the semantics enough to break existing programs. (Arguably they're already broken in that case, but still...)
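To make the "lock plus copy-out" idea concrete, here is a sketch of a private key-value store whose reads copy the value while the lock is held, in the spirit of the getenv_r discussion above. `LockedEnv` is a hypothetical type and is not the real process environment, so it shares the limitation described in the comment: anything that bypasses it (getenv from C, or environ) gets no protection.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// A private key-value store guarded by a lock (hypothetical type).
/// It is NOT the real process environment.
#[derive(Default)]
struct LockedEnv {
    vars: RwLock<HashMap<String, String>>,
}

impl LockedEnv {
    /// The value is copied out while the lock is held, so the caller never
    /// holds a pointer into shared storage.
    fn get(&self, key: &str) -> Option<String> {
        self.vars.read().unwrap().get(key).cloned()
    }

    /// Writers are serialized against readers and against each other.
    fn set(&self, key: &str, value: &str) {
        self.vars
            .write()
            .unwrap()
            .insert(key.to_owned(), value.to_owned());
    }
}

fn main() {
    let env = Arc::new(LockedEnv::default());
    env.set("MODE", "initial");

    let writer = {
        let env = Arc::clone(&env);
        thread::spawn(move || env.set("MODE", "updated"))
    };
    let reader = {
        let env = Arc::clone(&env);
        thread::spawn(move || env.get("MODE"))
    };

    writer.join().unwrap();
    // The reader saw one of the two values, never a dangling string.
    println!("{:?}", reader.join().unwrap());
}
```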
rerdavies
Why does adding a mutex break the API? I guess it breaks `char**environ`. But the API wouldn't be broken.
einpoklum
> Even getenv_r() can't really save you, since another thread may be calling setenv() while the (old) value of an env var is being copied into the provided buffer.
Won't that depend on the libc implementation? For example, maybe setenv writes to another buffer, then swaps pointers atomically; wouldn't that work?
masklinn
Previously on setenv being a terrible thing: https://www.evanjones.ca/setenv-is-not-thread-safe.html (discussion: https://news.ycombinator.com/item?id=38342642 first comment is even about it causing issues in Rust)
Animats
Yes. That's known.
Most of the rest of the problem here seems to be the development environment. They're testing on a remote machine in an Amazon data center and using Docker. This rig fails to report that a process has crashed. Then they don't have enough debug symbol info inside their container to get a backtrace. If they'd gotten a clean backtrace reported on the first failure, this would have been obvious.
Why is anyone using "setenv" anyway?
mmastrac
Yup, it's mostly just the story and tools we used to get ourselves out of a mess that was made harder by some decisions made earlier -- the tests were running in a container with stripped symbols (we're going to ship symbols after this, no reason to over-optimize), our custom test runner failed to report process death (an oversight).
There's no reason setenv should have been called here. The `openssl-probe` library could simply return the paths to the system cert files and callers could plug those directly into the OpenSSL config.
Oversights all around and hopefully this continues to improve.
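A sketch of the "return the paths instead of calling setenv" design mentioned above. This is not the `openssl-probe` API; `find_cert_file` is a hypothetical helper, and the candidate paths in `main` are common certificate bundle locations listed here as assumptions.

```rust
use std::path::{Path, PathBuf};

/// Return the first candidate that exists as a file, without touching the
/// environment. (Hypothetical helper; not the `openssl-probe` API.)
fn find_cert_file(candidates: &[&str]) -> Option<PathBuf> {
    for candidate in candidates {
        let path = Path::new(candidate);
        if path.is_file() {
            return Some(path.to_path_buf());
        }
    }
    None
}

fn main() {
    // Assumed locations; real systems may differ.
    let candidates = [
        "/etc/ssl/certs/ca-certificates.crt",
        "/etc/pki/tls/certs/ca-bundle.crt",
        "/etc/ssl/cert.pem",
    ];

    match find_cert_file(&candidates) {
        // The caller would pass this path to its TLS configuration explicitly.
        Some(path) => println!("using CA bundle at {}", path.display()),
        None => println!("no CA bundle found in the candidate list"),
    }
}
```

Returning a value keeps the decision local to the caller, so no global state changes while other threads are running.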
masklinn
> Why is anyone using "setenv" anyway?
Because it’s there and it looks like a good idea until it takes one of your fingers.
einpoklum
It really does not look like a good idea to setenv(). The very notion is quite terrifying. Messing with a bunch of globals, that other code knows about as well? Nuh-uh.
The thing is, the OP people weren't doing that at all, it was some irresponsible library maintainers. If your code does that, you have to include something like the "surgeon general's warning" everywhere: "CAREFUL: USING THIS LIBRARY MAY CAUSE TERMINAL CRASHES".
HarHarVeryFunny
What is the rationale for libc not making setenv/getenv thread safe? It does seem rather odd given how environment variables are explicitly defined as shared between threads in the same process!
It doesn't seem it would take much to do it efficiently, even retaining the poor getenv() pointer-returning API (which could point to a thread local buffer). The coordination between getenv and setenv could be very lightweight - spinlock vs mutex.
kelnos
This reminded me of that whole "12-factor app" movement, which several of my former coworkers had really bought into. One of the "factors" is that apps should be configured by environment variables.
I always thought this was kinda foolish: your configuration method is a flat-namespace basket of stringly-typed values. The perils of getenv()/setenv()/environ are also, I think, a great argument against using env vars for configuration.
Sure, there aren't always great, well-supported options out there. I prefer using a configuration file (you can have templated config and a system that fills in different values for e.g. dev/stage/prod), and I'll usually use YAML, despite its faults and gotchas. There are probably better configuration file formats, but IMO YAML is still significantly better than using env vars.
shikon7
I wonder why it is so hard for Rust to implement its own safe stdlib independent of C.
dgrunwald
How exactly would that help in this situation?
If both Rust and C have independent standard libraries loaded into the same process, each would have an independent set of environment variables. So setting a variable from Rust wouldn't make it visible to the C code, which would break the article's usecase of configuring OpenSSL.
The only real solution is to have the operating system provide a thread-safe way of managing environment variables. Windows does so; but in Linux that's the job of libc, which refuses to provide thread-safety.
do_not_redeem
The crash in the article happened when Python called C's getenv. Rust could very well throw away libc, but then it would also be throwing away its great C interop story. Rust can't force Python to use its own stdlib instead of libc.
kbolino
They did, it's called core. But it assumes no operating system at all, and environment variables require an operating system.
nomel
> and environment variables require an operating system
Is that true? It's just a process global string -> string map, that can be pre-loaded with values before the process starts, with a copy of the current state being passed to any sub-process. This could be trivially implemented with batch processing/supervisory programs.
kbolino
Sure, there's a broader concept here, which doesn't require any operating system. But any alternate string->string map you define won't answer to C code calling getenv, won't be passed to child processes created with fork, won't be visible through /proc/$PID/environ, etc.
panzi
Well, it's used by the OS when exec-ing a new process, but at least the Linux syscall for that takes the environment as an explicit parameter. So it could be managed in whatever way by the runtime until execve() is called.
sunshowers
Environment variables are not just technical, they're social. You need to get everyone on board with your scheme.
steveklabnik
Linux is an unusual platform in that it allows you to call into it via assembly. Most other platforms require you to go through libc to do so. It's not really in Rust's hands.
PaulDavisThe1st
This is not unusual at all. Windows allowed it for years before Linux came along. It was also true of some other *nix systems - IIRC, Ultrix (DEC) allowed this, and so did Dynix (Sequent).
BSD allows it too, or did as of 2022.
What is unusual about Linux is that it guarantees a syscall ABI, meaning that if you follow it, you can make a system call "portably" across "any" version of Linux.
steveklabnik
Sure, I’m speaking about platforms that are relevant today, not historical ones. Windows, MacOS, {Free,Open,Net}BSD, Solaris, illumos, none of these do.
zanderwohl
It would be a tremendous amount of work, and would take years. Meanwhile, the problems are avoidable. It's not exactly the "rust way" to just remember and avoid problems, but everything in language design is compromises.
IshKebab
"Impossibru!!"
https://github.com/sunfishcode/eyra
Oh look:
> Why use Eyra? It fixes Rust's set_var unsoundness issue. The environment-variable implementation leaks memory internally (it is optional, but enabled by default), so setenv etc. are thread-safe.
sunshowers
That only works on Linux though right?
kbolino
That's quite a trade-off
gavinhoward
It is weird that I got this right before Rust did.
Because I use structured concurrency, I can make it so every thread has its own environment stack. To add to an environment, I duplicate it, add the new variable, and push the new environment on the stack.
Then I can use code blocks to delimit where that stack should be popped. [1]
This is all perfectly safe, no `unsafe` required, and can even extend to other things like the current working directory. [2]
IMO, Rust got this wrong 10 years ago when Leakpocalypse broke. [3]
[1]: https://git.yzena.com/Yzena/Yc/src/branch/master/tests/yao/e...
[2]: https://gavinhoward.com/2024/09/rewriting-rust-a-response/#g...
[3]: https://gavinhoward.com/2024/05/what-rust-got-wrong-on-forma...
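For readers who want to see the shape of a per-thread environment stack, here is a rough Rust approximation of the idea described in the comment above. It is not how Yao implements it; `ENV_STACK`, `scoped_get`, `with_var`, and `PopOnDrop` are all hypothetical names, and the store is private to the program rather than the real process environment.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    /// Each thread owns a stack of environments; the top is the current one.
    static ENV_STACK: RefCell<Vec<HashMap<String, String>>> =
        RefCell::new(vec![HashMap::new()]);
}

/// Pops the top environment when the scope ends, even on panic.
struct PopOnDrop;

impl Drop for PopOnDrop {
    fn drop(&mut self) {
        ENV_STACK.with(|stack| {
            stack.borrow_mut().pop();
        });
    }
}

/// Read a variable from the current thread's top-of-stack environment.
fn scoped_get(key: &str) -> Option<String> {
    ENV_STACK.with(|stack| stack.borrow().last().and_then(|env| env.get(key).cloned()))
}

/// Duplicate the current environment, add one variable, and run `body`
/// with the copy on top of the stack.
fn with_var<R>(key: &str, value: &str, body: impl FnOnce() -> R) -> R {
    ENV_STACK.with(|stack| {
        let mut stack = stack.borrow_mut();
        let mut env = stack.last().cloned().unwrap_or_default();
        env.insert(key.to_owned(), value.to_owned());
        stack.push(env);
    });
    let _guard = PopOnDrop;
    body()
}

fn main() {
    with_var("MODE", "outer", || {
        println!("{:?}", scoped_get("MODE")); // Some("outer")
        with_var("MODE", "inner", || {
            println!("{:?}", scoped_get("MODE")); // Some("inner")
        });
        println!("{:?}", scoped_get("MODE")); // Some("outer")
    });
    println!("{:?}", scoped_get("MODE")); // None
}
```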
mmastrac
This isn't _really_ a Rust problem. Rust is a victim of POSIX.
If you have C FFI interop in Yao, there's still a chance you might have two C libraries cause a crash without your code even being involved.
datadeft
Couldn't we have a better pattern for this?
if (__environ == NULL || name[0] == '\0')
return NULL;
cuno
We ended up overriding and replacing with our own thread-safe version years ago when we also hit this.
The major takeaway from this is that Rust will be making environment setters unsafe in the next edition. With luck, this will filter down into crates that trigger these crashes (https://github.com/alexcrichton/openssl-probe/issues/30 filed upstream in the meantime).