Tokio and Prctl = Nasty Bug
8 comments
· February 23, 2025 · nemothekid
rendaw
My knowledge isn't very good here, but I assumed since they're using the single thread executor, everything was being spawned on the main thread. The only time new (temporary) threads were created was when calling `spawn_blocking`. And the main thread can't be moved because it's part of the `main()` call stack? Maybe...
TheDong
I think they don't want PR_SET_PDEATHSIG but rather PR_SET_CHILD_SUBREAPER, which I think would be both more correct than PDEATHSIG for letting them wait on grand-children / preventing grand-child-zombies, while also avoiding the issue they ran into here entirely.
They would need one special "main thread" that deals with reaping and that isn't subject to tokio's runtime cleaning it up, but presumably they already have that, or else the fix they did apply wouldn't have worked.
Alternatively, if they want they could integrate with systemd, even just by wrapping the children all in 'systemd-run', which would reliably allow cleaning up of children (via cgroups).
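For anyone curious what opting in looks like, here is a minimal std-only sketch, assuming Linux. The `prctl` symbol is declared by hand instead of pulling in the `libc` crate, the constants are copied from `<linux/prctl.h>`, and the helper names `become_subreaper` and `is_subreaper` are made up for illustration, not taken from HyperQueue.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};

// Hand-written declaration to stay std-only; the `libc` crate exposes the
// same symbol with the same variadic signature.
extern "C" {
    fn prctl(option: c_int, ...) -> c_int;
}

// Values from <linux/prctl.h>.
const PR_SET_CHILD_SUBREAPER: c_int = 36;
const PR_GET_CHILD_SUBREAPER: c_int = 37;

/// Marks the calling process as a child subreaper, so orphaned descendants
/// are reparented to it instead of to init. (Hypothetical helper name.)
fn become_subreaper() -> io::Result<()> {
    let enable: c_ulong = 1;
    // SAFETY: this option only reads the integer argument.
    let rc = unsafe { prctl(PR_SET_CHILD_SUBREAPER, enable) };
    if rc == -1 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}

/// Reads the subreaper flag back from the kernel. (Hypothetical helper name.)
fn is_subreaper() -> io::Result<bool> {
    let mut flag: c_int = 0;
    // SAFETY: the kernel writes a single int through the pointer we pass.
    let rc = unsafe { prctl(PR_GET_CHILD_SUBREAPER, &mut flag as *mut c_int) };
    if rc == -1 {
        return Err(io::Error::last_os_error());
    }
    Ok(flag != 0)
}
```

The flag belongs to the whole process, not to the thread that sets it, which is why it sidesteps the thread-lifetime problem. It only changes who gets to reap orphans, though; it does not kill anything by itself.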
vlovich123
> when the orphan terminates, it is the subreaper process that will receive a SIGCHLD signal and will be able to wait(2) on the process to discover its termination status
Seems like you don’t need a dedicated “always alive” thread if the signal is delivered to the process. Tokio automatically does the masking for its threads, so you can register for signals through its asynchronous mechanisms and not worry about the signal-safety issues it abstracts away for you (i.e. as long as you’re handling SIGCHLD somewhere, or even just ignoring it, as I don’t think they actually care?).
That being said, it’s not clear that PR_SET_CHILD_SUBREAPER actually causes grandchildren to be killed when the reaper process dies, which is the effect they’re looking for here (not the reverse, where you reap forked children as they die). So you may need to spawn a dedicated reaper process rather than a thread to manage the lifetime of children, which is much more complicated.
TheDong
Yeah, I was assuming they have something calling `wait` somewhere since they say "HyperQueue is essentially a process manager", and to me "process manager" implies pretty strongly "spawns and waits for processes".
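To make "something calling `wait` somewhere" concrete, a rough sketch of a non-blocking reap loop, assuming Linux. `waitpid` is declared by hand to stay std-only, `WNOHANG` is the value from `<sys/wait.h>` on Linux, and `reap_exited_children` is a hypothetical name; a real process manager would also inspect the exit status and likely drive this from a SIGCHLD notification.

```rust
use std::os::raw::c_int;

// Hand-written declaration to stay std-only; pid_t is a 32-bit int on Linux.
extern "C" {
    fn waitpid(pid: c_int, status: *mut c_int, options: c_int) -> c_int;
}

const WNOHANG: c_int = 1; // from <sys/wait.h> on Linux

/// Reaps every child (or reparented orphan) that has already exited and
/// returns their PIDs. Never blocks. (Hypothetical helper name.)
fn reap_exited_children() -> Vec<i32> {
    let mut reaped = Vec::new();
    loop {
        let mut status: c_int = 0;
        // pid = -1 means "any child of this process".
        // SAFETY: `status` is a valid, writable int for the call's duration.
        let pid = unsafe { waitpid(-1, &mut status, WNOHANG) };
        if pid <= 0 {
            // 0: children exist but none has exited yet.
            // -1: no children left (ECHILD) or another error.
            break;
        }
        reaped.push(pid);
    }
    reaped
}
```

Since `waitpid(-1, ...)` works from any thread of the process, this can be called from wherever is convenient. Note that it will also steal exit statuses from anything else in the process that waits on a specific child, such as `std::process::Child::wait`.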
vlovich123
> Edit: Someone on Reddit sent me a link to a method that can override the thread keep-alive duration. Its description makes it clear why the tasks were failing after exactly 10 seconds
> Yeah, testing if a task can run for 20 seconds isn’t great, but hey, at least it’s something
Well a reasonable thing to me is then to use the override within the test to shorten it (e.g. to 1s & use a 2s timeout).
kevingadd
Leaving PDEATHSIG enabled would make it harder for me to sleep at night, but I understand why the alternatives probably aren't appealing. Seems like a future bug waiting to happen. At least the author knows what to expect now.
immibis
Good writeup of yet another bug different from all the other bugs.
The Linux kernel isn't really bothered by the difference between threads and processes. Threads are just processes that happen to share an address space, file descriptor table, and thread group ID (what most tools call a PID). I think there are some subtle things related to the thread group ID, but they're subtle. The rest is implemented in glibc.
I may be mistaken, but I believe the bug still exists in a more esoteric form, and a future change might bring it back. The author might want to warn against usage of `tokio::task::block_in_place` if the underlying issue can't be fixed.
The reason the current approach works is that it runs on tokio's worker threads, which last for the lifetime of the tokio runtime. However, if `tokio::task::block_in_place` is called, the current worker thread is demoted to the blocking thread pool, and a new worker thread is spawned in its place.
There can be a situation when the stars align that:
1. Thread A spawns Process X.
2. N minutes/hours/days pass, and Thread A hits a section of code that calls `tokio::task::block_in_place`
3. Thread A goes into the blocking pool.
4. After some idle time, Thread A dies, prematurely killing Process X, causing the same bug again.
You can imagine that this would be much harder to reproduce and debug, because the thread's lifetime would be completely divorced from when you spawned the process. It's actually pretty lucky that the author reached for `spawn_blocking` instead of `block_in_place`, since when benchmarking it's a bit more tempting to use `block_in_place`. Had they used `block_in_place`, it might have been harder to catch this bug.
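The underlying behaviour can be reproduced without tokio at all. The sketch below assumes Linux and a `sleep` binary on `PATH`; `prctl` is declared by hand to stay std-only, and the function name is made up for illustration. A plain short-lived `std::thread` stands in for a blocking-pool thread that gets cleaned up after its idle timeout.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::process::{CommandExt, ExitStatusExt};
use std::process::Command;
use std::thread;

// Hand-written declaration to stay std-only; the `libc` crate exposes the
// same symbol with the same variadic signature.
extern "C" {
    fn prctl(option: c_int, ...) -> c_int;
}

const PR_SET_PDEATHSIG: c_int = 1; // from <linux/prctl.h>
const SIGKILL: c_int = 9;

/// Spawns `sleep 30` from a helper thread that exits right after spawning,
/// then returns the signal that terminated the child, if any.
/// (Hypothetical helper name.)
fn signal_after_spawning_thread_exits() -> io::Result<Option<i32>> {
    let spawner = thread::spawn(|| {
        let mut cmd = Command::new("sleep");
        cmd.arg("30");
        // SAFETY: the closure runs in the forked child before exec and only
        // makes a single prctl syscall.
        unsafe {
            cmd.pre_exec(|| {
                if prctl(PR_SET_PDEATHSIG, SIGKILL as c_ulong) == -1 {
                    return Err(io::Error::last_os_error());
                }
                Ok(())
            });
        }
        // spawn() returns only after the child has exec'd, so the death
        // signal is already armed when this thread finishes.
        cmd.spawn()
    });

    // The "parent" for PDEATHSIG purposes is the spawning thread, not the
    // process. Once that thread is gone the kernel sends SIGKILL, and
    // wait() returns long before the 30 seconds are up.
    let mut child = spawner.join().expect("spawner thread panicked")?;
    Ok(child.wait()?.signal())
}
```

Calling `signal_after_spawning_thread_exits()` from a process that itself stays alive should report that the child died from signal 9, which is the same failure mode as a worker thread being retired some time after it spawned a task.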