veltas
bcrl
That doesn't work reliably either. No existing code scrubs STDDATA_FD from its environment, and there's no way to know whether anyone already uses STDDATA_FD in the wild. Why not just use a command-line parameter like everyone else? Different isn't better in a situation like this.
This is a larger concern I've started to see in a certain class of younger developer, where existing conventions are simply ignored without any attempt to understand why they exist. Things are only going to get worse as naive vibe coders start flinging more AI-generated garbage out into the world. I pity the poor folks trying to maintain these systems a couple of decades from now.
smarx007
This is long overdue. PowerShell has long supported passing structured output (objects) via pipes and this is the closest attempt to approximate that without breaking the world.
account-5
I don't know, Nushell does a pretty good job.
mmastrac
It's a shame that stdX streams were never spec'd as sockets, with appropriate handling available in the various shells.
Also, file handle inheritance by default was such a big mistake.
nulld3v
Yeah, POSIX made choices that looked sane and even elegant at the time, but nowadays I think it is fair to say that they have not aged well. Like it's not just FDs getting inherited by default, almost everything gets inherited by default:
Working dir, env vars, uid/gid, socket handles, file descriptors, (some) file locks, message queues. AFAIK the only exception is argv; everything else is inherited across fork or exec.
Sometimes this makes sense, but programmers always forget about this, resulting in security incidents. Eventually most programming languages gave up and updated their stdlibs to set CLOEXEC when opening files and sockets, knowing that it would break POSIX compatibility and API compatibility on their stdlibs. Python is one example: https://peps.python.org/pep-0446/
The "inherit by default" behavior also makes it very difficult to evolve the shell interface. The nushell devs are looking for a reliable way to request JSON output/input on processes spawned by the shell (if supported by the program). Naively passing env vars or FDs to the process causes problems because if the process spawns any children of it's own, they too would also inherit those env vars or FDs.
bandie91
process inheritance was a great invention, because it models reality quite closely. you don't have new things just sitting in an empty universe all alone, initializing everything themselves from ... somewhere ... because everything around them was reset.
the environment (in a broader sense: not just environment variables, but also CWD, file handles, uid/gid, security context, namespaces) is there for a reason: to be used. if you don't want your child processes to read stdin in your place, don't give it to them. it's the parent process's responsibility to set up the environment for its children.
after all, subprocesses were invented to do (some of) the parent's job, delegating smaller steps and leaving the details to them. for example, an http server would read the request (first) line, then delegate the rest of the input to a subprocess (worker) depending on who is free, who handles which type of request, etc. this is the original idea behind inheritance, IMO.
gerikson
> Okay, apparently the stddata addition is causing havoc (who knew how many scripts just haphazardly hand programs random file descriptors, that's surely not a problem.)
I knew, and I've known since reading the "C shell considered harmful" paper, which offhandedly mentioned that sh-based shells can use an arbitrary number of file descriptors (maybe they have to be one-digit integers though). csh can't, of course.
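For instance, any Bourne-family shell can open and address an arbitrary single-digit descriptor directly; csh has no syntax for this at all. A small sketch (the temp-file path is just for illustration):

```shell
# Open fd 3 for writing, write through it, close it -- plain sh
# syntax with no csh equivalent.
tmp="${TMPDIR:-/tmp}/fd3-demo.$$"
exec 3>"$tmp"
echo "via fd 3" >&3
exec 3>&-
cat "$tmp"
rm -f "$tmp"
```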
It's discussed in the first section here
theamk
this brings memories - university, first Unix exposure, Sun Ray terminals, "tcsh" as default shell, and me doing "find / -name ..." a lot.
I always wanted to ignore all errors from this (there was a lot of "permission denied"), but tcsh just didn't have a simple way to do so. This taught me a valuable lesson about some software just being better than other software. And to this day, I keep wondering why people would choose to use csh/tcsh voluntarily.
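The missing feature is per-stream redirection: sh-family shells can discard stderr alone, while csh/tcsh can only redirect both streams together with `>&`. A sketch (the `noisy` function stands in for `find` producing "permission denied" errors):

```shell
# A stand-in command that writes to both streams:
noisy() { echo "useful output"; echo "permission denied" >&2; }

# sh/bash/zsh: drop stderr, keep stdout -- what tcsh can't express.
noisy 2>/dev/null

# tcsh's classic workaround is a subshell dance, e.g.:
#   ( find / -name foo > /dev/tty ) >& /dev/null
```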
layer8
Tcsh originally was more user-friendly for interactive use. The rest is inertia.
superdisk
Tangential but I was surprised to see that tree(1), at least the popular implementation, is made in Terre Haute (which is where I'm from). Maybe I should invite the author for lunch or something :)
NoboruWataya
I've never heard of stddata. What distro/environment provides it?
deathanatos
It's a local invention of TFA's, AFAIK. It's not "std".
stdout would be the canonical location for putting JSON output (and the "data" of a command, generally). Then things like `| jq` just work.
jamessb
Nor have I; I think it is just what the developer of tree has chosen to call file descriptor 3, rather than being a wider convention or standard thing provided by the environment.
> As of version 2.0.0, in Linux, tree will attempt to automatically output a compact JSON tree on file descriptor 3 (what I call stddata,) if present
https://github.com/Old-Man-Programmer/tree/blob/d501b58ff9cb...
zbendefy
offtopic: why does the Copyright © icon shake like crazy at the bottom of the page?
Edit: Oh I guess it seems to be intentional, I clicked around and I like the rgbcube site map.
omnicognate
<copyright intensifies>
krick
Can somebody explain what's going on here? It seems I'm missing some important piece of background info. Why don't they just add -J flag for everyone who wants to output JSON? Oh, wait, tree already has -J flag to output JSON. So WTF are they doing here?
I am especially confused by this:
> Surely, nothing will happen if I just assume that the existence of a specific file descriptor implies something, as nobody is crazy or stupid enough to hardcode such a thing?
Wait, what? But "you" (tree authors) just hardcoded such a thing. Do "you" have some special permission to do this nonsense?
Joker_vD
> who knew how many scripts just haphazardly hand programs random file descriptors, that's surely not a problem.
Oh for fuck's sake! Why are you using random file descriptors nobody told you about? Those open fds are there for a reason, thank you: I've put one end of an open pipe there specifically so I can notice when it becomes closed.
If the user set up the environment of your application in a specific way, that means he wants your application to run in such an environment. If you were invoked with 10 non-standard file descriptors open and two injected threads — you'll have to live with it. Because, believe it or not, your application's purpose is to serve the user's goals. So don't break composability that the user relies on, please.
listeria
This is the first I've heard of using an open pipe to poll for subprocess termination. Don't get me wrong, I don't hate it, but you could just as easily have a SIGCHLD handler write to your pipe (or do nothing, since poll(2) will fail with EINTR), and you don't have to worry about the subprocess closing the pipe or treating it as some weird stddata fd like tree does here.
o11c
`SIGCHLD` is extremely unreliable in a lot of ways, `pidfd` is better (but Linux-specific), though it doesn't handle the case of wanting to be notified of all grandchildren's terminations after the direct child dies early.
EdSchouten
If only there was a variant of execve() / posix_spawn() that simply took a literal array of which file descriptors would need to be present in the new process. So that you can say:
int subprocess_stdin = open("/dev/null", O_RDONLY);
int subprocess_stdout = open("some_output", O_WRONLY);
int subprocess_stderr = STDERR_FILENO; // Let the subprocess use the same stderr as me.
int subprocess_fds[] = {subprocess_stdin, subprocess_stdout, subprocess_stderr};
posix_spawn_with_fds("my process", [...], subprocess_fds, 3);
Never understood why POSIX makes all of this so hard.
alerighi
It's trivial to write (~20 lines of code); there is no point in the standard library providing that kind of function, in my opinion.
After the fork() (or clone, on Linux), you run a for loop that closes every FD except the ones you want to keep. On Linux there is a close_range system call to close a range of FDs in one call.
POSIX is an API designed to be a thin layer over the operating system, making as few assumptions as possible about the underlying system. This is why POSIX is nowadays implemented even on low-resource embedded devices and similar hardware.
At a higher level, it's possible to use richer abstractions to manipulate processes (e.g. a C++ library that does all of the above with a modern interface).
deathanatos
… what POSIX API gets you the open FDs? (Or even just the maximum open FD, and we'll just cause a bunch of errors closing non-existent FDs.)
o11c
That's `sysconf(_SC_OPEN_MAX)`, but it is always a bug to close FDs you don't know the origin of. You should be specifying `O_CLOEXEC` by default if you want FDs closed automatically.
o11c
It is always a bug to call `close_range` since you never know if a parent process has deliberately left a file descriptor open for some kind of tracing. If the parent does not want this, it must use `O_CLOEXEC`. Maybe if you clear the entire environment you'll be fine?
That said, it is trivial to write a loop that takes a set of known old and new fd numbers (including, e.g., swaps) and produces a set of calls to `dup2` and `fcntl` to give them the new numbers, while correctly leaving all other open fds open.
Y_Y
> Never understood why POSIX makes all of this so hard
I honestly can't say in this particular instance, but my (unpopular?) instinct in such a situation is always to assume there is a good reason and I just haven't understood it yet. It may have become irrelevant in the meantime, but I can't know until I understand, and it's served me well to give the patriarchs the benefit of the doubt in such cases.
oguz-ismail
It's not hard, just a bit too long:
#include <fcntl.h>
#include <spawn.h>
#include <sys/wait.h>
extern char **environ;
int
main(void) {
    pid_t pid;
    posix_spawn_file_actions_t file_actions;
    posix_spawn_file_actions_init(&file_actions);
    posix_spawn_file_actions_addopen(&file_actions, 0, "/dev/null", O_RDONLY, 0);
    posix_spawn_file_actions_addopen(&file_actions, 2, "/dev/null", O_WRONLY, 0);
    posix_spawnp(&pid, "ls", &file_actions, NULL, (char *const[]){"ls", "-l", "/proc/self/fd", NULL}, environ);
    posix_spawn_file_actions_destroy(&file_actions);
    waitpid(pid, NULL, 0);
}
williamcotton
For this the key would be to eliminate serialization and deserialization between steps in the pipeline.
The environment variable isn't much better; both are akin to using a global var in your reentrant code, but at least STDDATA_FD is less likely to collide than fd 3.
Can't wait for scripts using this variable for something unrelated to break when they call my scripts.
This should be a parameter or argv[0]-based.