Skip to content(if available)orjump to list(if available)

The curious case of shell commands, or how "this bug is required by POSIX" (2021)

panzi

Yeah, system() should definitely be deprecated and you should never use it if you write any new program. At least there is exec*() and posix_spawn() under POSIX. Under Windows there is no such thing and every program might parse the command line string differently. You can't naively write a generic posix_spawn() like interface for Windows, see this related Rust CVE: https://blog.rust-lang.org/2024/04/09/cve-2024-24576/ Why is it a CVE in Rust, but not in any other programming language? Did other language handle it better? Dunno, I just know that Rust has a big fat warning about this in their documentation (https://doc.rust-lang.org/std/process/struct.Command.html#me...), but e.g. Java doesn't (https://docs.oracle.com/javase/8/docs/api/java/lang/ProcessB...).

steamrolled

The main reason system() exists is that people want to execute shell commands; some confused novice developers might mix it up with execl(), but this is not a major source of vulnerabilities. The major source of vulnerabilities is "oh yeah, I actually meant to execute shell".

So if you just take away the libcall, people will make their own version by just doing execl() of /bin/sh. If you want this to change, I think you have to ask why do people want to do this in the first place.

And the answer here is basically that because of the unix design philosophy, the shell is immensely useful. There are all these cool, small utilities and tricks you can use in lieu of writing a lot of extra code. On Windows, command-line conventions, filesystem quirks, and escaping gotchas are actually more numerous. It's just that there's almost nothing to call, so you get fewer bugs.

The most practical way to make this class of bugs go away is to make the unix shell less useful.

rcxdude

most calls to system() that I've seen could be replaced with exec without much difficulty. There's relatively few that actually need the shell functionality.

oguz-ismail

system() involves fork()ing, setting up signal handlers, exec()ing and wait()ing. You won't be replacing it with exec, most of the time you'll be reimplementing it for absolutely no reason.

Galanwe

I don't understand what you are complaining about. I don't understand what the article is complaining about either.

exec* are not "better replacements" of the shell, they are just used for different use cases.

The whole article could be summarized to 3 bullet points:

1) Sanitize your inputs

2) If you want to execute a specific program, exec it after 1), no need for the shell

3) Allow the shell if there is no injection risk

jcranmer

The article spends a lot of time dancing around its central points rather than addressing them directly, but the basic problems with shell boil down to this:

There's two ways to think of "running a command:"

1. A list of strings containing an executable name (which may or may not be a complete path) and its arguments (think C's const char **argv).

2. A single string which is a space-separated list of arguments, with special characters in arguments (including spaces) requiring quoting to represent correctly.

Conversion between these two forms is non trivial. And the basic problem is that there's a lot of tools which incorrectly convert the former to the latter by just concatenating all of the arguments into a single string and inserting spaces. Part of the problem is that shell script itself makes doing the conversion difficult, but the end effect is that if you have to with commands with inputs that have special characters (including, but not limited to, spaces), you end up just going slowly insane trying to figure out how to get the quoting right to work around the broken tools.

In my experience, the world is so much easier if your own tools just break everything up into the list-of-strings model and you never to try to use an API that requires single-string model.

What GP is referring to is the fact that that solution doesn't work as well on Windows, because the OS's native idea of a command line isn't list-of-strings but rather a single-string, and how that single string is broken up into a list-of-strings is dependent on the application being invoked.

theamk

I think "non trivial" and "slowly going insane" parts only happen if you don't have right tools, or not using POSIX-compatable system.

In python you have "shlex.quote" and "shlex.join". In bash, you have "${env@Q}". I've found those to work wonderfully to me - and I did crazy things like quote arguments, embed into shell script, quote script again for ssh, and quote 3rd time to produce executable .sh file.

In other languages.. yeah, you are going to have bad time. Especially on Windows, where I'd just give up and move to WSL.

panzi

I'd say: Don't use the shell if what you want to do is to execute another program.

You don't need to handle any quoting with exec*(). You still need to handle options, yes. But under Windows you always have to to handle the quoting yourself and it is more difficult than for the POSIX shell and it is program dependent. Without knowing what program is executed you can't know what quoting syntax you have to use and as such a standard library cannot write a generic interface to pass arguments to another process in a safe way under Windows.

I just felt it sounded like POSIX is particularly bad in that context, while in fact it is better than Windows here. Still, the system() function is a mistake. Use posix_spawn(). (Note: Do not use _spawn*() under Windows. That just concatenates the arguments with a space between and no quoting whatsoever.)

oguz-ismail

>Still, the system() function is a mistake. Use posix_spawn().

They are entirely different interfaces though. If you'd implemented system() using posix_spawn() it'd be just as bad as system()

mrheosuper

if every developer can follow best practice, we won't need Rust.

twic

Java has a bunch of code which looks like it's trying to do the right kind of escaping for msvcrt vs cmd.exe:

https://github.com/openjdk/jdk/blob/jdk-26%2B1/src/java.base...

But i would be lying if i said i understood what was going on there. Some googling suggests this was added around 1.7, ie in the early 2010s.

But then, that Rust CVE seems to originate in this work, and this guy claims Java said "won't fix", which suggests it is vulnerable:

https://flatt.tech/research/posts/batbadbut-you-cant-securel...

But there's no link, and i can't find any discussion about it, so i don't know what the actual situation is.

panzi

Yeah, part of the problem is how Windows does variable substitution before the command line syntax is parsed, and at a glance I don't see any % in that file.

tedunangst

panzi

Yeah, especially the thing about variable substitution is insane. How can you mess this up so thoroughly!? Appendix B is a nice overview. I'm amazed that Java doesn't even mention this in their documentation!

null

[deleted]

teo_zero

In POSIX world you aren't encouraged to use a specific command for each goal you might have. Instead you get a set of basic tools and combine them to achieve the desired result. Without system() you wouldn't have pipes, and you would need one command for every task you might want to run.

Point in case I encountered this week: I was editing a list and wanted to remove duplicates without changing the order of the lines. There's no ready-made program to do that, but this sequence of piped command served the purpose:

  cat -n | sort -uk 2 | sort | cut -f 2-
Fortunately my text editor supports system().

o11c

This is woefully misguided. Half the time passing it to the shell is explicitly a feature, e.g. `popen("gzip > foo.gz")`. If you have user input you should always sanitize it regardless of API.

But `ssh` does deserve all the shame. It's a pity the real problems are hard to find in an article full of nonsense.

Note also that if you're using a deficient shell that supports neither `printf %q` nor `${var@Q}` it's still easy to do quoting in `sed`. GNU `./configure` scripts do this internally, including special-casing to only quote the right side of `--arg=value`.

akdev1l

> Note also that if you're using a deficient shell that supports neither `printf %q` nor `${var@Q}` it's still easy to do quoting in `sed`. GNU `./configure` scripts do this internally, including special-casing to only quote the right side of `--arg=value`.

With the assumption that:

1. The person knows to do this weird thing 2. They do it consistently every time 3. They never forget

Also not sure how to use those solutions for the popen() example you provided.

The correct way is:

    subprocess.run([
       "gzip",
       "-c",
       "—-to-stdout",
       user_input
    ], stdout=open("foo.gz"))

And now I don’t have to worry about any of these weird things

theamk

The idea is you have some 3rd-party app, which might accept a parameter and pass it to popen as-is, something like "--data-handler=command"

In this case, current "popen" semantics - a single shell command - works pretty well. You can pass it a custom process:

    --data-handler="scripts/handle_data.py"
or a shell fragment:

    --data-handler="gzip -c > last_data.gz"
or even mini-shell script:

    --data-handler "jq .contents | mail -s 'New data incoming' data-notify"
this is where the "shell command" arguments really shine - and note you cannot simulate all of this functionality with command vector.

akdev1l

Yes, in this specific use case you need a shell.

But that’s the same as saying you technically need SQL injection so that `psql -c 'command'` can work

> you cannot simulate all of this with command vector

Uhh, yes we can just call a shell:

    subprocess.run(["bash", "-c", data_handler])
As a bonus this way we get control of which shell is being used and I find it is more explicit so I prefer it

panzi

You need to ensure that user_input doesn't start with `-`. You can do that by forcing an absolute path. Some programs accept `--` as a marker that any arguments after that are non-options.

pwdisswordfishz

No need for an absolute path, just a './' prefix.

rcxdude

If you're sanitizing, you're losing. You need to either have a) a watertight escaping process or b) a format that doesn't mix the code and data in the first place (notably, shell lacks either).

hello_computer

> article full of nonsense

Pls elaborate. Seems like a decent list of shell gotchas to me.

bee_rider

It is kind of a journey in an annoying way. Like, do we really need to know all the stuff about: this man page says to sanitize input, this one doesn’t, blah blah blah.

Or, let’s just look at an excerpt, here’s the section “proper solution:”

I’ve emphasized the actionable advice.

> The proper solution would be dropping that broken tool immediately, securely erasing it from your hard-drive, then running and screaming that tool's name out-loud in shame... (Something akin to Game of Throne's walk of atonement...)

Joke

> I'm not kidding... This kind of broken tools are the cause of many stupid bugs, ranging from the funny ups-rm-with-spaces (i.e. rm -Rf / some folder with spaces /some-file), to serious security issues like the formerly mentioned shellshock...

Joke/contentless stakes raising.

> So, you say someone holds you at gun point, thus you must use that tool? Check if the broken tool doesn't have a flag that disables calling sh -c, and instead properly executes the given command and arguments directly via execve(2). (For example watch has the -x flag as mentioned.)

Here it is, the paragraph that has something!

> Alternatively, given that most likely the tool in question is an open-source project written by someone in his spare time, perhaps open a feature request describing the issue, and if possible contribute with a patch that solves it.

This doesn’t seem practically actionable, at least in the short term—most projects might ignore your patch, or maybe it will take multiple years to get pushed out to distros.

> Still no luck? Make some popcorn and prepare for the latest block-buster "convoluted solutions for simple problems in UNIX town"...

Dramatic buildup/joke.

steamrolled

The original post asserted the article is nonsense; you're trying to justify that by saying you don't like the author's writing style. Two separate things...

The article is mostly correct, although it makes some weird claims (e.g., the Shellshock bug had nothing to do with the class of bugs the article is complaining about - it was a vulnerability in the shell itself). It definitely has a "newcomer hates things without understanding why they are the way they are" vibe, but you actually need that every now and then. The old-timers tend to say "it was originally done this way for a reason and if you're experienced enough, you know how to deal with it", but what made sense 30-40 years ago might not make much sense today.

chubot

The article mentioned printf '%q ', but it is a bit hard to find. Here is a handy way to remember it.

First, define this function:

    quote-argv() { printf '%q ' "$@"; }
    # (uses subtle vectorization of printf over args)
Now this works correctly:

    ssh example.com "$(quote-argv ls 'file with spaces')"
    ls: cannot access 'file with spaces': No such file or directory
In contrast to:

    $ ssh example.com ls 'file with spaces'
    ls: cannot access 'file': No such file or directory
    ls: cannot access 'with': No such file or directory
    ls: cannot access 'spaces': No such file or directory
And yes the "hidden argv join" of ssh is VERY bad, and it is repeated in shell's eval builtin.

They should both only take a SINGLE arg.

It is basically a self-own because spaces are an OPERATOR in shell! (the operator that separates words)

When you concatenate operators and variables, then you are mixing code and data, which is a security problem.

---

As for the exec workaround, I think this is also deficiency of shell. Oils will probably grow an 'invoke' builtin which generalizes 'command' and 'builtin', which are non-orthogonal.

'command true' means "external or builtin" (disabling shell function lookup), but there should be something that means "external only".

blueflow

> hidden argv join

It is not hidden. It is written down in plain sight:

  A complete command line may be specified as command, or it may have additional arguments. If supplied, the arguments will be appended to the command, separated by spaces, before it is sent to the server to be executed.
- third line in `man 1 ssh`

chubot

It's hidden in the sense that it creates ambiguity at the usage site. Compare with sudo:

    $ sudo ls 'file with spaces'
    ls: cannot access 'file with spaces': No such file or directory
If ssh (and sh eval) did not accept multiple arguments, then this wouldn't even get to ls:

    $ ssh example.com ls 'file with spaces'
    ls: cannot access 'file': No such file or directory
    ls: cannot access 'with': No such file or directory
    ls: cannot access 'spaces': No such file or directory
Accepting argv is better. Or forcing this is better:

    $ ssh example.com "ls 'file with spaces'"
So it's clear it's a single shell string.

Accepting a shell string is sometimes OK, but silently joining multiple args is useless, and insecure.

"RTFM" is not a good answer when security is involved.

blueflow

This stubborn attitude to refuse to consult the documentation at all and then expect the tool to work according to your preconceptions.

Tools do have rough edges, if you don't want to learn about them, you will get bitten.

scbrg

That's honestly not particularly clear. It doesn't say the command will be invoked by a shell on the remote host. Sure the whole "separated by spaces" thing sorta implies it will, as spaces don't mean much to anything but a shell, but it's still fairly vague.

In fact, later on the man page only mentions a shell in the part that talks about the behavior when no additional arguments are given:

  When the user's identity has been accepted by the server, the server either executes the given command in a non-interactive session or, if no command has been specified, logs into the machine and gives the user a normal shell as an interactive session.
The wording "executes the given command" would generally not imply "I'll just throw it at $SHELL and see what happens".

A few lines later it gets even more confusing:

  The session terminates when the command or shell on the remote machine exits and all X11 and TCP connections have been closed.
...which I definitely would say suggests that either a shell is executed or the command supplied as argument to ssh. That it means "command as interpreted by a shell on the remote host" is far from obvious.

blueflow

> The wording "executes the given command" would generally not imply "I'll just throw it at $SHELL and see what happens".

"command" means exactly that. Evaluation by shell. With that in mind, the manual page should read less ambiguous to you.

I actually don't have a good source for that, but you can check the execve(2) manpage. If command would refer to the execution of an argument vector, it would have been mentioned in there.

The other meaning of "command" refers to specific programs like those in /bin.

o11c

Use ' %q' and you also fix the problem of program names starting with a dash.

chubot

Ah yes, that's clever:

    $ sh -c "$(quote-argv -echo 'file with spaces')"
    sh: 0: Illegal option -h

    $ sh -c "$(quote-argv-left -echo 'file with spaces')"
    sh: 1: -echo: not found
Over ssh:

    $ ssh example.com "$(quote-argv-left -dashtest 'file with spaces')"
    -dashtest
    file with spaces

null

[deleted]

endiangroup

AD: Huh! I just wrote a utility cmd [1] this weekend to deal with restricting ssh keys to executing only commands that match a rule set via `ForceCommand` in `sshd_config` or `Command=""` in `authorized_keys`. I'm curious to see how susceptible it is to the aforementioned issues, it does delegate to `<shell> -c '<cmd>'` under the hood [2], but there are checks to ensure only a single command option argument `--` is passed (to mitigate metacharacter expansions) [3].

Note this tool is only intended to be another layer in security.

[1] https://github.com/endiangroup/cmdjail [2] https://github.com/endiangroup/cmdjail/blob/main/main.go#L30... [3] https://github.com/endiangroup/cmdjail/blob/main/config.go#L...

pabs3

Note that OpenSSH always runs commands in a shell, and so far they refused to add support for exec.

https://bugzilla.mindrot.org/show_bug.cgi?id=2283

blueflow

The docs say that exec.Command works with execv directly, so there should be no issue? You dont seem to call out to /bin/sh at all.

GuB-42

system() is for running shell commands and the article complains that it runs shell commands...

I rarely use it, and almost never in production, but it has its place. Think of it as the eval() of the POSIX world. If you want to build pipelines, or anything a shell has to offer, and do it simply, then system() is for you.

Security-wise, if you are using system() with user input, you are essentially giving shell access to the user, which may or may not be a big deal. If the intended users are people who already have a shell, that's fine maybe even desitable, otherwise, use something else, like exec*().

As for OpenSSH, what is the problem? The "SH" at the end means "shell", it runs shell commands, what did you expect?

pabs3

Related problems: command-line options that allow code execution[1], and commands that execute arbitrary code from the current directory.

1. https://web.archive.org/web/20201111203646if_/https://www.de...

orbisvicis

I don't understand the rsync example with '-e sh shell.c' as short-form options with space-separated values are expected to be two separate args and yet the glob expands it to a single arg. Right? Unless the tool does additional argument processing?

oguz-ismail

'-e sh shell.c' as a single argument is the same as '-e' and ' sh shell.c' as separate arguments (see <https://pubs.opengroup.org/onlinepubs/9799919799/functions/g...>). rsync executes ' sh shell.c' as a shell script and shells usually trim leading and trailing spaces when executing commands.

Wicher

For SSH specifically (ssh user@host "command with args") I've written this workaround pseudoshell that makes it easy to pass your argument vector to execve unmolested.

https://crates.io/crates/arghsh

theamk

Note that at least in python, you can use "shlex.quote" instead - it's in stdlib and does not need any extra tools.

    >>> import subprocess
    >>> import shlex
    >>> subprocess.run(['ssh', 'host', shlex.join(['ls', '-la', 'a filename with spaces'])])
    ls: cannot access 'a filename with spaces': No such file or directory
works nested, too

    >>> layer2 = ['ls', '-la', 'a filename with spaces']
    >>> layer1 = ['ssh', 'host1', shlex.join(layer2)]
    >>> layer0 = ['ssh', 'host0', shlex.join(layer1)]
    >>> subprocess.run(layer0)
(I am not sure if Rust has equivalent, but if it does not, it's probably easy to implement.. Python version is only a few lines long)

CGamesPlay

Wrong! SSH is very much the worst: it uses the user's login shell, not sh -c. So if the user's login shell isn't POSIX compatible, it still fails!

   >>> subprocess.run(["fish", "-c", shlex.join(["echo", "this isn\\'t working"])])
   fish: Unexpected end of string, quotes are not balanced
   echo 'this isn\'"'"'t working'

theamk

Well, you gotta draw the line somewhere, right? You can ssh into all sort of weird places, like native windows machines, or routers which expose their own shell, and you cannot expect them to be as usable as the regular ones.

The systems with non-POSIX non-interactive shell are firmly in the "special" category. If a user decided to set their _non_interactive_ shell to fish - they know they are heading for trouble and should not be surprised. I would not worry about such users in my scripts for example.

steveklabnik

> I am not sure if Rust has equivalent

Not in the standard library, but there are packages.

eternauta3k

This just confirms my habit of switching to python as soon as a shell script reaches any level of complexity

0xbadcafebee

What the author is talking about here is the growing pains of learning how 60-year old *NIX systems work, and shells/shell scripting. Once you learn how it works, it all works fine. But it's not easy to learn. If we had a standard education track for this stuff, it would be easier. (but there would still be people who avoid the education, rush into things, then bump their shins into the coffee table, and blame the coffee table)

ptx

> BTW, this is not something Linux specific. Unfortunately it is a trait inherited from the UNIX ancestry by almost all operating systems, including all BSD variants [...] Hmm... Strange... There is nothing to quote from this manual about warnings, issues or sanitization...

This is not a problem on FreeBSD, if the problem is (as the article seems to say) that the documentation fails to warn about the requirement to properly encode arguments passed to the shell.

Here's the FreeBSD man page [1] for system(3):

  SECURITY CONSIDERATIONS
     The system() function is easily misused in a manner that enables a
     malicious user to run arbitrary command, because all meta-characters
     supported by sh(1) would be honored.  User supplied parameters should
     always be carefully santized before they appear in string.
[1] https://man.freebsd.org/cgi/man.cgi?query=system&sektion=3&m...

a_t48

I've definitely gone down the rabbit hole of trying/being forced to fix issues like this. It starts off as just someone taking a shortcut of doing a little shell scripting in a python program or whatever. Generally the best tool I've found for fixing this is python's shlex.quote - https://docs.python.org/3/library/shlex.html but YMMV (multiple levels may be needed). The real best solution is not to shell out from your program when possible. :)

blueflow

Its not the manual pages that are ambiguous on this, its the author who used the word "command" but seemingly had a mental model of it as if it was an argument vector. A command and an argument vector are different things....