Use Long Options in Scripts
74 comments
March 21, 2025
wahern
gray_-_wolf
Another approach is to have a language powerful enough to let you guard against shell injection. I wrote a syntax form that does this:
(sh "cat " file " >" output)
With file being bound to "foo'bar" and output to "x", it is automatically translated into cat 'foo'\''bar' >'x'
This gives you the flexibility to use the shell (sometimes it just is the most concise way) while being safe against injection. I believe you should be able to do the same in Rust, for example.
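A rough Python analogue of that syntax form, as a sketch only: it assumes a POSIX shell as the target, and sh_command is a hypothetical helper name, not the commenter's implementation. The literal template is kept verbatim and only the bound values are quoted, using the standard library's shlex.quote. The quoting comes out in a different form ('"'"' instead of '\'') but the shell reads it the same way.

```python
import shlex

def sh_command(template: str, **values: str) -> str:
    """Fill {name} placeholders with shell-quoted values (POSIX shells only).

    sh_command is a hypothetical helper for illustration. The template must
    be a trusted literal; only the keyword values are treated as data.
    """
    quoted = {name: shlex.quote(value) for name, value in values.items()}
    return template.format(**quoted)

command = sh_command("cat {file} >{output}", file="foo'bar", output="x")
print(command)  # cat 'foo'"'"'bar' >x
```

shlex.quote leaves "x" unquoted because it contains only characters that are safe in a POSIX shell. The resulting string is still meant for a shell, so this only helps when the template itself never contains untrusted text.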
delusional
How do you know which shell you're escaping for? You could query the system, but now you end up implementing escaping for every shell out there.
paulddraper
Tbf this input does not need escaping.
But at the very least the shell is unnecessary here.
echelon
SQL injection on steroids.
rat87
Only if you are getting input from untrusted users
remus
imo it's best to just avoid it altogether. Requirements change, and what was once a trusted input can become untrusted input.
tasuki
Why not?
bulatb
Any time you send commands and data down a single channel, user input that's intended to be data can be misinterpreted as a command. For example, if your program wants to:
run("program --option '{user_input}' > file")
to save some input to a file, and the user's input is: '; bad_command #
then when run() sends that string to the shell, the shell will run:
program --option '';
bad_command #' > file
Most languages have something like a safe_exec() that separates the shape of the command from the values of the options, executing "program" with the options and the user_input in the arguments array as data. Skipping the shell step, which would just be building an exec call anyway, removes the opportunity for users to confuse it into doing something else.
The list-based API alternative they recommend might look like this:
safe_exec(["program", "--option", user_input], stdout="file")
and it would always exec "program" with argv[1] == "--option" && argv[2] == user_input. If the user_input happens to be: '; bad_command #
...well, then, the user can enjoy the contents of their file.
tasuki
Yes of course. But why would you expect me to run shell commands with random person's input? Also:
safe_exec(["rm", user_input])
This isn't safe either! Despite clearly saying "safe_exec"!
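For reference, the list-based call discussed in this subthread can be sketched with Python's subprocess.run. This is an illustration under assumptions: the "program" here is a stand-in child Python process that writes back the value it received after --option, and run_program is a made-up name. The point is only that the injection payload arrives as one argv entry and no shell ever parses it.

```python
import subprocess
import sys

# Stand-in for "program" (an assumption for this sketch): a child Python
# process that writes back the argument following --option.
CHILD = "import sys; sys.stdout.write(sys.argv[2])"

def run_program(user_input: str) -> str:
    """Run the child with user_input as a single argv entry, without a shell."""
    argv = [sys.executable, "-c", CHILD, "--option", user_input]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout

payload = "'; bad_command #"
received = run_program(payload)
print(received == payload)  # True: the payload came through as plain data
```

This removes shell injection only. As the rm example shows, the program being run can still do something harmful with a value it receives intact, for instance by parsing a leading hyphen as an option or by acting on a path it should not touch.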
dosourcenotcode
Agree that long options should be used. But there is one caveat to consider: portability.
Sadly to this day not all BSD distributions have GNU style long options. And the ones that now do only got them fairly recently. So if you want portability you have to use short options as you weep with a bottle of vodka in hand.
saghm
This surprises me, because the first case I remember where short versus long options affected portability across GNU and BSD was _fixed_ by using long options. Maybe six years ago I had an issue porting a CI script someone else had written; it decoded some base64 data and failed when I tried it on a different platform. I forget which platform it was originally written for and which one I was moving it to, but the issue boiled down to the macOS version of base64 using the BSD short option for decode and Linux using the GNU one, with different capitalization: one used `-d` and the other used `-D` (although I honestly can't remember which used which). My solution was to use the long option `--decode`, which was the same on both. Since then I've always used the long option out of habit whenever I need to decode base64, which probably explains why I can't remember which option Linux uses, despite it being the platform I've used far more over the years.
delusional
I think the right way to think about this (if your goal is to avoid surprises at least) is that options (short or long) are just strings. There's no guarantee that there's a long variant of an option. There's not even a requirement that options start with a dash. A sufficiently brain-damaged developer could start them with a slash or something.
If you're going for portability the best bet is to just read the manual for each of the separate versions and do whatever works.
mplanchard
Not trying to spam this thread with praises of nix, because it does have its own problems, but it certainly solves the portability problem.
Four years into using it at work for dev environments across mac (x86 & ARM) and various linuxes and can’t imagine going back. I also always make dev environment definitions for my open source projects, so even if people aren’t using nix, there is at least a record of what tools they will need to install to run scripts, tests, etc.
nine_k
Does nix work well on BSD-derived Unices? In particular, the most widespread of them, macOS?
teddyh
Also, do not forget using “--” after all options, but before any dynamic arguments, just to be safe.
arcanemachiner
I know to do this intuitively, but I have no idea why.
hoherd
It terminates option parsing, so anything following it that starts with a hyphen will not be treated as an option.
$ echo 'hack the planet' > --help
$ cat --help
cat: illegal option -- -
usage: cat [-belnstuv] [file ...]
$ cat -- --help
hack the planet
$ rm -vf --help
rm: illegal option -- -
usage: rm [-f | -i] [-dIPRrvWx] file ...
unlink [--] file
$ rm -vf -- --help
--help
$ cat -- --help
cat: --help: No such file or directory
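When the command is built from a program rather than typed, the same habit can be sketched as a small helper that always places "--" between the fixed options and the dynamic arguments. The helper name with_end_of_options is made up for this example, and it assumes the called utility honors the "--" convention, which not every program does.

```python
def with_end_of_options(command, options, operands):
    """Build an argv list with "--" between fixed options and dynamic operands.

    Hypothetical helper for illustration; it only builds the list and
    assumes the target utility treats "--" as the end of options.
    """
    return [command, *options, "--", *operands]

# A file name that looks like an option, as in the transcript above.
argv = with_end_of_options("rm", ["-v", "-f"], ["--help"])
print(argv)  # ['rm', '-v', '-f', '--', '--help']
```

The resulting list can be handed to a list-based exec API such as subprocess.run(argv), so the file name is neither parsed by a shell nor mistaken for an option.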
ratrocket
I agree with this practice. Another benefit is it makes it easier (slightly, but still) to grep the man page for what the options do.
The corollary must be "write programs that take long options".
saagarjha
Unfortunately, if you want your scripts to be portable to other POSIX systems you might have to use the short options, as the long ones are not standardized. You have to decide the tradeoff for yourself.
mplanchard
Using nix has really spoiled me on this. Everyone gets the same versions of all the CLI utilities in the dev environment, whether on mac or linux, and those are the same versions that run in CI and any prod systems. It’s really nice being able to use whichever newer bash features or gawk extensions you like, without having to deal with trying to ensure the mac engineers have brew-installed all the right stuff to match a standard linux env.
delusional
nix didn't solve your issue here. nix didn't do anything. You're just describing the benefit of a reproducible development environment. You could do the same thing with brew, pacman, apt, or by just compiling every package from source from some huge mirror.
It's exactly the same thing people initially loved about docker or vagrant.
paulddraper
Everyone has to use nix :)
But yes, that is nice.
pcwalton
What POSIX systems in actual use (not historical Unixes) don't have the long options? macOS' BSD utilities I guess?
yjftsjthsd-h
> What POSIX systems in actual use (not historical Unixes) don't have the long options?
All of them except for GNU, AFAICT? (That is, only GNU seems to have long options.) Checking manpages for rm(1) as a simple reference, I can't see long options in any of the 3 major BSDs or illumos, and checking Alpine Linux seems to show busybox also only doing short options (sorry, can't find an online doc for this, though it's easy to check in docker if you don't have a machine running Alpine handy). OpenWRT also uses busybox and has the same (lack of) options.
https://man.freebsd.org/cgi/man.cgi?query=rm&apropos=0&sekti...
schneems
You can also brew install tools like gnused which have the same arguments. Not a viable option for all situations but if you just need to execute it on Linux and your local machine for dev you can use those.
pixelkink
Prepares to launch in flaming rant... sigh You're right.
starkparker
And put them on separate lines so you can track and git blame them more easily.
ivanjermakov
Same line git blame is not that hard, just list commits affecting specific file or even specific line span: https://git-scm.com/docs/git-log#Documentation/git-log.txt--...
ndegruchy
This is one of my default rules for writing scripts. If the long option is available, use it. It makes too much sense to do so.
croes
> Long form options are much more self-explanatory for the reader.
And less prone to typos
dosourcenotcode
And not just options but base command names too. I wrote a tool to partially mitigate this in some cases: https://github.com/makesourcenotcode/name-safe-in-bash
sebastianmestre
You have a nice ternary counter going in the version numbers :)
gabrielsroka
Related 2013/2020 https://news.ycombinator.com/item?id=24518682
vivzkestrel
What if I or someone wrote a bot / script that searches across GitHub for every shell script file it can find, converts all short options into long options, and opens a PR? Think dependabot, but let's call it longabot or readabot?
jmholla
I don't think you should do that unprompted. There are reasons for using short options, like the portability mentioned in other comments. It'd put an undue burden on open source maintainers.
Something opt-in like dependabot though could be useful.
Please DO NOT mix string interpolation and command execution, especially when a command is processed through the shell. Whatever your language, use a list-based or array-based execution API that passes arguments straight through to execve(2), execvp(3), etc, bypassing the shell.