
Debugging Bash Like a Sire (2023)


50 comments

· July 17, 2025

mjd

I've been writing scripts in Bourne shell since the 1980s and in Bash since whenever it came along, and I feel like the most important thing I've learned about it is: don't. Sure, it can be done, and even done well, but why? There are better languages.

Every time I write a shell script that grows to more than about 20 lines I curse myself for not having written it in Python. The longer I have waited before throwing it away and redoing it, the more I curse.

This article says nothing to change my mind. I could build logging and stack traces in Bash. I admire the author's ingenuity. But again, why?

latexr

As someone who has been writing shell scripts for a few decades (though not as long as you), I’d instead recommend “learn what your tools are appropriate for and use them that way”. There are plenty of cases where shell scripts are the right tool for the job.

I can’t even count how many times I’ve seen multi-line Python scripts which could instead have been a shell one-liner. Shorter and faster.

I have also written shell scripts with hundreds of lines, used by thousands of people, which work just fine and would be more complicated and slower in other languages.

I firmly disagree with the all too pervasive blanket statement of “there are better languages”. It depends. It always does.

imglorp

I'll put a point on the "it depends" bit.

If you have a standard-ish environment, you'll have an array of Unix tools to compose together, which is what a shell is best at. Even a minimal image like busybox will have enough to do serious work. Golfing in shell can be a pipeline of tools: lately "curl | jq | awk" does a lot of lifting for me in a one-liner.
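
For illustration, a self-contained sketch of that pipeline shape. The JSON, field names, and threshold are all made up, and printf stands in for the curl stage so the example runs offline; in real use the first stage would be curl -s against an API:

```shell
#!/usr/bin/env bash
# The curl | jq | awk pattern: fetch, extract fields, filter/format.
# Endpoint shape and fields here are hypothetical.
set -euo pipefail

printf '%s' '[{"name":"disk","pct":91},{"name":"mem","pct":40}]' \
  | jq -r '.[] | "\(.name) \(.pct)"' \
  | awk '$2 > 80 { print $1 " is over 80% full" }'
```

Each stage stays replaceable: swap the printf for curl, change the jq filter, or point awk at a different column, without touching the other stages.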

As soon as you say "switch to (favorite decent scripting environment)", you're committing to (a) many megs of its base install, (b) its package management system, (c) whatever domain packages you need for $work, and (d) all the attendant dependency hells that brings along. Golfing in a scripting environment is composing a bunch of builtin operations.

thaumasiotes

> Golfing in shell can be a pipeline of tools: lately "curl | jq | awk" does a lot of lifting for me in a one-liner.

> As soon as you say "switch to (favorite decent scripting environment)", you're committing to (a) many megs of its base install, (b) its package management system, (c) whatever domain packages you need for $work, and (d) all the attendant dependency hells that brings along.

OK, but isn't jq just an example of a favorite scripting environment with a multi-meg install and a dependency system? What are you doing that's different from what you're advising everyone else not to do?

> Golfing in a scripting environment is composing a bunch of builtin operations.

Neither curl nor jq is a builtin operation.

jahsome

When it comes to shell scripting, I personally avoid golf at all costs. When it comes to operations, I'll take an extra-verbose script that's easy for a human to parse any day of the week.

Yes it's a tradeoff. Every line of code is a liability. Powershell or python are probably "slower" which in my use case is negligible and almost never relevant. On the other hand, I can't help but view the often esoteric and obscurely clever bash mechanisms as debt.

latexr

I’m not talking about code golf. Verbosity and clarity are not directly correlated. The examples I’m talking about are also often easier to read as shell scripts.

For example, let’s take a file as input, filter for every "mypattern" line, then output them sorted.

Python example:

  import sys
  print(*sorted(line for line in open(sys.argv[1]) if 'mypattern' in line), sep='')

Shell example:

  grep 'mypattern' "${1}" | sort

The shell version is shorter, easier to read, easier to understand, easier to search for, and an order of magnitude faster. You can certainly make the Python version more verbose, yet it’ll never reach the same level of clarity and immediacy as the shell version.

hiAndrewQuinn

This probably isn't surprising given that I wrote [1], but I mostly write bash when I want to create complicated pipelines with fzf and don't want to write Go code to do the same thing.

[1]: https://andrew-quinn.me/fzf/

jelder

This is an excellent point about pipes. There seems to be no other language which lets you stitch together pipes like Bash does. It's incredibly powerful, and worth putting up with all of Bash's warts.

Thanks for fzf, by the way. Always one of the first things I install in a new environment.

esafak

https://www.nushell.sh/ is next level when it comes to pipes:

    ls | where size > 10mb | sort-by modified

cerved

They are not the author of fzf

pdimitar

You have my admiration for fzf, it helps me dozens of times every day. And I do understand that authors of such prominent tools will want to have tamed integrations with people's shells, makes perfect sense.

That being said, as a guy who does not have big prominent OSS tools under his belt, I am slowly but surely migrating away from shell scripts and changing them to short Golang programs. Already saved my sanity a few times.

Nothing against the first cohort of people who had to make computers work; they are heroes. But at one point the old things only impede and slow everyone else and it's time to move on.

hiAndrewQuinn

Sorry, I was accidentally unclear in my writing. fzf was written by https://github.com/junegunn , I merely wrote a tutorial on it that got unexpectedly popular on here some years back.

I'm sorry, Junegunn! I would never dream of stealing that kind of valor. In the future I'll remember to explicitly flag [1] as a tutorial I wrote.

kajkojednojajko

> the most important thing I've learned about [bash] is: don't. Sure, it can be done, and even done well, but why? There are better languages.

This. Bash gives you all the tools to dig a hole and none to climb out. It's quick and easy to copy commands from your terminal to a file, and it beats not saving them at all.

Support for digging: once you have a shell script, adding one more line conditioned on some env var is more pragmatic than rewriting the script in another language. Apply mathematical induction to grow the script to 1000 lines. Split into multiple files when one becomes too large and repeat.

Missing support for climbing out: janky functions, no modules, user types, or tests; no debugger and no standard library. I've successfully refactored messy python code in the past, but with bash I've had no idea where to even start.

There is hope that LLMs can be used to convert shell scripts to other languages, because they can make the jump that experienced devs have learned to avoid: rewriting from scratch. What else do you do when refactoring in small steps is not feasible?

sgarland

> no modules

Ish. You can source whatever files you want, so if you split up your functions into logical directories / files, you can get modules (-ish).
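
A minimal sketch of that source-based "modules-ish" layout. The lib/log.sh file and its function names are made up for the demo, and the library file is generated on the fly so the example is self-contained:

```shell
#!/usr/bin/env bash
# "Modules-ish" bash: functions live in lib/*.sh and the main script
# sources them by path. lib/log.sh is a hypothetical example module.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Pretend this file is checked in to the repo as lib/log.sh.
mkdir -p lib
cat > lib/log.sh <<'EOF'
log_info()  { printf '[INFO] %s\n' "$*"; }
log_error() { printf '[ERROR] %s\n' "$*" >&2; }
EOF

# "Import" the module; from here on its functions are plain bash functions.
source ./lib/log.sh

log_info "modules-ish in action"
```

There is no namespacing, so a prefix convention (log_, db_, etc.) per file is about as close as you get to a module boundary.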

> no tests

BATS [0].

[0]: https://github.com/bats-core/bats-core

> I've successfully refactored messy python code in the past, but with bash I've had no idea where to even start.

I say this with all kindness: you probably need to know more bash before you can safely refactor it. It is a very pointy and unforgiving language.

thaumasiotes

I wrote a powershell script to run an ffmpeg workflow. I'm confident that this was a better idea than either of the other two approaches that you seem to be advocating for:

(a) instead of writing a shell script to operate a shell-operated tool, write a python script with a bunch of os.system('shell out') commands.

(b) instead of just invoking ffmpeg to do the things you want done, install an ffmpeg development library, and call the functions that ffmpeg itself calls to do those things.

What would be the argument for either of those?

pdimitar

> There is hope that LLMs can be used to convert shell scripts to other languages, because they can make the jump that experienced devs have learned to avoid: rewriting from scratch. What else do you do when refactoring in small steps is not feasible?

There have been some languages shown on HN that compile to sh/bash (like oilshell [0]). I would think that's also a viable angle of attack, but I'm not sure how viable it actually is in practice, i.e. maintainers might have moved on for various reasons.

[0] https://github.com/oils-for-unix/oils

orhmeh09

Because it's better at the task than Python is.

bigstrat2003

The only thing bash is better at than Python is very short scripts, like 10ish lines. Everything else it sucks at, due to the horrible syntax and various bash footguns.

mjd

That's just the problem! It is better at the task. Until it isn't, and "isn't" comes much too soon.

hn92726819

I've seen this sentiment a lot here. "Once shell is >n lines, port to python". My experience has been different. Maybe half of the scripts I write are better off in python while the other half are exponentially longer in python than bash.

For example, anything to do with json can be done in 1 line of readable jq, while it could be 1, 5, or 20 lines in python depending on the problem.

I'd just like to put that out there because, half of the time, the >n metric does not work for me at all. My shell scripts range from ~5-150 lines, while my Python scripts are 100+.

johnisgood

If you have to call out to many external programs, might as well use Bash. I use Bash in such cases.

coderatlarge

i once wrote a whole data processing library in bash because i didn’t want people at my then workplace to extend and continue developing it. it was needed for a narrow purpose which it served well (details lost). ultimately people ported it to python and kept developing it anyway.

jeffrallen

An excellent use case for AI coders today is: "change this shell script that's gotten too big into Python". :)

bmoyles

You can also skip the subshell invocation of date by using %(fmt)T from bash's printf:

  %(fmt)T - output the date-time string resulting from using FMT as a format string for strftime(3)

The man page provides a bit more detail:

  %(datefmt)T causes printf to output the date-time string resulting from using datefmt as a format string for strftime(3). The corresponding argument is an integer
  representing the number of seconds since the epoch. Two special argument values may be used: -1 represents the current time, and -2 represents the time
  the shell was invoked. If no argument is specified, conversion behaves as if -1 had been given. This is an exception to the usual printf behavior.

With that,

    timestamp=$(date +'%y.%m.%d %H:%M:%S')

becomes

    printf -v timestamp '%(%y.%m.%d %H:%M:%S)T' -1

gkfasdfasdf

That would indeed be faster; it looks like it requires bash 4.2+.

gkfasdfasdf

Why not leverage the bash 'caller' builtin? It's meant for printing stack traces, e.g.

    #!/bin/bash
    
    die() {
      local frame=0
      while caller $frame; do
        ((++frame));
      done
      echo "$*"
      exit 1
    }
    
    f1() { die "*** an error occurred ***"; }
    f2() { f1; }
    f3() { f2; }

    f3

Output:

    12 f1 ./callertest.sh
    13 f2 ./callertest.sh
    14 f3 ./callertest.sh
    16 main ./callertest.sh
    *** an error occurred ***
Via: https://bash-hackers.gabe565.com/commands/builtin/caller/

cassianoleal

> I tend to skip the -u flag as bash scripts often interact with global variables that are set outside my scripts.

That's throwing the baby out with the bathwater. Instead, default the optional global variables with something like:

    "${GLOBAL_VAR:-}"

That will satisfy the optionality of the variable whilst keeping the check for the cases where you actually want it.
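
A small sketch of that pattern (MY_OPTIONAL_FLAG is a made-up variable name), showing -u staying on while an optional external variable defaults to empty:

```shell
#!/usr/bin/env bash
# Keep set -u (nounset) while tolerating optional, externally-set globals.
# MY_OPTIONAL_FLAG is a hypothetical env var the caller may or may not export.
set -euo pipefail

if [[ "${MY_OPTIONAL_FLAG:-}" == "1" ]]; then
  echo "flag enabled"
else
  echo "flag absent or disabled"
fi

# Variables referenced without the :- default still fail fast under -u,
# which is the point of keeping the flag:
#   echo "$REQUIRED_VAR"   # aborts with "REQUIRED_VAR: unbound variable"
```

So -u keeps catching genuine typos and missing configuration, and only the variables you have explicitly marked as optional get a pass.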

Y_Y

I find that `bash -x` actually gives such a good trace that I rarely need anything else. Coupled with the ancient wisdom "bash for quick and dirty; once it gets too fancy, switch to python", I use bash a lot and find it manages really well without external tools. Shoutout to shellcheck, though, for catching a lot of edge cases.
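
For illustration, a tiny sketch of what -x gives you for free (the traced snippet is made up):

```shell
#!/usr/bin/env bash
# bash -x echoes each command to stderr (prefixed by PS4, "+" by default)
# after expansion and before execution, so the trace shows actual values,
# not just the source text.
trace=$(bash -x -c 'msg=world; echo "hi $msg"' 2>&1 >/dev/null)
printf '%s\n' "$trace"
# Typically prints something like:
#   + msg=world
#   + echo 'hi world'
```

Because the trace shows post-expansion values, it doubles as a quoting debugger: a variable that expands to nothing, or splits into several words, is immediately visible.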

ygritte

Yep, shellcheck + syntastic in vim makes for a great bash "IDE".

kjellsbells

Curious that after 30 years the idea of a CPAN-like bash repository hasn't taken root. How many personal reimplementations of logging in bash are there by now?

I suppose what is really tripping people up is that bash can show up on all kinds of runtimes, some of which have the external tools one might need (jq, logger, etc) and some of which don't. So then you go searching for a minimum standard that can be expected to be present. Maybe POSIX or gnu coreutils. Reminds me of the shell horrors of the late 1990s where every script had to figure out if sh was really ksh and what variant of UNIX it was running on, and therefore what commands and options were available. I swear this was one of the great things about Perl when it came along, it just worked.

In 2025, I kind of see the attraction of single binaries like Go does. Ship the binary and be done. It is very un-UNIX I suppose (not so much golfing as having the beer cart drive you to the hole) but then again its not 1985 any more.

t43562

I never thought of the idea of printing out a stack trace. A logging function is the kind of good idea that's so obvious I didn't think of it :-)

I use -e sometimes but I really dislike scripts that rely on it for all error handling instead of handling errors and logging them.

https://www.shellcheck.net/

^^ this tool has proven very useful for avoiding some of the most silly mistakes and making my scripts better. If you're maintaining scripts with other people then it is a great way of getting people to fix things without directly criticising them.

oneshtein

You can use bash-modules if you like stack traces in bash.


chasil

It is always helpful to record whether any bash behavior changes when it is not running in POSIX mode.

This is most common in Debian and Ubuntu, where dash is /bin/sh, and /bin/bash does not run in POSIX mode by default.

Some behavior of legacy bash of the '80s, prior to POSIX.2, can be surprising.

https://w3.pppl.gov/info/bash/Bash_POSIX_Mode.html

cerved

(D)ash is not bash --posix

hn92726819

> I tend to skip the -u flag as bash scripts often interact with global variables that are set outside my scripts.

What? If globals are set outside the scripts, -u still works. If the author means they may or may not be defined outside the script, the ${VAR:-} construct allows it to expand to nothing if unset (just throw VAR=${VAR:-} at the top if you don't want to edit the body)

Also, I do not like the function return based on error code:

    function ... {
      ...
      (( check_level >= current_level ))
    }
Unless I'm reading this wrong, this is a bad idea if using set -e. This is a function and it should instead:

    return $(( check_level < current_level ))
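
A minimal sketch of the hazard being described (function and variable names are made up): under set -e, a function whose last command is a false arithmetic test returns status 1, and a bare call to it kills the script.

```shell
#!/usr/bin/env bash
# (( expr )) exits with status 1 when the expression evaluates to 0 (false),
# so a function ending in it can abort the whole script under set -e when
# it is called outside a condition.
script='
  set -e
  check_level=1; current_level=2
  should_log() { (( check_level >= current_level )); }  # false -> status 1
  should_log            # plain call: set -e exits the script here
  echo "unreached"
'
if bash -c "$script"; then
  echo "inner script succeeded"
else
  echo "inner script aborted with status $?"
fi
```

Calling the function in a tested context, e.g. `if should_log; then ...`, sidesteps the problem, since set -e ignores failures inside a condition.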

toolmantim

Reminds me of Keith Pitt’s (Buildkite founder) latest streams: https://twitch.tv/magickeith/

rook_line_sinkr

Many commenters recommend shellcheck

For actually _testing_ the scripts or its functions, I recommend ShellSpec

https://github.com/shellspec