Getting decent error reports in Bash when you're using 'set -e'

(utcc.utoronto.ca)

122 points | by zdw 3 days ago

35 comments

  • Grimeton 13 hours ago

    You can just do

      trap 'caller 1' ERR
    
    which should do the same thing. Also, you should set "errtrace" (-E) and possibly "nounset" (-u) and "pipefail".
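
    Put together, a minimal header along those lines (just a sketch, adjust to taste) would be:

      #!/usr/bin/env bash
      # -E (errtrace): functions and subshells inherit the ERR trap
      # -e (errexit):  abort on any unhandled command failure
      # -u (nounset):  expanding an unset variable is an error
      # -o pipefail:   a pipeline fails if any stage in it fails
      set -Eeuo pipefail

      # On failure, report the line, function and file of the call that failed
      trap 'caller 1' ERR
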
  • o11c 13 hours ago

    FWIW, I've grown the following which handles a few more cases. For some reason I wasn't aware of `caller` ...

        set -e
    
        is-oil()
        {
            test -n "$OIL_VERSION"
        }
    
        set -E || is-oil
    
        trap 'echo "$BASH_SOURCE:$LINENO: error: failure during early startup! Details unavailable."' ERR
    
        magic_exitvalue=$(($(kill -l CONT)+128))
    
        backtrace()
        {
            {
                local status=$?
                if [ "$status" -eq "$magic_exitvalue" ]
                then
                    echo '(omit backtrace)'
                    exit "$magic_exitvalue"
                fi
                local max file line func argc argvi i j
                echo
                echo 'Panic! Something failed unexpectedly.' "(status $status)"
                echo 'While executing' "$BASH_COMMAND"
                echo
                echo Backtrace:
                echo
                max=${#BASH_LINENO[@]}
                let max-- # The top-most frame is "special".
                argvi=${BASH_ARGC[0]}
                for ((i=1;i<max;++i))
                do
                    file=${BASH_SOURCE[i]}
                    line=${BASH_LINENO[i-1]}
                    func=${FUNCNAME[i]}
                    argc=${BASH_ARGC[i]}
                    printf '%s:%d: ... in %q' "$file" "$line" "$func"
                    # BASH_ARGV: ... bar foo ...
                    # argvi          ^
                    # argvi+argc             ^
                    for ((j=argc-1; j>=0; --j))
                    do
                        printf ' %q' "${BASH_ARGV[argvi+j]}"
                    done
                    let argvi+=argc || true
                    printf '\n'
                done
    
                if true
                then
                    file=${BASH_SOURCE[i]}
                    line=${BASH_LINENO[i-1]}
                    printf '%s:%d: ... at top level\n' "$file" "$line"
                fi
            } >&2
            exit "$magic_exitvalue"
            unreachable
        }
        shopt -s extdebug
        trap 'backtrace' ERR
    • edoceo 12 hours ago

      What the hell. This is cool and all, but I'm looking at it as a signal that I should move up one tier in language (e.g. to Perl, PHP, Python, or Ruby).

      • 0xbadcafebee 5 hours ago

        Or go the other direction: stop trying to do fancy things and write simpler code that avoids errors.

          #!/bin/sh
          [ "${DEBUG:-0}" = "1" ] && set -x
          set -u
          foo="$( my-external-program | pipe1 | pipe2 | pipe3 )"
          if [ -z "$foo" ] ; then
              echo "Error: I didn't get any output; exiting!"
              exit 1
          fi
          echo "Well I got something back. Was it right?"
          if ! printf "%s\n" "$foo" | grep -q -E 'some-extended-regex' ; then
              echo "Error: '$foo' didn't match what I was looking for; exiting!"
              exit 1
          fi
          echo "Do the thing now..."
        
        A lot of programs will either produce valid output as STDOUT, or if they encounter an error, not produce STDOUT. So for the most part you just need to 1) look for any STDOUT at all, and then 2) filter it for the specific output you're looking for. For anything else, just die with an error. If you need to find out why it didn't run, re-run with DEBUG=1.

        Advanced diagnosis code won't make your program work better, but it will make it more complicated. Re-running with tracing enabled works just as well 99% of the time.
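
        With the header above, that just means something like (script name illustrative):

          DEBUG=1 ./the-script.sh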

        • maccard 13 minutes ago

          > A lot of programs will either produce valid output as STDOUT, or if they encounter an error not produce stdout

          Lots of programs produce nothing in the success case and only print in the failure case.
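
          One way to handle those is to check the exit status rather than the output; a sketch, reusing the hypothetical my-external-program from the parent:

            if ! out="$( my-external-program 2>&1 )" ; then
                echo "Error: my-external-program exited non-zero: $out" >&2
                exit 1
            fi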

      • o11c 12 hours ago

        I actually tried rewriting this in Python, but gave up since Python's startup latency is atrocious if you have even a few imports (and using a socket to a pre-existing server is fundamentally unable to preserve enough process context related to the terminal). Perl would probably be a better fit but it's $CURRENTYEAR and I've managed to avoid learning Perl every year so far, and I don't want to break my streak just for this.

        The Bash code is not only fast but pretty easy to understand (other than perhaps the header, which I never have to change).

    • chubot 12 hours ago

      I think you should be able to get rid of the is-oil part, because set -E was implemented last year

          $ osh -c 'set -E; set -o |grep errtrace'
          set -o errtrace
      
      I'd be interested in any bug reports if it doesn't behave the same way

      (The Oils runtime supports FUNCNAME BASH_SOURCE and all that, but there is room for a much better introspection API. It actually has a JSON crash report with a shell stack dump, but it probably needs some polish.)

      • oguz-ismail 9 hours ago

        >I'd be interested in any bug reports

        What's the point? You can't fix them anyway

  • bjackman 12 hours ago

    But trap doesn't "stack" (like, e.g., defer in Go), so if you do this it's not available for other purposes like cleanup.
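
    One workaround (a sketch only; the helper name is made up and the quote-stripping is naive) is to read back the current handler with `trap -p` and append to it:

      # Append a command to the ERR trap instead of replacing it.
      # trap -p prints something like:  trap -- 'handler' ERR
      # The naive parsing below breaks if the handler itself contains quotes.
      add_err_trap() {
          local existing
          existing=$(trap -p ERR)
          existing=${existing#trap -- \'}   # drop the leading  trap -- '
          existing=${existing%\' ERR}       # drop the trailing  ' ERR
          trap "${existing:+$existing; }$1" ERR
      }

      add_err_trap 'caller 1'            # debugging handler
      add_err_trap 'rm -f "$tmpfile"'    # cleanup handler ($tmpfile is illustrative)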

    • teddyh 11 hours ago

      Yes. This also means that if you use a third-party shell library which uses “trap” internally (like shunit2), you can’t use “trap” in your own script at all.

    • gkfasdfasdf 10 hours ago

      Not sure what you mean; you can have separate ERR and EXIT traps that run independently.
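
      For example (handler bodies are placeholders):

        trap 'cleanup' EXIT                            # runs on any exit, success or failure
        trap 'echo "failed at line $LINENO" >&2' ERR   # runs whenever a command fails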

      • bjackman an hour ago

        Yeah but each only exists once. If you wanna clean up on ERR and also use that trap for debugging you need to implement some janky layer on top of the trap.

        (FWIW my take away from issues like this is always: Bash is not a serious programming language. If you are running up against these limitations in real life it's time to switch language. The challenge is really in predicting when this will happen _before_ you write the big script!)

  • newAccount2025 14 hours ago

    Why don’t all shells just do this?

    • inetknght 13 hours ago

      Perhaps you underestimate just how many scripts are poorly written and part of your operating system.

      For what it's worth, I think `set -euo pipefail` should be default for every script, and thoroughly checked with shellcheck.net.

      • mananaysiempre 12 hours ago

        Take care that set -o pipefail will not work on older dash (including IIRC the current Ubuntu LTS), and neither will set -o pipefail || true if set -e is in effect. (For some reason that I’m too lazy to investigate, a failing set invocation will crash the script immediately rather than proceed into the second branch.) The best I could think of to opportunistically enable it was to use a subshell:

          if (set -o pipefail 2>/dev/null); then set -o pipefail; fi
        
        Or you can just target bash, I guess.

        (I rather dislike shellcheck because it combines genuine smells with opinions, such as insisting on $(...) instead of `...`. For the same reason, with Python I regularly use pyflakes but can’t stand flake8. But to each their own.)

        • koolba 10 hours ago

          > such as insisting on $(...) instead of `...`.

          Only one of those can be (sanely) nested. Why would you ever want to use backticks?
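
          For instance (paths and quoting illustrative):

            # $(...) nests cleanly and allows normal quoting inside:
            dir=$(dirname "$(readlink -f "$0")")
            # The backtick version needs escaped inner backticks, and quoting gets murky:
            dir=`dirname \`readlink -f $0\``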

          • oguz-ismail 9 hours ago

            Looks slick. Works on older shells too

      • imcritic 12 hours ago
        • javier2 3 hours ago

          Well, maybe. But using `set -euo pipefail` here does not make it any worse, as far as I understand? The script still does broken things, but the more correct way to be safe is not broken by set -euo pipefail.

        • larkost 11 hours ago

          I have never liked this statement of the problem.

          It is not that `set -e` is bad, it is that bash is a bit weird in this area and you have to know when things eat errors and when they don't. This is not really changed by `set -e`: you already had to know them to make safe code. `set -e` does not wave a magic wand saying you don't have to understand bash error control.

          But having `set -e` is almost universally better for people who do not understand it (and I would argue also for people who do). Without it you are responsible for implementing error handling on almost every line.
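
          For example (a sketch; the function and variable names are made up), two classic ways errors get eaten:

            set -e

            might_fail() {
                false                    # would normally abort the script here...
                echo "still running"     # ...but not when the function is called as a condition
            }

            if might_fail; then          # -e is suppressed for the whole call
                echo "looked fine"
            fi

            export v=$(false)            # 'export' succeeds, so the substitution's failure is eaten
            n=$(false)                   # a plain assignment does propagate it: the script exits here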

          As others have already said: this is one of those things that generally pushes me to other languages (in my case often Python), as the error handling is much more intuitive, and much less tricky to get right.

      • eikenberry 12 hours ago

        What about for `/bin/sh`, i.e. posix compliant shells like dash?

      • scns 12 hours ago

        > For what it's worth, I think `set -euo pipefail` should be default for every script, and thoroughly checked with shellcheck.net.

        This

    • koolba 13 hours ago

      Automatically leaking the line number and command, even to stderr, is not a sane default.

      • eastbound 4 hours ago

        Is that a safety point of view? Is shell supposed to be input-safe? I may have limited shell skills, but it doesn't seem like it's designed to be safe.

    • forrestthewoods 11 hours ago

      Because shells weren’t supposed to be doing complex logic. People use shells to do way way way more than they should.

  • westurner 10 hours ago

    Setting PS4 gets decent error reports with `set -x` (and `set -x -v`; `help set`).

    Here's an excerpt that shows how to set PS4 from a main() in a .env shell script for configuring devcontainer userspace:

      for arg in "${@}"; do
      case "$arg" in
      --debug)
          export __VERBOSE=1 ;
          #export PS4='+${LINENO}: ' ;
          #export PS4='+ #${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]:+${FUNCNAME[0]}()}:$(date +%T)\n+ ' ;
          #export PS4='+ ${LINENO} ${FUNCNAME[0]:+${FUNCNAME[0]}()}: ' ;
          #export PS4='+ $(printf "%-4s" ${LINENO}) | '
          export PS4='+ $(printf "%-4s %-24s " ${LINENO} ${FUNCNAME[0]:+${FUNCNAME[0]}} )| '
          #export PS4='+ $(printf "%-4s %-${SHLVL}s %-24s" ${LINENO} "     " ${FUNCNAME[0]:+${FUNCNAME[0]}} )| '
          ;;
      --debug-color|--debug-colors)
          export __VERBOSE=1 ;
          # red=31
          export ANSI_FG_BLACK='\e[30m'
          #export MID_GRAY_256='\e[38;5;244m'    # Example: a medium gray
          export _CRESET='\e[0m'
          export _COLOR="${ANSI_FG_BLACK}"
          printf "${_COLOR}DEBUG: --debug-color: This text is ANSI gray${_CRESET}\n" >&2
          export PS4='+ $(printf "${_COLOR}%-4s %-24s%s |${_CRESET} " ${LINENO} "${FUNCNAME[0]:+${FUNCNAME[0]}}" )'
          ;;
      esac
      done
    
    This, too:

      function error_handler {
        echo "Error occurred on line $(caller)" >&2
        awk 'NR>L-4 && NR<L+4 { printf "%-5d%3s%s\n",NR,(NR==L?">>>":""),$0 }' L=$1 $0 >&2
      }
      if (echo "${SHELL}" | grep -q "bash"); then
        trap 'error_handler $LINENO' ERR
      fi
    • kjellsbells 7 hours ago

      (I'm sure this is lovely Bash, but for all the people who rejected Perl for its modem line noise vibe...what do ya think of this?)

      As an aside, I actually wonder if Bash's caller() was inspired by Perl's.

      There is also Carp and friends, plus Data::Dumper when you need not only the stack trace but also the state of objects and data structures. Which is something that I don't think Bash can really do at all.

      • Grimeton 6 hours ago

        There are no objects in bash. There are indexed and associative arrays and both can be iterated over like so:

          for value in "${SOMEARRAY[@]}"; do
            echo "${value}"
          done
        
        or with the help of the keys:

          for key in "${!SOMEARRAY[@]}"; do
            echo "key: ${key} - value: ${SOMEARRAY["${key}"]}"
          done
        
        If you want to dump the data of any variable you can just use declare -p

          declare -p SOMEARRAY
        
        and you get something like this:

          declare -a SOMEARRAY=([0]="a" [1]="b" [2]="c" [3]="d" [4]="e" [5]="f")
        
        What you can do, if you have a set of variables and you want them to be "dumped", is this:

        Let's "dump" all variables that start with "BASH":

          for k in "${!BASH@}"; do
            declare -p "${k}"
          done
        
        Or one could do something like this:

          for k in "${!BASH@}"; do
            echo "${k}: ${!k}"
          done
        
        
        But the declare option is much more reliable as you don't have to test for the variable's type.