
Bash tips & tricks: good and not-so-good bash practices

Bash is not the most programmer-friendly tool. It requires a lot of caution and low-level knowledge, and it doesn’t forgive the slightest mistake (you know you can’t type foo = 42, right?). On the other hand, bash is everywhere (even on Windows 10), it’s quite portable and powerful, and in effect it is the most pragmatic choice when automating tasks. Luckily, following a set of simple rules can save you from many of its minefields. Everything below has been battle-tested by our dedicated software development teams in our day-to-day work.

1. Shebang
2. Quotes
3. Variables
4. Working directory
5. You don’t really need ls
6. Expect the unexpected
7. .sh or .bash?
8. Other things to remember
9. Conclusion

But let’s start from the top…

1. Shebang

There are a number of possible shebangs you can use to refer to the interpreter you want to execute your code under. Some of them are:

  • #!/usr/bin/env bash
  • #!/bin/bash
  • #!/bin/sh
  • #!/bin/sh -

We all know a shebang is nothing but the path (absolute or relative to the current working directory) to the shell interpreter, but which one is preferred?

Long story short: you should use #!/usr/bin/env bash for portability. POSIX does not standardize path names, so different UNIX-based systems may have bash placed in different locations. You cannot safely assume that, for example, /bin/bash even exists (some BSD systems have the bash binary placed in /usr/local/bin/bash).

The env utility can help us work around this limitation: #!/usr/bin/env bash will cause the code to execute under the first bash interpreter found in PATH. It’s not a perfect solution (what if the same problem applies to /usr/bin/env? Luckily, every UNIX OS I know has env placed exactly there), but it’s the best we can do.

However, there is one exception I’m aware of: for a system boot script, use /bin/sh since it’s the standard command interpreter for the system.
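To check which interpreter a shebang actually resolves to on a given machine, a small sketch like the one below may help. The function name describe_interpreter is made up for illustration; BASH and BASH_VERSION are variables that bash sets itself.

```shell
#!/usr/bin/env bash

# describe_interpreter is a hypothetical helper, not part of bash.
# BASH holds the path of the running bash binary, BASH_VERSION its version.
describe_interpreter() {
    printf 'interpreter: %s\n' "${BASH:-unknown}"
    printf 'version: %s\n' "${BASH_VERSION:-unknown}"
    # command -v reports the first bash found in PATH, which is the one
    # that "#!/usr/bin/env bash" would pick.
    printf 'first in PATH: %s\n' "$(command -v bash || echo 'not found')"
}

describe_interpreter
```

If the first and last lines of the output disagree, the script was probably started with an explicit interpreter path rather than through env.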

It’s worth checking out this and this article for more information.

2. Always use quotes

This is the simplest and the best advice you can follow to save yourself from many possible pitfalls. Incorrect shell quoting is the most common cause of a bash programmer’s headache. Unfortunately, it’s not as easy as it is important.

There are many great articles covering this specific topic completely. I don’t have anything more to add, other than to recommend this and this article.

It’s worth remembering that you should generally use double quotes. Double quotes still let variables and command substitutions expand, but they stop the shell from splitting the result into words or treating it as a glob pattern.
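The effect of quoting is easiest to see by counting arguments. The following is a minimal sketch, assuming the default IFS; count_args and the sample filename are hypothetical.

```shell
#!/usr/bin/env bash

# count_args is a hypothetical helper that prints how many arguments it got.
count_args() {
    printf '%s\n' "$#"
}

filename='my report.txt'

count_args ${filename}      # unquoted: split into "my" and "report.txt", prints 2
count_args "${filename}"    # double quotes: one argument, prints 1
count_args '${filename}'    # single quotes: one literal argument, prints 1

printf '%s\n' "Hello, ${filename}"    # double quotes expand: Hello, my report.txt
printf '%s\n' 'Hello, ${filename}'    # single quotes do not: Hello, ${filename}
```

Single quotes are the right tool when the text must stay literal; double quotes are the default whenever a variable is involved.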

3. Variables usage

$foo is the classic form of variable referencing in bash. There is also a second notation that wraps the variable identifier in curly braces, like ${foo}; it is known as parameter expansion, and bash has extended it considerably since version 2 (see echo $BASH_VERSION). Why is this considered to be a good practice? It brings us a whole set of features:

  • array element expansion: ${array[42]}
  • parameter expansion, like ${filename%.*} (removes the file extension), ${foo// } (removes all spaces) and ${BASH_VERSION%%.*} (gets the major version of bash)
  • variable concatenation: ${dirname}/${filename}
  • appending a string to a variable: ${HOME}/.bashrc
  • access to positional parameters (arguments to a script) beyond $9, like ${10}
  • substring support: ${foo:1:5}
  • indirect referencing: ${!foo} expands to the value held by the parameter whose name is stored in foo (bar=42; foo="bar"; echo "${!foo}" will print 42)
  • case modification (bash 4 and later): ${foo^} expands to the value of foo with its first character converted to uppercase, and the , operator does the same for lowercase. Their doubled forms (^^ and ,,) convert all characters

In the most common cases, the brace form gives us no advantage over the classic one, but using it everywhere keeps the code consistent and can be considered a good practice. Read more about it here.
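The expansions listed above can be tried out directly. This is only an illustrative sketch; every variable name and value in it is made up, and the last two lines assume bash 4 or later.

```shell
#!/usr/bin/env bash
# All variable names and values below are made up for illustration.

filename='backup.tar.gz'
printf '%s\n' "${filename%.*}"     # shortest ".*" suffix removed: backup.tar
printf '%s\n' "${filename%%.*}"    # longest ".*" suffix removed: backup

spaced='a b c'
printf '%s\n' "${spaced// }"       # every space removed: abc

digits='0123456789'
printf '%s\n' "${digits:1:5}"      # 5 characters starting at offset 1: 12345

fruits=('apple' 'banana' 'cherry')
printf '%s\n' "${fruits[1]}"       # arrays are zero-indexed: banana

bar=42
foo='bar'
printf '%s\n' "${!foo}"            # indirect reference: 42

# Case modification needs bash 4 or later.
name='jakub'
printf '%s\n' "${name^}"           # Jakub
printf '%s\n' "${name^^}"          # JAKUB
```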

What you also have to know about variables in bash is that, by default, all of them are global. This can result in problems like shadowing, overriding or ambiguous referencing. The local builtin restricts the scope of a variable to the function that declares it (and the functions it calls), protecting it from leaking into the global namespace. Just remember: declare all of your functions’ variables as local.
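A short sketch of the difference, using hypothetical function and variable names:

```shell
#!/usr/bin/env bash
# Function and variable names are hypothetical.

result='untouched'

leaky() {
    result='overwritten by leaky'       # no local: modifies the global
}

scoped() {
    local result='visible only here'    # local: the global is left alone
}

scoped
printf '%s\n' "${result}"    # untouched
leaky
printf '%s\n' "${result}"    # overwritten by leaky
```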

4. Watch the script’s working directory

Within a bash script, you will often operate on other files, so you have to be really careful when using relative paths. By default, the script’s current working directory is inherited from the parent shell.

$ pwd
/home/jakub

$ cat test/test
#!/usr/bin/env bash
echo "$(pwd)"

$ ./test/test
/home/jakub

The problem appears when pwd and the script’s location differ. You cannot then simply refer to ./some_file, since it does not point to the some_file placed next to your script. To be able to easily operate on files in the script’s directory and avoid messing up random system files, you should consider using this handy one-liner, which changes the script’s working directory to the directory containing the script itself:

cd "$(cd "$(dirname "${BASH_SOURCE[0]}")" > /dev/null && pwd)" || exit 1

$ pwd
/home/jakub

$ cat test/test
#!/usr/bin/env bash
cd "$(cd "$(dirname "${BASH_SOURCE[0]}")" > /dev/null && pwd)" || exit 1
echo "$(pwd)"

$ ./test/test
/home/jakub/test

Looks much more natural, doesn’t it?
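If changing the working directory is not desirable (for example, when the script should still honor relative paths passed in by the user), one possible variation is to store the script’s directory in a variable and build paths from it. This is a sketch under that assumption; script_dir, config_file and config.ini are hypothetical names.

```shell
#!/usr/bin/env bash

# Resolve the directory that contains this script, without changing
# the caller's working directory.
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" > /dev/null && pwd)" || exit 1

# config.ini is a hypothetical file expected to live next to the script.
config_file="${script_dir}/config.ini"

printf 'script directory: %s\n' "${script_dir}"
printf 'config path: %s\n' "${config_file}"
```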

5. You don’t really need ls

Using ls inside a bash script is almost always a flawed approach. I can’t recall even one good reason to do it. To explain why, let’s go through two common examples:

for file in $(ls *.txt)

Word splitting will ruin this for loop when any of the filenames contains whitespace. What’s more, if a filename contains a glob character (also known as a wildcard, like *, ?, [, ]), it will be recognized as a glob pattern and expanded by the shell. That’s probably not what you want. Another problem is that POSIX allows pathnames to contain any character except \0 (including |, / and even newline). This makes it impossible to determine where the first pathname ends and the second one begins when dealing with ls output.

for file in "$(ls *.txt)"

Double quotes around the command substitution cause its output to be treated as a single word, so the loop runs once over the whole listing instead of once per file.

How do you iterate over a list of files the right way? There are a few possibilities:

for file in ./*.txt

This uses the bash globbing feature mentioned above. Remember to double-quote "${file}"!
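One caveat with the glob loop is that, by default, bash leaves the pattern unexpanded when nothing matches, so the loop body runs once with the literal pattern. A minimal sketch of a guarded loop follows; count_txt_files and the demo file names are hypothetical.

```shell
#!/usr/bin/env bash

# count_txt_files is a hypothetical helper that counts *.txt entries
# in the directory given as its first argument.
count_txt_files() {
    local dir="$1" file count=0
    for file in "${dir}"/*.txt; do
        # If nothing matched, ${file} still holds the literal pattern; skip it.
        [[ -e "${file}" ]] || continue
        count=$((count + 1))
    done
    printf '%s\n' "${count}"
}

demo_dir="$(mktemp -d)"
touch "${demo_dir}/plain.txt" "${demo_dir}/with space.txt" "${demo_dir}/notes.md"
count_txt_files "${demo_dir}"    # prints 2
rm -rf "${demo_dir}"
```

Setting shopt -s nullglob is another way to handle the no-match case; the explicit existence check is used here because it does not change shell options for the rest of the script.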

find . -type f -name '*.txt' -exec ...

This one is probably the best solution. The find utility lets you use regex-based search (-regex) and recursion, and has many other built-in features you may find useful. Here is a great synopsis of this tool.

find . -type f -name '*.txt' -print0 | xargs -0 ...

This is an alternative that combines find and xargs. It’s neither simpler nor shorter, but the advantage of xargs is that it can run commands in parallel (the -P option in GNU and BSD xargs). Read more about the differences here.

To summarize, never try to parse the output of the ls command. It’s simply not intended to be parsed and there is no way you can make it work reliably. Read more here.
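When the files need to be processed inside the script itself rather than handed to another command, the NUL-separated output of find can also be read in a loop. This is a sketch, assuming a find that supports -print0 (GNU and BSD both do); collect_txt_files and found_files are hypothetical names.

```shell
#!/usr/bin/env bash

# collect_txt_files is a hypothetical helper. It fills the global array
# found_files with every regular *.txt file below the given directory.
collect_txt_files() {
    local dir="$1" file
    found_files=()
    # -print0 ends each pathname with a NUL byte; read -d '' splits on it,
    # so spaces and even newlines in names survive intact.
    while IFS= read -r -d '' file; do
        found_files+=("${file}")
    done < <(find "${dir}" -type f -name '*.txt' -print0)
}

demo_dir="$(mktemp -d)"
touch "${demo_dir}/plain.txt" "${demo_dir}/with space.txt" "${demo_dir}/other.md"
collect_txt_files "${demo_dir}"
printf 'found %s file(s)\n' "${#found_files[@]}"    # found 2 file(s)
rm -rf "${demo_dir}"
```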

6. Expect the unexpected

People often forget to check for non-zero exit statuses of commands executed within a bash script. It’s easy to imagine what happens when a cd command preceding file operations fails (because of “No such file or directory”, for example) and the script carries on anyway.

#!/usr/bin/env bash
cd "${some_directory}"
rm -rf ./*

The example above works well, but only if nothing goes wrong. The intention was to delete the content of some_directory/, but it may end up executing rm -rf ./* in a completely different location.

cd "${some_directory}" && rm -rf ./* and cd "${some_directory}" || exit (or || return inside a function) are the simplest and most self-descriptive solutions. In both cases, the deletion won’t execute if cd returns non-zero. It’s worth pointing out that this code is still vulnerable to a common programming error: misspelling.

Executing cd "${some_dierctory}" && rm -rf ./* will end up deleting files you probably want to keep (unless a variable with the misspelled name some_dierctory happens to be declared). "${some_dierctory}" expands to "", and in bash cd "" typically succeeds without changing the directory, so rm -rf ./* runs wherever the script happens to be (without the quotes, the empty expansion would disappear and cd would take us to the home directory instead). Don’t worry though, that’s not the end of the story.
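One defensive way to combine these checks is to wrap the deletion in a function that refuses an empty argument and only runs rm after a successful cd. The following is a sketch, not a definitive implementation; clean_directory is a hypothetical helper.

```shell
#!/usr/bin/env bash

# clean_directory is a hypothetical helper that empties a directory
# only after confirming the target is non-empty and reachable.
clean_directory() {
    local target="${1:-}"
    if [[ -z "${target}" ]]; then
        printf 'error: no directory given\n' >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged,
    # and && makes sure rm only runs after a successful cd.
    ( cd -- "${target}" && rm -rf -- ./* )
}
```

Called as clean_directory "${some_dierctory}" with a misspelled, unset variable, the function returns 1 and deletes nothing.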

Bash has some programmer-friendly switches you should be aware of:

  • set -o nounset tells bash to treat references to unset variables as an error. This one saves us from many typos.
  • set -o errexit tells bash to exit the script immediately if any statement returns a non-zero status. One may say that errexit gives us error checking for free, but it can be tricky to use correctly. Some commands return non-zero for a warning, and sometimes you know exactly how to handle a particular command’s error. Read more here.
  • set -o pipefail changes the default behavior when using pipes. By default, bash takes the status code of the last command in a pipeline, meaning that false | true will be considered to return 0. It may not be what you want, since this approach ignores errors raised by earlier commands in the pipeline. This is where pipefail comes in. This option sets the exit code of a pipeline to that of the rightmost command that exited with non-zero (or to 0 if all commands exit successfully).
  • set -x (set -o xtrace) causes bash to print each command right before executing it, after expansions such as globbing and variable substitution have been applied. Definitely a great help when trying to debug a bash script failure.
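The effect of pipefail and nounset can be observed safely inside subshells, so the options do not leak into the rest of the script. This is an illustrative sketch; pipeline_status and nounset_status are hypothetical helpers, and some_dierctory is deliberately left unset.

```shell
#!/usr/bin/env bash

# pipeline_status is a hypothetical helper. Its argument is "on" or "off"
# and selects whether pipefail is enabled for the test pipeline.
pipeline_status() {
    (
        if [[ "$1" == on ]]; then set -o pipefail; else set +o pipefail; fi
        status=0
        false | true || status=$?
        printf '%s\n' "${status}"
    )
}

# nounset_status is a hypothetical helper. With nounset, expanding an unset
# (for example misspelled) variable aborts the subshell with an error
# instead of quietly yielding an empty string.
nounset_status() {
    local status=0
    ( set -o nounset; : "${some_dierctory}" ) 2>/dev/null || status=$?
    printf '%s\n' "${status}"
}

pipeline_status off    # prints 0: only the last command counts
pipeline_status on     # prints 1: the failure of false is reported
nounset_status         # prints a non-zero status
```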

Of course, error handling applies not only to the cd command described above. Your script should take into account the vast majority of possible problems, like spaces in pathnames, missing files, directories that failed to be created or non-existent commands (you know, awk isn’t always present on the OS you’re about to run your script on).
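Checking for a required command up front can be sketched with the command -v builtin. The helper name require_command is hypothetical.

```shell
#!/usr/bin/env bash

# require_command is a hypothetical helper. It returns non-zero and prints
# a message to STDERR when the named command cannot be found.
require_command() {
    local name="$1"
    if ! command -v "${name}" > /dev/null 2>&1; then
        printf 'error: required command not found: %s\n' "${name}" >&2
        return 1
    fi
}

if require_command awk; then
    printf 'awk is available\n'
fi
```

Running such checks at the top of a script makes it fail early with a clear message, rather than halfway through its work.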

7. .sh or .bash?

What’s the proper file extension for an executable shell script? And what if it contradicts the shebang?
Well, the very first thing you have to know is that a UNIX-based OS is not like Windows. The most important difference here is that Windows uses the file extension to determine how to open a file. UNIX follows a different technique: it reads the beginning of the file itself (the shebang line for scripts, a magic number for binaries). You can rename your UNIX binary to have a .jpg extension and still run it safely. No photo browser will pop up.

Since UNIX usually does not rely on file extensions, the best advice is not to give your script one at all. Using .sh is not a really useful convention and I see no advantages to it. Many UNIX utilities are implemented as scripts, but did you ever type xdg-open.sh ., shasum.sh some_file or lsb_release.sh -a? You probably didn’t, and the reason is that those utilities don’t have file extensions at all.
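A quick experiment illustrates the point. This sketch assumes the temporary directory allows executing files; the misleading name holiday.jpg is made up.

```shell
#!/usr/bin/env bash
# The file name holiday.jpg is deliberately misleading and made up.

demo_dir="$(mktemp -d)"

cat > "${demo_dir}/holiday.jpg" <<'EOF'
#!/usr/bin/env bash
echo "still a bash script"
EOF

chmod +x "${demo_dir}/holiday.jpg"

# The kernel reads the shebang, not the extension, so this runs under bash.
output="$("${demo_dir}/holiday.jpg")"
printf '%s\n' "${output}"    # still a bash script

rm -rf "${demo_dir}"
```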

8. Other things to remember

  • Prefer $() syntax over legacy backticks (pid="$(pidof some_process)", not pid=`pidof some_process`)
  • Prefer using double brackets in if statements ([[ "${foo}" = '' ]], not [ "${foo}" = '' ])
  • Local variable names should follow lower_case convention with underscores
  • Constants and environment variable names should follow UPPER_CASE convention with underscores
  • Define functions like foo() { … }, not function foo { … }
  • Prefer absolute paths
  • Warnings and errors should be printed to STDERR
  • For simple conditionals, use && and || (e.g. [[ -z "${*// }" ]] && return 0)
  • printf is often a better choice than echo (see this article)
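Several of these points fit into one short sketch. The helper names log_error and is_blank, and the variable current_year, are hypothetical.

```shell
#!/usr/bin/env bash
# Helper and variable names below are hypothetical.

# Errors go to STDERR, formatted with printf rather than echo.
log_error() {
    printf 'error: %s\n' "$*" >&2
}

# True when the first argument is empty or contains nothing but spaces.
is_blank() {
    local value="${1:-}"
    [[ -z "${value// }" ]]
}

# $() instead of backticks; lower_case name for an ordinary variable.
current_year="$(date +%Y)"

# && as a simple conditional.
is_blank '   ' && log_error 'got a blank value'

printf '%s\n' '-n'    # prints -n, whereas echo -n would print nothing
printf 'current year: %s\n' "${current_year}"
```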

9. Conclusion

Every time you choose bash to automate some task, it’s worth considering the alternatives. Bash is powerful, but it can also be tricky, hard to debug and full of traps. Take a look at this python sh module, the Bash Infinity framework (“modern boilerplate / framework / standard library for bash”) and Batsh, a simple programming language that compiles to bash and Windows Batch.

To keep your code well-written, use this great bash linter and consider starting your next bash script like this:

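#!/usr/bin/env bash

set -o errexit
set -o nounset
set -o pipefail

# Assign first, then mark readonly, so a failing cd is not masked.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly SCRIPT_DIR

main() {
    # your code goes here...
    :  # no-op placeholder: bash rejects a function body with no commands
}

main "${@}"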

Sources:
https://mywiki.wooledge.org/BashPitfalls
https://github.com/progrium/bashstyle
https://google.github.io/styleguide/shell.xml
https://wiki.bash-hackers.org/scripting/obsolete
https://books.goalkicker.com/BashBook/


