dotlinux guide

Error Handling in Shell Scripts: Techniques and Tips

Table of Contents

Understanding Error Handling in Shell Scripts

Exit Codes: The Foundation of Error Signaling

Every command executed in a shell returns an exit code (or status) to indicate success or failure. By convention:

  • 0: Success (the command executed without errors).
  • Non-zero (1 to 255): Failure (specific codes may indicate the type of error, e.g., 1 for general errors, 2 for incorrect usage, 127 for “command not found”).

You can access the exit code of the last command using the special variable $?. For example:

# Attempt to list a non-existent file
ls non_existent_file
echo "Exit code: $?"  # Output: Exit code: 2 (failure)

# List an existing file
ls /tmp
echo "Exit code: $?"  # Output: Exit code: 0 (success)

Exit codes are the primary mechanism for detecting failures in shell scripts.

Default Shell Behavior: Why It’s Risky

By default, shell scripts ignore command failures and continue executing subsequent commands. Consider this script:

#!/bin/bash
echo "Step 1: Copying file..."
cp important_file backup/  # Fails if "backup/" doesn’t exist
echo "Step 2: Deleting original..."
rm important_file  # Runs even if Step 1 failed!

If backup/ doesn’t exist, cp fails (exit code 1), but the script proceeds to delete important_file, resulting in data loss. This silent failure is dangerous—hence the need for explicit error handling.

Fundamental Error Handling Techniques

Exit Codes and Conditional Checks

The simplest way to handle errors is to check the exit code of a command with if statements. Use ! to invert the exit code (e.g., if ! command; then ... to trigger on failure).

Example: Check if a command succeeds

#!/bin/bash
if cp important_file backup/; then
  echo "Backup succeeded."
else
  echo "Error: Backup failed!" >&2  # Redirect error to stderr
  exit 1  # Exit script with non-zero code to signal failure
fi

Key Notes:

  • >&2 redirects output to stderr (standard error), ensuring errors are separated from normal output.
  • Explicitly exiting with exit 1 informs the caller (e.g., another script or the user) that the script failed.

The set Builtin: Strict Mode for Scripts

The set builtin modifies shell behavior. Using it to enable “strict mode” makes scripts exit on errors and catch common mistakes. Here are critical flags:

FlagEffect
-e (or errexit)Exit immediately if any command fails (non-zero exit code).
-u (or nounset)Treat undefined variables as errors and exit.
-o pipefailMake a pipeline fail if any command in the pipeline fails (not just the last one).

Example: Strict Mode Script

#!/bin/bash
set -euo pipefail  # Enable strict mode

# Undefined variable: triggers -u and exits
echo "Hello, $USERNAME"  # Fails if USERNAME is not set

# Pipeline with a failing command: triggers pipefail and exits
grep "pattern" non_existent_file | wc -l  # grep fails, pipeline fails

# Command failure: triggers -e and exits
cp important_file backup/  # Fails if backup/ missing

Caveats:

  • set -e does not exit on errors in if conditions, loops (for, while), or pipeline segments ending with ||. This allows intentional error handling (e.g., if ! command; then ...).
  • Use set -euo pipefail at the start of most scripts for safety.

Error Traps with trap

The trap command lets you define actions to run when the shell receives a signal (e.g., error, exit, or user interrupt like Ctrl+C). It’s ideal for cleanup (e.g., deleting temporary files) or error reporting.

Common Use Cases for trap:

  • Error logging: Run a command when an error occurs (ERR signal).
  • Cleanup: Run commands on script exit (EXIT signal), regardless of success/failure.
  • Interruption handling: Catch SIGINT (Ctrl+C) or SIGTERM to avoid partial execution.

Example: Error Trap with Line Numbers

#!/bin/bash
set -euo pipefail

# Trap errors: print line number and message
trap 'echo "Error occurred at line $LINENO"' ERR

# Failing command: triggers the trap
ls non_existent_file  # Error occurred at line 7

Example: Cleanup on Exit

#!/bin/bash
set -euo pipefail

TMP_FILE=$(mktemp)  # Create temporary file

# Cleanup: delete temp file on exit (success or failure)
trap 'rm -f "$TMP_FILE"; echo "Cleaned up temporary file."' EXIT

# Work with the temp file...
echo "Data" > "$TMP_FILE"

# If script exits here (e.g., due to error), trap still runs

Common Practices for Reliable Scripts

Input Validation

Scripts often rely on user input (e.g., command-line arguments). Always validate inputs to avoid unexpected behavior.

Example: Check for Required Arguments

#!/bin/bash
set -euo pipefail

# Check if at least 1 argument is provided
if [ $# -eq 0 ]; then
  echo "Error: Please provide a filename as an argument." >&2
  exit 1
fi

FILENAME="$1"
echo "Processing $FILENAME..."

Example: Validate Argument Type

#!/bin/bash
set -euo pipefail

# Check if argument is a positive number
if ! [[ "$1" =~ ^[0-9]+$ ]]; then
  echo "Error: Argument must be a positive integer." >&2
  exit 1
fi
echo "Number: $1"

Command Existence Checks

Scripts often depend on external commands (e.g., curl, jq). Verify these exist before using them to avoid “command not found” errors mid-script.

Example: Check if a Command Exists

#!/bin/bash
set -euo pipefail

# Check if 'curl' is available
if ! command -v curl &> /dev/null; then
  echo "Error: 'curl' is required but not installed." >&2
  exit 1
fi

curl "https://example.com"

Note: command -v is more portable than which for checking command existence.

File and Directory Validation

Before reading/writing files or directories, validate their existence and permissions.

Common File Tests:

  • -f "$file": File exists and is a regular file.
  • -d "$dir": Directory exists.
  • -r "$file": File is readable.
  • -w "$file": File is writable.

Example: Validate Input File

#!/bin/bash
set -euo pipefail

FILENAME="$1"

# Check if file exists and is readable
if [ ! -f "$FILENAME" ]; then
  echo "Error: File '$FILENAME' does not exist." >&2
  exit 1
elif [ ! -r "$FILENAME" ]; then
  echo "Error: File '$FILENAME' is not readable." >&2
  exit 1
fi

echo "Processing '$FILENAME'..."

Logging Errors

Logging errors (with timestamps) helps debug failed scripts. Write logs to a file or syslog.

Example: Basic Error Logging

#!/bin/bash
set -euo pipefail

LOG_FILE="/var/log/my_script.log"

# Log error with timestamp
error_log() {
  local message="$1"
  echo "[$(date +'%Y-%m-%d %H:%M:%S')] ERROR: $message" >> "$LOG_FILE"
  echo "$message" >&2  # Also print to stderr
}

if ! cp important_file backup/; then
  error_log "Backup failed for 'important_file'"
  exit 1
fi

Best Practices

Defensive Programming

Assume inputs are invalid, commands may fail, and resources may be unavailable. Validate early and often.

Example: Defensive File Copy

#!/bin/bash
set -euo pipefail

SRC="important_file"
DST="backup/"

# Check src exists, dst exists, and copy succeeds
if [ ! -f "$SRC" ]; then
  echo "Error: Source file '$SRC' missing." >&2
  exit 1
elif [ ! -d "$DST" ]; then
  echo "Error: Destination directory '$DST' missing." >&2
  exit 1
elif ! cp "$SRC" "$DST"; then
  echo "Error: Failed to copy '$SRC' to '$DST'." >&2
  exit 1
fi

Meaningful Error Messages

Avoid vague messages like “Error occurred.” Instead, specify what failed, why, and how to fix it.

Bad:

if [ ! -f "$file" ]; then
  echo "Error" >&2
  exit 1
fi

Good:

if [ ! -f "$file" ]; then
  echo "Error: Input file '$file' not found. Ensure the file exists and try again." >&2
  exit 1
fi

Testing Error Scenarios

Test scripts under failure conditions to ensure error handling works:

  • Missing arguments.
  • Invalid inputs (e.g., non-numeric values).
  • Missing files/directories.
  • Unavailable commands (e.g., curl not installed).

Tool: shellcheck
Use ShellCheck to detect common errors (undefined variables, missing quotes, etc.):

shellcheck my_script.sh  # Highlights issues like missing quotes or unset variables

Reusable Error Handling with Functions

Encapsulate error-handling logic in functions to avoid repetition.

Example: Reusable Error Handler

#!/bin/bash
set -euo pipefail

# Exit with message and code
error_exit() {
  local message="$1"
  local exit_code="${2:-1}"  # Default exit code: 1
  echo "ERROR: $message" >&2
  exit "$exit_code"
}

# Validate arguments
if [ $# -eq 0 ]; then
  error_exit "No arguments provided. Usage: $0 <filename>"
fi

FILENAME="$1"

# Validate file exists
[ -f "$FILENAME" ] || error_exit "File '$FILENAME' not found."

Conclusion

Error handling is not an afterthought in shell scripting—it’s the foundation of reliable, maintainable automation. By leveraging exit codes, set flags, trap for cleanup, and validation checks, you can transform fragile scripts into robust tools. Remember to:

  • Enable strict mode with set -euo pipefail.
  • Validate inputs, files, and commands early.
  • Use trap for error reporting and cleanup.
  • Write meaningful error messages and log failures.
  • Test under failure conditions to ensure resilience.

Investing in error handling reduces debugging time, prevents data loss, and builds trust in your scripts.

References