dotlinux guide

Creating Robust Shell Scripts for Production Environments

Table of Contents

  1. Fundamentals of Robust Shell Scripting
  2. Core Practices for Production Scripts
  3. Advanced Techniques
  4. Testing and Validation
  5. Security Considerations
  6. Example: A Robust Production Backup Script
  7. Conclusion
  8. References

1. Fundamentals of Robust Shell Scripting

1.1 The Shebang Line: Choosing the Right Interpreter

The shebang (#!) line at the start of a script specifies the interpreter (e.g., bash, sh, zsh). For production, clarity and portability are key:

  • Use #!/usr/bin/env bash for Bash-specific scripts: This ensures the system uses the user’s $PATH to find bash, avoiding hardcoded paths like /bin/bash (which may vary across systems).
  • Use #!/bin/sh for POSIX compliance: If targeting minimal environments (e.g., Alpine Linux), use the POSIX-compliant sh interpreter. Avoid Bash-specific features (e.g., arrays, [[ ]] conditionals) in this case.

Example:

#!/usr/bin/env bash  # Bash-specific script
#!/bin/sh           # POSIX-compliant script (portable but limited)

1.2 Critical Shell Options: set -euo pipefail

Production scripts must fail fast and avoid silent errors. The set command enables critical safety flags:

  • -e: Exit immediately if any command fails (non-zero exit code).
  • -u: Treat unset variables as errors (avoids accidental use of undefined variables).
  • -o pipefail: Make pipelines fail if any command in the pipeline fails (not just the last one).

Example:

#!/usr/bin/env bash
set -euo pipefail  # Enable strict error checking

Without these flags, a script might continue running after a failure (e.g., a missing file) or use undefined variables, leading to unpredictable behavior.

1.3 Variable Handling: Quoting and Validation

Poor variable handling is a common source of bugs (e.g., word splitting, globbing, or undefined variables).

  • Always quote variables with "$VAR" to prevent word splitting and glob expansion.
  • Validate variables with ${VAR:?} to enforce required variables (fails if VAR is unset/empty).
  • Avoid uppercase variables for user-defined variables (uppercase is convention for environment variables like PATH).

Bad Practice:

file=$1
rm $file  # Risky: If $file is "file with spaces.txt", splits into "file", "with", "spaces.txt"

Good Practice:

local input_file="${1:?Usage: $0 <input-file>}"  # Enforce argument; exit if missing
rm "${input_file}"  # Safe: Quoting prevents splitting

2. Core Practices for Production Scripts

2.1 Error Handling and Recovery

Scripts must explicitly handle errors and provide actionable feedback. Use:

  • trap for cleanup/error reporting: Catch signals (e.g., ERR, EXIT) to run cleanup code (e.g., delete temp files) or log errors.
  • Custom error functions: Centralize error messaging to stderr (file descriptor >&2).

Example: Error Handling with trap

#!/usr/bin/env bash
set -euo pipefail

# Cleanup temporary files on exit (success or failure)
cleanup() {
  rm -f /tmp/backup.tmp
}
trap cleanup EXIT

# Log errors with line numbers
error() {
  echo "ERROR: $* (Line $LINENO)" >&2
  exit 1
}
trap 'error "Script failed"' ERR  # Trigger on any command failure

# Example: Fail intentionally to test error handling
cp /nonexistent/file /tmp/backup.tmp || error "Failed to copy file"

2.2 Structured Logging

Logging ensures visibility into script behavior for debugging and auditing. Key practices:

  • Include timestamps, log levels (INFO/WARN/ERROR), and context (e.g., script name).
  • Write logs to stdout (for normal output) and stderr (for errors).
  • For long-running scripts, log to a file with rotation (e.g., logrotate).

Example: Logging Function

#!/usr/bin/env bash
set -euo pipefail

log() {
  local level="$1"
  local message="$2"
  local timestamp=$(date +"%Y-%m-%d %H:%M:%S")
  echo "[$timestamp] [$level] $message"
}

info() { log "INFO" "$*"; }
warn() { log "WARN" "$*" >&2; }  # Warn to stderr
error() { log "ERROR" "$*" >&2; exit 1; }  # Error and exit

info "Starting backup process"
warn "Low disk space: 5% remaining"
error "Backup directory not found"

2.3 Input Validation and Argument Parsing

Scripts must validate inputs to avoid invalid operations (e.g., writing to a read-only directory).

  • Check argument count: Use $# to ensure required arguments are provided.
  • Validate argument types: Check if paths exist ([ -f "$file" ]), values are numeric ([ "$num" -gt 0 ]), etc.
  • Use getopts for options: Handle flags (e.g., -f for “force”) cleanly.

Example: Input Validation

#!/usr/bin/env bash
set -euo pipefail

# Validate argument count
if [ $# -ne 2 ]; then
  error "Usage: $0 <source-dir> <dest-dir>"
fi

local source_dir="$1"
local dest_dir="$2"

# Validate source exists and is a directory
if [ ! -d "${source_dir}" ]; then
  error "Source directory '${source_dir}' does not exist"
fi

# Validate destination is writable
if [ ! -w "${dest_dir}" ]; then
  error "Destination '${dest_dir}' is not writable"
fi

2.4 Idempotency: Scripts That Play Well With Repetition

Idempotent scripts can be run multiple times without unintended side effects (critical for cron jobs or retry logic).

  • Check state before acting: e.g., “Create a directory only if it doesn’t exist.”
  • Use atomic operations: e.g., mkdir -p (no error if directory exists) or ln -sf (overwrite symlinks safely).

Example: Idempotent Directory Creation

# Bad: Fails if directory exists
mkdir /app/config  

# Good: No error if directory exists (idempotent)
mkdir -p /app/config  

2.5 Portability Across Environments

Scripts may run on diverse systems (e.g., Debian, Alpine, macOS). Ensure portability by:

  • Avoiding Bash-specific features (e.g., arrays, ** glob) if using #!/bin/sh.
  • Checking for command availability (e.g., command -v jq instead of which jq).
  • Using POSIX-compliant syntax (e.g., [ ] instead of [[ ]], $() instead of ` `).

Example: Checking for Dependencies

# Check if "jq" is installed
if ! command -v jq &> /dev/null; then
  error "jq is required but not installed. Install with: apt install jq"
fi

3. Advanced Techniques

3.1 Modular Code with Functions

Organize scripts into functions to improve readability and reusability.

  • Limit function scope: Use local variables to avoid polluting the global namespace.
  • Single-responsibility principle: Each function does one task (e.g., validate_input(), backup_files()).

Example: Modular Script

#!/usr/bin/env bash
set -euo pipefail

# Function: Validate inputs
validate_input() {
  local source_dir="$1"
  local dest_dir="$2"
  if [ ! -d "${source_dir}" ]; then
    error "Source '${source_dir}' is not a directory"
  fi
}

# Function: Perform backup
backup_files() {
  local source_dir="$1"
  local dest_dir="$2"
  info "Backing up ${source_dir} to ${dest_dir}"
  rsync -av --delete "${source_dir}/" "${dest_dir}/"  # Idempotent sync
}

# Main execution
main() {
  validate_input "$1" "$2"
  backup_files "$1" "$2"
  info "Backup completed successfully"
}

main "$@"  # Pass all arguments to main

3.2 Dependency Management

Production scripts often rely on external tools (e.g., rsync, curl). Explicitly check for dependencies:

  • Use command -v <tool> to verify a tool exists (portable across shells).
  • Fail early with clear installation instructions if dependencies are missing.

Example: Dependency Check

check_dependencies() {
  local dependencies=("rsync" "gzip" "curl")
  for dep in "${dependencies[@]}"; do
    if ! command -v "${dep}" &> /dev/null; then
      error "Dependency '${dep}' not found. Install it first."
    fi
  done
}

3.3 Debugging Strategies

Debugging production scripts requires visibility. Use:

  • set -x: Enable tracing (prints each command before execution).
  • trap 'echo "Var: $var"' DEBUG: Log variables at each step.
  • Conditional debugging: Enable debug mode via a flag (e.g., --debug).

Example: Conditional Debugging

if [ "${DEBUG:-false}" = "true" ]; then
  set -x  # Enable tracing if DEBUG=true
fi

4. Testing and Validation

4.1 Static Analysis with ShellCheck

ShellCheck is a linter that identifies bugs, portability issues, and bad practices. Run it on scripts before deployment:

Example: Running ShellCheck

shellcheck my_script.sh  # Outputs issues like unquoted variables or undefined flags

4.2 Unit Testing with Frameworks (e.g., Bats)

Bats Core is a testing framework for shell scripts. Write unit tests for functions or critical logic.

Example: Bats Test for a Backup Function

#!/usr/bin/env bats

@test "backup_files creates dest dir if missing" {
  local source="test_source"
  local dest="test_dest"
  mkdir -p "${source}"
  touch "${source}/file.txt"

  run backup_files "${source}" "${dest}"
  
  [ "$status" -eq 0 ]
  [ -d "${dest}" ]
  [ -f "${dest}/file.txt" ]
}

4.3 Integration Testing

Test end-to-end behavior in staging environments that mirror production (e.g., network, permissions, dependencies). Validate:

  • Scripts run successfully with real data.
  • Error conditions (e.g., missing files, network failures) are handled gracefully.

5. Security Considerations

5.1 Input Sanitization

Malicious input (e.g., from users or untrusted sources) can lead to command injection. Sanitize inputs by:

  • Avoiding eval (executes arbitrary code).
  • Escaping variables with printf "%q" (Bash) or quote functions.

Example: Safe Input Handling

# UNSAFE: User input could include "file; rm -rf /"
user_input="file.txt"
rm "${user_input}"  # Safe if input is sanitized

# SAFER: Explicitly validate input format
if ! [[ "${user_input}" =~ ^[a-zA-Z0-9._-]+$ ]]; then
  error "Invalid filename: ${user_input}"
fi

5.2 Avoiding Insecure Patterns

  • Unquoted variables: Risk of word splitting/glob expansion (always use "$var").
  • Hardcoded secrets: Never include passwords/API keys in scripts (use environment variables).
  • World-writable files: Set strict permissions (e.g., chmod 600 for sensitive files).

5.3 File Permissions and Execution Context

  • Run scripts with the least privilege necessary (avoid sudo unless required).
  • Set script permissions to chmod 700 (readable/writable/executable only by owner).

6. Example: A Robust Production Backup Script

Below is a complete example combining the practices above:

#!/usr/bin/env bash
set -euo pipefail

# --------------------------
# Configuration
# --------------------------
readonly SCRIPT_NAME="$(basename "$0")"
readonly LOG_FILE="/var/log/${SCRIPT_NAME}.log"
readonly DEFAULT_RETENTION_DAYS=7

# --------------------------
# Logging Functions
# --------------------------
log() {
  local level="$1"
  local message="$2"
  local timestamp="$(date +"%Y-%m-%d %H:%M:%S")"
  echo "[$timestamp] [${level}] ${message}" | tee -a "${LOG_FILE}"
}

info() { log "INFO" "$*"; }
warn() { log "WARN" "$*" >&2; }
error() { log "ERROR" "$*" >&2; exit 1; }

# --------------------------
# Cleanup on Exit
# --------------------------
cleanup() {
  if [ -n "${TMP_DIR:-}" ] && [ -d "${TMP_DIR}" ]; then
    rm -rf "${TMP_DIR}"
    info "Cleaned up temporary directory: ${TMP_DIR}"
  fi
}
trap cleanup EXIT

# --------------------------
# Dependency Check
# --------------------------
check_dependencies() {
  local deps=("rsync" "gzip" "date")
  for dep in "${deps[@]}"; do
    if ! command -v "${dep}" &> /dev/null; then
      error "Dependency '${dep}' not found. Install it first."
    fi
  done
}

# --------------------------
# Main Backup Logic
# --------------------------
backup() {
  local source_dir="${1:?Source directory required}"
  local dest_dir="${2:?Destination directory required}"
  local retention_days="${3:-${DEFAULT_RETENTION_DAYS}}"

  # Validate inputs
  if [ ! -d "${source_dir}" ]; then
    error "Source directory '${source_dir}' does not exist"
  fi
  if [ ! -w "${dest_dir}" ]; then
    error "Destination '${dest_dir}' is not writable"
  fi

  # Create backup filename with timestamp
  local timestamp="$(date +"%Y%m%d_%H%M%S