Table of Contents
- Fundamentals of Robust Shell Scripting
- Core Practices for Production Scripts
- Advanced Techniques
- Testing and Validation
- Security Considerations
- Example: A Robust Production Backup Script
- Conclusion
- References
1. Fundamentals of Robust Shell Scripting
1.1 The Shebang Line: Choosing the Right Interpreter
The shebang (#!) line at the start of a script specifies the interpreter (e.g., bash, sh, zsh). For production, clarity and portability are key:
- Use #!/usr/bin/env bash for Bash-specific scripts: This ensures the system uses the user’s $PATH to find bash, avoiding hardcoded paths like /bin/bash (which may vary across systems).
- Use #!/bin/sh for POSIX compliance: If targeting minimal environments (e.g., Alpine Linux), use the POSIX-compliant sh interpreter. Avoid Bash-specific features (e.g., arrays, [[ ]] conditionals) in this case.
Example:
#!/usr/bin/env bash # Bash-specific script
#!/bin/sh # POSIX-compliant script (portable but limited)
1.2 Critical Shell Options: set -euo pipefail
Production scripts must fail fast and avoid silent errors. The set command enables critical safety flags:
- -e: Exit immediately if a command fails (non-zero exit code). Commands tested in an if condition or joined with && or || are exempt.
- -u: Treat unset variables as errors (avoids accidental use of undefined variables).
- -o pipefail: Make a pipeline fail if any command in it fails (not just the last one).
Example:
#!/usr/bin/env bash
set -euo pipefail # Enable strict error checking
Without these flags, a script might continue running after a failure (e.g., a missing file) or use undefined variables, leading to unpredictable behavior.
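To make the effect of pipefail concrete, here is a minimal sketch that compares the exit status of the same pipeline with the option off and on. The helper name pipeline_status is hypothetical and exists only for this illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report the exit status of `false | true`
# with pipefail switched on or off. The work happens in a subshell,
# so the caller's shell options are left untouched.
pipeline_status() {
  local mode="$1" status=0   # mode is "on" or "off"
  (
    if [ "$mode" = "on" ]; then
      set -o pipefail
    else
      set +o pipefail
    fi
    # First command fails, last command succeeds.
    # "|| status=$?" records the failure without tripping set -e.
    false | true || status=$?
    echo "$status"
  )
}

echo "pipefail off: $(pipeline_status off)"   # prints "pipefail off: 0"
echo "pipefail on:  $(pipeline_status on)"    # prints "pipefail on:  1"
```

Without pipefail, only the last command in the pipeline decides the exit status, so the failure of the first command goes unnoticed.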
1.3 Variable Handling: Quoting and Validation
Poor variable handling is a common source of bugs (e.g., word splitting, globbing, or undefined variables).
- Always quote variables with "$VAR" to prevent word splitting and glob expansion.
- Validate variables with ${VAR:?} to enforce required variables (fails if VAR is unset/empty).
- Avoid uppercase names for user-defined variables (uppercase is the convention for environment variables like PATH).
Bad Practice:
file=$1
rm $file # Risky: If $file is "file with spaces.txt", splits into "file", "with", "spaces.txt"
Good Practice:
input_file="${1:?Usage: $0 <input-file>}" # Enforce argument; exit if missing (use "local" only inside a function)
rm "${input_file}" # Safe: Quoting prevents splitting
2. Core Practices for Production Scripts
2.1 Error Handling and Recovery
Scripts must explicitly handle errors and provide actionable feedback. Use:
- trap for cleanup/error reporting: Catch signals and shell pseudo-signals (e.g., ERR, EXIT) to run cleanup code (e.g., delete temp files) or log errors.
- Custom error functions: Centralize error messaging to stderr (file descriptor 2, written with >&2).
Example: Error Handling with trap
#!/usr/bin/env bash
set -euo pipefail
# Cleanup temporary files on exit (success or failure)
cleanup() {
  rm -f /tmp/backup.tmp
}
trap cleanup EXIT
# Print an error message to stderr and exit
error() {
  echo "ERROR: $*" >&2
  exit 1
}
# Report the line number from the trap itself; inside error(), $LINENO
# would refer to a line of the function, not to the failing command
trap 'error "Command failed (Line ${LINENO})"' ERR
# Example: Fail intentionally to test error handling
cp /nonexistent/file /tmp/backup.tmp || error "Failed to copy file"
2.2 Structured Logging
Logging ensures visibility into script behavior for debugging and auditing. Key practices:
- Include timestamps, log levels (INFO/WARN/ERROR), and context (e.g., script name).
- Write logs to stdout (for normal output) and stderr (for errors).
- For long-running scripts, log to a file with rotation (e.g., logrotate).
Example: Logging Function
#!/usr/bin/env bash
set -euo pipefail
log() {
  local level="$1"
  local message="$2"
  local timestamp
  # Assigned separately so a failure of date is not masked by "local"
  timestamp=$(date +"%Y-%m-%d %H:%M:%S")
  echo "[$timestamp] [$level] $message"
}
info() { log "INFO" "$*"; }
warn() { log "WARN" "$*" >&2; } # Warn to stderr
error() { log "ERROR" "$*" >&2; exit 1; } # Error and exit
info "Starting backup process"
warn "Low disk space: 5% remaining"
error "Backup directory not found"
2.3 Input Validation and Argument Parsing
Scripts must validate inputs to avoid invalid operations (e.g., writing to a read-only directory).
- Check argument count: Use $# to ensure required arguments are provided.
- Validate argument types: Check if paths exist ([ -f "$file" ]), values are numeric ([ "$num" -gt 0 ]), etc.
- Use getopts for options: Handle flags (e.g., -f for “force”) cleanly.
Example: Input Validation
#!/usr/bin/env bash
set -euo pipefail
# Minimal error helper so the example is self-contained
error() { echo "ERROR: $*" >&2; exit 1; }
# Validate argument count
if [ $# -ne 2 ]; then
  error "Usage: $0 <source-dir> <dest-dir>"
fi
# "local" is only valid inside a function, so plain assignments are used here
source_dir="$1"
dest_dir="$2"
# Validate source exists and is a directory
if [ ! -d "${source_dir}" ]; then
  error "Source directory '${source_dir}' does not exist"
fi
# Validate destination is writable
if [ ! -w "${dest_dir}" ]; then
  error "Destination '${dest_dir}' is not writable"
fi
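The getopts bullet above has no example of its own, so here is a minimal sketch of option parsing. The function name parse_args, the options -f and -r, and the variables force, retention_days, and positional are all assumptions made for this illustration.

```shell
#!/usr/bin/env bash

# Hypothetical option set for illustration:
#   -f         force mode (boolean flag)
#   -r DAYS    retention period; must be a positive integer
# Results are stored in the global variables force, retention_days,
# and positional (an array of the remaining arguments).
parse_args() {
  local OPTIND=1 opt          # reset getopts state for every call
  force="false"
  retention_days="7"

  # The leading ':' selects silent mode, so we print our own messages.
  while getopts ":fr:" opt; do
    case "$opt" in
      f)  force="true" ;;
      r)  retention_days="$OPTARG" ;;
      :)  echo "Option -$OPTARG requires an argument" >&2; return 2 ;;
      \?) echo "Unknown option: -$OPTARG" >&2; return 2 ;;
    esac
  done
  shift $((OPTIND - 1))       # drop the options that were parsed

  # Accept digits only, then reject zero.
  case "$retention_days" in
    ''|*[!0-9]*) echo "Retention must be a positive integer" >&2; return 2 ;;
  esac
  if [ "$retention_days" -le 0 ]; then
    echo "Retention must be a positive integer" >&2
    return 2
  fi

  positional=("$@")           # whatever follows the options
}
```

Calling parse_args -f -r 14 /src /dst would set force to true, retention_days to 14, and leave /src and /dst in positional. Returning a status instead of exiting lets the caller decide how to report the failure.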
2.4 Idempotency: Scripts That Play Well With Repetition
Idempotent scripts can be run multiple times without unintended side effects (critical for cron jobs or retry logic).
- Check state before acting: e.g., “Create a directory only if it doesn’t exist.”
- Prefer commands that are safe to repeat: e.g., mkdir -p (no error if the directory exists) or ln -sf (replaces an existing symlink instead of failing).
Example: Idempotent Directory Creation
# Bad: Fails if directory exists
mkdir /app/config
# Good: No error if directory exists (idempotent)
mkdir -p /app/config
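The same "check state before acting" idea applies to file contents and symlinks. The following is a minimal sketch; the helper names ensure_line and ensure_symlink are hypothetical, and the -n flag of ln is a GNU/BSD extension rather than part of POSIX.

```shell
#!/usr/bin/env bash

# Append a line to a file only if an identical line is not already there.
# grep flags: -F literal match, -x whole line, -q no output.
ensure_line() {
  local line="$1" file="$2"
  touch "$file"                         # make sure the file exists
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Point a symlink at a target, replacing any existing link.
# -n stops ln from following an existing link that points to a directory,
# which would otherwise create the new link inside that directory.
ensure_symlink() {
  local target="$1" link="$2"
  ln -sfn -- "$target" "$link"
}
```

Running either helper twice with the same arguments leaves the system in the same state as running it once, which is what makes it safe under cron or retry logic.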
2.5 Portability Across Environments
Scripts may run on diverse systems (e.g., Debian, Alpine, macOS). Ensure portability by:
- Avoiding Bash-specific features (e.g., arrays, ** recursive globbing) if using #!/bin/sh.
- Checking for command availability (e.g., command -v jq instead of which jq).
- Using POSIX-compliant syntax (e.g., [ ] instead of [[ ]]). Prefer $() over backticks: both are POSIX, but $() nests and reads more cleanly.
Example: Checking for Dependencies
# Check if "jq" is installed (POSIX redirection instead of the Bash-only &>)
if ! command -v jq > /dev/null 2>&1; then
  error "jq is required but not installed. Install with: apt install jq"
fi
3. Advanced Techniques
3.1 Modular Code with Functions
Organize scripts into functions to improve readability and reusability.
- Limit function scope: Use local variables to avoid polluting the global namespace.
- Single-responsibility principle: Each function does one task (e.g., validate_input(), backup_files()).
Example: Modular Script
#!/usr/bin/env bash
set -euo pipefail
# Minimal helpers so the example is self-contained
info() { echo "[INFO] $*"; }
error() { echo "[ERROR] $*" >&2; exit 1; }
# Function: Validate inputs
validate_input() {
  local source_dir="$1"
  local dest_dir="$2"
  if [ ! -d "${source_dir}" ]; then
    error "Source '${source_dir}' is not a directory"
  fi
}
# Function: Perform backup
backup_files() {
  local source_dir="$1"
  local dest_dir="$2"
  info "Backing up ${source_dir} to ${dest_dir}"
  rsync -av --delete "${source_dir}/" "${dest_dir}/" # Idempotent sync
}
# Main execution
main() {
  validate_input "$1" "$2"
  backup_files "$1" "$2"
  info "Backup completed successfully"
}
main "$@" # Pass all arguments to main
3.2 Dependency Management
Production scripts often rely on external tools (e.g., rsync, curl). Explicitly check for dependencies:
- Use command -v <tool> to verify a tool exists (portable across shells).
- Fail early with clear installation instructions if dependencies are missing.
Example: Dependency Check
check_dependencies() {
  local dependencies=("rsync" "gzip" "curl")
  for dep in "${dependencies[@]}"; do
    if ! command -v "${dep}" &> /dev/null; then
      error "Dependency '${dep}' not found. Install it first."
    fi
  done
}
3.3 Debugging Strategies
Debugging production scripts requires visibility. Use:
- set -x: Enable tracing (prints each command before execution).
- trap 'echo "Var: $var"' DEBUG: Log variables at each step.
- Conditional debugging: Enable debug mode via a flag (e.g., --debug).
Example: Conditional Debugging
if [ "${DEBUG:-false}" = "true" ]; then
  set -x # Enable tracing if DEBUG=true
fi
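Trace output is easier to read when each line shows where it came from. The sketch below combines the --debug flag mentioned above with a custom PS4 prefix. The helper name enable_debug is hypothetical; PS4, BASH_SOURCE, and LINENO are standard Bash variables.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn on tracing when the first argument is --debug
# or when the DEBUG environment variable is "true".
enable_debug() {
  if [ "${1:-}" = "--debug" ] || [ "${DEBUG:-false}" = "true" ]; then
    # PS4 is the prefix printed before every traced command.
    # Single quotes delay expansion, so file and line are evaluated
    # for each command rather than once at assignment time.
    PS4='+ ${BASH_SOURCE[0]:-main}:${LINENO}: '
    set -x   # shell options are global, so this outlives the function
  fi
}
```

A script would call enable_debug "${1:-}" near the top. Trace lines go to stderr, so they do not mix with normal output that is redirected or piped.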
4. Testing and Validation
4.1 Static Analysis with ShellCheck
ShellCheck is a linter that identifies bugs, portability issues, and bad practices. Run it on scripts before deployment:
Example: Running ShellCheck
shellcheck my_script.sh # Outputs issues like unquoted variables or undefined flags
4.2 Unit Testing with Frameworks (e.g., Bats)
Bats Core is a testing framework for shell scripts. Write unit tests for functions or critical logic.
Example: Bats Test for a Backup Function
#!/usr/bin/env bats
# Assumes backup_files has already been loaded into the test,
# e.g. by a setup() function that sources the script under test
@test "backup_files creates dest dir if missing" {
  local source="test_source"
  local dest="test_dest"
  mkdir -p "${source}"
  touch "${source}/file.txt"
  run backup_files "${source}" "${dest}"
  [ "$status" -eq 0 ]
  [ -d "${dest}" ]
  [ -f "${dest}/file.txt" ]
}
4.3 Integration Testing
Test end-to-end behavior in staging environments that mirror production (e.g., network, permissions, dependencies). Validate:
- Scripts run successfully with real data.
- Error conditions (e.g., missing files, network failures) are handled gracefully.
5. Security Considerations
5.1 Input Sanitization
Malicious input (e.g., from users or untrusted sources) can lead to command injection. Sanitize inputs by:
- Avoiding eval (executes arbitrary code).
- Escaping variables with printf "%q" (Bash) or quote functions.
Example: Safe Input Handling
# Untrusted input could be something like "file; rm -rf /" or "-rf"
user_input="file.txt"
# Validate the format first: allow only a conservative character set
if ! [[ "${user_input}" =~ ^[a-zA-Z0-9._-]+$ ]]; then
  error "Invalid filename: ${user_input}"
fi
# Quoting prevents word splitting; "--" stops a leading "-" from being parsed as an option
rm -- "${user_input}"
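When a value has to be embedded in a string that another shell will parse (for example a command passed to bash -c), printf "%q" can escape it so that it is read back as a single literal word. This is a minimal sketch; the helper name quote_for_shell and the sample input are hypothetical. The escaped form may use Bash-only syntax, so the string should be parsed by Bash, not by a plain POSIX sh.

```shell
#!/usr/bin/env bash

# Hypothetical helper: escape one value for re-parsing by Bash.
quote_for_shell() {
  printf '%q' "$1"
}

untrusted='file; echo INJECTED'

# Build the command string with the escaped value embedded in it.
safe_cmd="printf '%s\n' $(quote_for_shell "$untrusted")"

# The child shell receives the value as data, not as a second command.
bash -c "$safe_cmd"   # prints "file; echo INJECTED"
```

Had the value been embedded without escaping, the child shell would have treated the semicolon as a command separator and run echo INJECTED as a separate command.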
5.2 Avoiding Insecure Patterns
- Unquoted variables: Risk of word splitting/glob expansion (always use "$var").
- Hardcoded secrets: Never include passwords/API keys in scripts (use environment variables).
- World-writable files: Set strict permissions (e.g., chmod 600 for sensitive files).
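The last two points can be combined: read the secret from the environment and write it to a file that only the owner can access. The following is a minimal sketch under those assumptions; the helper name write_secret_file is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: copy a secret from a named environment variable
# into a file readable and writable only by its owner.
write_secret_file() {
  local var_name="$1" dest="$2"
  # ${!var_name} is Bash indirect expansion: the value of the variable
  # whose name is stored in var_name. ":?" aborts if it is unset or empty.
  local value="${!var_name:?environment variable ${var_name} must be set}"
  (
    umask 077                        # files created here get mode 600
    printf '%s\n' "$value" > "$dest"
  )
  chmod 600 "$dest"                  # also tighten a file that already existed
}
```

The umask runs in a subshell so it does not change the permissions of files the rest of the script creates. The explicit chmod covers the case where the destination already existed with looser permissions.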
5.3 File Permissions and Execution Context
- Run scripts with the least privilege necessary (avoid sudo unless required).
- Set script permissions with chmod 700 (readable/writable/executable only by owner).
6. Example: A Robust Production Backup Script
Below is a complete example combining the practices above:
#!/usr/bin/env bash
set -euo pipefail
# --------------------------
# Configuration
# --------------------------
SCRIPT_NAME="$(basename "$0")"
readonly SCRIPT_NAME # Declared separately so a basename failure is not masked
readonly LOG_FILE="/var/log/${SCRIPT_NAME}.log"
readonly DEFAULT_RETENTION_DAYS=7
# --------------------------
# Logging Functions
# --------------------------
log() {
  local level="$1"
  local message="$2"
  local timestamp
  # Assigned separately so a failure of date is not masked by "local"
  timestamp="$(date +"%Y-%m-%d %H:%M:%S")"
  echo "[$timestamp] [${level}] ${message}" | tee -a "${LOG_FILE}"
}
info() { log "INFO" "$*"; }
warn() { log "WARN" "$*" >&2; }
error() { log "ERROR" "$*" >&2; exit 1; }
# --------------------------
# Cleanup on Exit
# --------------------------
cleanup() {
  if [ -n "${TMP_DIR:-}" ] && [ -d "${TMP_DIR}" ]; then
    rm -rf "${TMP_DIR}"
    info "Cleaned up temporary directory: ${TMP_DIR}"
  fi
}
trap cleanup EXIT
# --------------------------
# Dependency Check
# --------------------------
check_dependencies() {
  local deps=("rsync" "gzip" "date")
  for dep in "${deps[@]}"; do
    if ! command -v "${dep}" &> /dev/null; then
      error "Dependency '${dep}' not found. Install it first."
    fi
  done
}
# --------------------------
# Main Backup Logic
# --------------------------
backup() {
  local source_dir="${1:?Source directory required}"
  local dest_dir="${2:?Destination directory required}"
  local retention_days="${3:-${DEFAULT_RETENTION_DAYS}}"
  # Validate inputs
  if [ ! -d "${source_dir}" ]; then
    error "Source directory '${source_dir}' does not exist"
  fi
  if [ ! -w "${dest_dir}" ]; then
    error "Destination '${dest_dir}' is not writable"
  fi
# Create backup filename with timestamp
local timestamp="$(date +"%Y%m%d_%H%M%S