Table of Contents
- Text Processing Tools
- File Manipulation Utilities
- Process Management Tools
- Data Validation and Logic
- Arithmetic Utilities
- Miscellaneous Utilities
- Common Practices
- Best Practices
- Conclusion
- References
Text Processing Tools
Text processing is a cornerstone of shell scripting, as scripts often parse logs, config files, or user input. The following tools are indispensable for manipulating and analyzing text.
grep: Search Text Patterns
grep (Global Regular Expression Print) searches for patterns in text files or input streams. It supports regular expressions, making it ideal for filtering output.
Basic Usage:
# Search for "error" in a log file (case-sensitive)
grep "error" app.log
# Case-insensitive search
grep -i "Error" app.log
# Recursively search all .txt files in a directory
grep -r "warning" ./documents/ --include="*.txt"
# Show line numbers of matches
grep -n "critical" system.log
Common Practice:
Combine grep with pipes (|) to filter output from other commands:
# Find running "nginx" processes
ps aux | grep "nginx"
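Because grep sets its exit status by whether anything matched, it also slots naturally into if statements. A minimal sketch (app.log and its contents are illustrative):

```shell
# Create a small sample log (illustrative data)
printf 'INFO start\nerror: disk full\nINFO done\n' > app.log

# -q = quiet: print nothing, just set the exit status
if grep -q "error" app.log; then
    echo "Errors found in app.log"
else
    echo "app.log is clean"
fi
```

grep -q also stops at the first match, so this check stays cheap even on large files.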
sed: Stream Editor
sed (Stream Editor) modifies text in a stream (e.g., files or command output) using patterns. It’s often used for substitution, deletion, or insertion of text.
Basic Usage:
# Substitute "old" with "new" in a file (in-place with -i)
sed -i 's/old/new/g' config.txt # -i edits the file directly; 'g' = global (all occurrences)
# Delete lines containing "debug"
sed '/debug/d' app.log
# Insert "Header Line" at the top of a file
sed '1i Header Line' data.txt
Pro Tip:
Use -i.bak to create a backup before editing (e.g., sed -i.bak 's/old/new/g' file.txt).
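A quick sketch of that backup workflow on a throwaway file (config.txt and its contents are made up):

```shell
printf 'mode=old\n' > config.txt

# Edit in place, keeping the original as config.txt.bak
sed -i.bak 's/old/new/g' config.txt

cat config.txt       # mode=new
cat config.txt.bak   # mode=old (untouched backup)
```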
awk: Pattern Scanning and Processing
awk is a powerful language for processing structured text (e.g., CSV files, logs with delimiters). It splits input into fields and applies actions based on patterns.
Basic Usage:
# Print the 2nd field of a CSV (delimiter = ",")
awk -F ',' '{print $2}' data.csv
# Sum the 3rd field of lines where the 1st field is "Sales"
awk -F ',' '$1 == "Sales" {sum += $3} END {print sum}' report.csv
# Print lines where the 4th field (age) is > 30
awk '$4 > 30 {print $0}' users.txt # $0 = entire line
Common Practice:
Use awk to parse logs with timestamps:
# Extract errors logged since this time yesterday (GNU date; log format: "YYYY-MM-DD HH:MM:SS ERROR: ...")
awk -v cutoff="$(date -d '24 hours ago' +'%Y-%m-%d')" '$1 >= cutoff && /ERROR/ {print}' app.log
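awk's associative arrays make quick frequency reports one-liners. A sketch that tallies the 3rd field of some illustrative request lines:

```shell
# Sample lines: method, path, status code (illustrative data)
printf 'GET /a 200\nGET /b 404\nGET /c 200\n' > access.log

# count[] is an associative array keyed by status code
awk '{count[$3]++} END {for (code in count) print code, count[code]}' access.log
```

Note that for-in iteration order is unspecified in awk; pipe through sort when you need stable output.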
File Manipulation Utilities
Managing files and directories is a frequent task in scripting. These tools simplify searching, copying, and syncing files.
find: Locate Files and Directories
find recursively searches directories for files/directories matching criteria like name, size, or modification time.
Basic Usage:
# Find all .log files modified in the last 7 days
find /var/log -name "*.log" -mtime -7
# Find files larger than 100MB
find /home -type f -size +100M
# Delete empty directories (use -delete with caution!)
find ./tmp -type d -empty -delete
Pro Tip:
Use -exec to run commands on found files:
# Compress all .txt files older than 30 days
find ./docs -name "*.txt" -mtime +30 -exec gzip {} \; # {} = placeholder for found files
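Ending -exec with + instead of \; batches many filenames into a single command invocation, which is far faster on large trees. A sketch on a throwaway directory:

```shell
mkdir -p docs
printf 'one\n' > docs/a.txt
printf 'one\ntwo\n' > docs/b.txt

# '+' passes all matches to one wc invocation instead of spawning one per file
find docs -name "*.txt" -exec wc -l {} +
```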
rsync: Sync Files and Directories
rsync synchronizes files between local or remote systems efficiently by transferring only changed data.
Basic Usage:
# Sync local directory to a remote server (SSH)
rsync -avz ./local_dir [email protected]:/path/to/remote_dir # -a = archive, -v = verbose, -z = compress
# Sync remote files to local (delete extraneous files in local)
rsync -avz --delete [email protected]:/remote_dir ./local_dir
Best Practice:
Test with --dry-run before syncing to avoid accidental data loss:
rsync -avz --dry-run ./local_dir [email protected]:/remote_dir
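rsync also works between two local directories, which is a safe way to experiment with its flags before touching a remote host. A sketch (directory names are made up; the trailing slash on the source means "copy the contents, not the directory itself"):

```shell
mkdir -p src_dir dest_dir
echo "v1" > src_dir/file.txt

# Preview first: --dry-run lists what would change without copying anything
rsync -av --dry-run src_dir/ dest_dir/

# Then run the real sync
rsync -av src_dir/ dest_dir/
cat dest_dir/file.txt   # v1
```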
Process Management Tools
Scripts often need to monitor or control running processes, such as starting/stopping services.
ps: List Running Processes
ps (Process Status) displays information about active processes.
Basic Usage:
# List all processes (BSD-style)
ps aux
# List processes in a tree view
ps axjf
# Filter processes by user
ps -u john
kill: Terminate Processes
kill sends signals to processes to terminate, pause, or resume them. By default it sends SIGTERM (15), which asks the process to shut down cleanly; SIGKILL (-9) forcefully terminates a process and should be reserved for processes that ignore SIGTERM.
Basic Usage:
# Terminate a process by PID (gracefully with SIGTERM = 15)
kill 1234
# Force kill a stuck process (SIGKILL = 9)
kill -9 1234
# Kill all processes named "node"
pkill node # or killall node
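A common pattern is to try SIGTERM first and escalate to SIGKILL only if the process is still alive. A sketch using a background sleep as a stand-in for a real job:

```shell
sleep 60 &          # stand-in for a long-running job
pid=$!

kill "$pid"         # polite SIGTERM first
sleep 1
if kill -0 "$pid" 2>/dev/null; then   # signal 0 = probe only: is it still alive?
    kill -9 "$pid"                    # escalate to SIGKILL
fi
wait "$pid" 2>/dev/null
echo "process stopped"
```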
Data Validation and Logic
Scripts often need to validate inputs (e.g., “Does this file exist?”) before proceeding. The test command (or its bracket form [ ], which is itself a command, not mere syntax) evaluates conditions.
test / [ ]: Evaluate Conditions
test checks file types, string comparisons, and numeric values. It returns 0 (success) if the condition is true, and a non-zero status otherwise.
Basic Usage:
# Check if a file exists
if [ -f "config.ini" ]; then
    echo "Config file found."
else
    echo "Config file missing!"
fi
# Check if a directory is writable
if [ -w "/tmp" ]; then
    echo "/tmp is writable."
fi
# Numeric comparison: is 5 greater than 3?
if [ 5 -gt 3 ]; then
    echo "5 > 3"
fi
# String comparison: is $name equal to "Alice"?
name="Alice"
if [ "$name" = "Alice" ]; then
    echo "Hello, Alice!"
fi
Best Practice:
Always quote variables in [ ] to handle spaces:
file="my document.txt"
if [ -f "$file" ]; then # Quotes prevent errors if $file has spaces
    echo "File exists."
fi
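These checks compose naturally into small validation helpers. A sketch of a hypothetical require_file function (the name, filenames, and messages are made up):

```shell
# Fail early if an argument is not a readable regular file (hypothetical helper)
require_file() {
    if [ ! -f "$1" ]; then
        echo "ERROR: '$1' is not a regular file" >&2
        return 1
    elif [ ! -r "$1" ]; then
        echo "ERROR: '$1' is not readable" >&2
        return 1
    fi
}

touch settings.ini
require_file settings.ini && echo "settings.ini OK"
require_file missing.ini  || echo "missing.ini rejected"
```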
Arithmetic Utilities
Shell scripts often require arithmetic operations, from simple counting to complex calculations.
expr and $(( )): Integer Arithmetic
expr evaluates integer expressions, while $(( )) (POSIX arithmetic expansion) is a more modern alternative.
Basic Usage:
# Using expr (note spaces around operators)
sum=$(expr 5 + 3)
echo $sum # Output: 8
# Using $(( )) (no spaces needed; preferred)
sum=$((5 + 3))
echo $sum # Output: 8
# Increment a variable
count=10
count=$((count + 1)) # count becomes 11
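These pieces combine into counting loops. A small sketch summing 1 through 5 with $(( )):

```shell
total=0
i=1
while [ "$i" -le 5 ]; do
    total=$((total + i))   # accumulate the running sum
    i=$((i + 1))           # advance the counter
done
echo "$total"   # 15
```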
bc: Floating-Point Arithmetic
The shell natively supports only integers. Use bc for floating-point calculations.
Basic Usage:
# Calculate 10 divided by 3 with 2 decimal places
echo "scale=2; 10/3" | bc # Output: 3.33
# Square root of 25
echo "sqrt(25)" | bc # Output: 5
Pro Tip:
Use -l to load the math library for advanced functions:
echo "pi=4*a(1); s(pi/2)" | bc -l # bc has no built-in pi, so define it first (a(1) = arctan(1) = π/4); sine of π/2 ≈ 1
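A practical use for bc is computing a percentage where integer division would lose the fraction. A sketch with made-up numbers:

```shell
used=7
capacity=9

# $((used * 100 / capacity)) would truncate to 77; bc keeps one decimal place
pct=$(echo "scale=1; $used * 100 / $capacity" | bc)
echo "${pct}%"   # 77.7%
```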
Miscellaneous Utilities
These tools solve niche but critical problems in scripting.
xargs: Execute Commands with Input
xargs converts input (e.g., from find or grep) into command-line arguments for another command. It’s useful when a command expects arguments rather than reading from stdin.
Basic Usage:
# Delete all .tmp files found by find (-print0/-0 keep filenames with spaces intact)
find ./tmp -name "*.tmp" -print0 | xargs -0 rm
# Count lines in all .txt files
find ./docs -name "*.txt" -print0 | xargs -0 wc -l
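xargs can also fan work out across CPUs. A sketch compressing files in parallel (-P is a widely available GNU/BSD extension, not strict POSIX; the directory is throwaway):

```shell
mkdir -p work
printf 'data\n' > work/a.txt
printf 'data\n' > work/b.txt

# -0 pairs with find -print0 for space-safe names;
# -n 1 = one file per gzip; -P 2 = run up to 2 gzips at once
find work -name "*.txt" -print0 | xargs -0 -n 1 -P 2 gzip
ls work   # a.txt.gz  b.txt.gz
```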
tee: Split Output to File and Stdout
tee writes input to both a file and standard output (stdout), useful for logging while viewing output.
Basic Usage:
# Save command output to log.txt and display on screen
ls -l | tee file_list.log
# Append to a log (use -a)
echo "Script ran at $(date)" | tee -a script.log
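tee pairs well with a tiny wrapper function so that every step of a script is both shown and logged. A sketch (run_logged and run.log are made-up names):

```shell
# Show a command's output and append it to a log (hypothetical helper)
run_logged() {
    "$@" 2>&1 | tee -a run.log
}

run_logged echo "step one"
run_logged echo "step two"
wc -l < run.log   # 2 -- both lines accumulated thanks to -a
```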
curl/wget: Transfer Data Over Networks
curl and wget download files from URLs. curl is more versatile (supports HTTP, FTP, etc.), while wget is simpler for basic downloads.
Basic Usage:
# Download a file with curl
curl -O https://example.com/file.zip # -O = output with original filename
# Download with wget
wget https://example.com/file.zip
# POST data to an API with curl
curl -X POST -d "name=John&age=30" https://api.example.com/users
Common Practices
- Readable Scripts: Use comments (#) and meaningful variable names (e.g., log_file instead of lf).
- Error Handling: Use set -e to exit on errors, or trap to clean up resources:
  set -e  # Exit if any command fails
  trap 'echo "Script failed at line $LINENO"' ERR  # Log error line
- Variables: Store paths/constants in variables for reusability:
  LOG_DIR="/var/log/myapp"
  mkdir -p "$LOG_DIR"  # -p = create parent dirs if missing
Best Practices
- Avoid Hard-Coded Paths: Use relative paths or environment variables (e.g., $HOME).
- Quote Variables: Prevent word-splitting issues with spaces:
  file="my file.txt"
  cat "$file"  # Works even with spaces
- Test with shellcheck: Use the ShellCheck linter to catch bugs:
  shellcheck my_script.sh
- Limit Root Access: Run scripts as a non-root user unless necessary.
- Use Functions: Encapsulate reusable logic:
  log_error() {
    echo "ERROR: $1" >&2  # >&2 = redirect to stderr
  }
  log_error "File not found"
Conclusion
Mastering essential shell scripting tools and utilities transforms you from a casual user into a proficient automation engineer. Tools like grep, sed, awk, find, and rsync form the building blocks of efficient scripts, while practices like error handling and testing ensure reliability.
By combining these tools with best practices—such as readable code, quoting variables, and validating inputs—you’ll write scripts that are robust, maintainable, and scalable. Start small, experiment with examples, and gradually integrate these tools into your workflow to unlock the full power of shell scripting.