Essential Shell Scripting Tools and Utilities: A Comprehensive Guide

Table of Contents

  1. Text Processing Tools
  2. File Manipulation Utilities
  3. Process Management Tools
  4. Data Validation and Logic
  5. Arithmetic Utilities
  6. Miscellaneous Utilities
  7. Common Practices
  8. Best Practices
  9. Conclusion

Text Processing Tools

Text processing is a cornerstone of shell scripting, as scripts often parse logs, config files, or user input. The following tools are indispensable for manipulating and analyzing text.

grep: Search Text Patterns

grep (Global Regular Expression Print) searches for patterns in text files or input streams. It supports regular expressions, making it ideal for filtering output.

Basic Usage:

# Search for "error" in a log file (case-sensitive)
grep "error" app.log

# Case-insensitive search
grep -i "Error" app.log

# Recursively search all .txt files in a directory
grep -r "warning" ./documents/ --include="*.txt"

# Show line numbers of matches
grep -n "critical" system.log

Common Practice:

Combine grep with pipes (|) to filter output from other commands:

# Find running "nginx" processes ("[n]ginx" keeps grep from matching its own process)
ps aux | grep "[n]ginx"
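Beyond printing whole matching lines, grep can count matches or extract just the matched text; a short sketch (the file names are hypothetical):

```shell
# Count matching lines instead of printing them
grep -c "error" app.log

# Print only the matched text (-o) using an extended regex (-E),
# e.g. pull IPv4-looking addresses out of an access log
grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' access.log
```

`-o` is handy in pipelines because downstream commands receive only the fragment you care about, not the full line.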

sed: Stream Editor

sed (Stream Editor) modifies text in a stream (e.g., files or command output) using patterns. It’s often used for substitution, deletion, or insertion of text.

Basic Usage:

# Substitute "old" with "new" in a file (in-place with -i)
sed -i 's/old/new/g' config.txt  # -i edits the file directly; 'g' = global (all occurrences)

# Delete lines containing "debug"
sed '/debug/d' app.log

# Insert "Header Line" at the top of a file (GNU sed syntax)
sed '1i Header Line' data.txt

Pro Tip:

Use -i.bak to create a backup before editing (e.g., sed -i.bak 's/old/new/g' file.txt).
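Substitutions involving file paths are awkward with the default `/` delimiter; sed accepts any character after `s`, which avoids escaping every slash. A sketch with hypothetical paths:

```shell
# Replace a path without escaping each "/" by using "|" as the delimiter
sed 's|/old/path|/new/path|g' config.txt
```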

awk: Pattern Scanning and Processing

awk is a powerful language for processing structured text (e.g., CSV files, logs with delimiters). It splits input into fields and applies actions based on patterns.

Basic Usage:

# Print the 2nd field of a CSV (delimiter = ",")
awk -F ',' '{print $2}' data.csv

# Sum the 3rd field of lines where the 1st field is "Sales"
awk -F ',' '$1 == "Sales" {sum += $3} END {print sum}' report.csv

# Print lines where the 4th field (age) is > 30
awk '$4 > 30 {print $0}' users.txt  # $0 = entire line

Common Practice:

Use awk to parse logs with timestamps:

# Extract errors from the last 24 hours (assumes log format "YYYY-MM-DD HH:MM:SS ERROR: ..."
# and GNU date; ISO dates compare correctly as strings)
awk -v today="$(date -d '24 hours ago' +'%Y-%m-%d')" '$1 >= today && /ERROR/ {print}' app.log
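awk's built-in variables NR (current record number) and NF (number of fields on the line) make quick summaries easy; a sketch assuming a CSV with a numeric 3rd column:

```shell
# Average of the 3rd field across all lines (NR = line count at END)
awk -F ',' '{sum += $3} END {if (NR > 0) print sum / NR}' report.csv

# Print the last field of each line, whatever the column count ($NF)
awk -F ',' '{print $NF}' report.csv
```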

File Manipulation Utilities

Managing files and directories is a frequent task in scripting. These tools simplify searching, copying, and syncing files.

find: Locate Files and Directories

find recursively searches directories for files/directories matching criteria like name, size, or modification time.

Basic Usage:

# Find all .log files modified in the last 7 days
find /var/log -name "*.log" -mtime -7

# Find files larger than 100MB
find /home -type f -size +100M

# Delete empty directories (use -delete with caution!)
find ./tmp -type d -empty -delete

Pro Tip:

Use -exec to run commands on found files:

# Compress all .txt files older than 30 days
find ./docs -name "*.txt" -mtime +30 -exec gzip {} \;  # {} = placeholder for each found file
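Ending `-exec` with `+` instead of `\;` passes many file names to a single command invocation, which is much faster on large trees than spawning one process per file:

```shell
# Run wc -l once (or a few times) with many files as arguments
find ./docs -name "*.txt" -exec wc -l {} +
```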

rsync: Sync Files and Directories

rsync synchronizes files between local or remote systems efficiently by transferring only changed data.

Basic Usage:

# Sync local directory to a remote server (SSH)
rsync -avz ./local_dir [email protected]:/path/to/remote_dir  # -a = archive, -v = verbose, -z = compress

# Sync remote files to local (delete extraneous files in local)
rsync -avz --delete [email protected]:/remote_dir ./local_dir

Best Practice:

Test with --dry-run before syncing to avoid accidental data loss:

rsync -avz --dry-run ./local_dir [email protected]:/remote_dir
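It is also common to exclude files that should never be transferred; a sketch with hypothetical directory names (the remote host is omitted, but the flags are the same):

```shell
# Exclude version-control metadata and temp files from the transfer
rsync -avz --exclude '.git/' --exclude '*.tmp' ./local_dir/ backup_dir/
```

Note the trailing slash on the source: `local_dir/` copies the directory's contents, while `local_dir` would create `backup_dir/local_dir`.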

Process Management Tools

Scripts often need to monitor or control running processes, such as starting/stopping services.

ps: List Running Processes

ps (Process Status) displays information about active processes.

Basic Usage:

# List all processes (BSD-style)
ps aux

# List processes in a tree view
ps axjf

# Filter processes by user
ps -u john
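For scripting, `ps -o` prints only the columns you ask for, which is easier to parse than the full `aux` output; a sketch (the `--sort` flag is GNU/Linux procps):

```shell
# Show only PID, memory %, and command name, highest memory first
ps -eo pid,pmem,comm --sort=-pmem | head -5
```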

kill: Terminate Processes

kill sends signals to processes to terminate, pause, or resume them. By default it sends SIGTERM (15), which asks the process to exit cleanly; SIGKILL (-9) forcefully terminates a process that won't respond, and should be a last resort because the process gets no chance to clean up.

Basic Usage:

# Terminate a process by PID (gracefully with SIGTERM = 15)
kill 1234

# Force kill a stuck process (SIGKILL = 9)
kill -9 1234

# Kill all processes named "node"
pkill node  # or killall node
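A common pattern is to try SIGTERM first and escalate to SIGKILL only if the process ignores it; a minimal sketch, assuming `$pid` holds a real PID:

```shell
pid=1234                              # hypothetical PID
kill "$pid" 2>/dev/null               # polite request (SIGTERM)
sleep 5                               # give it time to shut down
if kill -0 "$pid" 2>/dev/null; then   # -0 = check existence, send nothing
  kill -9 "$pid"                      # still alive: force it
fi
```

`kill -0` is the standard trick for "is this PID still running?" without affecting the process.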

Data Validation and Logic

Scripts often need to validate inputs (e.g., “Does this file exist?”) before proceeding. The test command (or its synonym [, conventionally written [ ]) evaluates conditions.

test / [ ]: Evaluate Conditions

test checks file types, string comparisons, and numeric values. It returns 0 (success) if the condition is true, and a non-zero status otherwise.

Basic Usage:

# Check if a file exists
if [ -f "config.ini" ]; then
  echo "Config file found."
else
  echo "Config file missing!"
fi

# Check if a directory is writable
if [ -w "/tmp" ]; then
  echo "/tmp is writable."
fi

# Numeric comparison: is 5 greater than 3?
if [ 5 -gt 3 ]; then
  echo "5 > 3"
fi

# String comparison: is $name equal to "Alice"?
name="Alice"
if [ "$name" = "Alice" ]; then
  echo "Hello, Alice!"
fi

Best Practice:

Always quote variables in [ ] to handle spaces:

file="my document.txt"
if [ -f "$file" ]; then  # Quotes prevent errors if $file has spaces
  echo "File exists."
fi
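In bash specifically, the [[ ]] keyword is a safer alternative to [ ]: it does no word-splitting on unquoted variables and supports glob-style pattern matching (not POSIX sh, so use it only in bash scripts):

```shell
file="my document.txt"
if [[ -f $file ]]; then     # no quotes needed inside [[ ]]
  echo "File exists."
fi

name="Alice"
if [[ $name == A* ]]; then  # glob-style pattern match
  echo "Starts with A"
fi
```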

Arithmetic Utilities

Shell scripts often require arithmetic operations, from simple counting to complex calculations.

expr and $(( )): Integer Arithmetic

expr evaluates integer expressions, while $(( )) (POSIX arithmetic expansion) is a more modern alternative.

Basic Usage:

# Using expr (note spaces around operators)
sum=$(expr 5 + 3)
echo $sum  # Output: 8

# Using $(( )) (no spaces needed; preferred)
sum=$((5 + 3))
echo $sum  # Output: 8

# Increment a variable
count=10
count=$((count + 1))  # count becomes 11
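Arithmetic expansion combines naturally with loops; a small bash sketch (the C-style `for (( ))` loop is a bash feature):

```shell
# Sum the integers 1..10 with a C-style loop
total=0
for ((i = 1; i <= 10; i++)); do
  total=$((total + i))
done
echo "$total"  # 55
```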

bc: Floating-Point Arithmetic

Shell arithmetic ($(( )) and expr) handles only integers. Use bc for floating-point calculations.

Basic Usage:

# Calculate 10 divided by 3 with 2 decimal places
echo "scale=2; 10/3" | bc  # Output: 3.33

# Square root of 25
echo "sqrt(25)" | bc  # Output: 5

Pro Tip:

Use -l to load the math library for advanced functions:

echo "s(4*a(1)/2)" | bc -l  # sine of π/2 ≈ 1; a(1) = arctan(1) = π/4, so 4*a(1) = π

Miscellaneous Utilities

These tools solve niche but critical problems in scripting.

xargs: Execute Commands with Input

xargs converts input (e.g., from find or grep) into arguments for a command. It’s useful for commands like rm or cp that take file arguments rather than reading from stdin.

Basic Usage:

# Delete all .tmp files found by find (-print0/-0 pass filenames
# null-separated, so names with spaces or newlines are handled safely)
find ./tmp -name "*.tmp" -print0 | xargs -0 rm

# Count lines in all .txt files
find ./docs -name "*.txt" -print0 | xargs -0 wc -l
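When the argument must land somewhere other than the end of the command, `-I` defines a placeholder that xargs substitutes for each input item; a sketch with hypothetical directories:

```shell
# Copy each found file into ./backup/ ({} marks where the name goes)
find ./docs -name "*.txt" -print0 | xargs -0 -I {} cp {} ./backup/
```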

tee: Split Output to File and Stdout

tee writes input to both a file and standard output (stdout), useful for logging while viewing output.

Basic Usage:

# Save command output to file_list.log and display on screen
ls -l | tee file_list.log

# Append to a log (use -a)
echo "Script ran at $(date)" | tee -a script.log
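tee also accepts several file arguments and writes the same input to all of them, which is useful for duplicating a log stream:

```shell
# Write identical copies of the output to two log files
ls -l | tee list1.log list2.log > /dev/null  # discard the stdout copy
```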

curl/wget: Transfer Data Over Networks

curl and wget transfer files over the network. curl is more versatile (custom headers, many protocols, API requests), while wget is geared toward straightforward and recursive downloads.

Basic Usage:

# Download a file with curl
curl -O https://example.com/file.zip  # -O = output with original filename

# Download with wget
wget https://example.com/file.zip

# POST data to an API with curl
curl -X POST -d "name=John&age=30" https://api.example.com/users

Common Practices

  • Readable Scripts: Use comments (#) and meaningful variable names (e.g., log_file instead of lf).
  • Error Handling: Use set -e to exit on errors, or trap to clean up resources:
    set -e  # Exit if any command fails
    trap 'echo "Script failed at line $LINENO"' ERR  # Log error line
  • Variables: Store paths/constants in variables for reusability:
    LOG_DIR="/var/log/myapp"
    mkdir -p "$LOG_DIR"  # -p = create parent dirs if missing
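The points above combine naturally into a defensive script header; a minimal sketch (the paths and messages are placeholders):

```shell
#!/usr/bin/env bash
set -euo pipefail                   # exit on error, unset vars, and pipe failures

LOG_DIR="${LOG_DIR:-/tmp/myapp}"    # overridable constant; hypothetical path

cleanup() {
  echo "cleaning up" >&2            # runs on any exit, success or failure
}
trap cleanup EXIT

mkdir -p "$LOG_DIR"
echo "Run started at $(date)" | tee -a "$LOG_DIR/run.log"
```

`trap ... EXIT` is often preferable to trapping ERR alone, since it guarantees cleanup even when the script succeeds.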

Best Practices

  • Avoid Hard-Coded Paths: Use relative paths or environment variables (e.g., $HOME).
  • Quote Variables: Prevent word-splitting issues with spaces:
    file="my file.txt"
    cat "$file"  # Works even with spaces
  • Test with shellcheck: Use the ShellCheck linter to catch bugs:
    shellcheck my_script.sh
  • Limit Root Access: Run scripts as a non-root user unless necessary.
  • Use Functions: Encapsulate reusable logic:
    log_error() {
      echo "ERROR: $1" >&2  # >&2 = redirect to stderr
    }
    log_error "File not found"

Conclusion

Mastering essential shell scripting tools and utilities transforms you from a casual user into a proficient automation engineer. Tools like grep, sed, awk, find, and rsync form the building blocks of efficient scripts, while practices like error handling and testing ensure reliability.

By combining these tools with best practices—such as readable code, quoting variables, and validating inputs—you’ll write scripts that are robust, maintainable, and scalable. Start small, experiment with examples, and gradually integrate these tools into your workflow to unlock the full power of shell scripting.