In the world of Linux, text is everywhere: configuration files, log files, CSV reports, system outputs, and more. Whether you’re a developer, system administrator, or data analyst, the ability to efficiently manipulate, filter, and analyze text is a foundational skill. Two tools stand out for their power and versatility in this domain: sed (stream editor) and awk (a pattern-scanning and processing language). This guide is designed for beginners to master the basics of sed and awk, from core concepts to practical use cases. By the end, you’ll be able to automate text tasks, parse logs, clean data, and generate reports with confidence.
Table of Contents
- Why Text Processing Matters in Linux
- Sed: The Stream Editor
- AWK: The Pattern-Processing Language
- Combining Sed and AWK
- Best Practices
- Conclusion
- References
Why Text Processing Matters in Linux
Text is the “lingua franca” of Linux systems. Consider these scenarios:
- A system administrator needs to extract error messages from a 10GB log file.
- A developer wants to clean up a CSV file by removing duplicate rows.
- A data analyst needs to sum values in a specific column of a TSV report.
Manually editing such files is impractical. Tools like sed and awk automate these tasks, saving time and reducing errors. They work with streams of text (e.g., file contents, command output) and process data line-by-line, making them lightweight and efficient even for large files.
Sed: The Stream Editor
sed (short for “stream editor”) is a non-interactive tool for editing text streams. It excels at simple transformations like substitutions, deletions, and insertions. Unlike visual editors (e.g., vim), sed processes text without opening a UI, making it ideal for scripts and automation.
Fundamentals of Sed
At its core, sed follows a simple workflow:
- Read a line of input from the stream (file or pipe).
- Apply a set of commands to the line.
- Output the modified line (unless suppressed).
sed commands are typically structured as:
sed [options] 'command' input_file
Key options:
- -i: Edit files in-place (use -i.bak to create a backup before overwriting).
- -e: Specify multiple commands (e.g., sed -e 'cmd1' -e 'cmd2' file).
- -n: Suppress default output (only print lines explicitly marked with p).
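As a quick illustration of the -n and -e options, here is a small sketch that creates a throwaway file (demo.txt is hypothetical, used only for this demonstration):

```shell
# Throwaway input file, created here purely for demonstration
printf 'alpha\nbeta\ngamma\n' > demo.txt

# -n suppresses default output; p prints only lines matching /beta/
matched=$(sed -n '/beta/p' demo.txt)
echo "$matched"    # beta

# -e chains two independent commands in a single invocation
chained=$(sed -e 's/alpha/ALPHA/' -e 's/gamma/GAMMA/' demo.txt)
echo "$chained"

rm demo.txt
```

Without -n, sed would echo every input line in addition to the explicitly printed match, so -n and p are almost always used together.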
Basic Sed Commands
Let’s start with the most common sed commands using a sample file sample.txt:
apple banana cherry
date: 2024-01-01
error: disk full
orange grape mango
1. Substitution (s/pattern/replacement/flags)
The s (substitute) command replaces pattern with replacement in a line.
Example 1: Replace “apple” with “orange”
sed 's/apple/orange/' sample.txt
Output:
orange banana cherry # "apple" replaced with "orange"
date: 2024-01-01
error: disk full
orange grape mango
Flags modify behavior:
- g: Replace all occurrences in the line (default: only the first match).
sed 's/orange/lemon/g' sample.txt # Replace all "orange" with "lemon"
- i: Case-insensitive match (GNU sed only).
sed 's/ERROR/Error/i' sample.txt # Replace "ERROR" (any case) with "Error"
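To see the difference the g flag makes, compare both forms on a line that contains the pattern twice (the input file here is created on the fly, just for illustration):

```shell
# One line with two matches, created on the fly for illustration
printf 'one two one two\n' > flags.txt

first=$(sed 's/one/1/' flags.txt)     # without g: only the first match
every=$(sed 's/one/1/g' flags.txt)    # with g: every match on the line
echo "$first"   # 1 two one two
echo "$every"   # 1 two 1 two

rm flags.txt
```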
2. Deletion (d)
The d command deletes lines matching a pattern.
Example: Delete lines containing “error”
sed '/error/d' sample.txt
Output:
apple banana cherry
date: 2024-01-01
orange grape mango
3. Print (p)
The p command prints a line. Use with -n to print only matched lines.
Example: Print lines containing “date”
sed -n '/date/p' sample.txt
Output:
date: 2024-01-01
4. Insert/Append (i/a)
- i: Insert text before a line matching a pattern.
- a: Append text after a line matching a pattern.
Example: Insert “Start of file” at the top
sed '1i Start of file' sample.txt # "1" targets the first line
Output:
Start of file
apple banana cherry
date: 2024-01-01
error: disk full
orange grape mango
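The a command works the same way in the other direction. Here is a small sketch appending a marker after a matched line (the input is a hypothetical two-line file, not the article's sample.txt; note that the one-line "a text" form is a GNU sed convenience, while POSIX sed requires "a\" followed by the text on the next line):

```shell
# Demonstration input (hypothetical, created inline)
printf 'error: disk full\nok line\n' > notes.txt

# GNU sed accepts the one-line "a text" form
appended=$(sed '/error/a (flagged for review)' notes.txt)
echo "$appended"

rm notes.txt
```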
Common Sed Use Cases
In-Place Editing
To modify a file directly (with a backup):
sed -i.bak 's/error/warning/' sample.txt # Overwrites sample.txt; creates sample.txt.bak
Delete Empty Lines
sed '/^$/d' sample.txt # "^$" matches empty lines
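A quick standalone check of the empty-line pattern, using a scratch file with blank lines scattered through it (gaps.txt is hypothetical):

```shell
# A file with blank lines scattered through it
printf 'one\n\ntwo\n\n\nthree\n' > gaps.txt

# ^$ anchors the start and end of a line with nothing between them
squeezed=$(sed '/^$/d' gaps.txt)
echo "$squeezed"

rm gaps.txt
```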
Replace Across Multiple Lines
sed reads one line at a time, so \n never matches inside a single line's pattern space. To substitute across a line break, first use the N command to join the next line into the pattern space:
sed 'N; s/cherry\ndate/cherry date/' sample.txt # Joins lines pairwise, then replaces across the break
AWK: The Pattern-Processing Language
If sed is for simple edits, awk is for structured text processing. It treats input as records (lines) and fields (columns), making it ideal for CSV/TSV files, logs with fixed formats, and data aggregation. awk is a full-fledged programming language with variables, loops, and functions.
Fundamentals of AWK
awk processes input line-by-line, applying pattern-action pairs:
awk 'pattern { action }' input_file
- Pattern: A condition (e.g., line number, regex match) that triggers the action.
- Action: Commands to run (e.g., print, compute) when the pattern matches.
If no pattern is given, the action runs for all lines. If no action is given, awk prints the line by default.
Key Concepts in AWK
- Fields: By default, fields are separated by whitespace (spaces/tabs). $1 = first field, $2 = second, etc. Use -F to set a custom delimiter (e.g., -F ',' for CSV).
- Variables: Built-in variables like NR (current line number), NF (number of fields in the line), and $0 (the entire line).
- Blocks: BEGIN (runs before processing input) and END (runs after all lines are processed).
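The built-in variables are easiest to see side by side. This sketch prints the record number, field count, and whole line for a two-line file with differing field counts (fields.txt is a hypothetical scratch file):

```shell
# Two lines with differing field counts
printf 'a b c\nd e\n' > fields.txt

# Print the line number, field count, and the whole line for each record
info=$(awk '{print NR, NF, $0}' fields.txt)
echo "$info"

rm fields.txt
```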
Basic AWK Syntax and Commands
Let’s use a CSV file sales.csv for examples:
Date,Product,Revenue
2024-01-01,A,150
2024-01-01,B,200
2024-01-02,A,180
2024-01-02,C,300
1. Print Specific Fields
awk -F ',' '{print $2, $3}' sales.csv # -F ',' sets comma as delimiter
Output:
Product Revenue
A 150
B 200
A 180
C 300
2. Filter Lines with Patterns
Print lines where Product is “A”:
awk -F ',' '$2 == "A" {print $1, $3}' sales.csv
Output:
2024-01-01 150
2024-01-02 180
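Besides exact comparison with ==, a field can be matched against a regular expression using the ~ operator (standard awk syntax). A sketch against a cut-down copy of sales.csv, recreated inline so the snippet runs standalone:

```shell
# A cut-down version of sales.csv, recreated here so the snippet is self-contained
printf 'Date,Product,Revenue\n2024-01-01,A,150\n2024-01-01,B,200\n2024-01-02,A,180\n' > sales_mini.csv

# Print the product on every line whose date field matches 2024-01-01
on_day=$(awk -F ',' '$1 ~ /2024-01-01/ {print $2}' sales_mini.csv)
echo "$on_day"

rm sales_mini.csv
```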
3. Use BEGIN and END Blocks
Generate a report header and footer:
awk -F ',' '
BEGIN { print "Sales Report\n===========" } # Runs first
NR > 1 { total += $3 } # Skip header (NR=1), sum Revenue
END { print "Total Revenue: " total } # Runs last
' sales.csv
Output:
Sales Report
===========
Total Revenue: 830
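BEGIN is also where you can set the field separator through the built-in FS variable, as an alternative to the -F option. A minimal sketch (the two-row mini.csv is recreated inline so this runs standalone):

```shell
# Small CSV recreated inline for demonstration
printf 'Date,Product,Revenue\n2024-01-01,A,150\n2024-01-02,C,300\n' > mini.csv

# FS set in BEGIN takes effect before the first record is read
total=$(awk 'BEGIN { FS = "," } NR > 1 { t += $3 } END { print t }' mini.csv)
echo "$total"   # 450

rm mini.csv
```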
Common AWK Use Cases
Process TSV Files (Tab-Separated)
awk -F '\t' '{print $1, $4}' data.tsv # -F '\t' sets tab as delimiter
Filter Rows by Numeric Conditions
Print sales where Revenue > 180:
awk -F ',' '$3 > 180 {print $2, $3}' sales.csv
Output:
B 200
C 300
Count Occurrences
Count how many times each product appears:
awk -F ',' 'NR > 1 {count[$2]++} END {for (p in count) print p ": " count[p]}' sales.csv
Output:
A: 2
B: 1
C: 1
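The same associative-array pattern sums per group rather than counting. For instance, total revenue per product (sales.csv recreated inline so the snippet is self-contained; the output is piped through sort because awk's for-in iteration order is unspecified):

```shell
# Recreate sales.csv so the snippet is self-contained
printf 'Date,Product,Revenue\n2024-01-01,A,150\n2024-01-01,B,200\n2024-01-02,A,180\n2024-01-02,C,300\n' > sales.csv

# Accumulate revenue per product; sort for stable output order
per_product=$(awk -F ',' 'NR > 1 {rev[$2] += $3} END {for (p in rev) print p ": " rev[p]}' sales.csv | sort)
echo "$per_product"

rm sales.csv
```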
Combining Sed and AWK
sed and awk are often used together. Use sed for preprocessing (cleaning) and awk for analysis:
Example: Clean a log file, then sum values
Suppose app.log has messy lines with extra spaces:
[INFO] 2024-01-01: User1 | 50
[ERROR] 2024-01-01: User2 | 30
[INFO] 2024-01-02: User1 | 70
1. Use sed to remove the [INFO]/[ERROR] tags and turn " | " into a comma (note that | must be escaped under -E, where an unescaped | means alternation):
sed -E 's/\[.*\] //; s/ \| /,/' app.log
Output (cleaned CSV):
2024-01-01: User1,50
2024-01-01: User2,30
2024-01-02: User1,70
2. Pipe to awk to sum the numeric column:
sed -E 's/\[.*\] //; s/ \| /,/' app.log | awk -F ',' '{sum += $2} END {print "Total:", sum}'
Output:
Total: 150
Best Practices
For Sed
- Test First: Avoid -i until you're sure the command works. Use sed 'cmd' file | less to preview changes.
- Backup Files: Always use -i.bak (not just -i) to avoid data loss: sed -i.bak 's/old/new/' file.
- Escape Special Characters: Use \ to escape regex metacharacters like ., *, or $ (e.g., sed 's/\$price/100/' file).
For AWK
- Set the Right Delimiter: Always use -F for non-whitespace separators (e.g., -F ';' for semicolons).
- Use BEGIN for Setup: Initialize variables or print headers in BEGIN blocks (e.g., BEGIN { FS=","; print "Report" }).
- Handle Edge Cases: Check for empty lines or missing fields with NF (e.g., NF == 3 {print} to skip lines with fewer than 3 fields).
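The NF guard in practice, against a file with one malformed line (mixed.csv is a hypothetical scratch file created inline):

```shell
# Mixed input: one malformed line with too few fields
printf 'a,b,c\nbadline\nd,e,f\n' > mixed.csv

# Keep only well-formed 3-field records
valid=$(awk -F ',' 'NF == 3 {print $1}' mixed.csv)
echo "$valid"

rm mixed.csv
```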
General
- Comment Complex Scripts: For multi-line sed/awk commands, add comments (both awk and sed treat # as a comment character inside scripts).
- Use Pipes: Chain tools (e.g., grep "error" log.txt | sed 's/error/ERROR/' | awk '{print $1}').
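A runnable sketch of such a chain, using a small log created inline (pipe.log and its contents are hypothetical): grep filters the lines, sed rewrites the tag, and awk extracts one column.

```shell
# A small log created inline for demonstration
printf '[INFO] start ok\n[ERROR] disk full\n[ERROR] net down\n' > pipe.log

# grep filters, sed rewrites the tag, awk prints the second column
subsystems=$(grep 'ERROR' pipe.log | sed 's/\[ERROR\]/!!/' | awk '{print $2}')
echo "$subsystems"

rm pipe.log
```

Each stage does one small job, which keeps the pipeline easy to test piece by piece.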
Conclusion
sed and awk are indispensable tools for Linux text processing. sed shines for simple substitutions, deletions, and line edits, while awk handles structured data, aggregation, and complex logic. By mastering these tools, you’ll automate tedious tasks, analyze logs faster, and unlock new efficiencies in your Linux workflow.
Start small: practice with log files or CSV data, and gradually tackle more complex scripts. The more you use them, the more intuitive their power becomes!
References
- GNU Sed Manual
- GNU AWK Manual
- “The AWK Programming Language” by Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger
- Sed & AWK Tutorial (TutorialsPoint)
- Awk by Example (The Grymoire)