In the world of Linux systems administration, ensuring optimal performance and reliability hinges on understanding how system resources are utilized. Whether you’re troubleshooting a slow server, optimizing application performance, or pre-empting failures, resource monitoring is an indispensable skill. This blog demystifies Linux resource monitoring by breaking down fundamental concepts, essential commands, advanced tools, and best practices. By the end, you’ll be equipped to diagnose bottlenecks, track trends, and keep your Linux systems running smoothly.
Table of Contents
- Fundamental Concepts of Linux Resource Monitoring
- Basic Monitoring Commands
- Advanced Monitoring Tools
- Common Monitoring Practices
- Best Practices for Effective Monitoring
- Conclusion
- References
1. Fundamental Concepts of Linux Resource Monitoring
Before diving into tools, it’s critical to understand the key resources and metrics Linux systems expose. These form the foundation of monitoring:
1.1 Key Resources to Monitor
Linux systems rely on four primary resources:
- CPU (Central Processing Unit): Executes instructions. Metrics include utilization, load average, and core distribution.
- Memory (RAM/Swap): Temporary data storage. Metrics include total/used/free memory, swap usage, and cache/buffer utilization.
- Disk I/O: Read/write operations on storage devices. Metrics include throughput (MB/s), I/O operations per second (IOPS), and latency.
- Network: Data transfer over interfaces. Metrics include bandwidth (MB/s), packet loss, latency, and connection counts.
1.2 Critical Metrics
- CPU Load Average: The average number of processes waiting for CPU time over 1, 5, and 15 minutes (e.g., `load average: 0.8, 1.2, 0.9`). A load > 1 per CPU core indicates saturation.
- Memory "Available" vs. "Free": `free` memory is completely unused, while `available` memory (reported by `free -h`) includes free memory plus cache/buffer memory that can be reclaimed for applications.
- Disk %util: Percentage of time the disk is busy handling I/O; > 80% often indicates a bottleneck.
- Network Latency: Time for a packet to reach a destination and return (e.g., `ping` reports round-trip time).
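To make the per-core rule concrete, the 1-minute load can be compared against the core count in a script. A minimal sketch (Linux-specific: it reads `/proc/loadavg` and uses `nproc`):

```shell
#!/bin/sh
# Compare the 1-minute load average against the number of CPU cores.
cores=$(nproc)
load1=$(awk '{print $1}' /proc/loadavg)

# awk handles the floating-point comparison the shell cannot
awk -v l="$load1" -v c="$cores" 'BEGIN {
    if (l + 0 > c + 0) print "saturated: load " l " exceeds " c " core(s)"
    else               print "ok: load " l " within " c " core(s)"
}'
```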
2. Basic Monitoring Commands
These built-in or pre-installed commands provide quick, real-time insights into system health.
2.1 top: Real-Time Process Monitoring
top is the gold standard for live CPU/memory tracking. It displays running processes, sorted by resource usage.
Usage:
top
Sample Output:
top - 14:30:00 up 5 days, 2h, 1 user, load average: 0.75, 0.82, 0.90
Tasks: 189 total, 1 running, 188 sleeping, 0 stopped, 0 zombie
%Cpu(s): 12.3 us, 2.1 sy, 0.0 ni, 85.0 id, 0.2 wa, 0.0 hi, 0.4 si, 0.0 st
MiB Mem : 15987.4 total, 2345.1 free, 5678.2 used, 7964.1 buff/cache
MiB Swap: 2048.0 total, 1980.5 free, 67.5 used. 9876.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1234 ubuntu 20 0 120000 45000 20000 R 25.0 0.3 5:23.12 python3
5678 root 20 0 80000 15000 5000 S 5.0 0.1 2:10.00 nginx
Key Takeaways:
- `load average`: 1/5/15-minute CPU queue length.
- `%CPU`: Breakdown (`us` = user, `sy` = system, `wa` = I/O wait). High `wa` suggests disk bottlenecks.
- `RES`: Resident memory (actual RAM used by the process, not swapped).
- Interactive Controls: Press `P` to sort by CPU, `M` by memory, `k` to kill a process, `q` to quit.
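`top` also has a non-interactive batch mode (`-b`) that suits cron jobs and log collectors. A small sketch; the `sed` pattern assumes the standard `load average:` header format shown above:

```shell
# One non-interactive snapshot, truncated to the summary and top processes
top -b -n 1 | head -n 15

# Extract just the three load-average figures from the header line
top -b -n 1 | head -n 1 | sed 's/.*load average: //'
```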
2.2 ps: Process Snapshot
Unlike top, ps captures a one-time snapshot of processes. Use it to filter or export process data.
Common Usage:
# List all processes (BSD style)
ps aux
# List processes by memory usage (descending)
ps aux --sort=-%mem | head
# Filter by process name (e.g., "nginx")
ps aux | grep nginx
Sample Output (ps aux --sort=-%mem | head):
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ubuntu 1234 25.0 0.3 120000 45000 pts/0 R+ 14:30 5:23 python3
root 5678 5.0 0.1 80000 15000 ? Ss 10:00 2:10 nginx
2.3 free: Memory Usage
free reports RAM and swap utilization. Use -h for human-readable units (GB/MB).
Usage:
free -h
Sample Output:
total used free shared buff/cache available
Mem: 15Gi 5.5Gi 2.3Gi 300Mi 8.0Gi 9.6Gi
Swap: 2.0Gi 67Mi 1.9Gi
Key Columns:
- `available`: Estimated memory available for new applications (most important for capacity planning).
- `buff/cache`: Memory used for disk caching (held temporarily for faster access; not "wasted").
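For capacity checks in scripts, the `available` column can be read directly. A sketch assuming the procps-ng layout shown above (`-m` reports MiB; `available` is the 7th field of the `Mem:` row):

```shell
# Print the memory actually available to new applications, in MiB
avail_mib=$(free -m | awk '/^Mem:/ {print $7}')
echo "available: ${avail_mib} MiB"
```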
2.4 df and du: Disk Space
- `df` (disk free): Checks free space on mounted filesystems.
- `du` (disk usage): Analyzes space used by specific directories/files.
Usage:
# Disk free space (human-readable)
df -h
# Disk usage of /var/log (human-readable, summarize)
du -sh /var/log
Sample Output (df -h):
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 20G 12G 7.5G 61% /
tmpfs 7.8G 0 7.8G 0% /dev/shm
/dev/sdb1 100G 45G 55G 45% /data
Red Flag: A filesystem with `Use%` > 90% risks service failures (processes can no longer write logs, temp files, or database records) and degraded performance.
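When `df` flags a filesystem, `du` identifies the culprits. A sketch that drills into `/var` (an assumption for illustration; point it at whichever mount is filling up):

```shell
# Five largest entries under /var; -x stays on one filesystem,
# sort -rh orders human-readable sizes descending
du -xsh /var/* 2>/dev/null | sort -rh | head -n 5
```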
2.5 iostat: Disk I/O Statistics
iostat (from the sysstat package) monitors disk and CPU utilization over time.
Install (if missing):
sudo apt install sysstat # Debian/Ubuntu
sudo yum install sysstat # RHEL/CentOS
Usage:
iostat 2 3 # Report every 2 seconds, 3 times total
Sample Output:
avg-cpu: %user %nice %system %iowait %steal %idle
5.20 0.00 2.10 0.50 0.00 92.20
Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 10.50 120.00 80.00 240000 160000
sdb 2.00 10.00 5.00 20000 10000
Key Metric: %iowait (CPU idle waiting for I/O) and tps (transfers per second). High %iowait (>20%) indicates slow disks.
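`iostat -x` adds extended per-device statistics, including `%util` as the last column. A sketch that flags busy devices, reusing the >80% rule of thumb; the `sd*`/`nvme*` name pattern is an assumption about your hardware naming:

```shell
# Report devices spending more than 80% of their time servicing I/O
iostat -dx 1 2 | awk '$1 ~ /^(sd|nvme)/ && $NF + 0 > 80 {print $1, "busy:", $NF "%"}'
```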
2.6 vmstat: Virtual Memory Statistics
vmstat provides a holistic view of CPU, memory, disk, and system activity.
Usage:
vmstat 1 # Report every 1 second
Sample Output:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
 1  0  68608 2401280 1258292 7130316    0    0   120    80  500 1200  5  2 92  1  0
Key Columns:
- `r`: Number of processes waiting for CPU (high `r` = CPU saturation).
- `si`/`so`: Swap in/out (non-zero values indicate memory pressure).
- `bi`/`bo`: Blocks read/written per second (disk I/O).
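Because `si`/`so` are the clearest memory-pressure signal, a short watch loop can flag them. A sketch assuming the default column layout shown above, where `si` and `so` are fields 7 and 8:

```shell
# Print any sample in which pages were swapped in or out
# (NR > 2 skips vmstat's two header lines)
vmstat 1 5 | awk 'NR > 2 && ($7 + 0 > 0 || $8 + 0 > 0) {print "swapping:", $0}'
```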
2.7 ss/netstat: Network Connections
ss (socket statistics) replaces netstat (deprecated) for monitoring network sockets, ports, and connections.
Usage:
# List all listening TCP/UDP sockets (numeric addresses)
ss -tuln
# List TCP/UDP connections with their owning processes
ss -tup
Sample Output (ss -tuln):
Netid State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp   LISTEN 0      128          0.0.0.0:80        0.0.0.0:*
tcp   LISTEN 0      128          0.0.0.0:22        0.0.0.0:*
udp   UNCONN 0      512        127.0.0.1:53        0.0.0.0:*
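A common follow-up is tallying TCP connections by state, where piles of `SYN-RECV` or `TIME-WAIT` sockets stand out immediately. A minimal sketch over `ss -tan` output:

```shell
# Count TCP sockets per state; the first line of ss output is the header
ss -tan | awk 'NR > 1 {counts[$1]++} END {for (s in counts) print s, counts[s]}'
```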
3. Advanced Monitoring Tools
For deeper analysis or automation, use these powerful, often third-party tools.
3.1 htop: Enhanced top with a Modern UI
htop improves on top with interactive controls, color-coding, and mouse support.
Install:
sudo apt install htop # Debian/Ubuntu
sudo yum install htop # RHEL/CentOS
Usage:
htop
Features:
- Vertical/horizontal scrolling through the process list.
- Customizable meters and columns (e.g., add per-process I/O rates).
- One-click sorting by CPU, memory, or runtime.
3.2 glances: All-in-One System Monitor
glances aggregates CPU, memory, disk, network, and process data into a single dashboard. It even supports web/API access.
Install:
pip install glances # Cross-platform
Usage:
glances # CLI mode
glances -w # Web server (access at http://<IP>:61208)
3.3 nmon: Nigel’s Monitor
nmon (Nigel’s Monitor) is a lightweight tool for capturing and saving system data to a file for later analysis.
Install:
sudo apt install nmon # Debian/Ubuntu
Usage:
nmon # Interactive mode (press 'c' for CPU, 'm' for memory, 'd' for disk)
nmon -f -s 5 -c 12 # Save data every 5s for 12 samples (output: <hostname>_YYYYMMDD_HHMM.nmon)
Output Analysis: Use `nmonchart` (a companion tool from the nmon project, distributed separately) to convert `.nmon` files to HTML reports.
3.4 sar: System Activity Reporter
sar (from sysstat) collects historical performance data, making it ideal for trend analysis and capacity planning.
Usage:
sar -u 5 3 # CPU usage every 5s, 3 times
sar -r # Memory usage (today)
sar -f /var/log/sysstat/sa22 # Read data from 22nd of the month
Sample Output (sar -u):
Linux 5.4.0-100-generic (server) 09/22/2024 _x86_64_ (8 CPU)
14:30:00 CPU %user %nice %system %iowait %steal %idle
14:30:05 all 5.20 0.00 2.10 0.50 0.00 92.20
14:30:10 all 4.80 0.00 1.90 0.40 0.00 92.90
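On Debian/Ubuntu, historical collection is off by default, so it must be switched on before `sar -r` or `sar -f` have data to read. The path and service name below are Debian/Ubuntu defaults (on RHEL, collection runs once the `sysstat` service is started):

```shell
# Turn on periodic data collection, then start the collector service
sudo sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
sudo systemctl enable --now sysstat
```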
4. Common Monitoring Practices
4.1 Real-Time vs. Periodic Monitoring
- Real-Time: Use `top`, `htop`, or `glances` to diagnose active issues (e.g., a server suddenly slowing down).
- Periodic: Use `sar` or `nmon` to track trends (e.g., "Is memory usage growing weekly?").
4.2 Troubleshooting Workflow
When a system is slow:
- Check CPU with `top`/`sar -u` (high `%user` = application issue; high `%system` = kernel/driver issue).
- Check memory with `free -h` (high swap usage = insufficient RAM).
- Check disk I/O with `iostat` (high `%iowait` = slow storage).
- Check network with `ss`/`ping` (high latency = network congestion).
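The four steps above can be collapsed into a first-pass triage script. A sketch for a Linux host; the 80% disk filter is an illustrative threshold, not a universal rule:

```shell
#!/bin/bash
# Quick first-pass triage: CPU, memory, disk, network
echo "== CPU (load vs cores) =="
echo "cores: $(nproc), load: $(cut -d' ' -f1-3 /proc/loadavg)"

echo "== Memory =="
free -h | sed -n '1,2p'

echo "== Disk (filesystems over 80% full) =="
df -h | awk 'NR == 1 || $5 + 0 > 80'

echo "== Network (listening sockets) =="
ss -tuln | head -n 5
```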
4.3 Pre-Deployment Checks
Before deploying an application:
- Verify CPU cores/memory match requirements (e.g., "App needs 4GB RAM; `free -h` shows 8GB available").
- Test disk I/O with `dd if=/dev/zero of=/tmp/test bs=1G count=1 oflag=direct` (measures sequential write speed).
5. Best Practices for Effective Monitoring
5.1 Automate Monitoring
- Scheduled Data Collection: Use `cron` to run `sar` or `nmon` at intervals (e.g., `*/5 * * * * sar -o /var/log/sysstat/sar$(date +\%d) 1 1` appends one sample every five minutes; note the escaped `%`, which is special in crontab).
- Infrastructure as Code (IaC): Deploy monitoring tools with Ansible/Chef (e.g., "Install `glances` on all web servers").
5.2 Set Alerts for Thresholds
Define critical thresholds and trigger alerts (e.g., via cron scripts or tools like Prometheus):
- CPU utilization > 80% for 5 minutes.
- Disk `Use%` > 90%.
- Swap usage > 50%.
Example Alert Script (Bash):
#!/bin/bash
# Alert when any filesystem exceeds the usage threshold
THRESHOLD=90
df -h | awk -v threshold="$THRESHOLD" 'NR > 1 && $5+0 > threshold {print "Disk full: " $0; exit 1}'
if [ $? -eq 1 ]; then
    echo "Disk alert!" | mail -s "Disk Full on $(hostname)" [email protected]
fi
5.3 Avoid Over-Monitoring
Focus on actionable metrics (e.g., %iowait matters; individual process VIRT memory often does not). Too many alerts lead to “alert fatigue.”
5.4 Secure Monitoring Tools
- Restrict access to `sar` logs (`chmod 600 /var/log/sysstat/*`).
- Password-protect the `glances` web interface (`glances -w --password`).
6. Conclusion
Linux resource monitoring is a cornerstone of system reliability. By mastering basic commands like top, free, and iostat, you can quickly diagnose issues. Advanced tools like glances, nmon, and sar extend this capability to historical analysis and automation. Remember: monitoring is an ongoing practice, not a one-time task.