
Linux Resource Monitoring: Basic Commands and Tools Explained

In the world of Linux systems administration, ensuring optimal performance and reliability hinges on understanding how system resources are utilized. Whether you’re troubleshooting a slow server, optimizing application performance, or pre-empting failures, resource monitoring is an indispensable skill. This blog demystifies Linux resource monitoring by breaking down fundamental concepts, essential commands, advanced tools, and best practices. By the end, you’ll be equipped to diagnose bottlenecks, track trends, and keep your Linux systems running smoothly.

Table of Contents

  1. Fundamental Concepts of Linux Resource Monitoring
  2. Basic Monitoring Commands
  3. Advanced Monitoring Tools
  4. Common Monitoring Practices
  5. Best Practices for Effective Monitoring
  6. Conclusion
  7. References

1. Fundamental Concepts of Linux Resource Monitoring

Before diving into tools, it’s critical to understand the key resources and metrics Linux systems expose. These form the foundation of monitoring:

1.1 Key Resources to Monitor

Linux systems rely on four primary resources:

  • CPU (Central Processing Unit): Executes instructions. Metrics include utilization, load average, and core distribution.
  • Memory (RAM/Swap): Temporary data storage. Metrics include total/used/free memory, swap usage, and cache/buffer utilization.
  • Disk I/O: Read/write operations on storage devices. Metrics include throughput (MB/s), I/O operations per second (IOPS), and latency.
  • Network: Data transfer over interfaces. Metrics include bandwidth (MB/s), packet loss, latency, and connection counts.

1.2 Critical Metrics

  • CPU Load Average: The average number of runnable processes (running or waiting for CPU; Linux also counts tasks blocked in uninterruptible I/O wait) over 1, 5, and 15 minutes (e.g., load average: 0.80, 1.20, 0.90). A load >1 per CPU core indicates saturation.
  • Memory “Available” vs. “Free”: free memory is unused, while available memory (reported by free -h) includes free + cache/buffer memory that can be reclaimed for applications.
  • Disk %util: Percentage of time the disk is busy handling I/O. >80% often indicates a bottleneck.
  • Network Latency: Time for a packet to travel (e.g., ping reports round-trip time).
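To make the load-per-core rule of thumb concrete, here is a minimal shell sketch (assuming a Linux /proc filesystem and the coreutils nproc command) that compares the 1-minute load average against the core count:

```shell
# Read the 1-minute load average from /proc/loadavg
load1=$(awk '{print $1}' /proc/loadavg)

# Count online CPU cores
cores=$(nproc)

# Apply the "load per core > 1" rule of thumb
saturated=$(awk -v l="$load1" -v c="$cores" \
  'BEGIN { if (l / c > 1.0) print "yes"; else print "no" }')
echo "load1=$load1 cores=$cores saturated=$saturated"
```

A sustained "yes" here is the cue to dig into top or sar, covered below.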

2. Basic Monitoring Commands

These built-in or pre-installed commands provide quick, real-time insights into system health.

2.1 top: Real-Time Process Monitoring

top is the gold standard for live CPU/memory tracking. It displays running processes, sorted by resource usage.

Usage:

top

Sample Output:

top - 14:30:00 up 5 days, 2h,  1 user,  load average: 0.75, 0.82, 0.90
Tasks: 189 total,   1 running, 188 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.3 us,  2.1 sy,  0.0 ni, 85.0 id,  0.2 wa,  0.0 hi,  0.4 si,  0.0 st
MiB Mem :  15987.4 total,   2345.1 free,   5678.2 used,   7964.1 buff/cache
MiB Swap:   2048.0 total,   1980.5 free,     67.5 used.   9876.3 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1234 ubuntu    20   0  120000  45000  20000 R  25.0   0.3   5:23.12 python3
   5678 root      20   0   80000  15000   5000 S   5.0   0.1   2:10.00 nginx

Key Takeaways:

  • load average: 1/5/15-minute CPU queue length.
  • %CPU: Breakdown (us=user, sy=system, wa=I/O wait). High wa suggests disk bottlenecks.
  • RES: Resident memory (actual RAM used by the process, not swapped).
  • Interactive Controls: Press P to sort by CPU, M by memory, k to kill a process, q to quit.
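Beyond the interactive screen, top can also run non-interactively, which is handy for logging and scripts. A short sketch using its batch mode:

```shell
# -b (batch) prints plain text instead of the interactive screen;
# -n 1 runs a single iteration. Useful for logs and cron jobs.
snapshot=$(top -b -n 1 | head -n 5)
echo "$snapshot"
```

Redirect the batch output to a file to capture a point-in-time snapshot during an incident.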

2.2 ps: Process Snapshot

Unlike top, ps captures a one-time snapshot of processes. Use it to filter or export process data.

Common Usage:

# List all processes (BSD style)
ps aux  

# List processes by memory usage (descending)
ps aux --sort=-%mem | head  

# Filter by process name (e.g., "nginx")
ps aux | grep nginx  

Sample Output (ps aux --sort=-%mem | head):

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ubuntu    1234 25.0  0.3 120000 45000 pts/0    R+   14:30   5:23 python3
root      5678  5.0  0.1  80000 15000 ?        Ss   10:00   2:10 nginx
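Note that ps aux | grep nginx also matches the grep process itself. ps -C (select by exact command name) and pgrep avoid that artifact; a sketch using a throwaway sleep process as the target:

```shell
# Start a throwaway process so there is something to match
sleep 30 &
spid=$!

# ps -C selects by exact command name, with chosen columns and no header row
by_name=$(ps -C sleep -o pid,%cpu,%mem,rss,cmd --no-headers)

# pgrep prints matching PIDs; -a adds the full command line
matches=$(pgrep -a sleep)

echo "$by_name"
echo "$matches"
kill "$spid"
```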

2.3 free: Memory Usage

free reports RAM and swap utilization. Use -h for human-readable units (GB/MB).

Usage:

free -h  

Sample Output:

              total        used        free      shared  buff/cache   available
Mem:           15Gi       5.5Gi       2.3Gi       300Mi        8.0Gi        9.6Gi
Swap:          2.0Gi        67Mi       1.9Gi

Key Columns:

  • available: Estimated memory available for new applications (most important for capacity planning).
  • buff/cache: Memory used for disk caching (temporarily stored for faster access; not “wasted”).
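The available figure that free reports comes from the kernel's MemAvailable estimate in /proc/meminfo. A small sketch reading it directly, useful on systems where you want the raw number in a script:

```shell
# free's "available" column comes from MemAvailable in /proc/meminfo
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

# Report it in GiB with a percentage (1048576 KiB per GiB)
awk -v a="$avail_kb" -v t="$total_kb" \
  'BEGIN { printf "available: %.1f GiB of %.1f GiB (%.0f%%)\n", a/1048576, t/1048576, 100*a/t }'
```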

2.4 df and du: Disk Space

  • df (disk free): Checks free space on mounted filesystems.
  • du (disk usage): Analyzes space used by specific directories/files.

Usage:

# Disk free space (human-readable)
df -h  

# Inode usage (a full inode table blocks writes even with free space)
df -i  

# Disk usage of /var/log (human-readable, summarize)
du -sh /var/log  

Sample Output (df -h):

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        20G   12G  7.5G  61% /
tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sdb1       100G   45G   55G  45% /data

Red Flag: A filesystem with Use% > 90% risks crashes or data corruption.
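When df shows a filesystem filling up, du locates the culprits. One common pattern, shown here against /var (permission errors on protected subdirectories are silenced):

```shell
# Largest first-level directories under /var, biggest last
# (sort -h understands the K/M/G suffixes that du -h emits)
biggest=$(du -h --max-depth=1 /var 2>/dev/null | sort -h | tail -n 5)
echo "$biggest"
```

Repeat the same command against whichever directory tops the list to drill down.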

2.5 iostat: Disk I/O Statistics

iostat (from the sysstat package) monitors disk and CPU utilization over time.

Install (if missing):

sudo apt install sysstat  # Debian/Ubuntu  
sudo yum install sysstat  # RHEL/CentOS  

Usage:

iostat 2 3  # Report every 2 seconds, 3 times total  

Sample Output:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.20    0.00    2.10    0.50    0.00   92.20

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               10.50        120.00        80.00     240000     160000
sdb                2.00         10.00         5.00      20000      10000

Key Metric: %iowait (CPU idle waiting for I/O) and tps (transfers per second). High %iowait (>20%) indicates slow disks.
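iostat's per-device numbers are derived from the kernel counters in /proc/diskstats. When sysstat isn't installed, a rough sketch can read the cumulative counters directly (field positions follow the kernel's diskstats layout; sectors are 512-byte units):

```shell
# Cumulative per-device I/O counters since boot:
#   $3 = device name, $4 = reads completed, $6 = sectors read,
#   $8 = writes completed, $10 = sectors written
iosum=$(awk '$3 !~ /^(loop|ram)/ {
  printf "%-10s reads=%s sectors_read=%s writes=%s sectors_written=%s\n",
         $3, $4, $6, $8, $10
}' /proc/diskstats)
echo "$iosum"
```

Sampling these counters twice and dividing the deltas by the interval reproduces iostat's rates.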

2.6 vmstat: Virtual Memory Statistics

vmstat provides a holistic view of CPU, memory, disk, and system activity.

Usage:

vmstat 1  # Report every 1 second (memory values in KiB by default)  

Sample Output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0  69120 2411520 1258496 7130112    0    0   120    80  500 1200  5  2 92  1  0

Key Columns:

  • r: Number of processes waiting for CPU (high r = CPU saturation).
  • si/so: Swap in/out (non-zero values indicate memory pressure).
  • bi/bo: Blocks read/written per second (disk I/O).
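The si/so rates vmstat shows are derived from the kernel's cumulative swap counters in /proc/vmstat; a quick sketch reading them directly:

```shell
# Pages swapped in/out since boot; a growing delta between two
# readings is the same memory-pressure signal as non-zero si/so
swpin=$(awk '/^pswpin/ {print $2}' /proc/vmstat)
swpout=$(awk '/^pswpout/ {print $2}' /proc/vmstat)
echo "pages swapped in since boot: $swpin, out: $swpout"
```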

2.7 ss/netstat: Network Connections

ss (socket statistics) is the modern replacement for the deprecated netstat, used to monitor network sockets, ports, and connections.

Usage:

# List listening TCP/UDP sockets (numeric addresses)  
ss -tuln  

# List TCP/UDP sockets with their owning processes  
ss -tup  

Sample Output (ss -tuln):

Netid State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
tcp   LISTEN 0      128          0.0.0.0:80          0.0.0.0:*
tcp   LISTEN 0      128          0.0.0.0:22          0.0.0.0:*
udp   UNCONN 0      0           127.0.0.1:53         0.0.0.0:*
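ss itself reads the kernel socket tables under /proc/net. On a box without iproute2, the listening TCP ports can be decoded from /proc/net/tcp directly; a sketch (state 0A is LISTEN, and ports are stored in hex):

```shell
# Column 2 of /proc/net/tcp is local_address:port in hex;
# column 4 is the socket state (0A = LISTEN)
echo "Listening TCP ports:"
awk '$4 == "0A" { split($2, a, ":"); print a[2] }' /proc/net/tcp |
while read -r hexport; do
  printf '%d\n' "0x$hexport"   # convert hex port to decimal
done | sort -nu
```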

3. Advanced Monitoring Tools

For deeper analysis or automation, use these powerful, often third-party tools.

3.1 htop: Enhanced top with a Modern UI

htop improves on top with interactive controls, color-coding, and mouse support.

Install:

sudo apt install htop  # Debian/Ubuntu  
sudo yum install htop  # RHEL/CentOS  

Usage:

htop  

Features:

  • Vertical/horizontal scrolling for processes.
  • Customizable meters and columns (e.g., per-process I/O rates).
  • One-click sorting by CPU, memory, or runtime.

3.2 glances: All-in-One System Monitor

glances aggregates CPU, memory, disk, network, and process data into a single dashboard. It even supports web/API access.

Install:

pip install glances  # Cross-platform  

Usage:

glances  # CLI mode  
glances -w  # Web server (access at http://<IP>:61208)  

(Screenshot: the Glances web interface)

3.3 nmon: Nigel’s Monitor

nmon (Nigel’s Monitor) is a lightweight tool for capturing and saving system data to a file for later analysis.

Install:

sudo apt install nmon  # Debian/Ubuntu  

Usage:

nmon  # Interactive mode (press 'c' for CPU, 'm' for memory, 'd' for disk)  
nmon -f -s 5 -c 12  # Save data every 5s for 12 samples (output: <hostname>_YYYYMMDD_HHMM.nmon)  

Output Analysis: Use nmonchart (a companion tool from the same author, distributed separately) to convert .nmon files to HTML reports.

3.4 sar: System Activity Reporter

sar (from sysstat) collects historical performance data, making it ideal for trend analysis and capacity planning.

Usage:

sar -u 5 3  # CPU usage every 5s, 3 times  
sar -r  # Memory usage (today)  
sar -f /var/log/sysstat/sa22  # Read data from 22nd of the month  

Sample Output (sar -u):

Linux 5.4.0-100-generic (server)  09/22/2024  _x86_64_  (8 CPU)

14:30:00        CPU     %user     %nice   %system   %iowait    %steal     %idle
14:30:05        all      5.20      0.00      2.10      0.50      0.00     92.20
14:30:10        all      4.80      0.00      1.90      0.40      0.00     92.90

4. Common Monitoring Practices

4.1 Real-Time vs. Periodic Monitoring

  • Real-Time: Use top, htop, or glances to diagnose active issues (e.g., a server suddenly slowing down).
  • Periodic: Use sar or nmon to track trends (e.g., “Is memory usage growing weekly?”).

4.2 Troubleshooting Workflow

When a system is slow:

  1. Check CPU with top/sar -u (high %user = application issue; high %sy = kernel/driver issue).
  2. Check memory with free -h (high swap usage = insufficient RAM).
  3. Check disk I/O with iostat (high %iowait = slow storage).
  4. Check network with ss/ping (high latency = network congestion).
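The four steps above can be collapsed into a first-pass triage script. This is a sketch using only /proc and commands covered earlier; it prints one line per resource in the same order as the workflow:

```shell
# One-line summary per resource, in workflow order
echo "CPU:    load $(cut -d' ' -f1-3 /proc/loadavg) across $(nproc) cores"
echo "Memory: $(free -h | awk '/^Mem:/  {print $7 " available of " $2}')"
echo "Swap:   $(free -h | awk '/^Swap:/ {print $3 " used of " $2}')"
echo "Disk:   $(df -h / | awk 'NR==2 {print $5 " used on /"}')"
```

Whichever line looks abnormal tells you which of the four deep-dive tools to reach for next.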

4.3 Pre-Deployment Checks

Before deploying an application:

  • Verify CPU cores/memory match requirements (e.g., “App needs 4GB RAM; free -h shows 8GB available”).
  • Test disk I/O with dd if=/dev/zero of=/tmp/test bs=1G count=1 oflag=direct (measures sequential write speed; oflag=direct bypasses the page cache so the result reflects the disk, not RAM).
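The RAM check is easy to script against /proc/meminfo. A sketch assuming a hypothetical 4 GB requirement:

```shell
# Hypothetical requirement: the app needs 4 GB of available RAM
required_kb=$((4 * 1024 * 1024))
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)

if [ "$avail_kb" -ge "$required_kb" ]; then
  echo "OK: $((avail_kb / 1024)) MiB available (need $((required_kb / 1024)) MiB)"
else
  echo "FAIL: only $((avail_kb / 1024)) MiB available (need $((required_kb / 1024)) MiB)"
fi
```

A non-zero exit on FAIL would let a deployment pipeline abort automatically; that wiring is left out here for brevity.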

5. Best Practices for Effective Monitoring

5.1 Automate Monitoring

  • Scheduled Data Collection: Use cron to run sar or nmon at intervals (e.g., */5 * * * * sar -o /var/log/sysstat/sar$(date +\%d) 60 1). Note that sar -o needs an interval and count, and % must be escaped as \% inside crontab entries.
  • Infrastructure as Code (IaC): Deploy monitoring tools with Ansible/Chef (e.g., “Install glances on all web servers”).

5.2 Set Alerts for Thresholds

Define critical thresholds and trigger alerts (e.g., via cron scripts or tools like Prometheus):

  • CPU load > 80% for 5 minutes.
  • Disk Use% > 90%.
  • Memory swap usage > 50%.

Example Alert Script (Bash):

#!/bin/bash
THRESHOLD=90
# $5 is the Use% column; "$5+0" turns "61%" into 61 (and the header into 0).
# -x skips pseudo-filesystems that would trigger false alerts.
df -h -x tmpfs -x devtmpfs | awk -v threshold="$THRESHOLD" '$5+0 > threshold {print "Disk full: " $0; exit 1}'
if [ $? -eq 1 ]; then
  echo "Disk alert!" | mail -s "Disk Full on $(hostname)" [email protected]
fi

5.3 Avoid Over-Monitoring

Focus on actionable metrics (e.g., %iowait matters; individual process VIRT memory often does not). Too many alerts lead to “alert fatigue.”

5.4 Secure Monitoring Tools

  • Restrict access to sar logs (chmod 600 /var/log/sysstat/*).
  • Password-protect glances web interface (glances -w --password).

6. Conclusion

Linux resource monitoring is a cornerstone of system reliability. By mastering basic commands like top, free, and iostat, you can quickly diagnose issues. Advanced tools like glances, nmon, and sar extend this capability to historical analysis and automation. Remember: monitoring