
Managing Disk Space: Critical Tools and Strategies for Linux Servers

In Linux server administration, disk space is a foundational resource, yet it is often overlooked until a crisis strikes. A full disk can cripple services, corrupt data, or bring down critical applications. Whether you’re managing a small VPS or a sprawling data center, proactive disk space management is non-negotiable. This guide covers the fundamental concepts, essential tools, and proven strategies for keeping your Linux server’s storage healthy, efficient, and resilient. By the end, you’ll be equipped to diagnose issues, automate maintenance, and avoid costly downtime.

Fundamental Concepts

Before diving into tools and strategies, it’s critical to understand key storage concepts that underpin Linux disk management:

1. Filesystems and Mount Points

Linux organizes storage into filesystems (e.g., ext4, XFS, Btrfs) that are “mounted” to directories (e.g., /, /home, /var). The mount command lists active mounts, while /etc/fstab defines persistent mounts.

Example:

mount | grep /dev/sda1  # Check mount details for /dev/sda1
cat /etc/fstab          # View persistent mount configurations

2. Free vs. Available Space

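“Free” and “available” space are not the same number. On ext2/ext3/ext4 filesystems, a share of the blocks (5% by default) is reserved for the root user so the system stays usable when the disk fills up. “Free” space counts every unused block, including that reserve; the “Avail” column reported by df counts only what non-root users can actually write. This is why Used plus Avail is often less than Size.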
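To see the difference on a live system, here is a minimal sketch that reads the free and available block counts directly from the filesystem. It assumes GNU coreutils `stat`; the function names `free_vs_avail` and `report_reserved` are illustrative and not part of any standard tool.

```shell
# Print free vs. available space for a mount point (GNU coreutils `stat` assumed).
free_vs_avail() {
    mount_point=${1:-/}
    # %S = fundamental block size, %f = free blocks, %a = blocks available to non-root
    set -- $(stat -f -c '%S %f %a' "$mount_point")
    report_reserved "$1" "$2" "$3"
}

# Pure arithmetic helper: block size, free blocks, available blocks.
report_reserved() {
    bs=$1; free=$2; avail=$3
    echo "free=$((bs * free)) avail=$((bs * avail)) reserved=$((bs * (free - avail)))"
}
```

Calling `free_vs_avail /home` prints all three values in bytes. A filesystem without a root reserve should report `reserved=0`.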

3. Inodes

Inodes track metadata (permissions, ownership, timestamps) for files and directories. Each file/directory consumes one inode. A disk can run out of inodes even if there’s free space, blocking new file creation. Use df -i to check inode usage.
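When df -i shows inode usage climbing, the next question is usually which directory holds all those entries. The sketch below counts entries per parent directory. It assumes GNU find (for -printf), and the function name `top_inode_dirs` is hypothetical.

```shell
# List the directories under $1 that hold the most entries (GNU find assumed).
top_inode_dirs() {
    dir=${1:-.}
    limit=${2:-10}
    # %h prints each entry's parent directory; -xdev stays on one filesystem.
    find "$dir" -xdev -mindepth 1 -printf '%h\n' 2>/dev/null |
        sort | uniq -c | sort -rn | head -n "$limit"
}
```

For example, `top_inode_dirs /var 5` would print the five directories under /var with the most entries, each prefixed by its count.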

4. Blocks and Block Size

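Filesystems allocate space in fixed-size blocks (e.g., 4KB). A file smaller than one block still occupies a whole block, so large numbers of tiny files waste space (internal fragmentation). Block size is set when the filesystem is created (e.g., mkfs.ext4 -b 4096) and cannot be changed afterwards without recreating it.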

Essential Disk Space Management Tools

Linux offers a robust toolkit to monitor, analyze, and reclaim disk space. Below are the most critical tools, with practical examples.

1. df (Disk Free)

Purpose: Check overall disk usage and free space for mounted filesystems.

Common Options:

  • -h: Human-readable format (GB, MB).
  • -i: Show inode usage instead of block usage.
  • -T: Display filesystem type.

Examples:

# Check free space for all filesystems (human-readable)
df -h

# Check inode usage to avoid inode exhaustion
df -i

# Show filesystem types and free space
df -Th

2. du (Disk Usage)

Purpose: Analyze space usage of specific directories or files.

Common Options:

  • -s: Summarize total usage for a directory.
  • -h: Human-readable format.
  • --max-depth=N: Limit recursion to N levels (e.g., --max-depth=1 for top-level directories).

Examples:

# Total size of /var/log (summary)
du -sh /var/log

# Size of each subdirectory in /home (1 level deep)
du -h --max-depth=1 /home

# Find the five largest entries (files and directories) in /tmp, sorted by size, descending
du -ah /tmp | sort -rh | head -5

3. ncdu (NCurses Disk Usage)

Purpose: Interactive, terminal-based tool for exploring disk usage (more user-friendly than du).

Installation:

# Debian/Ubuntu
sudo apt install ncdu

# RHEL/CentOS (ncdu is packaged in EPEL, so enable that repository first)
sudo yum install ncdu

Usage: Run ncdu /path/to/directory to launch an interactive explorer. Navigate with arrow keys, delete files with d, and sort with s (size) or n (name).

4. find

Purpose: Locate large or old files for cleanup.

Common Use Cases:

  • Find files >100MB:

    find / -type f -size +100M -exec ls -lh {} \; 2>/dev/null

    (The 2>/dev/null suppresses permission errors.)

  • Find files modified >30 days ago:

    find /var/log -type f -mtime +30 -name "*.log" -print

  • Delete old temp files (use -delete cautiously!):

    find /tmp -type f -mtime +7 -delete  # Delete files >7 days old in /tmp
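Before deleting anything, it helps to know how much space a given find expression would actually reclaim. The following sketch totals the size of matching files without touching them. It assumes GNU find, and `sum_matching` is an illustrative name.

```shell
# Sum the size of regular files under $1 that match a find -size test (e.g. +100M).
# GNU find's -printf is assumed; nothing is deleted.
sum_matching() {
    dir=$1
    size=$2
    find "$dir" -xdev -type f -size "$size" -printf '%s\n' 2>/dev/null |
        awk '{ total += $1; n++ } END { printf "%d files, %d bytes\n", n, total }'
}
```

For example, `sum_matching /var +100M` reports how many files over 100MB exist under /var and their combined size in bytes.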

5. lsof (List Open Files)

Purpose: Identify files held open by processes, including “deleted” files that still consume space.

Common Use Case: Resolve “disk full” errors caused by open deleted files (e.g., logs deleted without restarting the process writing to them).

Example:

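# Find open files that have been deleted (space is freed when the process closes them or exits)
sudo lsof +L1          # +L1 lists open files with a link count below 1

# Free the space by restarting the owning service, or stop the process gracefully (replace <PID>)
kill <PID>             # Sends SIGTERM; use kill -9 only as a last resort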
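Stopping a process is not always an option. As a hedged alternative, the sketch below uses the Linux /proc filesystem to list the descriptors of one process that still point at deleted files. The function name `list_deleted_fds` is hypothetical, and you need permission to read the target process's /proc entries.

```shell
# List file descriptors of process $1 that point at deleted files (Linux /proc assumed).
list_deleted_fds() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') echo "$fd -> $target" ;;
        esac
    done
}

# To release the space without stopping the process, truncate through the descriptor:
#   : > /proc/<PID>/fd/<FD>
```

Truncating through the descriptor discards the file's contents, so only do this for data you no longer need, such as a log that was already deleted on purpose.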

6. logrotate

Purpose: Automatically rotate, compress, and delete old log files (critical for preventing /var/log bloat).

Configuration: Logrotate rules are defined in /etc/logrotate.conf and /etc/logrotate.d/. Example for Nginx logs:

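/var/log/nginx/*.log {
    # Rotate daily and keep 14 rotations (14 days of logs)
    daily
    rotate 14
    # Ignore missing files and skip empty logs
    missingok
    notifempty
    # Compress with gzip, but leave the most recent rotation uncompressed
    compress
    delaycompress
    # Create the new log file with these permissions and ownership
    create 0640 www-data www-data
    # Tell Nginx to reopen its logs (the PID file path is an assumption; adjust it)
    sharedscripts
    postrotate
        if [ -f /run/nginx.pid ]; then kill -USR1 "$(cat /run/nginx.pid)"; fi
    endscript
}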

7. quota

Purpose: Limit disk usage for users or groups (prevents one user from filling the disk).

Setup:

  1. Enable quotas in /etc/fstab (add usrquota or grpquota to the filesystem options):
    /dev/sda1 /home ext4 defaults,usrquota 0 0
  2. Remount the filesystem, initialize the quota databases, and turn quotas on:
    sudo mount -o remount /home
    sudo quotacheck -cum /home  # -c (create), -u (user quotas), -m (don’t remount read-only)
    sudo quotaon /home          # Start enforcing quotas
  3. Set a soft limit (warning) and hard limit (enforced) for a user:
    sudo edquota -u alice  # Edit quotas interactively
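edquota opens an editor, which is awkward in scripts. A non-interactive alternative is setquota, which takes block limits in 1 KiB units. The sketch below only builds and prints the command so it can be reviewed first. The helper name `quota_cmd` and the 4 GiB / 5 GiB limits are illustrative assumptions.

```shell
# Build (but do not run) a setquota command from limits given in GiB.
# setquota expects block limits in 1 KiB units; inode limits of 0 mean "no limit".
quota_cmd() {
    user=$1; soft_gib=$2; hard_gib=$3; fs=$4
    echo "setquota -u $user $((soft_gib * 1024 * 1024)) $((hard_gib * 1024 * 1024)) 0 0 $fs"
}

quota_cmd alice 4 5 /home
# Review the printed command, run it with sudo, then check usage with: sudo repquota -u /home
```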

Proactive Strategies for Disk Space Management

Reactive cleanup is necessary, but proactive management prevents crises. Here’s how to stay ahead:

1. Monitor and Alert

Use tools to track disk usage in real time and trigger alerts before space runs out:

  • Prometheus + Grafana: Collect metrics (via node_exporter) and visualize trends.
  • Nagios/Icinga: Set thresholds (e.g., alert at 85% usage) and send emails/Slack notifications.
  • Simple Scripts: Use df in a cron job to check usage and alert via mail:
    # Example: Alert if /dev/sda1 exceeds 90% usage
    df -P /dev/sda1 | awk 'NR==2 {gsub("%",""); if ($5+0 > 90) print "Disk full: " $0 | "mail -s \"Disk Alert\" admin@example.com"}'  # admin@example.com is a placeholder; use your own address
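For more than one filesystem, the check is easier to maintain and test as a small filter. The sketch below reads `df -P` output on stdin and prints every mount point at or above a threshold. The name `over_threshold`, the `mail` invocation, and the address are assumptions; mount points containing spaces are not handled.

```shell
# Read `df -P` output on stdin and print mount points at or above a usage threshold.
over_threshold() {
    limit=${1:-90}
    awk -v limit="$limit" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= limit + 0) print $6 " " pct "%"
    }'
}

# Typical use from cron (the `mail` command and address are placeholders):
#   report=$(df -P -x tmpfs -x devtmpfs | over_threshold 90)
#   [ -n "$report" ] && printf '%s\n' "$report" | mail -s "Disk alert on $(hostname)" admin@example.com
```

Keeping the parsing separate from the df call means the threshold logic can be checked with canned input before it goes into cron.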

2. Automate Cleanup

Schedule scripts to remove unnecessary files:

  • Old Logs: Use find to delete logs older than 30 days:
    # Add to crontab (run daily at 2 AM)
    0 2 * * * find /var/log -type f -name "*.log.*" -mtime +30 -delete
  • Temp Files: Clean /tmp (ensure no critical processes are using files here!):
    find /tmp -type f -mtime +7 -delete  # Delete files >7 days old
  • Unused Packages: On Debian/Ubuntu, clear the package cache and remove dependencies that are no longer needed:
    sudo apt clean && sudo apt autoremove -y
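Because -delete is irreversible, a cleanup job is safer when it defaults to a dry run. The sketch below lists what it would remove unless DRY_RUN=0 is set. The function name `clean_old_files` and the DRY_RUN variable are illustrative conventions, not part of find.

```shell
# Delete regular files older than $2 days under $1; set DRY_RUN=0 to actually delete.
clean_old_files() {
    dir=$1
    days=$2
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: only print the candidates.
        find "$dir" -xdev -type f -mtime +"$days" -print
    else
        # Real run: print each file as it is removed.
        find "$dir" -xdev -type f -mtime +"$days" -print -delete
    fi
}
```

Run `clean_old_files /tmp 7` to review the candidates, then `DRY_RUN=0 clean_old_files /tmp 7` once the list looks right.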

3. Use LVM for Flexibility

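Logical Volume Management (LVM) lets you resize volumes without repartitioning. Create “volume groups” (VGs) from physical disks, then carve out “logical volumes” (LVs) for filesystems. Growing an LV and its filesystem can usually be done online; shrinking is more restrictive (ext4 must be unmounted first, and XFS cannot be shrunk at all).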

Example Workflow:

  • Extend a logical volume:
    sudo lvextend -L +10G /dev/vg01/lv_root  # Add 10GB to LV
    sudo resize2fs /dev/vg01/lv_root        # Grow the ext4 filesystem (for XFS, run xfs_growfs on the mount point)
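An extension fails if the volume group has no free extents left, so a script may want to check first. The sketch below decides from the output of vgs whether the requested size fits. The helper name `has_room` is hypothetical, and vg01 and lv_root are the example names used above.

```shell
# Decide whether a volume group has room for an extension.
# Reads the output of `vgs --noheadings --units g --nosuffix -o vg_free <vg>` on stdin.
has_room() {
    need_gib=$1
    awk -v need="$need_gib" '{ free = $1 + 0 } END { exit !(free >= need + 0) }'
}

# Intended use:
#   if sudo vgs --noheadings --units g --nosuffix -o vg_free vg01 | has_room 10; then
#       sudo lvextend -r -L +10G /dev/vg01/lv_root   # -r also grows the filesystem
#   fi
```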

4. Thin Provisioning

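Thin provisioning presents volumes that are larger than the physical space currently backing them; physical space is consumed only as data is written (common in virtualized environments like VMware or KVM). LVM thin pools and ZFS both support this, so you avoid dedicating physical capacity up front. The trade-off is that the pool itself can run out of space, so monitor pool usage as closely as filesystem usage.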
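The idea can be demonstrated on a small scale with a sparse file, which behaves like a thin volume: it has a large apparent size, but blocks are allocated only when data is written. This is an analogy under the assumption of GNU coreutils and a filesystem that supports sparse files; `sparse_demo` is an illustrative name.

```shell
# Create a sparse file: large apparent size, little or no space actually allocated.
sparse_demo() {
    img=$(mktemp)
    truncate -s 100M "$img"
    apparent=$(du --apparent-size -B1 "$img" | cut -f1)
    allocated=$(du -B1 "$img" | cut -f1)
    rm -f "$img"
    echo "apparent=$apparent allocated=$allocated"
}

sparse_demo

# For a real LVM thin pool, usage can be watched with (vg01 is an example name):
#   sudo lvs -o lv_name,data_percent,metadata_percent vg01
```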

5. Archive to Remote Storage

Move infrequently accessed data (e.g., old backups, logs) to cheaper remote storage:

  • NFS/SMB: Mount network shares for centralized storage.
  • S3/GCS: Use tools like s3cmd or gsutil to archive data to cloud object storage.
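Archiving usually starts by bundling the old files into a single compressed tarball, which is then copied to the remote target. The sketch below covers the bundling step only. It assumes GNU find and GNU tar, expects an absolute path for the output file, and uses a hypothetical function name and bucket.

```shell
# Bundle regular files older than $2 days under $1 into a gzip-compressed tarball $3.
# GNU find and GNU tar are assumed; $3 must be an absolute path; originals stay in place.
archive_old() {
    src=$1
    days=$2
    out=$3
    ( cd "$src" && find . -type f -mtime +"$days" -print0 |
        tar --null --files-from=- -czf "$out" )
}

# Upload step, shown only as a comment (bucket name is a placeholder):
#   s3cmd put /tmp/old-logs.tar.gz s3://example-archive-bucket/
```

Delete the originals only after confirming the archive lists correctly (tar -tzf) and the upload has succeeded.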

Common Pitfalls and Solutions

Even experienced admins hit snags. Here are critical pitfalls to avoid:

1. Inode Exhaustion

Issue: Running out of inodes (common with many small files, e.g., in /tmp or user directories).
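Solution: Check with df -i. Delete unnecessary small files, or back up the data and recreate the filesystem with a lower bytes-per-inode ratio (e.g., mkfs.ext4 -i 8192 allocates one inode per 8KB of space, roughly twice as many as the usual 16KB default). The inode count of an ext4 filesystem cannot be raised after creation.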

2. Hidden Files in Mounted Directories

Issue: Mounting a filesystem over a non-empty directory hides the original files, which still consume space.
Example: If /mnt/data has 10GB of files and you mount a new drive to /mnt/data, the original 10GB is hidden but not deleted.
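Solution: Unmount the drive, delete the hidden files, then remount. If the drive cannot be unmounted, a bind mount of the parent filesystem (mount --bind) at a temporary location exposes the hidden files without downtime.

sudo umount /mnt/data
mountpoint /mnt/data                     # Must report “is not a mountpoint” before you delete anything
sudo find /mnt/data -mindepth 1 -delete  # Clean up original files, including dotfiles that /mnt/data/* would miss
sudo mount /mnt/data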

3. Open Deleted Files

Issue: Files deleted with rm but still held open by a process continue to consume space.
Solution: Use lsof | grep deleted to find the process, then restart it to free the space.

4. Log Rotation Failures

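Issue: Misconfigured logrotate causes logs to grow indefinitely. Typical causes are a syntax error that makes logrotate skip the file, or a service that keeps writing to the rotated file because it was never told to reopen its log (no postrotate signal and no copytruncate).
Solution: Test logrotate rules with logrotate -d /etc/logrotate.conf (dry run, changes nothing), confirm that rotate and compress are set, and make sure the service reopens its logs after rotation.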

Best Practices

To maintain a healthy storage environment, follow these guidelines:

1. Separate Partitions

Place /, /home, /var, and /tmp on separate partitions or logical volumes. A runaway consumer (e.g., logs filling /var) then fills only its own filesystem instead of taking down the entire system.

2. Audit Regularly

Conduct monthly audits with ncdu or du to identify growing directories (e.g., /var/lib/docker, which holds container images, volumes, and logs).
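An audit is more useful when it shows change over time rather than a single reading. One possible approach, sketched below, is to save du snapshots and compare two of them. The function name `du_growth` and the snapshot location are assumptions; paths missing from the older snapshot are treated as starting from zero.

```shell
# Compare two `du -k` snapshots (size<TAB>path per line) and print growth in KiB per path.
du_growth() {
    old=$1
    new=$2
    awk -F'\t' 'NR == FNR { before[$2] = $1; next }
                { printf "%d\t%s\n", $1 - before[$2], $2 }' "$old" "$new" | sort -rn
}

# Snapshots could be taken monthly with, e.g.:
#   du -xk --max-depth=1 /var > /var/tmp/du-$(date +%F).txt
```

The directories at the top of the output grew the most between the two snapshots and are the first candidates for a closer look with ncdu.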

3. Document Storage Layout

Map physical disks, LVM volumes, and mount points (e.g., in a wiki) to avoid confusion during emergencies.

4. Test Recovery Procedures

Practice resizing LVM volumes, restoring from archives, and resolving inode issues in a staging environment.

5. Avoid Overcommitting

Reserve 10-15% free space for unexpected growth (e.g., log spikes during traffic surges).

Conclusion

Effective disk space management is a cornerstone of Linux server reliability. By mastering tools like df, du, and ncdu, implementing proactive monitoring, and following best practices like LVM usage and automation, you can prevent downtime and ensure your server’s storage scales with demand. Remember: An ounce of prevention (via alerts and cleanup scripts) is worth a pound of cure (emergency disk resizing).
