
Monitoring Linux Systems with Nagios: A Practical Guide

Keeping Linux systems reliable and performant is critical for businesses and IT teams: unplanned downtime, resource bottlenecks, and service failures can mean lost revenue, reputational damage, and operational disruption. Nagios, an open-source monitoring system, tracks the health, performance, and availability of Linux servers, networks, and applications. This guide covers the fundamentals of Nagios, its core components, installation on Linux, configuration for monitoring Linux systems, advanced use cases, best practices, and troubleshooting tips. By the end, you’ll be able to set up a Nagios monitoring environment to keep your Linux infrastructure in check.

Table of Contents

  1. What is Nagios?
  2. Key Components of Nagios
  3. Installing Nagios on Linux
  4. Configuring Nagios for Linux Monitoring
  5. Advanced Monitoring Scenarios
  6. Best Practices for Nagios Monitoring
  7. Troubleshooting Common Issues
  8. Conclusion
  9. References

What is Nagios?

Nagios is an open-source monitoring system designed to monitor infrastructure components (servers, networks, storage) and applications. It alerts administrators to failures or performance degradation, enabling proactive issue resolution. Two primary variants exist:

  • Nagios Core: The free, open-source foundation (focus of this guide).
  • Nagios XI: A commercial, enterprise-grade version with additional features like configuration wizards, an enhanced web GUI, reporting, and integrations.

For Linux system monitoring, Nagios Core is often the starting point due to its flexibility and cost-effectiveness. It supports custom plugins, remote monitoring, and extensible alerting (email, SMS, Slack, etc.).

Key Components of Nagios

To effectively use Nagios, it’s essential to understand its core components:

Component | Purpose
--- | ---
Nagios Core | The engine that schedules checks, processes results, and triggers alerts.
Plugins | Executable scripts/tools (e.g., check_ping, check_disk) that perform monitoring checks.
NRPE (Nagios Remote Plugin Executor) | Allows Nagios to run plugins on remote Linux hosts (e.g., checking CPU usage on a remote server).
NDOUtils | Stores monitoring data in a MySQL/MariaDB database for reporting and historical analysis.
Web Interface | A browser-based dashboard to view host/service status, alerts, and reports (served by Apache via CGI, with PHP for some pages).
Object Definitions | Configuration files defining hosts, services, commands, contacts, and contact groups.

Installing Nagios on Linux

This section walks through installing Nagios Core on Ubuntu 22.04 LTS (steps are similar for other Debian/Ubuntu-based distros). For RHEL/CentOS, replace apt with yum/dnf and adjust dependencies.

Prerequisites

  • A Linux server (physical or virtual) with Ubuntu 22.04.
  • Sudo privileges.
  • Internet access to download packages.

Step 1: Install Dependencies

Nagios requires Apache (web server), PHP (for the web interface), and build tools. Run:

sudo apt update && sudo apt upgrade -y
sudo apt install -y apache2 php libapache2-mod-php build-essential libgd-dev libssl-dev wget unzip

Step 2: Create Nagios User and Group

Nagios runs under a dedicated user/group for security:

sudo useradd nagios
sudo groupadd nagcmd
sudo usermod -aG nagcmd nagios
sudo usermod -aG nagcmd www-data  # Allow Apache to access Nagios files

Step 3: Download and Compile Nagios Core

Nagios Core is installed from source for flexibility:

# Download the latest Nagios Core (check https://www.nagios.org/downloads/nagios-core/ for updates)
wget https://github.com/NagiosEnterprises/nagioscore/archive/refs/tags/nagios-4.5.2.tar.gz -O nagios-core.tar.gz

# Extract and compile
tar xzf nagios-core.tar.gz
cd nagioscore-nagios-4.5.2/

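# Configure and build
# --with-httpd-conf tells "make install-webconf" where Apache keeps its site configs on Ubuntu
./configure --with-command-group=nagcmd --with-httpd-conf=/etc/apache2/sites-enabled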
make all
sudo make install
sudo make install-init  # Install systemd service
sudo make install-config  # Install sample configs
sudo make install-commandmode  # Set permissions for command files

Step 4: Install Nagios Web Interface

Configure Apache to serve the Nagios web interface:

sudo make install-webconf  # Install Apache config
sudo a2enmod cgi  # Enable CGI module (required for Nagios web UI)

# Create a web admin user (e.g., "nagiosadmin")
sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
# Enter a password when prompted (remember this for logging into the web UI)

Step 5: Start Nagios and Apache

sudo systemctl start nagios
sudo systemctl enable nagios  # Start on boot
sudo systemctl restart apache2

Verify Installation

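Access the Nagios web interface at http://<your-server-ip>/nagios. Log in with nagiosadmin and the password you set. You’ll see a dashboard with default checks (e.g., localhost status). Note that Nagios Core ships without check plugins: until the Nagios Plugins are installed into /usr/local/nagios/libexec/, the default localhost checks will report errors.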

Configuring Nagios for Linux Monitoring

Nagios uses text-based configuration files to define what to monitor. The primary config directory is /usr/local/nagios/etc/, with key files:

  • nagios.cfg: Main configuration (points to object definitions).
  • objects/: Contains host, service, command, and contact definitions (e.g., localhost.cfg, templates.cfg, commands.cfg, contacts.cfg in the sample configuration).

Key Configuration Concepts

  • Host: A device to monitor (e.g., a Linux server, router).
  • Service: A specific metric/process on a host (e.g., CPU usage, HTTP service).
  • Command: A plugin or script Nagios runs to check a service (e.g., check_ping).
  • Contact: A person/team to alert (e.g., email, Slack).
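
These concepts meet in a simple contract: Nagios runs the command's plugin, reads its output, and interprets the exit code as a state. A minimal sketch of that mapping in shell follows; the function name nagios_state_name is hypothetical and exists only for illustration.

```shell
# Map a plugin exit code to the state name used in the plugin convention.
# nagios_state_name is a hypothetical helper, not part of Nagios.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "INVALID"; return 1 ;;   # codes outside 0-3 are not part of the convention
    esac
}
```

For example, nagios_state_name 2 prints CRITICAL, which is the state a service enters when its plugin exits with code 2.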

Example 1: Monitor a Remote Linux Host with NRPE

To monitor a remote Linux server (e.g., server01 with IP 192.168.1.100), we use NRPE. Here’s how:

Step 1: Install NRPE on the Remote Host (Client)

On server01, install NRPE and Nagios plugins:

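sudo apt install -y nagios-nrpe-server monitoring-plugins  # Ubuntu/Debian (monitoring-plugins replaces the older nagios-plugins package name)
# For RHEL/CentOS (requires the EPEL repository): sudo yum install -y nrpe nagios-plugins-all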

Step 2: Configure NRPE on the Client

Edit /etc/nagios/nrpe.cfg (Ubuntu/Debian; the EPEL package on RHEL/CentOS typically uses the same path, with plugins under /usr/lib64/nagios/plugins/):

# Allow the Nagios server IP to connect (192.168.1.200 = Nagios server IP; replace with your own)
# Keep comments on their own lines: nrpe.cfg only treats lines that start with # as comments
allowed_hosts=127.0.0.1,192.168.1.200

# Define commands NRPE can run (add these at the bottom)
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
# -w: Warning threshold (1min, 5min, 15min load averages)
# -c: Critical threshold

command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
# Check root filesystem (/), warn at 20% free, critical at 10% free

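command[check_mem]=/usr/lib/nagios/plugins/check_memory.pl -w 80 -c 90
# check_memory.pl is a third-party plugin that the standard plugin package does not include;
# install it separately before enabling this command (Example 2, "Custom Plugins", shows how plugins are added)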
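
Before restarting NRPE, it can help to confirm that the Nagios server's IP really is listed in allowed_hosts. The sketch below does an exact string match only (no CIDR ranges or hostnames), and the function name nrpe_allows is hypothetical.

```shell
# Return 0 if the given IP appears verbatim in the allowed_hosts list of an nrpe.cfg.
# nrpe_allows is a hypothetical helper; it does not understand CIDR ranges or DNS names.
nrpe_allows() {
    cfg=$1
    ip=$2
    # Take the value of the last allowed_hosts line in the file
    hosts=$(sed -n 's/^[[:space:]]*allowed_hosts[[:space:]]*=[[:space:]]*//p' "$cfg" | tail -n 1)
    # One entry per line, whitespace stripped, then an exact fixed-string match
    printf '%s\n' "$hosts" | tr ',' '\n' | sed 's/[[:space:]]//g' | grep -Fxq -- "$ip"
}
```

Usage on the client: nrpe_allows /etc/nagios/nrpe.cfg 192.168.1.200 && echo "Nagios server is allowed".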

Step 3: Restart NRPE on the Client

sudo systemctl restart nagios-nrpe-server  # Ubuntu/Debian
# For RHEL/CentOS: sudo systemctl restart nrpe

Step 4: Define the Host and Services on the Nagios Server

Create a new config file for server01 in /usr/local/nagios/etc/objects/:

sudo nano /usr/local/nagios/etc/objects/server01.cfg

Add the following:

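# In object files, # starts a comment only at the beginning of a line; use ; for inline comments.

# check_nrpe is not defined in the sample configs, so define it once (skip this if you already have one).
# This assumes the check_nrpe plugin is installed in /usr/local/nagios/libexec/ ($USER1$) on the Nagios server.
define command {
    command_name            check_nrpe
    command_line            $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

# Define the remote host
define host {
    use                     linux-server  ; Use the "linux-server" template (from templates.cfg)
    host_name               server01
    alias                   Production Web Server
    address                 192.168.1.100  ; Remote host IP
    max_check_attempts      3
    check_period            24x7
    notification_interval   30
    notification_period     24x7
}

# Define services to monitor on server01
define service {
    use                     generic-service
    host_name               server01
    service_description     CPU Load
    check_command           check_nrpe!check_load  ; Run "check_load" via NRPE
    check_interval          5
    retry_interval          1
}

define service {
    use                     generic-service
    host_name               server01
    service_description     Root Disk Space
    check_command           check_nrpe!check_disk
    check_interval          10
}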
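
When several servers need the same definitions, the file can be generated instead of hand-edited. The following is a minimal sketch, assuming the linux-server and generic-service templates from the sample templates.cfg and a working check_nrpe command; render_host_cfg, server02, and 192.168.1.101 are hypothetical names used for illustration.

```shell
# Print a host definition plus one NRPE-backed load service for a given name and IP.
# render_host_cfg is a hypothetical helper, not a Nagios tool.
render_host_cfg() {
    name=$1
    address=$2
    cat <<EOF
define host {
    use         linux-server
    host_name   $name
    address     $address
}

define service {
    use                   generic-service
    host_name             $name
    service_description   CPU Load
    check_command         check_nrpe!check_load
}
EOF
}
```

Usage: render_host_cfg server02 192.168.1.101 | sudo tee /usr/local/nagios/etc/objects/server02.cfg. The generated file still has to be referenced from nagios.cfg and validated before Nagios is restarted.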

Step 5: Update Nagios Main Config

Tell Nagios to load the new server01.cfg by adding this line to /usr/local/nagios/etc/nagios.cfg:

cfg_file=/usr/local/nagios/etc/objects/server01.cfg
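
If this step is scripted, the line should be appended only once. A small sketch of an idempotent append follows; add_cfg_file is a hypothetical helper name. Nagios also has a cfg_dir directive that loads every .cfg file in a directory, which avoids editing nagios.cfg for each new host.

```shell
# Append a cfg_file line to a Nagios main config only if it is not already there.
# add_cfg_file is a hypothetical helper name.
add_cfg_file() {
    main_cfg=$1
    object_cfg=$2
    line="cfg_file=$object_cfg"
    if grep -qxF -- "$line" "$main_cfg"; then
        echo "already present: $line"
    else
        printf '%s\n' "$line" >> "$main_cfg"
        echo "added: $line"
    fi
}
```

Usage (as a user with write access to the file): add_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/server01.cfg.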

Step 6: Verify and Restart Nagios

sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg  # Check for config errors
sudo systemctl restart nagios
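
Run by hand, the two commands above will restart Nagios even when the check reports errors. A minimal sketch that restarts only after a successful check is shown below; reload_if_valid and the NAGIOS_BIN and RESTART_CMD variables are hypothetical names, with defaults matching the paths used in this guide.

```shell
# Validate the main config and restart Nagios only if validation succeeds.
# reload_if_valid, NAGIOS_BIN and RESTART_CMD are hypothetical names for this sketch.
reload_if_valid() {
    cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    if "${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}" -v "$cfg" >/dev/null 2>&1; then
        # Left unquoted on purpose so the command string splits into words
        ${RESTART_CMD:-systemctl restart nagios}
    else
        echo "config check failed; not restarting" >&2
        return 1
    fi
}
```

Run it as root (or set RESTART_CMD="sudo systemctl restart nagios"). When it refuses to restart, run the nagios -v command directly to see the actual error messages.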

Now, in the Nagios web UI, server01 will appear under “Hosts” and its services (CPU Load, Root Disk Space) under “Services”.

Example 2: Custom Plugins

Nagios supports custom scripts for unique monitoring needs. For example, a script to check SSH connections:

  1. Create a plugin script on the Nagios server (or remote client, if using NRPE):
sudo nano /usr/local/nagios/libexec/check_ssh_connections.sh

Add:

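#!/bin/bash
# Count established inbound SSH connections (TCP sessions whose local port is 22).
# Note: "ss -tuln | grep ssh" would always count 0, because -n prints port numbers
# instead of service names and -l lists only listening sockets.
CONNECTIONS=$(ss -Htn state established '( sport = :22 )' | wc -l)
WARNING=10
CRITICAL=20

# The text after "|" is performance data that Nagios can store or graph
if [ "$CONNECTIONS" -ge "$CRITICAL" ]; then
    echo "CRITICAL: $CONNECTIONS SSH connections | connections=$CONNECTIONS"
    exit 2
elif [ "$CONNECTIONS" -ge "$WARNING" ]; then
    echo "WARNING: $CONNECTIONS SSH connections | connections=$CONNECTIONS"
    exit 1
else
    echo "OK: $CONNECTIONS SSH connections | connections=$CONNECTIONS"
    exit 0
fi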
  2. Make it executable:
sudo chmod +x /usr/local/nagios/libexec/check_ssh_connections.sh
  3. Define a Nagios command in commands.cfg (needed only if the script runs on the Nagios server):
define command {
    command_name    check_ssh_connections
    command_line    /usr/local/nagios/libexec/check_ssh_connections.sh
}
  4. Add a service to monitor SSH connections on server01 (in server01.cfg):
define service {
    use                     generic-service
    host_name               server01
    service_description     SSH Connections
    check_command           check_nrpe!check_ssh_connections  ; If the script is on the client; this also needs a command[check_ssh_connections] entry in the client's nrpe.cfg pointing at the script
    # OR, if the script is on the Nagios server (it then counts the Nagios server's own connections): check_command check_ssh_connections
}
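
Most custom plugins follow the same pattern as the script above: measure a value, compare it with two thresholds, print one status line with performance data, and exit with the matching code. A reusable sketch of that comparison is shown here; evaluate_threshold is a hypothetical helper name, and the perfdata layout (label=value;warn;crit) follows the common plugin convention.

```shell
# Compare an integer value against warning/critical thresholds and report in
# plugin style: one status line with perfdata, plus the matching return code.
# evaluate_threshold is a hypothetical helper name.
evaluate_threshold() {
    label=$1
    value=$2
    warn=$3
    crit=$4
    # Anything that is not a non-negative integer cannot be compared: report UNKNOWN
    case "$value" in
        ''|*[!0-9]*) echo "UNKNOWN: $label value '$value' is not an integer"; return 3 ;;
    esac
    perfdata="$label=$value;$warn;$crit"
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL: $label=$value | $perfdata"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING: $label=$value | $perfdata"; return 1
    fi
    echo "OK: $label=$value | $perfdata"; return 0
}
```

In a plugin script, the last two lines would then be something like: evaluate_threshold connections "$CONNECTIONS" 10 20; exit $?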

Advanced Monitoring Scenarios

Log Monitoring

Nagios can monitor log files (e.g., /var/log/auth.log) for errors using the check_logfiles plugin:

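# Install check_logfiles on the host whose logs are checked (for a remote host, run it there via NRPE).
# The package name below may differ or be unavailable on your distribution;
# check_logfiles is also distributed as source by ConSol Labs.
sudo apt install -y nagios-plugin-check-logfiles

# The nagios user needs read access to /var/log/auth.log (on Ubuntu, membership in the adm group).

# Define a command in commands.cfg
define command {
    command_name    check_auth_log
    command_line    /usr/lib/nagios/plugins/check_logfiles --logfile /var/log/auth.log --criticalpattern "Failed password" --warningpattern "Accepted password"
}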
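
To see what a pattern would match before wiring it into Nagios, the log can be scanned by hand. The sketch below is a simplified stand-in, not how check_logfiles works internally: it rescans the whole file on every run, whereas check_logfiles remembers its position between runs. count_log_matches is a hypothetical helper name.

```shell
# Count lines in a log file that match an extended regular expression.
# count_log_matches is a hypothetical helper, not part of check_logfiles.
count_log_matches() {
    logfile=$1
    pattern=$2
    if [ ! -r "$logfile" ]; then
        echo "UNKNOWN: cannot read $logfile"
        return 3
    fi
    # grep -c prints the count; it exits 1 when the count is 0, which is not an error here
    grep -Ec -- "$pattern" "$logfile" || true
}
```

Usage: count_log_matches /var/log/auth.log 'Failed password' prints the number of matching lines currently in the file.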

Integrating with Grafana for Visualization

For advanced dashboards, store Nagios data in MySQL with NDOUtils and query that database from Grafana through its MySQL data source.

Best Practices for Nagios Monitoring

  1. Use Templates: Reduce redundancy by defining host/service templates (e.g., linux-server template with common settings like check_period).
  2. Set Realistic Thresholds: Avoid alert fatigue by tuning warning/critical thresholds (e.g., adjust check_load based on server CPU cores).
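  3. Secure NRPE: Restrict allowed_hosts to Nagios server IPs and keep SSL/TLS enabled (NRPE encrypts traffic by default unless it is started with the -n option; the ssl_* directives in nrpe.cfg control protocol versions, ciphers, and certificates).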
  4. Backup Configs: Regularly back up /usr/local/nagios/etc/ to avoid losing custom definitions.
  5. Monitor Critical Services First: Prioritize monitoring core services (e.g., SSH, HTTP, database) before non-essential ones.
  6. Document Configs: Maintain a wiki/docs for host/service definitions, plugins, and alert recipients.
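
As an illustration of the threshold advice above, check_load limits can be derived from the core count rather than hard-coded. The multipliers in this sketch (warning at 2.0x/1.5x/1.0x cores and critical at 3.0x/2.5x/2.0x for the 1, 5, and 15 minute averages) are assumptions chosen for the example, not recommended values, and load_thresholds is a hypothetical helper name.

```shell
# Print check_load arguments scaled to the number of CPU cores.
# load_thresholds is a hypothetical helper; the multipliers are illustrative assumptions.
load_thresholds() {
    cores=${1:-$(nproc)}          # default to the local core count
    # Integer arithmetic in tenths: 20 = 2.0x, 15 = 1.5x, and so on
    w1=$(( cores * 20 / 10 )); w5=$(( cores * 15 / 10 )); w15=$(( cores * 10 / 10 ))
    c1=$(( cores * 30 / 10 )); c5=$(( cores * 25 / 10 )); c15=$(( cores * 20 / 10 ))
    echo "-w $w1,$w5,$w15 -c $c1,$c5,$c15"
}
```

For a 4-core server, load_thresholds 4 prints -w 8,6,4 -c 12,10,8, which can be pasted into the command[check_load] line in nrpe.cfg. check_load also has a -r option that divides the load averages by the number of CPUs, which is an alternative to scaling the thresholds.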

Troubleshooting Common Issues

Nagios Fails to Start

Check for config errors:

sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Fix errors (e.g., missing } in config files) and restart:

sudo systemctl restart nagios

NRPE “Connection Refused”

  • Ensure NRPE is running on the client: sudo systemctl status nagios-nrpe-server.
  • Verify firewall rules allow port 5666 (NRPE port):
    sudo ufw allow 5666/tcp  # On client
  • Confirm allowed_hosts in nrpe.cfg includes the Nagios server IP.
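
To separate a network or firewall problem from an NRPE configuration problem, first test from the Nagios server whether the port accepts TCP connections at all. This is a minimal sketch that relies on bash's /dev/tcp pseudo-device and the coreutils timeout command; port_open is a hypothetical helper name.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# port_open is a hypothetical helper; it needs bash (for /dev/tcp) and timeout.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

Usage: port_open 192.168.1.100 5666 && echo "port reachable" || echo "port closed or filtered". A reachable port with a failing check_nrpe usually points at allowed_hosts or an SSL mismatch rather than the firewall.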

Plugin Returns “Unknown”

Test the plugin manually to debug:

# On Nagios server, test NRPE command
/usr/local/nagios/libexec/check_nrpe -H 192.168.1.100 -c check_load

# On client, test the plugin directly
/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
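
When testing by hand, the exit code matters as much as the printed text, because Nagios derives the state from it. The sketch below captures both; run_plugin is a hypothetical helper name.

```shell
# Run a plugin command, then print its exit code together with its output.
# run_plugin is a hypothetical helper name.
run_plugin() {
    output=$("$@" 2>&1)
    code=$?
    printf 'exit=%s output=%s\n' "$code" "$output"
    return "$code"
}
```

Usage on the client: run_plugin /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20. An exit code of 3 means the plugin itself reported UNKNOWN (often bad arguments), while 126 or 127 from the shell means the plugin is not executable or was not found at that path.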

Conclusion

Nagios is a powerful, flexible tool for monitoring Linux systems, offering deep visibility into server health and performance. By following this guide, you’ve learned to install Nagios, configure remote monitoring with NRPE, define custom services, and troubleshoot common issues.

To expand further, explore Nagios XI for enterprise features, integrate with alerting tools like PagerDuty, or automate config management with Ansible. With proper setup and best practices, Nagios will become a cornerstone of your Linux infrastructure monitoring strategy.

References