In today’s digital landscape, ensuring the reliability and performance of Linux systems is critical for businesses and IT teams. Unplanned downtime, resource bottlenecks, or service failures can lead to lost revenue, damaged reputation, and operational disruptions. This is where monitoring tools like Nagios come into play. Nagios, an open-source monitoring system, provides robust capabilities to track the health, performance, and availability of Linux servers, networks, and applications. This guide will walk you through the fundamentals of Nagios, its core components, installation on Linux, configuration for monitoring Linux systems, advanced use cases, best practices, and troubleshooting tips. By the end, you’ll be able to set up a Nagios monitoring environment to keep your Linux infrastructure in check.
Table of Contents
- What is Nagios?
- Key Components of Nagios
- Installing Nagios on Linux
- Configuring Nagios for Linux Monitoring
- Advanced Monitoring Scenarios
- Best Practices for Nagios Monitoring
- Troubleshooting Common Issues
- Conclusion
- References
What is Nagios?
Nagios is an open-source monitoring system designed to monitor infrastructure components (servers, networks, storage) and applications. It alerts administrators to failures or performance degradation, enabling proactive issue resolution. Two primary variants exist:
- Nagios Core: The free, open-source foundation (focus of this guide).
- Nagios XI: A commercial, enterprise-grade product built on Nagios Core that adds a more polished web interface, configuration wizards, advanced reporting, and integrations.
For Linux system monitoring, Nagios Core is often the starting point due to its flexibility and cost-effectiveness. It supports custom plugins, remote monitoring, and extensible alerting (email, SMS, Slack, etc.).
Key Components of Nagios
To effectively use Nagios, it’s essential to understand its core components:
| Component | Purpose |
|---|---|
| Nagios Core | The engine that schedules checks, processes results, and triggers alerts. |
| Plugins | Executable scripts/tools (e.g., check_ping, check_disk) that perform monitoring checks. |
| NRPE (Nagios Remote Plugin Executor) | Allows Nagios to run plugins on remote Linux hosts (e.g., checking CPU usage on a remote server). |
| NDOUtils | Stores monitoring data in a database (MySQL/PostgreSQL) for reporting and historical analysis. |
| Web Interface | A browser-based dashboard to view host/service status, alerts, and reports (CGI programs plus some PHP pages, served by Apache). |
| Object Definitions | Configuration files defining hosts, services, commands, contacts, and contact groups. |
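Whatever their origin, plugins share one contract. They print a status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Anything after a `|` on that line is performance data, not part of the message shown in the web UI. The sketch below splits a single status line along that boundary. The function names are hypothetical, and the sketch ignores multi-line plugin output.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names) for the plugin output format
# "STATUS TEXT | PERFDATA". Only a single status line is handled.

# Message part: everything before the first '|', trailing spaces removed.
status_text() {
  printf '%s\n' "$1" | cut -d'|' -f1 | sed 's/[[:space:]]*$//'
}

# Performance-data part: everything after the first '|', or nothing if absent.
perf_data() {
  printf '%s\n' "$1" | cut -s -d'|' -f2- | sed 's/^[[:space:]]*//'
}

line="OK: 3 SSH connections | connections=3"
echo "message:  $(status_text "$line")"
echo "perfdata: $(perf_data "$line")"
```

The same split applies to any plugin in this guide, including the custom SSH-connection script later on.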
Installing Nagios on Linux
This section walks through installing Nagios Core on Ubuntu 22.04 LTS (steps are similar for other Debian/Ubuntu-based distros). For RHEL/CentOS, replace apt with yum/dnf, the www-data user with apache, and the apache2 service with httpd, and adjust package names.
Prerequisites
- A Linux server (physical or virtual) with Ubuntu 22.04.
- Sudo privileges.
- Internet access to download packages.
Step 1: Install Dependencies
Nagios requires Apache (web server), PHP (for the web interface), and build tools. Run:
sudo apt update && sudo apt upgrade -y
sudo apt install -y apache2 php libapache2-mod-php build-essential libgd-dev libssl-dev unzip wget
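When a later step fails with "command not found", a missing tool is the usual cause, so it can help to check for the tools up front. This is a minimal sketch. `missing_commands` is a hypothetical helper, and the tool list in the usage note (gcc, make, wget, unzip, htpasswd) is an assumption based on the commands this guide runs.

```shell
#!/bin/sh
# Print each command name given as an argument that is not found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

After Step 1, `missing_commands gcc make wget unzip htpasswd` should print nothing. Any name it prints still needs to be installed.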
Step 2: Create Nagios User and Group
Nagios runs under a dedicated user/group for security:
sudo useradd nagios
sudo groupadd nagcmd
sudo usermod -aG nagcmd nagios
sudo usermod -aG nagcmd www-data # Let Apache (the web UI) submit external commands to Nagios
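Group changes are easy to get wrong. For example, leaving out -a in usermod -aG replaces a user's supplementary groups instead of adding to them. It is therefore worth confirming the memberships. This small sketch uses a hypothetical `user_in_group` helper built on `id -nG`. Given a user name, `id -nG` reads the group database, so it reflects the changes without a new login.

```shell
#!/bin/sh
# Succeed if the named user belongs to the named group (primary or supplementary).
user_in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

# Check the two memberships created in this step.
for u in nagios www-data; do
  if user_in_group "$u" nagcmd; then
    echo "$u is in nagcmd"
  else
    echo "$u is NOT in nagcmd (or does not exist)"
  fi
done
```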
Step 3: Download and Compile Nagios Core
Nagios Core is installed from source for flexibility:
# Download Nagios Core 4.5.2 (check https://www.nagios.org/downloads/nagios-core/ for newer releases and adjust the version)
wget https://github.com/NagiosEnterprises/nagioscore/archive/refs/tags/nagios-4.5.2.tar.gz -O nagios-core.tar.gz
# Extract and compile
tar xzf nagios-core.tar.gz
cd nagioscore-nagios-4.5.2/
# Configure and build
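./configure --with-command-group=nagcmd --with-httpd-conf=/etc/apache2/sites-enabled # Ubuntu location for the Apache config installed later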
make all
sudo make install
sudo make install-init # Install systemd service
sudo make install-config # Install sample configs
sudo make install-commandmode # Set permissions for command files
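The version string appears in three places: the download URL, the archive's top-level directory, and the cd command. That makes it easy to update one and miss another. The sketch below derives both the URL and the directory from a single variable. The helper names are hypothetical, and the sketch assumes future releases keep the nagios-X.Y.Z tag naming used above. It only prints the commands, so they can be reviewed before running.

```shell
#!/bin/sh
# Assumption: release tags keep the "nagios-X.Y.Z" naming used in this guide.
NAGIOS_VERSION="${NAGIOS_VERSION:-4.5.2}"

# Download URL for a given version's source archive on GitHub.
nagios_url() {
  echo "https://github.com/NagiosEnterprises/nagioscore/archive/refs/tags/nagios-$1.tar.gz"
}

# Top-level directory the archive unpacks to (repository name + "-" + tag).
nagios_srcdir() {
  echo "nagioscore-nagios-$1"
}

# Print (not run) the download/extract commands for review.
echo "wget $(nagios_url "$NAGIOS_VERSION") -O nagios-core.tar.gz"
echo "tar xzf nagios-core.tar.gz && cd $(nagios_srcdir "$NAGIOS_VERSION")"
```

To target a newer release, run the script as, for example, `NAGIOS_VERSION=X.Y.Z sh script.sh` and review the printed commands.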
Step 4: Install Nagios Web Interface
Configure Apache to serve the Nagios web interface:
sudo make install-webconf # Install Apache config
sudo a2enmod cgi # Enable CGI module (required for Nagios web UI)
# Create a web admin user (e.g., "nagiosadmin")
sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
# Enter a password when prompted (remember this for logging into the web UI)
Step 5: Start Nagios and Apache
sudo systemctl start nagios
sudo systemctl enable nagios # Start on boot
sudo systemctl restart apache2
Verify Installation
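Access the Nagios web interface at http://<your-server-ip>/nagios and log in with nagiosadmin and the password you set. You’ll see a dashboard with default checks (e.g., localhost status). Nagios Core ships without plugins. Those default checks, and the check_nrpe calls used later, only succeed once the Nagios Plugins and the check_nrpe plugin are installed in /usr/local/nagios/libexec (the $USER1$ directory).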
Configuring Nagios for Linux Monitoring
Nagios uses text-based configuration files to define what to monitor. The primary config directory is /usr/local/nagios/etc/, with these key locations:
- nagios.cfg: Main configuration (points to the object definition files).
- objects/: Object definitions for hosts, services, commands, and contacts (the sample files include commands.cfg, contacts.cfg, templates.cfg, and localhost.cfg).
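nagios.cfg pulls in object definitions through two directives. A cfg_file= line loads one file, and a cfg_dir= line loads every .cfg file under a directory. To see which ones are active, a sketch like the following prints the uncommented directives. The function name is hypothetical, and the default path assumes the source install above.

```shell
#!/bin/sh
# Print the cfg_file= and cfg_dir= directives that nagios.cfg will act on.
# Commented-out lines (starting with '#') are skipped by the anchored pattern.
list_object_configs() {
  grep -E '^(cfg_file|cfg_dir)=' "${1:-/usr/local/nagios/etc/nagios.cfg}"
}
```

Run `list_object_configs` with no argument for the default install, or pass another nagios.cfg path.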
Key Configuration Concepts
- Host: A device to monitor (e.g., a Linux server, router).
- Service: A specific metric/process on a host (e.g., CPU usage, HTTP service).
- Command: A plugin or script invocation Nagios runs to check a host or service, or to send a notification (e.g., check_ping).
- Contact: A person or team to alert, and how to reach them (e.g., email, Slack).
Example 1: Monitor a Remote Linux Host with NRPE
To monitor a remote Linux server (e.g., server01 with IP 192.168.1.100), we use NRPE. Here’s how:
Step 1: Install NRPE on the Remote Host (Client)
On server01, install NRPE and Nagios plugins:
sudo apt install -y nagios-nrpe-server monitoring-plugins # Ubuntu/Debian (plugins install to /usr/lib/nagios/plugins)
# For RHEL/CentOS: sudo yum install -y nrpe nagios-plugins-all
Step 2: Configure NRPE on the Client
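Edit /etc/nagios/nrpe.cfg (the path used by both the Ubuntu/Debian and EPEL packages). The plugin paths below are the Ubuntu/Debian ones. On 64-bit RHEL/CentOS they are under /usr/lib64/nagios/plugins.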
# Allow the Nagios server to connect (192.168.1.200 here; replace with your Nagios server IP)
# Keep comments on their own lines rather than after a value
allowed_hosts=127.0.0.1,192.168.1.200
# Define commands NRPE can run (add these at the bottom)
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
# -w: Warning threshold (1min, 5min, 15min load averages)
# -c: Critical threshold
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
# Check root filesystem (/), warn at 20% free, critical at 10% free
command[check_mem]=/usr/lib/nagios/plugins/check_memory.pl -w 80 -c 90
# check_memory.pl is a third-party plugin, not part of the standard plugin package: install it separately or remove this line
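The server calls each check by the name inside command[...]. That name is what it passes with check_nrpe -c, and what follows the ! in check_nrpe!check_load. A typo on either side makes the check fail instead of returning a result. This sketch uses a hypothetical function name and lists the command names one nrpe.cfg file defines. It does not follow include= or include_dir= directives.

```shell
#!/bin/sh
# List the command names defined as command[NAME]=... in one nrpe.cfg file.
# Files pulled in via include= or include_dir= are not scanned.
nrpe_commands() {
  sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "${1:-/etc/nagios/nrpe.cfg}"
}
```

Compare its output with the names used after check_nrpe! in the service definitions on the Nagios server.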
Step 3: Restart NRPE on the Client
sudo systemctl restart nagios-nrpe-server # Ubuntu/Debian
# For RHEL/CentOS: sudo systemctl restart nrpe
Step 4: Define the Host and Services on the Nagios Server
Create a new config file for server01 in /usr/local/nagios/etc/objects/:
sudo nano /usr/local/nagios/etc/objects/server01.cfg
Add the following:
# Define the remote host
define host {
use linux-server ; inherit defaults from the "linux-server" template (templates.cfg)
host_name server01
alias Production Web Server
address 192.168.1.100 ; remote host IP
max_check_attempts 3
check_period 24x7
notification_interval 30
notification_period 24x7
}
# Define services to monitor on server01
define service {
use generic-service
host_name server01
service_description CPU Load
check_command check_nrpe!check_load ; run "check_load" on the client via NRPE
check_interval 5
retry_interval 1
}
define service {
use generic-service
host_name server01
service_description Root Disk Space
check_command check_nrpe!check_disk
check_interval 10
}
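When several hosts need the same NRPE-backed services, writing each define service block by hand gets repetitive. This is a minimal generator sketch. The function name and argument order are hypothetical. It assumes, as the services above do, a generic-service template and a check_nrpe command defined on the Nagios server.

```shell
#!/bin/sh
# Print a Nagios service definition that runs an NRPE command on a host.
# Usage: nrpe_service HOST "DESCRIPTION" NRPE_COMMAND [CHECK_INTERVAL_MINUTES]
nrpe_service() {
  cat <<EOF
define service {
    use                     generic-service
    host_name               $1
    service_description     $2
    check_command           check_nrpe!$3
    check_interval          ${4:-5}
}
EOF
}

# Example: the same two checks for two hosts (server02 is hypothetical)
for h in server01 server02; do
  nrpe_service "$h" "CPU Load" check_load
  nrpe_service "$h" "Root Disk Space" check_disk 10
done
```

Redirect the output to a .cfg file under objects/ and load it like any other object file. Then run the verification step before restarting.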
Step 5: Update Nagios Main Config
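The services above call a check_nrpe command, which the sample objects/commands.cfg does not define. Add a definition there with command_name check_nrpe and command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$. This requires the check_nrpe plugin in /usr/local/nagios/libexec. Then tell Nagios to load the new server01.cfg by adding this line to /usr/local/nagios/etc/nagios.cfg: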
cfg_file=/usr/local/nagios/etc/objects/server01.cfg
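Repeating this step would add a second identical cfg_file line. Nagios would then read server01.cfg twice, which can trigger duplicate-definition errors. This sketch uses a hypothetical helper that appends a line only when an identical line is not already present.

```shell
#!/bin/sh
# Append LINE to FILE only if an identical line is not already present.
# Usage: ensure_line FILE LINE
ensure_line() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (needs root on a real install):
# ensure_line /usr/local/nagios/etc/nagios.cfg \
#   "cfg_file=/usr/local/nagios/etc/objects/server01.cfg"
```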
Step 6: Verify and Restart Nagios
sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg # Check for config errors
sudo systemctl restart nagios
Now, in the Nagios web UI, server01 appears under “Hosts” and its services (CPU Load, Root Disk Space) appear under “Services”.
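A restart with a broken configuration can leave Nagios stopped. For that reason it is common to restart only after a clean verification, which works because nagios -v exits with a non-zero status when it finds errors. The sketch below wraps that pattern. The variable names are hypothetical, overridable defaults, so the sketch can be dry-run with stand-in commands.

```shell
#!/bin/sh
# Restart Nagios only if the configuration verifies cleanly.
# NAGIOS_BIN, NAGIOS_CFG and RESTART_CMD are illustrative, overridable defaults.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-systemctl restart nagios}"

safe_restart() {
  if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
    $RESTART_CMD   # unquoted on purpose: split into command + arguments
  else
    echo "Config check failed; not restarting (run $NAGIOS_BIN -v $NAGIOS_CFG for details)" >&2
    return 1
  fi
}
```

Call `safe_restart` as root in place of the two commands above. When verification fails, the message points back to the full `nagios -v` output.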
Example 2: Custom Plugins
Nagios supports custom scripts for unique monitoring needs. For example, a script to check SSH connections:
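- Create a plugin script on the machine whose SSH connections you want to count. This is either the Nagios server itself or a remote client that will run it through NRPE (the path below is the Nagios server's plugin directory):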
sudo nano /usr/local/nagios/libexec/check_ssh_connections.sh
Add:
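#!/bin/bash
# Count established inbound SSH connections (TCP connections to local port 22)
WARNING=10
CRITICAL=20
if ! command -v ss >/dev/null 2>&1; then
    echo "UNKNOWN: ss command not found"
    exit 3
fi
# -n prints ports numerically, so filter on :22 instead of grepping for "ssh";
# tail drops the header line. Change :22 if sshd listens on another port.
CONNECTIONS=$(ss -tn state established '( sport = :22 )' | tail -n +2 | wc -l)
if [ "$CONNECTIONS" -ge "$CRITICAL" ]; then
    echo "CRITICAL: $CONNECTIONS SSH connections | connections=$CONNECTIONS;$WARNING;$CRITICAL"
    exit 2
elif [ "$CONNECTIONS" -ge "$WARNING" ]; then
    echo "WARNING: $CONNECTIONS SSH connections | connections=$CONNECTIONS;$WARNING;$CRITICAL"
    exit 1
else
    echo "OK: $CONNECTIONS SSH connections | connections=$CONNECTIONS;$WARNING;$CRITICAL"
    exit 0
fi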
- Make it executable:
sudo chmod +x /usr/local/nagios/libexec/check_ssh_connections.sh
- Define a Nagios command in commands.cfg:
define command {
command_name check_ssh_connections
command_line /usr/local/nagios/libexec/check_ssh_connections.sh
}
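- Add a service to monitor SSH connections on server01 (in server01.cfg). To count server01's own connections, the script must run on server01. Copy it to /usr/lib/nagios/plugins/ on server01, add command[check_ssh_connections]=/usr/lib/nagios/plugins/check_ssh_connections.sh to its nrpe.cfg, restart NRPE, and call it through check_nrpe:
define service {
use generic-service
host_name server01
service_description SSH Connections
check_command check_nrpe!check_ssh_connections ; runs on server01 via NRPE
; Alternative: "check_command check_ssh_connections" runs the script on the Nagios
; server instead, so it counts the Nagios server's connections, not server01's
}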
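The script above hard-codes its thresholds. Plugins conventionally accept them as -w and -c options instead, so one script can serve hosts with different limits. The sketch below shows that pattern with getopts. `check_threshold` is a hypothetical helper. It supports only simple "alert at or above N" integers, not the richer range syntax (such as `10:` or `@10:20`) described in the plugin development guidelines.

```shell
#!/bin/sh
# Compare a measured integer against -w/-c thresholds ("alert at or above N"),
# print a plugin-style status line with perfdata, and return the matching
# plugin exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN (bad input).
# Usage: check_threshold [-w WARN] [-c CRIT] [-l LABEL] VALUE
check_threshold() {
  warn=10 crit=20 label=value
  OPTIND=1
  while getopts "w:c:l:" opt; do
    case "$opt" in
      w) warn=$OPTARG ;;
      c) crit=$OPTARG ;;
      l) label=$OPTARG ;;
      *) echo "UNKNOWN: usage: check_threshold [-w N] [-c N] [-l LABEL] VALUE"; return 3 ;;
    esac
  done
  shift $((OPTIND - 1))
  value=$1
  for n in "$warn" "$crit" "$value"; do
    case "$n" in
      ''|*[!0-9]*) echo "UNKNOWN: '$n' is not a non-negative integer"; return 3 ;;
    esac
  done
  perf="$label=$value;$warn;$crit"
  if [ "$value" -ge "$crit" ]; then
    echo "CRITICAL: $value $label | $perf"; return 2
  elif [ "$value" -ge "$warn" ]; then
    echo "WARNING: $value $label | $perf"; return 1
  fi
  echo "OK: $value $label | $perf"
  return 0
}
```

A plugin script can pass its own arguments through and end with `check_threshold -l connections "$@" "$CONNECTIONS"; exit $?`. Invoking it as `check_ssh_connections.sh -w 5 -c 15` then overrides the defaults.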
Advanced Monitoring Scenarios
Log Monitoring
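Nagios can monitor log files (e.g., /var/log/auth.log) for errors using the check_logfiles plugin. The plugin reads logs on the machine where it runs, so the setup below watches the Nagios server's own auth.log. To watch a remote server, install the plugin there and call it through NRPE. The user running the check also needs read access to the log (on Ubuntu, auth.log is readable by the adm group).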
# Install check_logfiles on the Nagios server
sudo apt install -y nagios-plugin-check-logfiles
# Define a command in commands.cfg
define command {
command_name check_auth_log
command_line /usr/lib/nagios/plugins/check_logfiles --logfile /var/log/auth.log --criticalpattern "Failed password" --warningpattern "Accepted password"
}
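check_logfiles avoids re-alerting on old lines by remembering how far into each log it has already read. The sketch below is not check_logfiles. It is a much-simplified illustration of that idea with hypothetical names: it counts pattern matches only in bytes written since the previous run and stores the offset in a state file. Rotation is detected only by the log shrinking, which is an assumption the real plugin handles far more carefully.

```shell
#!/bin/sh
# Simplified illustration only (NOT check_logfiles): count regex matches in the
# part of a log written since the previous run, remembering the byte offset
# in a state file.
# Usage: scan_new_lines LOGFILE STATEFILE CRITICAL_REGEX WARNING_REGEX
scan_new_lines() {
  log=$1 state=$2 crit_re=$3 warn_re=$4
  if [ ! -r "$log" ]; then
    echo "UNKNOWN: cannot read $log"; return 3
  fi
  size=$(wc -c < "$log" | tr -d ' ')
  offset=$(cat "$state" 2>/dev/null)
  case "$offset" in ''|*[!0-9]*) offset=0 ;; esac
  # Log is smaller than last time: assume it was rotated and rescan from the start.
  [ "$offset" -le "$size" ] || offset=0
  # Read only the bytes between the saved offset and the size measured above.
  new=$(tail -c +"$((offset + 1))" "$log" | head -c "$((size - offset))")
  echo "$size" > "$state"
  crit_n=$(printf '%s\n' "$new" | grep -cE -- "$crit_re" || true)
  warn_n=$(printf '%s\n' "$new" | grep -cE -- "$warn_re" || true)
  if [ "$crit_n" -gt 0 ]; then
    echo "CRITICAL: $crit_n new line(s) matching '$crit_re'"; return 2
  elif [ "$warn_n" -gt 0 ]; then
    echo "WARNING: $warn_n new line(s) matching '$warn_re'"; return 1
  fi
  echo "OK: no new matching lines"
  return 0
}
```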
Integrating with Grafana for Visualization
For richer dashboards, Grafana can visualize Nagios data. One option is to store the data in MySQL with NDOUtils and query that database through Grafana's MySQL data source.
Best Practices for Nagios Monitoring
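- Use Templates: Reduce redundancy by defining host/service templates (e.g., a linux-server template with common settings like check_period).
- Set Realistic Thresholds: Avoid alert fatigue by tuning warning/critical thresholds. For example, scale check_load thresholds to the server's CPU core count, or use check_load's -r option to divide load averages by the number of CPUs.
- Secure NRPE: Restrict allowed_hosts to the Nagios server IPs. Keep dont_blame_nrpe=0 so clients cannot pass arbitrary command arguments. Leave TLS enabled (NRPE uses it unless the daemon is started with -n), and for certificate-based authentication set ssl_cert_file, ssl_privatekey_file, and ssl_cacert_file in nrpe.cfg.
- Backup Configs: Regularly back up /usr/local/nagios/etc/ to avoid losing custom definitions.
- Monitor Critical Services First: Prioritize monitoring core services (e.g., SSH, HTTP, database) before non-essential ones.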
- Document Configs: Maintain a wiki/docs for host/service definitions, plugins, and alert recipients.
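Load average roughly counts runnable tasks (and, on Linux, tasks in uninterruptible sleep). The same number therefore means different things on a 2-core and a 32-core server. If you prefer absolute thresholds over check_load's -r option, you can derive them from the core count. This sketch uses a hypothetical helper, and its per-core factors are assumptions to tune for your workload.

```shell
#!/bin/sh
# Multiply per-core load factors by a core count to get check_load thresholds.
# Usage: load_thresholds CORES F1 F5 F15   -> prints "T1,T5,T15"
load_thresholds() {
  awk -v n="$1" -v f1="$2" -v f5="$3" -v f15="$4" \
    'BEGIN { printf "%g,%g,%g\n", n * f1, n * f5, n * f15 }'
}

# Assumed per-core factors; tune them for your workload.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
warn=$(load_thresholds "$cores" 1.5 1.25 1)
crit=$(load_thresholds "$cores" 3 2.5 2)
echo "command[check_load]=/usr/lib/nagios/plugins/check_load -w $warn -c $crit"
```

On a 4-core client this prints thresholds of `-w 6,5,4 -c 12,10,8`. Paste the printed line into that client's nrpe.cfg.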
Troubleshooting Common Issues
Nagios Fails to Start
Check for config errors:
sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Fix errors (e.g., missing } in config files) and restart:
sudo systemctl restart nagios
NRPE “Connection Refused”
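- Ensure NRPE is running on the client: sudo systemctl status nagios-nrpe-server (nrpe on RHEL/CentOS).
- Verify the client's firewall allows TCP port 5666 (the NRPE port) from the Nagios server, e.g. with ufw: sudo ufw allow from 192.168.1.200 to any port 5666 proto tcp
- Confirm allowed_hosts in nrpe.cfg includes the Nagios server IP, and restart NRPE after any change.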
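When allowed_hosts is the suspect, the quickest check is whether the Nagios server's address appears in the list verbatim. This sketch uses a hypothetical helper. It does exact string matching only, so it will not recognize a match through a CIDR range or hostname, although nrpe.cfg accepts those too. It also ignores included files and assumes the last allowed_hosts line is the effective one.

```shell
#!/bin/sh
# Succeed if IP appears verbatim in the allowed_hosts list of an nrpe.cfg file.
# Exact matching only: CIDR ranges and hostnames in the list are not evaluated.
# Usage: nrpe_allows IP NRPE_CFG
nrpe_allows() {
  grep -E '^allowed_hosts=' "$2" \
    | tail -n 1 \
    | cut -d= -f2- \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -qxF -- "$1"
}
```

On the client, `nrpe_allows 192.168.1.200 /etc/nagios/nrpe.cfg && echo listed || echo "not listed"` gives a quick answer.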
Plugin Returns “Unknown”
Test the plugin manually to debug:
# On Nagios server, test NRPE command
/usr/local/nagios/libexec/check_nrpe -H 192.168.1.100 -c check_load
# On client, test the plugin directly
/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
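Nagios decides a check's state from the plugin's exit code, not its text. When a check shows UNKNOWN, it therefore helps to see the output and the code together. This sketch uses a hypothetical `run_plugin` helper. Exit codes 126 and 127 are the shell's own codes for "not executable" and "not found". They usually point to a wrong path or missing permissions rather than a real problem on the host.

```shell
#!/bin/sh
# Run a plugin command, then print its output and how its exit code is interpreted.
# Usage: run_plugin /path/to/plugin [ARGS...]
run_plugin() {
  output=$("$@" 2>&1)
  rc=$?
  case "$rc" in
    0) state=OK ;;
    1) state=WARNING ;;
    2) state=CRITICAL ;;
    3) state=UNKNOWN ;;
    126) state="not executable (check permissions)" ;;
    127) state="command not found (check the path)" ;;
    *) state="outside the 0-3 plugin range" ;;
  esac
  printf 'output: %s\nexit:   %s (%s)\n' "$output" "$rc" "$state"
  return "$rc"
}
```

For example, run `run_plugin /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20` on the client. Running the plugin command itself as the nagios user (e.g. with sudo -u nagios) also reproduces permission problems the daemon would hit.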
Conclusion
Nagios is a powerful, flexible tool for monitoring Linux systems, offering deep visibility into server health and performance. By following this guide, you’ve learned to install Nagios, configure remote monitoring with NRPE, define custom services, and troubleshoot common issues.
To expand further, explore Nagios XI for enterprise features, integrate with alerting tools like PagerDuty, or automate config management with Ansible. With proper setup and best practices, Nagios will become a cornerstone of your Linux infrastructure monitoring strategy.