Bare-Bones Monitoring with Monit and RRDtool
How to provide robust monitoring to low-end systems.
When running a critical system, it's necessary to know what resources the system is consuming, to be alerted when resource utilization reaches a specific level and to trend long-term performance. Zabbix and Nagios are two large-scale solutions that monitor, alert and trend system performance, and they each provide a rich user interface. Due to the requirements of those solutions, however, dedicated hardware/VM resources typically are required to host the monitoring solution. For smaller server implementations, options exist for providing basic monitoring, alerting and trending functionality. This article shows how to accomplish basic and custom monitoring and alerting using Monit. It also covers how to monitor long-term trending of system performance with RRDtool.
Initial Monit Configuration
On many popular Linux distros, you can install Monit from the associated software repository. Once installed, you can handle all the configuration with the monitrc configuration file. That file generally is located within the /etc directory structure, but the exact location varies based on your distribution.
The config file has two sections: Global and Services. The Global section allows for custom configuration of the Monit application. The Monit service contains a web-based front end that is fully configurable through the config file. Although the section is commented out by default, you can uncomment items selectively for granular customization. The web configuration block looks like this:
set httpd port 2812 and
    use address localhost
    allow localhost
    allow admin:monit
The first line sets the port number where you can access Monit via web browser. The second line sets the address on which Monit's web server listens; with localhost, it accepts connections only on the loopback interface. The third line names a host that is allowed to connect to the Monit application. Note that you also can do this using a local firewall access restriction if a firewall is currently in place. The fourth line configures a user name/password pair (here, admin and monit) for use when accessing Monit. There's also a section that allows SSL options for encrypted connections to Monit. Although enabling SSL is recommended when passing authentication data, you also could reverse-proxy Monit through an existing web server, such as nginx or Apache, provided SSL is already configured on the web server. For more information on reverse-proxying Monit through Apache, see the Resources section at the end of this article.
The next items you need to enable deal with configuring email alerts. To set up the email server through which email will be relayed to the recipient, add or enable the following line:
set mailserver mailserver.company.com
Note that if there's a local SMTP server running, the server name of mailserver.company.com in this example may be replaced with localhost.
The next block to enable sets the contents of the email alert messages that will be sent and will look similar to this:
set mail-format {
  from: Monit <monit@$HOST>
  subject: Monit alert -- $EVENT $SERVICE
  message: $EVENT Service $SERVICE
           Date: $DATE
           Action: $ACTION
           Host: $HOST
           Description: $DESCRIPTION
           Your faithful employee,
           Monit
}
Within this block, different predefined variables are used to provide alert-specific information (denoted by the $ sign). You can modify text within the from, subject or message fields, and you also can add additional data to the message field as desired.
To finish the alerting functionality, you can configure an email address that will receive all email alerts from Monit by adding the following line:
set alert user@domain.com
At this point, the specified email address will receive all alerts generated by Monit. However, so far, no alerts are configured. To begin configuring alerts, let's first look at the Services section mentioned earlier. That section provides some basic monitoring functionality for the local machine, including CPU, memory, swap, filesystem and basic network monitoring. Each of those configuration items provides for the definition of thresholds. After the thresholds are met, actions can be taken, including sending an alert. As an example, the out-of-the-box alert for CPU/memory/swap monitoring looks like this:
check system $HOST
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if cpu usage > 95% for 10 cycles then alert
    if memory usage > 75% then alert
    if swap usage > 25% then alert
Again, note the use of variables to define the host to be monitored. While all of the triggers defined here result in an alert, other actions also can be taken. For more information on these settings, consult the Monit documentation (see Resources).
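To make the threshold logic concrete, the following shell sketch performs the two load-average comparisons by hand. It only illustrates the kind of test each rule expresses and is not how Monit itself is implemented; the function name exceeds and the sample load values are assumptions introduced for this example.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) when value > threshold.
# awk is used because bash arithmetic cannot compare decimals.
exceeds() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v > t) }'
}

# Made-up 1-minute and 5-minute load averages for illustration.
load1=4.25
load5=1.80

if exceeds "$load1" 4; then
    echo "alert: 1min load $load1 > 4"
fi
if exceeds "$load5" 2; then
    echo "alert: 5min load $load5 > 2"
fi
```

With these sample values, only the one-minute rule fires, because 1.80 is below the five-minute threshold of 2.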
Custom Configuration of Monit
Once initial configuration is complete, you can define custom alerts. It's best to define the custom alerts outside the monitrc file. You do this by defining an include directory in the monitrc file as follows:
include /opt/monit-custom/*
This line includes all configuration files located in the /opt/monit-custom folder.
Next, let's look at two types of monitoring: host checks and program checks. Host checks allow for the monitoring of TCP-based services running on remote hosts. Although you can do basic TCP port connection testing for simpler services, Monit also provides the ability to do HTTP-based content checks to a specific URL. Consider the following example:
check host linuxjournal-website with address www.linuxjournal.com
    if failed
        port 443 protocol https
        with request / with content = "Become a Patron"
    then alert
The first line of the host check defines the identifier within Monit for this host (linuxjournal-website) and the address with which the host will be accessed (www.linuxjournal.com). In this example, the trigger within the host definition contains multiple conditions: the host must be reachable via port 443 using the https protocol, and when the root URL is requested, the text "Become a Patron" must appear in the response body. If any of these conditions fails, an alert is sent. This check could be reconfigured to use port 80 and the http protocol.
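The content condition amounts to fetching a page and searching its body for a fixed string. The sketch below shows that idea in shell. It is an approximation for experimenting by hand, not Monit's own mechanism; the helper name body_contains and the sample HTML are assumptions, and a canned response is used so that the example runs without network access.

```shell
#!/bin/bash
# Hypothetical helper: read a response body on stdin and succeed
# only if it contains the expected text (fixed-string match).
body_contains() {
    grep -qF -- "$1"
}

# Stand-in for a fetched page so the sketch runs offline.
sample_body='<html><a href="/subscribe">Become a Patron</a></html>'

if printf '%s\n' "$sample_body" | body_contains "Become a Patron"; then
    echo "content check passed"
else
    echo "content check failed"
fi

# Against the live site, the body could come from curl instead:
#   curl -fsS https://www.linuxjournal.com/ | body_contains "Become a Patron"
```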
Along with host monitoring, Monit allows the definition of script-based monitors, called program checks. Once a script is configured within Monit, the script is executed periodically, and action may be taken based on the script's exit code.
Here's an example of a script that alerts when an SSL certificate expiration date is within a specified number of days:
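#!/bin/bash
# Usage: checkcertexpire.sh <min-days> <hostname> [port]
# Exits 1 if the certificate expires within <min-days> days, else 0.

# Print the certificate's "Not After" date for host $1 on port $2.
domainexpiredate() {
    openssl x509 -text -in <(echo -n | \
        openssl s_client -connect "$1:$2" 2>/dev/null | \
        sed -n '/-*BEGIN/,/-*END/p') 2>/dev/null | \
        sed -n 's/ *Not After : *//p'
}

# Print the whole number of days until the certificate expires.
daysleft() {
    echo "((($(date -d "$(domainexpiredate "$1" "$2")" +%s)-$(date +%s))/24)/60)/60" | bc
}

# Print the port to use: $1 if given, otherwise 443.
defaultport() {
    if [ -z "$1" ]; then
        echo "443"
    else
        echo "$1"
    fi
}

[[ $(daysleft "$2" "$(defaultport "$3")") -le $1 ]] && exit 1 || exit 0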
This script is executed with two arguments: minimum number of days until expiration and the hostname of the server, with an optional third parameter for port number. Here's an example execution of the script:
$ checkcertexpire.sh 31 www.linuxjournal.com
$ echo $?
0
When the script is executed with the two required arguments, there is no console output. After the execution, if the return code is echoed (identified as $?), the value is 0, which indicates that the certificate does not expire within 31 days. Configuring this item within Monit requires the following:
check program linuxjournal-ssl with path "/etc/monit/scripts/checkcertexpire.sh 31 www.linuxjournal.com"
    if status != 0 then alert
In the same way as the host check, the program check has an identifier within Monit (linuxjournal-ssl, in this case). In the first line of the program check, along with the identifier, is the script to be executed along with the command-line arguments. Note that the trigger indicates that if the exit code is not 0, an alert should be sent.
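The exit-code contract behind a program check can be exercised without contacting a server. The following sketch reproduces the days-remaining arithmetic and the threshold decision using fixed timestamps. The function names days_between and expiry_status and both timestamps are assumptions made for illustration; they are not part of the script above.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only.

# Whole days between two Unix timestamps: $1 = expiry, $2 = now.
days_between() {
    echo $(( ($1 - $2) / 86400 ))
}

# Return 1 (alert) when days left <= threshold, else 0 (healthy).
expiry_status() {
    local days_left=$1 threshold=$2
    if [ "$days_left" -le "$threshold" ]; then
        return 1
    fi
    return 0
}

now=1700000000                      # made-up "current" timestamp
expires=$(( now + 45 * 86400 ))     # certificate expiring in 45 days

left=$(days_between "$expires" "$now")
echo "days left: $left"

for threshold in 31 60; do
    if expiry_status "$left" "$threshold"; then
        echo "threshold $threshold: ok (exit 0)"
    else
        echo "threshold $threshold: alert (exit 1)"
    fi
done
```

With 45 days remaining, a 31-day threshold yields exit code 0 and a 60-day threshold yields exit code 1, which is the condition the Monit trigger tests with status != 0.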
RRDtool is a very robust tool that lets you collect data over a long period of time. Named after its database format (round-robin database), RRDtool saves time-based data to its database and then lets you retrieve and graph the data. RRDtool can store and graph any numeric data that a command or shell script can produce.
Before capturing data, you must initialize the database. For this example, let's create a database to capture the five-minute load average. Here's the command to initialize this specific database:
rrdtool create loadavg_db.rrd --step 60 \
DS:loadavg:GAUGE:120:0:10000 RRA:MAX:0.5:1:1500
The first two arguments indicate that a database named loadavg_db.rrd is being created. The --step argument defines the expected interval between data samples. In this case, 60 seconds are expected between samples.
Let's look at the two remaining arguments separately. The first begins with DS and defines a data source named loadavg. Note that the options for this data source are separated by colons. The GAUGE keyword says that each value read is written to the database as is, rather than being treated as a counter and converted to a rate. The 120 is the heartbeat, the maximum number of seconds that may pass between updates. If no update arrives within that window, the value for that interval is recorded as unknown (not as zero) to indicate a gap in the data feed. The 0 and 10000 are the minimum and maximum values that will be accepted; anything outside that range also is stored as unknown. The argument beginning with RRA defines a round-robin archive, which determines how many values are stored in the database and how long they are kept. MAX is the consolidation function: when several samples are combined into one stored value, the largest is kept. The 0.5 is the xfiles factor, the fraction of the samples in a consolidation interval that may be unknown while the stored value is still considered known. The 1 identifies how many steps are consolidated into each stored value. In this case, every sample is stored individually, so the consolidation function leaves it unchanged. The final value, 1500, is how many rows the archive holds. Since the step length is 60 seconds, this configuration stores 25 hours of data.
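The retention period of an archive is simply the step length multiplied by the steps per stored value and by the number of rows. The short sketch below works through that arithmetic. The function name rra_hours and the second, seven-day archive definition are hypothetical examples, not part of the database created above.

```shell
#!/bin/bash
# Hypothetical helper: hours of history an RRA can hold.
#   $1 = step in seconds, $2 = steps consolidated per row, $3 = rows
rra_hours() {
    echo $(( $1 * $2 * $3 / 3600 ))
}

echo "RRA:MAX:0.5:1:1500 at --step 60 keeps $(rra_hours 60 1 1500) hours"
# A made-up coarser archive: one row per 5 minutes, 2016 rows.
echo "RRA:MAX:0.5:5:2016 at --step 60 keeps $(rra_hours 60 5 2016) hours"
```

The first line reports 25 hours, matching the archive in the create command. The second reports 168 hours, showing how consolidating five steps per row trades resolution for a week of history.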
Now that the database is initialized, you can capture data and store it in the database. To maintain accurate periodic data collection, it's best to create a crontab entry and have the data collected at the desired interval. For this example, you would have the cron job run every minute. To collect data and put it in the database, use the following command:
rrdtool update loadavg_db.rrd --template loadavg \
N:$(awk '{print $2}' /proc/loadavg)
To perform the data collection, the update argument is used along with the database name. The --template argument specifies the data source to populate with data. This is the same loadavg data source that was defined when the database was initialized. The final argument supplies a timestamp and a value separated by a colon: N stands for the current time, and the value is the result of the command substitution, which will be the five-minute load average. This command could be placed in a script and run from the crontab for minute-by-minute execution. The crontab entry would look like this:
* * * * * /path/to/rrdtool-script.sh
Since all of the time fields contain asterisks, the specified script will run every minute. Once the database has been populated, you can render a graph with the following command:
rrdtool graph loadavg_graph-$(date +"%m-%d-%Y").png \
-w 785 -h 120 -a PNG \
--slope-mode \
--start -86400 --end now \
--font DEFAULT:7: \
--title "5-minute load average" \
--watermark "`date`" \
--vertical-label "load average" \
--lower-limit 0 \
--right-axis 1:0 \
--x-grid MINUTE:10:HOUR:1:MINUTE:120:0:%R \
--alt-y-grid --rigid \
DEF:loadaverage=loadavg_db.rrd:loadavg:MAX \
LINE1:loadaverage#0000FF:"load" \
GPRINT:loadaverage:LAST:"Cur\: %5.2lf" \
GPRINT:loadaverage:AVERAGE:"Avg\: %5.2lf" \
GPRINT:loadaverage:MAX:"Max\: %5.2lf" \
GPRINT:loadaverage:MIN:"Min\: %5.2lf\t\t\t"
The first line calls the RRDtool graph function along with the filename of the image to create. In this instance, the image filename will contain the current date. All of the arguments beginning with -- set up the look and feel of the graph, including labels, axis configuration, image format and the time frame from which to pull the data. For detailed information on these arguments, see the RRDtool documentation.
The line beginning with DEF:loadaverage defines a graph variable named loadaverage, which will have the values from the loadavg data source you created in the database. The line beginning with LINE specifies the color of the graph line and the label to use in the legend. The GPRINT lines indicate various statistical details to be printed at the bottom of the graph. In this case, the last recorded value and the average, maximum and minimum values during the time frame will be displayed. Note that %5.2lf specifies that the value is printed as a floating-point number at least five characters wide in total, with two digits to the right of the decimal point.
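The effect of the width and precision in that format can be previewed with the shell's printf, which follows the same C-style conventions. This is only an analogy for experimenting with the format, using %5.2f because the shell does not need the l length modifier. The sample values are arbitrary.

```shell
#!/bin/bash
# Force the C locale so the decimal separator is always a period.
export LC_ALL=C

# Brackets make the padding visible.
for value in 0.4 3.14159 12.5; do
    printf '[%5.2f]\n' "$value"
done
```

The values print as [ 0.40], [ 3.14] and [12.50]. The 5 is a minimum total width that includes the decimal point, so shorter numbers are padded on the left and longer ones are not truncated.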
For ease of capturing daily graphs, you also could add this command to the crontab to run daily with the following entry:
0 0 * * * /path/to/rrdtool-graph.sh
This will run the graph script every day at midnight. The images may now be placed in a folder that is accessible via a browser for easy viewing.
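The crontab entries reference a collection script that is not listed in this article. A minimal sketch of what such a script might contain follows. The RRD_FILE variable and the function names five_minute_load and update_arg are assumptions, and the sketch prints the command it would run against a sample line instead of calling rrdtool, so it can be tried on any system.

```shell
#!/bin/bash
# Sketch of a collection script such as /path/to/rrdtool-script.sh.
# RRD_FILE and both function names are assumptions for illustration.
RRD_FILE="${RRD_FILE:-loadavg_db.rrd}"

# Print the five-minute load average (second field) from a line in
# /proc/loadavg format, read from stdin.
five_minute_load() {
    awk '{ print $2 }'
}

# Build the value argument for "rrdtool update"; N means "now".
update_arg() {
    echo "N:$1"
}

# Demonstrate with a sample line: 1-, 5- and 15-minute averages first.
sample="0.42 0.37 0.30 1/123 4567"
load=$(echo "$sample" | five_minute_load)
echo "would run: rrdtool update $RRD_FILE --template loadavg $(update_arg "$load")"

# On a Linux host with the database created, the real call would be:
#   rrdtool update "$RRD_FILE" --template loadavg \
#       "$(update_arg "$(five_minute_load < /proc/loadavg)")"
```

Because cron runs jobs with a minimal environment and working directory, a real script would likely set RRD_FILE to an absolute path.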
Although many monitoring solutions exist that provide robust graphical UIs, Monit and RRDtool together provide basic monitoring, alerting and trending functionality while using a minimum of system resources, along with a basic framework for disseminating the data collected.
Resources