Monitoring 6WIND vRouter with InfluxDB and Nagios
6WIND vRouter provides monitoring functionalities, including SNMP and KPIs (Key Performance Indicators) exported to a time-series database.
In this document, we will explore here how to integrate both the InfluxDB time-series database and SNMP in Nagios Core. We will configure a simple Nagios Core server which will survey CPU usage and processes of a vRouter.
Environment
For our demonstration we will use a vRouter and a Debian 10 server with a standard Nagios Core and InfluxDB installation for monitoring.
Nagios Core and InfluxDB will be installed on the same server for monitoring. Please see both Debian, InfluxDB and Nagios websites for documentation about their installation.
The vRouter will have internal IP 10.0.2.1 and is named MyvRouter. It’s important to maintain consistency between our vRouter hostname and its name in Nagios Core’s configuration.
The monitoring server will have IP 10.0.2.2.
The InfluxDB database will be named telegraf.
About Nagios Core
Nagios Core is based on the use of plugins to know how and what to monitor. Nagios Core is distributed with some fundamental plugins but it’s still needed to build upon them to monitor a vRouter. The Nagios community proposes many plugins and Nagios Core itself is well documented.
In Nagios, a monitored host is a collection of services using commands to check status and values reported by network hosts. Values are then compared to predefined warning and critical intervals before sending notifications. Hosts and commands are defined in definition files.
We will add a host, services and commands definitions to monitor our vRouter host with our Nagios Core server.
The complete explanation of Nagios Core is beyond this article, please refer to https://support.nagios.com/kb/article/nagios-core-installing-nagios-core-from-source-96.html#Debian.
On Debian 10, as root, a functional Nagios installation is available with apt :
apt install nagios4
It will install the Apache web server and all needed dependencies.
Verify Apache has all default modules and these two are enabled:
a2enmod auth_digest a2enmod authz_groupfile
For convenience, we add a folder for host definitions:
mkdir /etc/nagios4/hosts
And add the following line to /etc/nagios4/nagios.cfg :
cfg_dir=/etc/nagios4/hosts
And then restart Apache and Nagios:
systemctl restart apache2 systemctl restart nagios4
After that, Nagios will be available at: http://10.0.2.2/nagios4.
We will add configurations in /etc/nagios4 and plugins in /usr/lib/nagios/plugins.
InfluxDB Monitoring
vRouter Configuration
The KPI monitoring feature of the InfluxDB and Telegraf integration in vRouter provides the ability to monitor and export KPIs to an InfluxDB time-series database. We configure the Telegraf agent to export KPIs to our InfluxDB database. You will find detailed instructions in the 6WIND vRouter User Guide.
Here is the most basic setup:
vrouter running config# vrf main kpi vrouter running kpi/# telegraf vrouter running telegraf/# influxdb-output url http://10.0.2.2:8086 database telegraf vrouter running telegraf/# commit
Installation
We need to install InfluxDB on our monitoring server, 10.0.2.2. It will receive KPIs from the vRouter.
On Linux Debian 10, a basic installation of InfluxDB is available with the command:
apt install influxdb
More details are available on the official site: https://docs.influxdata.com/influxdb/v1.7/introduction/installation .
Checking the CPU Usage with InfluxDB
Now we can add an InfluxDB support and command definition to our Nagios Core installation.
InfluxDB is not an integrated feature of Nagios Core. We will use the Nagios Core plugin architecture to add it. influx-nagios-plugin is an efficient and simple enough plugin written in Python.
It’s available at https://github.com/shaharke/influx-nagios-plugin or by pip.
Installation
On Debian Linux 10, as user root:
apt install python-pip pip install influx-nagios-plugin
it will install the check_influx command in /usr/local/bin.
Check the installation path with:
pip show --files influx-nagios-plugin
And finally, copy the check_influx executable to Nagios’ plugin folder. In a default installation of Nagios Core, the plugin folder is /usr/lib/nagios/plugins.
How it Works
The plugin consists in the command check_influx. It allows you to check the result of a InfluxDB query in a range.
check_influx sends a request to an InfluxDB server and checks if the answer is between the warning threshold and critical threshold before returning the state to Nagios.
Notice: check_influx takes the IP of the host running influxdb services, not the host monitored by Nagios.
Example of checking the last value of busy in system-cpu-usage series for a host named MyvRouter in an InfluxDB server of IP 10.0.2.2. A warning notification will be sent if the value is more than 50, a critical alert if more than 80.
check_influx -h 10.0.2.2 -u root -p "influxPWD" -d telegraf \ -q 'SELECT last(busy) FROM "system-cpu-usage" \ WHERE host =~ /^MyvRouter/ AND cpu=~ /^cpu0/' \ -w 50 -c 80
Nagios Command Definition
Now that we have the check_influx plugin for Nagios Core, we add definitions of commands and services leveraging it to monitor the vRouter.
In the command definition file of your Nagios Core installation, by default /etc/nagios4/objects/commands.cfg, we add the following definition:
# 'check_influx' command definition define command { command_name check_influx command_line $USER1$/check_influx -h $ARG1$ -u $ARG2$ -p $ARG3$ -d $ARG4$ -q $ARG5$ -m $ARG6$ -w $ARG7$ -c $ARG8$ }
Nagios Service Definition
And then, to add the service, Nagios will check for a vRouter host. We will define a new host definition to monitor the CPU load for that host. Please see the Nagios Core documentation about host definition.
Notice: InfluxDB host IP, login, password, data series name and a request are passed as an argument to the Nagios command.
We create a host definition file: /etc/nagios4/objects/myvrouter.cfg and add a service “Check Cpu Load”:
define host { Host_name myvRouter Address 10.0.2.1 } define service { Use service Host_name MyvRouter Service_description Check Cpu Load Check_command check_influx!10.0.2.2!root!influxPWD!telegraf!'SELECT last(busy) FROM "system-cpu-usage" WHERE host =~ /^myvRouter/ AND cpu=~ /^cpu0/'!""!50!80 }
The request
SELECT last(busy) FROM "system-cpu-usage" WHERE host =~ /^myvRouter/ AND cpu=~ /^cpu0/
retrieves the load in percent of the CPU 0 of the vRouter host. We may need to adjust that request for your setup for a specific CPU or averaging all CPUs.
With the parameters 50 and 80 to our check_influx plugin, Nagios Core will make a warning alert at 50% of CPU usage, and a critical alert at 80%.
Please, see InfluxDB documentation about request syntax.
After a restart of Nagios, Nagios will periodically monitor InfluxDB data.
SNMP Monitoring
vRouter Configuration
First, SNMP service and connection to SNMP port from our monitoring server need to be enabled. You will find detailed instructions in the 6WIND vRouter User Guide.
Snips of configuration:
/ vrf main firewall ipv4 filter input rule 10 description "snmp" protocol udp destination 161 source address 10.0.2.2/32 action accept / vrf main snmp enabled true static-info contact itsystem@6wind.com .. community public authorization read-only source 10.0.2.0/24 .. traps destination 10.0.2.2 port 162 protocol udp notification-type TRAP2 community public link-status-check frequency 60s enabled true process-check frequency 2s enabled true commit
Passive Checking of SNMP Traps
In this section, we will add support to Nagios for receiving and monitoring SNMP traps sent by our vRouter.
First, we need to install two tools on our monitoring server: Snmptrapd and Snmptt.
Snmptrapd will receive SNMP traps and will send them to Snmptt which filters and transfers them to Nagios.
For Example:
EVENT mteTriggerFired .1.3.6.1.2.1.88.2.0.1 "Status Events" Normal FORMAT Notification that the trigger indicated by the object $* EXEC /usr/share/nagios4/plugins/eventhandlers/submit_check_result $r "snmp_traps" 1 "Notification that the trigger indicated by the object $*"
That rule will take effect when receiving a trap from a vRouter, like an error in a process or a loss of connection. It will execute the submit_check_result script which will pass the event to Nagios.
And then we add Nagios definitions for adding the Snmp Trap service to our vRouter host.
Installation
# apt install snmptt snmptrapd snmp-mibs-downloader
Snmptt needs MIBS definitions to function. Edit /etc/snmp/snmp.conf and comment out with # the following line:
mibs :
Change configurations for Snmptrapd and Snmptt (see Configuration Files section, lower).
Nagios Service Definition
We add a new template service for Snmp traps. It will be used for our host vRouter.
In /etc/nagios4/objects/templates.cfg, add the following definition:
define service { name trap-service use generic-host register 0 service_description snmp_traps is_volatile 1 check_command check-host-alive flap_detection_enabled 0 process_perf_data 0 max_check_attempts 1 normal_check_interval 1 retry_check_interval 1 passive_checks_enabled 1 check_period 24x7 notification_interval 31536000 active_checks_enabled 0 notification_options w,u,c }
And then, we add a proper service using that template in our vRouter host, /etc/nagios4/objects/myvrouter.cfg:
define service { use trap-service host_name myvRouter }
After a restart:
systemctl restart snmptrapd systemctl restart snmptt systemctl restart nagios4
Then, Nagios will monitor SNMP trap notifications.
Configuration Files
/etc/snmp/snmptrapd.conf:
traphandle default /usr/sbin/snmptthandler disableAuthorization yes
/etc/snmp/snmptt.ini:
[General] snmptt_system_name = mode = daemon multiple_event = 1 dns_enable = 0 strip_domain = 0 strip_domain_list = <
/etc/snmp/snmptt.conf:
EVENT CatchAll .1.* "snmp_traps" Critical FORMAT $D EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result "$r" "snmp_traps" 2 "$O: $1 $2 $3 $4 $5" # EVENT netSnmpExampleHeartbeatRate .1.3.6.1.4.1.8072.2.3.0.1 "netSnmpExampleHeartbeatRate" Normal FORMAT SNMP netSnmpExampleHeartbeatRate EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result "$r" "snmp_traps" "$s" "$@" "" "netSnmpExampleHeartbeatRate" # EVENT nsNotifyStart .1.3.6.1.4.1.8072.4.0.1 "Status Events" Normal FORMAT An indication that the agent has started running. $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "An indication that the agent has started running. $*" SDESC An indication that the agent has started running. Variables: EDESC # # # EVENT nsNotifyShutdown .1.3.6.1.4.1.8072.4.0.2 "Status Events" Normal FORMAT An indication that the agent is in the process of being shut down. $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "An indication that the agent is in the process of being shut down. $*" SDESC An indication that the agent is in the process of being shut down. Variables: EDESC # # # EVENT nsNotifyRestart .1.3.6.1.4.1.8072.4.0.3 "Status Events" Normal FORMAT An indication that the agent has been restarted. $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "An indication that the agent has been restarted. $*" SDESC An indication that the agent has been restarted. This does not imply anything about whether the configuration has changed or not (unlike the standard coldStart or warmStart traps) Variables: EDESC # # # # EVENT pmNewRoleNotification .1.3.6.1.2.1.124.0.1 "Status Events" Normal FORMAT The pmNewRoleNotification is sent when an agent is configured $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "The pmNewRoleNotification is sent when an agent is configured $*" SDESC The pmNewRoleNotification is sent when an agent is configured with its first instance of a previously unused role string (not every time a new element is given a particular role). An instance of the pmRoleStatus object is sent containing the new roleString in its index. In the event that two or more elements are given the same role simultaneously, it is an implementation-dependent matter as to which pmRoleTable instance will be included in the notification. Variables: 1: pmRoleStatus EDESC # # # EVENT pmNewCapabilityNotification .1.3.6.1.2.1.124.0.2 "Status Events" Normal FORMAT The pmNewCapabilityNotification is sent when an agent $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "The pmNewCapabilityNotification is sent when an agent $*" SDESC The pmNewCapabilityNotification is sent when an agent gains a new capability that did not previously exist in any element on the system (not every time an element gains a particular capability). An instance of the pmCapabilitiesType object is sent containing the identity of the new capability. In the event that two or more elements gain the same capability simultaneously, it is an implementation-dependent matter as to which pmCapabilitiesType instance will be included in the notification. Variables: 1: pmCapabilitiesType EDESC # # # EVENT pmAbnormalTermNotification .1.3.6.1.2.1.124.0.3 "Status Events" Normal FORMAT The pmAbnormalTermNotification is sent when a policy's $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "The pmAbnormalTermNotification is sent when a policy's $*" SDESC The pmAbnormalTermNotification is sent when a policy's pmPolicyAbnormalTerminations gauge value changes from zero to any value greater than zero and no such notification has been sent for that policy in the last 5 minutes. The notification contains an instance of the pmTrackingPEInfo object where the pmPolicyIndex component of the index identifies the associated policy and the rest of the index identifies an element on which the policy failed. Variables: 1: pmTrackingPEInfo EDESC # # # EVENT mteTriggerFired .1.3.6.1.2.1.88.2.0.1 "Status Events" Normal FORMAT Notification that the trigger indicated by the object $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "Notification that the trigger indicated by the object $*" SDESC Notification that the trigger indicated by the object instances has fired, for triggers with mteTriggerType 'boolean' or 'existence'. Variables: 1: mteHotTrigger 2: mteHotTargetName 3: mteHotContextName 4: mteHotOID 5: mteHotValue EDESC # # # EVENT mteTriggerRising .1.3.6.1.2.1.88.2.0.2 "Status Events" Normal FORMAT Notification that the rising threshold was met for triggers $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "Notification that the rising threshold was met for triggers $*" SDESC Notification that the rising threshold was met for triggers with mteTriggerType 'threshold'. Variables: 1: mteHotTrigger 2: mteHotTargetName 3: mteHotContextName 4: mteHotOID 5: mteHotValue EDESC # # # EVENT mteTriggerFalling .1.3.6.1.2.1.88.2.0.3 "Status Events" Normal FORMAT Notification that the falling threshold was met for triggers $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "Notification that the falling threshold was met for triggers $*" SDESC Notification that the falling threshold was met for triggers with mteTriggerType 'threshold'. Variables: 1: mteHotTrigger 2: mteHotTargetName 3: mteHotContextName 4: mteHotOID 5: mteHotValue EDESC # # # EVENT mteTriggerFailure .1.3.6.1.2.1.88.2.0.4 "Status Events" Normal FORMAT Notification that an attempt to check a trigger has failed. $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "Notification that an attempt to check a trigger has failed. $*" SDESC Notification that an attempt to check a trigger has failed. The network manager must enable this notification only with a certain fear and trembling, as it can easily crowd out more important information. It should be used only to help diagnose a problem that has appeared in the error counters and can not be found otherwise. Variables: 1: mteHotTrigger 2: mteHotTargetName 3: mteHotContextName 4: mteHotOID 5: mteFailedReason EDESC # # # EVENT mteEventSetFailure .1.3.6.1.2.1.88.2.0.5 "Status Events" Normal FORMAT Notification that an attempt to do a set in response to an $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "Notification that an attempt to do a set in response to an $*" SDESC Notification that an attempt to do a set in response to an event has failed. The network manager must enable this notification only with a certain fear and trembling, as it can easily crowd out more important information. It should be used only to help diagnose a problem that has appeared in the error counters and can not be found otherwise. Variables: 1: mteHotTrigger 2: mteHotTargetName 3: mteHotContextName 4: mteHotOID 5: mteFailedReason EDESC # # # EVENT mteTriggerFired .1.3.6.1.2.1.88.2.0.1 "Status Events" Normal FORMAT Notification that the trigger indicated by the object $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "Notification that the trigger indicated by the object $*" SDESC Notification that the trigger indicated by the object instances has fired, for triggers with mteTriggerType 'boolean' or 'existence'. Variables: 1: mteHotTrigger 2: mteHotTargetName 3: mteHotContextName 4: mteHotOID 5: mteHotValue EDESC # # # EVENT mteTriggerRising .1.3.6.1.2.1.88.2.0.2 "Status Events" Normal FORMAT Notification that the rising threshold was met for triggers $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "Notification that the rising threshold was met for triggers $*" SDESC Notification that the rising threshold was met for triggers with mteTriggerType 'threshold'. Variables: 1: mteHotTrigger 2: mteHotTargetName 3: mteHotContextName 4: mteHotOID 5: mteHotValue EDESC # # # EVENT mteTriggerFalling .1.3.6.1.2.1.88.2.0.3 "Status Events" Normal FORMAT Notification that the falling threshold was met for triggers $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "Notification that the falling threshold was met for triggers $*" SDESC Notification that the falling threshold was met for triggers with mteTriggerType 'threshold'. Variables: 1: mteHotTrigger 2: mteHotTargetName 3: mteHotContextName 4: mteHotOID 5: mteHotValue EDESC # # # EVENT mteTriggerFailure .1.3.6.1.2.1.88.2.0.4 "Status Events" Normal FORMAT Notification that an attempt to check a trigger has failed. $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "Notification that an attempt to check a trigger has failed. $*" SDESC Notification that an attempt to check a trigger has failed. The network manager must enable this notification only with a certain fear and trembling, as it can easily crowd out more important information. It should be used only to help diagnose a problem that has appeared in the error counters and can not be found otherwise. Variables: 1: mteHotTrigger 2: mteHotTargetName 3: mteHotContextName 4: mteHotOID 5: mteFailedReason EDESC # # # EVENT mteEventSetFailure .1.3.6.1.2.1.88.2.0.5 "Status Events" Normal FORMAT Notification that an attempt to do a set in response to an $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 1 "Notification that an attempt to do a set in response to an $*" SDESC Notification that an attempt to do a set in response to an event has failed. The network manager must enable this notification only with a certain fear and trembling, as it can easily crowd out more important information. It should be used only to help diagnose a problem that has appeared in the error counters and can not be found otherwise. Variables: 1: mteHotTrigger 2: mteHotTargetName 3: mteHotContextName 4: mteHotOID 5: mteFailedReason EDESC
Active Checking
One way to check the fast path by SNMP is to retrieve all processes running on the host by SNMP and then to check the presence of the process for the fast path.
For that task, we’ll use the HOST-RESOURCES-MIB::hrSWRunName OID and the check_snmp_process Nagios plugin.
Installation of check_snmp_process Plugin
Like previously for InfluxDB, we also need to add a plugin and command definition to Nagios Core to check processes by snmp. We will use the check_snmp_process plugin.
First we install its dependencies with:
sudo apt install libsnmp-dev
And then download check_snmp_process from https://sourceforge.net/projects/nagios-snmp/ and compile.
./configure make make install
The check_snmp_process executable has to be also copied in the default Nagios’ plugin folder: /usr/lib/nagios/plugins/.
Usage example, to launch a critical alert if there is less than one process of “fp-rte” or uses more than 200 Mb of memory:
./check_snmp_process -H 10.0.2.1 -C public -n fp-rte -w 0,1 -c 0,2 -m 100,200
The main parameters are:
- -n, –name=NAME Name of the process (regexp)
- -w, –warn=MIN[,MAX] Number of processes that will cause a warning
- -c, –critical=MIN[,MAX] Number of processes that will cause an error
(-1 for no critical, MAX must be >0. Ex : -c -1,50) - -m, –memory=WARN,CRIT Checks memory usage (default max of all process) values are warning and critical values in MB
Nagios Command Definition
As with the InfluxDB plugin, we add a new command definition in /etc/nagios4/objects/commands.cfg:
# 'snmp/check_snmp_process' command definition define command { command_name check_snmp_process command_line $USER1$/check_snmp_process -H $HOSTADDRESS$ -C $ARG1$ -n $ARG2$ -w $ARG3$ -c $ARG4$ -m $ARG5$ }
Nagios Service Definition
In the host definition of our vRouter, we add one service with arguments to check the presence of the fast path process (called fp-rte) and normal memory consumption (2GB before warning alert and less than 3GB before critical alert).
In /etc/nagios4/objects/myvrouter.cfg:
define service { Use service Host_name MyvRouter Service_description Check Process Fast Path Check_command check_snmp_process!public!fp-rte!0,1!0,2!2000,3000 }
Checking the Monitoring
After adding new definitions and plugins, Nagios needs to be restarted, with the command:
systemctl restart nagios
Connect on the Nagios Core server (http://10.0.2.2/nagios4) and navigate to the MyvRouter host.
It will show the services monitored and their states.
Each monitored service is detailed in their respective page.
Here is CPU load, checked from InfluxDB, below the warning limit :
Also for active checking of fast path:
And the service monitoring SNMP traps. Here, a trap was received, notifying a missing NTP server.
Conclusion
Customers using Nagios with 6WIND vRouter can add many monitoring services to their routing infrastructure. With the same basic principles, additional InfluxDB data can also be monitored. We welcome you to check the KPIs section in our vRouter documentation to see additional available KPIs.
Contact us today for a free 6WIND vRouter evaluation, with zero obligation. We look forward to hearing from you.
Michel Galle manages 6WIND IT.