Simple Network Management Protocol - Not As Simple As the Name Suggests
The Simple Network Management Protocol (SNMP) has been an integral part of monitoring network environments since its introduction in 1988. It has established itself as the de facto standard in network monitoring. Many manufacturers support the protocol and have implemented an SNMP agent on their network devices. These agents allow monitoring solutions to query various data, such as bandwidth, CPU load, network interfaces, etc., without installing an additional agent on network equipment.
Especially with the increasing number of devices on a network, a simple and established method such as SNMP sounds like a great help to include components in monitoring quickly. Unfortunately, SNMP has a few flaws. The first part of this article will explain how SNMP works, while the second part will drill deeper into the issues with SNMP and how to deal with them.
The protocol offers two methods to retrieve data from devices: polling and traps. With SNMP polling, a monitoring solution queries the data at user-specified time intervals from the SNMP agent. This active polling is used for status-based monitoring and is generally the recommended method. However, the disadvantage of SNMP polling is that the administrator does not notice if an event occurs between two queries, such as a brief change in the network interface status.
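The blind spot between two polls can be illustrated with a small simulation. This is only a sketch: no SNMP traffic is involved, and the function names (`interface_is_up`, `poll`) as well as the outage window of 70 to 100 seconds are hypothetical values chosen for illustration.

```python
def interface_is_up(t: int) -> bool:
    """Hypothetical interface state: down from t=70s up to (not including) t=100s."""
    return not (70 <= t < 100)


def poll(interval_s: int, duration_s: int) -> list:
    """Sample the interface state every interval_s seconds, as a polling monitor would."""
    return [(t, interface_is_up(t)) for t in range(0, duration_s, interval_s)]


# Polling once a minute samples at t=0, 60, 120, 180 and never sees the outage.
minute_samples = poll(interval_s=60, duration_s=240)
outage_seen_minutely = any(not up for _, up in minute_samples)

# Polling every 10 seconds catches the outage, at the cost of six times as many queries.
fast_samples = poll(interval_s=10, duration_s=240)
outage_seen_fast = any(not up for _, up in fast_samples)

print(outage_seen_minutely, outage_seen_fast)  # False True
```

The 30-second outage falls entirely between the polls at 60 and 120 seconds, so the one-minute schedule reports an interface that was apparently up the whole time.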
The alternative to SNMP polling is an event-based variant called SNMP traps. If a certain event occurs on the monitored device, it sends an error message to the monitoring instance. One of the disadvantages of SNMP traps is that the data packets transmitted via UDP can be lost. Since UDP does not acknowledge receipt of network packets, the administrator does not even know that an alert was sent if the packets containing the trap data are dropped. Thus, ironically, a problem on the network prevents the detection of another issue with a network device.
Another disadvantage of SNMP traps can be the flood of triggered messages. For example, if a core switch is no longer available in a large network environment, thousands of switches may send traps at once. If it does not have an upstream filter mechanism, the trap receiver can collapse under such a load of error messages. Monitoring is then unavailable in an emergency. In addition, the administrator must reconfigure all components in the network if the IP address of the trap receiver changes.
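What an upstream filter could look like is sketched below as a simple fixed-window limiter. The class `TrapFilter`, its parameters and the idea of keying on the trap type are assumptions made for this illustration; real trap receivers use their own, usually more elaborate, mechanisms.

```python
from collections import defaultdict


class TrapFilter:
    """Hypothetical filter: forwards at most max_per_window traps per key and window."""

    def __init__(self, max_per_window: int, window_s: float):
        self.max_per_window = max_per_window
        self.window_s = window_s
        self._window_start = {}              # key -> start time of the current window
        self._forwarded = defaultdict(int)   # key -> traps forwarded in the current window
        self.suppressed = defaultdict(int)   # key -> traps dropped so far (for reporting)

    def allow(self, key: str, now: float) -> bool:
        """Return True if the trap should be passed on to the receiver."""
        start = self._window_start.get(key)
        if start is None or now - start >= self.window_s:
            # Open a new window for this key and reset its budget.
            self._window_start[key] = now
            self._forwarded[key] = 0
        if self._forwarded[key] < self.max_per_window:
            self._forwarded[key] += 1
            return True
        self.suppressed[key] += 1
        return False


# 5000 "linkDown" traps arrive within five seconds after a core switch fails.
flt = TrapFilter(max_per_window=100, window_s=10.0)
forwarded = sum(flt.allow("linkDown", now=0.001 * i) for i in range(5000))
print(forwarded, flt.suppressed["linkDown"])  # 100 4900
```

The receiver only has to process 100 messages, while the suppressed count still tells the administrator how large the event was. The trade-off is that individual traps are discarded, which is one more reason to combine traps with polling.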
Three protocol variants
In the more than 30 years of its existence, three variants of the standard have emerged. SNMPv1 is only used on ancient devices or on those that support SNMPv2c only poorly or not at all. Compared to v1, SNMPv2c adds, for example, bulk queries, which allow multiple values to be retrieved in a single request, and 64-bit counters, which are indispensable for monitoring switch ports with over 1 Gbit/s of bandwidth. The "c" in the version name stands for community, which takes on the role of a password in SNMP. v2c is the most commonly used protocol variant of SNMP.
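Why 64-bit counters matter on fast links can be checked with a little arithmetic. The sketch below assumes a counter that counts octets (bytes), as the interface traffic counters do, and a link running at full line rate; the helper names `wrap_time_s` and `counter_delta` are made up for this example.

```python
def wrap_time_s(counter_bits: int, link_bits_per_s: float) -> float:
    """Seconds until an octet counter of the given width wraps at full line rate."""
    return (2 ** counter_bits) * 8 / link_bits_per_s


def counter_delta(previous: int, current: int, counter_bits: int) -> int:
    """Octets counted between two polls, assuming at most one wrap in between."""
    return (current - previous) % (2 ** counter_bits)


print(round(wrap_time_s(32, 1e9), 1))    # 34.4 seconds at 1 Gbit/s
print(round(wrap_time_s(32, 10e9), 1))   # 3.4 seconds at 10 Gbit/s

# A single wrap between two polls can still be corrected ...
print(counter_delta(4294967000, 500, 32))  # 796

# ... but with one poll per minute, a 32-bit counter on a saturated 1 Gbit/s
# link can wrap more than once, and the true delta is no longer recoverable.
print(wrap_time_s(32, 1e9) < 60)  # True
```

A 64-bit counter on the same link would take thousands of years to wrap, which is why it removes the ambiguity in practice.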
Since the network communication in v1 and v2c, including the community, is in plain text, SNMPv3 introduced robust authentication and encryption. v3 defines three security levels: noAuthNoPriv (no authentication, no privacy), authNoPriv (authentication, no privacy) and authPriv (authentication and privacy). However, encryption under v3 requires significantly more processing power on the monitored devices, while authentication entails additional configuration effort for the administrator. Depending on the scenario, it may make more sense to use SNMPv2c for monitoring or to move the monitoring traffic to a separate VLAN.
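How the three security levels follow from the configured credentials can be sketched as follows. The `V3Credentials` class and the `security_level` function are hypothetical and not part of any SNMP library; the level names use the spelling from the SNMPv3 specification (noAuthNoPriv, authNoPriv, authPriv). Note that there is no level offering privacy without authentication.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class V3Credentials:
    """Hypothetical container for the SNMPv3 settings of one monitored device."""
    user: str
    auth_key: Optional[str] = None   # authentication passphrase, if configured
    priv_key: Optional[str] = None   # privacy (encryption) passphrase, if configured


def security_level(creds: V3Credentials) -> str:
    """Derive the SNMPv3 security level from the credentials that are present."""
    if creds.priv_key and not creds.auth_key:
        raise ValueError("privacy without authentication is not a valid SNMPv3 level")
    if creds.auth_key and creds.priv_key:
        return "authPriv"
    if creds.auth_key:
        return "authNoPriv"
    return "noAuthNoPriv"


print(security_level(V3Credentials("monitor")))                      # noAuthNoPriv
print(security_level(V3Credentials("monitor", auth_key="a-secret")))  # authNoPriv
print(security_level(V3Credentials("monitor", "a-secret", "p-secret")))  # authPriv
```

A check like this at configuration time catches the invalid combination before a device is added to monitoring.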
Bad implementation on devices
Monitoring with SNMP relies on an implementation of the protocol by the manufacturers of the monitored devices. Unfortunately, this implementation is often poor, sometimes even contradicts the protocol specification, and often contains programming and security errors. As a result, the data queried via SNMP may be inconsistent or faulty, and bulk queries may crash or take so long that they time out.
Time and again, you find cases where manufacturers give incorrect unit information and don't document it. For example, they use degrees Fahrenheit instead of degrees Celsius or do not mention that the transmitted value corresponds to 1/10 °C. Sensors provide a further example: an output value of 0 °C can be an accurate reading for an ambient sensor, but it may also mean that the temperature sensor is defective or not accessible. If this information is missing from the documentation, it leads to misinterpretations and false alarms. The data must, therefore, always be seen in the correct context.
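Putting a raw value into the correct context could look roughly like the sketch below. The `SensorProfile` class, its fields and the example profiles are assumptions for illustration; in practice, the scale, the unit and any "no reading" values have to be determined per device model, often by testing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SensorProfile:
    """Hypothetical per-device knowledge that the vendor documentation may omit."""
    divisor: int = 1                      # 10 if the device reports tenths of a degree
    unit: str = "C"                       # "C" or "F"
    invalid_raw: frozenset = frozenset()  # raw values that mean "no reading"


def to_celsius(raw: int, profile: SensorProfile) -> Optional[float]:
    """Convert a raw sensor value to degrees Celsius, or None if it is not a reading."""
    if raw in profile.invalid_raw:
        return None
    value = raw / profile.divisor
    if profile.unit == "F":
        value = (value - 32) * 5 / 9
    return value


tenths = SensorProfile(divisor=10)
fahrenheit = SensorProfile(unit="F")
zero_means_missing = SensorProfile(invalid_raw=frozenset({0}))

print(to_celsius(235, tenths))             # 23.5
print(to_celsius(77, fahrenheit))          # 25.0
print(to_celsius(0, zero_means_missing))   # None
print(to_celsius(0, SensorProfile()))      # 0.0, a genuine reading on this device
```

The same raw value of 0 yields either a valid temperature or no value at all, depending on the profile, which is exactly the context the paragraph above calls for.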
In addition to the security and implementation difficulties already mentioned, SNMP quickly reaches its performance limits. If an administrator wants to set up near real-time monitoring, they must set much shorter SNMP polling intervals than the usual one poll per minute. Shorter time windows between the queries, however, lead to a significant increase in the load on the components in the network.
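The effect of the polling interval on load can be estimated with a back-of-the-envelope calculation. The figures below (1000 devices, 120 polled values per device) are invented for illustration, and the sketch ignores that bulk queries can fetch several values per request.

```python
def queries_per_second(devices: int, values_per_device: int, interval_s: float) -> float:
    """Average number of values the monitoring system must fetch per second."""
    return devices * values_per_device / interval_s


# Hypothetical environment: 1000 devices with 120 polled values each.
print(queries_per_second(1000, 120, interval_s=60))  # 2000.0
print(queries_per_second(1000, 120, interval_s=5))   # 24000.0
```

Moving from one poll per minute to one poll every five seconds multiplies the load by twelve, both for the monitoring server and for the SNMP agents on the devices.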
In some cases, it may also make sense not to poll a particular value in an emergency, at least if polling on a device makes monitoring impossible due to poor implementation or some other reason. Again, however, this contradicts the approach that you should monitor all objects in your network environment.
When you are monitoring with SNMP, it makes sense to use monitoring software that can deal with such shortcomings. The solution should initiate meaningful data queries for the detected device on its own and automatically correct faulty values. Of course, the development of such software requires corresponding know-how, both of SNMP and the monitored device. However, this saves the administrator from manually configuring their network components for monitoring and prevents them from receiving inaccurate data. This minimizes the administration effort even for large network environments, and the IT manager can be sure that their decisions are based on correct monitoring data.
No worthy successor expected soon
By IT industry standards, SNMP is venerable. Given the weaknesses described above, it is surprising that there is no serious successor to SNMP that meets the requirements of modern infrastructures, especially with the increasing prevalence of software-defined networking and machine learning. Redfish, introduced in 2015, shows that device monitoring can be done differently. The Redfish protocol takes a unified approach to enabling remote access to server platforms via a REST API. The goal of Redfish is to replace the Intelligent Platform Management Interface (IPMI), which is now considered a security risk.
Some manufacturers work with proprietary interfaces in the network monitoring area, but integrating them into monitoring platforms requires a certain amount of development work. This heterogeneity also increases the effort for the administrator if the monitoring software in use does not already support these interfaces.
Streaming telemetry could be a promising candidate for the future. In this case, the network components such as routers, switches, etc., continuously stream data to the monitoring instance. Thus, the administrator can determine which information they need, at which frequency and from which device or application. The advantage of this approach is that real-time monitoring is possible, and the data is already prepared for AI and ML-based analysis. Streaming telemetry could therefore help drive automation, troubleshooting and traffic optimization in large network environments.
Some manufacturers such as Arista, Cisco, and Juniper are currently working on their own streaming telemetry projects. However, standardization is still a long way off. So, it remains to be seen whether streaming telemetry will gain relevance and whether standardization will then occur.
Indispensable despite its advanced age
Despite the age of the protocol and its shortcomings, SNMP will still be in use for some time. Even if an alternative is established, it will not replace SNMP overnight; SNMP will continue to be used in parallel for years to come. However, it should now be clear that monitoring with SNMP using a one-size-fits-all approach does not make sense for the reasons outlined above. A monitoring solution that only performs standard queries to read out the SNMP data structure can quickly turn out to be a gamble because values may be transmitted incorrectly.