Mellanox SNMP
Macros used
| Name | Value |
|---|---|
| {$CPU.UTIL.CRIT} | 90 |
| {$FAN_CRIT_STATUS} | 3 |
| {$ICMP.LOSS.WARN} | 20 |
| {$ICMP.RESPONSE_TIME.WARN} | 0.15 |
| {$ICMP_LOSS_WARN} | 20 |
| {$ICMP_RESPONSE_TIME_WARN} | 0.15 |
| {$IF.ERRORS.WARN} | 2 |
| {$IF.UTIL.MAX} | 90 |
| {$IFCONTROL} | 1 |
| {$MEMORY.NAME.MATCHES} | .* |
| {$MEMORY.NAME.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$MEMORY.TYPE.MATCHES} | `.*(\.2\|hrStorageRam)$` |
| {$MEMORY.TYPE.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$MEMORY.UTIL.MAX} | 90 |
| {$NET.IF.IFADMINSTATUS.MATCHES} | ^.* |
| {$NET.IF.IFADMINSTATUS.NOT_MATCHES} | ^2$ |
| {$NET.IF.IFALIAS.MATCHES} | .* |
| {$NET.IF.IFALIAS.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$NET.IF.IFDESCR.MATCHES} | .* |
| {$NET.IF.IFDESCR.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$NET.IF.IFNAME.MATCHES} | ^.*$ |
| {$NET.IF.IFNAME.NOT_MATCHES} | `(^Software Loopback Interface\|^NULL[0-9.]*$\|^[Ll]o[0-9.]*$\|^[Ss]ystem$\|^Nu[0-9.]*$\|^veth[0-9a-z]+$\|docker[0-9]+\|br-[a-z0-9]{12})` |
| {$NET.IF.IFOPERSTATUS.MATCHES} | ^.*$ |
| {$NET.IF.IFOPERSTATUS.NOT_MATCHES} | ^6$ |
| {$NET.IF.IFTYPE.MATCHES} | .* |
| {$NET.IF.IFTYPE.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$PSU.STATUS.CRIT} | 2 |
| {$SNMP.TIMEOUT} | 5m |
| {$TEMP.MAX.CRIT} | 60 |
| {$TEMP.MAX.WARN} | 50 |
| {$TEMP.MIN.CRIT} | 5 |
| {$TEMP.STATUS.WARN} | 3 |
| {$VFS.FS.FREE.MIN.CRIT} | 5G |
| {$VFS.FS.FREE.MIN.WARN} | 10G |
| {$VFS.FS.FSNAME.MATCHES} | .+ |
| {$VFS.FS.FSNAME.NOT_MATCHES} | `^(/dev\|/sys\|/$\|/run\|/proc\|.+/shm$)` |
| {$VFS.FS.FSTYPE.MATCHES} | `.*(\.4\|\.9\|hrStorageFixedDisk\|hrStorageFlashMemory)$` |
| {$VFS.FS.FSTYPE.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$VFS.FS.PUSED.MAX.CRIT} | 90 |
| {$VFS.FS.PUSED.MAX.WARN} | 80 |
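
The paired `MATCHES` / `NOT_MATCHES` macros are regular expressions used as discovery filters. The following is a minimal Python sketch of how such a pair might behave, assuming an entity is kept only when its value matches the first pattern and does not match the second, and that all filter conditions are combined with AND. The function names `passes_filter` and `interface_is_discovered` are hypothetical, and Zabbix evaluates PCRE rather than Python's `re`, so edge cases may differ.

```python
import re


def passes_filter(value: str, matches: str, not_matches: str) -> bool:
    """Keep a value that matches `matches` and does not match `not_matches`."""
    return (
        re.search(matches, value) is not None
        and re.search(not_matches, value) is None
    )


# Default values taken from the macro table.
ADMIN_MATCHES, ADMIN_NOT_MATCHES = r"^.*", r"^2$"   # excludes ifAdminStatus down(2)
OPER_MATCHES, OPER_NOT_MATCHES = r"^.*$", r"^6$"    # excludes ifOperStatus notPresent(6)


def interface_is_discovered(if_admin_status: int, if_oper_status: int) -> bool:
    """Assumed AND of the admin-status and oper-status filter pairs."""
    return passes_filter(
        str(if_admin_status), ADMIN_MATCHES, ADMIN_NOT_MATCHES
    ) and passes_filter(str(if_oper_status), OPER_MATCHES, OPER_NOT_MATCHES)
```

With the defaults, an interface that is administratively down or reported as notPresent is skipped, while `CHANGE_IF_NEEDED` acts as a placeholder that ordinary names do not match, so it excludes nothing until it is replaced.
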
Items collected
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| ICMP ping | - | SIMPLE | - | icmpping |
| ICMP loss | - | SIMPLE | - | icmppingloss |
| ICMP response time | - | SIMPLE | - | icmppingsec |
| SNMP traps (fallback) | The item is used to collect all SNMP traps unmatched by other snmptrap items | SNMP_TRAP | - | snmptrap.fallback |
| System contact details | MIB: SNMPv2-MIB The textual identification of the contact person for this managed node, together with information on how to contact this person. If no contact information is known, the value is the zero-length string. | SNMP_AGENT | 15m | system.contact[sysContact.0] |
| CPU utilization | MIB: HOST-RESOURCES-MIB The average, over the last minute, of the percentage of time that the processors were not idle. Implementations may approximate this one minute smoothing period if necessary. | SNMP_AGENT | - | system.cpu.util |
| System description | MIB: SNMPv2-MIB A textual description of the entity. This value should include the full name and version identification of the system's hardware type, software operating-system, and networking software. | SNMP_AGENT | 15m | system.descr[sysDescr.0] |
| System location | MIB: SNMPv2-MIB The physical location of this node (e.g., 'telephone closet, 3rd floor'). If the location is unknown, the value is the zero-length string. | SNMP_AGENT | 15m | system.location[sysLocation.0] |
| System name | MIB: SNMPv2-MIB An administratively-assigned name for this managed node. By convention, this is the node's fully-qualified domain name. If the name is unknown, the value is the zero-length string. | SNMP_AGENT | 15m | system.name |
| System object ID | MIB: SNMPv2-MIB The vendor's authoritative identification of the network management subsystem contained in the entity. This value is allocated within the SMI enterprises subtree (1.3.6.1.4.1) and provides an easy and unambiguous means for determining 'what kind of box' is being managed. For example, if vendor 'Flintstones, Inc.' was assigned the subtree 1.3.6.1.4.1.4242, it could assign the identifier 1.3.6.1.4.1.4242.1.1 to its 'Fred Router'. | SNMP_AGENT | 15m | system.objectid[sysObjectID.0] |
| Uptime | MIB: SNMPv2-MIB The time (in hundredths of a second) since the network management portion of the system was last re-initialized. | SNMP_AGENT | 30s | system.uptime[sysUpTime.0] |
| SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. | INTERNAL | - | zabbix[host,snmp,available] |
Triggers
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| Unavailable by ICMP ping | Last three attempts returned timeout. Please check device connectivity. | max(/Mellanox SNMP/icmpping,#3)=0 | HIGH ⛔ | ICMP ping |
| High ICMP ping loss | - | min(/Mellanox SNMP/icmppingloss,5m)>{$ICMP_LOSS_WARN} and min(/Mellanox SNMP/icmppingloss,5m)<100 | WARNING 📢 | ICMP loss |
| High ICMP ping response time | - | avg(/Mellanox SNMP/icmppingsec,5m)>{$ICMP_RESPONSE_TIME_WARN} | WARNING 📢 | ICMP response time |
| High CPU utilization | CPU utilization is too high. The system might be slow to respond. | min(/Mellanox SNMP/system.cpu.util,5m)>{$CPU.UTIL.CRIT} | WARNING 📢 | CPU utilization |
| System name has changed | System name has changed. Ack to close. | last(/Mellanox SNMP/system.name,#1)<>last(/Mellanox SNMP/system.name,#2) and length(last(/Mellanox SNMP/system.name))>0 | INFO 🔔 | System name |
| has been restarted | Uptime is less than 10 minutes. | last(/Mellanox SNMP/system.uptime[sysUpTime.0])<10m | WARNING 📢 | Uptime |
| No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. | max(/Mellanox SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 | WARNING 📢 | SNMP agent availability |
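
The ICMP trigger expressions above can be read as simple predicates over recent samples. The sketch below is an illustrative Python rendering, not Zabbix code; the function names are hypothetical and the default threshold is the `{$ICMP_LOSS_WARN}` value from the macro table.

```python
def unavailable_by_icmp(last_three_pings: list[int]) -> bool:
    """max(icmpping,#3)=0: all of the last three checks failed."""
    return max(last_three_pings) == 0


def high_icmp_loss(loss_samples_5m: list[float], warn: float = 20.0) -> bool:
    """min(icmppingloss,5m) > warn and min(icmppingloss,5m) < 100.

    Using min() means every sample in the window must exceed the
    threshold; the < 100 bound keeps the trigger quiet when the lowest
    sample is already total loss.
    """
    lowest = min(loss_samples_5m)
    return warn < lowest < 100
```

A single good sample inside the five-minute window is enough to keep the loss trigger from firing, which presumably damps short bursts of packet loss.
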
Discovery rule №1
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Entity Discovery | - | SNMP_AGENT | 1h | entity.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| {#ENT_NAME}: Hardware model name | MIB: ENTITY-MIB | SNMP_AGENT | 1h | system.hw.model[entPhysicalModelName.{#SNMPINDEX}] |
| {#ENT_NAME}: Hardware serial number | MIB: ENTITY-MIB | SNMP_AGENT | 1h | system.hw.serialnumber[entPhysicalSerialNum.{#SNMPINDEX}] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| {#ENT_NAME}: Device has been replaced | Device serial number has changed. Ack to close | last(/Mellanox SNMP/system.hw.serialnumber[entPhysicalSerialNum.{#SNMPINDEX}],#1)<>last(/Mellanox SNMP/system.hw.serialnumber[entPhysicalSerialNum.{#SNMPINDEX}],#2) and length(last(/Mellanox SNMP/system.hw.serialnumber[entPhysicalSerialNum.{#SNMPINDEX}]))>0 | INFO 🔔 | {#ENT_NAME}: Hardware serial number |
Discovery rule №2
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Fan Discovery | ENTITY-SENSORS-MIB::EntitySensorDataType discovery with rpm filter | SNMP_AGENT | 1h | fan.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| {#SENSOR_INFO}: Fan speed | MIB: ENTITY-SENSORS-MIB The most recent measurement obtained by the agent for this sensor. To correctly interpret the value of this object, the associated entPhySensorType, entPhySensorScale, and entPhySensorPrecision objects must also be examined. | SNMP_AGENT | - | sensor.fan.speed[entPhySensorValue.{#SNMPINDEX}] |
| {#SENSOR_INFO}: Fan status | MIB: ENTITY-SENSORS-MIB The operational status of the sensor {#SENSOR_INFO} | SNMP_AGENT | 3m | sensor.fan.status[entPhySensorOperStatus.{#SNMPINDEX}] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| {#SENSOR_INFO}: Fan is in critical state | Please check the fan unit | count(/Mellanox SNMP/sensor.fan.status[entPhySensorOperStatus.{#SNMPINDEX}],#1,"eq","{$FAN_CRIT_STATUS}")=1 | AVERAGE ⚠ | {#SENSOR_INFO}: Fan status |
Discovery rule №3
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Network interfaces discovery | Discovering interfaces from IF-MIB. | SNMP_AGENT | 1h | net.if.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Interface {#IFNAME}({#IFALIAS}): Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. | SNMP_AGENT | 3m | net.if.in.discards[ifInDiscards.{#SNMPINDEX}] |
| Interface {#IFNAME}({#IFALIAS}): Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. | SNMP_AGENT | 3m | net.if.in.errors[ifInErrors.{#SNMPINDEX}] |
| Interface {#IFNAME}({#IFALIAS}): Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. | SNMP_AGENT | 3m | net.if.in[ifHCInOctets.{#SNMPINDEX}] |
| Interface {#IFNAME}({#IFALIAS}): Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. | SNMP_AGENT | 3m | net.if.out.discards[ifOutDiscards.{#SNMPINDEX}] |
| Interface {#IFNAME}({#IFALIAS}): Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. | SNMP_AGENT | 3m | net.if.out.errors[ifOutErrors.{#SNMPINDEX}] |
| Interface {#IFNAME}({#IFALIAS}): Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. | SNMP_AGENT | 3m | net.if.out[ifHCOutOctets.{#SNMPINDEX}] |
| Interface {#IFNAME}({#IFALIAS}): Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of 'n', then the speed of the interface is somewhere in the range of 'n-500,000' to 'n+499,999'. For interfaces which do not vary in bandwidth or for those where no accurate estimation can be made, this object should contain the nominal bandwidth. For a sub-layer which has no concept of bandwidth, this object should be zero. | SNMP_AGENT | 5m | net.if.speed[ifHighSpeed.{#SNMPINDEX}] |
| Interface {#IFNAME}({#IFALIAS}): Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. | SNMP_AGENT | - | net.if.status[ifOperStatus.{#SNMPINDEX}] |
| Interface {#IFNAME}({#IFALIAS}): Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. | SNMP_AGENT | 1h | net.if.type[ifType.{#SNMPINDEX}] |
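
The "Bits received" and "Bits sent" items are named in bits, while the underlying IF-MIB counters count octets, and ifHighSpeed is expressed in units of 1,000,000 bits per second. The Python sketch below shows one way the conversion could work, assuming a per-second delta multiplied by 8; the table does not state the template's actual preprocessing, and the function names are hypothetical.

```python
from typing import Optional


def bits_per_second(
    prev_octets: int, prev_time: float, curr_octets: int, curr_time: float
) -> Optional[float]:
    """Convert two octet-counter readings into a bit rate.

    Returns None when the counter went backwards (a discontinuity such
    as a re-initialization) or when no time has elapsed.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0 or curr_octets < prev_octets:
        return None
    return (curr_octets - prev_octets) * 8 / elapsed


def speed_in_bps(if_high_speed: int) -> int:
    """ifHighSpeed is reported in units of 1,000,000 bits per second."""
    return if_high_speed * 1_000_000
```

For example, 22,500,000 octets received over one 3-minute polling interval corresponds to an average of 1 Mbit/s.
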
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| Interface {#IFNAME}({#IFALIAS}): Link down | This trigger expression works as follows: 1. It can be triggered if the operational status is down. 2. {$IFCONTROL:"{#IFNAME}"}=1 - the user can redefine the context macro to 0, which marks this interface as not important. No new trigger will be fired if this interface is down. 3. {TEMPLATE_NAME:METRIC.diff()}=1 - the trigger fires only if the operational status was up(1) sometime before (so it does not fire for 'eternal off' interfaces). WARNING: if closed manually, it won't fire again on the next poll, because of .diff. | {$IFCONTROL:"{#IFNAME}"}=1 and last(/Mellanox SNMP/net.if.status[ifOperStatus.{#SNMPINDEX}])=2 and (last(/Mellanox SNMP/net.if.status[ifOperStatus.{#SNMPINDEX}],#1)<>last(/Mellanox SNMP/net.if.status[ifOperStatus.{#SNMPINDEX}],#2)) | AVERAGE ⚠ | Interface {#IFNAME}({#IFALIAS}): Operational status |
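
The three parts of the Link down expression can be sketched as a single predicate. This is an illustrative Python rendering under the assumption that `current` and `previous` are the two most recent ifOperStatus values; the function name `link_down` is hypothetical.

```python
IF_OPER_DOWN = 2  # ifOperStatus down(2)


def link_down(ifcontrol: int, current: int, previous: int) -> bool:
    """True when the interface is monitored, is down now, and just changed state."""
    return ifcontrol == 1 and current == IF_OPER_DOWN and current != previous
```

Because the last clause compares only the two latest values, an interface that stays down on consecutive polls satisfies the condition only on the poll where the status changed.
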
Discovery rule №4
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| PSU Discovery | - | SNMP_AGENT | 1h | psu.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| {#ENT_NAME}: Power supply status | MIB: ENTITY-STATE-MIB | SNMP_AGENT | 3m | sensor.psu.status[entStateOper.{#SNMPINDEX}] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| {#ENT_NAME}: Power supply is in critical state | Please check the power supply unit for errors | count(/Mellanox SNMP/sensor.psu.status[entStateOper.{#SNMPINDEX}],#1,"eq","{$PSU.STATUS.CRIT}")=1 | AVERAGE ⚠ | {#ENT_NAME}: Power supply status |
Discovery rule №5
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Temperature Discovery | ENTITY-SENSORS-MIB::EntitySensorDataType discovery with temperature filter | SNMP_AGENT | 1h | temp.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| {#SENSOR_INFO}: Temperature status | MIB: ENTITY-SENSORS-MIB The operational status of the sensor {#SENSOR_INFO}. Possible values: - ok(1) indicates that the agent can obtain the sensor value. - unavailable(2) indicates that the agent presently cannot obtain the sensor value. - nonoperational(3) indicates that the agent believes the sensor is broken. The sensor could have a hard failure (disconnected wire), or a soft failure such as out-of-range, jittery, or wildly fluctuating readings. | SNMP_AGENT | 3m | sensor.temp.status[entPhySensorOperStatus.{#SNMPINDEX}] |
| {#SENSOR_INFO}: Temperature | MIB: ENTITY-SENSORS-MIB The most recent measurement obtained by the agent for this sensor. To correctly interpret the value of this object, the associated entPhySensorType, entPhySensorScale, and entPhySensorPrecision objects must also be examined. | SNMP_AGENT | 3m | sensor.temp.value[entPhySensorValue.{#SNMPINDEX}] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| {#SENSOR_INFO}: Temperature is above critical threshold | This trigger uses temperature sensor values as well as temperature sensor status if available | avg(/Mellanox SNMP/sensor.temp.value[entPhySensorValue.{#SNMPINDEX}],5m)>{$TEMP.MAX.CRIT:"{#SENSOR_INFO}"} | HIGH ⛔ | {#SENSOR_INFO}: Temperature |
| {#SENSOR_INFO}: Temperature is too low | - | avg(/Mellanox SNMP/sensor.temp.value[entPhySensorValue.{#SNMPINDEX}],5m)<{$TEMP.MIN.CRIT:"{#SENSOR_INFO}"} | AVERAGE ⚠ | {#SENSOR_INFO}: Temperature |
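
Both temperature triggers use context macros such as `{$TEMP.MAX.CRIT:"{#SENSOR_INFO}"}`, which allow a per-sensor threshold to override the template default. The Python sketch below is a simplified model of that lookup (exact-match contexts only); the override dictionaries, the sensor name `CPU`, and the function names are hypothetical.

```python
from statistics import mean
from typing import Optional

DEFAULT_TEMP_MAX_CRIT = 60.0  # {$TEMP.MAX.CRIT} from the macro table
DEFAULT_TEMP_MIN_CRIT = 5.0   # {$TEMP.MIN.CRIT} from the macro table


def resolve_macro(default: float, overrides: dict, context: str) -> float:
    """Use the context-specific value when one is defined, else the default."""
    return overrides.get(context, default)


def temperature_alerts(
    samples_5m: list,
    sensor_info: str,
    max_overrides: Optional[dict] = None,
    min_overrides: Optional[dict] = None,
) -> dict:
    """Evaluate both temperature triggers against a 5-minute average."""
    max_crit = resolve_macro(DEFAULT_TEMP_MAX_CRIT, max_overrides or {}, sensor_info)
    min_crit = resolve_macro(DEFAULT_TEMP_MIN_CRIT, min_overrides or {}, sensor_info)
    average = mean(samples_5m)
    return {"above_critical": average > max_crit, "too_low": average < min_crit}
```

Calling `temperature_alerts([61, 62, 63], "CPU", max_overrides={"CPU": 70})` would suppress the critical alert for that one sensor while leaving every other sensor on the default of 60.
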
Discovery rule №6
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Storage discovery | HOST-RESOURCES-MIB::hrStorage discovery with storage filter. | SNMP_AGENT | 1h | vfs.fs.discovery[snmp] |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| {#FSNAME}: Space utilization | Space utilization in % for {#FSNAME} | CALCULATED | - | vfs.fs.pused[storageUsedPercentage.{#SNMPINDEX}] |
| {#FSNAME}: Total space | MIB: HOST-RESOURCES-MIB The size of the storage represented by this entry, in units of hrStorageAllocationUnits. This object is writable to allow remote configuration of the size of the storage area in those cases where such an operation makes sense and is possible on the underlying system. For example, the amount of main storage allocated to a buffer pool might be modified or the amount of disk space allocated to virtual storage might be modified. | SNMP_AGENT | - | vfs.fs.total[hrStorageSize.{#SNMPINDEX}] |
| {#FSNAME}: Used space | MIB: HOST-RESOURCES-MIB The amount of the storage represented by this entry that is allocated, in units of hrStorageAllocationUnits. | SNMP_AGENT | - | vfs.fs.used[hrStorageUsed.{#SNMPINDEX}] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| {#FSNAME}: Disk space is critically low | Two conditions should match: First, space utilization should be above {$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"}. Second condition should be one of the following: - The disk free space is less than {$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"}. - The disk will be full in less than 24 hours. | last(/Mellanox SNMP/vfs.fs.pused[storageUsedPercentage.{#SNMPINDEX}])>{$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"} and ((last(/Mellanox SNMP/vfs.fs.total[hrStorageSize.{#SNMPINDEX}])-last(/Mellanox SNMP/vfs.fs.used[hrStorageUsed.{#SNMPINDEX}]))<{$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"} or timeleft(/Mellanox SNMP/vfs.fs.pused[storageUsedPercentage.{#SNMPINDEX}],1h,100)<1d) | AVERAGE ⚠ | |
| {#FSNAME}: Disk space is low | Two conditions should match: First, space utilization should be above {$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"}. Second condition should be one of the following: - The disk free space is less than {$VFS.FS.FREE.MIN.WARN:"{#FSNAME}"}. - The disk will be full in less than 24 hours. | last(/Mellanox SNMP/vfs.fs.pused[storageUsedPercentage.{#SNMPINDEX}])>{$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"} and ((last(/Mellanox SNMP/vfs.fs.total[hrStorageSize.{#SNMPINDEX}])-last(/Mellanox SNMP/vfs.fs.used[hrStorageUsed.{#SNMPINDEX}]))<{$VFS.FS.FREE.MIN.WARN:"{#FSNAME}"} or timeleft(/Mellanox SNMP/vfs.fs.pused[storageUsedPercentage.{#SNMPINDEX}],1h,100)<1d) | WARNING 📢 | |
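
The two-part condition shared by both disk space triggers can be sketched as follows. This Python example assumes the total and used items are already expressed in bytes and that the `G` suffix means 1024^3 bytes; `seconds_until_full` is a hypothetical stand-in for the result of `timeleft(...,1h,100)`, which is not reimplemented here.

```python
GIB = 1024 ** 3
DAY_SECONDS = 24 * 60 * 60


def disk_space_alert(
    pused: float,
    total_bytes: int,
    used_bytes: int,
    seconds_until_full: float,
    pused_max: float,
    free_min_bytes: int,
) -> bool:
    """Utilization above the limit AND (little free space OR full within a day)."""
    free_bytes = total_bytes - used_bytes
    return pused > pused_max and (
        free_bytes < free_min_bytes or seconds_until_full < DAY_SECONDS
    )
```

With the critical defaults (90 and 5G), a large volume at 95% utilization that still has 50 GiB free does not alert unless the forecast says it will fill within 24 hours.
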
Discovery rule №7
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Memory discovery | HOST-RESOURCES-MIB::hrStorage discovery with memory filter | SNMP_AGENT | 1h | vm.memory.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| {#MEMNAME}: Total memory | MIB: HOST-RESOURCES-MIB The size of the storage represented by this entry, in units of hrStorageAllocationUnits. This object is writable to allow remote configuration of the size of the storage area in those cases where such an operation makes sense and is possible on the underlying system. For example, the amount of main memory allocated to a buffer pool might be modified or the amount of disk space allocated to virtual memory might be modified. | SNMP_AGENT | - | vm.memory.total[hrStorageSize.{#SNMPINDEX}] |
| {#MEMNAME}: Used memory | MIB: HOST-RESOURCES-MIB The amount of the storage represented by this entry that is allocated, in units of hrStorageAllocationUnits. | SNMP_AGENT | - | vm.memory.used[hrStorageUsed.{#SNMPINDEX}] |
| {#MEMNAME}: Memory utilization | Memory utilization in %. | CALCULATED | - | vm.memory.util[memoryUsedPercentage.{#SNMPINDEX}] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| {#MEMNAME}: High memory utilization | The system is running out of free memory. | min(/Mellanox SNMP/vm.memory.util[memoryUsedPercentage.{#SNMPINDEX}],5m)>{$MEMORY.UTIL.MAX} | AVERAGE ⚠ | {#MEMNAME}: Memory utilization |