NetApp SNMP
Macros used
| Name | Value |
|---|---|
| {$CPU.UTIL.CRIT} | 90 |
| {$FAS3220.FS.AVAIL.MIN.CRIT} | 10G |
| {$FAS3220.FS.NAME.MATCHES} | .* |
| {$FAS3220.FS.NAME.NOT_MATCHES} | snapshot |
| {$FAS3220.FS.PUSED.MAX.CRIT} | 90 |
| {$FAS3220.FS.TIME} | 10m |
| {$FAS3220.FS.TYPE.MATCHES} | .* |
| {$FAS3220.FS.TYPE.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$FAS3220.FS.USE.PCT} | 1 |
| {$FAS3220.NET.PORT.NAME.MATCHES} | .* |
| {$FAS3220.NET.PORT.NAME.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$FAS3220.NET.PORT.ROLE.MATCHES} | .* |
| {$FAS3220.NET.PORT.ROLE.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$FAS3220.NET.PORT.TYPE.MATCHES} | .* |
| {$FAS3220.NET.PORT.TYPE.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$IF.ERRORS.WARN} | - |
| {$IF.UTIL.MAX} | 95 |
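The `*.MATCHES` / `*.NOT_MATCHES` pairs above act as regular-expression filters for the filesystem and network-port discovery rules. The sketch below shows how the filesystem pair might be evaluated. It assumes that the four conditions are combined with AND and that a condition passes when the regex is found anywhere in the value (an unanchored search). The function name `keep_filesystem` and example type strings such as `flexibleVolume` are hypothetical, because this document does not list the LLD macro that carries the filesystem type.

```python
import re

# Default macro values copied from the "Macros used" table.
DEFAULT_FS_FILTER = {
    "{$FAS3220.FS.NAME.MATCHES}": ".*",
    "{$FAS3220.FS.NAME.NOT_MATCHES}": "snapshot",
    "{$FAS3220.FS.TYPE.MATCHES}": ".*",
    "{$FAS3220.FS.TYPE.NOT_MATCHES}": "CHANGE_IF_NEEDED",
}


def keep_filesystem(fs_name: str, fs_type: str, macros: dict = DEFAULT_FS_FILTER) -> bool:
    """Hypothetical LLD filter: keep a filesystem only if all four conditions hold.

    re.search() is used because a "matches" condition passes when the pattern
    is found anywhere in the value, not only when it matches the whole string.
    """
    return (
        re.search(macros["{$FAS3220.FS.NAME.MATCHES}"], fs_name) is not None
        and re.search(macros["{$FAS3220.FS.NAME.NOT_MATCHES}"], fs_name) is None
        and re.search(macros["{$FAS3220.FS.TYPE.MATCHES}"], fs_type) is not None
        and re.search(macros["{$FAS3220.FS.TYPE.NOT_MATCHES}"], fs_type) is None
    )


# With the defaults, snapshot areas are skipped and everything else is kept.
print(keep_filesystem("/vol/vol0/", "flexibleVolume"))           # True
print(keep_filesystem("/vol/vol0/.snapshot", "flexibleVolume"))  # False

# Overriding a NOT_MATCHES macro (replacing CHANGE_IF_NEEDED) excludes more objects.
custom = {**DEFAULT_FS_FILTER, "{$FAS3220.FS.TYPE.NOT_MATCHES}": "^aggregate$"}
print(keep_filesystem("aggr0", "aggregate", custom))              # False
```

The `CHANGE_IF_NEEDED` placeholders are strings that are unlikely to appear in real names, so by default they exclude nothing.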
Items collected
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| NetApp: Failed disks count | The number of disks that are currently broken. | SNMP_AGENT | - | netapp.disk[diskFailedCount] |
| NetApp: Failed disks message | If diskFailedCount is non-zero, this is a string describing the failed disk or disks. Each failed disk is described. | SNMP_AGENT | - | netapp.disk[diskFailedMessage] |
| NetApp: Product firmware version | Version string for the firmware running on this platform. | SNMP_AGENT | - | netapp.inventory[productFirmwareVersion] |
| NetApp: Product version | Version string for the software running on this platform. | SNMP_AGENT | - | netapp.inventory[productVersion] |
Discovery rule №1
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Cluster metrics discovery | Discovery of Cluster metrics per node | SNMP_AGENT | 1h | netapp.cluster.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Node {#NODE.NAME}: Failed FAN count | Count of the number of chassis fans that are not operating within the recommended RPM range. | SNMP_AGENT | - | netapp.cluster[nodeEnvFailedFanCount, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Failed FAN message | Text message describing the current condition of the chassis fans. This is useful only if envFailedFanCount is not zero. | SNMP_AGENT | - | netapp.cluster[nodeEnvFailedFanMessage, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Degraded power supplies count | Count of the number of power supplies that are in degraded mode. | SNMP_AGENT | - | netapp.cluster[nodeEnvFailedPowerSupplyCount, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Degraded power supplies message | Text message describing the state of any power supplies that are currently degraded. This is useful only if envFailedPowerSupplyCount is not zero. | SNMP_AGENT | - | netapp.cluster[nodeEnvFailedPowerSupplyMessage, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Over-temperature | An indication of whether the hardware is currently operating outside of its recommended temperature range. The hardware will shut down if the temperature exceeds critical thresholds. | SNMP_AGENT | - | netapp.cluster[nodeEnvOverTemperature, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Health | Whether or not the node can communicate with the cluster. | SNMP_AGENT | - | netapp.cluster[nodeHealth, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Location | Node Location. Same as sysLocation for a specific node. | SNMP_AGENT | - | netapp.cluster[nodeLocation, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Model | Node Model. Same as productModel for a specific node. | SNMP_AGENT | - | netapp.cluster[nodeModel, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: NVRAM battery status | An indication of the current status of the NVRAM battery or batteries. Batteries which are fully or partially discharged may not fully protect the system during a crash. The end-of-life status values are based on the manufacturer's recommended life for the batteries. Possible values: ok(1), partiallyDischarged(2), fullyDischarged(3), notPresent(4), nearEndOfLife(5), atEndOfLife(6), unknown(7), overCharged(8), fullyCharged(9). | SNMP_AGENT | - | netapp.cluster[nodeNvramBatteryStatus, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Serial number | Node Serial Number. Same as productSerialNum for a specific node. | SNMP_AGENT | - | netapp.cluster[nodeSerialNumber, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Uptime | Node uptime. Same as sysUpTime for a specific node. | SNMP_AGENT | - | netapp.cluster[nodeUptime, "{#NODE.NAME}"] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| Node {#NODE.NAME}: Temperature is higher than recommended | The hardware will shut down if the temperature exceeds critical thresholds. | last(/NetApp SNMP/netapp.cluster[nodeEnvOverTemperature, "{#NODE.NAME}"])=2 | HIGH ⛔ | Node {#NODE.NAME}: Over-temperature |
| Node {#NODE.NAME}: Node can not communicate with the cluster | - | last(/NetApp SNMP/netapp.cluster[nodeHealth, "{#NODE.NAME}"])=0 | HIGH ⛔ | Node {#NODE.NAME}: Health |
| Node {#NODE.NAME}: NVRAM battery status is not OK | - | last(/NetApp SNMP/netapp.cluster[nodeNvramBatteryStatus, "{#NODE.NAME}"])<>1 | AVERAGE ⚠ | Node {#NODE.NAME}: NVRAM battery status |
| Node {#NODE.NAME}: has been restarted (uptime < 10m) | Uptime is less than 10 minutes | last(/NetApp SNMP/netapp.cluster[nodeUptime, "{#NODE.NAME}"])<10m | AVERAGE ⚠ | Node {#NODE.NAME}: Uptime |
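The four trigger prototypes above reduce to simple comparisons on each item's last value. The sketch below evaluates them for one node. It assumes `nodeUptime` is stored in seconds: SNMP TimeTicks count hundredths of a second, so a ×0.01 conversion is assumed and shown as the hypothetical helper `timeticks_to_seconds`. The other encodings (over-temperature `2`, health `0`) are taken directly from the trigger expressions; function names are illustrative.

```python
from typing import List

# Value map for nodeNvramBatteryStatus, copied from the item description.
NVRAM_BATTERY_STATUS = {
    1: "ok", 2: "partiallyDischarged", 3: "fullyDischarged", 4: "notPresent",
    5: "nearEndOfLife", 6: "atEndOfLife", 7: "unknown", 8: "overCharged",
    9: "fullyCharged",
}


def timeticks_to_seconds(ticks: int) -> float:
    """SNMP TimeTicks count hundredths of a second (assumed item preprocessing)."""
    return ticks / 100


def node_problems(over_temperature: int, health: int, nvram_battery: int, uptime_s: float) -> List[str]:
    """Mirror the cluster trigger prototypes for a single node's last values."""
    problems = []
    if over_temperature == 2:        # last(...nodeEnvOverTemperature...)=2
        problems.append("over-temperature")
    if health == 0:                  # last(...nodeHealth...)=0
        problems.append("cannot communicate with the cluster")
    if nvram_battery != 1:           # last(...nodeNvramBatteryStatus...)<>1
        state = NVRAM_BATTERY_STATUS.get(nvram_battery, "unmapped")
        problems.append(f"NVRAM battery status is not OK ({state})")
    if uptime_s < 600:               # last(...nodeUptime...)<10m, and 10m = 600 s
        problems.append("restarted (uptime < 10m)")
    return problems


print(node_problems(1, 1, 1, timeticks_to_seconds(8_640_000)))  # []
print(node_problems(1, 1, 9, 86_400.0))  # ['NVRAM battery status is not OK (fullyCharged)']
```

As the second call shows, the NVRAM expression `<>1` also fires for `fullyCharged(9)`; if that state should count as healthy, the expression would need to exclude it too.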
Discovery rule №2
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| CPU discovery | Discovery of CPU metrics per node | SNMP_AGENT | 1h | netapp.cpu.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Node {#NODE.NAME}: CPU utilization | The average, over the last minute, of the percentage of time that this processor was not idle. | SNMP_AGENT | - | netapp.cpu[cDOTCpuBusyTimePerCent, "{#NODE.NAME}"] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| Node {#NODE.NAME}: High CPU utilization (over {$CPU.UTIL.CRIT}% for 5m) | CPU utilization is too high. The system might be slow to respond. | min(/NetApp SNMP/netapp.cpu[cDOTCpuBusyTimePerCent, "{#NODE.NAME}"],5m)>{$CPU.UTIL.CRIT} | WARNING 📢 | Node {#NODE.NAME}: CPU utilization |
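`min(item,5m)>{$CPU.UTIL.CRIT}` fires only when every value collected in the last five minutes exceeds the threshold, so a single sample below 90% keeps the trigger quiet. A minimal sketch of that window check follows, assuming samples arrive as `(unix_timestamp, percent)` pairs; the function name `high_cpu` is hypothetical.

```python
from typing import List, Tuple


def high_cpu(samples: List[Tuple[float, float]], now: float,
             threshold: float = 90.0, window_s: float = 300.0) -> bool:
    """Approximate min(item,5m) > threshold over the window (now - 5m, now]."""
    in_window = [value for ts, value in samples if now - window_s < ts <= now]
    if not in_window:
        # With no data the real trigger cannot be evaluated; this sketch reports "no problem".
        return False
    return min(in_window) > threshold


busy = [(0, 95.0), (60, 97.0), (120, 92.0), (180, 99.0), (240, 91.0)]
print(high_cpu(busy, now=240))    # True: the minimum in the window is 91.0

dip = [(0, 95.0), (60, 85.0), (120, 92.0), (180, 99.0), (240, 91.0)]
print(high_cpu(dip, now=240))     # False: one sample is 85.0
```

Using `min` rather than `avg` trades sensitivity for fewer false alarms from short spikes.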
Discovery rule №3
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Filesystems discovery | Filesystems discovery with filter. | SNMP_AGENT | 1h | netapp.fs.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| {#VSERVER}{#FSNAME}: Total space available | The total disk space that is free for use on {#FSNAME}. | SNMP_AGENT | - | netapp.fs[df64AvailKBytes, "{#VSERVER}{#FSNAME}"] |
| {#VSERVER}{#FSNAME}: Total space | The total capacity of {#FSNAME}, in bytes. | SNMP_AGENT | - | netapp.fs[df64TotalKBytes, "{#VSERVER}{#FSNAME}"] |
| {#VSERVER}{#FSNAME}: Total space used | The total disk space that is in use on {#FSNAME}. | SNMP_AGENT | - | netapp.fs[df64UsedKBytes, "{#VSERVER}{#FSNAME}"] |
| {#VSERVER}{#FSNAME}: Saved by compression percents | Provides the percentage of compression savings in a volume, which is ((compr_saved/(compr_saved + used)) * 100). This is only returned for volumes. | SNMP_AGENT | - | netapp.fs[dfCompressSavedPercent, "{#VSERVER}{#FSNAME}"] |
| {#VSERVER}{#FSNAME}: Saved by deduplication percents | Provides the percentage of deduplication savings in a volume, which is ((dedup_saved/(dedup_saved + used)) * 100). This is only returned for volumes. | SNMP_AGENT | - | netapp.fs[dfDedupeSavedPercent, "{#VSERVER}{#FSNAME}"] |
| {#VSERVER}{#FSNAME}: Used space percents | The percentage of disk space currently in use on {#FSNAME}. | SNMP_AGENT | - | netapp.fs[dfPerCentKBytesCapacity, "{#VSERVER}{#FSNAME}"] |
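Both savings items use the same ratio: the space saved divided by the space the data would have occupied without the saving (saved + used), times 100. The sketch below reproduces that arithmetic, together with the kilobytes-to-bytes conversion implied by the byte-based size items. The multiplier of 1024 is an assumption about the template's preprocessing, and the function names are hypothetical.

```python
def saved_percent(saved_kb: float, used_kb: float) -> float:
    """((saved / (saved + used)) * 100), as for dfCompressSavedPercent / dfDedupeSavedPercent."""
    total = saved_kb + used_kb
    if total == 0:
        return 0.0  # empty volume: nothing used, nothing saved
    return saved_kb / total * 100


def kbytes_to_bytes(kb: int) -> int:
    """df64*KBytes counters are in kilobytes; a multiplier of 1024 is assumed."""
    return kb * 1024


# 25 GiB saved on 75 GiB of used space is a 25% saving, not 33%.
print(saved_percent(25 * 1024**2, 75 * 1024**2))  # 25.0
print(kbytes_to_bytes(10 * 1024**2))               # 10737418240 (10 GiB)
```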
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| {#VSERVER}{#FSNAME}: Disk space is too low (below {$FAS3220.FS.AVAIL.MIN.CRIT:"{#FSNAME}"} for {$FAS3220.FS.TIME:"{#FSNAME}"}) | - | min(/NetApp SNMP/netapp.fs[df64AvailKBytes, "{#VSERVER}{#FSNAME}"],{$FAS3220.FS.TIME:"{#FSNAME}"})<{$FAS3220.FS.AVAIL.MIN.CRIT:"{#FSNAME}"} and {$FAS3220.FS.USE.PCT:"{#FSNAME}"}=0 | HIGH ⛔ | {#VSERVER}{#FSNAME}: Total space available |
| {#VSERVER}{#FSNAME}: Disk space is too low (used over {$FAS3220.FS.PUSED.MAX.CRIT:"{#FSNAME}"}% for {$FAS3220.FS.TIME:"{#FSNAME}"}) | - | max(/NetApp SNMP/netapp.fs[dfPerCentKBytesCapacity, "{#VSERVER}{#FSNAME}"],{$FAS3220.FS.TIME:"{#FSNAME}"})>{$FAS3220.FS.PUSED.MAX.CRIT:"{#FSNAME}"} and {$FAS3220.FS.USE.PCT:"{#FSNAME}"}=1 | HIGH ⛔ | {#VSERVER}{#FSNAME}: Used space percents |
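The two disk-space triggers are mutually exclusive: `{$FAS3220.FS.USE.PCT}` selects the percentage-based check when it is `1` (the default) and the absolute free-space check when it is `0`. Each macro can be overridden per filesystem through user macro context, for example `{$FAS3220.FS.USE.PCT:"/vol/big/"}`, and falls back to the plain macro when no context value exists. The sketch below models that logic. It assumes `min_avail_bytes` and `max_pused` are already the `min()`/`max()` over `{$FAS3220.FS.TIME}`. The names, the example volume paths, and the override dictionary are hypothetical, and real macro resolution also walks host, template and global levels.

```python
from typing import Dict, Optional, Tuple

# Plain macro values from the "Macros used" table (the G suffix is 1024-based).
MACROS = {
    "{$FAS3220.FS.AVAIL.MIN.CRIT}": 10 * 1024**3,
    "{$FAS3220.FS.PUSED.MAX.CRIT}": 90,
    "{$FAS3220.FS.USE.PCT}": 1,
}


def resolve(macro: str, context: str, overrides: Dict[Tuple[str, str], float]) -> float:
    """{$MACRO:"context"}: a context-specific value wins, else the plain macro."""
    return overrides.get((macro, context), MACROS[macro])


def disk_space_low(fs_name: str, min_avail_bytes: float, max_pused: float,
                   overrides: Optional[Dict[Tuple[str, str], float]] = None) -> bool:
    overrides = overrides or {}
    use_pct = resolve("{$FAS3220.FS.USE.PCT}", fs_name, overrides)
    if use_pct == 0:
        # First trigger: min(available, FS.TIME) < FS.AVAIL.MIN.CRIT
        return min_avail_bytes < resolve("{$FAS3220.FS.AVAIL.MIN.CRIT}", fs_name, overrides)
    if use_pct == 1:
        # Second trigger: max(used %, FS.TIME) > FS.PUSED.MAX.CRIT
        return max_pused > resolve("{$FAS3220.FS.PUSED.MAX.CRIT}", fs_name, overrides)
    return False  # any other value disables both triggers


print(disk_space_low("/vol/data/", 5 * 1024**3, 95))  # True: 95% > 90%
# Switch one large volume to the absolute check via a context override:
big = {("{$FAS3220.FS.USE.PCT}", "/vol/big/"): 0}
print(disk_space_low("/vol/big/", 50 * 1024**3, 95, big))  # False: 50G free is not below 10G
```

The percentage check suits small volumes, while the absolute check avoids alerts on very large volumes that still have plenty of free space at 90% used.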
Discovery rule №4
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| HA discovery | Discovery of high availability metrics per node | SNMP_AGENT | 1h | netapp.ha.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Node {#NODE.NAME}: Cannot takeover cause | The reason the node cannot take over its HA partner {#PARTNER.NAME}. Possible states: ok(1), unknownReason(2), disabledByOperator(3), interconnectOffline(4), disabledByPartner(5), takeoverFailed(6), mailboxIsInDegradedState(7), partnermailboxIsInUninitialisedState(8), mailboxVersionMismatch(9), nvramSizeMismatch(10), kernelVersionMismatch(11), partnerIsInBootingStage(12), diskshelfIsTooHot(13), partnerIsPerformingRevert(14), nodeIsPerformingRevert(15), sametimePartnerIsAlsoTryingToTakeUsOver(16), alreadyInTakenoverMode(17), nvramLogUnsynchronized(18), stateofBackupMailboxIsDoubtful(19). | SNMP_AGENT | - | netapp.ha[haCannotTakeoverCause, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: HA settings | High Availability configuration settings. The value notConfigured(1) indicates that HA is not licensed. The thisNodeDead(5) setting indicates that this node has been taken over. | SNMP_AGENT | - | netapp.ha[haSettings, "{#NODE.NAME}"] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| Node {#NODE.NAME}: Node cannot take over its HA partner {#PARTNER.NAME}. Reason: {ITEM.VALUE} | Possible reasons: unknownReason(2), disabledByOperator(3), interconnectOffline(4), disabledByPartner(5), takeoverFailed(6), mailboxIsInDegradedState(7), partnermailboxIsInUninitialisedState(8), mailboxVersionMismatch(9), nvramSizeMismatch(10), kernelVersionMismatch(11), partnerIsInBootingStage(12), diskshelfIsTooHot(13), partnerIsPerformingRevert(14), nodeIsPerformingRevert(15), sametimePartnerIsAlsoTryingToTakeUsOver(16), alreadyInTakenoverMode(17), nvramLogUnsynchronized(18), stateofBackupMailboxIsDoubtful(19). | last(/NetApp SNMP/netapp.ha[haCannotTakeoverCause, "{#NODE.NAME}"])<>1 | HIGH ⛔ | Node {#NODE.NAME}: Cannot takeover cause |
| Node {#NODE.NAME}: HA is not licensed | The value notConfigured(1) indicates that HA is not licensed. | last(/NetApp SNMP/netapp.ha[haSettings, "{#NODE.NAME}"])=1 | AVERAGE ⚠ | Node {#NODE.NAME}: HA settings |
| Node {#NODE.NAME}: Node has been taken over | The thisNodeDead(5) setting indicates that this node has been taken over. | last(/NetApp SNMP/netapp.ha[haSettings, "{#NODE.NAME}"])=5 | HIGH ⛔ | Node {#NODE.NAME}: HA settings |
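The HA triggers compare the two items against fixed codes: any `haCannotTakeoverCause` other than `ok(1)` is a problem, and the `haSettings` values `1` and `5` raise the licensing and taken-over triggers. The sketch below decodes the cause into the state name used in the trigger title. The dictionary is copied from the item description; the function name `ha_problems`, the partner name, and the `haSettings` value `2` used as an "other" setting in the example are hypothetical.

```python
from typing import List

# haCannotTakeoverCause states, copied from the item description.
CANNOT_TAKEOVER_CAUSE = {
    1: "ok", 2: "unknownReason", 3: "disabledByOperator", 4: "interconnectOffline",
    5: "disabledByPartner", 6: "takeoverFailed", 7: "mailboxIsInDegradedState",
    8: "partnermailboxIsInUninitialisedState", 9: "mailboxVersionMismatch",
    10: "nvramSizeMismatch", 11: "kernelVersionMismatch", 12: "partnerIsInBootingStage",
    13: "diskshelfIsTooHot", 14: "partnerIsPerformingRevert", 15: "nodeIsPerformingRevert",
    16: "sametimePartnerIsAlsoTryingToTakeUsOver", 17: "alreadyInTakenoverMode",
    18: "nvramLogUnsynchronized", 19: "stateofBackupMailboxIsDoubtful",
}


def ha_problems(cannot_takeover_cause: int, ha_settings: int, partner: str) -> List[str]:
    """Mirror the three HA trigger prototypes for one node's last values."""
    problems = []
    if cannot_takeover_cause != 1:   # last(...haCannotTakeoverCause...)<>1
        reason = CANNOT_TAKEOVER_CAUSE.get(cannot_takeover_cause, f"unmapped({cannot_takeover_cause})")
        problems.append(f"cannot take over partner {partner}: {reason}")
    if ha_settings == 1:             # notConfigured(1)
        problems.append("HA is not licensed")
    elif ha_settings == 5:           # thisNodeDead(5)
        problems.append("node has been taken over")
    return problems


print(ha_problems(4, 2, "node-02"))  # ['cannot take over partner node-02: interconnectOffline']
```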
Discovery rule №5
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Network ports discovery | Network interfaces discovery with filter. | SNMP_AGENT | 1h | netapp.net.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Inbound packets discarded | The number of inbound packets that were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. | SNMP_AGENT | 3m | netapp.net.if[if64InDiscards, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Inbound packets with errors | The number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. | SNMP_AGENT | 3m | netapp.net.if[if64InErrors, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Bits received | The total number of octets received on the interface, including framing characters. | SNMP_AGENT | - | netapp.net.if[if64InOctets, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Outbound packets discarded | The number of outbound packets that were chosen to be discarded even though no errors had been detected to prevent their being transmitted. One possible reason for discarding such a packet could be to free up buffer space. | SNMP_AGENT | 3m | netapp.net.if[if64OutDiscards, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Outbound packets with errors | The number of outbound packets that could not be transmitted because of errors. | SNMP_AGENT | 3m | netapp.net.if[if64OutErrors, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Bits sent | The total number of octets transmitted out of the interface, including framing characters. | SNMP_AGENT | - | netapp.net.if[if64OutOctets, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Health degraded reason | The list of reasons why the port is marked as degraded. | SNMP_AGENT | - | netapp.net.port[netportDegradedReason, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Health | The health status of the port. | SNMP_AGENT | - | netapp.net.port[netportHealthStatus, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): State | The link-state of the port. Normally it is either UP(2) or DOWN(3). | SNMP_AGENT | - | netapp.net.port[netportLinkState, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Role | Role of the port. A port must have one of the following roles: cluster(1), data(2), mgmt(3), intercluster(4), cluster-mgmt(5) or undef(0). The cluster port is used to communicate with other nodes in the cluster. The data port services client requests; it is where all file requests come in. The management port is used by an administrator to manage resources within a node. The intercluster port is used to communicate with other clusters. The cluster-mgmt port is used to manage resources within the cluster. The undef role is for ports that have not yet been assigned a role. | SNMP_AGENT | - | netapp.net.port[netportRole, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Speed | The operational speed of the port. It can be one of: undef(0), auto(1), ten Mb/s(2), hundred Mb/s(3), one Gb/s(4), or ten Gb/s(5). | SNMP_AGENT | - | netapp.net.port[netportSpeedOper, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Up by an administrator | Indicates whether the port status is set 'UP' by an administrator. | SNMP_AGENT | - | netapp.net.port[netportUpAdmin, "{#NODE}", "{#IFNAME}"] |
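The port items return raw SNMP codes and counters. The sketch below decodes `netportRole` and `netportSpeedOper` using the codes listed in their descriptions, and turns two readings of an `if64*Octets` counter into bits per second. The rate conversion assumes the template applies change-per-second preprocessing with a ×8 multiplier, which the "Bits received/sent" item names suggest but this document does not state. Like Zabbix change-per-second preprocessing, the sketch produces no value when the counter goes backwards. Function names are hypothetical.

```python
from typing import Optional

# Codes copied from the netportRole and netportSpeedOper descriptions.
PORT_ROLE = {0: "undef", 1: "cluster", 2: "data", 3: "mgmt", 4: "intercluster", 5: "cluster-mgmt"}
PORT_SPEED = {0: "undef", 1: "auto", 2: "10 Mb/s", 3: "100 Mb/s", 4: "1 Gb/s", 5: "10 Gb/s"}


def describe_port(role: int, speed: int) -> str:
    """Render the role and speed codes as text; unknown codes are shown as-is."""
    role_name = PORT_ROLE.get(role, f"unknown({role})")
    speed_name = PORT_SPEED.get(speed, f"unknown({speed})")
    return f"{role_name} port at {speed_name}"


def bits_per_second(prev_octets: int, prev_ts: float, cur_octets: int, cur_ts: float) -> Optional[float]:
    """Change per second of an octet counter, multiplied by 8 (assumed preprocessing)."""
    if cur_ts <= prev_ts or cur_octets < prev_octets:
        return None  # counter reset or bad timestamps: no value is produced
    return (cur_octets - prev_octets) * 8 / (cur_ts - prev_ts)


print(describe_port(2, 5))                                # data port at 10 Gb/s
print(bits_per_second(1_000_000, 0.0, 8_500_000, 60.0))  # 1000000.0
```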
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| Node {#NODE}: port {#IFNAME} ({#TYPE}): High error rate (> {$IF.ERRORS.WARN:"{#IFNAME}"} for 5m) | Recovers when the error rate falls below 80% of the {$IF.ERRORS.WARN:"{#IFNAME}"} threshold. | min(/NetApp SNMP/netapp.net.if[if64InErrors, "{#NODE}", "{#IFNAME}"],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} or min(/NetApp SNMP/netapp.net.if[if64OutErrors, "{#NODE}", "{#IFNAME}"],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} | WARNING 📢 | |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Link down | Link state is not UP and the port status is set 'UP' by an administrator. | last(/NetApp SNMP/netapp.net.port[netportLinkState, "{#NODE}", "{#IFNAME}"])<>2 and last(/NetApp SNMP/netapp.net.port[netportUpAdmin, "{#NODE}", "{#IFNAME}"])=1 | AVERAGE ⚠ | |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Port is not healthy | {{ITEM.LASTVALUE2}.regsub("(.*)", \1)} | last(/NetApp SNMP/netapp.net.port[netportHealthStatus, "{#NODE}", "{#IFNAME}"])<>0 and length(last(/NetApp SNMP/netapp.net.port[netportDegradedReason, "{#NODE}", "{#IFNAME}"]))>0 | INFO 🔔 | |
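The error-rate trigger uses hysteresis. It opens when the 5-minute minimum of either error counter exceeds `{$IF.ERRORS.WARN}` and, per its description, closes only once errors fall below 80% of that threshold. The sketch below assumes the recovery condition checks the 5-minute maximum of both counters against `0.8 * {$IF.ERRORS.WARN}`, because the actual recovery expression is not shown in this document. The macro's default is also not given in the table, so the example uses a hypothetical threshold of 2. The link-down and port-health conditions are transcribed directly from their expressions; class and function names are illustrative.

```python
from typing import List


class ErrorRateTrigger:
    """Problem/recovery hysteresis for the 'High error rate' trigger (recovery rule assumed)."""

    def __init__(self, warn: float):
        self.warn = warn
        self.problem = False

    def evaluate(self, in_errors_5m: List[float], out_errors_5m: List[float]) -> bool:
        if not self.problem:
            # Problem: min(in,5m) > WARN or min(out,5m) > WARN
            self.problem = min(in_errors_5m) > self.warn or min(out_errors_5m) > self.warn
        else:
            # Assumed recovery: max(in,5m) < 0.8*WARN and max(out,5m) < 0.8*WARN
            recovered = max(in_errors_5m) < 0.8 * self.warn and max(out_errors_5m) < 0.8 * self.warn
            self.problem = not recovered
        return self.problem


def link_down(link_state: int, up_admin: int) -> bool:
    return link_state != 2 and up_admin == 1   # not UP(2), yet set UP by an administrator


def port_unhealthy(health_status: int, degraded_reason: str) -> bool:
    return health_status != 0 and len(degraded_reason) > 0


t = ErrorRateTrigger(warn=2)               # hypothetical {$IF.ERRORS.WARN}
print(t.evaluate([3, 4, 5], [0, 0, 0]))    # True: problem opens
print(t.evaluate([1.7, 1.9], [0, 0]))      # True: 1.9 is not below 1.6
print(t.evaluate([1.0, 1.5], [0, 0]))      # False: recovered
```

The gap between the 100% problem threshold and the 80% recovery threshold keeps the trigger from flapping when the error rate hovers around the limit.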