NetApp SNMP

Macros used

| Name | Value |
| --- | --- |
| {$CPU.UTIL.CRIT} | 90 |
| {$FAS3220.FS.AVAIL.MIN.CRIT} | 10G |
| {$FAS3220.FS.NAME.MATCHES} | .* |
| {$FAS3220.FS.NAME.NOT_MATCHES} | snapshot |
| {$FAS3220.FS.PUSED.MAX.CRIT} | 90 |
| {$FAS3220.FS.TIME} | 10m |
| {$FAS3220.FS.TYPE.MATCHES} | .* |
| {$FAS3220.FS.TYPE.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$FAS3220.FS.USE.PCT} | 1 |
| {$FAS3220.NET.PORT.NAME.MATCHES} | .* |
| {$FAS3220.NET.PORT.NAME.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$FAS3220.NET.PORT.ROLE.MATCHES} | .* |
| {$FAS3220.NET.PORT.ROLE.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$FAS3220.NET.PORT.TYPE.MATCHES} | .* |
| {$FAS3220.NET.PORT.TYPE.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$IF.ERRORS.WARN} | - |
| {$IF.UTIL.MAX} | 95 |
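
The paired MATCHES / NOT_MATCHES macros suggest the usual discovery-filter pattern: a discovered entity is kept when its value matches the first regular expression and does not match the second. The following is a minimal Python sketch of that logic under that assumption. The function name `passes_filter` and the sample filesystem names are hypothetical and are not part of the template.

```python
import re

def passes_filter(value: str, matches: str, not_matches: str) -> bool:
    """Keep a value if it matches `matches` and does not match `not_matches`."""
    return (re.search(matches, value) is not None
            and re.search(not_matches, value) is None)

# Default values of the filesystem name macros from the table above.
FS_NAME_MATCHES = r".*"
FS_NAME_NOT_MATCHES = r"snapshot"

# Hypothetical discovered filesystem names, for illustration only.
names = ["/vol/vol0", "/vol/vol0/.snapshot", "/vol/data"]
kept = [n for n in names if passes_filter(n, FS_NAME_MATCHES, FS_NAME_NOT_MATCHES)]
```

With the defaults, everything is kept except names containing `snapshot`. The `CHANGE_IF_NEEDED` placeholder works the same way: it is a pattern that is unlikely to match any real value, so nothing is excluded until it is changed.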

Items collected

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| NetApp: Failed disks count | The number of disks that are currently broken. | SNMP_AGENT | - | netapp.disk[diskFailedCount] |
| NetApp: Failed disks message | If diskFailedCount is non-zero, this is a string describing the failed disk or disks. Each failed disk is described. | SNMP_AGENT | - | netapp.disk[diskFailedMessage] |
| NetApp: Product firmware version | Version string for the firmware running on this platform. | SNMP_AGENT | - | netapp.inventory[productFirmwareVersion] |
| NetApp: Product version | Version string for the software running on this platform. | SNMP_AGENT | - | netapp.inventory[productVersion] |

Discovery rule №1

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Cluster metrics discovery | Discovery of Cluster metrics per node | SNMP_AGENT | 1h | netapp.cluster.discovery |

Item prototypes

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Node {#NODE.NAME}: Failed FAN count | Count of the number of chassis fans that are not operating within the recommended RPM range. | SNMP_AGENT | - | netapp.cluster[nodeEnvFailedFanCount, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Failed FAN message | Text message describing the current condition of the chassis fans. This is useful only if envFailedFanCount is not zero. | SNMP_AGENT | - | netapp.cluster[nodeEnvFailedFanMessage, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Degraded power supplies count | Count of the number of power supplies that are in degraded mode. | SNMP_AGENT | - | netapp.cluster[nodeEnvFailedPowerSupplyCount, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Degraded power supplies message | Text message describing the state of any power supplies that are currently degraded. This is useful only if envFailedPowerSupplyCount is not zero. | SNMP_AGENT | - | netapp.cluster[nodeEnvFailedPowerSupplyMessage, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Over-temperature | An indication of whether the hardware is currently operating outside of its recommended temperature range. The hardware will shut down if the temperature exceeds critical thresholds. | SNMP_AGENT | - | netapp.cluster[nodeEnvOverTemperature, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Health | Whether or not the node can communicate with the cluster. | SNMP_AGENT | - | netapp.cluster[nodeHealth, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Location | Node Location. Same as sysLocation for a specific node. | SNMP_AGENT | - | netapp.cluster[nodeLocation, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Model | Node Model. Same as productModel for a specific node. | SNMP_AGENT | - | netapp.cluster[nodeModel, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: NVRAM battery status | An indication of the current status of the NVRAM battery or batteries. Batteries which are fully or partially discharged may not fully protect the system during a crash. The end-of-life status values are based on the manufacturer's recommended life for the batteries. Possible values: ok(1), partiallyDischarged(2), fullyDischarged(3), notPresent(4), nearEndOfLife(5), atEndOfLife(6), unknown(7), overCharged(8), fullyCharged(9). | SNMP_AGENT | - | netapp.cluster[nodeNvramBatteryStatus, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Serial number | Node Serial Number. Same as productSerialNum for a specific node. | SNMP_AGENT | - | netapp.cluster[nodeSerialNumber, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: Uptime | Node uptime. Same as sysUpTime for a specific node. | SNMP_AGENT | - | netapp.cluster[nodeUptime, "{#NODE.NAME}"] |

Trigger prototypes

| Name | Description | Expression | Priority | Dependencies |
| --- | --- | --- | --- | --- |
| Node {#NODE.NAME}: Temperature is higher than recommended | The hardware will shut down if the temperature exceeds critical thresholds. | last(/NetApp SNMP/netapp.cluster[nodeEnvOverTemperature, "{#NODE.NAME}"])=2 | HIGH ⛔ | Node {#NODE.NAME}: Over-temperature |
| Node {#NODE.NAME}: Node cannot communicate with the cluster | - | last(/NetApp SNMP/netapp.cluster[nodeHealth, "{#NODE.NAME}"])=0 | HIGH ⛔ | Node {#NODE.NAME}: Health |
| Node {#NODE.NAME}: NVRAM battery status is not OK | - | last(/NetApp SNMP/netapp.cluster[nodeNvramBatteryStatus, "{#NODE.NAME}"])<>1 | AVERAGE ⚠ | Node {#NODE.NAME}: NVRAM battery status |
| Node {#NODE.NAME}: has been restarted (uptime < 10m) | Uptime is less than 10 minutes | last(/NetApp SNMP/netapp.cluster[nodeUptime, "{#NODE.NAME}"])<10m | AVERAGE ⚠ | Node {#NODE.NAME}: Uptime |
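
To make the NVRAM battery trigger concrete, here is a small Python sketch that maps the status codes listed in the item description to their names and applies the `<>1` comparison from the trigger expression. The helper names `describe` and `nvram_battery_problem` are hypothetical; only the codes and the comparison come from the tables above.

```python
# Status codes as listed in the "NVRAM battery status" item description.
NVRAM_BATTERY_STATUS = {
    1: "ok",
    2: "partiallyDischarged",
    3: "fullyDischarged",
    4: "notPresent",
    5: "nearEndOfLife",
    6: "atEndOfLife",
    7: "unknown",
    8: "overCharged",
    9: "fullyCharged",
}

def describe(status: int) -> str:
    """Return the symbolic name for a status code, or a marker for unlisted codes."""
    return NVRAM_BATTERY_STATUS.get(status, f"unrecognized({status})")

def nvram_battery_problem(status: int) -> bool:
    """Mirror the trigger expression: any last value other than ok(1) is a problem."""
    return status != 1
```

Note that, read literally, the expression treats every value other than ok(1) as a problem, including fullyCharged(9).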

Discovery rule №2

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| CPU discovery | Discovery of CPU metrics per node | SNMP_AGENT | 1h | netapp.cpu.discovery |

Item prototypes

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Node {#NODE.NAME}: CPU utilization | The average, over the last minute, of the percentage of time that this processor was not idle. | SNMP_AGENT | - | netapp.cpu[cDOTCpuBusyTimePerCent, "{#NODE.NAME}"] |

Trigger prototypes

| Name | Description | Expression | Priority | Dependencies |
| --- | --- | --- | --- | --- |
| Node {#NODE.NAME}: High CPU utilization (over {$CPU.UTIL.CRIT}% for 5m) | CPU utilization is too high. The system might be slow to respond. | min(/NetApp SNMP/netapp.cpu[cDOTCpuBusyTimePerCent, "{#NODE.NAME}"],5m)>{$CPU.UTIL.CRIT} | WARNING 📢 | Node {#NODE.NAME}: CPU utilization |
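
Because the expression uses `min(...,5m)`, the trigger fires only when every sample in the five-minute window is above the threshold; a single lower reading keeps it from firing. A minimal Python sketch of that behaviour follows. The function name `high_cpu` and the sample values are hypothetical, and the default threshold is the value of `{$CPU.UTIL.CRIT}` from the macro table.

```python
from typing import Sequence

def high_cpu(samples: Sequence[float], threshold: float = 90.0) -> bool:
    """True when the lowest sample in the window still exceeds the threshold."""
    if not samples:
        # No data in the window: this sketch treats that as "no problem".
        return False
    return min(samples) > threshold

sustained = high_cpu([95.0, 97.0, 92.0])   # all samples above 90
spiky = high_cpu([95.0, 60.0, 99.0])       # one dip to 60 within the window
```

Here `sustained` is true and `spiky` is false, which is why this form suppresses alerts on short bursts.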

Discovery rule №3

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Filesystems discovery | Filesystems discovery with filter. | SNMP_AGENT | 1h | netapp.fs.discovery |

Item prototypes

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| {#VSERVER}{#FSNAME}: Total space available | The total disk space that is free for use on {#FSNAME}. | SNMP_AGENT | - | netapp.fs[df64AvailKBytes, "{#VSERVER}{#FSNAME}"] |
| {#VSERVER}{#FSNAME}: Total space | The total capacity in Bytes for {#FSNAME}. | SNMP_AGENT | - | netapp.fs[df64TotalKBytes, "{#VSERVER}{#FSNAME}"] |
| {#VSERVER}{#FSNAME}: Total space used | The total disk space that is in use on {#FSNAME}. | SNMP_AGENT | - | netapp.fs[df64UsedKBytes, "{#VSERVER}{#FSNAME}"] |
| {#VSERVER}{#FSNAME}: Saved by compression percents | Provides the percentage of compression savings in a volume, which is ((compr_saved/(compr_saved + used)) * 100). This is only returned for volumes. | SNMP_AGENT | - | netapp.fs[dfCompressSavedPercent, "{#VSERVER}{#FSNAME}"] |
| {#VSERVER}{#FSNAME}: Saved by deduplication percents | Provides the percentage of deduplication savings in a volume, which is ((dedup_saved/(dedup_saved + used)) * 100). This is only returned for volumes. | SNMP_AGENT | - | netapp.fs[dfDedupeSavedPercent, "{#VSERVER}{#FSNAME}"] |
| {#VSERVER}{#FSNAME}: Used space percents | The percentage of disk space currently in use on {#FSNAME}. | SNMP_AGENT | - | netapp.fs[dfPerCentKBytesCapacity, "{#VSERVER}{#FSNAME}"] |
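
The deduplication savings formula quoted above, dedup_saved / (dedup_saved + used) * 100, can be checked with a short worked example. The Python sketch below is illustrative only: the function name `saved_percent` and the sample sizes are hypothetical, and the zero-size guard is an assumption added to keep the example safe to run.

```python
def saved_percent(saved: float, used: float) -> float:
    """Savings as a percentage of the space the data would occupy without savings."""
    total = saved + used
    if total == 0:
        # Assumption: an empty volume reports 0% rather than dividing by zero.
        return 0.0
    return saved / total * 100

# Hypothetical volume: 750 units still in use, 250 units saved by deduplication.
example = saved_percent(saved=250, used=750)
```

In this example the volume would have needed 1000 units without deduplication, so the savings come to 25 percent.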

Trigger prototypes

| Name | Description | Expression | Priority | Dependencies |
| --- | --- | --- | --- | --- |
| {#VSERVER}{#FSNAME}: Disk space is too low (below {$FAS3220.FS.AVAIL.MIN.CRIT:"{#FSNAME}"} for {$FAS3220.FS.TIME:"{#FSNAME}"}) | - | min(/NetApp SNMP/netapp.fs[df64AvailKBytes, "{#VSERVER}{#FSNAME}"],{$FAS3220.FS.TIME:"{#FSNAME}"})<{$FAS3220.FS.AVAIL.MIN.CRIT:"{#FSNAME}"} and {$FAS3220.FS.USE.PCT:"{#FSNAME}"}=0 | HIGH ⛔ | {#VSERVER}{#FSNAME}: Total space available |
| {#VSERVER}{#FSNAME}: Disk space is too low (used over {$FAS3220.FS.PUSED.MAX.CRIT:"{#FSNAME}"}% for {$FAS3220.FS.TIME:"{#FSNAME}"}) | - | max(/NetApp SNMP/netapp.fs[dfPerCentKBytesCapacity, "{#VSERVER}{#FSNAME}"],{$FAS3220.FS.TIME:"{#FSNAME}"})>{$FAS3220.FS.PUSED.MAX.CRIT:"{#FSNAME}"} and {$FAS3220.FS.USE.PCT:"{#FSNAME}"}=1 | HIGH ⛔ | {#VSERVER}{#FSNAME}: Used space percents |
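
The two filesystem triggers are mutually exclusive: `{$FAS3220.FS.USE.PCT}` selects the percentage-based check when set to 1 and the absolute free-space check when set to 0. The Python sketch below models that switch. It rests on two assumptions that the tables do not state: that the available-space item is stored in bytes after preprocessing, and that the `G` suffix in `10G` is a power-of-1024 multiplier. The names `parse_size` and `fs_problem` are hypothetical.

```python
from typing import Sequence

# Assumption: size suffixes are powers of 1024.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(text: str) -> int:
    """Convert a value such as '10G' to a number of bytes."""
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)

def fs_problem(avail_bytes: Sequence[float], pused: Sequence[float],
               use_pct: int = 1, avail_min: str = "10G",
               pused_max: float = 90.0) -> bool:
    """Evaluate whichever of the two triggers the USE.PCT switch enables."""
    if use_pct == 1:
        # Percentage mode: highest reading in the window exceeds the limit.
        return max(pused) > pused_max
    if use_pct == 0:
        # Absolute mode: lowest free-space reading falls below the minimum.
        return min(avail_bytes) < parse_size(avail_min)
    return False

five_gib = 5 * 1024**3
pct_mode = fs_problem([five_gib], [50.0], use_pct=1)
abs_mode = fs_problem([five_gib], [50.0], use_pct=0)
```

With 5 GiB free and 50% used, the percentage mode reports no problem while the absolute mode does, which shows why the switch matters for very large or very small filesystems.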

Discovery rule №4

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| HA discovery | Discovery of high availability metrics per node | SNMP_AGENT | 1h | netapp.ha.discovery |

Item prototypes

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Node {#NODE.NAME}: Cannot takeover cause | The reason the node cannot take over its HA partner {#PARTNER.NAME}. Possible states: ok(1), unknownReason(2), disabledByOperator(3), interconnectOffline(4), disabledByPartner(5), takeoverFailed(6), mailboxIsInDegradedState(7), partnermailboxIsInUninitialisedState(8), mailboxVersionMismatch(9), nvramSizeMismatch(10), kernelVersionMismatch(11), partnerIsInBootingStage(12), diskshelfIsTooHot(13), partnerIsPerformingRevert(14), nodeIsPerformingRevert(15), sametimePartnerIsAlsoTryingToTakeUsOver(16), alreadyInTakenoverMode(17), nvramLogUnsynchronized(18), stateofBackupMailboxIsDoubtful(19). | SNMP_AGENT | - | netapp.ha[haCannotTakeoverCause, "{#NODE.NAME}"] |
| Node {#NODE.NAME}: HA settings | High Availability configuration settings. The value notConfigured(1) indicates that HA is not licensed. The thisNodeDead(5) setting indicates that this node has been taken over. | SNMP_AGENT | - | netapp.ha[haSettings, "{#NODE.NAME}"] |

Trigger prototypes

| Name | Description | Expression | Priority | Dependencies |
| --- | --- | --- | --- | --- |
| Node {#NODE.NAME}: Node cannot take over its HA partner {#PARTNER.NAME}. Reason: {ITEM.VALUE} | Possible reasons: unknownReason(2), disabledByOperator(3), interconnectOffline(4), disabledByPartner(5), takeoverFailed(6), mailboxIsInDegradedState(7), partnermailboxIsInUninitialisedState(8), mailboxVersionMismatch(9), nvramSizeMismatch(10), kernelVersionMismatch(11), partnerIsInBootingStage(12), diskshelfIsTooHot(13), partnerIsPerformingRevert(14), nodeIsPerformingRevert(15), sametimePartnerIsAlsoTryingToTakeUsOver(16), alreadyInTakenoverMode(17), nvramLogUnsynchronized(18), stateofBackupMailboxIsDoubtful(19). | last(/NetApp SNMP/netapp.ha[haCannotTakeoverCause, "{#NODE.NAME}"])<>1 | HIGH ⛔ | Node {#NODE.NAME}: Cannot takeover cause |
| Node {#NODE.NAME}: HA is not licensed | The value notConfigured(1) indicates that HA is not licensed. | last(/NetApp SNMP/netapp.ha[haSettings, "{#NODE.NAME}"])=1 | AVERAGE ⚠ | Node {#NODE.NAME}: HA settings |
| Node {#NODE.NAME}: Node has been taken over | The thisNodeDead(5) setting indicates that this node has been taken over. | last(/NetApp SNMP/netapp.ha[haSettings, "{#NODE.NAME}"])=5 | HIGH ⛔ | Node {#NODE.NAME}: HA settings |
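
The three HA triggers read two items: the takeover cause and the HA settings value. As an illustration, the Python sketch below evaluates both values the way the expressions above do and returns the alerts that would apply. The function name `ha_alerts` and the alert strings are hypothetical; the codes come from the item descriptions.

```python
from typing import List

# Cause codes as listed in the "Cannot takeover cause" item description.
TAKEOVER_CAUSES = {
    1: "ok", 2: "unknownReason", 3: "disabledByOperator",
    4: "interconnectOffline", 5: "disabledByPartner", 6: "takeoverFailed",
    7: "mailboxIsInDegradedState", 8: "partnermailboxIsInUninitialisedState",
    9: "mailboxVersionMismatch", 10: "nvramSizeMismatch",
    11: "kernelVersionMismatch", 12: "partnerIsInBootingStage",
    13: "diskshelfIsTooHot", 14: "partnerIsPerformingRevert",
    15: "nodeIsPerformingRevert",
    16: "sametimePartnerIsAlsoTryingToTakeUsOver",
    17: "alreadyInTakenoverMode", 18: "nvramLogUnsynchronized",
    19: "stateofBackupMailboxIsDoubtful",
}

def ha_alerts(cause: int, settings: int) -> List[str]:
    """Return the alerts implied by the last values of the two HA items."""
    alerts = []
    if cause != 1:  # haCannotTakeoverCause <> 1
        reason = TAKEOVER_CAUSES.get(cause, f"unrecognized({cause})")
        alerts.append(f"cannot take over: {reason}")
    if settings == 1:  # notConfigured(1)
        alerts.append("HA is not licensed")
    elif settings == 5:  # thisNodeDead(5)
        alerts.append("node has been taken over")
    return alerts
```

For example, `ha_alerts(4, 5)` reports both an offline interconnect and a completed takeover, while `ha_alerts(1, 2)` returns an empty list.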

Discovery rule №5

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Network ports discovery | Network interfaces discovery with filter. | SNMP_AGENT | 1h | netapp.net.discovery |

Item prototypes

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Inbound packets discarded | The number of inbound packets that were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. | SNMP_AGENT | 3m | netapp.net.if[if64InDiscards, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Inbound packets with errors | The number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. | SNMP_AGENT | 3m | netapp.net.if[if64InErrors, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Bits received | The total number of octets received on the interface, including framing characters. | SNMP_AGENT | - | netapp.net.if[if64InOctets, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Outbound packets discarded | The number of outbound packets that were chosen to be discarded even though no errors had been detected to prevent their being transmitted. One possible reason for discarding such a packet could be to free up buffer space. | SNMP_AGENT | 3m | netapp.net.if[if64OutDiscards, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Outbound packets with errors | The number of outbound packets that could not be transmitted because of errors. | SNMP_AGENT | 3m | netapp.net.if[if64OutErrors, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Bits sent | The total number of octets transmitted out of the interface, including framing characters. | SNMP_AGENT | - | netapp.net.if[if64OutOctets, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Health degraded reason | The list of reasons why the port is marked as degraded. | SNMP_AGENT | - | netapp.net.port[netportDegradedReason, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Health | The health status of the port. | SNMP_AGENT | - | netapp.net.port[netportHealthStatus, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): State | The link state of the port. Normally it is either UP(2) or DOWN(3). | SNMP_AGENT | - | netapp.net.port[netportLinkState, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Role | Role of the port. A port must have one of the following roles: cluster(1), data(2), mgmt(3), intercluster(4), cluster-mgmt(5) or undef(0). The cluster port is used to communicate with other node(s) in the cluster. The data port services clients' requests; it is where all the file requests come in. The management port is used by an administrator to manage resources within a node. The intercluster port is used to communicate with another cluster. The cluster-mgmt port is used to manage resources within the cluster. The undef role is for a port that has not yet been assigned a role. | SNMP_AGENT | - | netapp.net.port[netportRole, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Speed | The speed that appears on the port. It can be either undef(0), auto(1), ten Mb/s(2), hundred Mb/s(3), one Gb/s(4), or ten Gb/s(5). | SNMP_AGENT | - | netapp.net.port[netportSpeedOper, "{#NODE}", "{#IFNAME}"] |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Up by an administrator | Indicates whether the port status is set 'UP' by an administrator. | SNMP_AGENT | - | netapp.net.port[netportUpAdmin, "{#NODE}", "{#IFNAME}"] |

Trigger prototypes

| Name | Description | Expression | Priority | Dependencies |
| --- | --- | --- | --- | --- |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): High error rate (> {$IF.ERRORS.WARN:"{#IFNAME}"} for 5m) | Recovers when below 80% of {$IF.ERRORS.WARN:"{#IFNAME}"} threshold | min(/NetApp SNMP/netapp.net.if[if64InErrors, "{#NODE}", "{#IFNAME}"],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} or min(/NetApp SNMP/netapp.net.if[if64OutErrors, "{#NODE}", "{#IFNAME}"],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} | WARNING 📢 | |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Link down | Link state is not UP and the port status is set 'UP' by an administrator. | last(/NetApp SNMP/netapp.net.port[netportLinkState, "{#NODE}", "{#IFNAME}"])<>2 and last(/NetApp SNMP/netapp.net.port[netportUpAdmin, "{#NODE}", "{#IFNAME}"])=1 | AVERAGE ⚠ | |
| Node {#NODE}: port {#IFNAME} ({#TYPE}): Port is not healthy | {{ITEM.LASTVALUE2}.regsub("(.*)", \1)} | last(/NetApp SNMP/netapp.net.port[netportHealthStatus, "{#NODE}", "{#IFNAME}"])<>0 and length(last(/NetApp SNMP/netapp.net.port[netportDegradedReason, "{#NODE}", "{#IFNAME}"]))>0 | INFO 🔔 | |
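
The high error rate trigger is described as recovering only below 80% of the threshold, which is a hysteresis pattern: the level for clearing the problem is lower than the level for raising it. The page shows only the problem expression, so the recovery condition in the Python sketch below is an assumption (both directions must stay below 80% of the threshold for the whole window). The function name `next_state` and the threshold value 2 are hypothetical, since `{$IF.ERRORS.WARN}` has no default listed in the macro table.

```python
from typing import Sequence

def next_state(in_problem: bool, in_err: Sequence[float],
               out_err: Sequence[float], warn: float) -> bool:
    """Return the new problem state given the samples in the 5m window."""
    if not in_problem:
        # Problem expression from the table: min(...) > threshold, either direction.
        return min(in_err) > warn or min(out_err) > warn
    # Assumed recovery rule: every sample below 80% of the threshold.
    recovered = max(in_err) < warn * 0.8 and max(out_err) < warn * 0.8
    return not recovered

WARN = 2.0  # hypothetical threshold, for illustration only

raised = next_state(False, [3.0, 4.0, 5.0], [0.0, 0.0, 0.0], WARN)
still_raised = next_state(True, [1.8, 1.7, 1.7], [0.0, 0.0, 0.0], WARN)
cleared = next_state(True, [1.5, 1.0, 0.5], [0.0, 0.0, 0.0], WARN)
```

Readings of 1.7 to 1.8 are below the threshold of 2 but above the 1.6 recovery level, so the problem stays open; this gap is what prevents the trigger from flapping around the threshold.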