SMART by Zabbix agent 2 active

Macros used

Name | Value
{$SMART.DISK.NAME.MATCHES} | ^.*$
{$SMART.DISK.NAME.NOT_MATCHES} | CHANGE_IF_NEEDED
{$SMART.TEMPERATURE.MAX.CRIT} | 65
{$SMART.TEMPERATURE.MAX.WARN} | 50
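The two temperature macros are the thresholds for the "Average disk temperature" trigger prototypes. Those triggers compare a 5-minute average of the Temperature item against the macros. The Python sketch below mimics that comparison under a few assumptions:

- `temperature_problems` is a hypothetical helper.
- The sample lists stand in for the values Zabbix would average.
- The values are assumed to be degrees Celsius.

The sketch evaluates the two expressions independently and does not model any trigger dependencies.

```python
from statistics import mean

# Default macro values from the table above (assumed to be degrees Celsius).
SMART_TEMPERATURE_MAX_CRIT = 65
SMART_TEMPERATURE_MAX_WARN = 50


def temperature_problems(samples_5m, warn=SMART_TEMPERATURE_MAX_WARN,
                         crit=SMART_TEMPERATURE_MAX_CRIT):
    """Return the names of the temperature triggers whose expression is true.

    samples_5m stands in for the values that
    avg(/.../smart.disk.temperature[{#NAME}],5m) would average.
    """
    avg = mean(samples_5m)
    problems = []
    if avg > crit:  # {$SMART.TEMPERATURE.MAX.CRIT}
        problems.append("Average disk temperature is critical")
    if avg > warn:  # {$SMART.TEMPERATURE.MAX.WARN}
        problems.append("Average disk temperature is too high")
    return problems


print(temperature_problems([52, 54, 56]))  # ['Average disk temperature is too high']
```

Both expressions use a strict `>`, so an average of exactly 50 does not make the warning-level expression true.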

Discovery rule #1

Name | Description | Type | Interval | Key and additional info
Disk discovery | Discovers SMART disks. | ZABBIX_ACTIVE | 1h | smart.disk.discovery
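The two `{$SMART.DISK.NAME.*}` macros are presumably used as the discovery rule's filter on `{#NAME}`. Under that reading, a disk is kept when its name matches the first regular expression and does not match the second. This page does not show the filter itself, so treat that as an assumption.

The Python sketch below only approximates Zabbix regular-expression matching with `re.search`. Zabbix uses PCRE, and a "matches" condition is not anchored unless the pattern itself is. The helper names and the disk names in the usage lines are hypothetical.

```python
import re

# Default macro values from the "Macros used" table.
NAME_MATCHES = r"^.*$"                  # {$SMART.DISK.NAME.MATCHES}
NAME_NOT_MATCHES = r"CHANGE_IF_NEEDED"  # {$SMART.DISK.NAME.NOT_MATCHES}


def keep_disk(name, matches=NAME_MATCHES, not_matches=NAME_NOT_MATCHES):
    """Approximate a filter "{#NAME} matches A and does not match B".

    re.search finds the pattern anywhere in the string, like an unanchored
    regular-expression match.
    """
    return (re.search(matches, name) is not None
            and re.search(not_matches, name) is None)


def filter_disks(names, **overrides):
    """Keep only the discovered names that pass the filter."""
    return [name for name in names if keep_disk(name, **overrides)]


disks = ["sda", "sdb", "nvme0"]  # hypothetical discovered names
print(filter_disks(disks))                        # ['sda', 'sdb', 'nvme0']
print(filter_disks(disks, not_matches=r"^nvme"))  # ['sda', 'sdb']
```

With the defaults every disk is kept, because `CHANGE_IF_NEEDED` is a placeholder that no real disk name is expected to contain. Overriding `{$SMART.DISK.NAME.NOT_MATCHES}` with a pattern such as `^nvme` would exclude matching devices.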

Item prototypes

Name | Description | Type | Interval | Key and additional info
SMART [{#NAME}]: Bad_Block_Rate | Used reserve blocks as a percentage of total reserve blocks. | DEPENDENT | - | smart.disk.attribute.bad_block_rate[{#NAME}]
SMART [{#NAME}]: Power_Cycle_Count | Count of full hard disk power on/off cycles. | DEPENDENT | - | smart.disk.attribute.power_cycle_count[{#NAME}]
SMART [{#NAME}]: Program_Fail_Count_Chip | Total number of flash program operation failures since the drive was deployed. | DEPENDENT | - | smart.disk.attribute.program_fail_count_chip[{#NAME}]
SMART [{#NAME}]: Raw_Read_Error_Rate | Data related to the rate of hardware read errors that occurred when reading data from the disk surface. The raw value has a different structure for different vendors and is often not meaningful as a decimal number. For some drives, this number may increase during normal operation without necessarily signifying errors. | DEPENDENT | - | smart.disk.attribute.raw_read_error_rate[{#NAME}]
SMART [{#NAME}]: Reallocated_Sector_Ct | Count of reallocated sectors, that is, bad sectors that the drive found and remapped to its spare area. | DEPENDENT | - | smart.disk.attribute.reallocated_sector_ct[{#NAME}]
SMART [{#NAME}]: Reported_Uncorrect | Count of errors that could not be recovered using hardware ECC. | DEPENDENT | - | smart.disk.attribute.reported_uncorrect[{#NAME}]
SMART [{#NAME}]: Seek_Error_Rate | Rate of seek errors of the magnetic heads. A partial failure in the mechanical positioning system causes seek errors. Such a failure may be due to numerous factors, such as damage to a servo or thermal widening of the hard disk. The raw value has a different structure for different vendors and is often not meaningful as a decimal number. For some drives, this number may increase during normal operation without necessarily signifying errors. | DEPENDENT | - | smart.disk.attribute.seek_error_rate[{#NAME}]
SMART [{#NAME}]: Spin_Up_Time | Average time for the spindle to spin up from zero RPM to fully operational, in milliseconds. | DEPENDENT | - | smart.disk.attribute.spin_up_time[{#NAME}]
SMART [{#NAME}]: Start_Stop_Count | Count of spindle start/stop cycles. The spindle turns on, and the count increases, in two cases. The first is when the hard disk is powered on after having been turned entirely off (disconnected from the power source). The second is when it returns from sleep mode. | DEPENDENT | - | smart.disk.attribute.start_stop_count[{#NAME}]
SMART [{#NAME}]: Critical warning | Critical warnings for the state of the controller. | DEPENDENT | - | smart.disk.critical_warning[{#NAME}]
SMART [{#NAME}]: Smartctl error | smartctl errors, if any. | DEPENDENT | - | smart.disk.error[{#NAME}]
SMART [{#NAME}]: Exit status | The smartctl exit status, a bitmask reported as a decimal value. The eight bits have the following meanings for ATA disks; some of these values may also be returned for SCSI disks. Bit 0: the command line did not parse. Bit 1: device open failed, the device did not return an IDENTIFY DEVICE structure, or the device is in a low-power mode (see the smartctl '-n' option). Bit 2: some SMART or other ATA command to the disk failed, or there was a checksum error in a SMART data structure (see the smartctl '-b' option). Bit 3: the SMART status check returned "DISK FAILING". Bit 4: prefail attributes were found at or below threshold. Bit 5: the SMART status check returned "DISK OK", but some (usage or prefail) attributes have been at or below threshold at some time in the past. Bit 6: the device error log contains records of errors. Bit 7: the device self-test log contains records of errors; [ATA only] failed self-tests outdated by a newer successful extended self-test are ignored. | DEPENDENT | - | smart.disk.es[{#NAME}]
SMART [{#NAME}]: Get disk attributes | - | ZABBIX_ACTIVE | - | smart.disk.get[{#PATH},"{#RAIDTYPE}"]
SMART [{#NAME}]: Power on hours | Count of hours in the power-on state. The raw value of this attribute shows the total count of hours (or minutes, or seconds, depending on the manufacturer) in the power-on state. "By default, the total expected lifetime of a hard disk in perfect condition is defined as 5 years (running every day and night on all days). This is equal to 1825 days in 24/7 mode or 43800 hours." On some pre-2005 drives, this raw value may advance erratically and/or "wrap around" (reset to zero periodically). Source: https://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes | DEPENDENT | - | smart.disk.hours[{#NAME}]
SMART [{#NAME}]: Media errors | Number of occurrences where the controller detected an unrecovered data integrity error. Errors such as uncorrectable ECC, CRC checksum failure, or LBA tag mismatch are included in this field. | DEPENDENT | - | smart.disk.media_errors[{#NAME}]
SMART [{#NAME}]: Device model | - | DEPENDENT | - | smart.disk.model[{#NAME}]
SMART [{#NAME}]: Percentage used | A vendor-specific estimate of the percentage of NVM subsystem life used, based on the actual usage and the manufacturer's prediction of NVM life. A value of 100 indicates that the estimated endurance of the NVM in the NVM subsystem has been consumed, but may not indicate an NVM subsystem failure. The value is allowed to exceed 100; percentages greater than 254 are represented as 255. The value is updated once per power-on hour (when the controller is not in a sleep state). | DEPENDENT | - | smart.disk.percentage_used[{#NAME}]
SMART [{#NAME}]: Serial number | - | DEPENDENT | - | smart.disk.sn[{#NAME}]
SMART [{#NAME}]: Temperature | Current drive temperature. | DEPENDENT | - | smart.disk.temperature[{#NAME}]
SMART [{#NAME}]: Self-test passed | Whether the disk passed the SMART self-test. | DEPENDENT | - | smart.disk.test[{#NAME}]
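The Exit status item stores the smartctl exit code as a plain decimal number, so interpreting it means testing individual bits. The following is a minimal Python sketch, assuming the eight bit meanings listed in the table (abbreviated here). `decode_exit_status` is a hypothetical helper, not part of Zabbix. The masks it reports (1, 2, 4, ..., 128) are the same constants the trigger prototypes pass to `bitand()`.

```python
# Meanings of the eight smartctl exit-status bits, as listed for the
# "Exit status" item prototype (abbreviated).
EXIT_STATUS_BITS = {
    0: "Command line did not parse",
    1: "Device open failed or device is in a low-power mode",
    2: "Some SMART or other ATA command failed, or SMART checksum error",
    3: 'SMART status check returned "DISK FAILING"',
    4: "Prefail attributes at or below threshold",
    5: 'Status "DISK OK", but some attributes were at or below threshold in the past',
    6: "Device error log contains records of errors",
    7: "Device self-test log contains records of errors",
}


def decode_exit_status(value):
    """Return (bit, mask, meaning) for every bit set in a decimal exit status."""
    if not 0 <= value <= 255:
        raise ValueError("a smartctl exit status fits in 8 bits")
    return [(bit, 1 << bit, meaning)
            for bit, meaning in EXIT_STATUS_BITS.items()
            if value & (1 << bit)]


for bit, mask, meaning in decode_exit_status(192):
    print(bit, mask, meaning)
```

For example, a value of 192 (64 + 128) decodes to bits 6 and 7. This means both the device error log and the self-test log contain error records.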

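The lifetime figure quoted in the Power on hours description is simple arithmetic: 5 years × 365 days = 1825 days, and 1825 days × 24 h = 43800 hours. The sketch below turns a raw power-on counter into a share of that nominal lifetime. Note the following assumptions:

- The helper name and the unit table are hypothetical.
- As the description notes, the counter unit (hours, minutes or seconds) is manufacturer-specific, so the caller has to supply it.
- The 43800-hour figure is a nominal value, not a prediction for a particular drive.

```python
# Nominal lifetime quoted in the "Power on hours" description:
# 5 years x 365 days = 1825 days; 1825 days x 24 h = 43800 hours.
NOMINAL_LIFETIME_HOURS = 5 * 365 * 24

# Raw-counter units mentioned in the description and how many of each make
# one hour. Which unit a given drive uses is manufacturer-specific.
UNITS_PER_HOUR = {"hours": 1, "minutes": 60, "seconds": 3600}


def lifetime_used_percent(raw_value, unit="hours"):
    """Share of the nominal 43800-hour lifetime represented by raw_value."""
    hours = raw_value / UNITS_PER_HOUR[unit]
    return 100 * hours / NOMINAL_LIFETIME_HOURS


print(NOMINAL_LIFETIME_HOURS)                     # 43800
print(lifetime_used_percent(21900))               # 50.0
print(lifetime_used_percent(1314000, "minutes"))  # 50.0
```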
Trigger prototypes

Name | Description | Expression | Priority | Dependencies
SMART [{#NAME}]: Check returned "DISK FAILING" | SMART status check returned "DISK FAILING". | ( count(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2) = 1 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),8) = 8 ) or ( bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),8) = 8 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),8) > bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2),8) ) | HIGH ⛔ | SMART [{#NAME}]: Exit status
SMART [{#NAME}]: Command line did not parse | The smartctl command line did not parse. | ( count(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2) = 1 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),1) = 1 ) or ( bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),1) = 1 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),1) > bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2),1) ) | HIGH ⛔ | SMART [{#NAME}]: Exit status
SMART [{#NAME}]: Device open failed | Device open failed, the device did not return an IDENTIFY DEVICE structure, or the device is in a low-power mode. | ( count(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2) = 1 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),2) = 2 ) or ( bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),2) = 2 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),2) > bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2),2) ) | HIGH ⛔ | SMART [{#NAME}]: Exit status
SMART [{#NAME}]: Error log contains records | The device error log contains records of errors. | ( count(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2) = 1 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),64) = 64 ) or ( bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),64) = 64 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),64) > bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2),64) ) | HIGH ⛔ | SMART [{#NAME}]: Exit status
SMART [{#NAME}]: Self-test log contains records | The device self-test log contains records of errors. [ATA only] Failed self-tests outdated by a newer successful extended self-test are ignored. | ( count(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2) = 1 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),128) = 128 ) or ( bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),128) = 128 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),128) > bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2),128) ) | HIGH ⛔ | SMART [{#NAME}]: Exit status
SMART [{#NAME}]: Some Attributes have been <= threshold | SMART status check returned "DISK OK", but some (usage or prefail) attributes have been at or below threshold at some time in the past. | ( count(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2) = 1 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),32) = 32 ) or ( bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),32) = 32 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),32) > bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2),32) ) | HIGH ⛔ | SMART [{#NAME}]: Exit status
SMART [{#NAME}]: Some command to the disk failed | Some SMART or other ATA command to the disk failed, or there was a checksum error in a SMART data structure. | ( count(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2) = 1 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),4) = 4 ) or ( bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),4) = 4 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),4) > bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2),4) ) | HIGH ⛔ | SMART [{#NAME}]: Exit status
SMART [{#NAME}]: Some prefail Attributes <= threshold | Prefail attributes were found at or below threshold. | ( count(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2) = 1 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),16) = 16 ) or ( bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),16) = 16 and bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}]),16) > bitand(last(/SMART by Zabbix agent 2 active/smart.disk.es[{#NAME}],#2),16) ) | HIGH ⛔ | SMART [{#NAME}]: Exit status
SMART [{#NAME}]: NVMe disk percentage using is over 90% of estimated endurance | - | last(/SMART by Zabbix agent 2 active/smart.disk.percentage_used[{#NAME}])>90 | AVERAGE ⚠ | SMART [{#NAME}]: Percentage used
SMART [{#NAME}]: Disk has been replaced | The device serial number has changed. Acknowledge to close. | last(/SMART by Zabbix agent 2 active/smart.disk.sn[{#NAME}],#1)<>last(/SMART by Zabbix agent 2 active/smart.disk.sn[{#NAME}],#2) and length(last(/SMART by Zabbix agent 2 active/smart.disk.sn[{#NAME}]))>0 | INFO 🔔 | SMART [{#NAME}]: Serial number
SMART [{#NAME}]: Average disk temperature is critical | - | avg(/SMART by Zabbix agent 2 active/smart.disk.temperature[{#NAME}],5m)>{$SMART.TEMPERATURE.MAX.CRIT} | AVERAGE ⚠ | SMART [{#NAME}]: Temperature
SMART [{#NAME}]: Average disk temperature is too high | - | avg(/SMART by Zabbix agent 2 active/smart.disk.temperature[{#NAME}],5m)>{$SMART.TEMPERATURE.MAX.WARN} | WARNING 📢 | SMART [{#NAME}]: Temperature
SMART [{#NAME}]: Disk self-test is not passed | - | last(/SMART by Zabbix agent 2 active/smart.disk.test[{#NAME}])="false" | HIGH ⛔ | SMART [{#NAME}]: Self-test passed
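Each exit-status trigger prototype has the same shape. It is true when the bit selected by the mask is set and one of two conditions holds:

- the item has only one value so far (`count(...,#2) = 1`), or
- the bit was clear in the previous value (`last(...,#2)`).

The "Disk has been replaced" trigger likewise compares the latest and previous serial numbers. The Python sketch below evaluates both kinds of expression over a list of values, oldest first. The function names are hypothetical. Treating a missing previous value as false is a simplification of how Zabbix handles functions that cannot be evaluated.

```python
def exit_bit_trigger(history, mask):
    """Evaluate one exit-status trigger expression for the given bit mask.

    history holds smart.disk.es values, oldest first: history[-1] plays the
    role of last(...) and history[-2] of last(...,#2).
    """
    if not history:
        return False  # no data yet: nothing to evaluate
    last = history[-1] & mask
    if len(history) == 1:
        # count(...,#2) = 1 and bitand(last(...),mask) = mask
        return last == mask
    previous = history[-2] & mask
    # bitand(last(...),mask) = mask and bitand(last(...),mask) > bitand(last(...,#2),mask)
    return last == mask and last > previous


def disk_replaced(serials):
    """Evaluate last(sn,#1)<>last(sn,#2) and length(last(sn))>0."""
    if len(serials) < 2:
        return False  # last(...,#2) has no value yet; simplified to false
    return serials[-1] != serials[-2] and len(serials[-1]) > 0


print(exit_bit_trigger([0, 8], 8))     # True: bit 3 ("DISK FAILING") newly set
print(exit_bit_trigger([0, 8, 8], 8))  # False: bit was already set before
print(disk_replaced(["S1", "S2"]))     # True: serial number changed
```

A bit that stays set across consecutive checks makes the expression true only on the check where it first appears. Only the last two values matter, which is why `[0, 8, 8]` evaluates to false.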