Linux by Pult agent

Macros used

Name                            Value
{$AGENT.TIMEOUT}                3m
{$CPU.UTIL.CRIT}                90
{$IF.ERRORS.WARN}               2
{$IF.UTIL.MAX}                  90
{$IFCONTROL}                    1
{$KERNEL.MAXFILES.MIN}          256
{$KERNEL.MAXPROC.MIN}           1024
{$LOAD_AVG_PER_CPU.MAX.WARN}    1.5
{$MEMORY.AVAILABLE.MIN}         20M
{$MEMORY.UTIL.MAX}              90
{$NET.IF.IFNAME.MATCHES}        ^.*$
{$NET.IF.IFNAME.NOT_MATCHES}    (^Software Loopback Interface|^NULL[0-9.]$|^[Ll]o[0-9.]$|^[Ss]ystem$|^Nu[0-9.]*$|^veth[0-9A-z]+$|docker[0-9]+|br-[a-z0-9]{12})
{$SWAP.PFREE.MIN.WARN}          50
{$SYSTEM.FUZZYTIME.MAX}         60
{$VFS.DEV.DEVNAME.MATCHES}      .+
{$VFS.DEV.DEVNAME.NOT_MATCHES}  ^(loop[0-9]|sd[a-z][0-9]+|nbd[0-9]+|sr[0-9]+|fd[0-9]+|dm-[0-9]+|ram[0-9]+|ploop[a-z0-9]+|md[0-9]|hcp[0-9]|zram[0-9])
{$VFS.DEV.READ.AWAIT.WARN}      20
{$VFS.DEV.WRITE.AWAIT.WARN}     20
{$VFS.FS.FREE.MIN.CRIT}         5G
{$VFS.FS.FREE.MIN.WARN}         10G
{$VFS.FS.FSNAME.MATCHES}        .+
{$VFS.FS.FSNAME.NOT_MATCHES}    ^(/dev|/sys|/run|/proc|.+/shm$)
{$VFS.FS.FSTYPE.MATCHES}        ^(btrfs|ext2|ext3|ext4|reiser|xfs|ffs|ufs|jfs|jfs2|vxfs|hfs|apfs|refs|ntfs|fat32|zfs)$
{$VFS.FS.FSTYPE.NOT_MATCHES}    ^\s$
{$VFS.FS.INODE.PFREE.MIN.CRIT}  10
{$VFS.FS.INODE.PFREE.MIN.WARN}  20
{$VFS.FS.PUSED.MAX.CRIT}        90
{$VFS.FS.PUSED.MAX.WARN}        80
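
The *.MATCHES and *.NOT_MATCHES macros come in pairs and read as include and exclude patterns for the discovery rules further down. The Python sketch below illustrates that reading. It assumes a discovered name is kept only when it matches the first pattern and does not match the second; the helper `keep_entity` is hypothetical, and Python's `re` module only approximates the regular expression engine used by the monitoring server.

```python
import re

# Values copied from the macro table for filesystem names.
FSNAME_MATCHES = r".+"
FSNAME_NOT_MATCHES = r"^(/dev|/sys|/run|/proc|.+/shm$)"


def keep_entity(name: str, matches: str, not_matches: str) -> bool:
    """Hypothetical helper: keep a discovered name when it matches the
    include pattern and does not match the exclude pattern."""
    included = re.search(matches, name) is not None
    excluded = re.search(not_matches, name) is not None
    return included and not excluded


if __name__ == "__main__":
    for fs in ["/", "/home", "/dev/shm", "/proc", "/run/user/1000"]:
        print(fs, keep_entity(fs, FSNAME_MATCHES, FSNAME_NOT_MATCHES))
```

Under these assumptions "/" and "/home" are kept, while "/dev/shm", "/proc" and "/run/user/1000" are filtered out because the exclude pattern is anchored only at the start of the name.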

Items collected

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Host name of Zabbix agent running | - | - | 1h | agent.hostname |
| Zabbix agent ping | The agent always returns 1 for this item. It could be used in combination with nodata() for availability check. | - | - | agent.ping |
| Version of Zabbix agent running | - | - | 1h | agent.version |
| Maximum number of open file descriptors | It could be increased by using sysctl utility or modifying file /etc/sysctl.conf. | - | 1h | kernel.maxfiles |
| Maximum number of processes | It could be increased by using sysctl utility or modifying file /etc/sysctl.conf. | - | 1h | kernel.maxproc |
| Number of processes | - | - | - | proc.num |
| Number of running processes | - | - | - | proc.num[,,run] |
| System boot time | - | - | 15m | system.boottime |
| Interrupts per second | - | - | - | system.cpu.intr |
| Load average (1m avg) | - | - | - | system.cpu.load[all,avg1] |
| Load average (5m avg) | - | - | - | system.cpu.load[all,avg5] |
| Load average (15m avg) | - | - | - | system.cpu.load[all,avg15] |
| Number of CPUs | - | - | - | system.cpu.num |
| Context switches per second | - | - | - | system.cpu.switches |
| CPU utilization | CPU utilization in %. | DEPENDENT | - | system.cpu.util |
| CPU guest time | Guest time (time spent running a virtual CPU for a guest operating system). | - | - | system.cpu.util[,guest] |
| CPU guest nice time | Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel). | - | - | system.cpu.util[,guest_nice] |
| CPU idle time | The time the CPU has spent doing nothing. | - | - | system.cpu.util[,idle] |
| CPU interrupt time | The amount of time the CPU has been servicing hardware interrupts. | - | - | system.cpu.util[,interrupt] |
| CPU iowait time | Amount of time the CPU has been waiting for I/O to complete. | - | - | system.cpu.util[,iowait] |
| CPU nice time | The time the CPU has spent running users' processes that have been niced. | - | - | system.cpu.util[,nice] |
| CPU softirq time | The amount of time the CPU has been servicing software interrupts. | - | - | system.cpu.util[,softirq] |
| CPU steal time | The amount of CPU 'stolen' from this virtual machine by the hypervisor for other tasks (such as running another virtual machine). | - | - | system.cpu.util[,steal] |
| CPU system time | The time the CPU has spent running the kernel and its processes. | - | - | system.cpu.util[,system] |
| CPU user time | The time the CPU has spent running users' processes that are not niced. | - | - | system.cpu.util[,user] |
| System name | System host name. | - | 1h | system.hostname |
| System local time | System local time of the host. | - | - | system.localtime |
| Operating system architecture | Operating system architecture of the host. | - | 1h | system.sw.arch |
| Operating system | - | - | 1h | system.sw.os |
| Software installed | - | - | 1h | system.sw.packages |
| Free swap space | The free space of swap volume/file in bytes. | - | - | system.swap.size[,free] |
| Free swap space in % | The free space of swap volume/file in percent. | - | - | system.swap.size[,pfree] |
| Total swap space | The total space of swap volume/file in bytes. | - | - | system.swap.size[,total] |
| System description | The information as normally returned by 'uname -a'. | - | 15m | system.uname |
| System uptime | System uptime in 'N days, hh:mm:ss' format. | - | 30s | system.uptime |
| Number of logged in users | Number of users who are currently logged in. | - | - | system.users.num |
| Checksum of /etc/passwd | - | - | 15m | vfs.file.cksum[/etc/passwd,sha256] |
| Available memory | Available memory, in Linux, available = free + buffers + cache. On other platforms calculation may vary. See also Appendixes in Zabbix Documentation about parameters of the vm.memory.size item. | - | - | vm.memory.size[available] |
| Available memory in % | Available memory as percentage of total. See also Appendixes in Zabbix Documentation about parameters of the vm.memory.size item. | - | - | vm.memory.size[pavailable] |
| Total memory | Total memory in Bytes. | - | - | vm.memory.size[total] |
| Memory utilization | Memory used percentage is calculated as (100-pavailable) | DEPENDENT | - | vm.memory.utilization |
| Zabbix agent availability | Monitoring agent availability status | INTERNAL | - | zabbix[host,agent,available] |
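
The dependent item vm.memory.utilization is described as 100 minus the value of vm.memory.size[pavailable]. A minimal Python sketch of that arithmetic follows; the function name and the input validation are illustrative assumptions, not part of the template.

```python
def memory_utilization(pavailable: float) -> float:
    """Used-memory percentage derived from vm.memory.size[pavailable].

    The range check is an assumption added for the example.
    """
    if not 0.0 <= pavailable <= 100.0:
        raise ValueError("pavailable must be a percentage between 0 and 100")
    return 100.0 - pavailable


if __name__ == "__main__":
    # 12.5 % of memory available means 87.5 % is in use.
    print(memory_utilization(12.5))  # prints 87.5
```

With the default {$MEMORY.UTIL.MAX} of 90, a host reporting 12.5 % available memory would sit just below the threshold.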

Triggers

| Name | Description | Expression | Priority | Dependencies |
| --- | --- | --- | --- | --- |
| Configured max number of open file descriptors is too low | - | last(/Linux by Pult agent/kernel.maxfiles)<{$KERNEL.MAXFILES.MIN} | INFO 🔔 | Maximum number of open file descriptors |
| Configured max number of processes is too low | - | last(/Linux by Pult agent/kernel.maxproc)<{$KERNEL.MAXPROC.MIN} | INFO 🔔 | Maximum number of processes |
| High CPU utilization | CPU utilization is too high. The system might be slow to respond. | min(/Linux by Pult agent/system.cpu.util,5m)>{$CPU.UTIL.CRIT} | WARNING 📢 | CPU utilization |
| System name has changed | System name has changed. Ack to close. | last(/Linux by Pult agent/system.hostname,#1)<>last(/Linux by Pult agent/system.hostname,#2) and length(last(/Linux by Pult agent/system.hostname))>0 | INFO 🔔 | System name |
| System time is out of sync | The host system time is different from the monitoring server time. | fuzzytime(/Linux by Pult agent/system.localtime,{$SYSTEM.FUZZYTIME.MAX})=0 | WARNING 📢 | System local time |
| Operating system description has changed | Operating system description has changed. Possible reasons: the system has been updated or replaced. Ack to close. | last(/Linux by Pult agent/system.sw.os,#1)<>last(/Linux by Pult agent/system.sw.os,#2) and length(last(/Linux by Pult agent/system.sw.os))>0 | INFO 🔔 | Operating system |
| has been restarted | The host uptime is less than 10 minutes. | last(/Linux by Pult agent/system.uptime)<10m | WARNING 📢 | System uptime |
| /etc/passwd has been changed | - | last(/Linux by Pult agent/vfs.file.cksum[/etc/passwd,sha256],#1)<>last(/Linux by Pult agent/vfs.file.cksum[/etc/passwd,sha256],#2) | INFO 🔔 | Checksum of /etc/passwd |
| High memory utilization | The system is running out of free memory. | min(/Linux by Pult agent/vm.memory.utilization,5m)>{$MEMORY.UTIL.MAX} | AVERAGE ⚠ | Memory utilization |
| Pult agent is not available | For passive-only agents, host availability is used with {$AGENT.TIMEOUT} as the time threshold. | max(/Linux by Pult agent/zabbix[host,agent,available],{$AGENT.TIMEOUT})=0 | AVERAGE ⚠ | Zabbix agent availability |
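
Two expression patterns recur in the triggers above: a minimum over a time window compared with a threshold, and a comparison of the two most recent values. The Python sketch below shows one way to read them. The function names are hypothetical, the sample lists stand in for item history, and treating an empty window as "do not fire" is an assumption made for the example.

```python
from typing import Sequence


def min_exceeds(window: Sequence[float], threshold: float) -> bool:
    """Reading of min(item,5m) > threshold: true only when every sample
    collected in the window is above the threshold."""
    return len(window) > 0 and min(window) > threshold


def value_changed(latest: str, previous: str) -> bool:
    """Reading of last(item,#1)<>last(item,#2) and length(last(item))>0:
    the newest value differs from the one before it and is not empty."""
    return latest != previous and len(latest) > 0


if __name__ == "__main__":
    cpu_util_crit = 90.0  # default of {$CPU.UTIL.CRIT}
    print(min_exceeds([95.0, 97.5, 91.2], cpu_util_crit))  # True: all samples above 90
    print(min_exceeds([95.0, 42.0, 91.2], cpu_util_crit))  # False: one sample dipped below
    print(value_changed("web02", "web01"))                 # True: name changed
    print(value_changed("", "web01"))                      # False: empty value ignored
```

Using the minimum rather than the latest value means a single spike does not raise the problem; the load has to stay above the threshold for the whole window.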

Discovery rule №1

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Network interface discovery | Discovery of network interfaces. | - | 1h | net.if.discovery |

Item prototypes

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Interface {#IFNAME}: Inbound packets discarded | - | - | 3m | net.if.in["{#IFNAME}",dropped] |
| Interface {#IFNAME}: Inbound packets with errors | - | - | 3m | net.if.in["{#IFNAME}",errors] |
| Interface {#IFNAME}: Bits received | - | - | 3m | net.if.in["{#IFNAME}"] |
| Interface {#IFNAME}: Outbound packets discarded | - | - | 3m | net.if.out["{#IFNAME}",dropped] |
| Interface {#IFNAME}: Outbound packets with errors | - | - | 3m | net.if.out["{#IFNAME}",errors] |
| Interface {#IFNAME}: Bits sent | - | - | 3m | net.if.out["{#IFNAME}"] |
| Interface {#IFNAME}: Operational status | Reference: https://www.kernel.org/doc/Documentation/networking/operstates.txt | - | - | vfs.file.contents["/sys/class/net/{#IFNAME}/operstate"] |
| Interface {#IFNAME}: Speed | Indicates the interface latest or current speed value. Value is an integer representing the link speed in bits/sec. This attribute is only valid for interfaces that implement the ethtool get_link_ksettings method (mostly Ethernet). Reference: https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net | - | 5m | vfs.file.contents["/sys/class/net/{#IFNAME}/speed"] |
| Interface {#IFNAME}: Interface type | Indicates the interface protocol type as a decimal value. See include/uapi/linux/if_arp.h for all possible values. Reference: https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net | - | 1h | vfs.file.contents["/sys/class/net/{#IFNAME}/type"] |

Trigger prototypes

| Name | Description | Expression | Priority | Dependencies |
| --- | --- | --- | --- | --- |
| Interface {#IFNAME}: Link down | This trigger expression works as follows: 1. It can fire if the operational status is down. 2. {$IFCONTROL:"{#IFNAME}"}=1 - the user can redefine the context macro to 0, which marks the interface as not important; no new trigger fires if that interface is down. 3. {TEMPLATE_NAME:METRIC.diff()}=1 - the trigger fires only if the operational status was up (1) sometime before, so it does not fire for interfaces that are permanently off. WARNING: if closed manually, it will not fire again on the next poll, because of .diff. | {$IFCONTROL:"{#IFNAME}"}=1 and last(/Linux by Pult agent/vfs.file.contents["/sys/class/net/{#IFNAME}/operstate"])=2 and (last(/Linux by Pult agent/vfs.file.contents["/sys/class/net/{#IFNAME}/operstate"],#1)<>last(/Linux by Pult agent/vfs.file.contents["/sys/class/net/{#IFNAME}/operstate"],#2)) | AVERAGE ⚠ | Interface {#IFNAME}: Operational status |
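
The three conditions of the Link down expression can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and the numeric status codes (1 for up, 2 for down) are taken from the trigger description and expression above rather than from the raw operstate file, which is assumed to be converted to a number before the trigger sees it.

```python
STATUS_UP = 1    # "up(1)" in the trigger description
STATUS_DOWN = 2  # compared against 2 in the trigger expression


def link_down(ifcontrol: int, latest: int, previous: int) -> bool:
    """Hypothetical reading of the Link down expression.

    ifcontrol: value of {$IFCONTROL:"{#IFNAME}"}; 0 marks the interface
               as not important.
    latest:    last(operstate, #1)
    previous:  last(operstate, #2)
    """
    return ifcontrol == 1 and latest == STATUS_DOWN and latest != previous


if __name__ == "__main__":
    print(link_down(1, STATUS_DOWN, STATUS_UP))    # True: went from up to down
    print(link_down(1, STATUS_DOWN, STATUS_DOWN))  # False: was already down
    print(link_down(0, STATUS_DOWN, STATUS_UP))    # False: interface is ignored
```

The second case shows why a manually closed problem does not reopen on the next poll: once two consecutive values are both "down", the change condition is no longer met.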

Discovery rule №2

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Block devices discovery | - | - | 1h | vfs.dev.discovery |

Item prototypes

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| {#DEVNAME}: Disk average queue size (avgqu-sz) | Current average disk queue, the number of requests outstanding on the disk at the time the performance data is collected. | DEPENDENT | - | vfs.dev.queue_size[{#DEVNAME}] |
| {#DEVNAME}: Disk read request avg waiting time (r_await) | This formula contains two boolean expressions that evaluate to 1 or 0 in order to set the calculated metric to zero and to avoid a division by zero. | CALCULATED | - | vfs.dev.read.await[{#DEVNAME}] |
| {#DEVNAME}: Disk read rate | r/s. The number (after merges) of read requests completed per second for the device. | DEPENDENT | - | vfs.dev.read.rate[{#DEVNAME}] |
| {#DEVNAME}: Disk read time (rate) | Rate of total read time counter. Used in r_await calculation. | DEPENDENT | - | vfs.dev.read.time.rate[{#DEVNAME}] |
| {#DEVNAME}: Disk utilization | The percentage of elapsed time that the selected disk drive was busy servicing read or write requests. | DEPENDENT | - | vfs.dev.util[{#DEVNAME}] |
| {#DEVNAME}: Disk write request avg waiting time (w_await) | This formula contains two boolean expressions that evaluate to 1 or 0 in order to set the calculated metric to zero and to avoid a division by zero. | CALCULATED | - | vfs.dev.write.await[{#DEVNAME}] |
| {#DEVNAME}: Disk write rate | w/s. The number (after merges) of write requests completed per second for the device. | DEPENDENT | - | vfs.dev.write.rate[{#DEVNAME}] |
| {#DEVNAME}: Disk write time (rate) | Rate of total write time counter. Used in w_await calculation. | DEPENDENT | - | vfs.dev.write.time.rate[{#DEVNAME}] |
| {#DEVNAME}: Get stats | Get contents of /sys/block/{#DEVNAME}/stat for disk stats. | - | - | vfs.file.contents[/sys/block/{#DEVNAME}/stat] |
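
The r_await and w_await descriptions mention two boolean expressions that force the result to zero and prevent a division by zero. The template's actual formula is not shown on this page, so the Python sketch below is only one plausible shape of such a calculation. It assumes the time rate is expressed in seconds of I/O time per second, the request rate in requests per second, and the result in milliseconds; the function name is hypothetical.

```python
def avg_wait_ms(time_rate: float, request_rate: float) -> float:
    """Average wait per request in milliseconds (assumed units).

    Two 0/1 flags play the role of the boolean expressions:
    - rate_is_zero keeps the divisor from being zero,
    - rate_is_positive forces the result to 0 when nothing completed.
    """
    rate_is_zero = 1 if request_rate == 0 else 0
    rate_is_positive = 1 if request_rate > 0 else 0
    return time_rate / (request_rate + rate_is_zero) * 1000 * rate_is_positive


if __name__ == "__main__":
    # 0.5 s of read time per second spread over 4 reads per second.
    print(avg_wait_ms(0.5, 4))  # prints 125.0
    # Idle disk: no completed requests, so the wait is reported as 0.
    print(avg_wait_ms(0.0, 0))  # prints 0.0
```

Expressing the guard with arithmetic flags instead of an if statement mirrors how a calculated item formula, which has no branching, can still avoid dividing by zero.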

Trigger prototypes

| Name | Description | Expression | Priority | Dependencies |
| --- | --- | --- | --- | --- |
| {#DEVNAME}: Disk read/write request responses are too high | This trigger might indicate disk {#DEVNAME} saturation. | min(/Linux by Pult agent/vfs.dev.read.await[{#DEVNAME}],15m) > {$VFS.DEV.READ.AWAIT.WARN:"{#DEVNAME}"} or min(/Linux by Pult agent/vfs.dev.write.await[{#DEVNAME}],15m) > {$VFS.DEV.WRITE.AWAIT.WARN:"{#DEVNAME}"} | WARNING 📢 | - |

Discovery rule №3

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| Mounted filesystem discovery | Discovery of file systems of different types. | - | 1h | vfs.fs.discovery |

Item prototypes

| Name | Description | Type | Interval | Key and additional info |
| --- | --- | --- | --- | --- |
| {#FSNAME}: Free inodes in % | - | - | - | vfs.fs.inode[{#FSNAME},pfree] |
| {#FSNAME}: Space utilization | Space utilization in % for {#FSNAME} | - | - | vfs.fs.size[{#FSNAME},pused] |
| {#FSNAME}: Total space | Total space in Bytes | - | - | vfs.fs.size[{#FSNAME},total] |
| {#FSNAME}: Used space | Used storage in Bytes | - | - | vfs.fs.size[{#FSNAME},used] |

Trigger prototypes

| Name | Description | Expression | Priority | Dependencies |
| --- | --- | --- | --- | --- |
| {#FSNAME}: Running out of free inodes | It may become impossible to write to disk if there are no index nodes left. As symptoms, 'No space left on device' or 'Disk is full' errors may be seen even though free space is available. | min(/Linux by Pult agent/vfs.fs.inode[{#FSNAME},pfree],5m)<{$VFS.FS.INODE.PFREE.MIN.CRIT:"{#FSNAME}"} | AVERAGE ⚠ | {#FSNAME}: Free inodes in % |
| {#FSNAME}: Running out of free inodes | It may become impossible to write to disk if there are no index nodes left. As symptoms, 'No space left on device' or 'Disk is full' errors may be seen even though free space is available. | min(/Linux by Pult agent/vfs.fs.inode[{#FSNAME},pfree],5m)<{$VFS.FS.INODE.PFREE.MIN.WARN:"{#FSNAME}"} | WARNING 📢 | {#FSNAME}: Free inodes in % |
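
The thresholds in these prototypes are written as macros with context, for example {$VFS.FS.INODE.PFREE.MIN.CRIT:"{#FSNAME}"}, which allows a per-filesystem override with a fallback to the plain macro. A small Python sketch of that lookup order follows; the dictionary, the "/var" override and the function name are hypothetical and only model the idea.

```python
def resolve_macro(macros: dict, name: str, context: str) -> str:
    """Return the context-specific value when one is defined,
    otherwise fall back to the macro without context."""
    with_context = "{$" + name + ':"' + context + '"}'
    without_context = "{$" + name + "}"
    return macros.get(with_context, macros[without_context])


if __name__ == "__main__":
    # Default from the macro table plus a hypothetical override for /var.
    host_macros = {
        "{$VFS.FS.INODE.PFREE.MIN.CRIT}": "10",
        '{$VFS.FS.INODE.PFREE.MIN.CRIT:"/var"}': "5",
    }
    print(resolve_macro(host_macros, "VFS.FS.INODE.PFREE.MIN.CRIT", "/var"))  # prints 5
    print(resolve_macro(host_macros, "VFS.FS.INODE.PFREE.MIN.CRIT", "/"))     # prints 10
```

In this model only /var uses the lower threshold; every other discovered filesystem keeps the template default of 10.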