Linux by Pult agent
Macros used
| Name | Value |
|---|---|
| {$AGENT.TIMEOUT} | 3m |
| {$CPU.UTIL.CRIT} | 90 |
| {$IF.ERRORS.WARN} | 2 |
| {$IF.UTIL.MAX} | 90 |
| {$IFCONTROL} | 1 |
| {$KERNEL.MAXFILES.MIN} | 256 |
| {$KERNEL.MAXPROC.MIN} | 1024 |
| {$LOAD_AVG_PER_CPU.MAX.WARN} | 1.5 |
| {$MEMORY.AVAILABLE.MIN} | 20M |
| {$MEMORY.UTIL.MAX} | 90 |
| {$NET.IF.IFNAME.MATCHES} | ^.*$ |
| {$NET.IF.IFNAME.NOT_MATCHES} | (^Software Loopback Interface\|^NULL[0-9.]\*$\|^[Ll]o[0-9.]\*$\|^[Ss]ystem$\|^Nu[0-9.]\*$\|^veth[0-9A-z]+$\|docker[0-9]+\|br-[a-z0-9]{12}) |
| {$SWAP.PFREE.MIN.WARN} | 50 |
| {$SYSTEM.FUZZYTIME.MAX} | 60 |
| {$VFS.DEV.DEVNAME.MATCHES} | .+ |
| {$VFS.DEV.DEVNAME.NOT_MATCHES} | ^(loop[0-9]\*\|sd[a-z][0-9]+\|nbd[0-9]+\|sr[0-9]+\|fd[0-9]+\|dm-[0-9]+\|ram[0-9]+\|ploop[a-z0-9]+\|md[0-9]\*\|hcp[0-9]\*\|zram[0-9]\*) |
| {$VFS.DEV.READ.AWAIT.WARN} | 20 |
| {$VFS.DEV.WRITE.AWAIT.WARN} | 20 |
| {$VFS.FS.FREE.MIN.CRIT} | 5G |
| {$VFS.FS.FREE.MIN.WARN} | 10G |
| {$VFS.FS.FSNAME.MATCHES} | .+ |
| {$VFS.FS.FSNAME.NOT_MATCHES} | ^(/dev\|/sys\|/run\|/proc\|.+/shm$) |
| {$VFS.FS.FSTYPE.MATCHES} | ^(btrfs\|ext2\|ext3\|ext4\|reiser\|xfs\|ffs\|ufs\|jfs\|jfs2\|vxfs\|hfs\|apfs\|refs\|ntfs\|fat32\|zfs)$ |
| {$VFS.FS.FSTYPE.NOT_MATCHES} | ^\s$ |
| {$VFS.FS.INODE.PFREE.MIN.CRIT} | 10 |
| {$VFS.FS.INODE.PFREE.MIN.WARN} | 20 |
| {$VFS.FS.PUSED.MAX.CRIT} | 90 |
| {$VFS.FS.PUSED.MAX.WARN} | 80 |
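
The table lists the *_MATCHES / *_NOT_MATCHES pairs but does not say how they are applied. In Zabbix templates such pairs are typically used as low-level discovery filters that keep an entity only when its name matches the first pattern and does not match the second; that filter logic is an assumption here. A minimal Python sketch of it, using `re.search` because Zabbix regular expressions are not implicitly anchored (the helper name `keep` and the sample names are hypothetical):

```python
import re

# Patterns copied from the macros table. The "*" quantifiers follow the
# upstream Zabbix template values (an assumption about this template).
IFNAME_MATCHES = r"^.*$"
IFNAME_NOT_MATCHES = (
    r"(^Software Loopback Interface|^NULL[0-9.]*$|^[Ll]o[0-9.]*$|^[Ss]ystem$"
    r"|^Nu[0-9.]*$|^veth[0-9A-z]+$|docker[0-9]+|br-[a-z0-9]{12})"
)
DEVNAME_MATCHES = r".+"
DEVNAME_NOT_MATCHES = (
    r"^(loop[0-9]*|sd[a-z][0-9]+|nbd[0-9]+|sr[0-9]+|fd[0-9]+|dm-[0-9]+"
    r"|ram[0-9]+|ploop[a-z0-9]+|md[0-9]*|hcp[0-9]*|zram[0-9]*)"
)


def keep(name: str, matches: str, not_matches: str) -> bool:
    """Assumed filter: keep a name that matches `matches` and not `not_matches`.

    re.search (not re.match) mirrors unanchored Zabbix regex matching.
    """
    return re.search(matches, name) is not None and re.search(not_matches, name) is None


interfaces = ["eth0", "lo", "docker0", "veth1a2b3c", "br-0123456789ab", "ens3"]
disks = ["sda", "sda1", "nvme0n1", "loop0", "dm-0"]
print([i for i in interfaces if keep(i, IFNAME_MATCHES, IFNAME_NOT_MATCHES)])  # ['eth0', 'ens3']
print([d for d in disks if keep(d, DEVNAME_MATCHES, DEVNAME_NOT_MATCHES)])  # ['sda', 'nvme0n1']
```

Whole disks such as sda survive while partitions (sda1), loop devices and device-mapper nodes are dropped, which keeps per-device disk metrics from being counted twice.
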
Items collected
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Host name of Zabbix agent running | - | - | 1h | agent.hostname |
| Zabbix agent ping | The agent always returns 1 for this item. It can be used in combination with nodata() for an availability check. | - | - | agent.ping |
| Version of Zabbix agent running | - | - | 1h | agent.version |
| Maximum number of open file descriptors | It can be increased with the sysctl utility or by editing /etc/sysctl.conf. | - | 1h | kernel.maxfiles |
| Maximum number of processes | It can be increased with the sysctl utility or by editing /etc/sysctl.conf. | - | 1h | kernel.maxproc |
| Number of processes | - | - | - | proc.num |
| Number of running processes | - | - | - | proc.num[,,run] |
| System boot time | - | - | 15m | system.boottime |
| Interrupts per second | - | - | - | system.cpu.intr |
| Load average (1m avg) | - | - | - | system.cpu.load[all,avg1] |
| Load average (5m avg) | - | - | - | system.cpu.load[all,avg5] |
| Load average (15m avg) | - | - | - | system.cpu.load[all,avg15] |
| Number of CPUs | - | - | - | system.cpu.num |
| Context switches per second | - | - | - | system.cpu.switches |
| CPU utilization | CPU utilization in %. | DEPENDENT | - | system.cpu.util |
| CPU guest time | Guest time (time spent running a virtual CPU for a guest operating system). | - | - | system.cpu.util[,guest] |
| CPU guest nice time | Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel). | - | - | system.cpu.util[,guest_nice] |
| CPU idle time | The time the CPU has spent doing nothing. | - | - | system.cpu.util[,idle] |
| CPU interrupt time | The amount of time the CPU has been servicing hardware interrupts. | - | - | system.cpu.util[,interrupt] |
| CPU iowait time | Amount of time the CPU has been waiting for I/O to complete. | - | - | system.cpu.util[,iowait] |
| CPU nice time | The time the CPU has spent running users' processes that have been niced. | - | - | system.cpu.util[,nice] |
| CPU softirq time | The amount of time the CPU has been servicing software interrupts. | - | - | system.cpu.util[,softirq] |
| CPU steal time | The amount of CPU 'stolen' from this virtual machine by the hypervisor for other tasks (such as running another virtual machine). | - | - | system.cpu.util[,steal] |
| CPU system time | The time the CPU has spent running the kernel and its processes. | - | - | system.cpu.util[,system] |
| CPU user time | The time the CPU has spent running users' processes that are not niced. | - | - | system.cpu.util[,user] |
| System name | System host name. | - | 1h | system.hostname |
| System local time | System local time of the host. | - | - | system.localtime |
| Operating system architecture | Operating system architecture of the host. | - | 1h | system.sw.arch |
| Operating system | - | - | 1h | system.sw.os |
| Software installed | - | - | 1h | system.sw.packages |
| Free swap space | The free space of swap volume/file in bytes. | - | - | system.swap.size[,free] |
| Free swap space in % | The free space of swap volume/file in percent. | - | - | system.swap.size[,pfree] |
| Total swap space | The total space of swap volume/file in bytes. | - | - | system.swap.size[,total] |
| System description | The information as normally returned by 'uname -a'. | - | 15m | system.uname |
| System uptime | System uptime in 'N days, hh:mm:ss' format. | - | 30s | system.uptime |
| Number of logged in users | Number of users who are currently logged in. | - | - | system.users.num |
| Checksum of /etc/passwd | - | - | 15m | vfs.file.cksum[/etc/passwd,sha256] |
| Available memory | Available memory. On Linux, available = free + buffers + cache; on other platforms the calculation may vary. See also the Appendixes in the Zabbix documentation about parameters of the vm.memory.size item. | - | - | vm.memory.size[available] |
| Available memory in % | Available memory as a percentage of total. See also the Appendixes in the Zabbix documentation about parameters of the vm.memory.size item. | - | - | vm.memory.size[pavailable] |
| Total memory | Total memory in Bytes. | - | - | vm.memory.size[total] |
| Memory utilization | Percentage of memory used, calculated as 100 - pavailable. | DEPENDENT | - | vm.memory.utilization |
| Zabbix agent availability | Monitoring agent availability status | INTERNAL | - | zabbix[host,agent,available] |
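
The memory items combine by simple arithmetic: the available-memory description gives the Linux formula (free + buffers + cache), pavailable expresses it as a share of total, and the utilization item is 100 minus pavailable. A minimal Python sketch of that chain with made-up byte counts; the function names are hypothetical:

```python
GIB = 1024 ** 3


def available_memory(free: int, buffers: int, cache: int) -> int:
    """Linux 'available' as the vm.memory.size[available] description states."""
    return free + buffers + cache


def percent_available(available: int, total: int) -> float:
    """vm.memory.size[pavailable]: available memory as a percentage of total."""
    return available / total * 100


def memory_utilization(pavailable: float) -> float:
    """vm.memory.utilization: 100 - pavailable."""
    return 100 - pavailable


total = 8 * GIB
avail = available_memory(free=1 * GIB, buffers=GIB // 2, cache=2 * GIB + GIB // 2)
pavail = percent_available(avail, total)
print(pavail, memory_utilization(pavail))  # 50.0 50.0
```

With {$MEMORY.UTIL.MAX} at 90, this host would need pavailable to stay below 10 for the whole 5-minute window before the High memory utilization trigger fires.
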
Triggers
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| Configured max number of open file descriptors is too low | - | last(/Linux by Pult agent/kernel.maxfiles)<{$KERNEL.MAXFILES.MIN} | INFO 🔔 | Maximum number of open file descriptors |
| Configured max number of processes is too low | - | last(/Linux by Pult agent/kernel.maxproc)<{$KERNEL.MAXPROC.MIN} | INFO 🔔 | Maximum number of processes |
| High CPU utilization | CPU utilization is too high. The system might be slow to respond. | min(/Linux by Pult agent/system.cpu.util,5m)>{$CPU.UTIL.CRIT} | WARNING 📢 | CPU utilization |
| System name has changed | System name has changed. Ack to close. | last(/Linux by Pult agent/system.hostname,#1)<>last(/Linux by Pult agent/system.hostname,#2) and length(last(/Linux by Pult agent/system.hostname))>0 | INFO 🔔 | System name |
| System time is out of sync | The host system time differs from the monitoring server time. | fuzzytime(/Linux by Pult agent/system.localtime,{$SYSTEM.FUZZYTIME.MAX})=0 | WARNING 📢 | System local time |
| Operating system description has changed | Operating system description has changed. Possible reasons: the system has been updated or replaced. Ack to close. | last(/Linux by Pult agent/system.sw.os,#1)<>last(/Linux by Pult agent/system.sw.os,#2) and length(last(/Linux by Pult agent/system.sw.os))>0 | INFO 🔔 | Operating system |
| {HOST.NAME} has been restarted | The host uptime is less than 10 minutes. | last(/Linux by Pult agent/system.uptime)<10m | WARNING 📢 | System uptime |
| /etc/passwd has been changed | - | last(/Linux by Pult agent/vfs.file.cksum[/etc/passwd,sha256],#1)<>last(/Linux by Pult agent/vfs.file.cksum[/etc/passwd,sha256],#2) | INFO 🔔 | Checksum of /etc/passwd |
| High memory utilization | The system is running out of free memory. | min(/Linux by Pult agent/vm.memory.utilization,5m)>{$MEMORY.UTIL.MAX} | AVERAGE ⚠ | Memory utilization |
| Pult agent is not available | For passive-only agents, host availability is used with {$AGENT.TIMEOUT} as the time threshold. | max(/Linux by Pult agent/zabbix[host,agent,available],{$AGENT.TIMEOUT})=0 | AVERAGE ⚠ | Zabbix agent availability |
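
Several trigger expressions above rely on Zabbix function semantics that are easy to misread: min(...)>T requires every sample in the window to exceed T, the "has changed" triggers compare the two newest values, and fuzzytime() returns 0 when the clocks drift too far apart. A minimal Python sketch of how these evaluate, assuming history is a plain list of recent values (newest first) rather than real Zabbix history; all helper names are hypothetical:

```python
from typing import Sequence


def min_above(window: Sequence[float], threshold: float) -> bool:
    """min(/host/key,5m)>T: true only when every sample in the window exceeds T.

    Zabbix would treat an empty window as an evaluation error; this sketch
    simply returns False in that case.
    """
    return len(window) > 0 and min(window) > threshold


def has_changed(history: Sequence[str]) -> bool:
    """last(#1)<>last(#2) and length(last())>0, with history ordered newest first."""
    return len(history) >= 2 and history[0] != history[1] and len(history[0]) > 0


def fuzzytime(agent_ts: float, server_ts: float, max_diff: float) -> int:
    """1 if the agent timestamp is within max_diff seconds of server time, else 0."""
    return 1 if abs(agent_ts - server_ts) <= max_diff else 0


def recently_restarted(uptime_s: float) -> bool:
    """last(system.uptime)<10m, with the uptime item reported in seconds."""
    return uptime_s < 10 * 60


# High CPU utilization with {$CPU.UTIL.CRIT} = 90: a single dip below 90 keeps it quiet.
print(min_above([95.0, 97.5, 92.0], 90))  # True
print(min_above([95.0, 85.0, 99.0], 90))  # False
# {$SYSTEM.FUZZYTIME.MAX} = 60: 70 s of drift gives 0, so the trigger fires.
print(fuzzytime(1_700_000_000, 1_700_000_070, 60))  # 0
```

Using min() rather than last() is what makes the CPU and memory triggers ignore short spikes: a single sample at or below the threshold anywhere in the window keeps the problem closed.
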
Discovery rule №1
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Network interface discovery | Discovery of network interfaces. | - | 1h | net.if.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Interface {#IFNAME}: Inbound packets discarded | - | - | 3m | net.if.in["{#IFNAME}",dropped] |
| Interface {#IFNAME}: Inbound packets with errors | - | - | 3m | net.if.in["{#IFNAME}",errors] |
| Interface {#IFNAME}: Bits received | - | - | 3m | net.if.in["{#IFNAME}"] |
| Interface {#IFNAME}: Outbound packets discarded | - | - | 3m | net.if.out["{#IFNAME}",dropped] |
| Interface {#IFNAME}: Outbound packets with errors | - | - | 3m | net.if.out["{#IFNAME}",errors] |
| Interface {#IFNAME}: Bits sent | - | - | 3m | net.if.out["{#IFNAME}"] |
| Interface {#IFNAME}: Operational status | Reference: https://www.kernel.org/doc/Documentation/networking/operstates.txt | - | - | vfs.file.contents["/sys/class/net/{#IFNAME}/operstate"] |
| Interface {#IFNAME}: Speed | Indicates the interface latest or current speed value. Value is an integer representing the link speed in bits/sec. This attribute is only valid for interfaces that implement the ethtool get_link_ksettings method (mostly Ethernet). Reference: https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net | - | 5m | vfs.file.contents["/sys/class/net/{#IFNAME}/speed"] |
| Interface {#IFNAME}: Interface type | Indicates the interface protocol type as a decimal value. See include/uapi/linux/if_arp.h for all possible values. Reference: https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net | - | 1h | vfs.file.contents["/sys/class/net/{#IFNAME}/type"] |
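
The operational status item reads a text value such as "up" or "down" from sysfs, yet the Link down trigger prototype compares it with the number 2, so some text-to-number preprocessing must sit between the two; this document does not show it. Likewise, the kernel's sysfs-class-net ABI document gives the speed file in Mbit/s, so reporting bits/sec implies a x1,000,000 step. A minimal Python sketch, assuming the mapping used by the upstream Zabbix Linux template (down = 2, up = 6); `read_interface` and the other helper names are hypothetical:

```python
from pathlib import Path

# Assumed text-to-number mapping. The trigger prototype's "=2" test implies
# "down" becomes 2; the other values follow the upstream Zabbix Linux
# template and are not confirmed by this document.
OPERSTATE_CODES = {
    "unknown": 0,
    "notpresent": 1,
    "down": 2,
    "lowerlayerdown": 3,
    "testing": 4,
    "dormant": 5,
    "up": 6,
}


def operstate_code(raw: str) -> int:
    """Map the contents of /sys/class/net/<if>/operstate to a number.

    Raises KeyError for a state not listed above.
    """
    return OPERSTATE_CODES[raw.strip()]


def speed_bps(raw: str) -> int:
    """Convert /sys/class/net/<if>/speed (Mbit/s) to bit/s.

    Some drivers report -1 for an unknown speed; this sketch maps it to 0.
    """
    mbps = int(raw.strip())
    return mbps * 1_000_000 if mbps > 0 else 0


def read_interface(ifname: str, sysfs: Path = Path("/sys/class/net")) -> dict:
    base = sysfs / ifname
    info = {"operstate": operstate_code((base / "operstate").read_text())}
    try:
        info["speed_bps"] = speed_bps((base / "speed").read_text())
    except OSError:
        # Reading "speed" fails (for example with EINVAL) on interfaces without a link.
        info["speed_bps"] = None
    return info
```

On a Linux host, `read_interface("eth0")` returns the numeric state and speed for that interface; loopback usually reports the "unknown" state and no readable speed.
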
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| Interface {#IFNAME}: Link down | This trigger expression works as follows: 1. It can be triggered if the operational status is down. 2. {$IFCONTROL:"{#IFNAME}"}=1: the user can redefine this context macro to 0, which marks the interface as not important; the trigger will then not fire if this interface goes down. 3. last(...,#1)<>last(...,#2): the trigger fires only if the operational status has just changed, i.e. it was different (for example, up) on the previous poll, so interfaces that are permanently down ('eternal off') do not fire. WARNING: if the problem is closed manually, the trigger will not fire again on the next poll, because the two latest values no longer differ. | {$IFCONTROL:"{#IFNAME}"}=1 and last(/Linux by Pult agent/vfs.file.contents["/sys/class/net/{#IFNAME}/operstate"])=2 and (last(/Linux by Pult agent/vfs.file.contents["/sys/class/net/{#IFNAME}/operstate"],#1)<>last(/Linux by Pult agent/vfs.file.contents["/sys/class/net/{#IFNAME}/operstate"],#2)) | AVERAGE ⚠ | Interface {#IFNAME}: Operational status |
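
The Link down expression combines the context macro, the current value, and a change check. A minimal Python sketch that evaluates it over a newest-first list of numeric operstate values; it assumes down is encoded as 2 (as the expression implies) and uses 6 for up only as an example, and the helper name is hypothetical:

```python
from typing import Sequence

DOWN = 2  # numeric "down" value the trigger prototype compares against


def link_down_fires(ifcontrol: int, history: Sequence[int]) -> bool:
    """Evaluate the Link down expression; history holds operstate values, newest first."""
    if ifcontrol != 1 or len(history) < 2:
        return False
    return history[0] == DOWN and history[0] != history[1]


print(link_down_fires(1, [2, 6]))  # True: the link just went from up to down
print(link_down_fires(1, [2, 2]))  # False: still down, no change on this poll
print(link_down_fires(0, [2, 6]))  # False: {$IFCONTROL} context set to 0
```

The second call shows the behavior the WARNING in the description refers to: once the interface has been down for two polls, the change check is false, so a manually closed problem is not reopened.
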
Discovery rule №2
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Block devices discovery | - | - | 1h | vfs.dev.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| {#DEVNAME}: Disk average queue size (avgqu-sz) | Current average disk queue, the number of requests outstanding on the disk at the time the performance data is collected. | DEPENDENT | - | vfs.dev.queue_size[{#DEVNAME}] |
| {#DEVNAME}: Disk read request avg waiting time (r_await) | This formula contains two boolean expressions that evaluate to 1 or 0; they set the calculated metric to zero when there were no requests and avoid a division-by-zero error. | CALCULATED | - | vfs.dev.read.await[{#DEVNAME}] |
| {#DEVNAME}: Disk read rate | r/s. The number (after merges) of read requests completed per second for the device. | DEPENDENT | - | vfs.dev.read.rate[{#DEVNAME}] |
| {#DEVNAME}: Disk read time (rate) | Rate of the total read time counter. Used in the r_await calculation. | DEPENDENT | - | vfs.dev.read.time.rate[{#DEVNAME}] |
| {#DEVNAME}: Disk utilization | The percentage of elapsed time that the selected disk drive was busy servicing read or write requests. | DEPENDENT | - | vfs.dev.util[{#DEVNAME}] |
| {#DEVNAME}: Disk write request avg waiting time (w_await) | This formula contains two boolean expressions that evaluate to 1 or 0; they set the calculated metric to zero when there were no requests and avoid a division-by-zero error. | CALCULATED | - | vfs.dev.write.await[{#DEVNAME}] |
| {#DEVNAME}: Disk write rate | w/s. The number (after merges) of write requests completed per second for the device. | DEPENDENT | - | vfs.dev.write.rate[{#DEVNAME}] |
| {#DEVNAME}: Disk write time (rate) | Rate of the total write time counter. Used in the w_await calculation. | DEPENDENT | - | vfs.dev.write.time.rate[{#DEVNAME}] |
| {#DEVNAME}: Get stats | Get contents of /sys/block/{#DEVNAME}/stat for disk stats. | - | - | vfs.file.contents[/sys/block/{#DEVNAME}/stat] |
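
The r_await and w_await descriptions mention a guard built from two boolean expressions but do not show the formula, and the preprocessing behind the dependent items is not listed. A minimal Python sketch of the same arithmetic, computed from two readings of /sys/block/{#DEVNAME}/stat taken interval_s seconds apart; the field positions follow the kernel's block-layer stat documentation, while the exact formula and all helper names are assumptions rather than this template's definitions:

```python
from dataclasses import dataclass


@dataclass
class DiskStat:
    read_ios: int
    read_ticks_ms: int
    write_ios: int
    write_ticks_ms: int
    io_ticks_ms: int
    time_in_queue_ms: int


def parse_stat(line: str) -> DiskStat:
    """Parse one line of /sys/block/<dev>/stat.

    Field order: 0 read I/Os, 1 read merges, 2 read sectors, 3 read ticks,
    4 write I/Os, 5 write merges, 6 write sectors, 7 write ticks,
    8 in_flight, 9 io_ticks, 10 time_in_queue (newer kernels add more).
    """
    f = [int(x) for x in line.split()]
    return DiskStat(f[0], f[3], f[4], f[7], f[9], f[10])


def await_ms(time_rate_s: float, rate: float) -> float:
    """Average wait per request in ms, guarded by two booleans.

    (rate == 0) adds 1 to the divisor when there were no requests, and
    (rate > 0) then multiplies the result by 0, so an idle disk reports 0.
    """
    return time_rate_s / (rate + (rate == 0)) * 1000 * (rate > 0)


def disk_metrics(prev: DiskStat, cur: DiskStat, interval_s: float) -> dict:
    def per_s(a: int, b: int) -> float:
        return (b - a) / interval_s

    r_rate = per_s(prev.read_ios, cur.read_ios)
    w_rate = per_s(prev.write_ios, cur.write_ios)
    # Time counters are in ms; * 0.001 gives seconds of I/O time per second.
    r_time_rate = per_s(prev.read_ticks_ms, cur.read_ticks_ms) * 0.001
    w_time_rate = per_s(prev.write_ticks_ms, cur.write_ticks_ms) * 0.001
    return {
        "r/s": r_rate,
        "w/s": w_rate,
        "r_await": await_ms(r_time_rate, r_rate),
        "w_await": await_ms(w_time_rate, w_rate),
        # ms busy per second -> percent of elapsed time
        "util": per_s(prev.io_ticks_ms, cur.io_ticks_ms) * 0.1,
        # weighted queue ms per second -> average queue length
        "avgqu-sz": per_s(prev.time_in_queue_ms, cur.time_in_queue_ms) * 0.001,
    }


prev = parse_stat("100 0 0 200 50 0 0 500 0 1000 3000")
cur = parse_stat("160 0 0 500 50 0 0 500 0 4000 9000")
# r_await is about 5 ms; w_await is 0 because no writes completed.
print(disk_metrics(prev, cur, 60))
```

The guard avoids a branch so the same expression can live in a single calculated-item formula; against the default {$VFS.DEV.READ.AWAIT.WARN} of 20, the 5 ms read wait in this example would not trigger.
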
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| {#DEVNAME}: Disk read/write request responses are too high | This trigger might indicate disk {#DEVNAME} saturation. | min(/Linux by Pult agent/vfs.dev.read.await[{#DEVNAME}],15m) > {$VFS.DEV.READ.AWAIT.WARN:"{#DEVNAME}"} or min(/Linux by Pult agent/vfs.dev.write.await[{#DEVNAME}],15m) > {$VFS.DEV.WRITE.AWAIT.WARN:"{#DEVNAME}"} | WARNING 📢 | - |
Discovery rule №3
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Mounted filesystem discovery | Discovery of file systems of different types. | - | 1h | vfs.fs.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| {#FSNAME}: Free inodes in % | - | - | - | vfs.fs.inode[{#FSNAME},pfree] |
| {#FSNAME}: Space utilization | Space utilization in % for {#FSNAME} | - | - | vfs.fs.size[{#FSNAME},pused] |
| {#FSNAME}: Total space | Total space in Bytes | - | - | vfs.fs.size[{#FSNAME},total] |
| {#FSNAME}: Used space | Used storage in Bytes | - | - | vfs.fs.size[{#FSNAME},used] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| {#FSNAME}: Running out of free inodes | It may become impossible to write to disk if there are no index nodes left. As symptoms, 'No space left on device' or 'Disk is full' errors may be seen even though free space is available. | min(/Linux by Pult agent/vfs.fs.inode[{#FSNAME},pfree],5m)<{$VFS.FS.INODE.PFREE.MIN.CRIT:"{#FSNAME}"} | AVERAGE ⚠ | {#FSNAME}: Free inodes in % |
| {#FSNAME}: Running out of free inodes | It may become impossible to write to disk if there are no index nodes left. As symptoms, 'No space left on device' or 'Disk is full' errors may be seen even though free space is available. | min(/Linux by Pult agent/vfs.fs.inode[{#FSNAME},pfree],5m)<{$VFS.FS.INODE.PFREE.MIN.WARN:"{#FSNAME}"} | WARNING 📢 | {#FSNAME}: Free inodes in % |
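
Both inode trigger prototypes evaluate the same item against two thresholds, so when free inodes drop below {$VFS.FS.INODE.PFREE.MIN.CRIT} both the AVERAGE and the WARNING problems are open at once, unless a trigger dependency (not shown here) suppresses one. A minimal Python sketch with hypothetical helper names; `inode_pfree` approximates the item with `os.statvfs`, which is an assumption about how the agent computes it:

```python
import os
from typing import List, Sequence


def inode_pfree(path: str) -> float:
    """Approximate free inodes in % for the filesystem holding `path`.

    Assumption: f_ffree / f_files * 100; the agent's exact formula may differ.
    """
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems (btrfs, for example) report no fixed inode count;
        # this sketch treats that as 100 % free.
        return 100.0
    return st.f_ffree / st.f_files * 100


def firing_inode_triggers(window: Sequence[float], crit: float = 10, warn: float = 20) -> List[str]:
    """Evaluate both trigger prototypes on the last 5 minutes of pfree samples."""
    low = min(window)
    fired = []
    if low < crit:
        fired.append("AVERAGE")
    if low < warn:
        fired.append("WARNING")
    return fired


print(firing_inode_triggers([8.0, 9.5, 7.2]))  # ['AVERAGE', 'WARNING']
print(firing_inode_triggers([15.0, 18.0]))  # ['WARNING']
print(firing_inode_triggers([35.0, 40.0]))  # []
```

Note that min(...,5m)<T fires as soon as any sample in the window dips below T, whereas the CPU and memory triggers use min(...)>T, which needs every sample to stay above T.
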