
Ignite by JMX

Macros used

Name | Value
{$IGNITE.CHECKPOINT.PUSED.MAX.HIGH} | 80
{$IGNITE.CHECKPOINT.PUSED.MAX.WARN} | 66
{$IGNITE.DATA.REGION.PUSED.MAX.HIGH} | 90
{$IGNITE.DATA.REGION.PUSED.MAX.WARN} | 80
{$IGNITE.JOBS.QUEUE.MAX.WARN} | 10
{$IGNITE.LLD.FILTER.CACHE.MATCHES} | .*
{$IGNITE.LLD.FILTER.CACHE.NOT_MATCHES} | CHANGE_IF_NEEDED
{$IGNITE.LLD.FILTER.DATA.REGION.MATCHES} | .*
{$IGNITE.LLD.FILTER.DATA.REGION.NOT_MATCHES} | ^(sysMemPlc|TxLog)$
{$IGNITE.LLD.FILTER.THREAD.POOL.MATCHES} | .*
{$IGNITE.LLD.FILTER.THREAD.POOL.NOT_MATCHES} | ^(GridCallbackExecutor|GridRebalanceStripedExecutor|GridDataStreamExecutor|StripedExecutor)$
{$IGNITE.PASSWORD} | <secret>
{$IGNITE.PME.DURATION.MAX.HIGH} | 60000
{$IGNITE.PME.DURATION.MAX.WARN} | 10000
{$IGNITE.THREAD.QUEUE.MAX.WARN} | 1000
{$IGNITE.THREADS.COUNT.MAX.WARN} | 1000
{$IGNITE.USER} | zabbix
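
The MATCHES / NOT_MATCHES macro pairs above follow the usual Zabbix convention for low-level discovery filters: a discovered entity is kept when its name matches the first pattern and does not match the second. The following is a minimal Python sketch of that logic, using the data region defaults from the table. The helper name lld_keep and the sample names "default" and "Default_Region" are hypothetical, and Python's re module stands in for the PCRE engine that Zabbix uses.

```python
import re

# Default filter values copied from the macros table.
MATCHES = r".*"
NOT_MATCHES = r"^(sysMemPlc|TxLog)$"

def lld_keep(name: str, matches: str = MATCHES, not_matches: str = NOT_MATCHES) -> bool:
    """Return True when a discovered name passes both filter conditions."""
    return (re.search(matches, name) is not None
            and re.search(not_matches, name) is None)

# Hypothetical list of discovered data region names.
regions = ["default", "sysMemPlc", "TxLog", "Default_Region"]
kept = [r for r in regions if lld_keep(r)]
print(kept)
```

With the default values, only the two system regions named in the NOT_MATCHES pattern are excluded.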

Discovery rule №1

Name | Description | Type | Interval | Key and additional info
Data region metrics | - | JMX | 10m | jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"]
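
The prototypes in this template refer to macros such as {#JMXOBJ}, {#JMXNAME}, {#JMXGROUP} and {#JMXIGNITEINSTANCENAME}. In Zabbix JMX bean discovery these are derived from the MBean object name: the full name becomes {#JMXOBJ}, the domain becomes {#JMXDOMAIN}, and each key property becomes {#JMX<KEY>} with the key upper-cased. The sketch below illustrates that mapping in Python under simplifying assumptions: the function name is hypothetical, the sample object name is invented for illustration, and quoted property values containing commas are not handled.

```python
def bean_to_lld_macros(object_name: str) -> dict:
    """Map a JMX object name to the LLD macros used by the prototypes."""
    domain, _, props = object_name.partition(":")
    macros = {"{#JMXOBJ}": object_name, "{#JMXDOMAIN}": domain}
    for pair in props.split(","):
        key, _, value = pair.partition("=")
        # Each key property becomes a macro named after the upper-cased key.
        macros["{#JMX" + key.upper() + "}"] = value
    return macros

# Hypothetical object name; real names depend on the Ignite configuration.
sample = ("org.apache:clsLdr=18b4aac2,igniteInstanceName=ignite-1,"
          "group=DataRegionMetrics,name=Default_Region")
macros = bean_to_lld_macros(sample)
for name, value in macros.items():
    print(name, "=", value)
```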

Item prototypes

Name | Description | Type | Interval | Key and additional info
Data region {#JMXNAME}: Allocation, rate | Allocation rate (pages per second) averaged across rateTimeInterval. | JMX | - | jmx["{#JMXOBJ}",AllocationRate]
Data region {#JMXNAME}: Checkpoint buffer size | Total size in bytes for checkpoint buffer. | JMX | - | jmx["{#JMXOBJ}",CheckpointBufferSize]
Data region {#JMXNAME}: Dirty pages | Number of pages in memory not yet synchronized with persistent storage. | JMX | - | jmx["{#JMXOBJ}",DirtyPages]
Data region {#JMXNAME}: Eviction, rate | Eviction rate (pages per second). | JMX | - | jmx["{#JMXOBJ}",EvictionRate]
Data region {#JMXNAME}: Size, max | Maximum memory region size defined by its data region. | JMX | - | jmx["{#JMXOBJ}",MaxSize]
Data region {#JMXNAME}: Offheap size | Offheap size in bytes. | JMX | - | jmx["{#JMXOBJ}",OffHeapSize]
Data region {#JMXNAME}: Offheap used size | Total used offheap size in bytes. | JMX | - | jmx["{#JMXOBJ}",OffheapUsedSize]
Data region {#JMXNAME}: Pages fill factor | The percentage of the used space. | JMX | - | jmx["{#JMXOBJ}",PagesFillFactor]
Data region {#JMXNAME}: Pages replace, rate | Rate at which pages in memory are replaced with pages from persistent storage (pages per second). | JMX | - | jmx["{#JMXOBJ}",PagesReplaceRate]
Data region {#JMXNAME}: Allocated, bytes | Total size of memory allocated in bytes. | JMX | - | jmx["{#JMXOBJ}",TotalAllocatedSize]
Data region {#JMXNAME}: Used checkpoint buffer size | Used checkpoint buffer size in bytes. | JMX | - | jmx["{#JMXOBJ}",UsedCheckpointBufferSize]

Trigger prototypes

Name | Description | Expression | Priority | Dependencies
Data region {#JMXNAME}: Node started to evict pages | You store more data than the region can accommodate. Data has started to move to disk, which can make requests slower. Ack to close. | min(/Ignite by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0 | INFO 🔔 | Data region {#JMXNAME}: Eviction, rate
Data region {#JMXNAME}: Pages replace rate more than 0 | There is more data than DataRegionMaxSize. The cluster has started to replace pages in memory. Page replacement can slow down operations. | min(/Ignite by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0 | WARNING 📢 | Data region {#JMXNAME}: Pages replace, rate
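
Both expressions above use min(...,5m)>0, which means the trigger fires only when every value collected in the last five minutes is above zero; a single zero sample in the window keeps it quiet. A minimal Python sketch of that evaluation follows. The function name and the sample values are hypothetical, and the empty-window case is simplified to "does not fire".

```python
def min_over_window_fires(samples, threshold=0.0):
    """Evaluate min(window) > threshold for the samples in the window.

    samples: values collected during the evaluation period (e.g. 5m).
    """
    if not samples:
        # Simplification: with no data in the window, do not fire.
        return False
    return min(samples) > threshold

# Sustained eviction: every sample is above zero.
print(min_over_window_fires([0.5, 1.2, 0.1]))
# A short burst: one zero sample keeps the trigger quiet.
print(min_over_window_fires([0.5, 0.0, 1.2]))
```

Using min rather than last filters out short spikes, at the cost of reacting only after the condition has held for the whole window.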

Discovery rule №2

Name | Description | Type | Interval | Key and additional info
Local node metrics | - | JMX | 30m | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"]

Item prototypes

Name | Description | Type | Interval | Key and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs active, current | Number of currently active jobs concurrently executing on the node. | JMX | - | jmx["{#JMXOBJ}",CurrentActiveJobs]
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current | Number of cancelled jobs that are still running. | JMX | - | jmx["{#JMXOBJ}",CurrentCancelledJobs]
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration, current | Current PME duration in milliseconds. | JMX | - | jmx["{#JMXOBJ}",CurrentPmeDuration]
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current | Number of jobs rejected after the most recent collision resolution operation. | JMX | - | jmx["{#JMXOBJ}",CurrentRejectedJobs]
Ignite [{#JMXIGNITEINSTANCENAME}]: Threads count, current | Current number of live threads. | JMX | - | jmx["{#JMXOBJ}",CurrentThreadCount]
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current | Number of queued jobs currently waiting to be executed. | JMX | - | jmx["{#JMXOBJ}",CurrentWaitingJobs]
Ignite [{#JMXIGNITEINSTANCENAME}]: Heap memory used | Current heap size that is used for object allocation. | JMX | - | jmx["{#JMXOBJ}",HeapMemoryUsed]
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate | Total number of jobs cancelled by the node per second. | JMX | - | jmx["{#JMXOBJ}",TotalCancelledJobs]
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate | Total number of jobs handled by the node per second. | JMX | - | jmx["{#JMXOBJ}",TotalExecutedJobs]
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate | Total number of jobs this node rejects during collision resolution operations since node startup per second. | JMX | - | jmx["{#JMXOBJ}",TotalRejectedJobs]
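
The three "rate" items above read attributes that are running totals (TotalCancelledJobs, TotalExecutedJobs, TotalRejectedJobs), so a per-second value presumably comes from a "change per second" preprocessing step. The table does not show the preprocessing, so treat that as an assumption. The Python sketch below shows the arithmetic such a step performs; the function name and the sample numbers are hypothetical.

```python
def change_per_second(prev_value, prev_ts, value, ts):
    """Turn two readings of a growing counter into a per-second rate.

    Returns None when no rate can be computed: timestamps that do not
    advance, or a counter that went backwards (e.g. after a node restart).
    """
    if ts <= prev_ts or value < prev_value:
        return None
    return (value - prev_value) / (ts - prev_ts)

# 60 more executed jobs over 60 seconds.
print(change_per_second(100, 0, 160, 60))
# Counter reset after a restart: the sample is discarded.
print(change_per_second(160, 60, 5, 120))
```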

Trigger prototypes

Name | Description | Expression | Priority | Dependencies
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$IGNITE.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung. | min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$IGNITE.PME.DURATION.MAX.HIGH} | HIGH ⛔ | Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration, current
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$IGNITE.PME.DURATION.MAX.WARN}ms. | min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$IGNITE.PME.DURATION.MAX.WARN} | WARNING 📢 | Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration, current
Ignite [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high | Number of running threads is over {$IGNITE.THREADS.COUNT.MAX.WARN}. | min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$IGNITE.THREADS.COUNT.MAX.WARN} | WARNING 📢 | Ignite [{#JMXIGNITEINSTANCENAME}]: Threads count, current
Ignite [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high | Number of queued jobs is over {$IGNITE.JOBS.QUEUE.MAX.WARN}. | min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$IGNITE.JOBS.QUEUE.MAX.WARN} | WARNING 📢 | Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current

Discovery rule №3

Name | Description | Type | Interval | Key and additional info
Cluster metrics | - | JMX | 30m | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"]

Item prototypes

Name | Description | Type | Interval | Key and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline | The number of nodes that are currently active in the baseline topology. | JMX | - | jmx["{#JMXOBJ}",ActiveBaselineNodes]
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline | Total baseline nodes that are registered in the baseline topology. | JMX | - | jmx["{#JMXOBJ}",TotalBaselineNodes]
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Client | The number of client nodes in the cluster. | JMX | - | jmx["{#JMXOBJ}",TotalClientNodes]
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, total | Total number of nodes. | JMX | - | jmx["{#JMXOBJ}",TotalNodes]
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Server | The number of server nodes in the cluster. | JMX | - | jmx["{#JMXOBJ}",TotalServerNodes]

Trigger prototypes

Name | Description | Expression | Priority | Dependencies
Ignite [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology | One or more server nodes were added to the topology. Ack to close. | change(/Ignite by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0 | INFO 🔔 | Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Server
Ignite [{#JMXIGNITEINSTANCENAME}]: Server node left the topology | One or more server nodes left the topology. Ack to close. | change(/Ignite by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0 | WARNING 📢 | Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Server
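
The two triggers above rely on change(), which is the difference between the latest and the previous value of TotalServerNodes: a positive difference means nodes joined, a negative one means nodes left. The following is a small Python sketch of that classification; the function name and the returned labels are hypothetical.

```python
def classify_topology_change(previous: int, latest: int):
    """Mimic change(...)>0 and change(...)<0 on the server node count."""
    delta = latest - previous
    if delta > 0:
        return "server node added"   # INFO trigger condition
    if delta < 0:
        return "server node left"    # WARNING trigger condition
    return None                      # no change, neither trigger fires

print(classify_topology_change(3, 4))
print(classify_topology_change(4, 2))
print(classify_topology_change(4, 4))
```

Because only two consecutive values are compared, the condition stops holding on the next unchanged sample, which is consistent with the "Ack to close" note in the descriptions.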

Discovery rule №4

Name | Description | Type | Interval | Key and additional info
Ignite kernal metrics | - | JMX | 30m | jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"]

Item prototypes

Name | Description | Type | Interval | Key and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: Version | Version of Ignite instance. | JMX | - | jmx["{#JMXOBJ}",FullVersion]
Ignite [{#JMXIGNITEINSTANCENAME}]: Local node ID | Unique identifier for this node within grid. | JMX | - | jmx["{#JMXOBJ}",LocalNodeId]
Ignite [{#JMXIGNITEINSTANCENAME}]: Uptime | Uptime of Ignite instance. | JMX | - | jmx["{#JMXOBJ}",UpTime]

Trigger prototypes

Name | Description | Expression | Priority | Dependencies
Ignite [{#JMXIGNITEINSTANCENAME}]: Version has changed | Ignite [{#JMXIGNITEINSTANCENAME}] version has changed. Ack to close. | last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion]))>0 | INFO 🔔 | Ignite [{#JMXIGNITEINSTANCENAME}]: Version
Ignite [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. | nodata(/Ignite by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1 | WARNING 📢 | Ignite [{#JMXIGNITEINSTANCENAME}]: Uptime
Ignite [{#JMXIGNITEINSTANCENAME}]: has been restarted | Uptime is less than 10 minutes. | last(/Ignite by JMX/jmx["{#JMXOBJ}",UpTime])<10m | INFO 🔔 | Ignite [{#JMXIGNITEINSTANCENAME}]: Uptime
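
The "Version has changed" expression compares the two most recent values (#1 and #2) and additionally requires the latest one to be non-empty, so an empty reading does not count as a change. The same pattern appears in this template for the Coordinator and Caches items. A minimal Python sketch, with a hypothetical function name and invented version strings:

```python
def value_changed(history):
    """history: collected string values, oldest first, newest last.

    Mirrors last(#1) <> last(#2) and length(last()) > 0.
    """
    if len(history) < 2:
        # Simplification: without two values there is nothing to compare.
        return False
    latest, previous = history[-1], history[-2]
    return latest != previous and len(latest) > 0

print(value_changed(["2.9.0", "2.9.0"]))    # unchanged
print(value_changed(["2.9.0", "2.10.0"]))   # upgraded
print(value_changed(["2.9.0", ""]))         # empty reading is ignored
```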

Discovery rule №5

Name | Description | Type | Interval | Key and additional info
TCP Communication SPI metrics | - | JMX | 30m | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"]

Item prototypes

Name | Description | Type | Interval | Key and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue | Outbound messages queue size. | JMX | - | jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
Ignite [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate | The number of messages received per second. | JMX | - | jmx["{#JMXOBJ}",ReceivedMessagesCount]
Ignite [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate | The number of messages sent per second. | JMX | - | jmx["{#JMXOBJ}",SentMessagesCount]

Discovery rule №6

Name | Description | Type | Interval | Key and additional info
TCP discovery SPI | - | JMX | 30m | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"]

Item prototypes

Name | Description | Type | Interval | Key and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: Coordinator | Current coordinator UUID. | JMX | - | jmx["{#JMXOBJ}",Coordinator]
Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue | Message worker queue current size. | JMX | - | jmx["{#JMXOBJ}",MessageWorkerQueueSize]
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes failed | Nodes failed count. | JMX | - | jmx["{#JMXOBJ}",NodesFailed]
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes joined | Nodes join count. | JMX | - | jmx["{#JMXOBJ}",NodesJoined]
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes left | Nodes left count. | JMX | - | jmx["{#JMXOBJ}",NodesLeft]
Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate | Number of times node tries to (re)establish connection to another node per second. | JMX | - | jmx["{#JMXOBJ}",ReconnectCount]
Ignite [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages | The number of messages processed per second. | JMX | - | jmx["{#JMXOBJ}",TotalProcessedMessages]
Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate | The number of messages received per second. | JMX | - | jmx["{#JMXOBJ}",TotalReceivedMessages]

Trigger prototypes

Name | Description | Expression | Priority | Dependencies
Ignite [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed | Ignite [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Ack to close. | last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator]))>0 | WARNING 📢 | Ignite [{#JMXIGNITEINSTANCENAME}]: Coordinator

Discovery rule №7

Name | Description | Type | Interval | Key and additional info
Transaction metrics | - | JMX | 30m | jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"]

Item prototypes

Name | Description | Type | Interval | Key and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: Locked keys | The number of keys locked on the node. | JMX | - | jmx["{#JMXOBJ}",LockedKeysNumber]
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current | The number of active transactions for which this node is the initiator. | JMX | - | jmx["{#JMXOBJ}",OwnerTransactionsNumber]
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate | The number of transactions which were committed per second. | JMX | - | jmx["{#JMXOBJ}",TransactionsCommittedNumber]
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current | The number of active transactions holding at least one key lock. | JMX | - | jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate | The number of transactions which were rolled back per second. | JMX | - | jmx["{#JMXOBJ}",TransactionsRolledBackNumber]

Discovery rule №8

Name | Description | Type | Interval | Key and additional info
Cache groups | - | JMX | 10m | jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"]

Item prototypes

Name | Description | Type | Interval | Key and additional info
Cache group [{#JMXNAME}]: Backups | Count of backups configured for cache group. | JMX | - | jmx["{#JMXOBJ}",Backups]
Cache group [{#JMXNAME}]: Caches | List of caches. | JMX | - | jmx["{#JMXOBJ}",Caches]
Cache group [{#JMXNAME}]: Local node partitions, moving | Count of partitions with state MOVING for this cache group located on this node. | JMX | - | jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
Cache group [{#JMXNAME}]: Local node partitions, owning | Count of partitions with state OWNING for this cache group located on this node. | JMX | - | jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
Cache group [{#JMXNAME}]: Local node entries, renting | Count of entries remaining to evict in RENTING partitions located on this node for this cache group. | JMX | - | jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
Cache group [{#JMXNAME}]: Local node partitions, renting | Count of partitions with state RENTING for this cache group located on this node. | JMX | - | jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
Cache group [{#JMXNAME}]: Partition copies, max | Maximum number of partition copies for all partitions of this cache group. | JMX | - | jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]
Cache group [{#JMXNAME}]: Partition copies, min | Minimum number of partition copies for all partitions of this cache group. | JMX | - | jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
Cache group [{#JMXNAME}]: Partitions | Count of partitions for cache group. | JMX | - | jmx["{#JMXOBJ}",Partitions]

Trigger prototypes

Name | Description | Expression | Priority | Dependencies
Cache group [{#JMXNAME}]: List of caches has changed | List of caches has changed. Significant changes have occurred in the cluster. Ack to close. | last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches]))>0 | INFO 🔔 | Cache group [{#JMXNAME}]: Caches
Cache group [{#JMXNAME}]: Rebalance in progress | Ack to close. | max(/Ignite by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0 | INFO 🔔 | Cache group [{#JMXNAME}]: Local node partitions, moving
Cache group [{#JMXNAME}]: There is no copy for partitions | - | max(/Ignite by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0 | WARNING 📢 | Cache group [{#JMXNAME}]: Partition copies, min

Discovery rule №9

Name | Description | Type | Interval | Key and additional info
Thread pool metrics | - | JMX | 10m | jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"]

Item prototypes

Name | Description | Type | Interval | Key and additional info
Thread pool [{#JMXNAME}]: Pool size, core | The core number of threads. | JMX | - | jmx["{#JMXOBJ}",CorePoolSize]
Thread pool [{#JMXNAME}]: Pool size, max | The maximum allowed number of threads. | JMX | - | jmx["{#JMXOBJ}",MaximumPoolSize]
Thread pool [{#JMXNAME}]: Pool size | Current number of threads in the pool. | JMX | - | jmx["{#JMXOBJ}",PoolSize]
Thread pool [{#JMXNAME}]: Queue size | Current size of the execution queue. | JMX | - | jmx["{#JMXOBJ}",QueueSize]

Trigger prototypes

Name | Description | Expression | Priority | Dependencies
Thread pool [{#JMXNAME}]: Too many messages in queue | Number of messages in the queue is more than {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}. | min(/Ignite by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} | AVERAGE ⚠ | Thread pool [{#JMXNAME}]: Queue size
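
The threshold in this trigger is written as {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}, a user macro with context. In Zabbix, such a macro resolves to the value defined for that specific context (here, the thread pool name) if one exists, and otherwise falls back to the plain macro value (1000 in the macros table). The Python sketch below illustrates the lookup; the function names, the pool names and the 5000 override are hypothetical.

```python
MACRO = "{$IGNITE.THREAD.QUEUE.MAX.WARN}"

# Key: (macro name, context). A context of None is the default value.
macros = {
    (MACRO, None): 1000,                  # default from the macros table
    (MACRO, "GridQueryExecutor"): 5000,   # hypothetical per-pool override
}

def resolve_macro(macros, name, context):
    """Use the context-specific value when defined, else the default."""
    return macros.get((name, context), macros[(name, None)])

def queue_trigger_fires(samples, pool):
    """min(QueueSize, 5m) > threshold resolved for this pool."""
    return bool(samples) and min(samples) > resolve_macro(macros, MACRO, pool)

print(queue_trigger_fires([1200, 1500], "GridSystemExecutor"))  # default threshold
print(queue_trigger_fires([1200, 1500], "GridQueryExecutor"))   # overridden threshold
```

This lets a single trigger prototype carry different thresholds per pool without cloning the trigger.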

Discovery rule №10

Name | Description | Type | Interval | Key and additional info
Cache metrics | - | JMX | 10m | jmx.discovery[beans,"org.apache:name=\"org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"]

Item prototypes

Name | Description | Type | Interval | Key and additional info
Cache group [{#JMXGROUP}]: Cache gets, rate | The number of gets to the cache per second. | JMX | - | jmx["{#JMXOBJ}",CacheGets]
Cache group [{#JMXGROUP}]: Cache hits, pct | Percentage of successful hits. | JMX | - | jmx["{#JMXOBJ}",CacheHitPercentage]
Cache group [{#JMXGROUP}]: Cache misses, pct | Percentage of accesses that failed to find anything. | JMX | - | jmx["{#JMXOBJ}",CacheMissPercentage]
Cache group [{#JMXGROUP}]: Cache puts, rate | The number of puts to the cache per second. | JMX | - | jmx["{#JMXOBJ}",CachePuts]
Cache group [{#JMXGROUP}]: Cache removals, rate | The number of removals from the cache per second. | JMX | - | jmx["{#JMXOBJ}",CacheRemovals]
Cache group [{#JMXGROUP}]: Cache size | The number of non-null values in the cache as a long value. | JMX | - | jmx["{#JMXOBJ}",CacheSize]
Cache group [{#JMXGROUP}]: Cache transaction commits, rate | The number of transaction commits per second. | JMX | - | jmx["{#JMXOBJ}",CacheTxCommits]
Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate | The number of transaction rollbacks per second. | JMX | - | jmx["{#JMXOBJ}",CacheTxRollbacks]
Cache group [{#JMXGROUP}]: Cache heap entries | The number of entries in heap memory. | JMX | - | jmx["{#JMXOBJ}",HeapEntriesCount]

Trigger prototypes

Name | Description | Expression | Priority | Dependencies
Cache group [{#JMXGROUP}]: All entries are in heap | All entries are in heap. Possibly you use eager queries, which may cause out-of-memory exceptions for big caches. Ack to close. | last(/Ignite by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/Ignite by JMX/jmx["{#JMXOBJ}",HeapEntriesCount]) | INFO 🔔 | -
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m | - | min(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m) | WARNING 📢 | -
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m | - | min(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0 | AVERAGE ⚠ | -
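
The two transaction triggers above compare the lowest rollback rate in the last five minutes with the highest commit rate in the same window. A minimal Python sketch of both conditions follows, assuming each list holds the per-second rates collected in the window; the function name, the returned labels and the sample values are hypothetical.

```python
def tx_alerts(rollback_rates, commit_rates):
    """Evaluate both transaction trigger conditions for one 5m window."""
    alerts = []
    if not rollback_rates or not commit_rates:
        return alerts  # simplification: no data, no alert
    # WARNING: even the lowest rollback rate exceeds the highest commit rate.
    if min(rollback_rates) > max(commit_rates):
        alerts.append("WARNING")
    # AVERAGE: rollbacks never stopped and no transaction was committed.
    if min(rollback_rates) > 0 and max(commit_rates) == 0:
        alerts.append("AVERAGE")
    return alerts

print(tx_alerts([2.0, 3.0], [1.0, 1.5]))  # rollbacks dominate commits
print(tx_alerts([2.0, 3.0], [0.0, 0.0]))  # no successful transactions at all
print(tx_alerts([0.0, 3.0], [1.0, 5.0]))  # healthy window
```

Whenever the AVERAGE condition holds, the WARNING condition holds as well, since a rollback rate above zero is necessarily above a commit rate of zero.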