ClickHouse by HTTP

Macros used

| Name | Value |
|---|---|
| {$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN} | 600 |
| {$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN} | 0 |
| {$CLICKHOUSE.LLD.FILTER.DB.MATCHES} | .* |
| {$CLICKHOUSE.LLD.FILTER.DB.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$CLICKHOUSE.LLD.FILTER.DICT.MATCHES} | .* |
| {$CLICKHOUSE.LLD.FILTER.DICT.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN} | 30 |
| {$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN} | 5 |
| {$CLICKHOUSE.PARTS.PER.PARTITION.WARN} | 300 |
| {$CLICKHOUSE.PASSWORD} | zabbix_pass |
| {$CLICKHOUSE.PORT} | 8123 |
| {$CLICKHOUSE.QUERY_TIME.MAX.WARN} | 600 |
| {$CLICKHOUSE.QUEUE.SIZE.MAX.WARN} | 20 |
| {$CLICKHOUSE.REPLICA.MAX.WARN} | 600 |
| {$CLICKHOUSE.SCHEME} | http |
| {$CLICKHOUSE.USER} | zabbix |
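
As a rough illustration of how the connection macros fit together, the sketch below builds a request against the ClickHouse HTTP interface from the default values listed above. The host name `ch.example` stands in for Zabbix's `{HOST.CONN}` and is hypothetical, and passing credentials through the `X-ClickHouse-User` and `X-ClickHouse-Key` headers is an assumption about how the template authenticates, not something this page states.

```python
from urllib.parse import urlencode

# Default macro values copied from the macros table; override per host as needed.
MACROS = {
    "{$CLICKHOUSE.SCHEME}": "http",
    "{$CLICKHOUSE.PORT}": "8123",
    "{$CLICKHOUSE.USER}": "zabbix",
    "{$CLICKHOUSE.PASSWORD}": "zabbix_pass",
}


def build_request(host, query, macros=MACROS):
    """Return (url, headers) for one query against the ClickHouse HTTP interface."""
    scheme = macros["{$CLICKHOUSE.SCHEME}"]
    port = macros["{$CLICKHOUSE.PORT}"]
    url = f"{scheme}://{host}:{port}/?" + urlencode({"query": query})
    # Assumption: credentials travel in ClickHouse's HTTP auth headers.
    headers = {
        "X-ClickHouse-User": macros["{$CLICKHOUSE.USER}"],
        "X-ClickHouse-Key": macros["{$CLICKHOUSE.PASSWORD}"],
    }
    return url, headers


# "ch.example" is a hypothetical host standing in for {HOST.CONN}.
url, headers = build_request("ch.example", "SELECT 1")
print(url)
```

The returned pair can be handed to any HTTP client; nothing is sent over the network by the sketch itself.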

Items collected

| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| ClickHouse: Current distribute connections | Number of connections to remote servers sending data that was INSERTed into Distributed tables. | DEPENDENT | - | clickhouse.connections.distribute |
| ClickHouse: Current HTTP connections | Number of connections to HTTP server. | DEPENDENT | - | clickhouse.connections.http |
| ClickHouse: Current Interserver connections | Number of connections from other replicas to fetch parts. | DEPENDENT | - | clickhouse.connections.interserver |
| ClickHouse: Current MySQL connections | Number of connections to MySQL server. | DEPENDENT | - | clickhouse.connections.mysql |
| ClickHouse: Current TCP connections | Number of connections to TCP server (clients with native interface). | DEPENDENT | - | clickhouse.connections.tcp |
| ClickHouse: Get dictionaries info | - | HTTP_AGENT | - | clickhouse.dictionaries |
| ClickHouse: Current distributed files to insert | Number of pending files to process for asynchronous insertion into Distributed tables. The number of files for every shard is summed. | DEPENDENT | - | clickhouse.distributed.files |
| ClickHouse: Distributed connection fail with retry per second | Connection failures after all retries in replicated DB connection pool. | DEPENDENT | - | clickhouse.distributed.files.fail.rate |
| ClickHouse: Distributed connection fail with retry per second | Connection retries in replicated DB connection pool. | DEPENDENT | - | clickhouse.distributed.files.retry.rate |
| ClickHouse: Delayed insert queries | Number of INSERT queries that are throttled due to a high number of active data parts for a partition in a MergeTree table. | DEPENDENT | - | clickhouse.insert.delay |
| ClickHouse: Inserted bytes per second | The number of uncompressed bytes inserted in all tables. | DEPENDENT | - | clickhouse.inserted_bytes.rate |
| ClickHouse: Inserted rows per second | The number of rows inserted in all tables. | DEPENDENT | - | clickhouse.inserted_rows.rate |
| ClickHouse: New INSERT queries per second | Number of INSERT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. | DEPENDENT | - | clickhouse.insert_query.rate |
| ClickHouse: Allocated bytes | Total number of bytes allocated by the application. | DEPENDENT | - | clickhouse.jemalloc.allocated |
| ClickHouse: Mapped memory | Total number of bytes in active extents mapped by the allocator. | DEPENDENT | - | clickhouse.jemalloc.mapped |
| ClickHouse: Resident memory | Maximum number of bytes in physically resident data pages mapped by the allocator, comprising all pages dedicated to allocator metadata, pages backing active allocations, and unused dirty pages. | DEPENDENT | - | clickhouse.jemalloc.resident |
| ClickHouse: Max count of parts per partition across all tables | The ClickHouse MergeTree table engine splits each INSERT query into partitions (PARTITION BY expression) and adds one or more parts per INSERT inside each partition; a background merge process then runs. | DEPENDENT | - | clickhouse.max.part.count.for.partition |
| ClickHouse: Memory used for queries | Total amount of memory (bytes) allocated in currently executing queries. | DEPENDENT | - | clickhouse.memory.tracking |
| ClickHouse: Memory used for background merges | Total amount of memory (bytes) allocated in the background processing pool (dedicated to background merges, mutations and fetches). Note that this value may include a drift when memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks. | DEPENDENT | - | clickhouse.memory.tracking.background |
| ClickHouse: Memory used for background moves | Total amount of memory (bytes) allocated in the background processing pool (dedicated to background moves). Note that this value may include a drift when memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks. | DEPENDENT | - | clickhouse.memory.tracking.background.moves |
| ClickHouse: Memory used for merges | Total amount of memory (bytes) allocated for background merges. Included in MemoryTrackingInBackgroundProcessingPool. Note that this value may include a drift when memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks. | DEPENDENT | - | clickhouse.memory.tracking.merges |
| ClickHouse: Memory used for background schedule pool | Total amount of memory (bytes) allocated in the background schedule pool (dedicated to bookkeeping tasks of Replicated tables). | DEPENDENT | - | clickhouse.memory.tracking.schedule.pool |
| ClickHouse: Current running merges | Number of executing background merges. | DEPENDENT | - | clickhouse.merge.current |
| ClickHouse: Uncompressed bytes merged per second | Uncompressed bytes that were read for background merges. | DEPENDENT | - | clickhouse.merge_bytes.rate |
| ClickHouse: Merged rows per second | Rows read for background merges. | DEPENDENT | - | clickhouse.merge_rows.rate |
| ClickHouse: Network errors per second | Network errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update. | DEPENDENT | - | clickhouse.network.error.rate |
| ClickHouse: Ping | - | HTTP_AGENT | - | clickhouse.ping |
| ClickHouse: Longest currently running query time | Get longest running query. | HTTP_AGENT | - | clickhouse.process.elapsed |
| ClickHouse: Current running queries | Number of executing queries. | DEPENDENT | - | clickhouse.query.current |
| ClickHouse: New queries per second | Number of queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. | DEPENDENT | - | clickhouse.query.rate |
| ClickHouse: Read syscalls in fly | Number of read (read, pread, io_getevents, etc.) syscalls in flight. | DEPENDENT | - | clickhouse.read |
| ClickHouse: Read bytes per second | Number of bytes (before decompression) read from compressed sources (files, network). | DEPENDENT | - | clickhouse.read_bytes.rate |
| ClickHouse: Get replicas info | - | HTTP_AGENT | - | clickhouse.replicas |
| ClickHouse: Replication lag across all tables | Maximum replica queue delay relative to current time. | DEPENDENT | - | clickhouse.replicas.max.absolute.delay |
| ClickHouse: Total number read-only Replicas | Number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured. | DEPENDENT | - | clickhouse.replicas.readonly.total |
| ClickHouse: Total replication tasks in queue | - | DEPENDENT | - | clickhouse.replicas.sum.queue.size |
| ClickHouse: Revision | Revision of the server. | DEPENDENT | - | clickhouse.revision |
| ClickHouse: New SELECT queries per second | Number of SELECT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. | DEPENDENT | - | clickhouse.select_query.rate |
| ClickHouse: Get system.asynchronous_metrics | Get metrics that are calculated periodically in the background. | HTTP_AGENT | - | clickhouse.system.asynchronous_metrics |
| ClickHouse: Get system.events | Get information about the number of events that have occurred in the system. | HTTP_AGENT | - | clickhouse.system.events |
| ClickHouse: Get system.metrics | Get metrics that can be calculated instantly or have a current value (format JSONEachRow). | HTTP_AGENT | - | clickhouse.system.metrics |
| ClickHouse: Get system.settings | Get information about settings that are currently in use. | HTTP_AGENT | - | clickhouse.system.settings |
| ClickHouse: Get tables info | - | HTTP_AGENT | - | clickhouse.tables |
| ClickHouse: Uptime | Number of seconds since ClickHouse server start. | DEPENDENT | - | clickhouse.uptime |
| ClickHouse: Version | Version of the server. | HTTP_AGENT | - | clickhouse.version |
| ClickHouse: Write syscalls in fly | Number of write (write, pwrite, io_getevents, etc.) syscalls in flight. | DEPENDENT | - | clickhouse.write |
| ClickHouse: ZooKeeper exceptions per second | Count of ZooKeeper exceptions that do not belong to user/hardware exceptions. | DEPENDENT | - | clickhouse.zookeper.exceptions.rate |
| ClickHouse: ZooKeeper hardware exceptions per second | Count of ZooKeeper exceptions caused by session moved/expired, connection loss, marshalling error, operation timed out and invalid zhandle state. | DEPENDENT | - | clickhouse.zookeper.hw_exceptions.rate |
| ClickHouse: ZooKeeper requests | Number of requests to ZooKeeper in progress. | DEPENDENT | - | clickhouse.zookeper.request |
| ClickHouse: ZooKeeper sessions | Number of sessions (connections) to ZooKeeper. Should be no more than one. | DEPENDENT | - | clickhouse.zookeper.session |
| ClickHouse: ZooKeeper user exceptions per second | Count of ZooKeeper exceptions caused by no znodes, bad version, node exists, node empty and no children for ephemeral. | DEPENDENT | - | clickhouse.zookeper.user_exceptions.rate |
| ClickHouse: ZooKeeper wait time | Time spent waiting for ZooKeeper operations. | DEPENDENT | - | clickhouse.zookeper.wait.time |
| ClickHouse: ZooKeeper watches | Number of watches (e.g., event subscriptions) in ZooKeeper. | DEPENDENT | - | clickhouse.zookeper.watch |
| ClickHouse: Check port availability | - | SIMPLE | - | net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"] |
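
Most items above are DEPENDENT: they take their value from the body returned by one of the HTTP_AGENT "Get ..." items instead of querying the server themselves, and the "per second" items turn a cumulative counter into a rate. The sketch below illustrates both steps in plain Python. It assumes the master item returns JSONEachRow text with `metric` and `value` fields (the column names of `system.metrics`); the sample body and the helper names are hypothetical, and the template's real preprocessing steps are not shown on this page.

```python
import json


def parse_json_each_row(body):
    """Turn a JSONEachRow body (one JSON object per line) into {metric: value}."""
    rows = (json.loads(line) for line in body.splitlines() if line.strip())
    # float() accepts both quoted ("3") and unquoted (3) numeric values.
    return {row["metric"]: float(row["value"]) for row in rows}


def per_second(prev_value, prev_ts, value, ts):
    """Rate of a cumulative counter between two samples, or None if undefined."""
    if ts <= prev_ts or value < prev_value:
        return None  # no elapsed time, or the counter was reset (e.g. a restart)
    return (value - prev_value) / (ts - prev_ts)


# Hypothetical sample of what a "Get system.metrics" master item might return.
sample = (
    '{"metric":"HTTPConnection","value":"3"}\n'
    '{"metric":"TCPConnection","value":"12"}\n'
)
metrics = parse_json_each_row(sample)
print(metrics["HTTPConnection"])       # value a dependent item would store
print(per_second(1000, 0, 1600, 60))   # counter grew by 600 over 60 seconds
```

One master request per table keeps the load on ClickHouse constant no matter how many dependent items read from it.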

Triggers

| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| ClickHouse: Too many distributed files to insert | Check the ClickHouse servers and <remote_servers> in config.xml: https://clickhouse.tech/docs/en/operations/table_engines/distributed/ | min(/ClickHouse by HTTP/clickhouse.distributed.files,5m)>{$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN} | WARNING 📢 | ClickHouse: Current distributed files to insert |
| ClickHouse: Too many throttled insert queries | ClickHouse has INSERT queries that are throttled due to a high number of active data parts for a partition in a MergeTree table. Please decrease the INSERT frequency. | min(/ClickHouse by HTTP/clickhouse.insert.delay,5m)>{$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN} | WARNING 📢 | ClickHouse: Delayed insert queries |
| ClickHouse: Too many MergeTree parts | Decrease the frequency of INSERT queries. The ClickHouse MergeTree table engine splits each INSERT query into partitions (PARTITION BY expression) and adds one or more parts per INSERT inside each partition, after which a background merge process runs. When a partition holds too many unmerged parts, SELECT performance can degrade significantly, so ClickHouse tries to delay the insert or aborts it. | min(/ClickHouse by HTTP/clickhouse.max.part.count.for.partition,5m)>{$CLICKHOUSE.PARTS.PER.PARTITION.WARN} * 0.9 | WARNING 📢 | ClickHouse: Max count of parts per partition across all tables |
| ClickHouse: Too many network errors | The number of errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update is too high. | min(/ClickHouse by HTTP/clickhouse.network.error.rate,5m)>{$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN} | WARNING 📢 | ClickHouse: Network errors per second |
| ClickHouse: There are queries running is long | - | last(/ClickHouse by HTTP/clickhouse.process.elapsed)>{$CLICKHOUSE.QUERY_TIME.MAX.WARN} | AVERAGE ⚠ | ClickHouse: Longest currently running query time |
| ClickHouse: Replication lag is too high | When a replica has too much lag, it can be skipped from distributed SELECT queries without errors, and you will get wrong query results. | min(/ClickHouse by HTTP/clickhouse.replicas.max.absolute.delay,5m)>{$CLICKHOUSE.REPLICA.MAX.WARN} | WARNING 📢 | ClickHouse: Replication lag across all tables |
| ClickHouse: Configuration has been changed | ClickHouse configuration has been changed. Ack to close. | last(/ClickHouse by HTTP/clickhouse.system.settings,#1)<>last(/ClickHouse by HTTP/clickhouse.system.settings,#2) and length(last(/ClickHouse by HTTP/clickhouse.system.settings))>0 | INFO 🔔 | ClickHouse: Get system.settings |
| ClickHouse: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes. | nodata(/ClickHouse by HTTP/clickhouse.uptime,30m)=1 | WARNING 📢 | ClickHouse: Uptime |
| ClickHouse: has been restarted | Uptime is less than 10 minutes. | last(/ClickHouse by HTTP/clickhouse.uptime)<10m | INFO 🔔 | ClickHouse: Uptime |
| ClickHouse: Version has changed | ClickHouse version has changed. Ack to close. | last(/ClickHouse by HTTP/clickhouse.version,#1)<>last(/ClickHouse by HTTP/clickhouse.version,#2) and length(last(/ClickHouse by HTTP/clickhouse.version))>0 | INFO 🔔 | ClickHouse: Version |
| ClickHouse: Too many ZooKeeper sessions opened | Number of sessions (connections) to ZooKeeper. Should be no more than one, because using more than one connection to ZooKeeper may lead to bugs due to the lack of linearizability (stale reads) that the ZooKeeper consistency model allows. | min(/ClickHouse by HTTP/clickhouse.zookeper.session,5m)>1 | WARNING 📢 | ClickHouse: ZooKeeper sessions |
| ClickHouse: Port {$CLICKHOUSE.PORT} is unavailable | - | last(/ClickHouse by HTTP/net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"])=0 | AVERAGE ⚠ | ClickHouse: Check port availability |
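
Two expression patterns recur in the triggers above: `min(item,5m)>threshold`, which only fires when every sample in the window is above the threshold, and `last(item,#1)<>last(item,#2) and length(last(item))>0`, which fires when a non-empty text value differs from the previous one. The sketch below mimics both in Python as an aid to reading the expressions. The function names and sample values are hypothetical, and this is not how Zabbix itself evaluates triggers.

```python
def min_exceeds(samples, threshold):
    """Mimic min(item, window) > threshold over the samples in one window."""
    # A single dip to or below the threshold keeps the trigger from firing.
    return bool(samples) and min(samples) > threshold


def value_changed(last, previous):
    """Mimic last(item,#1) <> last(item,#2) and length(last(item)) > 0."""
    return last != previous and len(last) > 0


# "Too many MergeTree parts" compares against 90% of the macro value.
parts_warn = 300            # default {$CLICKHOUSE.PARTS.PER.PARTITION.WARN}
threshold = parts_warn * 0.9  # 270 parts

print(min_exceeds([280, 275, 290], threshold))  # every sample is above 270
print(min_exceeds([280, 260, 290], threshold))  # one sample dipped below 270
print(value_changed("21.3.1", "21.2.9"))        # hypothetical version strings
```

Using `min` over a window rather than `last` trades a few minutes of detection delay for fewer alerts on short spikes.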

Discovery rule №1

| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Dictionaries | Info about dictionaries | DEPENDENT | 0 | clickhouse.dictionaries.discovery |

Item prototypes

| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| ClickHouse: Dictionary {#NAME}: Bytes allocated | The amount of RAM the dictionary uses. | DEPENDENT | - | clickhouse.dictionary.bytes_allocated["{#NAME}"] |
| ClickHouse: Dictionary {#NAME}: Element count | Number of items stored in the dictionary. | DEPENDENT | - | clickhouse.dictionary.element_count["{#NAME}"] |
| ClickHouse: Dictionary {#NAME}: Load factor | The percentage filled in the dictionary (for a hashed dictionary, the percentage filled in the hash table). | DEPENDENT | - | clickhouse.dictionary.load_factor["{#NAME}"] |
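
The `{$CLICKHOUSE.LLD.FILTER.DICT.MATCHES}` and `{$CLICKHOUSE.LLD.FILTER.DICT.NOT_MATCHES}` macros suggest that discovered dictionaries are kept only when their name matches the first pattern and does not match the second. The sketch below shows that logic under two assumptions: that the filter is applied to `{#NAME}`, and that matching is an unanchored regular-expression search. Python's `re` module is used here as a stand-in for the regex engine Zabbix uses, and the dictionary names are made up.

```python
import re


def passes_filter(value, matches, not_matches):
    """Keep a discovered value if it matches one pattern and not the other."""
    return (
        re.search(matches, value) is not None
        and re.search(not_matches, value) is None
    )


# Defaults from the macros table: match everything, exclude (almost) nothing.
MATCHES = ".*"
NOT_MATCHES = "CHANGE_IF_NEEDED"

# Hypothetical dictionary names returned by discovery.
names = ["geo_regions", "currency_rates", "tmp_scratch"]
print([n for n in names if passes_filter(n, MATCHES, NOT_MATCHES)])

# Overriding NOT_MATCHES on a host would drop the temporary dictionary.
print([n for n in names if passes_filter(n, MATCHES, "^tmp_")])
```

With the defaults every dictionary is discovered, since `CHANGE_IF_NEEDED` is a placeholder that is unlikely to appear in a real name. The same pair of macros exists for databases with the `DB` infix.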

Discovery rule №2

| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Replicas | Info about replicas | DEPENDENT | 0 | clickhouse.replicas.discovery |

Item prototypes

| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| ClickHouse: {#DB}.{#TABLE}: Active replicas | Number of replicas of this table that have a session in ZooKeeper (i.e., the number of functioning replicas). Has a non-zero value only where there is an active session with ZooKeeper. | DEPENDENT | - | clickhouse.replica.active_replicas["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica future parts | Number of data parts that will appear as the result of INSERTs or merges that haven't been done yet. | DEPENDENT | - | clickhouse.replica.future_parts["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica queue inserts size | Number of inserts of blocks of data that need to be made. | DEPENDENT | - | clickhouse.replica.inserts_in_queue["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica readonly | Whether the replica is in read-only mode. This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper. | DEPENDENT | - | clickhouse.replica.is_readonly["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica session expired | True if the ZooKeeper session expired. | DEPENDENT | - | clickhouse.replica.is_session_expired["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica lag | Difference between log_max_index and log_pointer. | DEPENDENT | - | clickhouse.replica.lag["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica log max index | Maximum entry number in the log of general activity. Has a non-zero value only where there is an active session with ZooKeeper. | DEPENDENT | - | clickhouse.replica.log_max_index["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica log pointer | Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one. Has a non-zero value only where there is an active session with ZooKeeper. | DEPENDENT | - | clickhouse.replica.log_pointer["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica queue merges size | Number of merges waiting to be made. | DEPENDENT | - | clickhouse.replica.merges_in_queue["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica parts to check | Number of data parts in the queue for verification. A part is put in the verification queue if there is suspicion that it might be damaged. | DEPENDENT | - | clickhouse.replica.parts_to_check["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica queue size | Size of the queue for operations waiting to be performed. | DEPENDENT | - | clickhouse.replica.queue_size["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Total replicas | Total number of known replicas of this table. Has a non-zero value only where there is an active session with ZooKeeper. | DEPENDENT | - | clickhouse.replica.total_replicas["{#DB}.{#TABLE}"] |

Trigger prototypes

| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| ClickHouse: {#DB}.{#TABLE} Replica is readonly | This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper. | min(/ClickHouse by HTTP/clickhouse.replica.is_readonly["{#DB}.{#TABLE}"],5m)=1 | WARNING 📢 | ClickHouse: {#DB}.{#TABLE}: Replica readonly |
| ClickHouse: {#DB}.{#TABLE} Replica session is expired | This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper. | min(/ClickHouse by HTTP/clickhouse.replica.is_session_expired["{#DB}.{#TABLE}"],5m)=1 | WARNING 📢 | ClickHouse: {#DB}.{#TABLE}: Replica session expired |
| ClickHouse: {#DB}.{#TABLE}: Difference between log_max_index and log_pointer is too high | - | min(/ClickHouse by HTTP/clickhouse.replica.lag["{#DB}.{#TABLE}"],5m) > {$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN} | WARNING 📢 | ClickHouse: {#DB}.{#TABLE}: Replica lag |
| ClickHouse: {#DB}.{#TABLE}: Too many operations in queue | - | min(/ClickHouse by HTTP/clickhouse.replica.queue_size["{#DB}.{#TABLE}"],5m)>{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN:"{#TABLE}"} | WARNING 📢 | ClickHouse: {#DB}.{#TABLE}: Replica queue size |
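
The queue-size trigger uses a user macro with context, `{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN:"{#TABLE}"}`, so a threshold can be overridden per table while other tables fall back to the plain macro. The replica lag item is described as the difference between `log_max_index` and `log_pointer`. The sketch below illustrates both ideas. It handles exact context matches only, and the table name `events` and the override value 50 are hypothetical.

```python
def resolve_macro(macros, name, context=None):
    """Look up a context-specific macro value, falling back to the plain macro."""
    if context is not None:
        key = "{$" + name + ':"' + context + '"}'
        if key in macros:
            return macros[key]
    return macros["{$" + name + "}"]


def replica_lag(log_max_index, log_pointer):
    """Replica lag as described above: log_max_index minus log_pointer."""
    return log_max_index - log_pointer


macros = {
    "{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN}": 20,            # template default
    '{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN:"events"}': 50,   # hypothetical override
    "{$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN}": 30,     # template default
}

print(resolve_macro(macros, "CLICKHOUSE.QUEUE.SIZE.MAX.WARN", "events"))
print(resolve_macro(macros, "CLICKHOUSE.QUEUE.SIZE.MAX.WARN", "orders"))
print(replica_lag(120, 100) > macros["{$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN}"])
```

Defining a context override on the host is enough to loosen the threshold for one busy table without cloning the trigger prototype.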

Discovery rule №3

| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Tables | Info about tables | DEPENDENT | 0 | clickhouse.tables.discovery |

Item prototypes

| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| ClickHouse: {#DB}: Bytes | Database size in bytes. | DEPENDENT | - | clickhouse.db.bytes["{#DB}"] |
| ClickHouse: {#DB}.{#TABLE}: Bytes | Table size in bytes. Database: {#DB}, table: {#TABLE} | DEPENDENT | - | clickhouse.table.bytes["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Parts | Number of parts of the table. Database: {#DB}, table: {#TABLE} | DEPENDENT | - | clickhouse.table.parts["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Rows | Number of rows in the table. Database: {#DB}, table: {#TABLE} | DEPENDENT | - | clickhouse.table.rows["{#DB}.{#TABLE}"] |
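
Since the per-database and per-table items come from the same discovery rule, one plausible way to obtain the database size is to sum the table sizes that share a database. The sketch below shows that aggregation. The row shape (`database`, `table`, `bytes`) and the sample values are assumptions for illustration; this page does not show the actual query or preprocessing behind `clickhouse.db.bytes`.

```python
from collections import defaultdict


def bytes_per_database(rows):
    """Sum table sizes per database from rows shaped like the assumed master data."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["database"]] += int(row["bytes"])
    return dict(totals)


# Hypothetical rows, one per discovered {#DB}.{#TABLE} pair.
rows = [
    {"database": "default", "table": "events", "bytes": "100"},
    {"database": "default", "table": "orders", "bytes": "200"},
    {"database": "logs", "table": "access", "bytes": "50"},
]
print(bytes_per_database(rows))
```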