ClickHouse by HTTP
Macros used
| Name | Value |
|---|---|
| {$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN} | 600 |
| {$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN} | 0 |
| {$CLICKHOUSE.LLD.FILTER.DB.MATCHES} | .* |
| {$CLICKHOUSE.LLD.FILTER.DB.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$CLICKHOUSE.LLD.FILTER.DICT.MATCHES} | .* |
| {$CLICKHOUSE.LLD.FILTER.DICT.NOT_MATCHES} | CHANGE_IF_NEEDED |
| {$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN} | 30 |
| {$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN} | 5 |
| {$CLICKHOUSE.PARTS.PER.PARTITION.WARN} | 300 |
| {$CLICKHOUSE.PASSWORD} | zabbix_pass |
| {$CLICKHOUSE.PORT} | 8123 |
| {$CLICKHOUSE.QUERY_TIME.MAX.WARN} | 600 |
| {$CLICKHOUSE.QUEUE.SIZE.MAX.WARN} | 20 |
| {$CLICKHOUSE.REPLICA.MAX.WARN} | 600 |
| {$CLICKHOUSE.SCHEME} | http |
| {$CLICKHOUSE.USER} | zabbix |
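
The connection macros above combine into the endpoint that the HTTP agent items query. The following is a minimal sketch of that combination, assuming the query is passed in the `query` URL parameter of the ClickHouse HTTP interface and the credentials are sent as HTTP Basic auth. The helper name `build_request` and the host `clickhouse.example.com` are hypothetical, and the template's actual request settings may differ.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request

# Default macro values copied from the table above.
MACROS = {
    "{$CLICKHOUSE.SCHEME}": "http",
    "{$CLICKHOUSE.PORT}": "8123",
    "{$CLICKHOUSE.USER}": "zabbix",
    "{$CLICKHOUSE.PASSWORD}": "zabbix_pass",
}


def build_request(host: str, query: str, macros: dict = MACROS) -> Request:
    """Build (but do not send) an HTTP request for one ClickHouse query.

    Hypothetical helper: it only illustrates how the macros fit together.
    """
    scheme = macros["{$CLICKHOUSE.SCHEME}"]
    port = macros["{$CLICKHOUSE.PORT}"]
    # Percent-encode the SQL text so that spaces become %20.
    params = urlencode({"query": query}, quote_via=quote)
    url = f"{scheme}://{host}:{port}/?{params}"

    # HTTP Basic auth: base64("user:password").
    credentials = f"{macros['{$CLICKHOUSE.USER}']}:{macros['{$CLICKHOUSE.PASSWORD}']}"
    token = base64.b64encode(credentials.encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})


req = build_request("clickhouse.example.com", "SELECT 1 FORMAT JSONEachRow")
print(req.full_url)
```

Nothing is sent over the network here. Passing the returned object to `urllib.request.urlopen` would perform the call against a reachable server.
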
Items collected
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| ClickHouse: Current distribute connections | Number of connections to remote servers sending data that was INSERTed into Distributed tables. | DEPENDENT | - | clickhouse.connections.distribute |
| ClickHouse: Current HTTP connections | Number of connections to HTTP server. | DEPENDENT | - | clickhouse.connections.http |
| ClickHouse: Current Interserver connections | Number of connections from other replicas to fetch parts. | DEPENDENT | - | clickhouse.connections.interserver |
| ClickHouse: Current MySQL connections | Number of connections to MySQL server. | DEPENDENT | - | clickhouse.connections.mysql |
| ClickHouse: Current TCP connections | Number of connections to TCP server (clients with native interface). | DEPENDENT | - | clickhouse.connections.tcp |
| ClickHouse: Get dictionaries info | - | HTTP_AGENT | - | clickhouse.dictionaries |
| ClickHouse: Current distributed files to insert | Number of pending files to process for asynchronous insertion into Distributed tables. Number of files for every shard is summed. | DEPENDENT | - | clickhouse.distributed.files |
| ClickHouse: Distributed connection fail with retry per second | Connection failures after all retries in replicated DB connection pool | DEPENDENT | - | clickhouse.distributed.files.fail.rate |
| ClickHouse: Distributed connection fail with retry per second | Connection retries in replicated DB connection pool | DEPENDENT | - | clickhouse.distributed.files.retry.rate |
| ClickHouse: Delayed insert queries | Number of INSERT queries that are throttled due to a high number of active data parts for a partition in a MergeTree table. | DEPENDENT | - | clickhouse.insert.delay |
| ClickHouse: Inserted bytes per second | The number of uncompressed bytes inserted in all tables. | DEPENDENT | - | clickhouse.inserted_bytes.rate |
| ClickHouse: Inserted rows per second | The number of rows inserted in all tables. | DEPENDENT | - | clickhouse.inserted_rows.rate |
| ClickHouse: New INSERT queries per second | Number of INSERT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. | DEPENDENT | - | clickhouse.insert_query.rate |
| ClickHouse: Allocated bytes | Total number of bytes allocated by the application. | DEPENDENT | - | clickhouse.jemalloc.allocated |
| ClickHouse: Mapped memory | Total number of bytes in active extents mapped by the allocator. | DEPENDENT | - | clickhouse.jemalloc.mapped |
| ClickHouse: Resident memory | Maximum number of bytes in physically resident data pages mapped by the allocator, comprising all pages dedicated to allocator metadata, pages backing active allocations, and unused dirty pages. | DEPENDENT | - | clickhouse.jemalloc.resident |
| ClickHouse: Max count of parts per partition across all tables | The ClickHouse MergeTree table engine splits each INSERT query into partitions (by the PARTITION BY expression) and adds one or more parts per INSERT to each partition, after which a background merge process runs. | DEPENDENT | - | clickhouse.max.part.count.for.partition |
| ClickHouse: Memory used for queries | Total amount of memory (bytes) allocated in currently executing queries. | DEPENDENT | - | clickhouse.memory.tracking |
| ClickHouse: Memory used for background merges | Total amount of memory (bytes) allocated in the background processing pool (dedicated to background merges, mutations and fetches). Note that this value may include a drift when memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks. | DEPENDENT | - | clickhouse.memory.tracking.background |
| ClickHouse: Memory used for background moves | Total amount of memory (bytes) allocated in the background processing pool (dedicated to background moves). Note that this value may include a drift when memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks. | DEPENDENT | - | clickhouse.memory.tracking.background.moves |
| ClickHouse: Memory used for merges | Total amount of memory (bytes) allocated for background merges. Included in MemoryTrackingInBackgroundProcessingPool. Note that this value may include a drift when memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks. | DEPENDENT | - | clickhouse.memory.tracking.merges |
| ClickHouse: Memory used for background schedule pool | Total amount of memory (bytes) allocated in the background schedule pool (dedicated to bookkeeping tasks of Replicated tables). | DEPENDENT | - | clickhouse.memory.tracking.schedule.pool |
| ClickHouse: Current running merges | Number of executing background merges | DEPENDENT | - | clickhouse.merge.current |
| ClickHouse: Uncompressed bytes merged per second | Uncompressed bytes that were read for background merges | DEPENDENT | - | clickhouse.merge_bytes.rate |
| ClickHouse: Merged rows per second | Rows read for background merges. | DEPENDENT | - | clickhouse.merge_rows.rate |
| ClickHouse: Network errors per second | Network errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update. | DEPENDENT | - | clickhouse.network.error.rate |
| ClickHouse: Ping | - | HTTP_AGENT | - | clickhouse.ping |
| ClickHouse: Longest currently running query time | Get longest running query. | HTTP_AGENT | - | clickhouse.process.elapsed |
| ClickHouse: Current running queries | Number of executing queries | DEPENDENT | - | clickhouse.query.current |
| ClickHouse: New queries per second | Number of queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. | DEPENDENT | - | clickhouse.query.rate |
| ClickHouse: Read syscalls in fly | Number of read (read, pread, io_getevents, etc.) syscalls in flight. | DEPENDENT | - | clickhouse.read |
| ClickHouse: Read bytes per second | Number of bytes (before decompression) read from compressed sources (files, network). | DEPENDENT | - | clickhouse.read_bytes.rate |
| ClickHouse: Get replicas info | - | HTTP_AGENT | - | clickhouse.replicas |
| ClickHouse: Replication lag across all tables | Maximum replica queue delay relative to current time | DEPENDENT | - | clickhouse.replicas.max.absolute.delay |
| ClickHouse: Total number read-only Replicas | Number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured. | DEPENDENT | - | clickhouse.replicas.readonly.total |
| ClickHouse: Total replication tasks in queue | - | DEPENDENT | - | clickhouse.replicas.sum.queue.size |
| ClickHouse: Revision | Revision of the server. | DEPENDENT | - | clickhouse.revision |
| ClickHouse: New SELECT queries per second | Number of SELECT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. | DEPENDENT | - | clickhouse.select_query.rate |
| ClickHouse: Get system.asynchronous_metrics | Get metrics that are calculated periodically in the background | HTTP_AGENT | - | clickhouse.system.asynchronous_metrics |
| ClickHouse: Get system.events | Get information about the number of events that have occurred in the system. | HTTP_AGENT | - | clickhouse.system.events |
| ClickHouse: Get system.metrics | Get metrics that can be calculated instantly or that have a current value. Output format: JSONEachRow. | HTTP_AGENT | - | clickhouse.system.metrics |
| ClickHouse: Get system.settings | Get information about settings that are currently in use. | HTTP_AGENT | - | clickhouse.system.settings |
| ClickHouse: Get tables info | - | HTTP_AGENT | - | clickhouse.tables |
| ClickHouse: Uptime | Number of seconds since ClickHouse server start | DEPENDENT | - | clickhouse.uptime |
| ClickHouse: Version | Version of the server | HTTP_AGENT | - | clickhouse.version |
| ClickHouse: Write syscalls in fly | Number of write (write, pwrite, io_getevents, etc.) syscalls in flight. | DEPENDENT | - | clickhouse.write |
| ClickHouse: ZooKeeper exceptions per second | Count of ZooKeeper exceptions that do not belong to user/hardware exceptions. | DEPENDENT | - | clickhouse.zookeper.exceptions.rate |
| ClickHouse: ZooKeeper hardware exceptions per second | Count of ZooKeeper exceptions caused by session moved/expired, connection loss, marshalling error, operation timed out and invalid zhandle state. | DEPENDENT | - | clickhouse.zookeper.hw_exceptions.rate |
| ClickHouse: ZooKeeper requests | Number of requests to ZooKeeper in progress. | DEPENDENT | - | clickhouse.zookeper.request |
| ClickHouse: ZooKeeper sessions | Number of sessions (connections) to ZooKeeper. Should be no more than one. | DEPENDENT | - | clickhouse.zookeper.session |
| ClickHouse: ZooKeeper user exceptions per second | Count of ZooKeeper exceptions caused by no znodes, bad version, node exists, node empty and no children for ephemeral. | DEPENDENT | - | clickhouse.zookeper.user_exceptions.rate |
| ClickHouse: ZooKeeper wait time | Time spent in waiting for ZooKeeper operations. | DEPENDENT | - | clickhouse.zookeper.wait.time |
| ClickHouse: ZooKeeper watches | Number of watches (e.g., event subscriptions) in ZooKeeper. | DEPENDENT | - | clickhouse.zookeper.watch |
| ClickHouse: Check port availability | - | SIMPLE | - | net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"] |
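
Most items above are DEPENDENT: a small number of HTTP_AGENT master items fetch whole system tables, and each dependent item extracts one value from that response. The sketch below illustrates the idea for a JSONEachRow body (one JSON object per line) and for the "per second" items. It assumes rows with `metric` and `value` fields, as in `system.metrics`. The sample body is illustrative rather than captured from a server, the function names are hypothetical, and the template's real preprocessing steps may differ.

```python
import json


def parse_json_each_row(body: str) -> dict:
    """Map metric name to numeric value from a JSONEachRow response body."""
    values = {}
    for line in body.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        row = json.loads(line)
        # 64-bit integers may be serialized as strings, so convert explicitly.
        values[row["metric"]] = float(row["value"])
    return values


def per_second(prev_value, prev_ts, cur_value, cur_ts):
    """Rate of change between two samples of a monotonically growing counter.

    Returns None when no rate can be computed (no elapsed time, or the
    counter went backwards, for example after a server restart).
    """
    elapsed = cur_ts - prev_ts
    if elapsed <= 0 or cur_value < prev_value:
        return None
    return (cur_value - prev_value) / elapsed


# Illustrative response body only.
SAMPLE = (
    '{"metric":"HTTPConnection","value":"3"}\n'
    '{"metric":"TCPConnection","value":"12"}\n'
)
metrics = parse_json_each_row(SAMPLE)
http_connections = metrics["HTTPConnection"]

# A counter that grew from 100 to 160 over 60 seconds.
rate = per_second(100, 0, 160, 60)
```

Fetching each table once and fanning the result out to dependent items keeps the number of HTTP requests per polling cycle small, which is presumably why the interval column of the dependent items shows `-`.
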
Triggers
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| ClickHouse: Too many distributed files to insert | Check the ClickHouse servers and the <remote_servers> section in config.xml. See https://clickhouse.tech/docs/en/operations/table_engines/distributed/ | min(/ClickHouse by HTTP/clickhouse.distributed.files,5m)>{$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN} | WARNING 📢 | ClickHouse: Current distributed files to insert |
| ClickHouse: Too many throttled insert queries | ClickHouse has INSERT queries that are throttled due to a high number of active data parts for a partition in a MergeTree table. Decrease the INSERT frequency. | min(/ClickHouse by HTTP/clickhouse.insert.delay,5m)>{$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN} | WARNING 📢 | ClickHouse: Delayed insert queries |
| ClickHouse: Too many MergeTree parts | Decrease the frequency of INSERT queries. The ClickHouse MergeTree table engine splits each INSERT query into partitions (by the PARTITION BY expression) and adds one or more parts per INSERT to each partition, after which a background merge process runs. When a partition accumulates too many unmerged parts, SELECT query performance can degrade significantly, so ClickHouse tries to delay the INSERT or abort it. | min(/ClickHouse by HTTP/clickhouse.max.part.count.for.partition,5m)>{$CLICKHOUSE.PARTS.PER.PARTITION.WARN} * 0.9 | WARNING 📢 | ClickHouse: Max count of parts per partition across all tables |
| ClickHouse: Too many network errors | Number of errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update is too high. | min(/ClickHouse by HTTP/clickhouse.network.error.rate,5m)>{$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN} | WARNING 📢 | ClickHouse: Network errors per second |
| ClickHouse: There are queries running is long | - | last(/ClickHouse by HTTP/clickhouse.process.elapsed)>{$CLICKHOUSE.QUERY_TIME.MAX.WARN} | AVERAGE ⚠ | ClickHouse: Longest currently running query time |
| ClickHouse: Replication lag is too high | When a replica lags too far behind, it can be skipped by Distributed SELECT queries without any error, which produces wrong query results. | min(/ClickHouse by HTTP/clickhouse.replicas.max.absolute.delay,5m)>{$CLICKHOUSE.REPLICA.MAX.WARN} | WARNING 📢 | ClickHouse: Replication lag across all tables |
| ClickHouse: Configuration has been changed | ClickHouse configuration has been changed. Ack to close. | last(/ClickHouse by HTTP/clickhouse.system.settings,#1)<>last(/ClickHouse by HTTP/clickhouse.system.settings,#2) and length(last(/ClickHouse by HTTP/clickhouse.system.settings))>0 | INFO 🔔 | ClickHouse: Get system.settings |
| ClickHouse: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes | nodata(/ClickHouse by HTTP/clickhouse.uptime,30m)=1 | WARNING 📢 | ClickHouse: Uptime |
| ClickHouse: has been restarted | Uptime is less than 10 minutes. | last(/ClickHouse by HTTP/clickhouse.uptime)<10m | INFO 🔔 | ClickHouse: Uptime |
| ClickHouse: Version has changed | ClickHouse version has changed. Ack to close. | last(/ClickHouse by HTTP/clickhouse.version,#1)<>last(/ClickHouse by HTTP/clickhouse.version,#2) and length(last(/ClickHouse by HTTP/clickhouse.version))>0 | INFO 🔔 | ClickHouse: Version |
| ClickHouse: Too many ZooKeeper sessions opened | Number of sessions (connections) to ZooKeeper. Should be no more than one, because using more than one connection to ZooKeeper may lead to bugs due to lack of linearizability (stale reads) that ZooKeeper consistency model allows. | min(/ClickHouse by HTTP/clickhouse.zookeper.session,5m)>1 | WARNING 📢 | ClickHouse: ZooKeeper sessions |
| ClickHouse: Port {$CLICKHOUSE.PORT} is unavailable | - | last(/ClickHouse by HTTP/net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"])=0 | AVERAGE ⚠ | ClickHouse: Check port availability |
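
Many of the expressions above have the form `min(item,5m) > threshold`. Because the minimum over the window must exceed the threshold, such a trigger fires only when every value collected in the last five minutes is above it, so a single spike does not raise a problem. The sketch below mimics that logic with plain Python. It is an approximation under stated assumptions, not Zabbix's evaluation engine: `samples` is a hypothetical list of `(timestamp, value)` pairs, and the window boundaries are treated as inclusive.

```python
def min_over_window(samples, now, window_seconds=300):
    """Smallest value among samples whose timestamp lies in the window."""
    recent = [value for ts, value in samples if now - window_seconds <= ts <= now]
    return min(recent) if recent else None


def fires(samples, now, threshold, window_seconds=300, factor=1.0):
    """True when every sample in the window exceeds threshold * factor."""
    lowest = min_over_window(samples, now, window_seconds)
    return lowest is not None and lowest > threshold * factor


# "Too many MergeTree parts" compares against 90% of
# {$CLICKHOUSE.PARTS.PER.PARTITION.WARN} (default 300), i.e. 270.
steady_high = [(0, 280), (60, 290), (120, 275), (180, 300), (240, 310)]
one_dip = [(0, 280), (60, 290), (120, 100), (180, 300), (240, 310)]

steady_fires = fires(steady_high, now=240, threshold=300, factor=0.9)
dip_fires = fires(one_dip, now=240, threshold=300, factor=0.9)
```

In the second series a single low sample inside the window is enough to keep the trigger from firing.
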
Discovery rule №1
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Dictionaries | Info about dictionaries | DEPENDENT | 0 | clickhouse.dictionaries.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| ClickHouse: Dictionary {#NAME}: Bytes allocated | The amount of RAM the dictionary uses. | DEPENDENT | - | clickhouse.dictionary.bytes_allocated["{#NAME}"] |
| ClickHouse: Dictionary {#NAME}: Element count | Number of items stored in the dictionary. | DEPENDENT | - | clickhouse.dictionary.element_count["{#NAME}"] |
| ClickHouse: Dictionary {#NAME}: Load factor | The percentage filled in the dictionary (for a hashed dictionary, the percentage filled in the hash table). | DEPENDENT | - | clickhouse.dictionary.load_factor["{#NAME}"] |
Discovery rule №2
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Replicas | Info about replicas | DEPENDENT | 0 | clickhouse.replicas.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| ClickHouse: {#DB}.{#TABLE}: Active replicas | Number of replicas of this table that have a session in ZooKeeper (i.e., the number of functioning replicas). Has a non-zero value only where there is an active session with ZooKeeper. | DEPENDENT | - | clickhouse.replica.active_replicas["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica future parts | Number of data parts that will appear as the result of INSERTs or merges that haven't been done yet. | DEPENDENT | - | clickhouse.replica.future_parts["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica queue inserts size | Number of inserts of blocks of data that need to be made. | DEPENDENT | - | clickhouse.replica.inserts_in_queue["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica readonly | Whether the replica is in read-only mode. This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper. | DEPENDENT | - | clickhouse.replica.is_readonly["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica session expired | True if the ZooKeeper session expired | DEPENDENT | - | clickhouse.replica.is_session_expired["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica lag | Difference between log_max_index and log_pointer | DEPENDENT | - | clickhouse.replica.lag["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica log max index | Maximum entry number in the log of general activity. Has a non-zero value only where there is an active session with ZooKeeper. | DEPENDENT | - | clickhouse.replica.log_max_index["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica log pointer | Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one. Has a non-zero value only where there is an active session with ZooKeeper. | DEPENDENT | - | clickhouse.replica.log_pointer["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica queue merges size | Number of merges waiting to be made. | DEPENDENT | - | clickhouse.replica.merges_in_queue["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica parts to check | Number of data parts in the queue for verification. A part is put in the verification queue if there is suspicion that it might be damaged. | DEPENDENT | - | clickhouse.replica.parts_to_check["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Replica queue size | Size of the queue for operations waiting to be performed. | DEPENDENT | - | clickhouse.replica.queue_size["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Total replicas | Total number of known replicas of this table. Has a non-zero value only where there is an active session with ZooKeeper. | DEPENDENT | - | clickhouse.replica.total_replicas["{#DB}.{#TABLE}"] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| ClickHouse: {#DB}.{#TABLE} Replica is readonly | This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper. | min(/ClickHouse by HTTP/clickhouse.replica.is_readonly["{#DB}.{#TABLE}"],5m)=1 | WARNING 📢 | ClickHouse: {#DB}.{#TABLE}: Replica readonly |
| ClickHouse: {#DB}.{#TABLE} Replica session is expired | This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper. | min(/ClickHouse by HTTP/clickhouse.replica.is_session_expired["{#DB}.{#TABLE}"],5m)=1 | WARNING 📢 | ClickHouse: {#DB}.{#TABLE}: Replica session expired |
| ClickHouse: {#DB}.{#TABLE}: Difference between log_max_index and log_pointer is too high | - | min(/ClickHouse by HTTP/clickhouse.replica.lag["{#DB}.{#TABLE}"],5m) > {$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN} | WARNING 📢 | ClickHouse: {#DB}.{#TABLE}: Replica lag |
| ClickHouse: {#DB}.{#TABLE}: Too many operations in queue | - | min(/ClickHouse by HTTP/clickhouse.replica.queue_size["{#DB}.{#TABLE}"],5m)>{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN:"{#TABLE}"} | WARNING 📢 | ClickHouse: {#DB}.{#TABLE}: Replica queue size |
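
Two details of the prototypes above are easy to miss: replica lag is simply `log_max_index - log_pointer`, and the queue size trigger reads its threshold from a macro with context, `{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN:"{#TABLE}"}`, so a per-table value can override the default of 20. The sketch below models both with a dictionary lookup. The override for a table named `events_local` is hypothetical, and the lookup is a simplification of how Zabbix resolves user macros with context.

```python
# Default macro values from the "Macros used" table.
DEFAULTS = {
    "{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN}": 20,
    "{$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN}": 30,
}

# Hypothetical per-table override, keyed by (macro name, context).
OVERRIDES = {
    ("{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN}", "events_local"): 100,
}


def resolve_macro(name, context=None):
    """Return the context-specific value if one exists, else the default."""
    if context is not None and (name, context) in OVERRIDES:
        return OVERRIDES[(name, context)]
    return DEFAULTS[name]


def replica_lag(log_max_index, log_pointer):
    """Number of log entries the replica has not yet copied to its queue."""
    return log_max_index - log_pointer


queue_macro = "{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN}"
busy_table_limit = resolve_macro(queue_macro, "events_local")
other_table_limit = resolve_macro(queue_macro, "some_other_table")

lag = replica_lag(log_max_index=1050, log_pointer=1000)
lag_too_high = lag > resolve_macro("{$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN}")
```

With this arrangement a table that is known to have a busy replication queue can be given a higher limit without changing the threshold for every other table.
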
Discovery rule №3
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Tables | Info about tables | DEPENDENT | 0 | clickhouse.tables.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| ClickHouse: {#DB}: Bytes | Database size in bytes. | DEPENDENT | - | clickhouse.db.bytes["{#DB}"] |
| ClickHouse: {#DB}.{#TABLE}: Bytes | Table size in bytes. Database: {#DB}, table: {#TABLE} | DEPENDENT | - | clickhouse.table.bytes["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Parts | Number of parts of the table. Database: {#DB}, table: {#TABLE} | DEPENDENT | - | clickhouse.table.parts["{#DB}.{#TABLE}"] |
| ClickHouse: {#DB}.{#TABLE}: Rows | Number of rows in the table. Database: {#DB}, table: {#TABLE} | DEPENDENT | - | clickhouse.table.rows["{#DB}.{#TABLE}"] |
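
The `{$CLICKHOUSE.LLD.FILTER.*}` macros from the "Macros used" table hold regular expressions that decide which discovered entities get items. The following sketch shows the usual MATCHES / NOT_MATCHES pairing: a name is kept when it matches the first pattern and does not match the second. It assumes that the table discovery rule applies the DB pair to `{#DB}` and that partial (search-style) matching is close enough to Zabbix's regular expression handling. The function name and the sample database names are hypothetical.

```python
import re


def passes_filter(name, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Keep a name that matches `matches` and does not match `not_matches`."""
    return (
        re.search(matches, name) is not None
        and re.search(not_matches, name) is None
    )


databases = ["default", "analytics", "system"]

# With the default macro values every database is discovered.
kept_by_default = [db for db in databases if passes_filter(db)]

# Setting {$CLICKHOUSE.LLD.FILTER.DB.NOT_MATCHES} to exclude one database.
kept_without_system = [
    db for db in databases if passes_filter(db, not_matches="^system$")
]
```

The default value `CHANGE_IF_NEEDED` works as a placeholder because no realistic database name contains that string, so nothing is excluded until the macro is changed.
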