TiDB by HTTP
Macros used
| Name | Value |
|---|---|
| {$TIDB.DDL.WAITING.MAX.WARN} | 5 |
| {$TIDB.GC_ACTIONS.ERRORS.MAX.WARN} | 1 |
| {$TIDB.HEAP.USAGE.MAX.WARN} | 10G |
| {$TIDB.MONITOR_KEEP_ALIVE.MAX.WARN} | 10 |
| {$TIDB.OPEN.FDS.MAX.WARN} | 90 |
| {$TIDB.PORT} | 10080 |
| {$TIDB.REGION_ERROR.MAX.WARN} | 50 |
| {$TIDB.SCHEMA_LEASE_ERRORS.MAX.WARN} | 0 |
| {$TIDB.SCHEMA_LOAD_ERRORS.MAX.WARN} | 1 |
| {$TIDB.TIME_JUMP_BACK.MAX.WARN} | 1 |
| {$TIDB.URL} | localhost |
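
The threshold macros above use unit suffixes (for example `10G`), and the trigger expressions further down use time suffixes (for example `5m` and `10m`). As a rough illustration of how such values expand to plain numbers, here is a minimal Python sketch. The helper name `expand_suffix` is hypothetical and is not part of Zabbix or of this template. It assumes the usual Zabbix convention that uppercase size suffixes are powers of 1024 and that lowercase `m` means minutes.

```python
# Hypothetical helper (not a Zabbix API): expand a suffixed value to a number.
# Assumption: size suffixes are powers of 1024, time suffixes are in seconds.
SIZE_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
TIME_SUFFIXES = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def expand_suffix(value: str) -> float:
    """Return the numeric value of strings such as '10G', '10m' or '90'."""
    value = value.strip()
    multiplier = {**SIZE_SUFFIXES, **TIME_SUFFIXES}.get(value[-1:])
    if multiplier is None:
        # No recognised suffix: treat the whole string as a plain number.
        return float(value)
    return float(value[:-1]) * multiplier


if __name__ == "__main__":
    print(expand_suffix("10G"))  # default of {$TIDB.HEAP.USAGE.MAX.WARN}, in bytes
    print(expand_suffix("10m"))  # uptime threshold, in seconds
```

Under these assumptions, the default heap threshold of `10G` is compared against `tidb.heap_bytes` as a byte count, and `10m` is compared against `tidb.uptime` as a number of seconds.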
Items collected
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| TiDB: CPU | Total user and system CPU usage ratio. | DEPENDENT | - | tidb.cpu.util |
| TiDB: DDL waiting jobs | The number of DDL tasks that are waiting. | DEPENDENT | - | tidb.ddl_waiting_jobs |
| TiDB: Load schema failed, rate | The total number of failures to reload the latest schema information in TiDB per second. | DEPENDENT | - | tidb.domain_load_schema.failed.rate |
| TiDB: Load schema total, rate | The statistics of the schemas that TiDB obtains from TiKV per second. | DEPENDENT | - | tidb.domain_load_schema.rate |
| TiDB: Failed Query, rate | The number of errors per second that occur when executing SQL statements (such as syntax errors and primary key conflicts). | DEPENDENT | - | tidb.execute_error.rate |
| TiDB: Get instance metrics | Get TiDB instance metrics. | HTTP_AGENT | - | tidb.get_metrics |
| TiDB: Get instance status | Get TiDB instance status info. | HTTP_AGENT | - | tidb.get_status |
| TiDB: Goroutine count | The number of goroutines on the TiDB instance. | DEPENDENT | - | tidb.goroutines |
| TiDB: Heap memory usage | Number of heap bytes that are in use. | DEPENDENT | - | tidb.heap_bytes |
| TiDB: Keep alive, rate | The number of times that the metrics are refreshed on TiDB instance per minute. | DEPENDENT | - | tidb.monitor_keep_alive.rate |
| TiDB: Time jump back, rate | The number of times per second that the operating system clock jumps backward. | DEPENDENT | - | tidb.monitor_time_jump_back.rate |
| TiDB: PD TSO commands, rate | The number of TSO commands that TiDB obtains from PD per second. | DEPENDENT | - | tidb.pd_tso_cmd.rate |
| TiDB: PD TSO requests, rate | The number of TSO requests that TiDB obtains from PD per second. | DEPENDENT | - | tidb.pd_tso_request.rate |
| TiDB: Open file descriptors, max | Maximum number of open file descriptors. | DEPENDENT | - | tidb.process_max_fds |
| TiDB: Open file descriptors | Number of open file descriptors. | DEPENDENT | - | tidb.process_open_fds |
| TiDB: RSS memory usage | Resident memory size in bytes. | DEPENDENT | - | tidb.rss_bytes |
| TiDB: Total "error" server query, rate | The number of queries per second on the TiDB instance whose command execution failed. | DEPENDENT | - | tidb.server_query.error.rate |
| TiDB: Total "ok" server query, rate | The number of queries per second on the TiDB instance whose command execution succeeded. | DEPENDENT | - | tidb.server_query.ok.rate |
| TiDB: Total server query, rate | The number of queries per second on TiDB instance. | DEPENDENT | - | tidb.server_query.rate |
| TiDB: Schema lease "change" errors, rate | The number of schema lease errors per second. A "change" error means that the schema has changed. | DEPENDENT | - | tidb.session_schema_lease_error.change.rate |
| TiDB: Schema lease "outdate" errors , rate | The number of schema lease errors per second. An "outdate" error means that the schema cannot be updated; this is a more serious error and triggers an alert. | DEPENDENT | - | tidb.session_schema_lease_error.outdate.rate |
| TiDB: SQL statements, rate | The total number of SQL statements executed per second. | DEPENDENT | - | tidb.statement_total.rate |
| TiDB: Status | Status of the TiDB instance. | DEPENDENT | - | tidb.status |
| TiDB: Server connections | The number of connections to the current TiDB instance. | DEPENDENT | - | tidb.tidb_server_connections |
| TiDB: Server critical error, rate | The number of critical errors that occur in TiDB per second. | DEPENDENT | - | tidb.tidb_server_critical_error_total.rate |
| TiDB: Server panic, rate | The number of panics that occur in TiDB per second. | DEPENDENT | - | tidb.tidb_server_panic_total.rate |
| TiDB: KV backoff, rate | The number of errors returned by TiKV. | DEPENDENT | - | tidb.tikvclient_backoff.rate |
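| TiDB: Lock resolves, rate | The number of TiDB operations that resolve locks per second. When TiDB's read or write request encounters a lock, it tries to resolve the lock. | DEPENDENT | - | tidb.tikvclient_lock_resolver_action.rate |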
| TiDB: TiClient region errors, rate | The number of region related errors returned by TiKV per second. | DEPENDENT | - | tidb.tikvclient_region_err.rate |
| TiDB: KV commands, rate | The number of executed KV commands per second. | DEPENDENT | - | tidb.tikvclient_txn.rate |
| TiDB: Uptime | The runtime of each TiDB instance. | DEPENDENT | - | tidb.uptime |
| TiDB: Version | Version of the TiDB instance. | DEPENDENT | - | tidb.version |
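
Most items above are dependent items whose names end in "rate", derived from the output of the `tidb.get_metrics` master item. The sketch below illustrates one way such a value could be derived: sum a counter from metrics text, then compute the change per second between two polls. It assumes the master item returns Prometheus text exposition format without trailing timestamps. The function names and the sample metric lines are illustrative, not taken from the template.

```python
from typing import Optional


def sum_counter(metrics_text: str, name: str) -> float:
    """Sum every sample of `name` across all label sets in Prometheus text."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated field (assumes no timestamp).
        metric, _, value = line.rpartition(" ")
        if metric.split("{", 1)[0] == name:
            total += float(value)
    return total


def change_per_second(prev_value: float, prev_ts: float,
                      value: float, ts: float) -> Optional[float]:
    """Return the per-second rate, or None after a counter reset or bad clock."""
    if ts <= prev_ts or value < prev_value:
        return None
    return (value - prev_value) / (ts - prev_ts)


if __name__ == "__main__":
    # Illustrative samples taken 60 seconds apart (values are made up).
    first = 'demo_query_total{result="OK",type="Query"} 100\n' \
            'demo_query_total{result="Error",type="Query"} 5\n'
    second = 'demo_query_total{result="OK",type="Query"} 160\n' \
             'demo_query_total{result="Error",type="Query"} 5\n'
    rate = change_per_second(sum_counter(first, "demo_query_total"), 0,
                             sum_counter(second, "demo_query_total"), 60)
    print(rate)
```

Returning `None` on a counter reset means a process restart yields no sample for that interval instead of a misleading negative rate.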
Triggers
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| TiDB: Too many DDL waiting jobs | - | min(/TiDB by HTTP/tidb.ddl_waiting_jobs,5m)>{$TIDB.DDL.WAITING.MAX.WARN} | WARNING 📢 | TiDB: DDL waiting jobs |
| TiDB: Too many schema lease errors | - | min(/TiDB by HTTP/tidb.domain_load_schema.failed.rate,5m)>{$TIDB.SCHEMA_LOAD_ERRORS.MAX.WARN} | AVERAGE ⚠ | TiDB: Load schema failed, rate |
| TiDB: Heap memory usage is too high | - | min(/TiDB by HTTP/tidb.heap_bytes,5m)>{$TIDB.HEAP.USAGE.MAX.WARN} | WARNING 📢 | TiDB: Heap memory usage |
| TiDB: Too few keep alive operations | Indicates whether the TiDB process is still running. If tidb_monitor_keep_alive_total increases by fewer than 10 per minute, the TiDB process may have already exited, and an alert is triggered. | max(/TiDB by HTTP/tidb.monitor_keep_alive.rate,5m)<{$TIDB.MONITOR_KEEP_ALIVE.MAX.WARN} | AVERAGE ⚠ | TiDB: Keep alive, rate |
| TiDB: Too many time jump backs | - | min(/TiDB by HTTP/tidb.monitor_time_jump_back.rate,5m)>{$TIDB.TIME_JUMP_BACK.MAX.WARN} | WARNING 📢 | TiDB: Time jump back, rate |
| TiDB: Too many schema lease errors | The latest schema information is not reloaded in TiDB within one lease. | min(/TiDB by HTTP/tidb.session_schema_lease_error.outdate.rate,5m)>{$TIDB.SCHEMA_LEASE_ERRORS.MAX.WARN} | AVERAGE ⚠ | TiDB: Schema lease "outdate" errors , rate |
| TiDB: Instance is not responding | - | last(/TiDB by HTTP/tidb.status)=0 | AVERAGE ⚠ | TiDB: Status |
| TiDB: There are panicked TiDB threads | When a panic occurs, an alert is triggered. The thread is usually recovered; otherwise, TiDB will restart frequently. | last(/TiDB by HTTP/tidb.tidb_server_panic_total.rate)>0 | AVERAGE ⚠ | TiDB: Server panic, rate |
| TiDB: Too many region related errors | - | min(/TiDB by HTTP/tidb.tikvclient_region_err.rate,5m)>{$TIDB.REGION_ERROR.MAX.WARN} | AVERAGE ⚠ | TiDB: TiClient region errors, rate |
| TiDB: has been restarted | Uptime is less than 10 minutes. | last(/TiDB by HTTP/tidb.uptime)<10m | INFO 🔔 | TiDB: Uptime |
| TiDB: Version has changed | TiDB version has changed. Ack to close. | last(/TiDB by HTTP/tidb.version,#1)<>last(/TiDB by HTTP/tidb.version,#2) and length(last(/TiDB by HTTP/tidb.version))>0 | INFO 🔔 | TiDB: Version |
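
Most trigger expressions above have the form `min(item,5m)>threshold`, and the keep-alive trigger uses `max(item,5m)<threshold`. The following sketch shows the intended semantics under simplified assumptions: `min(...)>threshold` holds only when every sample in the window exceeds the threshold, so a single low sample suppresses the alert. The function names are hypothetical, and the handling of an empty window (returning `False`) is a simplification, not a description of how Zabbix treats missing data.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, value)


def window_values(samples: Sequence[Sample], now: float, window_s: float) -> List[float]:
    """Return the values whose timestamps fall within the last `window_s` seconds."""
    return [value for ts, value in samples if now - window_s < ts <= now]


def min_exceeds(samples: Sequence[Sample], now: float,
                window_s: float, threshold: float) -> bool:
    """Sketch of min(item,window)>threshold: all samples must exceed it."""
    values = window_values(samples, now, window_s)
    return bool(values) and min(values) > threshold


def max_below(samples: Sequence[Sample], now: float,
              window_s: float, threshold: float) -> bool:
    """Sketch of max(item,window)<threshold: all samples must stay below it."""
    values = window_values(samples, now, window_s)
    return bool(values) and max(values) < threshold


if __name__ == "__main__":
    ddl_waiting = [(0, 7), (60, 6), (120, 9), (180, 8), (240, 6)]
    # Compared with the default {$TIDB.DDL.WAITING.MAX.WARN} of 5 over 5 minutes.
    print(min_exceeds(ddl_waiting, now=240, window_s=300, threshold=5))
```

Using `min` for "too many" conditions and `max` for "too few" conditions requires the problem to persist across the whole window, which filters out short spikes.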
Discovery rule №1
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| KV metrics discovery | Discovers KV-specific metrics. | DEPENDENT | 0 | tidb.kv_ops.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| TiDB: KV Commands: {#TYPE}, rate | The number of executed KV commands per second. | DEPENDENT | - | tidb.tikvclient_txn.rate[{#TYPE}] |
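
Each discovery rule in this document yields `{#TYPE}` values that are substituted into the item prototype keys. As a hedged illustration of the idea, the sketch below collects the distinct values of one label from metrics text and emits them as a list of `{"{#TYPE}": ...}` objects, the general shape that Zabbix low-level discovery consumes. The metric name `demo_kv_cmd_total`, the label name `type`, and the function name are assumptions for illustration; the template's actual preprocessing is not shown in this document.

```python
import json
import re
from typing import Dict, List

# Matches label="value" pairs, allowing backslash-escaped characters in the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def discover_types(metrics_text: str, metric: str, label: str = "type") -> List[Dict[str, str]]:
    """Return sorted, de-duplicated {#TYPE} entries for one labelled metric."""
    seen = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line.startswith(metric + "{"):
            continue  # skips comments, other metrics, and unlabelled samples
        labels = dict(LABEL_RE.findall(line))
        if label in labels:
            seen.add(labels[label])
    return [{"{#TYPE}": value} for value in sorted(seen)]


if __name__ == "__main__":
    # Illustrative input only; metric and label names are hypothetical.
    sample = (
        'demo_kv_cmd_total{type="get"} 10\n'
        'demo_kv_cmd_total{type="commit"} 4\n'
        'demo_kv_cmd_total{type="get",store="1"} 2\n'
    )
    print(json.dumps(discover_types(sample, "demo_kv_cmd_total")))
```

With entries like these, a prototype key such as `tidb.tikvclient_txn.rate[{#TYPE}]` would expand to one item per discovered value.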
Discovery rule №2
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| QPS metrics discovery | Discovers QPS-specific metrics. | DEPENDENT | 0 | tidb.qps.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| TiDB: Server query "Error": {#TYPE}, rate | The number of queries per second on the TiDB instance whose command execution failed. | DEPENDENT | - | tidb.server_query.error.rate[{#TYPE}] |
| TiDB: Server query "OK": {#TYPE}, rate | The number of queries per second on the TiDB instance whose command execution succeeded. | DEPENDENT | - | tidb.server_query.ok.rate[{#TYPE}] |
Discovery rule №3
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Statement metrics discovery | Discovers statement-specific metrics. | DEPENDENT | 0 | tidb.statement.discover |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| TiDB: SQL statements: {#TYPE}, rate | The number of SQL statements executed per second. | DEPENDENT | - | tidb.statement.rate[{#TYPE}] |
Discovery rule №4
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| KV backoff discovery | Discovers KV backoff-specific metrics. | DEPENDENT | 0 | tidb.tikvclient_backoff.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| TiDB: KV backoff: {#TYPE}, rate | The number of errors returned by TiKV. | DEPENDENT | - | tidb.tikvclient_backoff.rate[{#TYPE}] |
Discovery rule №5
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| GC action results discovery | Discovers GC action result metrics. | DEPENDENT | 0 | tidb.tikvclient_gc_action.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| TiDB: GC action result: {#TYPE}, rate | The number of results of GC-related operations per second. | DEPENDENT | - | tidb.tikvclient_gc_action.rate[{#TYPE}] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| TiDB: Too many failed GC-related operations | - | min(/TiDB by HTTP/tidb.tikvclient_gc_action.rate[{#TYPE}],5m)>{$TIDB.GC_ACTIONS.ERRORS.MAX.WARN} | WARNING 📢 | TiDB: GC action result: {#TYPE}, rate |
Discovery rule №6
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Lock resolves discovery | Discovers lock resolve-specific metrics. | DEPENDENT | 0 | tidb.tikvclient_lock_resolver_action.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| TiDB: Lock resolves: {#TYPE}, rate | The number of TiDB operations that resolve locks per second. When TiDB's read or write request encounters a lock, it tries to resolve the lock. | DEPENDENT | - | tidb.tikvclient_lock_resolver_action.rate[{#TYPE}] |