TiDB by HTTP

Macros used

Name | Value
{$TIDB.DDL.WAITING.MAX.WARN} | 5
{$TIDB.GC_ACTIONS.ERRORS.MAX.WARN} | 1
{$TIDB.HEAP.USAGE.MAX.WARN} | 10G
{$TIDB.MONITOR_KEEP_ALIVE.MAX.WARN} | 10
{$TIDB.OPEN.FDS.MAX.WARN} | 90
{$TIDB.PORT} | 10080
{$TIDB.REGION_ERROR.MAX.WARN} | 50
{$TIDB.SCHEMA_LEASE_ERRORS.MAX.WARN} | 0
{$TIDB.SCHEMA_LOAD_ERRORS.MAX.WARN} | 1
{$TIDB.TIME_JUMP_BACK.MAX.WARN} | 1
{$TIDB.URL} | localhost
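
The {$TIDB.URL} and {$TIDB.PORT} macros together identify the TiDB status endpoint that the HTTP_AGENT items poll. The following is a minimal Python sketch of how the default values combine. It assumes the template requests the /metrics and /status paths on the TiDB status port; the paths and the `build_endpoints` helper are illustrative assumptions and are not taken from the template itself.

```python
def build_endpoints(url: str = "localhost", port: int = 10080) -> dict:
    """Combine the {$TIDB.URL} and {$TIDB.PORT} values into endpoint URLs.

    The /metrics and /status paths are assumptions made for this sketch.
    """
    base = f"http://{url}:{port}"
    return {"metrics": f"{base}/metrics", "status": f"{base}/status"}


if __name__ == "__main__":
    endpoints = build_endpoints()
    print(endpoints["metrics"])
    print(endpoints["status"])
```

Overriding the macros at the host level corresponds to calling the helper with different arguments, for example `build_endpoints("tidb-1.example.internal", 10080)`, where the host name is hypothetical.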

Items collected

Name | Description | Type | Interval | Key and additional info
TiDB: CPU | Total user and system CPU usage ratio. | DEPENDENT | - | tidb.cpu.util
TiDB: DDL waiting jobs | The number of DDL tasks that are waiting. | DEPENDENT | - | tidb.ddl_waiting_jobs
TiDB: Load schema failed, rate | The total number of failures to reload the latest schema information in TiDB per second. | DEPENDENT | - | tidb.domain_load_schema.failed.rate
TiDB: Load schema total, rate | The statistics of the schemas that TiDB obtains from TiKV per second. | DEPENDENT | - | tidb.domain_load_schema.rate
TiDB: Failed Query, rate | The number of errors that occurred per second when executing SQL statements (such as syntax errors and primary key conflicts). | DEPENDENT | - | tidb.execute_error.rate
TiDB: Get instance metrics | Get TiDB instance metrics. | HTTP_AGENT | - | tidb.get_metrics
TiDB: Get instance status | Get TiDB instance status info. | HTTP_AGENT | - | tidb.get_status
TiDB: Goroutine count | The number of Goroutines on the TiDB instance. | DEPENDENT | - | tidb.goroutines
TiDB: Heap memory usage | Number of heap bytes that are in use. | DEPENDENT | - | tidb.heap_bytes
TiDB: Keep alive, rate | The number of times per minute that the metrics are refreshed on the TiDB instance. | DEPENDENT | - | tidb.monitor_keep_alive.rate
TiDB: Time jump back, rate | The number of times per second that the operating system clock jumps backward. | DEPENDENT | - | tidb.monitor_time_jump_back.rate
TiDB: PD TSO commands, rate | The number of TSO commands that TiDB obtains from PD per second. | DEPENDENT | - | tidb.pd_tso_cmd.rate
TiDB: PD TSO requests, rate | The number of TSO requests that TiDB obtains from PD per second. | DEPENDENT | - | tidb.pd_tso_request.rate
TiDB: Open file descriptors, max | Maximum number of open file descriptors. | DEPENDENT | - | tidb.process_max_fds
TiDB: Open file descriptors | Number of open file descriptors. | DEPENDENT | - | tidb.process_open_fds
TiDB: RSS memory usage | Resident memory size in bytes. | DEPENDENT | - | tidb.rss_bytes
TiDB: Total "error" server query, rate | The number of queries per second on the TiDB instance whose command execution failed. | DEPENDENT | - | tidb.server_query.error.rate
TiDB: Total "ok" server query, rate | The number of queries per second on the TiDB instance whose command execution succeeded. | DEPENDENT | - | tidb.server_query.ok.rate
TiDB: Total server query, rate | The number of queries per second on the TiDB instance. | DEPENDENT | - | tidb.server_query.rate
TiDB: Schema lease "change" errors, rate | The number of schema lease errors per second. "change" means that the schema has changed. | DEPENDENT | - | tidb.session_schema_lease_error.change.rate
TiDB: Schema lease "outdate" errors , rate | The number of schema lease errors per second. "outdate" means that the schema cannot be updated, which is a more serious error and triggers an alert. | DEPENDENT | - | tidb.session_schema_lease_error.outdate.rate
TiDB: SQL statements, rate | The total number of SQL statements executed per second. | DEPENDENT | - | tidb.statement_total.rate
TiDB: Status | Status of the TiDB instance. | DEPENDENT | - | tidb.status
TiDB: Server connections | The number of connections to the current TiDB instance. | DEPENDENT | - | tidb.tidb_server_connections
TiDB: Server critical error, rate | The number of critical errors that occurred in TiDB per second. | DEPENDENT | - | tidb.tidb_server_critical_error_total.rate
TiDB: Server panic, rate | The number of panics that occurred in TiDB per second. | DEPENDENT | - | tidb.tidb_server_panic_total.rate
TiDB: KV backoff, rate | The number of errors returned by TiKV per second. | DEPENDENT | - | tidb.tikvclient_backoff.rate
TiDB: Lock resolves, rate | The number of TiDB operations that resolve locks per second. When a TiDB read or write request encounters a lock, it tries to resolve the lock. | DEPENDENT | - | tidb.tikvclient_lock_resolver_action.rate
TiDB: TiClient region errors, rate | The number of region-related errors returned by TiKV per second. | DEPENDENT | - | tidb.tikvclient_region_err.rate
TiDB: KV commands, rate | The number of executed KV commands per second. | DEPENDENT | - | tidb.tikvclient_txn.rate
TiDB: Uptime | The runtime of each TiDB instance. | DEPENDENT | - | tidb.uptime
TiDB: Version | Version of the TiDB instance. | DEPENDENT | - | tidb.version
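
Most items above are DEPENDENT: they derive their values from the text returned by the two HTTP_AGENT master items instead of polling TiDB themselves. The sketch below illustrates the idea in Python under several assumptions: the master item returns Prometheus text exposition format, the sample payload is invented for illustration, and the helper names (`parse_metrics`, `total`, `per_second`) are hypothetical. The parser is deliberately simplified and skips lines it does not recognize, such as samples with a trailing timestamp.

```python
import re

# Invented sample payload in Prometheus text exposition format.
SAMPLE = """\
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 42
tidb_server_query_total{result="OK",type="Query"} 1200
tidb_server_query_total{result="Error",type="Query"} 3
"""

LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples; comments are skipped."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples.append((name, dict(LABEL_RE.findall(labels or "")), float(value)))
    return samples


def total(samples, name, **labels):
    """Sum every sample of a metric whose labels match the given filter."""
    return sum(
        value
        for sample_name, sample_labels, value in samples
        if sample_name == name
        and all(sample_labels.get(k) == v for k, v in labels.items())
    )


def per_second(previous, current, seconds):
    """Turn two readings of a growing counter into a per-second rate."""
    return (current - previous) / seconds


if __name__ == "__main__":
    samples = parse_metrics(SAMPLE)
    print(total(samples, "process_open_fds"))
    print(total(samples, "tidb_server_query_total", result="Error"))
    print(per_second(1200, 1260, 60))
```

Items whose names end in "rate" correspond to the `per_second` step: the stored value is the change in a counter divided by the time between two polls.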

Triggers

Name | Description | Expression | Priority | Dependencies
TiDB: Too many DDL waiting jobs | - | min(/TiDB by HTTP/tidb.ddl_waiting_jobs,5m)>{$TIDB.DDL.WAITING.MAX.WARN} | WARNING 📢 | TiDB: DDL waiting jobs
TiDB: Too many schema lease errors | - | min(/TiDB by HTTP/tidb.domain_load_schema.failed.rate,5m)>{$TIDB.SCHEMA_LOAD_ERRORS.MAX.WARN} | AVERAGE ⚠ | TiDB: Load schema failed, rate
TiDB: Heap memory usage is too high | - | min(/TiDB by HTTP/tidb.heap_bytes,5m)>{$TIDB.HEAP.USAGE.MAX.WARN} | WARNING 📢 | TiDB: Heap memory usage
TiDB: Too few keep alive operations | Indicates whether the TiDB process still exists. If tidb_monitor_keep_alive_total increases by fewer than 10 per minute, the TiDB process might have already exited and an alert is triggered. | max(/TiDB by HTTP/tidb.monitor_keep_alive.rate,5m)<{$TIDB.MONITOR_KEEP_ALIVE.MAX.WARN} | AVERAGE ⚠ | TiDB: Keep alive, rate
TiDB: Too many time jump backs | - | min(/TiDB by HTTP/tidb.monitor_time_jump_back.rate,5m)>{$TIDB.TIME_JUMP_BACK.MAX.WARN} | WARNING 📢 | TiDB: Time jump back, rate
TiDB: Too many schema lease errors | The latest schema information is not reloaded in TiDB within one lease. | min(/TiDB by HTTP/tidb.session_schema_lease_error.outdate.rate,5m)>{$TIDB.SCHEMA_LEASE_ERRORS.MAX.WARN} | AVERAGE ⚠ | TiDB: Schema lease "outdate" errors , rate
TiDB: Instance is not responding | - | last(/TiDB by HTTP/tidb.status)=0 | AVERAGE ⚠ | TiDB: Status
TiDB: There are panicked TiDB threads | When a panic occurs, an alert is triggered. The thread is often recovered; otherwise, TiDB will restart frequently. | last(/TiDB by HTTP/tidb.tidb_server_panic_total.rate)>0 | AVERAGE ⚠ | TiDB: Server panic, rate
TiDB: Too many region related errors | - | min(/TiDB by HTTP/tidb.tikvclient_region_err.rate,5m)>{$TIDB.REGION_ERROR.MAX.WARN} | AVERAGE ⚠ | TiDB: TiClient region errors, rate
TiDB: has been restarted | Uptime is less than 10 minutes. | last(/TiDB by HTTP/tidb.uptime)<10m | INFO 🔔 | TiDB: Uptime
TiDB: Version has changed | TiDB version has changed. Ack to close. | last(/TiDB by HTTP/tidb.version,#1)<>last(/TiDB by HTTP/tidb.version,#2) and length(last(/TiDB by HTTP/tidb.version))>0 | INFO 🔔 | TiDB: Version
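
Most trigger expressions above compare an aggregate over a 5-minute window with a macro threshold. The Python sketch below models three of those patterns: `min(...,5m) > threshold`, `max(...,5m) < threshold`, and the version comparison between the two most recent values. It is an approximation for illustration only. The helper names are hypothetical, the 1024-based interpretation of suffixes such as `10G` follows Zabbix's documented unit suffixes, and returning `False` for an empty window is a simplification of how Zabbix handles missing data.

```python
# Zabbix size suffixes are 1024-based.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_threshold(value: str) -> float:
    """Convert a macro value such as '10G' or '50' into a number."""
    if value and value[-1] in SUFFIXES:
        return float(value[:-1]) * SUFFIXES[value[-1]]
    return float(value)


def min_exceeds(window, threshold):
    """min(item,5m) > threshold: every value in the window is above the limit."""
    return bool(window) and min(window) > threshold


def max_below(window, threshold):
    """max(item,5m) < threshold: every value in the window is below the limit."""
    return bool(window) and max(window) < threshold


def version_changed(history):
    """history is newest-first: compare last(#1) with last(#2), ignoring empty values."""
    return len(history) >= 2 and history[0] != history[1] and len(history[0]) > 0


if __name__ == "__main__":
    limit = parse_threshold("10G")
    gib = 1024**3
    print(min_exceeds([11 * gib, 12 * gib], limit))  # sustained high usage
    print(min_exceeds([11 * gib, 9 * gib], limit))   # one dip below the limit
    print(max_below([4, 7, 9], parse_threshold("10")))
```

Using `min` over the window means a single short spike does not fire the trigger; the value has to stay above the threshold for the whole period.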

Discovery rule №1

Name | Description | Type | Interval | Key and additional info
KV metrics discovery | Discovery of KV-specific metrics. | DEPENDENT | 0 | tidb.kv_ops.discovery

Item prototypes

Name | Description | Type | Interval | Key and additional info
TiDB: KV Commands: {#TYPE}, rate | The number of executed KV commands per second. | DEPENDENT | - | tidb.tikvclient_txn.rate[{#TYPE}]
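
Each discovery rule in this template produces a list of {#TYPE} values, and every item prototype is instantiated once per value. The Python sketch below shows that expansion using the JSON array-of-objects shape that Zabbix low-level discovery accepts. The label values ("commit", "get") and the helper names are hypothetical; how the template actually extracts the values from the metrics is not shown in this document.

```python
import json


def discovery_rows(label_values):
    """Build low-level discovery rows, one per distinct value."""
    return [{"{#TYPE}": value} for value in sorted(set(label_values))]


def expand_key(prototype, row):
    """Substitute discovery macros in an item prototype key."""
    key = prototype
    for macro, value in row.items():
        key = key.replace(macro, value)
    return key


if __name__ == "__main__":
    # Hypothetical values seen in a metric label.
    rows = discovery_rows(["commit", "get", "commit"])
    print(json.dumps(rows))
    for row in rows:
        print(expand_key("tidb.tikvclient_txn.rate[{#TYPE}]", row))
```

The same substitution applies to the other prototypes in the discovery rules that follow, including the trigger prototype for GC action results.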

Discovery rule №2

Name | Description | Type | Interval | Key and additional info
QPS metrics discovery | Discovery of QPS-specific metrics. | DEPENDENT | 0 | tidb.qps.discovery

Item prototypes

Name | Description | Type | Interval | Key and additional info
TiDB: Server query "Error": {#TYPE}, rate | The number of queries per second on the TiDB instance whose command execution failed. | DEPENDENT | - | tidb.server_query.error.rate[{#TYPE}]
TiDB: Server query "OK": {#TYPE}, rate | The number of queries per second on the TiDB instance whose command execution succeeded. | DEPENDENT | - | tidb.server_query.ok.rate[{#TYPE}]

Discovery rule №3

Name | Description | Type | Interval | Key and additional info
Statement metrics discovery | Discovery of statement-specific metrics. | DEPENDENT | 0 | tidb.statement.discover

Item prototypes

Name | Description | Type | Interval | Key and additional info
TiDB: SQL statements: {#TYPE}, rate | The number of SQL statements executed per second. | DEPENDENT | - | tidb.statement.rate[{#TYPE}]

Discovery rule №4

Name | Description | Type | Interval | Key and additional info
KV backoff discovery | Discovery of KV backoff-specific metrics. | DEPENDENT | 0 | tidb.tikvclient_backoff.discovery

Item prototypes

Name | Description | Type | Interval | Key and additional info
TiDB: KV backoff: {#TYPE}, rate | The number of errors returned by TiKV per second. | DEPENDENT | - | tidb.tikvclient_backoff.rate[{#TYPE}]

Discovery rule №5

Name | Description | Type | Interval | Key and additional info
GC action results discovery | Discovery of GC action results metrics. | DEPENDENT | 0 | tidb.tikvclient_gc_action.discovery

Item prototypes

Name | Description | Type | Interval | Key and additional info
TiDB: GC action result: {#TYPE}, rate | The number of results of GC-related operations per second. | DEPENDENT | - | tidb.tikvclient_gc_action.rate[{#TYPE}]

Trigger prototypes

Name | Description | Expression | Priority | Dependencies
TiDB: Too many failed GC-related operations | - | min(/TiDB by HTTP/tidb.tikvclient_gc_action.rate[{#TYPE}],5m)>{$TIDB.GC_ACTIONS.ERRORS.MAX.WARN} | WARNING 📢 | TiDB: GC action result: {#TYPE}, rate

Discovery rule №6

Name | Description | Type | Interval | Key and additional info
Lock resolves discovery | Discovery of lock resolves-specific metrics. | DEPENDENT | 0 | tidb.tikvclient_lock_resolver_action.discovery

Item prototypes

Name | Description | Type | Interval | Key and additional info
TiDB: Lock resolves: {#TYPE}, rate | The number of TiDB operations that resolve locks per second. When a TiDB read or write request encounters a lock, it tries to resolve the lock. | DEPENDENT | - | tidb.tikvclient_lock_resolver_action.rate[{#TYPE}]