Kubernetes API server by HTTP
Macros used
| Name | Value |
|---|---|
| {$KUBE.API.CERT.EXPIRATION} | 7 |
| {$KUBE.API.HTTP.CLIENT.ERROR} | 2 |
| {$KUBE.API.HTTP.SERVER.ERROR} | 2 |
| {$KUBE.API.SERVER.URL} | http://localhost:8086/metrics |
| {$KUBE.API.TOKEN} | - |
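
The two connection macros can be tried outside Zabbix before the template is linked. The sketch below only builds the request and does not send it. It assumes the token is passed as an `Authorization: Bearer` header, which this page does not state; `build_metrics_request` and the token value are hypothetical names used for illustration.

```python
import urllib.request


def build_metrics_request(url, token):
    """Build a GET request for the metrics endpoint with a bearer token.

    Assumption: the API server accepts the token in an Authorization header.
    """
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + token}
    )


# URL is the default of {$KUBE.API.SERVER.URL}; the token is a placeholder.
request = build_metrics_request("http://localhost:8086/metrics", "example-token")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would perform the actual call against a reachable API server.
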
Items collected
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Kubernetes API: Request terminations, rate | Number of requests which apiserver terminated in self-defense per second. | DEPENDENT | - | kubernetes.api.apiserver_request_terminations |
| Kubernetes API: API server requests: 0 | Counter of apiserver requests broken out for each HTTP response code. | DEPENDENT | - | kubernetes.api.apiserver_request_total_0.rate |
| Kubernetes API: API server requests: 2xx, rate | Counter of apiserver requests broken out for each HTTP response code. | DEPENDENT | - | kubernetes.api.apiserver_request_total_200.rate |
| Kubernetes API: API server requests: 3xx, rate | Counter of apiserver requests broken out for each HTTP response code. | DEPENDENT | - | kubernetes.api.apiserver_request_total_300.rate |
| Kubernetes API: API server requests: 4xx, rate | Counter of apiserver requests broken out for each HTTP response code. | DEPENDENT | - | kubernetes.api.apiserver_request_total_400.rate |
| Kubernetes API: API server requests: 5xx, rate | Counter of apiserver requests broken out for each HTTP response code. | DEPENDENT | - | kubernetes.api.apiserver_request_total_500.rate |
| Kubernetes API: TLS handshake errors, rate | Number of requests dropped with 'TLS handshake error from' error per second. | DEPENDENT | - | kubernetes.api.apiserver_tls_handshake_errors_total.rate |
| Kubernetes API: Audit events, total | Accumulated number of audit events generated and sent to the audit backend. | DEPENDENT | - | kubernetes.api.audit_event_total |
| Kubernetes API: CPU | Total user and system CPU usage ratio. | DEPENDENT | - | kubernetes.api.cpu.util |
| Kubernetes API: Get API instance metrics | Get raw metrics from API instance /metrics endpoint. | HTTP_AGENT | - | kubernetes.api.get_metrics |
| Kubernetes API: Goroutines | Number of goroutines that currently exist. | DEPENDENT | - | kubernetes.api.go_goroutines |
| Kubernetes API: Go threads | Number of OS threads created. | DEPENDENT | - | kubernetes.api.go_threads |
| Kubernetes API: gRPCs messages received, rate | Total number of gRPC stream messages received per second. | DEPENDENT | - | kubernetes.api.grpc_client_msg_received.rate |
| Kubernetes API: gRPCs messages sent, rate | Total number of gRPC stream messages sent per second. | DEPENDENT | - | kubernetes.api.grpc_client_msg_sent.rate |
| Kubernetes API: gRPCs client started, rate | Total number of RPCs started per second. | DEPENDENT | - | kubernetes.api.grpc_client_started.rate |
| Kubernetes API: Fds max | Maximum allowed open file descriptors. | DEPENDENT | - | kubernetes.api.max_fds |
| Kubernetes API: Fds open | Number of open file descriptors. | DEPENDENT | - | kubernetes.api.open_fds |
| Kubernetes API: Resident memory, bytes | Resident memory size in bytes. | DEPENDENT | - | kubernetes.api.process_resident_memory_bytes |
| Kubernetes API: Virtual memory, bytes | Virtual memory size in bytes. | DEPENDENT | - | kubernetes.api.process_virtual_memory_bytes |
| Kubernetes API: HTTP requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. | DEPENDENT | - | kubernetes.api.rest_client_requests_total_200.rate |
| Kubernetes API: HTTP requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. | DEPENDENT | - | kubernetes.api.rest_client_requests_total_300.rate |
| Kubernetes API: HTTP requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. | DEPENDENT | - | kubernetes.api.rest_client_requests_total_400.rate |
| Kubernetes API: HTTP requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. | DEPENDENT | - | kubernetes.api.rest_client_requests_total_500.rate |
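
Most items above are dependent on the raw text returned by `kubernetes.api.get_metrics`. As a rough illustration of how a per-class rate such as `kubernetes.api.apiserver_request_total_500.rate` could be derived, the sketch below sums `apiserver_request_total` samples by status class and computes a per-second change between two scrapes. The sample data, the function names, and the simplified line parser are assumptions; the template's actual preprocessing steps are not shown on this page.

```python
import re
from collections import defaultdict

# Simplified parser: assumes label values contain no "}" character.
SAMPLE_RE = re.compile(r'^apiserver_request_total\{([^}]*)\}\s+(\S+)$')
CODE_RE = re.compile(r'code="(\d+)"')


def totals_by_class(metrics_text):
    """Sum apiserver_request_total samples per status class (2xx, 5xx, ...)."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        sample = SAMPLE_RE.match(line.strip())
        if not sample:
            continue
        code = CODE_RE.search(sample.group(1))
        if not code:
            continue
        digits = code.group(1)
        # Three-digit codes are grouped by class; anything else (e.g. "0") is kept as is.
        key = digits[0] + "xx" if len(digits) == 3 else digits
        totals[key] += float(sample.group(2))
    return dict(totals)


def per_second_rate(previous, current, seconds):
    """Per-second change of each counter between two scrapes."""
    return {k: (v - previous.get(k, 0.0)) / seconds for k, v in current.items()}


# Hypothetical scrapes taken 60 seconds apart.
BEFORE = '''
apiserver_request_total{code="200",verb="GET"} 100
apiserver_request_total{code="201",verb="POST"} 20
apiserver_request_total{code="500",verb="GET"} 3
'''
AFTER = '''
apiserver_request_total{code="200",verb="GET"} 160
apiserver_request_total{code="201",verb="POST"} 20
apiserver_request_total{code="500",verb="GET"} 9
'''

rates = per_second_rate(totals_by_class(BEFORE), totals_by_class(AFTER), 60)
print(rates)
```
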
Triggers
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| Kubernetes API: Too many server errors | Kubernetes API server is experiencing high error rate (with 5xx HTTP code). | min(/Kubernetes API server by HTTP/kubernetes.api.apiserver_request_total_500.rate,5m)>{$KUBE.API.HTTP.SERVER.ERROR} | WARNING 📢 | Kubernetes API: API server requests: 5xx, rate |
| Kubernetes API: Too many client errors | Kubernetes API client is experiencing high error rate (with 5xx HTTP code). | min(/Kubernetes API server by HTTP/kubernetes.api.rest_client_requests_total_500.rate,5m)>{$KUBE.API.HTTP.CLIENT.ERROR} | WARNING 📢 | Kubernetes API: HTTP requests: 5xx, rate |
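
Both triggers compare the minimum 5xx rate over the last 5 minutes with a macro threshold, so a single spike does not raise a problem; the rate has to stay above the threshold for the whole window. A minimal sketch of that logic follows. The function name and sample values are hypothetical, and Zabbix's exact handling of window boundaries is not reproduced.

```python
def min_over_window_exceeds(samples, now, window_seconds, threshold):
    """Return True when every sample in the window is above the threshold.

    samples: list of (unix_timestamp, value) pairs.
    """
    recent = [value for ts, value in samples if now - window_seconds < ts <= now]
    return bool(recent) and min(recent) > threshold


# Threshold 2 is the default of {$KUBE.API.HTTP.SERVER.ERROR}.
sustained = [(0, 3.0), (60, 2.5), (120, 4.0), (180, 3.1), (240, 2.2)]
spike = [(0, 0.1), (60, 0.2), (120, 9.0), (180, 0.1), (240, 0.3)]

print(min_over_window_exceeds(sustained, now=240, window_seconds=300, threshold=2))
print(min_over_window_exceeds(spike, now=240, window_seconds=300, threshold=2))
```
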
Discovery rule №1
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Watchers metrics discovery | Discovers watchers by kind. | DEPENDENT | 0 | kubernetes.api.apiserver_registered_watchers.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Kubernetes API: Watchers: {#KIND} | Number of currently registered watchers for a given resource. | DEPENDENT | - | kubernetes.api.apiserver_registered_watchers["{#KIND}"] |
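
Each discovery rule on this page turns label values found in the raw metrics into low-level discovery macros, and the item prototypes are then instantiated once per value. The sketch below shows the idea for `{#KIND}`. It assumes the watcher metric is named `apiserver_registered_watchers` and carries a `kind` label; the function name and sample text are hypothetical.

```python
import json
import re

KIND_RE = re.compile(r'kind="([^"]*)"')


def discover_kinds(metrics_text):
    """Build a discovery-style list with one {#KIND} entry per distinct kind."""
    kinds = set()
    for line in metrics_text.splitlines():
        if not line.startswith("apiserver_registered_watchers{"):
            continue
        match = KIND_RE.search(line)
        if match:
            kinds.add(match.group(1))
    return [{"{#KIND}": kind} for kind in sorted(kinds)]


# Hypothetical excerpt of the /metrics output.
SAMPLE = '''
apiserver_registered_watchers{group="",kind="Pod",version="v1"} 12
apiserver_registered_watchers{group="",kind="Node",version="v1"} 4
apiserver_registered_watchers{group="apps",kind="Pod",version="v1"} 1
go_goroutines 1500
'''

print(json.dumps(discover_kinds(SAMPLE)))
```
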
Discovery rule №2
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Authentication requests discovery | Discovers authenticated requests by username. | DEPENDENT | 0 | kubernetes.api.authenticated_user_requests.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Kubernetes API: Authenticated requests: {#NAME}, rate | Counter of authenticated requests broken out by username per second. | DEPENDENT | - | kubernetes.api.authenticated_user_requests.rate["{#NAME}"] |
Discovery rule №3
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Authentication attempts discovery | Discovers authentication attempts by result. | DEPENDENT | 0 | kubernetes.api.authentication_attempts.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Kubernetes API: Authentication attempts: {#RESULT}, rate | Authentication attempts by result per second. | DEPENDENT | - | kubernetes.api.authentication_attempts.rate["{#RESULT}"] |
Discovery rule №4
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Client certificate expiration histogram | Discovers raw data of client certificate expiration. | DEPENDENT | 0 | kubernetes.api.certificate_expiration.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Kubernetes API: Client certificate expiration, p1 | 1st percentile of the remaining lifetime on the certificate used to authenticate a request. | CALCULATED | - | kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}] |
| Kubernetes API: Certificate expiration seconds bucket, {#LE} | Distribution of the remaining lifetime on the certificate used to authenticate a request. | DEPENDENT | - | kubernetes.api.client_certificate_expiration_seconds_bucket[{#LE}] |
Trigger prototypes
| Name | Description | Expression | Priority | Dependencies |
|---|---|---|---|---|
| Kubernetes API: Kubernetes client certificate expires soon | A client certificate used to authenticate to the apiserver is expiring in less than 24 hours. | last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < 24*60*60 | WARNING 📢 | Kubernetes API: Client certificate expiration, p1 |
| Kubernetes API: Kubernetes client certificate is expiring | A client certificate used to authenticate to the apiserver is expiring in {$KUBE.API.CERT.EXPIRATION} days. | last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < {$KUBE.API.CERT.EXPIRATION}*24*60*60 | WARNING 📢 | Kubernetes API: Client certificate expiration, p1 |
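
The p1 item reports the remaining certificate lifetime in seconds, so the trigger thresholds are day counts converted to seconds: 24 hours is 24 × 60 × 60 = 86400 seconds, and the default `{$KUBE.API.CERT.EXPIRATION}` of 7 days is 604800 seconds. The sketch below restates both conditions; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def expires_within(p1_seconds, days):
    """Mirror the trigger condition: value > 0 and value < days in seconds."""
    return 0 < p1_seconds < days * SECONDS_PER_DAY


print(7 * SECONDS_PER_DAY)                       # threshold for the 7-day default
print(expires_within(3600, 1))                   # one hour left, 24-hour trigger
print(expires_within(30 * SECONDS_PER_DAY, 7))   # thirty days left, 7-day trigger
```

The `> 0` part of the expressions keeps both triggers quiet when the percentile item reports zero.
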
Discovery rule №5
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Etcd objects metrics discovery | Discovers etcd objects by resource. | DEPENDENT | 0 | kubernetes.api.etcd_object_counts.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Kubernetes API: etcd objects: {#RESOURCE} | Number of stored objects at the time of last check split by kind. | DEPENDENT | - | kubernetes.api.etcd_object_counts["{#RESOURCE}"] |
Discovery rule №6
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| gRPC completed requests discovery | Discovers gRPC completed requests by gRPC code. | DEPENDENT | 0 | kubernetes.api.grpc_client_handled.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Kubernetes API: gRPCs completed: {#GRPC_CODE}, rate | Total number of RPCs completed by the client regardless of success or failure per second. | DEPENDENT | - | kubernetes.api.grpc_client_handled_total.rate["{#GRPC_CODE}"] |
Discovery rule №7
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Requests inflight discovery | Discovers inflight requests by kind. | DEPENDENT | 0 | kubernetes.api.inflight_requests.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Kubernetes API: Requests current: {#KIND} | Maximum number of inflight requests currently counted against this apiserver's limit, per request kind, in the last second. | DEPENDENT | - | kubernetes.api.current_inflight_requests["{#KIND}"] |
Discovery rule №8
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Long-running requests | Discovery of long-running requests by verb, resource and scope. | DEPENDENT | 0 | kubernetes.api.longrunning_gauge.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Kubernetes API: Long-running ["{#VERB}"] requests ["{#RESOURCE}"]: {#SCOPE} | Gauge of all active long-running apiserver requests broken out by verb, resource and scope. Not all requests are tracked this way. | DEPENDENT | - | kubernetes.api.longrunning_gauge["{#RESOURCE}","{#SCOPE}","{#VERB}"] |
Discovery rule №9
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Request duration histogram | Discovers raw data and percentile items of request duration. | DEPENDENT | 0 | kubernetes.api.requests_bucket.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Kubernetes API: ["{#VERB}"] Requests bucket: {#LE} | Response latency distribution in seconds for each verb. | DEPENDENT | - | kubernetes.api.request_duration_seconds_bucket[{#LE},"{#VERB}"] |
| Kubernetes API: ["{#VERB}"] Requests, p50 | 50th percentile of response latency distribution in seconds for each verb. | CALCULATED | - | kubernetes.api.request_duration_seconds_p50["{#VERB}"] |
| Kubernetes API: ["{#VERB}"] Requests, p90 | 90th percentile of response latency distribution in seconds for each verb. | CALCULATED | - | kubernetes.api.request_duration_seconds_p90["{#VERB}"] |
| Kubernetes API: ["{#VERB}"] Requests, p95 | 95th percentile of response latency distribution in seconds for each verb. | CALCULATED | - | kubernetes.api.request_duration_seconds_p95["{#VERB}"] |
| Kubernetes API: ["{#VERB}"] Requests, p99 | 99th percentile of response latency distribution in seconds for each verb. | CALCULATED | - | kubernetes.api.request_duration_seconds_p99["{#VERB}"] |
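
The percentile items are calculated from the cumulative bucket items of the same verb. The formula used by the template is not shown on this page, so the sketch below illustrates one common approach instead: linear interpolation inside the bucket that contains the requested rank, in the style of Prometheus `histogram_quantile`. The function name and bucket values are hypothetical.

```python
import math


def bucket_percentile(buckets, percentile):
    """Estimate a percentile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count), including +Inf.
    Assumption: observations are spread evenly inside each bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = percentile * total / 100.0
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the open-ended bucket: report the last finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical buckets for one verb: le=0.1, 0.5, 1.0 and +Inf.
GET_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]

for p in (50, 90, 95, 99):
    print(p, bucket_percentile(GET_BUCKETS, p))
```

With these buckets the estimate can never be finer than the bucket boundaries allow, which is why latency percentiles derived from histograms should be read as approximations.
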
Discovery rule №10
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Workqueue metrics discovery | Discovers workqueue metrics by name. | DEPENDENT | 0 | kubernetes.api.workqueue.discovery |
Item prototypes
| Name | Description | Type | Interval | Key and additional info |
|---|---|---|---|---|
| Kubernetes API: ["{#NAME}"] Workqueue adds total, rate | Total number of adds handled by workqueue per second. | DEPENDENT | - | kubernetes.api.workqueue_adds_total.rate["{#NAME}"] |
| Kubernetes API: ["{#NAME}"] Workqueue depth | Current depth of workqueue. | DEPENDENT | - | kubernetes.api.workqueue_depth["{#NAME}"] |