# Pod Metrics

| Metric name | Metric type | Description | Unit (where applicable) | Labels/tags | Status | Opt-in |
| ----------- | ----------- | ----------- | ----------------------- | ----------- | ------ | ------ |
| `kube_pod_annotations` | Gauge | Kubernetes annotations converted to Prometheus labels, controlled via `--metric-annotations-allowlist` | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `annotation_POD_ANNOTATION=<POD_ANNOTATION>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_info` | Gauge | Information about the pod | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `host_ip=<host-ip>` <br> `pod_ip=<pod-ip>` <br> `node=<node-name>` <br> `created_by_kind=<created_by_kind>` <br> `created_by_name=<created_by_name>` <br> `uid=<pod-uid>` <br> `priority_class=<priority_class>` <br> `host_network=<host_network>` | STABLE | - |
| `kube_pod_ips` | Gauge | Pod IP addresses | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `ip=<pod-ip-address>` <br> `ip_family=<4 OR 6>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_start_time` | Gauge | Start time of the pod as a Unix timestamp | seconds | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_completion_time` | Gauge | Completion time of the pod as a Unix timestamp | seconds | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_owner` | Gauge | Information about the Pod's owner | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `owner_kind=<owner kind>` <br> `owner_name=<owner name>` <br> `owner_is_controller=<whether owner is controller>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_labels` | Gauge | Kubernetes labels converted to Prometheus labels, controlled via `--metric-labels-allowlist` | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `label_POD_LABEL=<POD_LABEL>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_nodeselectors` | Gauge | Describes the Pod's nodeSelectors | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `nodeselector_NODE_SELECTOR=<NODE_SELECTOR>` <br> `uid=<pod-uid>` | EXPERIMENTAL | Opt-in |
| `kube_pod_status_phase` | Gauge | The pod's current phase | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `phase=<Pending\|Running\|Succeeded\|Failed\|Unknown>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_status_qos_class` | Gauge | The pod's current QoS class | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `qos_class=<BestEffort\|Burstable\|Guaranteed>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_status_ready` | Gauge | Describes whether the pod is ready to serve requests | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `condition=<true\|false\|unknown>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_status_scheduled` | Gauge | Describes the status of the scheduling process for the pod | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `condition=<true\|false\|unknown>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_container_info` | Gauge | Information about a container in a pod | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `image=<image-name>` <br> `image_id=<image-id>` <br> `image_spec=<image-spec>` <br> `container_id=<containerid>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_container_status_waiting` | Gauge | Describes whether the container is currently in the waiting state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_container_status_waiting_reason` | Gauge | Describes the reason the container is currently in the waiting state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<container-waiting-reason>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_container_status_running` | Gauge | Describes whether the container is currently in the running state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_container_state_started` | Gauge | Start time of the pod container as a Unix timestamp | seconds | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_container_status_terminated` | Gauge | Describes whether the container is currently in the terminated state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_container_status_terminated_reason` | Gauge | Describes the reason the container is currently in the terminated state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<container-terminated-reason>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_container_status_last_terminated_reason` | Gauge | Describes the reason the container was last in the terminated state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<last-terminated-reason>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_container_status_last_terminated_exitcode` | Gauge | Describes the exit code of the container's last termination | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_container_status_last_terminated_timestamp` | Gauge | Last termination time of the pod container as a Unix timestamp | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_container_status_ready` | Gauge | Describes whether the container's readiness check succeeded | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_status_initialized_time` | Gauge | Time when the pod was initialized | seconds | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_status_ready_time` | Gauge | Time when the pod passed its readiness probes | seconds | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_status_container_ready_time` | Gauge | Time when the pod's containers entered the Ready state | seconds | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_container_status_restarts_total` | Counter | The number of container restarts per container | | `container=<container-name>` <br> `namespace=<pod-namespace>` <br> `pod=<pod-name>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_container_resource_requests` | Gauge | The amount of each resource requested by a container. It is recommended to use the `kube_pod_resource_requests` metric exposed by kube-scheduler instead, as it is more precise. | `cpu=<core>` <br> `memory=<bytes>` | `resource=<resource-name>` <br> `unit=<resource-unit>` <br> `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `node=<node-name>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_container_resource_limits` | Gauge | The limit set for each resource of a container. It is recommended to use the `kube_pod_resource_limits` metric exposed by kube-scheduler instead, as it is more precise. | `cpu=<core>` <br> `memory=<bytes>` | `resource=<resource-name>` <br> `unit=<resource-unit>` <br> `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `node=<node-name>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_overhead_cpu_cores` | Gauge | The pod overhead in CPU cores associated with running a pod | core | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_overhead_memory_bytes` | Gauge | The pod overhead in memory associated with running a pod | bytes | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_runtimeclass_name_info` | Gauge | The RuntimeClass associated with the pod | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_created` | Gauge | Unix creation timestamp | seconds | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_deletion_timestamp` | Gauge | Unix deletion timestamp | seconds | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_restart_policy` | Gauge | Describes the restart policy in use by this pod | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `type=<Always\|Never\|OnFailure>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_init_container_info` | Gauge | Information about an init container in a pod | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `image=<image-name>` <br> `image_id=<image-id>` <br> `image_spec=<image-spec>` <br> `container_id=<containerid>` <br> `uid=<pod-uid>` <br> `restart_policy=<restart-policy>` | STABLE | - |
| `kube_pod_init_container_status_waiting` | Gauge | Describes whether the init container is currently in the waiting state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_init_container_status_waiting_reason` | Gauge | Describes the reason the init container is currently in the waiting state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<container-waiting-reason>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_init_container_status_running` | Gauge | Describes whether the init container is currently in the running state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_init_container_status_terminated` | Gauge | Describes whether the init container is currently in the terminated state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_init_container_status_terminated_reason` | Gauge | Describes the reason the init container is currently in the terminated state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<container-terminated-reason>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_init_container_status_last_terminated_reason` | Gauge | Describes the reason the init container was last in the terminated state | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<last-terminated-reason>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_init_container_status_ready` | Gauge | Describes whether the init container's readiness check succeeded | | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_init_container_status_restarts_total` | Counter | The number of restarts for the init container | integer | `container=<container-name>` <br> `namespace=<pod-namespace>` <br> `pod=<pod-name>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_init_container_resource_limits` | Gauge | The limit set for each resource of an init container | `cpu=<core>` <br> `memory=<bytes>` | `resource=<resource-name>` <br> `unit=<resource-unit>` <br> `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `node=<node-name>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_init_container_resource_requests` | Gauge | The amount of each resource requested by an init container | `cpu=<core>` <br> `memory=<bytes>` | `resource=<resource-name>` <br> `unit=<resource-unit>` <br> `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `node=<node-name>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_spec_volumes_persistentvolumeclaims_info` | Gauge | Information about PersistentVolumeClaim volumes in a pod | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `volume=<volume-name>` <br> `persistentvolumeclaim=<persistentvolumeclaim-claimname>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_spec_volumes_persistentvolumeclaims_readonly` | Gauge | Describes whether a PersistentVolumeClaim is mounted read-only | bool | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `volume=<volume-name>` <br> `persistentvolumeclaim=<persistentvolumeclaim-claimname>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_status_reason` | Gauge | The pod status reasons | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<Evicted\|NodeAffinity\|NodeLost\|Shutdown\|UnexpectedAdmissionError>` <br> `uid=<pod-uid>` | EXPERIMENTAL | - |
| `kube_pod_status_scheduled_time` | Gauge | Unix timestamp when the pod moved into the scheduled status | seconds | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_status_unschedulable` | Gauge | Describes the unschedulable status for the pod | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` | STABLE | - |
| `kube_pod_tolerations` | Gauge | Information about the pod tolerations | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` <br> `key=<toleration-key>` <br> `operator=<toleration-operator>` <br> `value=<toleration-value>` <br> `effect=<toleration-effect>` <br> `toleration_seconds=<toleration-seconds>` | EXPERIMENTAL | - |
| `kube_pod_service_account` | Gauge | The service account for a pod | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` <br> `service_account=<service_account>` | EXPERIMENTAL | - |
| `kube_pod_scheduler` | Gauge | The scheduler for a pod | | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `uid=<pod-uid>` <br> `name=<scheduler-name>` | EXPERIMENTAL | - |
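The allowlist flags and the opt-in status mentioned in the table are configured on the kube-state-metrics command line. The sketch below shows what the container `args` of a kube-state-metrics Deployment might look like. The label key `app.kubernetes.io/name` and the annotation key `example.com/owner` are illustrative assumptions, and `--metric-opt-in-list` is assumed to be the flag your kube-state-metrics release uses to enable opt-in metrics such as `kube_pod_nodeselectors`; check `kube-state-metrics --help` for your version.

```yaml
# Hypothetical excerpt of a kube-state-metrics container spec; only the args matter here.
containers:
- name: kube-state-metrics
  args:
  # Copy the Pod label app.kubernetes.io/name (illustrative key) onto kube_pod_labels;
  # characters that are invalid in Prometheus label names are replaced with "_".
  - --metric-labels-allowlist=pods=[app.kubernetes.io/name]
  # Copy the Pod annotation example.com/owner (illustrative key) onto kube_pod_annotations.
  - --metric-annotations-allowlist=pods=[example.com/owner]
  # Enable the opt-in kube_pod_nodeselectors metric.
  - --metric-opt-in-list=kube_pod_nodeselectors
```

With these flags, a Pod labelled `app.kubernetes.io/name: web` would be expected to appear as `kube_pod_labels{..., label_app_kubernetes_io_name="web"}`.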

## Useful metrics queries

### How to retrieve non-standard Pod state

It is not straightforward to get the Pod state for cases like "Terminating" and "Unknown", because these states are not stored in any field of `Pod.Status`.

To mimic the logic used by the `kubectl` command line, you need to combine multiple metrics.

For example:

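- To list the pods that are in the Unknown state, you can run the following PromQL query: `(sum(kube_pod_status_phase{phase="Unknown"}) by (namespace, pod) > 0) or (count(kube_pod_deletion_timestamp) by (namespace, pod) * sum(kube_pod_status_reason{reason="NodeLost"}) by (namespace, pod) > 0)`. The `> 0` filters are needed because `kube_pod_status_phase` and `kube_pod_status_reason` also emit 0-valued series for phases and reasons that do not apply; without them, every pod matches the left-hand side and `or` never adds the NodeLost case.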

- For Pods in the Terminating state: `count(kube_pod_deletion_timestamp) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod)`
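If these queries are reused across dashboards and alerts, they could be precomputed with Prometheus recording rules. The following is a sketch: the group name and both record names are hypothetical and merely follow the common `level:metric:operations` naming convention. The Unknown expression keeps only pods with a non-zero result on either branch, because the phase and reason metrics also emit 0-valued series.

```yaml
groups:
- name: pod-state-recording            # hypothetical group name
  rules:
  # 1 for every pod that has a deletion timestamp and was not lost with its node
  # (what kubectl shows as "Terminating"); other pods produce no series.
  - record: namespace_pod:kube_pod_terminating:count   # hypothetical name
    expr: count(kube_pod_deletion_timestamp) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod)
  # 1 for every pod whose phase is Unknown, or that was deleted after its node was lost.
  - record: namespace_pod:kube_pod_unknown:sum          # hypothetical name
    expr: (sum(kube_pod_status_phase{phase="Unknown"}) by (namespace, pod) > 0) or (count(kube_pod_deletion_timestamp) by (namespace, pod) * sum(kube_pod_status_reason{reason="NodeLost"}) by (namespace, pod) > 0)
```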

Here is an example of a Prometheus alerting rule that fires when a Pod has been in the Terminating state for more than 5 minutes:

```yaml
groups:
- name: Pod state
  rules:
  - alert: PodsBlockedInTerminatingState
    # Matches pods that have a deletion timestamp but were not lost with their node.
    expr: count(kube_pod_deletion_timestamp) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod) > 0
    # The expression must stay true for 5 minutes before the alert fires.
    for: 5m
    labels:
      severity: page
    annotations:
      summary: Pod {{$labels.namespace}}/{{$labels.pod}} blocked in Terminating state.
```
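A similar rule could page on pods stuck in the Unknown state. This is a sketch under assumptions: the group name, the alert name, and the 5-minute duration are illustrative, and the group is given its own name because Prometheus requires group names to be unique within a rule file. Each branch of the expression is filtered with `> 0` so that only pods actually in the Unknown state match.

```yaml
groups:
- name: Pod unknown state              # hypothetical group name
  rules:
  - alert: PodsInUnknownState          # hypothetical alert name
    # Phase reported as Unknown, or deleted after the node was lost (NodeLost).
    expr: (sum(kube_pod_status_phase{phase="Unknown"}) by (namespace, pod) > 0) or (count(kube_pod_deletion_timestamp) by (namespace, pod) * sum(kube_pod_status_reason{reason="NodeLost"}) by (namespace, pod) > 0)
    for: 5m
    labels:
      severity: page
    annotations:
      summary: Pod {{$labels.namespace}}/{{$labels.pod}} in Unknown state.
```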