fix(prometheus-alerts): the hpa alerts were referencing invalid labels (#313)

1. There is no `hpa` label. I think kube-state-metrics (KSM) renamed it to
`horizontalpodautoscaler` a while back and we never updated the alerts.
2. The alerts included all of the labels from the KSM pods that the metric
comes from, which was wildly confusing and made the alerts hard to
understand.
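
As an illustration of both points, here is a minimal sketch of a `promtool test rules` file. It is not part of this commit. The file name, series values, and the `job`/`pod` label values are invented; the metric names and the `sum by` clause follow the diff in this commit.

```yaml
# Hypothetical unit test; run with: promtool test rules hpa_labels_test.yaml
# All series and label values below are invented for illustration.
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # Both series carry scrape labels (job, pod) that describe the KSM pod,
      # not the HPA being monitored.
      - series: 'kube_horizontalpodautoscaler_status_current_replicas{namespace="demo", horizontalpodautoscaler="web", job="kube-state-metrics", pod="kube-state-metrics-0"}'
        values: '10+0x10'
      - series: 'kube_horizontalpodautoscaler_spec_max_replicas{namespace="demo", horizontalpodautoscaler="web", job="kube-state-metrics", pod="kube-state-metrics-0"}'
        values: '10+0x10'
    promql_expr_test:
      - expr: |
          sum by (horizontalpodautoscaler, namespace) (
            kube_horizontalpodautoscaler_status_current_replicas
            ==
            kube_horizontalpodautoscaler_spec_max_replicas
          )
        eval_time: 5m
        exp_samples:
          # Only the two grouping labels survive the aggregation.
          - labels: '{horizontalpodautoscaler="web", namespace="demo"}'
            value: 10
```

Without the `sum by (...)` wrapper, the resulting sample would keep the `job` and `pod` labels as well, which is the confusion described in point 2.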

Co-authored-by: Matt Wise <[email protected]>
diranged authored Jun 9, 2024
1 parent cb0fa5d commit 919760f
Showing 3 changed files with 27 additions and 23 deletions.
2 changes: 1 addition & 1 deletion charts/prometheus-alerts/Chart.yaml
@@ -2,7 +2,7 @@ apiVersion: v2
 name: prometheus-alerts
 description: Helm Chart that provisions a series of common Prometheus Alerts
 type: application
-version: 1.8.0
+version: 1.8.1
 appVersion: 0.0.1
 maintainers:
 - name: diranged
2 changes: 1 addition & 1 deletion charts/prometheus-alerts/README.md
@@ -2,7 +2,7 @@

 Helm Chart that provisions a series of common Prometheus Alerts

-![Version: 1.8.0](https://img.shields.io/badge/Version-1.8.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 0.0.1](https://img.shields.io/badge/AppVersion-0.0.1-informational?style=flat-square)
+![Version: 1.8.1](https://img.shields.io/badge/Version-1.8.1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 0.0.1](https://img.shields.io/badge/AppVersion-0.0.1-informational?style=flat-square)

 [deployments]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
 [hpa]: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
46 changes: 25 additions & 21 deletions charts/prometheus-alerts/templates/containers-prometheusrule.yaml
@@ -671,26 +671,28 @@ spec:
 summary: HPA has not matched descired number of replicas.
 runbook_url: {{ $.Values.defaults.runbookUrl }}#alert-name-kubehpareplicasmismatch
 description: >-
-HPA {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.hpa {{`}}`}}
+HPA {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.horizontalpodautoscaler {{`}}`}}
 has not matched the desired number of replicas for longer than 15
 minutes.
 expr: |-
-(
-kube_horizontalpodautoscaler_status_desired_replicas{ {{- $hpaSelector -}} }
-!=
-kube_horizontalpodautoscaler_status_current_replicas{ {{- $hpaSelector -}} }
-) and (
-kube_horizontalpodautoscaler_status_current_replicas{ {{- $hpaSelector -}} }
->
-kube_horizontalpodautoscaler_spec_min_replicas{ {{- $hpaSelector -}} }
-) and (
-kube_horizontalpodautoscaler_status_current_replicas{ {{- $hpaSelector -}} }
-<
-kube_horizontalpodautoscaler_spec_max_replicas{ {{- $hpaSelector -}} }
-) and (
-changes(kube_horizontalpodautoscaler_status_current_replicas[15m])
-==
-0
+sum by (horizontalpodautoscaler, namespace) (
+(
+kube_horizontalpodautoscaler_status_desired_replicas{ {{- $hpaSelector -}} }
+!=
+kube_horizontalpodautoscaler_status_current_replicas{ {{- $hpaSelector -}} }
+) and (
+kube_horizontalpodautoscaler_status_current_replicas{ {{- $hpaSelector -}} }
+>
+kube_horizontalpodautoscaler_spec_min_replicas{ {{- $hpaSelector -}} }
+) and (
+kube_horizontalpodautoscaler_status_current_replicas{ {{- $hpaSelector -}} }
+<
+kube_horizontalpodautoscaler_spec_max_replicas{ {{- $hpaSelector -}} }
+) and (
+changes(kube_horizontalpodautoscaler_status_current_replicas[15m])
+==
+0
+)
 )
 for: {{ .for }}
 labels:
@@ -709,12 +711,14 @@ spec:
 summary: HPA is running at max replicas
 runbook_url: {{ $.Values.defaults.runbookUrl }}#alert-name-kubehpamaxedout
 description: >-
-HPA {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.hpa {{`}}`}}
+HPA {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.horizontalpodautoscaler {{`}}`}}
 has been running at max replicas for longer than 15 minutes.
 expr: |-
-kube_horizontalpodautoscaler_status_current_replicas{ {{- $hpaSelector -}} }
-==
-kube_horizontalpodautoscaler_spec_max_replicas{ {{- $hpaSelector -}} }
+sum by (horizontalpodautoscaler, namespace) (
+kube_horizontalpodautoscaler_status_current_replicas{ {{- $hpaSelector -}} }
+==
+kube_horizontalpodautoscaler_spec_max_replicas{ {{- $hpaSelector -}} }
+)
 for: {{ .for }}
 labels:
 severity: {{ .severity }}
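
For readers less familiar with the Helm escaping in this template: the backtick-quoted braces make Helm emit literal `{{` and `}}`, so that Prometheus, not Helm, expands `$labels` at alert time. The following is a rough sketch of how the max-replicas rule might look after rendering. The alert name, selector, duration, severity, and runbook URL are assumptions chosen for illustration; only the annotation text and the shape of the expression come from the diff above.

```yaml
# Hypothetical rendered output; the values marked "assumed" are not from this commit.
- alert: KubeHpaMaxedOut            # assumed from the runbook anchor; exact casing unknown
  annotations:
    summary: HPA is running at max replicas
    runbook_url: https://example.com/runbook.md#alert-name-kubehpamaxedout  # assumed URL
    description: >-
      HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }}
      has been running at max replicas for longer than 15 minutes.
  expr: |-
    sum by (horizontalpodautoscaler, namespace) (
      kube_horizontalpodautoscaler_status_current_replicas{namespace="demo"}
      ==
      kube_horizontalpodautoscaler_spec_max_replicas{namespace="demo"}
    )
  for: 15m                          # assumed value of .for
  labels:
    severity: warning               # assumed value of .severity
```

Because the aggregation keeps only `horizontalpodautoscaler` and `namespace`, those are the only series labels left for the description template to reference.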
