-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat(istio-alerts): Add service selector validity alert (#287)
> [!WARNING] > Reminder for @LaikaN57 - Update other charts in this repo after this PR is merged. Split from #285 We ran into an issue where unbeknownst to us, our pod selector was set incorrectly and therefore all of our rules deployed via prometheus-alerts did not have any series to evaluate. This lack of series data is silently ignored. Here we are resolving this by implementing a rule which will use the various selectors to see if a basic metric exists or not. If it does not exist then we want to warn the user that non of the other rules will be evaluated. Snippet from the other PR about using `kube_service_info` as our test metric. > I think I see what you're doing here ... but I see a risk. If there is no traffic to the service for example, when its being launched), there will be no metrics that match this query. I propose that instead we do this: > > ``` > count(kube_service_info{namespace={{ $.Release.Namespace }}, name={{ $.Release.Namespace $.Values.serviceRules }}) > ``` > > Sadly - I know that this means we aren't using the exact selectors (like I propose above) ... but this allows us to verify that the Service actually exists by that name. Ya know? Lastly, I set the timing of this metric up for 1 hour by default. This felt okay to me as we want to be a little bit resilient to prometheus metric collection outages and not accidentally page every service owner if we have a centralized outage of the platform itself. Normally, a team might only be paged once for this when they first setup their application if they had not setup their selectors correctly. This alert is not expected to go into an alert state without some heavy handed naming changes (mostly done during things like migrations).
- Loading branch information
Showing
6 changed files
with
109 additions
and
7 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,8 +2,10 @@ apiVersion: v2 | |
name: istio-alerts | ||
description: A Helm chart that provisions a series of alerts for istio VirtualServices | ||
type: application | ||
version: 0.2.0 | ||
version: 0.3.0 | ||
appVersion: 0.0.1 | ||
maintainers: | ||
- name: diranged | ||
email: [email protected] | ||
- name: laikan57 | ||
email: [email protected] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
# istio-alerts | ||
|
||
![Version: 0.2.0](https://img.shields.io/badge/Version-0.2.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 0.0.1](https://img.shields.io/badge/AppVersion-0.0.1-informational?style=flat-square) | ||
![Version: 0.3.0](https://img.shields.io/badge/Version-0.3.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 0.0.1](https://img.shields.io/badge/AppVersion-0.0.1-informational?style=flat-square) | ||
|
||
A Helm chart that provisions a series of alerts for istio VirtualServices | ||
|
||
|
@@ -9,6 +9,18 @@ A Helm chart that provisions a series of alerts for istio VirtualServices | |
| Name | Email | Url | | ||
| ---- | ------ | --- | | ||
| diranged | <[email protected]> | | | ||
| laikan57 | <[email protected]> | | | ||
|
||
## Upgrade Notes | ||
|
||
### 0.2.x -> 0.3.x | ||
|
||
**BREAKING: The DestinationServiceSelectorValidity alert rule requires kube-state-metrics.** | ||
|
||
An alert was introduced in 0.3.x that requires kube-state-metrics to be installed in the cluster. If | ||
you do not have kube-state-metrics installed, you will need to disable the alert by setting | ||
`serviceRules.destinationServiceSelectorValidity.enabled` to `false`. This alert is used to detect | ||
if the destinationServiceSelector is actually selecting series for a service that exists. | ||
|
||
## Values | ||
|
||
|
@@ -25,7 +37,12 @@ A Helm chart that provisions a series of alerts for istio VirtualServices | |
| defaults.additionalRuleLabels | object | `{}` | Additional custom labels attached to every PrometheusRule | | ||
| defaults.runbookUrl | string | `"https://github.com/Nextdoor/k8s-charts/blob/main/charts/istio-alerts/runbook.md"` | The prefix URL to the runbook_urls that will be applied to each PrometheusRule | | ||
| serviceRules.destinationServiceName | string | `".*"` | Narrow down the alerts to a particular Destination Service if there are multiple services that require different thresholds within the same namespace. | | ||
| serviceRules.destinationServiceSelectorValidity | object | `{"enabled":true,"for":"1h","severity":"warning"}` | Does a basic lookup using the defined selectors to see if we can see any info for a given selector. This is the "watcher for the watcher". If we get alerted by this, we likely have a bad selector and our alerts are not going to ever fire. | | ||
| serviceRules.destinationServiceSelectorValidity.enabled | bool | `true` | Whether to enable the monitor on the selector for the VirtualService. | | ||
| serviceRules.destinationServiceSelectorValidity.for | string | `"1h"` | How long to evaluate. | | ||
| serviceRules.destinationServiceSelectorValidity.severity | string | `"warning"` | Severity of the monitor | | ||
| serviceRules.enabled | bool | `true` | Whether to enable the service rules template | | ||
| serviceRules.highRequestLatency | object | `{"enabled":true,"for":"15m","percentile":0.95,"severity":"warning","threshold":0.5}` | Configuration related to the latency monitor for the VirtualService. | | ||
| serviceRules.highRequestLatency.enabled | bool | `true` | Whether to enable the monitor on latency returned by the VirtualService. | | ||
| serviceRules.highRequestLatency.for | string | `"15m"` | How long to evaluate the latency of services. | | ||
| serviceRules.highRequestLatency.percentile | float | `0.95` | Which percentile to monitor - should be between 0 and 1. Default is 95th percentile. | | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
{{ template "chart.header" . }} | ||
|
||
{{ template "chart.versionBadge" . }}{{ template "chart.typeBadge" . }}{{ template "chart.appVersionBadge" . }} | ||
|
||
{{ template "chart.description" . }} | ||
|
||
{{ template "chart.maintainersSection" . }} | ||
|
||
## Upgrade Notes | ||
|
||
### 0.2.x -> 0.3.x | ||
|
||
**BREAKING: The DestinationServiceSelectorValidity alert rule requires kube-state-metrics.** | ||
|
||
An alert was introduced in 0.3.x that requires kube-state-metrics to be installed in the cluster. If | ||
you do not have kube-state-metrics installed, you will need to disable the alert by setting | ||
`serviceRules.destinationServiceSelectorValidity.enabled` to `false`. This alert is used to detect | ||
if the destinationServiceSelector is actually selecting series for a service that exists. | ||
|
||
{{ template "chart.requirementsSection" . }} | ||
|
||
{{ template "chart.valuesSection" . }} | ||
|
||
{{ template "helm-docs.versionFooter" . }} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters