This repository has been archived by the owner on Jun 11, 2021. It is now read-only.
Alerting on burn rate thresholds would be easier if the operator exposed metrics derived from the thresholds defined in the CRD.
My current idea is to have something like this in the CRD:
apiVersion: measure.slok.xyz/v1alpha1
kind: ServiceLevel
metadata:
  name: awesome-service
spec:
  serviceLevelObjectives:
    # A typical 5xx request SLO.
    - name: "9999_http_request_lt_500"
      description: 99.99% of requests must be served with <500 status code.
      disable: false
      availabilityObjectivePercent: 99.99
      burnRates:
        - errorBudgetDays: 30
          thresholds:
            - timeRangeHours: 1
              errorBudgetPercent: 2
            - timeRangeHours: 6
              errorBudgetPercent: 5
            - timeRangeHours: 72
              errorBudgetPercent: 10
      serviceLevelIndicator:
        prometheus:
          address: http://127.0.0.1:9091
          totalQuery: |
            sum(
              increase(skipper_serve_host_duration_seconds_count{host="www_spotahome_com"}[2m]))
          errorQuery: |
            sum(
              increase(skipper_serve_host_duration_seconds_count{host="www_spotahome_com", code=~"5.."}[2m]))
      output:
        prometheus: {}
We could have multiple burnRates, and each burn rate could have multiple thresholds.
I have a branch that creates the threshold metrics and sets the threshold information on labels:
# HELP service_level_slo_burn_rate_threshold Is the threshold for a burn rate period.
# TYPE service_level_slo_burn_rate_threshold gauge
service_level_slo_burn_rate_threshold{burn_rate_range="168h",error_budget_spent="10%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo0",team="fake-team0",total_error_budget_range="70d"} 1
service_level_slo_burn_rate_threshold{burn_rate_range="1h",error_budget_spent="2%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo0",team="fake-team0",total_error_budget_range="30d"} 14.4
service_level_slo_burn_rate_threshold{burn_rate_range="1h",error_budget_spent="2%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo1",team="fake-team1",total_error_budget_range="30d"} 14.4
service_level_slo_burn_rate_threshold{burn_rate_range="1h",error_budget_spent="2%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo2",team="fake-team2",total_error_budget_range="30d"} 14.4
service_level_slo_burn_rate_threshold{burn_rate_range="24h",error_budget_spent="7%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo0",team="fake-team0",total_error_budget_range="70d"} 4.9
service_level_slo_burn_rate_threshold{burn_rate_range="6h",error_budget_spent="3%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo0",team="fake-team0",total_error_budget_range="70d"} 8.4
service_level_slo_burn_rate_threshold{burn_rate_range="6h",error_budget_spent="5%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo0",team="fake-team0",total_error_budget_range="30d"} 6
service_level_slo_burn_rate_threshold{burn_rate_range="6h",error_budget_spent="5%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo1",team="fake-team1",total_error_budget_range="30d"} 6
service_level_slo_burn_rate_threshold{burn_rate_range="6h",error_budget_spent="5%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo2",team="fake-team2",total_error_budget_range="30d"} 6
service_level_slo_burn_rate_threshold{burn_rate_range="72h",error_budget_spent="10%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo0",team="fake-team0",total_error_budget_range="30d"} 1
service_level_slo_burn_rate_threshold{burn_rate_range="72h",error_budget_spent="10%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo1",team="fake-team1",total_error_budget_range="30d"} 1
service_level_slo_burn_rate_threshold{burn_rate_range="72h",error_budget_spent="10%",fake="true",namespace="ns0",service_level="fake-service0",slo="fake_slo2",team="fake-team2",total_error_budget_range="30d"} 1
# HELP service_level_slo_objective_ratio Is the objective of the SLO in ratio unit.
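As a rough sketch of how those threshold values could be derived from the CRD fields: the sample values above are consistent with the burn rate being the factor that spends `errorBudgetPercent` of the budget within `timeRangeHours`, given a budget period of `errorBudgetDays`. This formula is an assumption inferred from the sample output, and the function names `burn_rate_threshold` and `threshold_samples` are hypothetical; the branch may compute it differently.

```python
from typing import Dict, Iterator, List, Tuple


def burn_rate_threshold(error_budget_days: int, time_range_hours: int,
                        error_budget_percent: float) -> float:
    """Burn rate that spends `error_budget_percent` of the whole error
    budget within `time_range_hours` (assumed formula, see lead-in)."""
    budget_hours = error_budget_days * 24
    return error_budget_percent * budget_hours / (100 * time_range_hours)


def threshold_samples(burn_rates: List[dict]) -> Iterator[Tuple[Dict[str, str], float]]:
    """Flatten every (burn rate, threshold) pair into a label set and a value,
    mirroring the threshold labels shown in the sample metrics."""
    for burn_rate in burn_rates:
        days = burn_rate["errorBudgetDays"]
        for threshold in burn_rate["thresholds"]:
            hours = threshold["timeRangeHours"]
            percent = threshold["errorBudgetPercent"]
            labels = {
                "burn_rate_range": f"{hours}h",
                "error_budget_spent": f"{percent}%",
                "total_error_budget_range": f"{days}d",
            }
            yield labels, burn_rate_threshold(days, hours, percent)


# The burnRates section of the CRD example, as plain Python data.
burn_rates = [
    {
        "errorBudgetDays": 30,
        "thresholds": [
            {"timeRangeHours": 1, "errorBudgetPercent": 2},
            {"timeRangeHours": 6, "errorBudgetPercent": 5},
            {"timeRangeHours": 72, "errorBudgetPercent": 10},
        ],
    },
]

for labels, value in threshold_samples(burn_rates):
    print(labels, value)
```

With the 30 day budget from the CRD example, this gives the same three thresholds as the `total_error_budget_range="30d"` series in the sample metrics (14.4, 6 and 1), and the same formula also matches the `70d` series.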
Any thoughts? @ese