Commit

Merge pull request #233 from navikt/alarmer
Add alerts (PrometheusRule) for veilarbfilter
slovrid authored Aug 11, 2023
2 parents 8aca941 + 2b1a844 commit b17c5d9
Showing 6 changed files with 73 additions and 3 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build-deploy-feature-branch-dev.yaml
@@ -45,5 +45,5 @@ jobs:
         env:
           APIKEY: ${{ secrets.NAIS_DEPLOY_APIKEY }}
           CLUSTER: dev-fss
-          RESOURCE: nais-dev.yaml
+          RESOURCE: .nais/application/application-config-dev.yaml
           VAR: version=${{ env.IMAGE_TAG }}
25 changes: 25 additions & 0 deletions .github/workflows/deploy-alerts-to-prod.yaml
@@ -0,0 +1,25 @@
+name: Deploy alerts for veilarbfilter to prod-fss
+
+on:
+  push:
+    branches:
+      - 'master'
+    paths:
+      - '.github/workflows/deploy-alerts-to-prod.yaml'
+      - '.nais/alerts/alerts-config-prod.yaml'
+  workflow_dispatch:
+
+jobs:
+  deploy-alerts:
+    name: Deploy alerts to prod-fss
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+
+      - name: Deploy to prod-fss
+        uses: nais/deploy/actions/deploy@v1
+        env:
+          APIKEY: ${{ secrets.NAIS_DEPLOY_APIKEY_OBO }}
+          CLUSTER: prod-fss
+          RESOURCE: .nais/alerts/alerts-config-prod.yaml
4 changes: 2 additions & 2 deletions .github/workflows/main.yml
@@ -62,7 +62,7 @@ jobs:
         env:
           APIKEY: ${{ secrets.NAIS_DEPLOY_APIKEY }}
           CLUSTER: dev-fss
-          RESOURCE: nais-dev.yaml
+          RESOURCE: .nais/application/application-config-dev.yaml
           VAR: version=${{ env.IMAGE_TAG }}
   deploy-prod:
     name: Deploy application to prod
@@ -77,7 +77,7 @@ jobs:
         env:
           APIKEY: ${{ secrets.NAIS_DEPLOY_APIKEY }}
           CLUSTER: prod-fss
-          RESOURCE: nais-prod.yaml
+          RESOURCE: .nais/application/application-config-prod.yaml
           VAR: version=${{ env.IMAGE_TAG }}
   release-prod:
     name: Create prod release
45 changes: 45 additions & 0 deletions .nais/alerts/alerts-config-prod.yaml
@@ -0,0 +1,45 @@
+apiVersion: "monitoring.coreos.com/v1"
+kind: PrometheusRule
+metadata:
+  name: team-obo-alarmer-veilarbfilter
+  namespace: obo
+  labels:
+    team: obo
+spec:
+  groups:
+    - name: team-obo-alarmer-veilarbfilter
+      rules:
+        # Kubernetes-specific alerts
+        - alert: Applikasjon er nede  # "Application is down"
+          expr: kube_deployment_status_replicas_available{deployment="veilarbfilter"} == 0
+          for: 1m
+          annotations:
+            summary: "App {{ $labels.deployment }} is down in namespace {{ $labels.namespace }}!"
+            consequence: "The app cannot be reached by other applications, which can potentially have major consequences for users (downtime, etc.)."
+            action: "Diagnose the application using relevant kubectl commands (`kubectl get pod -l app={{ $labels.deployment }}`, `kubectl describe pod <pod>`, `kubectl get events --field-selector involvedObject.name=<pod>`)."
+          labels:
+            namespace: obo
+            severity: critical
+
+        # Spring Boot-specific alerts
+        - alert: Høy andel serverfeil (HTTP 5XX)  # "High share of server errors (HTTP 5XX)"
+          expr: (100 * (sum(rate(http_server_requests_seconds_count{app="veilarbfilter", outcome="SERVER_ERROR"}[5m])) / sum(rate(http_server_requests_seconds_count{app="veilarbfilter"}[5m])))) > 1
+          for: 5m
+          annotations:
+            summary: "The share of HTTP 5XX errors in veilarbfilter has exceeded 1% over the last 5 minutes."
+            consequence: "Potential consequences for users include a higher rate of experienced errors, degraded performance, etc."
+            action: "Check the logs to see which errors are occurring and start troubleshooting."
+          labels:
+            namespace: obo
+            severity: critical
+
+        - alert: Høy andel klientfeil (HTTP 4XX)  # "High share of client errors (HTTP 4XX)"
+          expr: (100 * (sum(rate(http_server_requests_seconds_count{app="veilarbfilter", outcome="CLIENT_ERROR"}[5m])) / sum(rate(http_server_requests_seconds_count{app="veilarbfilter"}[5m])))) > 10
+          for: 5m
+          annotations:
+            summary: "The share of HTTP 4XX errors in veilarbfilter has exceeded 10% over the last 5 minutes."
+            consequence: "Potential consequences for users include a higher rate of experienced errors, degraded performance, etc."
+            action: "Check the logs to see which errors are occurring and start troubleshooting."
+          labels:
+            namespace: obo
+            severity: warning
File renamed without changes.
File renamed without changes.
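
Note on the error-share expressions in `.nais/alerts/alerts-config-prod.yaml`: both HTTP alerts compare `100 * rate(errors) / rate(total)` over a 5-minute window against a threshold (1 for 5XX, 10 for 4XX). The Python sketch below only illustrates that arithmetic. The function names and the sample counter values are hypothetical, and it assumes a plain difference between two counter readings, whereas Prometheus's `rate()` also extrapolates and handles counter resets. The `for: 5m` hold time is not modelled.

```python
from typing import Optional


def error_percentage(
    errors_start: float,
    errors_end: float,
    total_start: float,
    total_end: float,
    window_seconds: float = 300.0,
) -> Optional[float]:
    """Approximate 100 * rate(errors) / rate(total) for one window.

    Hypothetical helper: uses a simple counter difference per second,
    not Prometheus's actual rate() implementation.
    """
    error_rate = (errors_end - errors_start) / window_seconds
    total_rate = (total_end - total_start) / window_seconds
    if total_rate == 0:
        # With no traffic, PromQL's division gives NaN, and NaN never
        # satisfies "> threshold", so no alert is produced.
        return None
    return 100.0 * error_rate / total_rate


def exceeds_threshold(percentage: Optional[float], threshold: float) -> bool:
    """True when the computed share is above the alert threshold."""
    return percentage is not None and percentage > threshold


if __name__ == "__main__":
    # Made-up sample: 6 server errors out of 400 requests in the window.
    pct = error_percentage(10, 16, 1000, 1400)
    print("error share (%):", pct)
    print("above 5XX threshold (1):", exceeds_threshold(pct, 1))
    print("above 4XX threshold (10):", exceeds_threshold(pct, 10))
```

Because the window length cancels out in the ratio, the result depends only on how many requests in the window were errors, which is why the same expression works for both thresholds.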
