Add alert if an OpenSearch scrape fails
If a scrape fails, this might indicate that a unit is not in
a healthy state.

OpenSearch currently has no metric that reports a node being down.
For example, if the systemd service is stopped on one node, the
cluster (N nodes) drops the faulty node because of connectivity
issues, and the metrics show that the cluster now has N-1 nodes
without saying that a node has failed.

With this new alert, at least a notification will appear if one
node stops being responsive.
gabrielcocenza committed Nov 26, 2024
1 parent 1fa589c commit 454d363
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions src/alert_rules/prometheus/prometheus_alerts.yaml
@@ -1,6 +1,17 @@
"groups":
- "name": "opensearch.alerts"
  "rules":

  - "alert": "OpenSearchScrapeFailed"
    "annotations":
      "message": "Scrape on {{ $labels.juju_unit }} failed. Ensure that the OpenSearch systemd service is healthy and that the unit is part of the cluster."
      "summary": "OpenSearch exporter scrape failed"
    "expr": |
      up{job=~".*opensearch_.*"} < 1
    "for": "5m"
    "labels":
      "severity": "critical"

  - "alert": "OpenSearchClusterRed"
    "annotations":
      "message": "Cluster {{ $labels.cluster }} health status has been RED for at least 2m. Cluster does not accept writes, shards may be missing or master node hasn't been elected yet."
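
One way to sanity-check the new rule is a `promtool test rules` unit test. The sketch below is not part of this commit. It assumes the test file sits next to `prometheus_alerts.yaml` and that the rules file loads in promtool as-is. The job name `juju_opensearch_prometheus_scrape` and the unit `opensearch/0` are hypothetical values chosen only so that the series matches `job=~".*opensearch_.*"`.

```yaml
# Hypothetical promtool unit test for OpenSearchScrapeFailed (not part of the commit).
rule_files:
  - prometheus_alerts.yaml  # assumed to sit next to this test file

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Scrape succeeds at 0m and 1m, then fails from 2m onwards.
      # The job and juju_unit label values are made up for illustration.
      - series: 'up{job="juju_opensearch_prometheus_scrape", juju_unit="opensearch/0"}'
        values: '1 1 0 0 0 0 0 0 0 0'
    alert_rule_test:
      # 2 minutes into the failure: still inside the 5m "for" window, so nothing fires.
      - eval_time: 4m
        alertname: OpenSearchScrapeFailed
        exp_alerts: []
      # The failure has lasted more than 5 minutes: the alert should be firing.
      - eval_time: 8m
        alertname: OpenSearchScrapeFailed
        exp_alerts:
          - exp_labels:
              severity: critical
              job: juju_opensearch_prometheus_scrape
              juju_unit: opensearch/0
            exp_annotations:
              message: "Scrape on opensearch/0 failed. Ensure that the OpenSearch systemd service is healthy and that the unit is part of the cluster."
              summary: "OpenSearch exporter scrape failed"
```

Saved under a hypothetical name such as `test_scrape_failed.yaml`, it could be run with `promtool test rules test_scrape_failed.yaml`. The two evaluation times bracket the `for: 5m` window, so the test covers both the pending and the firing state.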