Add alert if an OpenSearch scrape fails #507

Merged 2 commits on Nov 27, 2024

@gabrielcocenza gabrielcocenza commented Nov 26, 2024

If a scrape fails, this might indicate that a unit is not in a healthy state.

OpenSearch currently has no metric indicating that a node is down. For example, if the systemd service is stopped on one node, the cluster (N nodes) drops the faulty node because of connectivity issues, and the metrics show that the cluster now has N-1 nodes without indicating that a node has failed.

With this new alert, at least a notification will appear if a node stops being responsive.

How to test:

  • Deploy opensearch units
  • Stop the opensearch daemon in one of the units

grafana-agent injects the Juju topology into the alert rule, so the expression up < 1 matches only OpenSearch applications:

image

The alert will trigger:
image
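
For reference, a rule of this kind could look roughly like the sketch below. This is a minimal illustration, not the rule file from this PR: only the expression up < 1 comes from the description above. The group name, alert name, "for" duration, severity label, and annotation text are assumptions, and juju_unit is assumed to be one of the Juju topology labels that grafana-agent attaches.

```yaml
groups:
  - name: opensearch-scrape            # hypothetical group name
    rules:
      - alert: OpenSearchScrapeFailed  # hypothetical alert name
        # No job or application matcher here: grafana-agent is expected to
        # inject the Juju topology matchers into this expression.
        expr: up < 1
        for: 5m                        # assumed grace period before firing
        labels:
          severity: critical           # assumed severity
        annotations:
          # juju_unit is assumed to be present among the topology labels
          summary: "Scrape failed for unit {{ $labels.juju_unit }}"
```

Because the topology matchers are added at injection time, the expression itself stays generic and does not need to filter by job or application.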

phvalguima previously approved these changes Nov 27, 2024
grafana-agent already injects the Juju topology, so it's not
necessary to filter by job or application

@Mehdi-Bendriss Mehdi-Bendriss left a comment

Good one! Thank you.

@gabrielcocenza gabrielcocenza merged commit e225961 into canonical:2/edge Nov 27, 2024
36 of 41 checks passed