
Create more specific checks around ceph #678

Open
Alfano93 opened this issue Sep 27, 2019 · 1 comment

Comments

@Alfano93

Currently, our monitoring only watches the overall status of a ceph cluster and alerts if it is in a HEALTH_WARN or HEALTH_ERR state. An engineer has no information beyond the cluster health state and cannot determine the severity of an alert from these statuses alone.

We should have more specific alerting around ceph that states why a cluster is in its current state. Ceph can report health as JSON:

ceph health -f json

{"checks":{"OSD_BACKFILLFULL":{"severity":"HEALTH_WARN","summary":{"message":"6 backfillfull osd(s)"}},"POOL_BACKFILLFULL":{"severity":"HEALTH_WARN","summary":{"message":"14 pool(s) backfillfull"}}},"status":"HEALTH_WARN","summary":[{"severity":"HEALTH_WARN","summary":"'ceph health' JSON format has changed in luminous. If you see this your monitoring system is scraping the wrong fields. Disable this with 'mon health preluminous compat warning = false'"}],"overall_status":"HEALTH_WARN"}

This gives the reasons why a cluster is in its current state. With better descriptions, ceph alerts would look less scary and would take less time to handle. If we also get more specific about what we alert on, we could cut down the number of alerts ceph creates.
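As a rough sketch of what a more specific check could do with this output, the Python below parses the JSON and lists each entry under `checks` with its severity and message. It assumes the Luminous-style layout shown above. `summarize_health` and `SAMPLE` are hypothetical names used only for illustration, not part of the existing monitoring code, and `SAMPLE` is a trimmed copy of the output above.

```python
import json

# Trimmed copy of the `ceph health -f json` output shown above (illustrative only).
SAMPLE = (
    '{"checks":{"OSD_BACKFILLFULL":{"severity":"HEALTH_WARN",'
    '"summary":{"message":"6 backfillfull osd(s)"}},'
    '"POOL_BACKFILLFULL":{"severity":"HEALTH_WARN",'
    '"summary":{"message":"14 pool(s) backfillfull"}}},'
    '"status":"HEALTH_WARN"}'
)


def summarize_health(raw):
    """Return (overall_status, lines), one line per entry under "checks".

    Hypothetical helper: assumes the "checks" and "status" fields seen in
    the sample output; missing fields fall back to placeholder values.
    """
    report = json.loads(raw)
    checks = report.get("checks", {})
    lines = []
    for name in sorted(checks):
        detail = checks[name]
        severity = detail.get("severity", "UNKNOWN")
        message = detail.get("summary", {}).get("message", "")
        lines.append(f"{severity} {name}: {message}")
    return report.get("status", "UNKNOWN"), lines


status, lines = summarize_health(SAMPLE)
print(status)
for line in lines:
    print(line)
```

An alert built this way could carry the check name and message instead of only the overall status, and individual check names could be routed or silenced separately.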

@shannonmitchell
Contributor

The new checks and alarms are being worked on in the following PR: #712
