new Ceph dashboards feedback #33

Open
linwalth opened this issue Jul 13, 2023 · 2 comments
@linwalth
Contributor

linwalth commented Jul 13, 2023

I have found the following non-functional configurations in the new Ceph dashboards:

First off, no uid needs to be set for the dashboards, as Grafana assigns its own UID when a dashboard is imported.
A version is not needed either.

So the parameters "version" and "uid" after the "title" parameter do not need to be set, especially since re-importing a dashboard for testing and tweaking is a problem when the UID already exists. (Though the dashboard title will also be a problem in that case ;))
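
For illustration, a minimal sketch of the dashboard header being described, assuming Grafana's usual dashboard JSON model (the title and values are made up):

{
  "title": "Ceph - Cluster",
  "uid": "abc123def",
  "version": 4,
  ...
}

Dropping the "uid" and "version" keys lets Grafana generate a fresh UID on import and avoids the conflict when re-importing a tweaked copy.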

  • irate(...[1m]) does not produce any output, because the standard scrape interval is 1m and a 1m range then contains at most one sample (see the first sketch after this list).
  • the same applies to deriv(...[1m]).
  • the regex "([^:.])." will break the queries for host schemas like "host.foo.bar.co.uk:9283", because it cuts off after the first period and leaves the scraping instance name as "host". If this is intended behavior in other cases, I would just make sure to document it (see the second sketch after this list).
  • being able to choose "All" as the $job variable and/or selecting multiple job values from the dropdown will break many dashboard panels in osd-device-details.json, osds-overview.json and pool-detail.json (see the third sketch after this list).
  • the Grafana container does not seem to ship the Grafana piechart panel plugin natively: "Panel plugin not found: grafana-piechart-panel" (osd-overview.json).
  • selection by the "job" label is needed in the "Top Slow Ops" panel when using multiple clusters, like so: (ceph_daemon_health_metrics{job="$job", type="SLOW_OPS", ceph_daemon=~"osd.*"})
  • scraping the Ceph MGR with the Kolla Prometheus sets the instance label to the name of the manager that was scraped, hence most of the time when grouping node_exporter metrics with Ceph metrics (or sometimes when just using Ceph metrics, like in the average disk utilization in hosts-overview.json), it is necessary to relabel the "instance" label with "exported_instance", like so (example):
label_replace(
  label_replace(
    ceph_disk_occupation_human{job=~"$job", exported_instance=~"($ceph_hosts)([\\\\.:].*)?"}, "device", "$1", "device", "/dev/(.*)"
  ), "instance", "$1", "exported_instance", "([^:]*).*"
)
  • radosgw-overview.json: the regex to select the radosgw servers is not a regex but the description text that belongs in the variable's description field. Hence, no daemons/servers are selected.
  • radosgw-sync-overview.json and radosgw-overview.json use metrics that our Prometheus does not know. Insight as to where Prometheus would get these metrics (e.g. ceph_data_sync_from_zone_fetch_bytes_*, haproxy_frontend_http_responses_*, ceph_rbd_*) would be appreciated.
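
First sketch, on the irate()/deriv() items: the kind of adjustment meant, assuming the default 1m scrape interval (the metric names are only illustrative):

# irate() needs at least two samples inside the range; with a 1m scrape
# interval a [1m] window usually holds only one, so widen the window:
irate(ceph_osd_op_w_in_bytes[5m])

# the same reasoning applies to deriv() on a gauge:
deriv(ceph_osd_numpg[5m])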
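
Second sketch, on the host-name regex item: the difference shown with label_replace on the generic up metric (illustrative only):

# A regex that stops at the first "." or ":" reduces
# "host.foo.bar.co.uk:9283" to just "host":
label_replace(up, "instance", "$1", "instance", "([^:.]*).*")

# Stopping only at the port separator keeps the full FQDN,
# as in the label_replace example further up:
label_replace(up, "instance", "$1", "instance", "([^:]*).*")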
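
Third sketch, on the "All"/multi-select $job item: panels break when the variable expands to a regex alternation like "job1|job2" (or ".*" for All) but the query matches with plain equality. Assuming ceph_osd_metadata as an example metric:

ceph_osd_metadata{job="$job"}     # breaks once $job carries multiple values
ceph_osd_metadata{job=~"$job"}    # regex matcher works for one or many values
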
@berendt
Member

berendt commented Jul 13, 2023

Thanks for trying it out and the feedback. We'll take a look at it.

@linwalth
Contributor Author

Further information regarding the UID matter: importing the custom Grafana home dashboard with a UID set creates an error on the splash page after login (something about a dashboard not being found for a UID). Taking the UID out of the home dashboard fixed this.
