new Ceph dashboards feedback #33

Open
linwalth opened this issue Jul 13, 2023 · 2 comments
@linwalth
Contributor

linwalth commented Jul 13, 2023

I have found the following non-functional configurations in the new Ceph dashboards:

First off, no uid needs to be set for the dashboards, as Grafana assigns its own UID when a dashboard is imported.
A version is not needed either.

So the parameters "version" and "uid" after the "title" parameter do not need to be set, especially since re-importing a dashboard for testing and tweaking is a problem when the UID already exists. (Though the dashboard title will also be a problem in that case ;))
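
For illustration, a minimal sketch of the dashboard header being described, assuming Grafana's usual dashboard JSON model (the title and values are made up):

{
  "title": "Ceph - Cluster",
  "uid": "abc123def",
  "version": 4,
  ...
}

Dropping the "uid" and "version" keys lets Grafana generate a fresh UID on import and avoids the conflict when re-importing a tweaked copy.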

  • irate(...[1m]) does not produce any output, because the standard scrape interval is 1m and a 1m range then contains at most one sample (see the first sketch after this list).
  • the same applies to deriv(...[1m]).
  • the regex "([^:.])." will break the queries for host schemas like "host.foo.bar.co.uk:9283", because it cuts off after the first period and leaves the scraping instance name as "host". If this is intended behavior in other cases, I would just make sure to document it (see the second sketch after this list).
  • being able to choose "All" as the $job variable and/or selecting multiple job values from the dropdown will break many dashboard panels in osd-device-details.json, osds-overview.json and pool-detail.json (see the third sketch after this list).
  • the Grafana container does not seem to ship the Grafana piechart panel plugin natively: "Panel plugin not found: grafana-piechart-panel" (osd-overview.json).
  • selection by the "job" label is needed in the "Top Slow Ops" panel when using multiple clusters, like so: (ceph_daemon_health_metrics{job="$job", type="SLOW_OPS", ceph_daemon=~"osd.*"})
  • scraping the Ceph MGR with the Kolla Prometheus sets the instance label to the name of the manager that was scraped, hence most of the time when grouping node_exporter metrics with Ceph metrics (or sometimes when just using Ceph metrics, like in the average disk utilization in hosts-overview.json), it is necessary to relabel the "instance" label with "exported_instance", like so (example):
label_replace(
  label_replace(
    ceph_disk_occupation_human{job=~"$job", exported_instance=~"($ceph_hosts)([\\\\.:].*)?"}, "device", "$1", "device", "/dev/(.*)"
  ), "instance", "$1", "exported_instance", "([^:]*).*"
)
  • radosgw-overview.json: the regex to select the radosgw servers is not a regex but the description text that belongs in the variable's description field. Hence, no daemons/servers are selected.
  • radosgw-sync-overview.json and radosgw-overview.json use metrics that our Prometheus does not know. Insight as to where Prometheus would get these metrics (e.g. ceph_data_sync_from_zone_fetch_bytes_*, haproxy_frontend_http_responses_*, ceph_rbd_*) would be appreciated.
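
First sketch, on the irate()/deriv() items: the kind of adjustment meant, assuming the default 1m scrape interval (the metric names are only illustrative):

# irate() needs at least two samples inside the range; with a 1m scrape
# interval a [1m] window usually holds only one, so widen the window:
irate(ceph_osd_op_w_in_bytes[5m])

# the same reasoning applies to deriv() on a gauge:
deriv(ceph_osd_numpg[5m])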
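
Second sketch, on the host-name regex item: the difference shown with label_replace on the generic up metric (illustrative only):

# A regex that stops at the first "." or ":" reduces
# "host.foo.bar.co.uk:9283" to just "host":
label_replace(up, "instance", "$1", "instance", "([^:.]*).*")

# Stopping only at the port separator keeps the full FQDN,
# as in the label_replace example further up:
label_replace(up, "instance", "$1", "instance", "([^:]*).*")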
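
Third sketch, on the "All"/multi-select $job item: panels break when the variable expands to a regex alternation like "job1|job2" (or ".*" for All) but the query matches with plain equality. Assuming ceph_osd_metadata as an example metric:

ceph_osd_metadata{job="$job"}     # breaks once $job carries multiple values
ceph_osd_metadata{job=~"$job"}    # regex matcher works for one or many values
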
@berendt
Member

berendt commented Jul 13, 2023

Thanks for trying it out and the feedback. We'll take a look at it.

@linwalth
Contributor Author

Further information regarding the UID matter: importing the custom Grafana home dashboard with a UID set creates an error on the splash page after login (something about a dashboard not being found for a UID). Taking the UID out of the home dashboard fixed this.
