[collectd 6] write_prometheus handles resource attributes incorrectly. #4283
The Prometheus plugin unit tests that check formatting of multiple resources work fine, and a quick test of suffixing resources to the … So I started going through the collectd core code, starting from …
I think the problem is on line 760: `if (c_avl_get(prom_metrics, fam->name, (void *)&prom_fam) != 0) {` It's using only …
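As a minimal illustration of what "unique" could mean here, the sketch below builds an identity key for one time series from the metric name, its labels, and its resource attributes, so two series that differ only in their resource no longer collide in a lookup keyed on the name alone. The types and functions (`kv_t`, `series_key`) are hypothetical and not collectd's API. Prometheus output still needs all series of one name under a single family, so a key like this would identify series, not families.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified name/value pair (not collectd's label types). */
typedef struct {
  const char *name;
  const char *value;
} kv_t;

/* Appends printf-style text at buf + *off. Returns -1 on truncation. */
static int append(char *buf, size_t size, size_t *off, const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  int n = vsnprintf(buf + *off, size - *off, fmt, ap);
  va_end(ap);
  if (n < 0 || (size_t)n >= size - *off)
    return -1;
  *off += (size_t)n;
  return 0;
}

/* Appends a set as {a="1",b="2"}. Assumes the pairs are sorted by name so
 * that equal sets always produce identical text. */
static int append_set(char *buf, size_t size, size_t *off, const kv_t *kv,
                      size_t num) {
  if (append(buf, size, off, "{") != 0)
    return -1;
  for (size_t i = 0; i < num; i++)
    if (append(buf, size, off, "%s%s=\"%s\"", (i > 0) ? "," : "", kv[i].name,
               kv[i].value) != 0)
      return -1;
  return append(buf, size, off, "}");
}

/* Builds name{labels}{resource}: two series that differ only in their
 * resource attributes get different keys. Returns 0 on success. */
static int series_key(char *buf, size_t size, const char *name,
                      const kv_t *labels, size_t labels_num,
                      const kv_t *resource, size_t resource_num) {
  size_t off = 0;
  if (size == 0)
    return -1;
  if (append(buf, size, &off, "%s", name) != 0)
    return -1;
  if (append_set(buf, size, &off, labels, labels_num) != 0)
    return -1;
  return append_set(buf, size, &off, resource, resource_num);
}
```

For example, `gpu_power` reported for `device.id="0"` and `device.id="1"` yields the keys `gpu_power{}{device.id="0"}` and `gpu_power{}{device.id="1"}`.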
I tried a quick-and-dirty fix for that: eero-t@c73886a. But the results were unexpected (old metrics are reported again, and the ones for the other devices are still mostly missing).
Btw, while testing that, I noticed …
@octo, how are device ID resource labels supposed to work in Prometheus once this is fixed?
Is Prometheus supposed to add … (Device ID is crucial information that needs to be provided with every device metric. If Prometheus is querying data from …
The test in #4284 should hopefully demonstrate that. It should look approximately like this:
It's not looking like that yet though:
My question was more about what happens once Prometheus scrapes that. But apparently adding resource labels to metrics is something one can do with PromQL. Btw, this page: https://opentelemetry.io/docs/specs/otel/metrics/sdk_exporters/prometheus/ mentions that it could be configurable in the exporter:
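A minimal sketch of what such an exporter option could do: copy every resource attribute onto the sample as an extra label. It assumes the classic Prometheus label-name charset `[a-zA-Z_][a-zA-Z0-9_]*`, so dots in OpenTelemetry attribute names are replaced with underscores. The names `kv_t`, `sanitize_label_name`, and `format_sample` are hypothetical; this is not how write_prometheus currently works.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
  const char *name;
  const char *value;
} kv_t;

/* Copies src into dst, replacing every character outside [a-zA-Z0-9_] with
 * '_', e.g. "service.instance.id" -> "service_instance_id". Assumes src does
 * not start with a digit. Returns -1 if dst is too small. */
static int sanitize_label_name(char *dst, size_t size, const char *src) {
  size_t len = strlen(src);
  if (len >= size)
    return -1;
  for (size_t i = 0; i < len; i++) {
    unsigned char c = (unsigned char)src[i];
    dst[i] = (isalnum(c) || c == '_') ? (char)c : '_';
  }
  dst[len] = '\0';
  return 0;
}

/* Formats one sample line as name{labels...,resource attributes...} value,
 * i.e. with every resource attribute copied onto the metric as a label.
 * Label values are not escaped in this sketch. */
static int format_sample(char *buf, size_t size, const char *name,
                         const kv_t *labels, size_t labels_num,
                         const kv_t *resource, size_t resource_num,
                         double value) {
  char key[128];
  int n = snprintf(buf, size, "%s{", name);
  if (n < 0 || (size_t)n >= size)
    return -1;
  size_t off = (size_t)n;
  for (size_t i = 0; i < labels_num + resource_num; i++) {
    const kv_t *kv = (i < labels_num) ? &labels[i] : &resource[i - labels_num];
    if (sanitize_label_name(key, sizeof(key), kv->name) != 0)
      return -1;
    n = snprintf(buf + off, size - off, "%s%s=\"%s\"", (i > 0) ? "," : "", key,
                 kv->value);
    if (n < 0 || (size_t)n >= size - off)
      return -1;
    off += (size_t)n;
  }
  n = snprintf(buf + off, size - off, "} %g", value);
  if (n < 0 || (size_t)n >= size - off)
    return -1;
  return 0;
}
```

The trade-off is cardinality and size: every sample repeats all resource labels, which is why the PromQL join against `target_info` is the default approach.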
What's the benefit of having … (Either the service name coming from the OTEL env var, or …
Hm. Maybe I need to set such crucial information to …
…ot the metric family. For Prometheus output, the plugin groups all metrics with the same name into one `metric_family_t`. This caused problems when collectd handled metrics from multiple resources. To solve this issue, we're somewhat abusing the data structure and storing per-metric resource attributes in the `family` field. That means for the metrics stored in the *write_prometheus plugin* `(metric_t).family` does not point back to the metric family containing the metric. Fixes: collectd#4283
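The commit message above can be pictured with heavily simplified structs. In the sketch below, all metrics named `gpu_power` live in one group, but each stored metric's `family` pointer refers to a small private family that only carries that metric's resource attributes. These structs and `group_add`/`group_free` are hypothetical stand-ins, not collectd's real `metric_t`/`metric_family_t` or the merged code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for collectd's types. */
typedef struct {
  const char *name;
  const char *value;
} kv_t;

typedef struct family_s family_t;

typedef struct {
  double value;
  family_t *family; /* usually a back-pointer to the containing family */
} metric_t;

struct family_s {
  const char *name;
  const kv_t *resource; /* resource attributes */
  size_t resource_num;
  metric_t *metrics;
  size_t metrics_num;
};

/* Adds a metric to the per-name group used for Prometheus output. The stored
 * metric's `family` does NOT point at `group`; it points at a private copy of
 * the incoming family that only carries this metric's resource attributes. */
static int group_add(family_t *group, const family_t *src, double value) {
  family_t *res = calloc(1, sizeof(*res));
  if (res == NULL)
    return -1;
  res->name = src->name;
  res->resource = src->resource; /* shallow copy, good enough for a sketch */
  res->resource_num = src->resource_num;

  metric_t *tmp =
      realloc(group->metrics, (group->metrics_num + 1) * sizeof(*tmp));
  if (tmp == NULL) {
    free(res);
    return -1;
  }
  group->metrics = tmp;
  group->metrics[group->metrics_num] = (metric_t){.value = value, .family = res};
  group->metrics_num++;
  return 0;
}

/* Frees the per-metric resource holders and the metric array. */
static void group_free(family_t *group) {
  for (size_t i = 0; i < group->metrics_num; i++)
    free(group->metrics[i].family);
  free(group->metrics);
  group->metrics = NULL;
  group->metrics_num = 0;
}
```

When writing output, the plugin can then walk one group per metric name (one `# TYPE` line) while still reading each metric's resource from `metric.family->resource`.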
That is not quite what happened: metrics without labels were printed without the
Good to know. I'll wait to receive a feature request before implementing this though.
Not all metrics are going to have the same job label. E.g. with #4271 we could have all sorts of resource attributes.
This is detailed here: https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/#resource-attributes. The short version is: the combination of … Not sure how best to square this with the …
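For reference, the linked OpenTelemetry compatibility spec derives the Prometheus `job` label from `service.namespace` and `service.name` (`<namespace>/<name>`, or just `<name>` without a namespace) and the `instance` label from `service.instance.id`. A minimal sketch of the `job` half follows; `otel_job_label` is a hypothetical helper, not part of collectd.

```c
#include <stddef.h>
#include <stdio.h>

/* Derives the Prometheus `job` label value from the service.namespace and
 * service.name resource attributes: "<namespace>/<name>", or just "<name>"
 * when the namespace is missing or empty. The `instance` label would simply
 * be service.instance.id. Returns -1 if buf is too small. */
static int otel_job_label(char *buf, size_t size, const char *service_namespace,
                          const char *service_name) {
  int n;
  if (service_namespace != NULL && service_namespace[0] != '\0')
    n = snprintf(buf, size, "%s/%s", service_namespace, service_name);
  else
    n = snprintf(buf, size, "%s", service_name);
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Since only `service.instance.id` feeds `instance`, two GPUs reported under the same service name and instance ID would end up with identical `job`/`instance` pairs, which is the tension discussed in this thread.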
Btw. while looking at the
The last ("Resource Attributes") section in https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/ states: … Which, to me, indicates that it should at least always be the same for the "collectd" service?
The combination of these spec statements:
Means that (some unique) device ID must be set as
Yes, that sounds like the only way to support it. I really do not like using device ID for …
I don't see it. What's the bug?
I think it should be:
(Or use an intermediate define for the metric name.)
This is fixed by PR #4284, except for a plugin reporting multiple …
But I guess a separate ticket could be filed about that?
Closing since the fix has been merged.
@eero-t, if it gets to that, we have a problem elsewhere, because the assumption that the combination of service name and service instance ID is unique has been violated. It might make sense to add a check in …
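A minimal sketch of what such a check could test: two resources that share `service.name` and `service.instance.id` but are not otherwise identical violate the uniqueness assumption. Where the check would live is left open in the thread; `identity_conflict` and its helpers are hypothetical, `service.namespace` is ignored to match the comment, and attribute names are assumed to be unique within a resource.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  const char *name;
  const char *value;
} kv_t;

/* Returns the value of attribute `name`, or NULL if it is not set. */
static const char *attr_get(const kv_t *attrs, size_t num, const char *name) {
  for (size_t i = 0; i < num; i++)
    if (strcmp(attrs[i].name, name) == 0)
      return attrs[i].value;
  return NULL;
}

/* NULL-safe string equality: two missing values count as equal. */
static bool str_eq(const char *a, const char *b) {
  if (a == NULL || b == NULL)
    return a == b;
  return strcmp(a, b) == 0;
}

/* Returns true when two resources claim the same identity (service.name plus
 * service.instance.id) but differ in at least one other attribute. */
static bool identity_conflict(const kv_t *a, size_t a_num, const kv_t *b,
                              size_t b_num) {
  if (!str_eq(attr_get(a, a_num, "service.name"),
              attr_get(b, b_num, "service.name")) ||
      !str_eq(attr_get(a, a_num, "service.instance.id"),
              attr_get(b, b_num, "service.instance.id")))
    return false; /* different identities: no conflict */
  if (a_num != b_num)
    return true;
  for (size_t i = 0; i < a_num; i++)
    if (!str_eq(attr_get(b, b_num, a[i].name), a[i].value))
      return true;
  return false;
}
```

A resource compared with itself never conflicts, while two GPUs on `host1` that differ only in `device.id` would be flagged.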
Added #4289.
This part seems to be broken. It should consider metrics with different resources (i.e., different resource label values) as unique, but adding multiple metrics that have the same name and labels and differ only in their metric family's resource label set causes all metrics except those for the first resource to be ignored.
EDIT: And even if the metrics were otherwise all unique and got reported, only the first resource label set is reported as `target_info`. (Noticed while implementing #4267 and moving per-GPU metric labels to a resource label set.)
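The expected behavior for `target_info` can be sketched as: emit one `target_info{...} 1` line per distinct resource, skipping exact duplicates, instead of one line for the first resource only. This is a hypothetical illustration (`resource_t`, `write_target_info`), not write_prometheus code; it assumes attribute names are already valid Prometheus label names and omits the `# HELP`/`# TYPE` lines.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
  const char *name;
  const char *value;
} kv_t;

/* Hypothetical resource: a set of attributes with unique names. */
typedef struct {
  const kv_t *attrs;
  size_t num;
} resource_t;

/* True if both resources contain exactly the same name/value pairs. */
static bool resource_equal(const resource_t *a, const resource_t *b) {
  if (a->num != b->num)
    return false;
  for (size_t i = 0; i < a->num; i++) {
    bool found = false;
    for (size_t j = 0; j < b->num && !found; j++)
      found = strcmp(a->attrs[i].name, b->attrs[j].name) == 0 &&
              strcmp(a->attrs[i].value, b->attrs[j].value) == 0;
    if (!found)
      return false;
  }
  return true;
}

/* Appends printf-style text at buf + *off. Returns -1 on truncation. */
static int append(char *buf, size_t size, size_t *off, const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  int n = vsnprintf(buf + *off, size - *off, fmt, ap);
  va_end(ap);
  if (n < 0 || (size_t)n >= size - *off)
    return -1;
  *off += (size_t)n;
  return 0;
}

/* Writes one "target_info{...} 1" line per distinct resource. Returns the
 * number of lines written, or -1 if buf is too small. */
static int write_target_info(char *buf, size_t size, const resource_t *res,
                             size_t res_num) {
  size_t off = 0;
  int lines = 0;
  if (size == 0)
    return -1;
  buf[0] = '\0';
  for (size_t i = 0; i < res_num; i++) {
    bool seen = false;
    for (size_t j = 0; j < i && !seen; j++)
      seen = resource_equal(&res[i], &res[j]);
    if (seen)
      continue; /* already emitted for an identical resource */
    if (append(buf, size, &off, "target_info{") != 0)
      return -1;
    for (size_t k = 0; k < res[i].num; k++)
      if (append(buf, size, &off, "%s%s=\"%s\"", (k > 0) ? "," : "",
                 res[i].attrs[k].name, res[i].attrs[k].value) != 0)
        return -1;
    if (append(buf, size, &off, "} 1\n") != 0)
      return -1;
    lines++;
  }
  return lines;
}
```

With resources for GPU 0, GPU 1, and GPU 0 again, this writes two lines, one per GPU.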
Originally posted by @eero-t in #4213 (comment)