
[collectd 6] write_prometheus handles resource attributes incorrectly. #4283

Closed
octo opened this issue Feb 19, 2024 · 15 comments · Fixed by #4284
Labels
Bug A genuine bug


octo commented Feb 19, 2024

  • Add a feature to utils/resource_metrics/ allowing us to specify whether a new instance of a metric_t should be added to the metric family (current behavior, geared towards plugins that are flushed periodically and that should not lose any data point), or whether the new metric replaces the previous one (for write_prometheus, which is only interested in the latest state of each metric).

This part seems to be broken. It should consider metrics with different resources (resource label sets / values) as unique, but adding multiple metrics with the same name & labels that differ only in their metric family's resource label set causes all metrics other than those for the first resource to be ignored.

EDIT: And even when the metrics are otherwise unique and do get reported, only the first resource label set is reported as target_info.

(Noticed while implementing #4267, and moving per-GPU metric labels to a resource label set.)

Originally posted by @eero-t in #4213 (comment)

@octo octo added the Bug A genuine bug label Feb 19, 2024

eero-t commented Feb 19, 2024

The Prometheus plugin unit tests confirm that formatting of multiple resources works fine, and a quick test of suffixing the resource to the fam->name (key string) given to the c_avl_*() calls in write_prometheus did not help.

So I started going through the collectd core code from plugin_dispatch_metric_family() onward, but haven't yet found where the metric families given to it get lost.


octo commented Feb 19, 2024

I think the problem is on line 760:

760: if (c_avl_get(prom_metrics, fam->name, (void *)&prom_fam) != 0) {

It's using only fam->name to identify the family.
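A minimal sketch of the kind of composite key that would distinguish families by resource as well as by name (the helper name, the separator byte, and the flat key/value array layout are illustrative, not collectd's actual API or fix):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch only: instead of keying the AVL tree on fam->name alone, build a
 * composite key of the name plus the resource label pairs, so families with
 * the same name but different resources stay distinct. "\x01" is used as a
 * separator that cannot occur in metric names. */
static int resource_key(char *buf, size_t size, const char *name,
                        const char *const *labels, size_t pairs_num) {
  int n = snprintf(buf, size, "%s", name);
  for (size_t i = 0; i + 1 < 2 * pairs_num; i += 2) {
    n += snprintf(buf + (size_t)n, size > (size_t)n ? size - (size_t)n : 0,
                  "\x01%s=%s", labels[i], labels[i + 1]);
  }
  return n;
}
```

The resulting string can then be passed to c_avl_get()/c_avl_insert() in place of the bare fam->name.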


eero-t commented Feb 19, 2024

I tried dirty fix for that: eero-t@c73886a

But results were unexpected (old metrics are reported again, and ones for the other devices are still mostly missing).


eero-t commented Feb 19, 2024

Btw. while testing that, I noticed that prom_missing() is missing an unlock in one return path.
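One common way to make that class of bug harder to reintroduce is a single exit label. A self-contained sketch of the pattern (the lock is modeled with a plain counter so the example is portable, and lookup_locked() is a stand-in, not collectd's actual prom_missing()):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the unlock-on-every-return-path pattern. The "lock" is modeled
 * as a depth counter; in collectd this would be a pthread mutex. */
static int lock_depth = 0;
static void example_lock(void) { lock_depth++; }
static void example_unlock(void) { lock_depth--; }

static int lookup_locked(int key, int *value_out) {
  int status = 0;
  example_lock();

  if (key < 0) {
    /* Early error: a direct "return -1" here would leak the lock, which is
     * exactly the bug class noticed in prom_missing(). */
    status = -1;
    goto leave;
  }
  *value_out = key * 2; /* stand-in for the real lookup */

leave:
  example_unlock();
  return status;
}
```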


eero-t commented Feb 19, 2024

@octo, how are device ID resource labels supposed to work in Prometheus once this is fixed?

$ curl 127.0.0.1:9999/metrics 
# HELP target_info Target metadata
# TYPE target_info gauge
target_info{job="unknown_service:collectd",dev_file="card1",host_name="icx-cp-1",pci_bdf="0000:4d:00.0",pci_dev="0x56c0"} 1

# HELP hw_gpu_temperature_celsius Temperature sensor value (in Celsius) when queried
# TYPE hw_gpu_temperature_celsius gauge
hw_gpu_temperature_celsius{job="unknown_service:collectd",location="global-max"} 46 1708360673834
hw_gpu_temperature_celsius{job="unknown_service:collectd",location="gpu-max"} 46 1708360673834

Is Prometheus supposed to add target_info labels to all metrics following the target_info statement?

(Device ID is crucial information that needs to be provided with every device metric. If Prometheus querying data from collectd will just lose target_info information, e.g. by providing it as a separate metric, and one needs to provide that info anyway as metric labels, resource labels are just extra bloat.)


octo commented Feb 19, 2024

how are device ID resource labels supposed to work in Prometheus once this is fixed?

The test in #4284 should hopefully demonstrate that. It should look approximately like this:

# HELP target_info Target metadata
# TYPE target_info gauge
target_info{job="name1",instance="instance1",host_name="example.org"} 1
target_info{job="name1",instance="instance2",host_name="example.net"} 1

# HELP unit_test_total
# TYPE unit_test_total counter
unit_test_total{job="name1",instance="instance1"} 42
unit_test_total{job="name1",instance="instance2"} 23

It's not looking like that yet though:

  1. The job and instance labels are not created for regular metrics, just the "target_info" one.
  2. Multiple metrics with different resource attributes don't work as intended.


eero-t commented Feb 19, 2024

The test in #4284 should hopefully demonstrate that. It should look approximately like this:

My question was more about what happens once Prometheus scrapes that. But apparently adding resource labels to metrics is something one can do with a PromQL join query, or by telling the scraper to "transform" target_info labels into metric labels:
https://grafana.com/blog/2023/07/20/a-practical-guide-to-data-collection-with-opentelemetry-and-prometheus/

Btw. this: https://opentelemetry.io/docs/specs/otel/metrics/sdk_exporters/prometheus/

mentions that it could be made configurable in the exporter:
"A Prometheus Exporter MAY offer configuration to add resource attributes as metric attributes. ... The configuration SHOULD allow the user to select which resource attributes to copy (e.g. include / exclude or regular expression based). Copied Resource attributes MUST NOT be excluded from target_info."

  1. The job and instance labels are not created for regular metrics, just the "target_info" one.

job labels are already added to both target_info & regular metrics; instance labels are not (see my comment above).

What's the benefit of having the job label in all metrics, when it's going to be the same for all metrics coming from collectd?

(Either service name coming from OTEL env var, or unknown_service:collectd.)

(Device ID is crucial information that needs to be provided with every device metric. If Prometheus querying data from collectd will just lose target_info information, e.g. by providing it as a separate metric, and one needs to provide that info anyway as metric labels, resource labels are just extra bloat.)

Hm. Maybe I need to set such crucial information as the instance resource label, so it gets included in all metrics?

octo added a commit to octo/collectd that referenced this issue Feb 19, 2024
…ot the metric family.

For Prometheus output, the plugin groups all metrics with the same name into
one `metric_family_t`. This caused problems when collectd handled metrics from
multiple resources.

To solve this issue, we're somewhat abusing the data structure and store
per-metric resource attributes in the `family` field. That means for the
metrics stored in the *write_prometheus plugin* `(metric_t).family` does not
point back to the metric family containing the metric.

Fixes: collectd#4283

octo commented Feb 20, 2024

The job and instance labels are not created for regular metrics, just the "target_info" one.

That is not quite what happened: metrics without labels were printed without the job and instance labels. This is also fixed in #4284

A Prometheus Exporter MAY offer configuration to add resource attributes as metric attributes.

Good to know. I'll wait to receive a feature request before implementing this though.

What's the benefit of having job label in all metrics, when they're going to be same for all metrics coming from collectd?

Not all metrics are going to have the same job label. E.g. with #4271 we could have all sorts of resource attributes.

Device ID is crucial information that needs to be provided with every device metric.

This is detailed here: https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/#resource-attributes

The short version is: the combination of service.name (aka. job) and service.instance.id (aka. instance) must uniquely identify the resource. You can then join with the target_info metric to retrieve additional resource attributes.

Not sure how best to square this with the Device resource schema. Maybe copy the unique device ID into device.id and service.instance.id?


eero-t commented Feb 20, 2024

Btw. while looking at the target_info stuff in the Prometheus plugin code, I noticed a bug in its name when the EXPOSE_OPEN_METRICS define is set:

#ifdef EXPOSE_OPEN_METRICS
  strbuf_print(buf, "# TYPE target info\n");
  strbuf_print(buf, "# HELP target Target metadata\n");
#else
  strbuf_print(buf, "# HELP target_info Target metadata\n");
  strbuf_print(buf, "# TYPE target_info gauge\n");
#endif
...
    strbuf_print(buf, "target_info{");


eero-t commented Feb 20, 2024

Not all metrics are going to have the same job label. E.g. with #4271 we could have all sorts of resource attributes.

Last ("Resource Attributes") section in: https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/

States:
"The In the collector Prometheus exporters, the service.name and service.namespace attributes MUST be combined as <service.namespace>/<service.name>, or <service.name> if namespace is empty, to form the job metric label."

Which to me indicates that it at least should always be the same for the "collectd" service?
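The job-label rule quoted above can be sketched in a few lines (the function name is illustrative, not from collectd): job is "<service.namespace>/<service.name>", or just "<service.name>" when the namespace is empty.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the spec's job-label formation rule for illustration. */
static int format_job_label(char *buf, size_t size, const char *ns,
                            const char *name) {
  if (ns != NULL && ns[0] != '\0')
    return snprintf(buf, size, "%s/%s", ns, name);
  return snprintf(buf, size, "%s", name);
}
```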

Device ID is crucial information that needs to be provided with every device metric.

This is detailed here: https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/#resource-attributes

Combination of these spec statements:

  • The service.instance.id attribute, if present, MUST be converted to the instance label; otherwise, instance should be added with an empty value
  • There MUST be at most one target_info metric exported for each unique combination of job and instance

Means that (some unique) device ID must be set as service.instance.id; otherwise resource attributes cannot be used for device metrics, as there will not be enough target_info items for them, nor the necessary labels to match them with.

Not sure how best to square this with the Device resource schema. Maybe copy the unique device ID into device.id and service.instance.id?

Yes, that sounds like the only way to support it.

I really do not like using device ID for service.instance.id though, especially as the other "Resource Attributes" section on that spec page states service.instance.id being "A unique identifier of the target. By default, it should be the <host>:<port> of the scraped URL"...

octo added a commit to octo/collectd that referenced this issue Feb 20, 2024

octo commented Feb 20, 2024

Btw. while looking at the target_info stuff in the Prometheus plugin code, I noticed a bug in its name when the EXPOSE_OPEN_METRICS define is set:

I don't see it. What's the bug?


eero-t commented Feb 20, 2024

I think it should be:

#ifdef EXPOSE_OPEN_METRICS
    strbuf_print(buf, "target{");
#else
    strbuf_print(buf, "target_info{");
#endif

(Or use intermediate define for the metric name.)
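The intermediate-define variant could look roughly like this (macro and function names are illustrative, not the actual patch); defining the metric name once keeps the # HELP / # TYPE lines and the sample line consistent, so the OpenMetrics build cannot mix "target" and "target_info":

```c
#include <assert.h>
#include <string.h>

/* Sketch of the "intermediate define" suggestion: a single macro for the
 * metric name, used by both the header lines and the sample line. */
#ifdef EXPOSE_OPEN_METRICS
#define TARGET_INFO_NAME "target"
#else
#define TARGET_INFO_NAME "target_info"
#endif

static const char *target_info_name(void) { return TARGET_INFO_NAME; }
```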

Related: open-telemetry/opentelemetry-specification#3871

eero-t pushed a commit to eero-t/collectd that referenced this issue Feb 20, 2024

eero-t commented Feb 22, 2024

This is fixed with PR #4284, except for the plugin reporting multiple target_info items per unique job+instance pair when the job does not set instance:

$ curl --no-progress-meter 127.0.0.1:9999/metrics 
# HELP target_info Target metadata
# TYPE target_info gauge
target_info{job="unknown_service:collectd",instance="",dev_file="card0",host_name="icx-cp-1",pci_bdf="0000:4d:00.0",pci_dev="0x56c0"} 1
target_info{job="unknown_service:collectd",instance="",dev_file="card2",host_name="icx-cp-1",pci_bdf="0000:93:00.0",pci_dev="0x56c0"} 1

But I guess a separate ticket could be filed about that?


octo commented Feb 24, 2024

Closing since the fix has been merged.

This is fixed with PR #4284, except for the plugin reporting multiple target_info items per unique job+instance pair when the job does not set instance:

@eero-t If it gets to that, we have a problem elsewhere because the assumption that service name + service instance ID are unique is violated. It might make sense to add a check in plugin_dispatch_metric_family() to verify that resources follow this convention. Could you open an issue for this?


eero-t commented Feb 26, 2024

Added #4289
