
Inconsistent rate() Function Output between Prometheus and Mimir on Histogram Data #9767

Open
richardmoe opened this issue Oct 29, 2024 · 7 comments

@richardmoe

Describe the bug

We have observed unexpected behavior when using the rate() function on histogram metrics in Mimir compared to Prometheus. Specifically, we sporadically see a significant spike in the Mimir output that is not present in Prometheus.

To Reproduce

  1. Start Prometheus 2.50.1 with remote write to Mimir
  2. Start Mimir 2.13
  3. Run histogram_quantile(0.99, sum by (le,pod) (rate(my_metric_bucket{service="my-service"}[1m])))
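
To compare the two backends without the quantile step, a stripped-down variant of the query above (same metric and labels) can be graphed on both data sources; it shows the per-bucket rates directly, which is where any discrepancy should first appear:

  # per-bucket rates, without histogram_quantile
  sum by (le, pod) (rate(my_metric_bucket{service="my-service"}[1m]))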

Expected behavior

Expected to see the same results in Prometheus and Mimir.

Environment

  • Infrastructure: Kubernetes
  • Deployment tool: Helm
  • 2 Prometheus instances in a Kubernetes cluster with remote write to Mimir

Additional Context

[screenshot]

@colega
Contributor

colega commented Nov 8, 2024

Hello, does this always happen to you on the last sample?

Note that Mimir doesn't offer isolation, because of its distributed nature. When the series for the different buckets of a histogram are written, there's a moment when some of them have been written but others haven't yet. If the query is executed at that specific moment, the histogram_quantile function may see only the higher buckets but not the lower ones, thus inflating the p99 value.

There's no easy fix for this with classic histograms, and we are not planning to fix it, because this issue doesn't exist with native histograms, which are becoming stable now with the release of Prometheus 3.0.
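
For illustration, the native-histogram equivalent of the query in the report would drop the _bucket suffix and the le grouping, assuming the metric is re-instrumented as a native histogram (the name my_metric below is just the classic metric name without the suffix):

  # assumes my_metric is exposed as a native histogram
  histogram_quantile(0.99, sum by (pod) (rate(my_metric{service="my-service"}[1m])))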

Please reopen the issue if you see this happening consistently in samples that were already written "a while ago".

@colega colega closed this as completed Nov 8, 2024
@richardmoe
Author

Hi again, we can also see the issue in metrics written a while ago. Here is an example of a graph over metrics written almost 2 weeks ago:

[screenshot]

@colega
Contributor

colega commented Nov 11, 2024

In this case, I would recommend digging down to a single histogram series and checking what's going on with the buckets.

I would check one of the pods that differs and run an instant query for it in Grafana, such as: rec_api_request_latency_bucket{...}[$__range]. That will show you the raw data stored, and you can check what's going on.
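
For example, with placeholder label values (substitute the real service and pod):

  # raw stored samples for one pod's bucket series over the dashboard range;
  # the service and pod values below are placeholders
  rec_api_request_latency_bucket{service="my-service", pod="my-pod-abc123"}[$__range]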

@colega colega reopened this Nov 11, 2024
@richardmoe
Author

richardmoe commented Nov 13, 2024

The data from an instant query looks pretty similar and I haven't been able to see any big difference there.

[screenshot]

@colega
Contributor

colega commented Nov 14, 2024

You need to switch Format to Time series to render them as graphs, and I'd recommend rendering both data sources on the same graph if you want to compare (use the Mixed data source, then choose a data source per query).

@richardmoe
Author

richardmoe commented Nov 15, 2024

To get a time series graph you need the Range or Both query type.
[screenshot]

[screenshot]

@colega
Contributor

colega commented Nov 15, 2024

You're still rendering the histogram_quantile; that's why you can't render time series from an instant query. Please see my suggestion above:

I would check one of the pods that differs, and query an instant query of that in grafana as: rec_api_request_latency_bucket{...}[$__range]

Something like this:

[screenshot]
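
To compare the two backends directly, the same kind of raw query can also be pinned to a single bucket and issued once per data source in a Mixed-data-source panel, for example (label values are placeholders):

  # run one copy of this query against each data source (Prometheus and Mimir);
  # label values are placeholders
  rec_api_request_latency_bucket{service="my-service", pod="my-pod-abc123", le="0.5"}[$__range]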
