All Prometheus histogram buckets are malformed #13869

Open
ofek opened this issue Jan 21, 2022 · 11 comments
Labels: help wanted, lifecycle/stale, type/bug

Comments

@ofek
Contributor

ofek commented Jan 21, 2022

Describe the bug

As documented in the official spec and mentioned in Pulsar's docs, histogram bucket series carry the _bucket suffix on the metric name and an le label holding the bucket's upper bound.

Instead, the label and its value are embedded in the metric name as a suffix, and every series is typed as a gauge:

# TYPE pulsar_storage_write_latency_le_0_5 gauge
pulsar_storage_write_latency_le_0_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_write_latency_le_1 gauge
pulsar_storage_write_latency_le_1{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_write_latency_le_5 gauge
pulsar_storage_write_latency_le_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_write_latency_le_10 gauge
pulsar_storage_write_latency_le_10{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_write_latency_le_20 gauge
pulsar_storage_write_latency_le_20{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_write_latency_le_50 gauge
pulsar_storage_write_latency_le_50{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_write_latency_le_100 gauge
pulsar_storage_write_latency_le_100{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_write_latency_le_200 gauge
pulsar_storage_write_latency_le_200{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_write_latency_le_1000 gauge
pulsar_storage_write_latency_le_1000{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_write_latency_overflow gauge
pulsar_storage_write_latency_overflow{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_write_latency_count gauge
pulsar_storage_write_latency_count{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_write_latency_sum gauge
pulsar_storage_write_latency_sum{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_le_0_5 gauge
pulsar_storage_ledger_write_latency_le_0_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_le_1 gauge
pulsar_storage_ledger_write_latency_le_1{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_le_5 gauge
pulsar_storage_ledger_write_latency_le_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_le_10 gauge
pulsar_storage_ledger_write_latency_le_10{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_le_20 gauge
pulsar_storage_ledger_write_latency_le_20{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_le_50 gauge
pulsar_storage_ledger_write_latency_le_50{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_le_100 gauge
pulsar_storage_ledger_write_latency_le_100{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_le_200 gauge
pulsar_storage_ledger_write_latency_le_200{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_le_1000 gauge
pulsar_storage_ledger_write_latency_le_1000{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_overflow gauge
pulsar_storage_ledger_write_latency_overflow{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_count gauge
pulsar_storage_ledger_write_latency_count{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_storage_ledger_write_latency_sum gauge
pulsar_storage_ledger_write_latency_sum{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_entry_size_le_128 gauge
pulsar_entry_size_le_128{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_entry_size_le_512 gauge
pulsar_entry_size_le_512{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_entry_size_le_1_kb gauge
pulsar_entry_size_le_1_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_entry_size_le_2_kb gauge
pulsar_entry_size_le_2_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_entry_size_le_4_kb gauge
pulsar_entry_size_le_4_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_entry_size_le_16_kb gauge
pulsar_entry_size_le_16_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_entry_size_le_100_kb gauge
pulsar_entry_size_le_100_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_entry_size_le_1_mb gauge
pulsar_entry_size_le_1_mb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_entry_size_le_overflow gauge
pulsar_entry_size_le_overflow{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_entry_size_count gauge
pulsar_entry_size_count{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
# TYPE pulsar_entry_size_sum gauge
pulsar_entry_size_sum{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/metadata",partition="-1"} 0.0 1642722619078
pulsar_storage_write_latency_le_0_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_1{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_10{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_20{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_50{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_100{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_200{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_1000{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_overflow{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_count{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_sum{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_0_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_1{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_10{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_20{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_50{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_100{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_200{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_1000{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_overflow{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_count{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_sum{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_128{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_512{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_1_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_2_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_4_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_16_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_100_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_1_mb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_overflow{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_entry_size_count{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_entry_size_sum{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/coordinate",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_0_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_1{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_10{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_20{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_50{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_100{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_200{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_le_1000{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_overflow{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_count{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_write_latency_sum{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_0_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_1{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_5{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_10{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_20{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_50{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_100{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_200{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_le_1000{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_overflow{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_count{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_storage_ledger_write_latency_sum{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_128{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_512{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_1_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_2_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_4_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_16_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_100_kb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_1_mb{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_entry_size_le_overflow{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_entry_size_count{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
pulsar_entry_size_sum{cluster="standalone",namespace="public/functions",topic="persistent://public/functions/assignments",partition="-1"} 0.0 1642722619079
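
For comparison, the lines above could be rewritten into the shape the spec describes. The following is a minimal Python sketch, not Pulsar code: `to_spec_bucket` is a hypothetical helper that only renames a sample, turning the embedded bound into an `le` label and the overflow series into `le="+Inf"`. It assumes the underscore inside a bound stands for a decimal point (`0_5` meaning `0.5`). Unit-suffixed names such as `pulsar_entry_size_le_1_kb` are rejected rather than guessed at, and the values and `# TYPE` lines are left untouched.

```python
import re

# Matches names like "pulsar_storage_write_latency_le_0_5",
# "..._overflow" and "..._le_overflow". Unit-suffixed bounds
# (e.g. "_le_1_kb") deliberately do not match.
_BUCKET_RE = re.compile(
    r"^(?P<base>.+?)_(?:le_(?P<bound>\d+(?:_\d+)?)|(?:le_)?overflow)$"
)


def to_spec_bucket(name: str, labels: str, value: str) -> str:
    """Rename one malformed bucket sample into `<base>_bucket{...,le="..."}`.

    `labels` is the text between the braces of the original sample.
    Assumption: "0_5" encodes the bound 0.5.
    """
    m = _BUCKET_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised bucket name: {name}")
    bound = m.group("bound")
    le = "+Inf" if bound is None else bound.replace("_", ".")
    sep = "," if labels else ""
    return f'{m.group("base")}_bucket{{{labels}{sep}le="{le}"}} {value}'


print(to_spec_bucket("pulsar_storage_write_latency_le_0_5",
                     'cluster="standalone"', "0.0"))
```

A real fix would also need a single `# TYPE <base> histogram` line per metric family in place of the per-bucket gauge declarations.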

To Reproduce

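Steps to reproduce the behavior: start Pulsar standalone with the Compose file that follows the command, then request the metrics endpoint:

curl -L http://localhost:8080/metrics

docker-compose.yml:
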
version: '3'

services:
  pulsar:
    container_name: pulsar
    image: apachepulsar/pulsar:2.9.1
    command:
    - bash
    - -c
    - >
      bin/apply-config-from-env-with-prefix.py BOOKKEEPER_ conf/bookkeeper.conf &&
      bin/apply-config-from-env-with-prefix.py BROKER_ conf/broker.conf &&
      bin/apply-config-from-env-with-prefix.py STANDALONE_ conf/standalone.conf &&
      exec bin/pulsar standalone
    ports:
    - '6650:6650'
    - '8080:8080'
    environment:
    - BOOKKEEPER_enableStatistics=true
    - BOOKKEEPER_prometheusStatsHttpPort=8080
    - BROKER_exposeTopicLevelMetricsInPrometheus=true
    - BROKER_exposeConsumerLevelMetricsInPrometheus=true
    - BROKER_exposeProducerLevelMetricsInPrometheus=true
    - BROKER_exposeManagedLedgerMetricsInPrometheus=true
    - BROKER_exposeManagedCursorMetricsInPrometheus=true
    - BROKER_exposePublisherStats=true
    - BROKER_exposePreciseBacklogInPrometheus=true
    - BROKER_splitTopicAndPartitionLabelInPrometheus=true
    - STANDALONE_exposeTopicLevelMetricsInPrometheus=true
    - STANDALONE_exposeConsumerLevelMetricsInPrometheus=true
    - STANDALONE_exposeProducerLevelMetricsInPrometheus=true
    - STANDALONE_exposeManagedLedgerMetricsInPrometheus=true
    - STANDALONE_exposeManagedCursorMetricsInPrometheus=true
    - STANDALONE_exposePublisherStats=true
    - STANDALONE_exposePreciseBacklogInPrometheus=true
    - STANDALONE_splitTopicAndPartitionLabelInPrometheus=true
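
To check a scrape for the problem without reading it by eye, something like the following Python sketch could be used. `embedded_bucket_names` is a hypothetical helper, not part of Pulsar or of any Prometheus client library. It simply lists sample names that end in an `_le_<bound>` or `_overflow` suffix, which is the pattern seen in the output above. The suffix pattern is an assumption based on that output and could produce false positives on unrelated metrics.

```python
import re

# Leading metric name of a sample line in the text exposition format.
_SAMPLE_NAME = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)")
# Suffixes that look like a bucket bound baked into the name (assumed pattern).
_EMBEDDED = re.compile(r"(?:_le_[0-9a-z_]+|_overflow)$")


def embedded_bucket_names(metrics_text: str) -> list:
    """Return the sorted, de-duplicated sample names with an embedded bound."""
    found = set()
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and # TYPE / # HELP comments
        m = _SAMPLE_NAME.match(line)
        if m and _EMBEDDED.search(m.group(1)):
            found.add(m.group(1))
    return sorted(found)


sample = """\
# TYPE pulsar_storage_write_latency_le_0_5 gauge
pulsar_storage_write_latency_le_0_5{cluster="standalone"} 0.0 1642722619078
pulsar_storage_write_latency_overflow{cluster="standalone"} 0.0 1642722619078
pulsar_storage_write_latency_count{cluster="standalone"} 0.0 1642722619078
"""
print(embedded_bucket_names(sample))
```

Feeding it the body returned by the curl command should list every affected series. An empty list would suggest the buckets are exported with an `le` label instead.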

Desktop (please complete the following information):

  • OS: Windows 10
@github-actions

The issue had no activity for 30 days, mark with Stale label.

@ofek
Contributor Author

ofek commented Feb 24, 2022

Bump.

@tjiuming
Contributor

It's the doc's issue

@ofek
Contributor Author

ofek commented Mar 10, 2022

No, this is broken

@github-actions

The issue had no activity for 30 days, mark with Stale label.

@ofek
Contributor Author

ofek commented Apr 15, 2022

bump

@github-actions

The issue had no activity for 30 days, mark with Stale label.

@github-actions

The issue had no activity for 30 days, mark with Stale label.

@asafm
Contributor

asafm commented Oct 19, 2022

Hi @ofek, you are absolutely correct.
I've spent the last several months documenting the current state of metrics and released the document to the community just 2-3 weeks ago. As you can see there, this is a known issue.

This document is part of a large effort to refactor how the metrics are defined, used, and exported in Pulsar.

@codelipenghui @merlimat - we potentially don't have to wait for the full refactor and could instead provide a fix just for exporting histograms. It's not a small fix, but it's not a complicated one either. The biggest issue is that once we do that, we break compatibility, so it must be rolled out gradually behind flags (oldHistogram=true, newHistogram=false). WDYT?
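
To make the flag idea concrete, here is a rough Python sketch of what a gradual rollout could look like. It is an illustration only, not the Pulsar exporter: `export_histogram` and its `old_histogram` / `new_histogram` parameters are hypothetical stand-ins for the flag names floated above, and the legacy suffix rule (`le_0_5`, `overflow`) is inferred from the output in the issue description.

```python
def export_histogram(base, labels, buckets, old_histogram=True, new_histogram=False):
    """Render bucket samples in the legacy format, the spec format, or both.

    `buckets` is a list of (bound, value) pairs where `bound` is a string
    such as "0.5" or "+Inf". `labels` is the label text without braces.
    Both flags default to today's behaviour (legacy only).
    """
    lines = []
    if old_histogram:
        for bound, value in buckets:
            # Legacy style: the bound is baked into the metric name.
            suffix = "overflow" if bound == "+Inf" else "le_" + bound.replace(".", "_")
            lines.append(f"{base}_{suffix}{{{labels}}} {value}")
    if new_histogram:
        for bound, value in buckets:
            # Spec style: one _bucket family, bound carried in the le label.
            lines.append(f'{base}_bucket{{{labels},le="{bound}"}} {value}')
    return lines


for line in export_histogram(
    "pulsar_storage_write_latency",
    'cluster="standalone"',
    [("0.5", 1.0), ("+Inf", 2.0)],
    old_histogram=True,
    new_histogram=True,
):
    print(line)
```

With both flags on, dashboards built on the old names keep working while new ones migrate to the `_bucket` series. The old flag can then default to off in a later release.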

@asafm
Contributor

asafm commented Oct 24, 2022

@ofek I forgot to explain that there is another issue you haven't mentioned: histogram bucket values today are delta-resets, meaning most of them are reset at every configurable interval (30 sec/1 min). Prometheus's quantile function (histogram_quantile) assumes the bucket values are cumulative counters that only ever increase, so this needs to be fixed as well.
Fixing it also breaks backward compatibility, of course.
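
As a rough illustration of the difference, the Python sketch below keeps running totals so that exported bucket values never go down between scrapes. It is not how Pulsar stores its stats. `CumulativeBuckets` is a hypothetical class, and it assumes each interval report holds per-range counts that are not yet cumulative across bounds, which is an assumption of this sketch rather than something stated above.

```python
from itertools import accumulate


class CumulativeBuckets:
    """Accumulate per-interval bucket deltas into counter-style totals."""

    def __init__(self, bounds):
        # Upper bounds in ascending order, ending with "+Inf".
        self.bounds = list(bounds)
        self._totals = [0] * len(self.bounds)

    def add_interval(self, deltas):
        """Fold in one interval's per-range counts instead of replacing them."""
        if len(deltas) != len(self._totals):
            raise ValueError("one delta per bucket is required")
        for i, delta in enumerate(deltas):
            self._totals[i] += delta

    def export(self):
        """Return (le, count) pairs, cumulative across bounds as le implies."""
        return list(zip(self.bounds, accumulate(self._totals)))


h = CumulativeBuckets(["0.5", "1", "+Inf"])
h.add_interval([1, 2, 0])  # first 30 s / 1 min window
h.add_interval([0, 1, 1])  # second window adds to the totals
print(h.export())
```

The `+Inf` bucket then equals the total observation count, which is the value a `_count` series would be expected to carry under the same counter semantics.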

@asafm
Contributor

asafm commented Nov 26, 2023

This will be solved as part of the PIP-264 implementation. The parent issue for tracking it is here
