Exponential Histogram support to the Prometheus exporter #6015

Merged - 3 commits merged into open-telemetry:main on Jan 3, 2024

Conversation

@fstab (Member) commented Nov 27, 2023

This PR adds support for exponential histograms in the Prometheus exporter as defined in the Spec:

An OpenTelemetry Exponential Histogram with a cumulative aggregation temporality MUST be converted to a Prometheus Native Histogram.

About the Implementation

The upstream Prometheus Java Client library recently introduced a prometheus-metrics-model, which implements immutable read-only MetricSnapshots.

With this PR, we introduced an Otel2PrometheusConverter that converts OpenTelemetry's MetricData to Prometheus' MetricSnapshots.

That way, we can use the upstream HTTP exporter for exposing the MetricSnapshots in various exposition formats.

Benefits

  • In Prometheus, exponential histograms require the Prometheus protobuf exposition format. By using the upstream exposition formats from the Prometheus library this comes for free; there is no need to re-implement the exposition format in opentelemetry-java.
  • The Prometheus community is currently considering a new version of OpenMetrics protobuf. By using the upstream exposition formats it will be trivial to add support for this in opentelemetry-java once it lands, because updating the HTTP server dependency will be sufficient.
  • The new PrometheusMetricReader class implements the Prometheus MultiCollector interface, which means it can be registered as a collector with the PrometheusRegistry. This makes it very easy to mix and match Prometheus instrumentation with OpenTelemetry SDK instrumentation when transitioning. A minimal usage sketch follows below.
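
For illustration, here is a minimal sketch of that mix-and-match usage. This is not code from the PR; the PrometheusMetricReader constructor argument and the extra io.prometheus:prometheus-metrics-core dependency for the plain Counter are assumptions made for the example.

import io.opentelemetry.exporter.prometheus.PrometheusMetricReader;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.prometheus.metrics.core.metrics.Counter;
import io.prometheus.metrics.exporter.httpserver.HTTPServer;
import io.prometheus.metrics.model.registry.PrometheusRegistry;

public final class MixedInstrumentationSketch {
  public static void main(String[] args) throws Exception {
    PrometheusRegistry registry = new PrometheusRegistry();

    // OTel SDK metrics, collected on each scrape via the new reader (a MultiCollector).
    // The boolean constructor argument is an assumption for this sketch.
    PrometheusMetricReader reader = new PrometheusMetricReader(/* otelScopeEnabled= */ true);
    SdkMeterProvider meterProvider =
        SdkMeterProvider.builder().registerMetricReader(reader).build();
    registry.register(reader);
    meterProvider.get("example-scope").counterBuilder("otel_requests").build().add(1);

    // Plain Prometheus client instrumentation living in the same registry.
    Counter legacyCounter = Counter.builder().name("legacy_requests_total").register(registry);
    legacyCounter.inc();

    // The upstream HTTP server exposes both, negotiating text or protobuf per scrape request.
    try (HTTPServer server = HTTPServer.builder().port(9464).registry(registry).buildAndStart()) {
      System.out.println("Scrape http://localhost:9464/metrics");
      Thread.currentThread().join();
    }
  }
}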

Dependencies

This PR adds a dependency on io.prometheus:prometheus-metrics-exporter-httpserver. However, the upstream module does not introduce any transitive dependencies outside the io.prometheus group. The Prometheus protobuf support is shaded and does not introduce any external dependency. Here's the dependency tree as produced by ./gradlew :exporters:prometheus:dependencies:

\--- io.prometheus:prometheus-metrics-exporter-httpserver:1.1.0
     \--- io.prometheus:prometheus-metrics-exporter-common:1.1.0
          +--- io.prometheus:prometheus-metrics-model:1.1.0
          \--- io.prometheus:prometheus-metrics-exposition-formats:1.1.0
               +--- io.prometheus:prometheus-metrics-model:1.1.0
               +--- io.prometheus:prometheus-metrics-config:1.1.0
               \--- io.prometheus:prometheus-metrics-shaded-protobuf:1.1.0

Related

This PR implements #5940 and fixes #5997. It will also make it easy to implement #6013.


codecov bot commented Nov 27, 2023

Codecov Report

Attention: 87 lines in your changes are missing coverage. Please review.

Comparison is base (ffd53c7) 91.11% compared to head (f1f027d) 90.94%.

Files Patch % Lines
.../exporter/prometheus/Otel2PrometheusConverter.java 71.53% 58 Missing and 20 partials ⚠️
...porter/prometheus/PrometheusHttpServerBuilder.java 37.50% 5 Missing ⚠️
...etry/exporter/prometheus/PrometheusHttpServer.java 89.65% 3 Missing ⚠️
...ry/exporter/prometheus/PrometheusMetricReader.java 90.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6015      +/-   ##
============================================
- Coverage     91.11%   90.94%   -0.18%     
+ Complexity     5736     5676      -60     
============================================
  Files           628      626       -2     
  Lines         16810    16637     -173     
  Branches       1662     1649      -13     
============================================
- Hits          15317    15131     -186     
- Misses         1028     1050      +22     
+ Partials        465      456       -9     


@asafm (Contributor) left a comment

As mentioned in the inline comment, the memory cost here is severe and collides head on with the REUSABLE_DATA MemoryMode. I'm in the middle of implementing it, and this change - a data conversion that costs lots of allocations and, worse, hands control over to a library outside the scope of this project - makes completing MemoryMode impossible.

My suggestion, as noted in your proposal, is to implement this within the existing implementation. It's a small amount of code, zero disruption to the OTel Java SDK, and most importantly: no external dependencies. My two comments on the proposal explain it in more detail.

@fstab (Member, Author) commented Nov 27, 2023

Thanks for looking into this @asafm.

Implement this with existing implementation. It's a small amount of code, 0 disruption to OTel Java SDK

I don't think this is true. Adding support for Prometheus native histograms is not possible within the existing implementation, as they require a different data model and exposition format. This is a major refactoring in any case. Moreover, converting OpenTelemetry's exponential histograms to Prometheus' native histograms is not trivial.

no external dependencies

I actually think the upstream dependency for the Prometheus endpoint is a good thing.

First, it's not a random 3rd party dependency. Prometheus and OpenTelemetry are both CNCF projects, and we pledged mutual compatibility in the CNCF governance.

The previous implementation of the Prometheus exporter copied the upstream Prometheus code into opentelemetry-java. This will become increasingly difficult to maintain as we introduce complex metric types like Prometheus native histograms and more exposition formats like Prometheus protobuf and OpenMetrics protobuf.

Moreover, from a community perspective, using a common Prometheus metrics implementation will help keep the communities aligned. If we use a common implementation we will quickly recognize incompatibilities with future changes; if OpenTelemetry keeps using its own copy of the upstream code, there is a higher risk of slight differences going unnoticed.

cost lots of allocations

I'm not convinced that caching objects in memory results in better gc performance than allocating short-lived objects. After all, the gc marks live objects, and the more live objects you have the longer the mark phase.

However, if there is a way to improve gc performance in the Prometheus exporter it would be better to fix this upstream so that both communities can benefit rather than copying the code into opentelemetry-java and only fixing it there.

Anyway, I would appreciate if you could consider this PR. The only alternative I see is to duplicate the upstream implementation, copy it into opentelemetry-java, and have duplicate maintenance effort for all future developments like OpenMetrics protobuf. I don't think this is the better solution.

@asafm (Contributor) commented Nov 27, 2023

I don't think this is true. Adding support for Prometheus native histograms is not possible within the existing implementation, as they require a different data model and exposition format. This is a major refactoring in any case. Moreover, converting OpenTelemetry's exponential histograms to Prometheus' native histograms is not trivial.

You are correct about the exposition format. Your PR title is a bit misleading: this PR is actually adding the protobuf-based exposition format to the Prometheus exporter.

It's not trivial, but it's not rocket science or any wild algorithm either.
In the end we're talking about two things:

  1. Writing protobuf in a certain format - very much like we do today in the OTLP exporter.
  2. Converting from OTel data structures to Prometheus data structures.

Regarding (2): converting an exponential histogram to a Prometheus native histogram is 20 lines of code in your PR. It doesn't seem overly complex.
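
For readers following along, here is a rough, illustrative sketch of the index arithmetic such a conversion involves. It is not the PR's converter: the +1 shift reflects my reading of the two bucket-boundary conventions, and scale clamping, the zero bucket, and negative buckets are omitted.

import io.opentelemetry.sdk.metrics.data.ExponentialHistogramBuckets;
import java.util.ArrayList;
import java.util.List;

final class NativeHistogramIndexSketch {

  /** Simple (bucketIndex, count) pair standing in for the Prometheus snapshot type. */
  static final class IndexedBucket {
    final int bucketIndex;
    final long count;

    IndexedBucket(int bucketIndex, long count) {
      this.bucketIndex = bucketIndex;
      this.count = count;
    }
  }

  // OTel bucket i covers (base^i, base^(i+1)]; a Prometheus native bucket i covers (base^(i-1), base^i].
  // A count in OTel bucket i therefore lands in Prometheus bucket i + 1 (assumption based on both specs).
  static List<IndexedBucket> convert(ExponentialHistogramBuckets otelBuckets) {
    List<IndexedBucket> result = new ArrayList<>();
    int offset = otelBuckets.getOffset();
    List<Long> counts = otelBuckets.getBucketCounts();
    for (int i = 0; i < counts.size(); i++) {
      long count = counts.get(i);
      if (count != 0) {
        result.add(new IndexedBucket(offset + i + 1, count));
      }
    }
    return result;
  }
}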

I looked at the Prometheus metrics library you wish to use: it's insanely expensive in memory allocations. It generates tons of objects as it translates to another model internally.

My intention with MemoryMode.REUSABLE_DATA, as detailed in my proposal, is to make sure the exporters merely take the OTel aggregated data structures and write them to the wire - no middle-man objects. This is truly the case for the current Prometheus exporter; the OTLP exporter needs to be refactored to get there.

Adding another exporter that lives outside of this project makes that almost impossible to do.

This will become increasingly difficult to maintain as we introduce complex metric types like Prometheus native histograms and more exposition formats like Prometheus protobuf and OpenMetrics protobuf.

Protocols tend to become stable and not evolve much, especially metric ones, so saying "increasingly difficult to maintain" is simply not true IMO. Protocols don't change often; implementing one is a one-time effort.
Look at how much the Prometheus text exposition format has changed over the last 10 years - not much at all.

I'm not convinced that caching objects in memory results in better gc performance than allocating short-lived objects. After all, the gc marks live objects, and the more live objects you have the longer the mark phase.

I beg to differ. We have seen this in production in Apache Pulsar - it's an absolute killer.
Do you think low-latency systems like Pulsar or Kafka are perfectly fine with allocating millions of objects? Those systems and other databases are built around the notion of avoiding memory allocations where possible. Take M3DB, written in Go by Uber, for example - you can see how they use object pooling there as well, because it's highly sensitive to latency.

Again, in my opinion, merging this PR kills the MemoryMode work that I'm doing right now and started before you submitted your proposal.

My suggestion: protocols are simple, hence it's not rocket science for OTel to implement the protobuf-based exposition format itself. They also don't change often, so maintenance is easy.

OTel is a hugely impactful project that will be used in all kinds of applications, and as such it should remain zero-dependency, as that ensures it retains full control, especially in light of the future development it will introduce.

@trask (Member) commented Nov 27, 2023

Having a dependency on https://github.com/prometheus/client_java is potentially a good thing.

It's the official Prometheus Java client, and so I wouldn't generally expect to try to avoid it when exporting to Prometheus.

if there is a way to improve gc performance in the Prometheus exporter it would be better to fix this upstream

I think it would be great to explore this further.

@jack-berg (Member) commented

I actually think the upstream dependency for the Prometheus endpoint is a good thing.

Having a dependency on https://github.com/prometheus/client_java is potentially a good thing.

I hadn't thought about this problem from that perspective. We actually take this approach with all the other built-in exporters:

  • ZipkinSpanExporter depends on io.zipkin.reporter2:zipkin-reporter and io.zipkin.reporter2:zipkin-sender-okhttp3, and converts to the zipkin data model in OtelToZipkinSpanTransformer.
  • JaegerGrpcSpanExporter depends on the jaeger protos and converts to the jaeger data model in SpanMarshaler.
  • JaegerThriftSpanExporter depends on io.jaegertracing:jaeger-client and converts to the jaeger data model in Adapter.

However, prometheus is the exception. As @fstab points out, "The previous implementation of the Prometheus exporter copied the upstream Prometheus code into opentelemetry-java": the current OpenTelemetry code was based on the prometheus TextFormat, as referenced in Serializer. But if you look at #4178, where the prometheus serialization was originally added, part of the intent was definitely performance oriented - to avoid extra allocations:

It's nice to also get some performance by directly serializing given how huge they can be (in particular, lots of arrays allocated for histograms in the adapter code that are eliminated). Many Java apps have a strong correlation between full GC and Prometheus scrapes :)

I think the philosophical question we need to answer is: do we have an obligation to hold ourselves to a different or higher performance bar for the built-in exporters we provide than the upstream implementations? I think probably not.

But I still want to have a good answer for performance-centric use cases. In my opinion, the answer is to lean into optimizing OTLP (as I discussed in this comment) and to depend on prometheus support for native OTLP ingest. @asafm counters that the road to native OTLP ingest could be a long one, pointing to other systems in the prometheus ecosystem like M3DB, Cortex, and VictoriaMetrics. I did a quick search, and it looks like OTLP support is not on the radar of M3DB, is likely to be supported in Cortex, and ❗is supported in VictoriaMetrics❗, so I think it might not actually take that long.

@fstab (Member, Author) commented Nov 28, 2023

Thanks @jack-berg.

OTLP support in the Prometheus ecosystem

For completeness:

@asafm (Contributor) commented Nov 29, 2023

Thanks @jack-berg for the detailed response. It added the context I lacked here.
After carefully reading your response, and thinking about it with all the context provided, I agree with your approach, which basically says: let's refactor the OTLP exporters to have almost zero memory allocations, since OTel owns and controls that protocol. If you are performance sensitive and decide to use the REUSABLE_DATA memory mode, you should opt in to the OTLP exporters. If you work with the Prometheus database specifically, you can ship to the OTel collector and have it serve the Prometheus format. If you work with others like M3DB or Cortex, the collector can ship to them using the remote write protocol.

In light of the other exporters already using 3rd party libraries, it makes sense for Prometheus to follow suit.

My personal answer to your philosophical question is a bit different, but it doesn't change the fact that I agree with the approach chosen. OTel, as it stands today, is IMO a 10x better metrics library than any other, at least in the Java ecosystem. As such, it could be even better by having its implementation set a higher bar performance-wise. It's a bit like the Apple ecosystem - you own everything in the chain, hence it ultimately performs better, with the disadvantages that brings.

One point that also made me lean toward your approach is that eventually the industry will consolidate onto a single protocol and set of libraries. The OTLP protocol and the OTel SDK will become the de facto standard, hence usage of the Prometheus exporter will eventually decline and it will not be the go-to exporter.

Practically speaking, it means that in the Apache Pulsar documentation I'll recommend not configuring the Prometheus exporter and opting for the OTLP exporters instead, whether targeting the OTel collector or a vendor supporting OTLP directly.

@jack-berg (Member) left a comment

Thanks a lot for working on this - I saw several things that significantly simplify understanding the code and make it less brittle.

exporters/prometheus/build.gradle.kts (resolved)
* @see <a href="https://prometheus.io/docs/practices/naming/#base-units">Prometheus best practices
* for units</a>
*/
final class PrometheusUnitsHelper {
Member

@psx95 would you care to review the changes to this file to see if any of the work in #5400 is undone?

Contributor

Apologies for the delayed response @jack-berg, this notification got missed - I will take a look and raise any concerns I find.

Member

Thanks @psx95!

// Null unit
Arguments.of(null, null),
// Misc - unit cleanup - no case match special char
Arguments.of("$1000", "1000"));
Member

Can we adapt / retain this test to confirm the new behavior of PrometheusUnitsHelper?

@fstab (Member, Author) commented Dec 15, 2023

Good point. I restored the test, but I also changed some tests that I think were wrong:

  • Arguments.of("m/m", "meters_per_minute") -> Arguments.of("m/min", "meters_per_minute")
  • Arguments.of("W/w", "watts_per_week"), -> Arguments.of("W/wk", "watts_per_week")
  • Arguments.of("TBy/y", "terabytes_per_year") -> Arguments.of("TBy/a", "terabytes_per_year")
  • Arguments.of("g/g", "grams_per_g") -> Arguments.of("g/x", "grams_per_x") (this is supposed to test an unknown "per" unit, but g is not unknown)
  • Removed all tests for removing special characters like Arguments.of("__test $/°C", "test_per_C") because I could not find any reference that these characters should be removed.

Please let me know if I got it wrong.
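
For illustration, here is a tiny sketch of the kind of "per" unit expansion these test cases imply. The maps and method name are assumptions made for the example, not the exporter's actual tables.

import java.util.Map;

final class PerUnitSketch {

  // Illustrative subsets only; the real exporter's unit tables are larger.
  private static final Map<String, String> UNIT_NAMES =
      Map.of("m", "meters", "W", "watts", "TBy", "terabytes", "g", "grams");

  private static final Map<String, String> PER_UNIT_NAMES =
      Map.of("min", "minute", "wk", "week", "a", "year", "mo", "month");

  /** "m/min" -> "meters_per_minute"; unknown "per" units pass through, e.g. "g/x" -> "grams_per_x". */
  static String expand(String ucumUnit) {
    String[] parts = ucumUnit.split("/", 2);
    if (parts.length != 2) {
      return UNIT_NAMES.getOrDefault(ucumUnit, ucumUnit);
    }
    String numerator = UNIT_NAMES.getOrDefault(parts[0], parts[0]);
    String denominator = PER_UNIT_NAMES.getOrDefault(parts[1], parts[1]);
    return numerator + "_per_" + denominator;
  }
}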

Member

This seems reasonable to me. @psx95 please take a look if you can.

@psx95 (Contributor) commented Jan 8, 2024

I just wanted to confirm the following -

  • In collector-contrib, there is a separate list of 'per' units, which is why the "per" test cases were using those units specifically.
  • I do not see "wk" as a specified unit in the collector. Are we okay with the discrepancy between collector and SDK?
  • In the collector, certain units - for instance "mo", "y" - are only used for "per" unit cases, i.e. only in the second part of the metric unit when split using "/". IIUC, the current implementation might use the mappings for any part of the unit. Is this intentional?
  • Regarding removing unsupported symbols - this came from the OTel spec doc, OTLP -> Prometheus. It mentions that the metric name is supposed to match the regex [a-zA-Z_:]([a-zA-Z0-9_:])*, which makes symbols like $ invalid; they are replaced with _ instead.
  • In places where the spec was ambiguous, the unit conversion implementation in Append unit to prometheus metric names #5400 was done to be consistent with the collector implementation.

Member

I do not see "wk" as a specified unit in the collector. Are we okay with the discrepancy between collector and SDK ?

While I like the idea of symmetry with the collector, the wk unit is defined in UCUM, so I'm ok including it. UCUM is referenced in the metric semantic conventions for defining units.

In the collector-contrib, there is a separate list of 'per' unit(s). Which is why the "per" test cases were using those units specifically.
In the collector, certain units - for instance "mo", "y" are only used for "per" unit cases i.e. only in the second part of the metric unit when split using "/". IIUC, the current implementation might use the mappings for any part of the unit. Is this intentional ?

The specification doesn't include any language that restricts the *_per_* conversion to a subset of units, so I think it's fine. It also makes the behavior simpler to understand.

Regarding removing unsupported symbols - this came from the OTel spec doc, OTLP -> Prometheus. It mentions that the metric name is supposed to match the regex [a-zA-Z_:]([a-zA-Z0-9_:])*, which makes symbols like $ invalid; they are replaced with _ instead.

Metric names are still sanitized according to those rules. However, we now delegate to the prometheus client library: https://github.com/open-telemetry/opentelemetry-java/blob/main/exporters/prometheus/src/main/java/io/opentelemetry/exporter/prometheus/Otel2PrometheusConverter.java#L448

In places where the spec was ambiguous, the unit conversion implementation in #5400 was done to be consistent with the collector implementation.

If we've diverged in any ways that aren't discussed here, I'd love to resolve those inconsistencies.

Contributor

I see, thanks for clearing this up.

One thing I did notice is that the definition of illegal characters differs between the Prometheus client library and the regex defined in the OTel spec (the OTel spec does not allow .):

  • Regex allowed by OTel Spec - [a-zA-Z_:]([a-zA-Z0-9_:])*
  • Regex allowed by Prometheus Client Library - ^[a-zA-Z_.:][a-zA-Z0-9_.:]+$

Contributor

If we've diverged in any ways that aren't discussed here, I'd love to resolve those inconsistencies.

I think we should add back the tests that were testing for removal of special characters and leading/trailing _ in the unit.

Regarding special characters: while it is not explicitly specified in the spec, it is implicitly indicated by the regex used in the prometheus client library.

Regarding leading/trailing _: removing these is not mandated by the spec, but the unit conversion implementation in the collector does not allow any leading or trailing _.

@fstab (Member, Author) commented Jan 10, 2024

illegal characters

Here's the context: The Prometheus community decided (see dev summit 2023-09) to allow UTF-8 characters in metric and attribute names for better OTel compatibility.

This is not implemented in the Prometheus server yet, so today the Prometheus exposition format will only create metric names and attribute names matching [a-zA-Z_:]([a-zA-Z0-9_:])* (without the dot).

However, we extended the Prometheus client library to allow dots in metric names, hence the new regexp that allows dots.

The idea is: users can use dots when creating metrics, so they can use metric names compliant with OpenTelemetry's semantic conventions today. Dots are implicitly converted to underscores in the exposition format. However, as soon as the Prometheus server and exposition format support dots, users can enable dots without having to re-instrument their applications.

TL;DR: While dots are allowed when creating metrics, they are implicitly replaced with underscores when exposing metrics, at least until dots become officially supported by the Prometheus server and exposition format.
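
To make the two regexes concrete, here is a small illustrative sketch - my reading of the rules quoted above, not library code: dots are accepted when a metric is created, and everything outside the stricter exposition charset, including dots, is mapped to an underscore at exposition time.

final class MetricNameSketch {

  /** Validation per the extended client-library rule quoted above: dots allowed. */
  static boolean isValidAtCreation(String name) {
    return name.matches("^[a-zA-Z_.:][a-zA-Z0-9_.:]+$");
  }

  /**
   * Exposition-time sanitization toward [a-zA-Z_:]([a-zA-Z0-9_:])*: '.', '$', etc. become '_'.
   * (A leading digit would additionally need an underscore prefix, omitted here.)
   */
  static String sanitizeForExposition(String name) {
    return name.replaceAll("[^a-zA-Z0-9_:]", "_");
  }

  public static void main(String[] args) {
    System.out.println(isValidAtCreation("http.server.duration"));      // true: dots allowed at creation
    System.out.println(sanitizeForExposition("http.server.duration"));  // http_server_duration
  }
}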

Contributor

Thanks for the context @fstab.

@fstab (Member, Author) commented Dec 15, 2023

Thanks a lot for the review @jack-berg. I worked through the comments, rebased, and pushed a commit addressing the review feedback. Please review again.

@jack-berg (Member) left a comment

There are still a number of unaddressed comments (maybe they were hidden in the GitHub UI or something?), but the only non-trivial one is the question of whether to use the default registry by default.

dependencyManagement/build.gradle.kts (resolved)
exporters/prometheus/build.gradle.kts (resolved)

@fstab (Member, Author) commented Dec 15, 2023

There are still a number of unaddressed comments (maybe they were hidden in the GitHub UI or something?), but the only non-trivial one is the question of whether to use the default registry by default.

Oh my, I didn't expand the "hidden conversations". Sorry for that, I'll work through them asap.

Signed-off-by: Fabian Stäber <[email protected]>
@fstab (Member, Author) commented Dec 17, 2023

I worked through the rest of the comments; sorry for missing the hidden conversations previously.

@jack-berg (Member) left a comment

Looks great to me. Thanks!

@jack-berg (Member) commented

Heads up @open-telemetry/java-approvers - I'd like to get this merged for the January 1.34.0 release so please leave any comments / feedback you have with enough time allotted to respond.

@jack-berg (Member) commented

Will merge tomorrow if there are no additional comments.

@jack-berg jack-berg merged commit d45fb3f into open-telemetry:main Jan 3, 2024
17 of 18 checks passed
metricData, PrometheusType.forMetric(metricData)));
}

private static Stream<Arguments> provideRawMetricDataForTest() {
@psx95 (Contributor) commented Jan 8, 2024

IIUC, these test cases are still valid. If this is indeed the case, is there a way we could retain them?

This is mostly because the OTel spec doc places some constraints on metric names depending on the type of metric.

Member

Check out #6138 where I restore these test cases.

@psx95 (Contributor) commented Jan 8, 2024

@jack-berg, @fstab I left a couple of comments (mostly around tests). Let me know if they make sense to you.

Apologies for the late response.
