POC: OpenTelemetry metrics rework #3448

Closed
timwoj wants to merge 19 commits from topic/timw/open-telemetry

Conversation

@timwoj (Member) commented Nov 9, 2023

This is a draft PR for a rework of our telemetry subsystem that I’ve been working on. The intention of the PR is to discuss the current code structure and whether anything needs to change before finishing the rest of the implementation.

The goal here is to move the telemetry code out of the Broker framework into a framework of its own, using OpenTelemetry as the underlying transport framework. The unit tests all pass, but the btests are currently failing in a lot of places. The failures are mostly for common reasons (especially for histograms; see the bucket issue noted below).

What’s implemented:

  • Basic family and instrument infrastructure. The existing instrument types are all there with minimal changes to the API for using them.
  • Script-level family and instrument API. This exists just as it does in the Broker version, with no changes.
  • Script-level collection API. This changes quite a bit, and will be described below.

What features are still missing:

  • A thread that reads process state (CPU, memory, etc.) and reports it to Prometheus. I would have liked to use OTel’s observable instruments for this but ran into some snags. The primary one is that each observable ties to only a single metric, so you’d need a separate observable callback for each of CPU, memory, and so on (see the first sketch after this list). That leads to reading the system information multiple times, or having a separate thread read it once on a timer and store it. At that point, there might as well just be a thread that reads it and sets non-observable instruments.
  • Aggregation of metric data to a single node. I could see this being implemented in one of two ways. We could simply make each of the methods in the instrument classes (Counter, Gauge, and Histogram) write the current value to a topic whenever something changes. That could lead to a lot of extra event traffic, though, and may not be terribly efficient. A second method would be to implement an OTel exporter, add it to a periodic metric reader, and have it export the values to the topic on a timer (see the second sketch after this list). We’d need to figure out how to set the values in the instruments directly, since the API OTel exposes only allows you to add to the value.
  • The endpoint attribute that is added to every metric in the Broker implementation is missing.
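
For reference, this is roughly what the observable-instrument route looks like with the OTel C++ metrics API. It's only a sketch: ReadProcessRSS() is a hypothetical helper and the meter/instrument names are made up. The constraint described above shows up directly here, since the callback is bound to this one instrument; CPU, memory, open FDs, and so on would each need their own callback and their own read of the process state (or a shared cache behind a lock).

#include <cstdint>

#include "opentelemetry/metrics/meter.h"
#include "opentelemetry/metrics/meter_provider.h"
#include "opentelemetry/metrics/observer_result.h"
#include "opentelemetry/metrics/provider.h"
#include "opentelemetry/nostd/shared_ptr.h"
#include "opentelemetry/nostd/variant.h"

namespace otel_metrics = opentelemetry::metrics;
namespace nostd = opentelemetry::nostd;

// Hypothetical helper that reads resident memory from /proc or similar.
int64_t ReadProcessRSS();

// Observable callback: the SDK invokes this at collection time, and it is tied
// to exactly one instrument.
static void ObserveRSS(otel_metrics::ObserverResult result, void* /*state*/) {
    using Int64Result = nostd::shared_ptr<otel_metrics::ObserverResultT<int64_t>>;
    if ( nostd::holds_alternative<Int64Result>(result) )
        nostd::get<Int64Result>(result)->Observe(ReadProcessRSS());
}

void RegisterMemoryGauge() {
    auto provider = otel_metrics::Provider::GetMeterProvider();
    auto meter = provider->GetMeter("zeek", "1.0.0");

    // The instrument must stay alive for the callback to keep firing; a real
    // version would store it in the telemetry manager rather than a static.
    static auto gauge = meter->CreateInt64ObservableGauge(
        "process_resident_memory", "Resident memory of the Zeek process", "bytes");
    gauge->AddCallback(ObserveRSS, nullptr);
}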
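
And a rough sketch of the second aggregation option from above: a custom exporter attached to OTel's PeriodicExportingMetricReader, which would publish collected data points to a Broker topic on a timer. BrokerTopicExporter and PublishToTopic() are invented names, the Broker serialization is elided, and this doesn't address the open question of setting instrument values directly on the receiving node.

#include <chrono>
#include <memory>

#include "opentelemetry/sdk/common/exporter_utils.h"
#include "opentelemetry/sdk/metrics/export/metric_producer.h"
#include "opentelemetry/sdk/metrics/export/periodic_exporting_metric_reader.h"
#include "opentelemetry/sdk/metrics/push_metric_exporter.h"

namespace sdk_metrics = opentelemetry::sdk::metrics;

// Hypothetical exporter that forwards each collection pass to a Broker topic.
class BrokerTopicExporter : public sdk_metrics::PushMetricExporter {
public:
    opentelemetry::sdk::common::ExportResult Export(
        const sdk_metrics::ResourceMetrics& data) noexcept override {
        PublishToTopic(data);  // hypothetical: serialize and publish via Broker
        return opentelemetry::sdk::common::ExportResult::kSuccess;
    }

    sdk_metrics::AggregationTemporality GetAggregationTemporality(
        sdk_metrics::InstrumentType) const noexcept override {
        return sdk_metrics::AggregationTemporality::kCumulative;
    }

    bool ForceFlush(std::chrono::microseconds) noexcept override { return true; }
    bool Shutdown(std::chrono::microseconds) noexcept override { return true; }

private:
    void PublishToTopic(const sdk_metrics::ResourceMetrics&) { /* Broker publish */ }
};

// The reader drives Export() on a timer; it would be registered with the SDK
// MeterProvider alongside the Prometheus exporter.
std::unique_ptr<sdk_metrics::PeriodicExportingMetricReader> MakeBrokerReader() {
    sdk_metrics::PeriodicExportingMetricReaderOptions opts;
    opts.export_interval_millis = std::chrono::milliseconds(5000);
    return std::make_unique<sdk_metrics::PeriodicExportingMetricReader>(
        std::make_unique<BrokerTopicExporter>(), opts);
}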

Some broken things in OTel:

  • Histogram buckets aren’t being set correctly. This is a known issue on OTel’s side; there’s an issue open, and we’re waiting for them to fix it.
  • OTel doesn’t handle the kSum aggregation type correctly when it comes to their Prometheus exporter, so it doesn’t set the _total suffix on the metric name before exporting. This is already fixed in an open PR that will hopefully land before the next OTel release.
  • OTel doesn’t handle blank unit values correctly in the Prometheus exporter and always appends the underscore to the metric name. This is also fixed in the same PR as above. We’re already handling both of these on our side, but the workarounds can be removed once the fixes land upstream.

API changes (made and proposed):

  • All of the objects stored by the telemetry framework changed to std::shared_ptrs, including the data passed to OpaqueVals for script land. This required a few changes around the rest of the code base, but nothing really significant.
  • The script-level collection API changes from multiple C++ methods to a single method (a sketch follows this list). Previously we would collect a list of metrics matching a filter, return them to the BIF, and then, for each of those, build a RecordVal for the data and store it in a VectorVal in the BIF itself. This changes to a single method on the manager that returns the VectorVal containing all of the matching records directly, without the intermediate step.
  • I’m fairly certain that the placeholder “1” value for units can be removed, and the default value for the methods that take it can instead become a blank string. OTel will handle the blank string correctly, so the change would only be on our side.
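
As an illustration of the collection change above, the consolidated call could look roughly like the sketch below. This is not the actual code in the branch; the member and helper names (families, Matches(), Instruments(), AsMetricRecord(), collected_metrics_vector_type) are invented for illustration.

// Sketch only: the manager builds the script-level VectorVal directly instead of
// handing intermediate results back to the BIF.
zeek::ValPtr Manager::CollectMetrics(std::string_view prefix_pattern, std::string_view name_pattern)
    {
    auto result = zeek::make_intrusive<zeek::VectorVal>(collected_metrics_vector_type);

    for ( const auto& family : families )
        {
        if ( ! family->Matches(prefix_pattern, name_pattern) )
            continue;

        // One RecordVal per instrument/label combination in the family.
        for ( const auto& instrument : family->Instruments() )
            result->Append(instrument->AsMetricRecord());
        }

    return result;
    }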

Builds on Windows are currently failing because of Conan dependency issues.

@timwoj timwoj force-pushed the topic/timw/open-telemetry branch 2 times, most recently from 2da4d00 to 8c7ef63 on November 10, 2023 16:11
@timwoj timwoj added the fullci (Run the full set of tasks in CI) label Nov 10, 2023
@timwoj timwoj marked this pull request as draft November 10, 2023 16:54
@timwoj timwoj force-pushed the topic/timw/open-telemetry branch 7 times, most recently from 5b1e4bd to f1fbfef on November 13, 2023 22:22
@timwoj timwoj changed the title RFC: OpenTelemetry → POC: OpenTelemetry Nov 14, 2023
@timwoj timwoj changed the title POC: OpenTelemetry → POC: OpenTelemetry metrics rework Nov 14, 2023
@timwoj timwoj force-pushed the topic/timw/open-telemetry branch 7 times, most recently from f80d41a to 11ff00a on November 18, 2023 00:00
@timwoj timwoj removed the fullci (Run the full set of tasks in CI) label Nov 20, 2023
@timwoj timwoj force-pushed the topic/timw/open-telemetry branch 3 times, most recently from 25f564b to df5d2d2 on November 22, 2023 17:24
@awelzel (Contributor) left a comment

I skimmed through and tried some things, and left some comments. I can't say I fully understand all the OTel concepts, and I've also ignored the histogram situation, but otherwise it's not too intimidating :-)

I left a comment about the performance of Inc(); we probably need to figure that out. Otherwise, having no separate stats thread and making callback-based metrics first-class would be interesting. I'm not overly fond of the "update something every x seconds" approach, which may lead to stale metrics.

Review threads were opened on: src/telemetry/Counter.h, src/telemetry/Gauge.h, src/telemetry/ProcessStats.h, src/telemetry/ProcessStats.cc, src/telemetry/Manager.cc

/**
 * Increments the value by @p amount.
 * @pre `amount >= 0`
 */
void Inc(int64_t amount) noexcept { broker::telemetry::inc(hdl, amount); }
void Inc(BaseType amount) noexcept {
    handle->Add(amount, attributes);
}
@awelzel (Contributor) commented:
I smoke-tested performance locally with a 200k SYN packet pcap, and zeek -r time goes from ~10 sec to ~11.3 sec. Not crazy, but it seems like too much. OpenTelemetry stacks show up significantly in flamegraphs for the EventHandler, log write counting, and session handling.

Is it possible to get access to the counter directly? Skimming the stacks, it appears a lookup is done in some hashtable for each Add() call. Alternatively, we could switch to a callback-based / observable instrument that exposes a value we keep ourselves. Otherwise, this introduces significant overhead even when metrics aren't used.

@awelzel (Contributor) commented:

Flamegraph with "telemetry" highlighted.

[screenshot: Screenshot-20231123-164219]

@awelzel (Contributor) commented:

FWIW: caf::telemetry matches 0.2% of samples on current master.

@timwoj (Member, Author) commented:

There is a lookup in the GetOrAdd methods that compares the requested set of labels against all of the existing counters. That's probably what it is, though I haven't actually looked at the flamegraph. In some cases we do store the counters locally (session::Manager, for one), but we don't do that everywhere.

@awelzel (Contributor) commented:

Ah, I took it to be something within handle->Add(amount, attributes): opentelemetry needs to internally find the right counter for the given labels/attributes.

@timwoj (Member, Author) commented:

I'm looking at a flamegraph of it now. I might have to open an issue on their side, as this dives really quickly down into their code; MetricAttributeIterable::ForEachKeyValue is our code. I might be able to optimize the loop itself by storing the values locally as opentelemetry::common::AttributeValue objects instead of strings. That would avoid a conversion every time we do a lookup, but ForEachKeyValue is dominated mostly by the call to the callback method passed as an argument, and that callback comes from OTel itself, so we don't have much control over its performance.

@timwoj (Member, Author) commented Dec 2, 2023:

ec5b2ab cuts down on it a bit by storing a secondary map whose values are views into the existing string map, but in the OTel types, so it doesn't have to do conversions on every call (see the sketch below). With the 200k SYN pcap we use on os-perf-1, it cuts the contribution of zeek::telemetry from 11.5% to 9.3%. There's still a performance hit from the function call, though. I asked on their Slack channel what else we can do about it.
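
For context, the shape of that change is roughly the following (a sketch, not the actual commit): the attribute container keeps a second map whose values are already in OTel's types and point into the owning string map, so ForEachKeyValue() doesn't convert strings on every Add().

#include <map>
#include <string>
#include <string_view>

#include "opentelemetry/common/attribute_value.h"
#include "opentelemetry/common/key_value_iterable.h"
#include "opentelemetry/nostd/function_ref.h"
#include "opentelemetry/nostd/string_view.h"

class MetricAttributeIterable : public opentelemetry::common::KeyValueIterable {
public:
    explicit MetricAttributeIterable(std::map<std::string, std::string> labels)
        : labels_(std::move(labels)) {
        // std::map nodes are stable, so these views stay valid as long as
        // labels_ isn't modified. Copies would need to rebuild the views.
        for ( const auto& [key, value] : labels_ )
            views_.emplace(std::string_view{key},
                           opentelemetry::common::AttributeValue{
                               opentelemetry::nostd::string_view{value.data(), value.size()}});
    }

    bool ForEachKeyValue(opentelemetry::nostd::function_ref<
                             bool(opentelemetry::nostd::string_view,
                                  opentelemetry::common::AttributeValue)>
                             callback) const noexcept override {
        // No per-call conversions: hand the pre-built OTel-typed views straight
        // to the SDK's callback.
        for ( const auto& [key, value] : views_ )
            if ( ! callback(opentelemetry::nostd::string_view{key.data(), key.size()}, value) )
                return false;
        return true;
    }

    size_t size() const noexcept override { return views_.size(); }

private:
    std::map<std::string, std::string> labels_;  // owns the label strings
    std::map<std::string_view, opentelemetry::common::AttributeValue> views_;  // OTel-typed views
};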

@timwoj (Member, Author) commented:

I talked to one of the OTel folks and shared a flamegraph with him. He agreed that it's a performance issue for it to be rehashing the map every time even though the map hasn't changed since the last call. He said he'd look into it as part of some other performance issues he's investigating soon.

Review thread on src/telemetry/Counter.cc (resolved).
@awelzel (Contributor) commented Nov 23, 2023

The endpoint attribute that is added to every metric in the Broker implementation is missing.

But it does add {otel_scope_name} - would be nice if that was suppressed :-)

# HELP zeek_log_stream_writes_total_ Total number of log writes for the given stream.
# TYPE zeek_log_stream_writes_total_ counter
zeek_log_stream_writes_total_{endpoint="",module="PacketFilter",stream="PacketFilter::LOG",otel_scope_name="zeek"} 1
# TYPE process_open_fds_ gauge
process_open_fds_{otel_scope_name="process"} 25

@timwoj (Member, Author) commented Dec 2, 2023

The endpoint attribute that is added to every metric in the Broker implementation is missing.

But it does add {otel_scope_name} - would be nice if that was suppressed :-)

I agree that I'd love for that to be suppressed, but it's apparently part of the OTel spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md#instrumentation-scope-1

Prometheus exporters MUST add the scope name as the otel_scope_name label and the scope version as the otel_scope_version label on all metric points by default, based on the scope the original data point was nested in.

@awelzel
Copy link
Contributor

awelzel commented Dec 4, 2023

I agree that I'd love for that to be suppressed but it's apparently part of the otel spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md#instrumentation-scope-1

Prometheus exporters MUST add the scope name as the otel_scope_name label and the scope version as the otel_scope_version label on all metric points by default, based on the scope the original data point was nested in.

Which is followed by this sentence:

Prometheus exporters SHOULD provide a configuration option to disable the otel_scope_info metric and otel_scope_ labels.

There's no such option? :-)

@timwoj timwoj force-pushed the topic/timw/open-telemetry branch 2 times, most recently from 50679fd to 3d3e30f on December 7, 2023 21:46
@timwoj timwoj force-pushed the topic/timw/open-telemetry branch from 3d3e30f to 6ec89cc on December 8, 2023 00:23
@timwoj (Member, Author) commented Dec 8, 2023

There's no such option? :-)

Not that I could find. I opened open-telemetry/opentelemetry-cpp#2442 after asking on Slack and getting no response for a few days.

EDIT: This will be fixed by open-telemetry/opentelemetry-cpp#2451, once that gets merged and released.

timwoj added 8 commits January 2, 2024 15:12
With the Broker version, the copying was mostly OK because all of the actual telemetry data was stored internally in Broker and accessed via handle objects; copying an object wouldn't matter as long as the handle was maintained. In the OTel version, a copy of the count data is stored local to the Counter/Gauge/Histogram objects, so copies through the OpaqueVal objects will screw things up and the values get "lost".

This is unfortunately a breaking change to the API, as GetOrAdd() now returns a shared_ptr instead of a bare object.

- Reduce code duplication via a set of base classes for Counter/Gauge/Histogram
- Only create a single otel instrument for each prefix/name combination
@timwoj timwoj force-pushed the topic/timw/open-telemetry branch from 2722f68 to dc3dfe5 on January 2, 2024 22:12
@timwoj timwoj force-pushed the topic/timw/open-telemetry branch from 8c124c1 to 9e8cc24 on January 3, 2024 22:49
@timwoj timwoj force-pushed the topic/timw/open-telemetry branch from f69b93c to 6ff2cc3 on January 9, 2024 00:19
@timwoj (Member, Author) commented Jan 18, 2024

The OTel work is on hold until they fix all of the issues I've opened with them, since most of them are blockers for our usage. I'm going to close this and will open a new PR when there's something to review.

@timwoj timwoj closed this Jan 18, 2024