Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ensure the correct ids are serialized and propagated #3709

Merged
merged 8 commits into from
Jun 17, 2024

Conversation

mabdinur
Copy link
Contributor

What does this PR do?

Motivation:

Additional Notes:

How to test the change?

Unsure? Have a question? Request a review!

@github-actions github-actions bot added otel OpenTelemetry-related changes tracing labels Jun 12, 2024
profiling test updates

fix profiling tests 2

revert

Apply suggestions from code review

lint
@mabdinur mabdinur marked this pull request as ready for review June 13, 2024 20:34
@mabdinur mabdinur requested a review from a team as a code owner June 13, 2024 20:34
@mabdinur
Copy link
Contributor Author

Profiling tests are failing due to 64bit unsigned span_ids being converted to signed integers.

@marcotc Is this a bug in the profiler?

Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just two small changes to suggested.

@marcotc
Copy link
Member

marcotc commented Jun 13, 2024

Profiling tests are failing due to 64bit unsigned span_ids being converted to signed integers.

I don't see any changes in your PR that should affect it, can you figure out what line exactly in your PR is triggering the issue?

@mabdinur
Copy link
Contributor Author

mabdinur commented Jun 14, 2024

Profiling tests are failing due to 64bit unsigned span_ids being converted to signed integers.

I don't see any changes in your PR that should affect it, can you figure out what line exactly in your PR is triggering the issue?

In the failing test the span_id of 13756630617282998475 is encoded as -4690113456426553141. Both numbers have the same 2s complement representation 1011111011101001010111101010100100100001011010100111000011001011.

It seems like the profiler is using signed 64bit integers to correlate spans to threads. However I can't find where this occurs. Looking at the thread collector's implementation it seems like the span_id should be an unsigned integer: https://github.com/DataDog/dd-trace-rb/blob/master/ext/datadog_profiling_native_extension/collectors_thread_context.c#L158. The ruby otel tracer generates span_ids up to 2^64 while the datadog ruby tracer generates span ids up to 2^63 so this wasn't an issue in the past.

cc: @ivoanjo

Update

I think I found the issue. LabelValue.num stores integers as i64. This is where the unsigned 64bit span_ids are miscast to signed integers: https://github.com/DataDog/dd-trace-rb/blob/5a95611148ceb8744c9692c6bc3f57e66094a970/ext/datadog_profiling_native_extension/collectors_thread_context.c#L741-#L742.

@ivoanjo
Copy link
Member

ivoanjo commented Jun 17, 2024

Taking a look, thanks for the ping!

Pprof decodes numeric labels as signed 64-bit values BUT as a slight
cheat both the profiler and the backend interpret them as unsigned
64-bit values so we can represent the full 64-bit values used for span
ids in traces.

But we were not accounting for this in our tests, so the decoded pprofs
we use for testing were not matching the expected values.

I've tweaked the pprof test reading code to account for this.
@ivoanjo ivoanjo requested a review from a team as a code owner June 17, 2024 13:47
@ivoanjo
Copy link
Member

ivoanjo commented Jun 17, 2024

I've pushed a fix in a99f0e5 . Sorry for breaking your PR! I remember thinking "I should correct for the unsigned thing in our tests..." when working on #3466 but clearly ended up not doing it.

As I stated in the commit message -- both the profiler code and the backend know to treat these values as unsigned, and it was really only the test code that also decodes profiles to match on them that wasn't doing the same adjustment.

LMK if I can provide any more help :)

@mabdinur
Copy link
Contributor Author

I've pushed a fix in a99f0e5 . Sorry for breaking your PR! I remember thinking "I should correct for the unsigned thing in our tests..." when working on #3466 but clearly ended up not doing it.

As I stated in the commit message -- both the profiler code and the backend know to treat these values as unsigned, and it was really only the test code that also decodes profiles to match on them that wasn't doing the same adjustment.

LMK if I can provide any more help :)

It's good to know span_ids are represented as unsigned integers in the backend. This fix looks good to me. Thanks for looking into this.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.11%. Comparing base (e086475) to head (a99f0e5).
Report is 21 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3709   +/-   ##
=======================================
  Coverage   98.11%   98.11%           
=======================================
  Files        1225     1225           
  Lines       72795    72815   +20     
  Branches     3482     3486    +4     
=======================================
+ Hits        71420    71441   +21     
+ Misses       1375     1374    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mabdinur
Copy link
Contributor Author

@marcotc This PR should be good to go

@marcotc marcotc merged commit 75278d6 into master Jun 17, 2024
167 checks passed
@marcotc marcotc deleted the munir/fix-otel-span-ids branch June 17, 2024 21:05
@github-actions github-actions bot added this to the 2.2.0 milestone Jun 17, 2024
@TonyCTHsu TonyCTHsu mentioned this pull request Jul 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
otel OpenTelemetry-related changes tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants