Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

active_job: Context propagation is broken #1131

Closed
lavoiesl opened this issue Aug 20, 2024 · 1 comment · Fixed by #1132
Closed

active_job: Context propagation is broken #1131

lavoiesl opened this issue Aug 20, 2024 · 1 comment · Fixed by #1132
Labels
bug Something isn't working

Comments

@lavoiesl
Copy link
Contributor

lavoiesl commented Aug 20, 2024

Description of the bug

I believe #677 (comment) caused Context propagation to break.

This is a bit hard to detect since a link is created if the execution is within a span, but it links to the parent span rather than the span newly created

Share details about your runtime

Operating system details: Linux, Ubuntu 20.04 LTS
RUBY_ENGINE: "ruby"
RUBY_VERSION: "3.3.4"
RUBY_DESCRIPTION: "ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin23]"

Share a simplified reproduction if possible

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "opentelemetry-api"
  gem "opentelemetry-sdk"
  gem "opentelemetry-instrumentation-active_job", "0.7.6"
  gem "activejob", "7.1.3.4"
  gem "activesupport", "7.1.3.4"
end

require "opentelemetry-api"
require "opentelemetry-sdk"
require "opentelemetry-instrumentation-active_job"
require "active_job"

EXPORTER = OpenTelemetry::SDK::Trace::Export::InMemorySpanExporter.new
span_processor = OpenTelemetry::SDK::Trace::Export::SimpleSpanProcessor.new(EXPORTER)

OpenTelemetry::SDK.configure do |c|
  c.add_span_processor(span_processor)
  c.use("OpenTelemetry::Instrumentation::ActiveJob", { propagation_style: :link, span_naming: :queue })
end

SimpleTracer = OpenTelemetry.tracer_provider.tracer("Bug Report")

class MyJob < ActiveJob::Base
  def perform
  end
end

class TestAdapterTest < ActiveSupport::TestCase
  setup do
    EXPORTER.reset
  end

  test "test adapter" do
    singleton_class.include(ActiveJob::TestHelper)
    ActiveJob::Base.queue_adapter = :test

    MyJob.perform_later
    perform_enqueued_jobs

    publish, process = EXPORTER.finished_spans.sort_by(&:start_timestamp)

    assert_nil(publish.links)

    # Fails, no links
    assert_not_nil(process.links)
    assert_equal(publish.hex_span_id, process.links.first.span_context.hex_span_id)
  end

  test "test adapter + in_span" do
    singleton_class.include(ActiveJob::TestHelper)
    ActiveJob::Base.queue_adapter = :test

    SimpleTracer.in_span("tests") do
      MyJob.perform_later
    end

    perform_enqueued_jobs

    tests, publish, process = EXPORTER.finished_spans.sort_by(&:start_timestamp)

    assert_nil(publish.links)
    assert_not_nil(process.links)

    # Fails, maps to the "tests" span
    assert_equal(publish.hex_span_id, process.links.first.span_context.hex_span_id)
  end

  test "inline adapter" do
    ActiveJob::Base.queue_adapter = :inline

    MyJob.perform_later

    publish, process = EXPORTER.finished_spans.sort_by(&:start_timestamp)

    assert_nil(publish.links)
    assert_not_nil(process.links)

    # Succeeds, by accident
    assert_equal(publish.hex_span_id, process.links.first.span_context.hex_span_id)
  end

  test "inline adapter + in_span" do
    ActiveJob::Base.queue_adapter = :inline

    SimpleTracer.in_span("tests") do
      MyJob.perform_later
    end

    tests, publish, process = EXPORTER.finished_spans.sort_by(&:start_timestamp)

    assert_nil(publish.links)
    assert_not_nil(process.links)

    # Fails, maps to the "tests" span
    assert_equal(publish.hex_span_id, process.links.first.span_context.hex_span_id)
  end

  test "async adapter" do
    ActiveJob::Base.queue_adapter = :async
    ActiveJob::Base.queue_adapter.immediate = true

    MyJob.perform_later

    publish, process = EXPORTER.finished_spans.sort_by(&:start_timestamp)

    assert_nil(publish.links)
    assert_not_nil(process.links)

    # Succeeds, by accident
    assert_equal(publish.hex_span_id, process.links.first.span_context.hex_span_id)
  end

  test "async adapter + in_span" do
    ActiveJob::Base.queue_adapter = :async
    ActiveJob::Base.queue_adapter.immediate = true

    SimpleTracer.in_span("tests") do
      MyJob.perform_later
    end

    tests, publish, process = EXPORTER.finished_spans.sort_by(&:start_timestamp)

    assert_nil(publish.links)
    assert_not_nil(process.links)

    # Fails, maps to the "tests" span
    assert_equal(publish.hex_span_id, process.links.first.span_context.hex_span_id)
  end
end

Minitest.run
$ ruby test.rb
Fetching gem metadata from https://rubygems.org/........
Resolving dependencies...
I, [2024-08-20T18:29:06.194617 #48517]  INFO -- : Instrumentation: OpenTelemetry::Instrumentation::ActiveJob was successfully installed with the following options {:propagation_style=>:link, :force_flush=>false, :span_naming=>:queue}
Run options: --seed 17200

# Running:

[ActiveJob] Enqueued MyJob (Job ID: 5d63e49e-8bc0-471b-bc79-06753e1a2c11) to Test(default)
[ActiveJob] [MyJob] [5d63e49e-8bc0-471b-bc79-06753e1a2c11] Performing MyJob (Job ID: 5d63e49e-8bc0-471b-bc79-06753e1a2c11) from Test(default) enqueued at 2024-08-20T22:29:06.278132000Z
[ActiveJob] [MyJob] [5d63e49e-8bc0-471b-bc79-06753e1a2c11] Performed MyJob (Job ID: 5d63e49e-8bc0-471b-bc79-06753e1a2c11) from Test(default) in 9.63ms
F[ActiveJob] [MyJob] [b3634b4e-6e74-44c9-a867-a7ee7eac2fec] Performing MyJob (Job ID: b3634b4e-6e74-44c9-a867-a7ee7eac2fec) from Async(default) enqueued at 2024-08-20T22:29:06.303247000Z
[ActiveJob] [MyJob] [b3634b4e-6e74-44c9-a867-a7ee7eac2fec] Performed MyJob (Job ID: b3634b4e-6e74-44c9-a867-a7ee7eac2fec) from Async(default) in 0.08ms
[ActiveJob] Enqueued MyJob (Job ID: b3634b4e-6e74-44c9-a867-a7ee7eac2fec) to Async(default)
.[ActiveJob] [MyJob] [a9584559-c363-4540-8647-bf6e6ac31f44] Performing MyJob (Job ID: a9584559-c363-4540-8647-bf6e6ac31f44) from Inline(default) enqueued at 2024-08-20T22:29:06.304050000Z
[ActiveJob] [MyJob] [a9584559-c363-4540-8647-bf6e6ac31f44] Performed MyJob (Job ID: a9584559-c363-4540-8647-bf6e6ac31f44) from Inline(default) in 0.06ms
[ActiveJob] Enqueued MyJob (Job ID: a9584559-c363-4540-8647-bf6e6ac31f44) to Inline(default)
F[ActiveJob] [MyJob] [9a31b95c-0425-4c44-84ad-bf4dd7dbee90] Performing MyJob (Job ID: 9a31b95c-0425-4c44-84ad-bf4dd7dbee90) from Async(default) enqueued at 2024-08-20T22:29:06.311565000Z
[ActiveJob] [MyJob] [9a31b95c-0425-4c44-84ad-bf4dd7dbee90] Performed MyJob (Job ID: 9a31b95c-0425-4c44-84ad-bf4dd7dbee90) from Async(default) in 0.23ms
[ActiveJob] Enqueued MyJob (Job ID: 9a31b95c-0425-4c44-84ad-bf4dd7dbee90) to Async(default)
F[ActiveJob] Enqueued MyJob (Job ID: 90c40efb-15b9-448f-b30e-f73a40d3c2ca) to Test(default)
[ActiveJob] [MyJob] [90c40efb-15b9-448f-b30e-f73a40d3c2ca] Performing MyJob (Job ID: 90c40efb-15b9-448f-b30e-f73a40d3c2ca) from Test(default) enqueued at 2024-08-20T22:29:06.317607000Z
[ActiveJob] [MyJob] [90c40efb-15b9-448f-b30e-f73a40d3c2ca] Performed MyJob (Job ID: 90c40efb-15b9-448f-b30e-f73a40d3c2ca) from Test(default) in 0.06ms
F[ActiveJob] [MyJob] [631bc064-413e-4025-9175-7b4cb7f598cd] Performing MyJob (Job ID: 631bc064-413e-4025-9175-7b4cb7f598cd) from Inline(default) enqueued at 2024-08-20T22:29:06.318092000Z
[ActiveJob] [MyJob] [631bc064-413e-4025-9175-7b4cb7f598cd] Performed MyJob (Job ID: 631bc064-413e-4025-9175-7b4cb7f598cd) from Inline(default) in 0.04ms
[ActiveJob] Enqueued MyJob (Job ID: 631bc064-413e-4025-9175-7b4cb7f598cd) to Inline(default)
.

Finished in 0.044545s, 134.6953 runs/s, 381.6365 assertions/s.

  1) Failure:
TestAdapterTest#test_test_adapter_+_in_span [test.rb:66]:
--- expected
+++ actual
@@ -1,3 +1,3 @@
 # encoding: US-ASCII
 #    valid: true
-"1ed9395bd9d225c3"
+"11ce557efe82eff8"


  2) Failure:
TestAdapterTest#test_inline_adapter_+_in_span [test.rb:92]:
--- expected
+++ actual
@@ -1,3 +1,3 @@
 # encoding: US-ASCII
 #    valid: true
-"1c245b894b7b40ad"
+"3ceb892af3243489"


  3) Failure:
TestAdapterTest#test_async_adapter_+_in_span [test.rb:120]:
--- expected
+++ actual
@@ -1,3 +1,3 @@
 # encoding: US-ASCII
 #    valid: true
-"235152d886879174"
+"11eb8bf7f748a17f"


  4) Failure:
TestAdapterTest#test_test_adapter [test.rb:48]:
Expected nil to not be nil.

6 runs, 17 assertions, 4 failures, 0 errors, 0 skips
@lavoiesl lavoiesl added the bug Something isn't working label Aug 20, 2024
@lavoiesl
Copy link
Contributor Author

lavoiesl commented Aug 20, 2024

At first, I thought that perhaps we needed to pass the context created by context_with_span to OpenTelemetry.propagation.inject at

OpenTelemetry.propagation.inject(job.__otel_headers) # This must be transmitted over the wire
{ span: span, ctx_token: OpenTelemetry::Context.attach(OpenTelemetry::Trace.context_with_span(span)) }

But that's when I realized that the current context is never set.

Instead, I think OpenTelemetry::Instrumentation::ActiveJob::Handlers::Default#start needs to add the the current Context stack and for finish_span to pop it.

Actually, the issue is that the context is set before injecting it in job.__otel_headers, so the parent context ends up being injected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant