chore(profiling): turn off forking test #11559

taegyunkim · 2024-11-26T21:32:24Z

Turns off the forking test to unblock the PR "fix(crashtracking): resolve issue with zombie processes", which is running behind, before the re:Invent code freeze.

Checklist

PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions · 2024-11-26T21:32:53Z

CODEOWNERS have been resolved as:

ddtrace/internal/datadog/profiling/dd_wrapper/test/CMakeLists.txt       @DataDog/profiling-python

pr-commenter · 2024-11-26T22:12:45Z

Benchmarks

Benchmark execution time: 2024-11-26 22:12:43

Comparing candidate commit 857c936 in PR branch taegyunkim/turnoff-forkdeath with baseline commit 3fabd0a in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 388 metrics, 2 unstable metrics.

nsrip-dd · 2024-11-27T15:17:02Z

ddtrace/internal/datadog/profiling/dd_wrapper/test/CMakeLists.txt

@@ -35,5 +35,7 @@ endfunction()
 dd_wrapper_add_test(test_initialization test_initialization.cpp)
 dd_wrapper_add_test(test_api test_api.cpp)
 dd_wrapper_add_test(test_threading test_threading.cpp)
-dd_wrapper_add_test(test_forking test_forking.cpp)
+# TODO: Re-enable test_forking. It was turned off as it started to fail with
+# v14.3.1 in https://github.com/DataDog/dd-trace-py/pull/11547.


From the failures in that PR (example: https://gitlab.ddbuild.io/DataDog/apm-reliability/dd-trace-py/-/jobs/721327091) it's not immediately clear to me what's going wrong. I'm hesitant to skip these tests as they have helped me catch at least one real problem (a deadlock after forking with the libdatadog string table). Could you say more about why we should skip the tests? Are the failures not indicating a real problem in libdatadog or our library?

Didn't reproduce the failures with just plain cmake + ctest, but was able to produce failures running via scripts/ddtest plus riot. Building from #11547. Needed to add logging to determine this (TODO: send a PR) but in ForkDeathTest.SampleInThreadsAndForkManyFast and others we have child processes dying with segfaults. Built with debug info and got some cores. Here's a crash:

    (gdb) bt
    #0 atomic_load<usize> () at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/sync/atomic.rs:3288
    #1 load () at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/sync/atomic.rs:2396
    #2 push_or_else<*mut core::ffi::c_void, crossbeam_queue::array_queue::{impl#4}::push::{closure_env#0}<*mut core::ffi::c_void>> () at /go/src/github.com/DataDog/apm-reliability/libddprof-build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-queue-0.3.11/src/array_queue.rs:134
    #3 push<*mut core::ffi::c_void> () at /go/src/github.com/DataDog/apm-reliability/libddprof-build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-queue-0.3.11/src/array_queue.rs:206
    #4 {closure#0} () at ddcommon-ffi/src/array_queue.rs:145
    #5 ddog_ArrayQueue_push () at ddcommon-ffi/src/array_queue.rs:143
    #6 0x00007f20d6780012 in Datadog::SynchronizedSamplePool::return_sample (this=<optimized out>, sample=<optimized out>) at /root/project/ddtrace/internal/datadog/profiling/dd_wrapper/src/synchronized_sample_pool.cpp:63
    #7 0x00007f20d678042f in Datadog::SampleManager::drop_sample (sample=<optimized out>, sample=<optimized out>) at /usr/include/c++/8/bits/unique_ptr.h:342
    #8 0x000055c456cc8054 in send_sample (id=<optimized out>) at /root/project/ddtrace/internal/datadog/profiling/dd_wrapper/test/test_utils.hpp:153
    #9 0x000055c456cc819f in emulate_sampler (id=0, sleep_time_ns=10000, done=...) at /root/project/ddtrace/internal/datadog/profiling/dd_wrapper/test/test_utils.hpp:165
    #10 0x00007f20d712ab2f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #11 0x00007f20d6eb8fa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    #12 0x00007f20d6dea06f in clone () from /lib/x86_64-linux-gnu/libc.so.6
    (gdb) p $_siginfo._sifields._sigfault.si_addr
    $1 = (void *) 0x80

Seems like a legitimate issue? Will continue investigating, but I definitely don't want to just turn off these tests at this point.
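For illustration, the kind of logging mentioned above (reporting how each forked child ended) could look roughly like the sketch below. This is not the code from test_forking.cpp or test_utils.hpp; the helper names `describe_child_status` and `run_in_child` are hypothetical, and the sketch assumes a POSIX system and a caller that is single-threaded at the time of the fork.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <functional>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of dd_wrapper): turn a waitpid() status
// into a short message that separates normal exits from signal deaths.
std::string describe_child_status(int status)
{
    if (WIFEXITED(status)) {
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        // A segfault in the child shows up here as SIGSEGV.
        return "killed by signal " + std::to_string(WTERMSIG(status));
    }
    return "unknown status " + std::to_string(status);
}

// Hypothetical helper: run `body` in a forked child, wait for it, and store
// the raw wait status in `status`. Returns false if fork() or waitpid() fails.
bool run_in_child(const std::function<void()>& body, int& status)
{
    assert(body && "body must be callable");

    pid_t pid = fork();
    if (pid < 0) {
        return false;
    }
    if (pid == 0) {
        body();
        _exit(0); // leave the child without running the parent's atexit handlers
    }

    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR); // retry if interrupted by a signal
    return waited == pid;
}
```

A test could call `run_in_child` and print `describe_child_status(status)` on failure, so a child that segfaults is reported as "killed by signal" followed by the value of `SIGSEGV` instead of only showing up as a failed assertion.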

Edit to add: it's also worth noting that the Error uploading (failed ddog_prof_Exporter_send: error trying to connect: received corrupt message of type InvalidContentType: received corrupt message of type InvalidContentType) lines might be noise, since the test (as far as I can tell) isn't set up to point to an agent that actually exists and doesn't even check that uploads succeed. We only see those stderr logs when the test fails.

chore(profiling): turn off forking test

857c936

taegyunkim requested a review from a team as a code owner November 26, 2024 21:32

taegyunkim added the changelog/no-changelog (A changelog entry is not required for this PR.) and Profiling (Continuous Profiling) labels Nov 26, 2024

taegyunkim requested a review from brettlangdon November 26, 2024 21:32

taegyunkim enabled auto-merge (squash) November 26, 2024 21:34

nsrip-dd reviewed Nov 27, 2024
