Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

elasticapm/transport: specific shutdown handling for http transport #2085

Merged
merged 2 commits into from
Jul 26, 2024

Conversation

xrmx
Copy link
Member

@xrmx xrmx commented Jul 18, 2024

What does this pull request do?

We have a race condition at http transport shutdown where our atexit handler is racing against urllib3 ConnectionPool weakref finalizer. Having the urllib3 finalizer called before atexit would lead to have our thread hang waiting for send any eventual queued data via urllib3 pools that are closed.
Force the creation of a new PoolManager so that we are always able to flush.

Related issues

Closes #1975

basepi
basepi previously approved these changes Jul 18, 2024
@xrmx xrmx force-pushed the ensure-to-flush-everything-at-exit branch from 568c0ba to 20d51c0 Compare July 19, 2024 08:36
basepi
basepi previously approved these changes Jul 19, 2024
@xrmx
Copy link
Member Author

xrmx commented Jul 23, 2024

It looks like the tests that replicates the submission timeout is flaky on windows:

=========================== short test summary info ===========================
FAILED tests/transports/test_urllib3.py::test_close_timeout_error_without_flushing - assert True is False
 +  where True = <bound method Event.is_set of <threading.Event object at 0x00000224A10BE0D0>>()
 +    where <bound method Event.is_set of <threading.Event object at 0x00000224A10BE0D0>> = <threading.Event object at 0x00000224A10BE0D0>.is_set
 +      where <threading.Event object at 0x00000224A10BE0D0> = <elasticapm.transport.http.Transport object at 0x00000224A10CDDF0>._flushed
===== 1 failed, 745 passed, 63 skipped, 138 warnings in 90.95s (0:01:30) ======

@basepi
Copy link
Contributor

basepi commented Jul 23, 2024

It looks like the tests that replicates the submission timeout is flaky on windows

Honestly, I've been extremely tempted to remove our windows support entirely. I don't think anyone uses it, and the majority of our flakey tests have been on windows. (You can easily search for some of our other tests we skip on windows due to flakiness.)

@xrmx xrmx force-pushed the ensure-to-flush-everything-at-exit branch from 20d51c0 to 08136b4 Compare July 24, 2024 15:48
We have a race condition at http transport shutdown where our atexit
handler is racing against urllib3 ConnectionPool weakref finalizer.
Having the urllib3 finalizer called before atexit would lead to
have our thread hang waiting for send any eventual queued data via
urllib3 pools that are closed.
Force the creation of a new PoolManager so that we are always able to flush.
@xrmx xrmx force-pushed the ensure-to-flush-everything-at-exit branch from 08136b4 to 25c4f9e Compare July 25, 2024 09:52
@xrmx xrmx requested a review from basepi July 25, 2024 09:57
@xrmx
Copy link
Member Author

xrmx commented Jul 25, 2024

It looks like the tests that replicates the submission timeout is flaky on windows

Honestly, I've been extremely tempted to remove our windows support entirely. I don't think anyone uses it, and the majority of our flakey tests have been on windows. (You can easily search for some of our other tests we skip on windows due to flakiness.)

This time the problem was the test itself :)

@xrmx xrmx merged commit ae65416 into elastic:main Jul 26, 2024
90 checks passed
xrmx added a commit to xrmx/apm-agent-python that referenced this pull request Aug 19, 2024
…lastic#2085)

We have a race condition at http transport shutdown where our atexit
handler is racing against urllib3 ConnectionPool weakref finalizer.
Having the urllib3 finalizer called before atexit would lead to
have our thread hang waiting for send any eventual queued data via
urllib3 pools that are closed.
Force the creation of a new PoolManager so that we are always able to flush.
xrmx added a commit that referenced this pull request Aug 19, 2024
…2085)

We have a race condition at http transport shutdown where our atexit
handler is racing against urllib3 ConnectionPool weakref finalizer.
Having the urllib3 finalizer called before atexit would lead to
have our thread hang waiting for send any eventual queued data via
urllib3 pools that are closed.
Force the creation of a new PoolManager so that we are always able to flush.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Closing the transport connection timed out." caused by race condition
2 participants