Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use aiormq with closing fix that gracefully teardown RMQ connection #6672

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

unkcpz
Copy link
Member

@unkcpz unkcpz commented Dec 20, 2024

fixes: #6673

apply the change of mosquito/aiormq#208

@unkcpz unkcpz requested a review from agoscinski as a code owner December 20, 2024 08:41
@unkcpz unkcpz marked this pull request as draft December 20, 2024 08:41
@unkcpz unkcpz force-pushed the aiormq-connection-close-timeout-fix branch from 34a1676 to 3851fdd Compare December 20, 2024 08:42
@unkcpz unkcpz marked this pull request as ready for review December 20, 2024 08:42
@unkcpz unkcpz force-pushed the aiormq-connection-close-timeout-fix branch from b3b39ef to c7f02fe Compare December 20, 2024 08:43
Copy link

codecov bot commented Dec 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.93%. Comparing base (02cbe0c) to head (c7f02fe).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6672      +/-   ##
==========================================
- Coverage   77.94%   77.93%   -0.00%     
==========================================
  Files         563      563              
  Lines       41761    41761              
==========================================
- Hits        32545    32543       -2     
- Misses       9216     9218       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@unkcpz
Copy link
Member Author

unkcpz commented Dec 20, 2024

This is an improvment on the experience of running pytest for daemon related tests, it will other wise hit the timeout for closing the broker. The time of running whole tests suit is only improved a bit since the closing only run once per session.

But I think it may also improve the responsiveness of verdi daemon stop (need future benchmark and test).
From test point of view, if the change I did for aiormq is correct and we got this PR merged, we can have a daemon_client_clean fixture on the function scope that will always close the rmq after every test which make the tests more independent and more parallel fearless.

@unkcpz unkcpz added the pr/blocked PR is blocked by another PR that should be merged first label Dec 20, 2024
@unkcpz
Copy link
Member Author

unkcpz commented Dec 21, 2024

I went through the code base and found in aiida-core the workaround was introduced for solving the issue, it may not needed after this fixed. @sphuber am I correct about the test is related and is other place in the code base that may relate? I think it is and I'll add comment at #5732

@pytest.mark.requires_rmq
def test_disconnect():
"""Test the communicator disconnect.
When the dependency ``kiwipy`` was updated to v0.8, it introduced a problem with shutting down the communicator.
After at least one process would have been run, trying to disconnect the communcitor would time out. The problem
is related to the update of the lower lying libraries ``aio-pika`` and ``aiormq`` to v9.4 and v6.8, respectively.
After much painstaking debugging the cause could not be determined, nor a solution. This test is added to
demonstrate the problematic behavior. Getting the communicator and then disconnecting it (through calling
:meth:`aiida.manage.manager.Manager.reset_profile`) works fine. However, if a process is a run before closing it,
for example running a calcfunction, the closing of the communicator will raise a ``TimeoutError``.
"""
from aiida.manage import get_manager
manager = get_manager()
manager.get_communicator()
manager.reset_profile() # This returns just fine
result, node = add_calcfunction.run_get_node(1)
assert node.is_finished_ok
assert result == 2
manager.reset_profile() # This hangs before timing out

@sphuber
Copy link
Contributor

sphuber commented Dec 21, 2024

@unkcpz I did indeed add that workaround because it was causing the tests to fail (or even hang, don't remember exactly anymore). I spent a long time debugging but never managed to fix it nor produce a MWE to post to aiormq directly. It would indeed be great if it is indeed a bug there that can be fixed so we can remove the workaround.

@unkcpz unkcpz marked this pull request as draft December 31, 2024 20:24
@unkcpz
Copy link
Member Author

unkcpz commented Dec 31, 2024

Indeed there might be a more deep cause of the problem. Since after my PR of decoupling the rmq/kiwipy from plumpy and aiida-core in #6675, the problem disappeared without the need of mosquito/aiormq#208.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/blocked PR is blocked by another PR that should be merged first
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Timeout when closing the broker
2 participants