Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CU-8695pvhfe fix usage monitoring for multiprocessing #488

Merged
merged 4 commits into from
Sep 17, 2024

Conversation

mart-r
Copy link
Collaborator

@mart-r mart-r commented Sep 9, 2024

Previously usage monitoring wouldn't work for multiprocessing.

The issue was 2-fold.

  1. For CAT.get_entities_multi_texts and CAT.multiprocessing_batch_docs_size (which uses the former internally)
    • Uses the CAT.pipe and Pipe.batch_multi_processHad directly
    • Had to add logging to the input/output separately
  2. For CAT.multiprocessing_batch_char_size
    • Runs on different processes, which means that different instances are used
      • And (for some reason) cleanup (i.e a __del__ call - which would flush automatically) is not done
    • Had to flush the usage monitor (i.e write to disk) at the end of each multiprocessing method (CAT._mp_cons) explicitly

The PR also adds tests to make sure the monitoring actually works as well. The 3 added tests failed in the previous state of the project.

When using CAT.multiprocessing_batch_char_size (CAT._multiprocessing_batch and CAT._mp_cons internally), flush the usage monitor at the end of multiprocessing method.
When using CAT.get_entities_multi_texts or CAT.multiprocessing_batch_docs_size (uses the former internally), add logging of usage to output
@tomolopolis
Copy link
Member

…sig.

Avoid checking length of (potentially) non-existent strings. Avoid early iteration of generator.
Copy link
Member

@tomolopolis tomolopolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mart-r mart-r merged commit eb912d6 into master Sep 17, 2024
8 checks passed
mart-r added a commit to mart-r/MedCAT that referenced this pull request Oct 14, 2024
* CU-8695pvhfe: Rename a test class

* CU-8695pvhfe: Add tests for multiprocessig usage monitoring

* CU-8695pvhfe: Fix usage monitor for multiprocessig.

When using CAT.multiprocessing_batch_char_size (CAT._multiprocessing_batch and CAT._mp_cons internally), flush the usage monitor at the end of multiprocessing method.
When using CAT.get_entities_multi_texts or CAT.multiprocessing_batch_docs_size (uses the former internally), add logging of usage to output

* CU-8695pvhfe: Fix remaining issues with usage monitor for multiprocessig.

Avoid checking length of (potentially) non-existent strings. Avoid early iteration of generator.
@mart-r mart-r deleted the CU-8695pvhfe-fix-usage-monitor-mp branch November 18, 2024 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants