Postprocesses very slow due to Python’s GIL #153

Open
PhilippvK opened this issue Mar 28, 2024 · 1 comment
Labels: enhancement (New feature or request), priority:medium (Medium priority task)

Comments

@PhilippvK
Member

I realized that only one core is utilized during the POSTPROCESS stage when running compute-heavy postprocesses (--post analyze_instructions), even when using the --parallel flag.

This makes sense since we use a ThreadPoolExecutor to execute several runs in parallel, which works well for I/O-bound tasks (including calls to third-party subprocesses) such as those in the TUNE, BUILD and COMPILE stages. For compute-bound Python code, we run into problems due to Python’s Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at any point in time to ensure thread safety.
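To make the effect concrete, here is a minimal sketch that is not taken from MLonMCU: `count_primes` is a hypothetical stand-in for a compute-heavy postprocess. On a standard (GIL-enabled) CPython build, the threaded variant should take roughly as long as the sequential one, since the threads take turns on the interpreter.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """Pure-Python, compute-bound work: count the primes below `limit`."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(limits):
    """Process all inputs one after another in the calling thread."""
    return [count_primes(limit) for limit in limits]


def run_threaded(limits, workers=4):
    """Process the inputs in a thread pool.

    The threads share one interpreter, so on a GIL-enabled CPython build
    only one of them executes Python bytecode at a time.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    limits = [100_000] * 4
    for name, fn in (("sequential", run_sequential), ("threaded", run_threaded)):
        start = time.perf_counter()
        fn(limits)
        print(f"{name}: {time.perf_counter() - start:.2f}s")
```

The exact timings depend on the machine and the Python build, so none are claimed here. The point is only that adding threads does not add throughput for this kind of workload.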

A solution for this limitation is to use the ProcessPoolExecutor, which runs the work in completely independent worker processes, each with its own interpreter and GIL, and therefore does not face the same issue. However, there are a few challenges involved with this approach:

  • No shared variables between workers: everything needs to be passed via arguments.
  • All data passed between MLonMCU and the workers needs to be serializable using Python’s pickle module. This is problematic because some items used by MLonMCU (decorated functions, context locks, …) cannot be pickled, but I am working on a solution for this.
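The pickling constraint can be illustrated with a small sketch. This is not the solution being worked on for MLonMCU, only one common pattern: `RunState` and `postprocess` are hypothetical names, and the approach assumed here is to drop the lock from the pickled state and recreate it in the worker.

```python
import pickle
import threading
from concurrent.futures import ProcessPoolExecutor


class RunState:
    """Hypothetical stand-in for an object that carries a lock."""

    def __init__(self, name, values):
        self.name = name
        self.values = values
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def __getstate__(self):
        # Copy the attributes and leave the lock out of the pickled state.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the attributes and create a fresh lock on the receiving side.
        self.__dict__.update(state)
        self._lock = threading.Lock()


def postprocess(run):
    """Module-level function, so pickle can locate it by name in the worker."""
    with run._lock:
        return run.name, sum(v * v for v in run.values)


if __name__ == "__main__":
    runs = [RunState(f"run{i}", list(range(1000))) for i in range(4)]
    # Arguments and return values cross the process boundary via pickle.
    with ProcessPoolExecutor(max_workers=2) as pool:
        for name, total in pool.map(postprocess, runs):
            print(name, total)
```

Note that the worker receives a copy of each `RunState`, so any changes made there are not visible to the parent unless they are returned explicitly. The recreated lock only protects the copy inside that worker.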
@PhilippvK added the enhancement and priority:medium labels on Mar 28, 2024
@PhilippvK self-assigned this on Mar 28, 2024
@PhilippvK
Member Author

Here is a visualization of the problem.

Legend:
process_pool, per_stage=1: not supported
process_pool, per_stage=0: ~1min
thread_pool, per_stage=1: ~15min
thread_pool, per_stage=0: ~5min

[Figure: mlonmcu_ram_cpu_disk]
