nested multiprocessing in notebooks with mpire #29
Comments
Had issues myself using

    from mpire import WorkerPool

    # clrs_samples_dict (sample -> cooler) and hg38_arms (view) are defined earlier in the notebook

    def _job(data_pack, sample):
        # unpack shared data ...
        clr_dict, view_df = data_pack
        # define cooler to work on ...
        _clr = clr_dict[sample]
        # import inside the job, so it is available in the worker process
        from cooltools.sandbox.obs_over_exp_cooler import expected_full
        # calculate full expected (cis + trans)
        _exp_full = expected_full(
            _clr,
            view_df=view_df,
            smooth_cis=False,
            aggregate_trans=True,
            expected_column_name="expected",
            nproc=8,
        )
        return (sample, _exp_full)

    # have to use daemon=False, because _job is multiprocessing-based already ...
    with WorkerPool(
        n_jobs=16,
        shared_objects=(clrs_samples_dict, hg38_arms),
        daemon=False,
        start_method='forkserver',  # or spawn ...
        use_dill=True
    ) as wpool:
        results = wpool.map(
            _job,
            list(clrs_samples_dict.keys()),
            progress_bar=True
        )

    # sort out the results ...
    exps_full_dict = {sample: _exp for sample, _exp in results}

key here is that
Since nested multiprocessing is experimental, it is better to keep track of related issues and such on the mpire side: sybrenjansen/mpire#105
Consider demonstrating an example of parallel execution of some of the cooltools API functions for multiple samples, i.e. when an API function itself uses multiprocessing and we want to run it in the notebook ...
If one has a big multicore system (16 real cores or more), it is easy to run several CLI tasks in parallel for multiple samples, where each task itself uses several cores, i.e. is multiprocessed. Very often such multiprocessed operations do not scale well beyond 8-12 processes, so it is indeed more "economical" to process multiple samples at once with fewer cores each.
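A rough sketch of that trade-off, for picking how many samples to run side by side and how many processes to give each. The helper `split_core_budget` and its default cap of 8 are hypothetical (the cap is taken from the lower end of the "8-12 processes" observation above), not something provided by cooltools or mpire; the two returned numbers would map onto `n_jobs` of the outer pool and `nproc` of the inner function.

```python
def split_core_budget(total_cores, n_samples, max_procs_per_task=8):
    """Split a core budget into (n_jobs, nproc).

    n_jobs - how many samples to process at the same time (outer pool size)
    nproc  - how many processes each sample's task gets (inner pool size)

    Hypothetical helper: the cap reflects the observation that a single
    multiprocessed task often stops scaling somewhere around 8-12 processes.
    """
    if total_cores < 1 or n_samples < 1 or max_procs_per_task < 1:
        raise ValueError("all arguments must be positive integers")
    # never give one task more processes than the cap, or than the machine has
    nproc = min(max_procs_per_task, total_cores)
    # fill the remaining budget with concurrent samples, at least one
    n_jobs = max(1, min(n_samples, total_cores // nproc))
    return n_jobs, nproc


# 32 cores, 6 samples: 4 samples at a time, 8 processes each
n_jobs, nproc = split_core_budget(total_cores=32, n_samples=6)
print(n_jobs, nproc)
```

This only budgets cores; memory and I/O can limit how many samples can realistically run at once.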
Now, what if we want to achieve the same in the notebook? It is not trivial, because multiprocess does not allow nesting (the way we typically use it, out of the box). It can be done easily with MPIRE (https://github.com/sybrenjansen/mpire), which allows running multiple multiprocessed tasks in parallel and has an API very similar to that of multiprocess itself ... Check it out:
mpire test:
one-by-one using a ton of cores per task:
This has limited application: it is relevant to projects with many samples and people with big workstations. But when both criteria are met, the speed-up is much appreciated.