Revise default param registration #119

Open
mtmorgan opened this issue Jul 9, 2020 · 5 comments

Comments

@mtmorgan
Collaborator

mtmorgan commented Jul 9, 2020

MulticoreParam() is appealing for interactive use but problematic in package use, as discussed in, e.g., drisso/zinbwave#38 (comment). Update the default registration strategy so that SnowParam() is the default.

@c-mertes

c-mertes commented Sep 1, 2020

I just wanted to leave my two cents here, as we have already had some issues with this too (see drisso/zinbwave#38 (comment)). To my knowledge, BiocParallel ignores the number of cores allocated by SLURM, Snakemake, or other job schedulers (honoring it would be a nice feature request). Hence, if you do not set the number of workers explicitly in your own code, you overcommit CPU and memory by default, which usually ends with your job being terminated (at least on SLURM, where memory limits are enforced).

So instead of relying on N-2 as the default, I would suggest min(10, N-2) (with 10 replaceable by any other reasonable cap): it still uses almost all cores on a desktop or laptop, but does not immediately drain a server or cluster with 100 or more cores. This should also be sufficient for most end users.
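
A minimal sketch of the capped default suggested above, in base R. The function name `cappedWorkers` and its `cap` and `reserve` arguments are hypothetical and not part of BiocParallel; only `parallel::detectCores()` is an existing function.

```r
# Hypothetical helper (not part of BiocParallel): min(cap, N - reserve),
# but never fewer than one worker.
cappedWorkers <- function(n = parallel::detectCores(), cap = 10L, reserve = 2L) {
    # detectCores() can return NA on some platforms; fall back to one core
    if (is.na(n)) n <- 1L
    as.integer(max(1L, min(cap, n - reserve)))
}

cappedWorkers(n = 4L)    # 4-core laptop: 2 workers
cappedWorkers(n = 128L)  # 128-core server: capped at 10 workers
```

The result could then be passed as the `workers` argument of a param constructor, for example `SnowParam(workers = cappedWorkers())`.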

@mtmorgan
Collaborator Author

mtmorgan commented Sep 1, 2020

@c-mertes I think BatchtoolsParam() is appropriate for use on SLURM; the vignette 'Introduction to BatchtoolsParam' discusses the registryargs parameter and the use of templates for controlling the use of cores.

@c-mertes

c-mertes commented Sep 1, 2020

Sorry for not being clearer, @mtmorgan. Batchtools is great if you want to parallelize within R across a cluster, but I was referring to the scenario where Snakemake or another workflow manager spawns jobs across a cluster using, e.g., SLURM, and each job then calls an R script that parallelizes within the job using MulticoreParam. Here one could read the scheduler's environment variables to restrict the number of cores. Maybe this scenario is just too specific.
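
To illustrate what reading the scheduler's environment could look like, here is a rough base-R sketch. The helper name `schedulerWorkers` is hypothetical and not part of BiocParallel. It assumes the job was submitted with `--cpus-per-task`, in which case SLURM exports `SLURM_CPUS_PER_TASK`; other schedulers would need a different variable name.

```r
# Hypothetical helper (not part of BiocParallel): use the scheduler's core
# allocation when it is present and valid, otherwise fall back to `default`.
schedulerWorkers <- function(default = 1L, var = "SLURM_CPUS_PER_TASK") {
    # Unset, empty, or non-numeric values all become NA here
    value <- suppressWarnings(as.integer(Sys.getenv(var, unset = NA)))
    if (is.na(value) || value < 1L) as.integer(default) else value
}

# Inside a SLURM job started with --cpus-per-task=4 this returns 4;
# outside a job it returns the supplied default.
schedulerWorkers(default = 2L)
```

Within an R script launched by such a job, the value could be passed on explicitly, for example `MulticoreParam(workers = schedulerWorkers())`.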

@DarwinAwardWinner

@c-mertes I think you are talking about an unrelated issue. This issue is about switching from parallelization using forking to non-forking parallelization, not changing the default number of cores.

@DarwinAwardWinner

With regard to parallelization in RStudio, BiocParallel could borrow the logic used in future: https://github.com/HenrikBengtsson/future/blob/develop/R/supportsMulticore.R#L71-L91
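
As a rough illustration of the kind of check meant here (not a copy of the linked code), something like the following could decide whether forking is advisable. The function name `forkingLooksSafe` is hypothetical. The sketch assumes that RStudio sets `RSTUDIO=1` in its R sessions and additionally sets `RSTUDIO_TERM` inside its terminal pane; treat both as assumptions to verify against the linked source.

```r
# Hypothetical helper (not part of BiocParallel or future).
# Assumption: RSTUDIO == "1" marks an RStudio session, and a non-empty
# RSTUDIO_TERM marks R running in RStudio's terminal rather than its console.
forkingLooksSafe <- function() {
    onUnix <- .Platform$OS.type == "unix"  # forking is unavailable on Windows
    inRStudioConsole <- Sys.getenv("RSTUDIO") == "1" &&
        !nzchar(Sys.getenv("RSTUDIO_TERM"))
    onUnix && !inRStudioConsole
}

# A default could then pick a forking param only when the check passes,
# and a non-forking one such as SnowParam() otherwise.
forkingLooksSafe()
```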
