I am using R-4.3.2 with doParallel v1.0.17 and foreach v1.5.2.
I have built a proprietary R package that includes a function which parallelizes some tasks and sends them, along with chunks of data, to a number of workers. The package works as expected, and parallel execution time was reduced to about 1/4 of the sequential execution time:
f = function(data, workers, ...) {
  # < some computations involving splitting the data >
  res = foreach( i = seq(along = data)
               , .combine = 'rbind'
               , .multicombine = TRUE
               # , other foreach options
               ) %dopar% {
    # < long computation >
  }
  return(res)
}
However, I have tried to send a memoised (cached to memory) copy of this function (e.g. memoised_f) to each worker by adding the zzz.R script suggested in help(memoise) to the R directory of the package. As a result the execution time, while still shorter than the sequential time, increased compared to the unmemoised parallel version above.
Using the code from the zzz.R script outside the package works well, but the gain in execution time is not substantially different from before (~1/4 of sequential time). However, the function does not seem to have been memoised this way.
My question is: should the zzz.R script be included in the package, or should the suggested code be used independently, outside the package? The documentation is not entirely clear in this respect.
Thank you!
Hi @drag05, I'm not a memoise developer but a regular user, and I ran into a similar issue when trying to use memoise in parallel in a package of mine.
From my understanding, it's not possible to use memoise in parallel; this is discussed in issue #29.
Unless you can make sure that workers are not accessing the cache at the same time (and you would have to hack together a solution for that yourself), it won't work.
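To illustrate the problem (a hypothetical example, not your code): each %dopar% worker is a separate R process, so a memory-backed memoised function is shipped to every worker with its own copy of the cache, and entries filled on one worker are invisible to the others and to the master session:

```r
library(doParallel)
library(foreach)
library(memoise)

cl <- makeCluster(2)
registerDoParallel(cl)

# In-memory cache: lives inside this process (and inside each
# worker's copy), never shared between processes.
slow_square <- memoise(function(x) { Sys.sleep(1); x^2 })

# Repeated inputs are recomputed on each worker instead of being
# served from a common cache.
res <- foreach(x = c(2, 2, 2, 2), .combine = c,
               .packages = "memoise") %dopar% {
  slow_square(x)
}

stopCluster(cl)
```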
@Rekyt Thank you for your reply! I have dropped the zzz.R solution suggested by the memoise documentation. The solution I have adopted is optional for the user and looks similar to the one suggested by @chochkov in issue #29.
This solution speeds up execution, but not by much (it shaves seconds off tens of minutes of execution time).
Issue #29 discusses the use of the flock package; however:
1. my understanding is that flock relies on disk caching instead of memory caching, which may contribute negatively to execution speed;
2. the memoise function itself has caching options, so it can be set to write to disk without needing flock for that.
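For the record, the disk-backed option mentioned in point 2 can be set directly through memoise's cache argument (the path below is illustrative, and f stands for the function from my first post):

```r
library(memoise)
library(cachem)

# Disk-backed cache; workers that can see this directory could in
# principle share cached results, subject to the concurrent-access
# caveats discussed in issue #29.
dcache <- cachem::cache_disk(dir = "~/.cache_f")
memoised_f <- memoise::memoise(f, cache = dcache)
```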
Thank you!