removing a cluster doesn't free memory #116
Can confirm, it's really hit or miss. I'm on 0.1.0.9 and Ubuntu 20, and the cluster won't free memory after finishing an operation and then collect(). So I have a hack that checks the available free memory and, if it drops below a threshold, tries to shut down and reset the cluster. But that isn't freeing the memory either.
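The hack looks roughly like this, a minimal sketch (the 2 GB threshold and worker count are hypothetical; it uses the ps package to read available system memory):

min_avail <- 2 * 1024^3                # hypothetical 2 GB threshold
if (ps::ps_system_memory()$avail < min_avail) {
  rm(cl)                               # drop the only reference to the cluster...
  gc()                                 # ...so the finalizer can shut the workers down
  cl <- multidplyr::new_cluster(4)     # ...then start fresh
}

Per the report above, even this reset doesn't reliably return the memory.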
To clean up, you can just delete the cluster and then it'll be cleaned up on the next gc():

library(multidplyr)
cl <- new_cluster(4)
ps::ps_children()
#> [[1]]
#> <ps::ps_handle> PID=10623, NAME=R, AT=2021-04-30 13:24:35
#>
#> [[2]]
#> <ps::ps_handle> PID=10627, NAME=R, AT=2021-04-30 13:24:36
#>
#> [[3]]
#> <ps::ps_handle> PID=10631, NAME=R, AT=2021-04-30 13:24:36
#>
#> [[4]]
#> <ps::ps_handle> PID=10635, NAME=R, AT=2021-04-30 13:24:36
rm(cl)
gc()
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 761424 40.7 1311987 70.1 NA 1311987 70.1
#> Vcells 1408671 10.8 8388608 64.0 32768 2355271 18.0
ps::ps_children()
#> list()

Created on 2021-04-30 by the reprex package (v2.0.0)
I can confirm the toy example works correctly. I'm sorry I only have a production example, and I'm conflating two different but possibly related problems: (1) nodes not freeing memory while being used, and (2) rm(cluster) not reliably destroying a cluster. I have a for loop that subsets a parquet file loaded with arrow based on a variable, sends it to partitions with multidplyr to be summarized, pulls the results back, and saves it (a sketch follows below). On each iteration of that loop, memory use increases. Asking the nodes to destroy their objects and gc() only frees a small amount, and destroying the entire cluster with rm() doesn't always shut down those nodes and free their memory, as in the run below. Today I tried explicitly calling kill first, and that appears to guarantee destroying the cluster, which solves problem 2.
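A minimal sketch of the loop shape described above (the file, column, and group names are all hypothetical):

library(arrow)
library(dplyr)
library(multidplyr)

cl <- new_cluster(4)
cluster_library(cl, "dplyr")

ds <- open_dataset("data.parquet")       # hypothetical parquet file
groups <- c("a", "b", "c")               # hypothetical subset values

for (g in groups) {
  res <- ds %>%
    filter(group == g) %>%               # subset by a variable
    collect() %>%
    group_by(subgroup) %>%               # hypothetical grouping column
    partition(cl) %>%                    # send the groups to the workers
    summarise(total = sum(value)) %>%    # summarise on the workers
    collect()                            # pull the results back
  write_parquet(res, paste0("out-", g, ".parquet"))

  cluster_call(cl, rm(list = ls()))      # ask nodes to destroy their objects
  cluster_call(cl, gc())                 # ...and collect on each node
}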
Here is a sample of output showing memory use creeping up on each iteration of the for loop, and then the final rm() call failing to destroy the cluster and free its memory.
[Collapsed details: non-reproducible production example showing attempts to free memory, and the new explicit kill command]
Ok, sounds like I should kill explicitly when the cluster is garbage collected.
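For anyone needing a stopgap today, one way to kill explicitly is to terminate the worker processes via ps before dropping the reference (a sketch; it assumes the workers are the only R children of the current session):

for (p in ps::ps_children()) {
  ps::ps_kill(p)    # terminate each worker process
}
rm(cl)
gc()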
Hello! I am having the same issue as @Erinaceida on a Windows machine. One thing to note is that even though the cluster shut down, the RStudio R session memory usage continued to grow. So I ended up with no cluster, but still had the large memory usage. This is just an FYI, hoping that it might lead to some clue as to why the memory keeps growing after cluster shutdown.
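One way to quantify that growth from R itself is to watch the session's resident set size between runs (a sketch; uses the ps package, as in the reprex above):

ps::ps_memory_info(ps::ps_handle())[["rss"]]  # current session's RSS, in bytes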
The README and other multidplyr tutorials created by the authors do not mention how to close clusters after they have been initiated.
I apologise, as I'm probably not using the package correctly, but it's a bit difficult to find a solution from the documentation. My code looks like this:
Running cluster_rm(cluster) throws an error:

Error in stopCluster(cluster) : could not find function "stopCluster"

while using rm(cluster) doesn't solve the issue of R holding on to RAM.
How can I close my cluster?
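Per the maintainer's reprex earlier in the thread: stopCluster() comes from the parallel package and doesn't apply to multidplyr clusters. The supported pattern is to drop the last reference and let garbage collection finalize the workers, which you can verify with ps:

rm(cluster)
gc()
ps::ps_children()  # should return list() once the workers are gone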