left_join two dataframes out of memory #353
Unanswered
SMousavi90
asked this question in
Q&A
Replies: 1 comment
-
I suspect your left join has a lot of duplicate keys. For example, if I generate
This would also fail to join on my computer, allocating 72.7 GB of RAM. Adding more workers won't improve the situation, but you might be able to fix the issue by using fewer workers and working on smaller chunks like this.
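The duplicate-key suspicion can be checked before running the join. Below is a minimal sketch in base R, assuming a single key column with no `NA` keys; the helper `estimate_left_join_rows` and the column names `key`, `va`, and `vb` are hypothetical, made up for illustration, and are not part of disk.frame or dplyr. It counts how many right-hand rows each left-hand row would match and sums them, which gives the size of the join result without building it.

```r
# Hypothetical helper (not part of disk.frame or dplyr): estimate how many
# rows a left join on a single key column would produce.
estimate_left_join_rows <- function(x_keys, y_keys) {
  y_tab <- table(y_keys)                            # occurrences of each key on the right
  idx <- match(as.character(x_keys), names(y_tab))  # position of each left key, NA if absent
  n_matches <- as.numeric(y_tab)[idx]               # matches per left row, NA if none
  n_matches[is.na(n_matches)] <- 1                  # unmatched left rows are kept once
  sum(n_matches)
}

# Toy data with made-up column names; "x" and "y" are duplicated on both sides.
a <- data.frame(key = rep(c("x", "y", "z"), times = c(4, 2, 1)), va = 1:7)
b <- data.frame(key = rep(c("x", "y"), times = c(3, 2)), vb = 1:5)

est <- estimate_left_join_rows(a$key, b$key)
joined <- merge(a, b, by = "key", all.x = TRUE)     # base R counterpart of a left join

cat("estimated rows:", est, "\n")
cat("actual rows:   ", nrow(joined), "\n")
```

If the estimate for the real data comes out far larger than the row count of `a`, the join result itself is what exhausts memory, and the number of workers will not change that.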
-
I recently started using this library, and I have run into a problem when joining two data.frames:
a <- my first data.frame, with 2127625 rows
b <- the second one, with 73364 rows
After setting up disk.frame, it reports:
The number of workers available for disk.frame is 8
a and b are converted to disk.frames before the join:
Later in my code I do this join:
c <- a %>% left_join(b)
This line returns:
Then I tried it this way:
c <- a %>% left_join(b, merge_by_chunk_id = TRUE)
First it used all of my RAM (62.8 GB), then it returned an error:
and it didn't do the join!
I also tested it with:
setup_disk.frame(workers = 16)
Same result!
I should mention that joins on my other data.frames completed successfully; only this one, which differs from the others only in being larger, failed.
Could you please help me understand what the problem is?