Memory efficient join #424

Open
matthewgson opened this issue Mar 24, 2023 · 0 comments
Labels
feature a feature request or enhancement

Comments

@matthewgson

data.table can join two large datasets efficiently by adding columns to an existing table by reference (an update join with `:=`) instead of building a new joined table.
I'm wondering whether this could be made available in dtplyr.

The benchmark below, from a StackOverflow answer by Jaap, shows the difference in time and memory between the two approaches:

library(bench)
bm <- mark(AA <- BB[AA, on = .(aa)],
           AA[BB, on = .(aa), cc := cc],
           iterations = 1)
> bm[,c(1,3,5)]
# A tibble: 2 x 3
  expression                         median mem_alloc
  <bch:expr>                       <bch:tm> <bch:byt>
1 AA <- BB[AA, on = .(aa)]            4.98s     4.1GB
2 AA[BB, on = .(aa), `:=`(cc, cc)] 560.88ms   384.6MB
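
For anyone who wants to try the two styles on a small scale, here is a minimal sketch. The tables are tiny and made up (the column `x` and all values are my own placeholders, not the data from the benchmark), and I use `i.cc` to make explicit that the value comes from `BB`.

```r
library(data.table)

# Small made-up tables standing in for the large AA and BB in the benchmark.
AA <- data.table(aa = c(1L, 2L, 3L, 4L), x = c("a", "b", "c", "d"))
BB <- data.table(aa = c(2L, 3L, 5L), cc = c(20, 30, 50))

# Style 1: regular join. Builds a new table with one row per row of AA,
# so every column of AA is copied into the result.
joined <- BB[AA, on = .(aa)]

# Style 2: update join. Looks up each row of AA in BB and adds the matching
# cc to AA by reference; rows of AA without a match get NA.
AA[BB, on = .(aa), cc := i.cc]

print(joined)
print(AA)
```

Both give the same `cc` values for each `aa`, but the second only allocates the new column, which is where the memory saving in the benchmark comes from.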
@markfairbanks markfairbanks added the feature a feature request or enhancement label Mar 24, 2023