-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple joins in one query #984
Conversation
* Transfer attributes while inlining * Change semi-join vars to tibble * Document
Conflicts: R/verb-copy-to.R R/verb-joins.R R/verb-select.R tests/testthat/test-verb-joins.R
@mgirlich I don't have time for those issues at the moment (and I'd need to think more about #918) so think we're good to go for a release. Are you thinking 2.2.2 or 2.3.0? (The main difference is whether or not you think we should do a blog post). Do you want to handle most of the release (i.e. work through the bullets in |
The release contains
While I really like these features I don't necessarily see the need for a blog post. Also there are a couple of other improvements for nicer SQL still missing (e.g. this PR) and it feels a bit weird to blog about half finished work. I created a release issue #994. |
Can you please give me a suggested reading guide for this PR? |
lf <- lazy_frame(x = 1, a = 1, .name = "df1")
lf2 <- lazy_frame(x = 1, b = 2, .name = "df2")
lf3 <- lazy_frame(x = 1, b = 2, .name = "df3")
# exit in the `new_query` branch
debugonce(add_join)
out <- left_join(lf, lf2, by = "x")
# extend the multi join query
debugonce(add_join)
out %>% inner_join(lf3, by = "x")
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say I followed every detail, but overall the code makes sense to me and I don't get the sense it's more complicated than it needs to be. Thanks for working on this!!
y_as <- join_alias$y | ||
|
||
as_current <- x_lq$table_names$as | ||
if (!is_null(x_as)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit fiddly to me and I wonder if we're putting too much emphasis on x_as
/y_as
which should be relatively rarely used features. I think it would be fine if the last specified alias wins (assuming that's easier to implement)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would still require a check if sql_on
is used, e.g.
left_join(x, y, x_as = "x", y_as = "y", sql_on = "x.a = y.a") %>%
left_join(
z,
x_as = "dummy",
y_as = "z",
sql_on = "dummy.a = z.a"
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That feels pretty fiddly to me too; I'm not sure we need to support so many edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it is quite fiddly. But I'm not sure there is a nice alternative. We need some kind of check to make sure that
left_join(lf, lf, sql_on = "...") %>%
left_join(lf, sql_on = "...")
works. Of course, we could simply start a new query but in the end the checks won't be much simpler. Also, at least for now it feels like more work to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, lets leave it as is then.
Co-authored-by: Hadley Wickham <[email protected]>
@hadley Is there anything else you want to have changed or can we merge this? 😄 |
Looks good to me! |
Woo hooo! |
Closes #865.
One has to be quite careful when a series of
left/inner_join()
can be done in a single query but it produces much nicer queries in the end.Remarks
vec_as_names()
to produce unique names. We might want something else? I just remembered that we considered usingabbreviate()
. I'll take a look at that in another PR.lazy_join_query
structure is basically not needed in this our codebase anymore. It would make sense to remove it but as it is exported I wanted to check whether it is in use somewhere or not.Names for Subqueries
Produces
Self-Join
User Provided Alias
The alias provided by the user always wins. So this produces
Note that an alias for
x
can also be provided later on. So, this produces the same query:Conflicting Alias
In case of a conflicting alias a subquery is used