-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust optimizer improvements #1199
Rust optimizer improvements #1199
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #1199 +/- ##
==========================================
+ Coverage 85.52% 85.54% +0.01%
==========================================
Files 77 77
Lines 4256 4261 +5
Branches 794 795 +1
==========================================
+ Hits 3640 3645 +5
Misses 447 447
Partials 169 169 ☔ View full report in Codecov by Sentry. |
dask_planner/src/sql/optimizer.rs
Outdated
fact_dimension_ratio: Option<f64>, | ||
max_fact_tables: Option<usize>, | ||
preserve_user_order: Option<bool>, | ||
filter_selectivity: Option<f64>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of annoying/bulky to have to pass in these parameters every time we do a DaskSqlOptimizer::new()
, but since there's really only 2 places in our code where we do this, I figured it's not too bad.
dask_sql/sql-schema.yaml
Outdated
@@ -69,6 +69,38 @@ properties: | |||
description: | | |||
Whether to apply the dynamic partition pruning optimizer rule. | |||
|
|||
verbose_optimizer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like the name verbose_optimizer
, but I couldn't really think of a better name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm no naming expert. I like the feature though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally would rather avoid this configuration for now, but if we do have it maybe it would make sense as something like sql.explain.skipped_optimizers
? Then it could be a list where users pass any arbitrary rules, but it defaults to whatever rules we always want disabled for explains.
dask_sql/sql-schema.yaml
Outdated
@@ -69,6 +69,38 @@ properties: | |||
description: | | |||
Whether to apply the dynamic partition pruning optimizer rule. | |||
|
|||
verbose_optimizer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm no naming expert. I like the feature though.
dask_sql/context.py
Outdated
@@ -542,11 +550,16 @@ def explain( | |||
:obj:`str`: a description of the created relational algebra. | |||
|
|||
""" | |||
dynamic_partition_pruning = dask_config.get("sql.dynamic_partition_pruning") | |||
if not dask_config.get("sql.verbose_optimizer"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What am I missing here? Seems like getting the config value for sql.verbose_optimizer
but then setting values related to sql.dynamic_partition_pruning
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So when I was working on the DPP PR, I found it pretty annoying that the c.explain()
output would contain such long INLIST
expressions (hundreds and thousands of values), which aren't very useful for a user who maybe wouldn't even understand where all the values are coming from. With verbose_optimizer
set to False, though, it just affects the c.explain()
function, temporarily turning off DPP so that we don't get the long INLIST
expressions and instead just get what the LogicalPlan
would be if DPP was turned off. Then at the end we switch DPP back on (if it was originally set on), so that the query itself can still run with DPP.
So if DPP is on and verbose_optimizer
is False, DPP will still run with c.sql()
, but c.explain()
won't contain the DPP values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually I like what this is aiming to accomplish, but don't really like the idea of implicitly skipping optimizations without warning the user. Feel like maybe it might make more sense to document the introduction of the verbose logical plans in the schema for sql.dynamic_partition_pruning
, with a suggestion to disable this config for c.explain()
if this behavior isn't desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now the default is to run DPP but to not have it in c.explain()
. This is because having it in c.explain()
would introduce filters that are not specified by the user (i.e., filters that are added by DPP) which potentially makes the LogicalPlan
more difficult for the user to interpret. So we're not actually skipping DPP, we're just not printing how it affects the LogicalPlan
for the user to see.
I like the idea of renaming it to sql.explain.skipped_optimizers
or sql.dynamic_partition_pruning.verbose
to make it more clear, although I do prefer having it default to not having DPP printed in c.explain()
(but still run DPP by default).
Hi @charlesbluca I wanted to ask about this getting merged in? |
Sorry this has sat on the backburner for a while - might make sense to merge in the ongoing work in #1249, as I imagine we'll want to prioritize that for release and it includes a bump from ADP 28 -> 32 that could affect things here |
@charlesbluca same rustc version errors here? |
@charlesbluca this should be OK now, thanks! |
Hi @charlesbluca LMK what you think of this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sarahyurick!
From #1121:
From #1069: