Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust optimizer improvements #1199

Merged
merged 31 commits into from
Jan 31, 2024

Conversation

sarahyurick
Copy link
Collaborator

@sarahyurick sarahyurick commented Jul 7, 2023

From #1121:

  • Allow the fact_dimension_ratio and possibly other parameters to be specified by the user
  • Modify the c.explain() function to cut off large strings of INLIST vectors
  • DPP functionality PyTests
  • DPP config PyTests

From #1069:

  • Consider exposing parameters to the user

@codecov-commenter
Copy link

codecov-commenter commented Jul 10, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (d6bbc14) 85.52% compared to head (9d0a613) 85.54%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1199      +/-   ##
==========================================
+ Coverage   85.52%   85.54%   +0.01%     
==========================================
  Files          77       77              
  Lines        4256     4261       +5     
  Branches      794      795       +1     
==========================================
+ Hits         3640     3645       +5     
  Misses        447      447              
  Partials      169      169              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sarahyurick sarahyurick marked this pull request as ready for review July 10, 2023 22:53
fact_dimension_ratio: Option<f64>,
max_fact_tables: Option<usize>,
preserve_user_order: Option<bool>,
filter_selectivity: Option<f64>,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of annoying/bulky to have to pass in these parameters every time we do a DaskSqlOptimizer::new(), but since there's really only 2 places in our code where we do this, I figured it's not too bad.

@@ -69,6 +69,38 @@ properties:
description: |
Whether to apply the dynamic partition pruning optimizer rule.

verbose_optimizer:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the name verbose_optimizer, but I couldn't really think of a better name...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm no naming expert. I like the feature though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally would rather avoid this configuration for now, but if we do have it maybe it would make sense as something like sql.explain.skipped_optimizers? Then it could be a list where users pass any arbitrary rules, but it defaults to whatever rules we always want disabled for explains.

@@ -69,6 +69,38 @@ properties:
description: |
Whether to apply the dynamic partition pruning optimizer rule.

verbose_optimizer:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm no naming expert. I like the feature though.

@@ -542,11 +550,16 @@ def explain(
:obj:`str`: a description of the created relational algebra.

"""
dynamic_partition_pruning = dask_config.get("sql.dynamic_partition_pruning")
if not dask_config.get("sql.verbose_optimizer"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What am I missing here? Seems like getting the config value for sql.verbose_optimizer but then setting values related to sql.dynamic_partition_pruning?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when I was working on the DPP PR, I found it pretty annoying that the c.explain() output would contain such long INLIST expressions (hundreds and thousands of values), which aren't very useful for a user who maybe wouldn't even understand where all the values are coming from. With verbose_optimizer set to False, though, it just affects the c.explain() function, temporarily turning off DPP so that we don't get the long INLIST expressions and instead just get what the LogicalPlan would be if DPP was turned off. Then at the end we switch DPP back on (if it was originally set on), so that the query itself can still run with DPP.

So if DPP is on and verbose_optimizer is False, DPP will still run with c.sql(), but c.explain() won't contain the DPP values.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually I like what this is aiming to accomplish, but don't really like the idea of implicitly skipping optimizations without warning the user. Feel like maybe it might make more sense to document the introduction of the verbose logical plans in the schema for sql.dynamic_partition_pruning, with a suggestion to disable this config for c.explain() if this behavior isn't desired.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the default is to run DPP but to not have it in c.explain(). This is because having it in c.explain() would introduce filters that are not specified by the user (i.e., filters that are added by DPP) which potentially makes the LogicalPlan more difficult for the user to interpret. So we're not actually skipping DPP, we're just not printing how it affects the LogicalPlan for the user to see.

I like the idea of renaming it to sql.explain.skipped_optimizers or sql.dynamic_partition_pruning.verbose to make it more clear, although I do prefer having it default to not having DPP printed in c.explain() (but still run DPP by default).

dask_planner/src/sql/optimizer.rs Outdated Show resolved Hide resolved
@sarahyurick
Copy link
Collaborator Author

Hi @charlesbluca I wanted to ask about this getting merged in?

@charlesbluca
Copy link
Collaborator

Sorry this has sat on the backburner for a while - might make sense to merge in the ongoing work in #1249, as I imagine we'll want to prioritize that for release and it includes a bump from ADP 28 -> 32 that could affect things here

@sarahyurick
Copy link
Collaborator Author

@charlesbluca same rustc version errors here?

@sarahyurick
Copy link
Collaborator Author

@charlesbluca this should be OK now, thanks!

@sarahyurick
Copy link
Collaborator Author

Hi @charlesbluca LMK what you think of this.

Copy link
Collaborator

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sarahyurick!

@charlesbluca charlesbluca merged commit db7931d into dask-contrib:main Jan 31, 2024
32 of 33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants