Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Index Scan + Index Join Limit #1422

Open
wants to merge 22 commits into
base: master
Choose a base branch
from
Open

Conversation

thepinetree
Copy link
Contributor

@thepinetree thepinetree commented Dec 29, 2020

Index Scan + Index Join Limit

Description

Limit clauses are currently not propagated to the IndexScanPlanNode nor the IndexJoinPlanNode and as a result, the execution engine can't take advantage of pushing down the limit during operation. Instead, this is done in-post, with a LimitPlanNode doing so after an index scan is completed.

This PR adds functionality for the limit value to be pushed down to index scans, and is used in TPC-C. Limits values will be pushed down to their child LogicalGet via transformation rule and converted to values in the PhysicalIndexScan which are then set in the IndexScanPlanNode. The PR also moves the OrderByOrderingType from the optimizer to the catalog as a precursor to further changes to involve the sort direction of columns in creating/scanning an index.

The final implementation effort is to introduce optional properties to push down ORDER BY sort properties into index scans whenever possible. There are a few alternative possible implementations:

  1. Converting properties to include a boolean flag as to whether they are optional or not.
  2. Converting property sets to include a boolean flag for each property in the set to identify whether they are optional or not.
    We choose method 1 in this implementation, though the alternative is not difficult to switch to 2 (see PR Index Scan Limit #1031).

Additionally, a guide to optimizer development is included for future developers.

Further work

A description of further work is included in Issue #1421

@thepinetree thepinetree added in-progress This PR is being actively worked on and not ready to be reviewed or merged. Mark PRs with this. ready-for-ci Indicate that this build should be run through CI. labels Dec 29, 2020
@thepinetree thepinetree self-assigned this Dec 29, 2020
@thepinetree thepinetree added ready-for-review This PR passes all checks and is ready to be reviewed. Mark PRs with this. and removed ready-for-ci Indicate that this build should be run through CI. labels Dec 30, 2020
- [ ] Add information for subquery transformation above
- [ ] Add information about aggregations in `SELECT` here
- [ ] Add information about `SelectDistinct` in `SELECT` here
- If a `LIMIT` exists in the select statement, the value of the limit and associated offset, along with any `ORDER_BY` clauses, are stored in a `LogicalLimit` node that is added as the parent of the existing `output_expr_`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful here to briefly explain why ORDER_BY information is stored within a LIMIT operator

- `GroupExpression`: a class to track a specific operator (logical or physical) in a group which tracks the transaction, group number, whether the stats have been derived, the lowest cost satisfying the required properties of an expression, and the details of the expression.
- `Group`: a class to collect group expressions representing logically equivalent expressions including logical and physical transformations.
- `LogicalExpression`: a group expression which represents a logical expression i.e. the logical operator that specifies the general behavior of a node
- `PhysicalExpression`: a group expression which represents a logical expression i.e. the physical operator that specifies the implementation details of the expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to say "logical expression" here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which line is this about?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"PhysicalExpression: a group expression which represents a logical expression..." I think you meant to put "physical expression" instead of "logical expression".

- `LogicalExpression`: a group expression which represents a logical expression i.e. the logical operator that specifies the general behavior of a node
- `PhysicalExpression`: a group expression which represents a logical expression i.e. the physical operator that specifies the implementation details of the expression
- `Rule`: definition of a transformation from one expression to another (may be logical transformation, physical implementation, or rewrite)
- `Property`: interface to define a physical property, currently of which only sort properties exist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful if we had a couple of sentences explaining what a property is and why we have them.

@jkosh44
Copy link
Contributor

jkosh44 commented Jan 19, 2021

Hey @thepinetree I'm just going to consolidate everything we discussed on slack here so that you have a single place to go to when you return to this PR, let me know if I'm missing anything. I'm not sure which of these belong in this PR, or should be done it later PRs, I guess that's up to you/Will/whoever makes those decisions.

  1. We need someway of pushing down limits to OrderBy nodes that are generated through the PropertyEnforcer
  2. We need to remove the sort information from Limit nodes, and remove the OrderBy generation for limit's in the PlanGenerator.
  3. Since the InputColumnDeriver uses an unordered_set to store columns, the order of the columns returned is essentially "random". This can lead to unnecessary ProjectionPlanNodes being generated in the middle of the query plan to reorder columns.

@thepinetree thepinetree added blocked This issue or pull request is in progress, but dependent on another task being completed first. do-not-merge This PR is either not intended to merge, or did not pass review. and removed in-progress This PR is being actively worked on and not ready to be reviewed or merged. Mark PRs with this. ready-for-review This PR passes all checks and is ready to be reviewed. Mark PRs with this. blocked This issue or pull request is in progress, but dependent on another task being completed first. labels Feb 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
do-not-merge This PR is either not intended to merge, or did not pass review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants