-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Coral-Incremental] Cost calculation for RelNode #516
base: master
Are you sure you want to change the base?
Conversation
public CostInfo getExecutionCost(RelNode rel) { | ||
if (rel instanceof TableScan) { | ||
return getExecutionCostTableScan((TableScan) rel); | ||
} else if (rel instanceof LogicalJoin) { | ||
return getExecutionCostJoin((LogicalJoin) rel); | ||
} else if (rel instanceof LogicalUnion) { | ||
return getExecutionCostUnion((LogicalUnion) rel); | ||
} else if (rel instanceof LogicalProject) { | ||
return getExecutionCostProject((LogicalProject) rel); | ||
} | ||
return new CostInfo(0.0, 0.0); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the core cost calculation framework.
It
- Walk through the whole RelNode tree and get the cost
- Keep the Row Count Information which will be used for IO cost
coral-incremental/src/main/java/com/linkedin/coral/incremental/RelNodeCostEstimator.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
private Double estimateJoinSelectivity(List<JoinKey> joinKeys) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't write the logic now,
it will be from the join key, and from the statistic map info like distinct/row_count to get Selectivity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
coral-incremental/src/main/java/com/linkedin/coral/incremental/RelNodeCostEstimator.java
Show resolved
Hide resolved
coral-incremental/src/main/java/com/linkedin/coral/incremental/RelNodeCostEstimator.java
Outdated
Show resolved
Hide resolved
coral-incremental/src/main/java/com/linkedin/coral/incremental/RelNodeCostEstimator.java
Outdated
Show resolved
Hide resolved
coral-incremental/src/main/java/com/linkedin/coral/incremental/RelNodeCostEstimator.java
Outdated
Show resolved
Hide resolved
coral-incremental/src/main/java/com/linkedin/coral/incremental/RelNodeCostEstimator.java
Outdated
Show resolved
Hide resolved
coral-incremental/src/main/java/com/linkedin/coral/incremental/RelNodeCostEstimator.java
Outdated
Show resolved
Hide resolved
coral-incremental/src/main/java/com/linkedin/coral/incremental/RelNodeCostEstimator.java
Show resolved
Hide resolved
coral-incremental/src/main/java/com/linkedin/coral/incremental/RelNodeCostEstimator.java
Outdated
Show resolved
Hide resolved
private CostInfo getExecutionCostJoin(LogicalJoin join) { | ||
RelNode left = join.getLeft(); | ||
RelNode right = join.getRight(); | ||
if (!(left instanceof TableScan) || !(right instanceof TableScan)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are expanding the cost estimator to handle joins for more than 2 tables?
Co-authored-by: Kevin Ge <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the previous feedback :)
coral-incremental/src/main/java/com/linkedin/coral/incremental/RelNodeCostEstimator.java
Show resolved
Hide resolved
coral-incremental/src/test/java/com/linkedin/coral/incremental/RelNodeCostEstimatorTest.java
Show resolved
Hide resolved
coral-incremental/src/main/java/com/linkedin/coral/incremental/TableStatistic.java
Outdated
Show resolved
Hide resolved
Overall feedback is that I believe the PR sounds too specific and there is no general framework for cost estimation that is applicable to various operators by specializing some of its aspects. I think we need to iterate over this PR to make it more abstract with various implementations. This may be better discussed over a doc. |
// The number of rows in the table | ||
Double rowCount; | ||
// The number of distinct values in each column | ||
Map<String, Double> distinctCountByRow; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work for nested columns? If yes, what is the spec of the key?
Does it work for complex types?
If the answer is no, we should be explicit about this at least in the Java doc.
// The shuffle cost of a join is the maximum shuffle cost of its children because | ||
// in modern distributed systems, the shuffle cost is dominated by the largest shuffle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Terminology is not clear here. What is the shuffle cost of children? We might need to discuss this over a doc.
What changes are proposed in this pull request, and why are they necessary?
The goal of this PR is to introduce cost calculation method for Coral Incremental module, that's given an Input RelNode, return its cost for {executing it and writing the result into output}. So, for two RelNodes, one is incremental format, and the other is batch format, we could choose one with less cost, which could have better performance when executing.
The main changes are:
How was this patch tested?
Unit test