Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add row count test for estimated row counts for SAI plan on single restrictions #1449

Merged
merged 8 commits into from
Dec 19, 2024

Conversation

k-rus
Copy link

@k-rus k-rus commented Dec 3, 2024

Closes https://github.com/riptano/cndb/issues/12139

What is the issue

Changing planning can affect estimated row count of a plan. This test was introduced by #1439 to see how introducing anti-join plan node estimates row count.

What does this PR fix and why was it fixed

This PR adds a test of row count of a plan in the presence of restrictions. Currently it tests queries with inequality, equality and half-ranges on different SAI column types and with or without histograms.

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

@k-rus k-rus self-assigned this Dec 3, 2024
@k-rus k-rus requested a review from pkolaczk December 3, 2024 18:26
Comment on lines 41 to 42
public class EstimatedRowCountTest extends SAITester
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test tests only the NEQ operator. It doesn't look like a general test for all estimated row counts.
Consider renaming it to NEQRowCountTest or some other name wich mentions this is a NEQ test only.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test class includes only one test function now, which tests only inequality. I prepared the test and class in a way, so more tests can be added inside the class instead of adding new classes. I think it's aligned with existing tests. Let me know if I miss anything.
I will add another simple test, so it's less confusing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more tests and improved the class name. Let me know if it needs to be improved.

SAIUtil.setLatestVersion(version);

ColumnFamilyStore cfs = prepareTable(type);
double totalRows = 97.0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this number come from?
You are loading 100 rows into the table, not 97.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the table total count is 97 when I debug the code. I will check if I can retrieve the total count.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This estimated total is used in the plans factory. I changed to obtain it from the query controller.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pkolaczk this number is calculated by dividing the size of a memtable by the row size. Actually it's probabilistic.

@k-rus k-rus requested a review from pkolaczk December 5, 2024 19:19
@k-rus k-rus force-pushed the rf-add-plan-row-count-test branch from 63c798f to 384df2c Compare December 10, 2024 11:08
@k-rus k-rus marked this pull request as draft December 12, 2024 14:28
@k-rus
Copy link
Author

k-rus commented Dec 12, 2024

The test is flaky, see this build failure.
By enabling compression I was able to get different failures locally.
It seems that row count estimation is probabilistic. For memtable (and current test implementation uses only memtable) the total row count is calculated by dividing the size of the memtable by the row size.

@k-rus
Copy link
Author

k-rus commented Dec 13, 2024

One local failure example:

Testcase: testHalfRangeMiddle(org.apache.cassandra.index.sai.plan.SingleRestrictionEstimatedRowCountTest):	FAILED
[junit-timeout] expected:<48.0> but was:<63.0>
[junit-timeout] junit.framework.AssertionFailedError: expected:<48.0> but was:<63.0>
[junit-timeout] 	at org.apache.cassandra.index.sai.plan.SingleRestrictionEstimatedRowCountTest$RowCountTest.doTest(SingleRestrictionEstimatedRowCountTest.java:121)
[junit-timeout] 	at org.apache.cassandra.index.sai.plan.SingleRestrictionEstimatedRowCountTest.testHalfRangeMiddle(SingleRestrictionEstimatedRowCountTest.java:145)
[junit-timeout] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit-timeout] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit-timeout] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit-timeout] 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)

Another:

[junit-timeout] Testcase: testEquality(org.apache.cassandra.index.sai.plan.SingleRestrictionEstimatedRowCountTest):	FAILED
[junit-timeout] expected:<15.0> but was:<30.0>
[junit-timeout] junit.framework.AssertionFailedError: expected:<15.0> but was:<30.0>
[junit-timeout] 	at org.apache.cassandra.index.sai.plan.SingleRestrictionEstimatedRowCountTest$RowCountTest.doTest(SingleRestrictionEstimatedRowCountTest.java:121)
[junit-timeout] 	at org.apache.cassandra.index.sai.plan.SingleRestrictionEstimatedRowCountTest.testEquality(SingleRestrictionEstimatedRowCountTest.java:165)
[junit-timeout] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit-timeout] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit-timeout] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit-timeout] 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)

@k-rus k-rus force-pushed the rf-add-plan-row-count-test branch from 384df2c to 0cc54e3 Compare December 15, 2024 11:23
@k-rus k-rus marked this pull request as ready for review December 17, 2024 10:07
@k-rus
Copy link
Author

k-rus commented Dec 17, 2024

@pkolaczk I fixed the flakiness of the test by using a table builder and memtableFlushPeriod(0). Do you want to re-review the test?
My plan for extending the test for sstables is to use a separate PR.

@k-rus k-rus changed the title Add row count test for planned inequality [DRAFT] Add row count test for planned inequality Dec 17, 2024
@k-rus k-rus force-pushed the rf-add-plan-row-count-test branch from a3bc703 to cba63d5 Compare December 17, 2024 20:32
@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-1449 rejected by Butler


2 new test failure(s) in 8 builds
See build details here


Found 2 new test failures

Test Explanation Branch history Upstream history
...adRepairGuarantees.test_atomic_writes[blocking] regression 🔴🔴🔴🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵
...,wide=false,scenario=POST_BUILD_QUERY] regression 🔴🔵🔵🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵

Found 19 known test failures

@k-rus k-rus changed the title [DRAFT] Add row count test for planned inequality Add row count test for estimated row counts for SAI plan on single restrictions Dec 19, 2024
@k-rus k-rus merged commit 315a0e1 into main Dec 19, 2024
460 of 473 checks passed
@k-rus k-rus deleted the rf-add-plan-row-count-test branch December 19, 2024 11:19
djatnieks pushed a commit that referenced this pull request Dec 19, 2024
…strictions (#1449)

Closes riptano/cndb#12139

This PR adds a test of row count of a SAI plan in the presence of
restrictions. Currently it tests queries with inequality, equality and
half-ranges on different SAI column types and with or without
histograms.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants