Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(query): reduce regex query fanout #1644

Closed
wants to merge 28 commits into from

Conversation

alextheimer
Copy link
Contributor

@alextheimer alextheimer commented Aug 13, 2023

Pull Request checklist

  • The commit(s) message(s) follows the contribution guidelines ?
  • Tests for the changes have been added (for bug fixes / features) ?
  • Docs have been added / updated (for bug fixes / features) ?

Currently, the ShardKeyRegexPlanner asks its inner planner to create an ExecPlan for each one of a LogicalPlan's shard keys. This is inefficient; if 100 shard keys are matched but all data lies on only two partitions, it would be much more efficient to instead ask the inner planner to materialize one plan per partition.

This PR materializes ExecPlans in this more-efficient manner. The ShardKeyRegexPlanner now passes LogicalPlans with original regex filters to lower-level planners. The MultiPartitionPlanner now has infrastructure added to resolve shard-keys and determine the partitions to be queried; it now materializes plans per-partition (instead of per-shard-key). Again, regex is passed to lower-level planners; the SingleClusterPlanner now also has the infrastructure to materialize regex-filtered plans.

Copy link
Member

@vishramachandran vishramachandran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alextheimer Query Stats need to be keyed with "multiple-ns" for the namespace field for NS-Regex queries within partition.

@alextheimer
Copy link
Contributor Author

@vishramachandran it looks like the latest develop has this covered (I've rebased the PR):

QueryStats are also updated in e.g. label-value iterators, but I'll leave those keys alone since they aren't related to PromQL queries.

}

// // TODO: what is this the purpose of this test? The lookback is 300s, and the assignment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why comment out the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping others would know what this test was for-- it looks like we used to expect partition-split plans would always return at least one data-point per partition, regardless of lookback/offset. That doesn't jive with the way we handle partition-splits elsewhere in the planners, so I think this test should be removed.

The reason it passed prior to this PR is because the MPP sent PeriodicSeries and RawSeries through the old materializePeriodicAndRawSeries logic. That always materialized at least one plan per partition.

materializePeriodicAndRawSeries was built to materialize plans with equality-only filters, so that would not work for this PR. Instead, plans are:

  • if split, passed to the regex-handling materializeSplitLeafPlan, which accounts correctly for lookback.
  • if not split, forwarded as-is to any partitions that own the query data.

@alextheimer alextheimer closed this Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants