storcon: improve initial shard scheduling #9081

Merged 4 commits on Sep 24, 2024

VladLazar
Contributor

Problem

Scheduling on tenant creation uses different heuristics compared to the scheduling done during
background optimizations. This results in scenarios where shards are created and then immediately
migrated by the optimizer.

Summary of changes

  1. Make the scheduler aware of the type of shard it is scheduling (attached vs. secondary),
    so that each type can use its own heuristics.
  2. For attached shards, include the attached shard count from the scheduling context in the
    node score calculation. This brings initial shard scheduling in line with what the
    optimization passes do.
  3. Add a test for (2).

This looks like a bigger change than strictly required, but the refactoring lays the groundwork for
AZ-aware shard scheduling, which also needs to distinguish between attached and secondary shards.
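
As a rough illustration of the idea in the summary, here is a minimal sketch of a scheduler that is generic over a score type, with one score type per shard kind. Every name in it (`NodeLoad`, `ScheduleContext`, `NodeScore`, `AttachedScore`, `SecondaryScore`, `pick_node`) and the choice of score fields are assumptions made for this example; they are not taken from the storage controller code.

```rust
use std::collections::HashMap;

/// Hypothetical node identifier used only in this sketch.
type NodeId = u64;

/// Hypothetical per-node counters that a scheduler might track.
#[derive(Clone, Copy, Default)]
struct NodeLoad {
    attached_shards: usize,
    total_shards: usize,
}

/// Hypothetical context accumulated while scheduling the shards of one tenant:
/// how many of this tenant's attached shards already sit on each node.
#[derive(Default)]
struct ScheduleContext {
    attached_for_tenant: HashMap<NodeId, usize>,
}

/// A score is anything orderable that can be computed for a node. Lower is better.
trait NodeScore: Ord {
    fn compute(node: NodeId, load: &NodeLoad, ctx: &ScheduleContext) -> Self;
}

/// Score for attached locations. The derived `Ord` compares fields in
/// declaration order, so the tenant-local attached count dominates.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct AttachedScore {
    tenant_attached_on_node: usize,
    node_attached_shards: usize,
    node_total_shards: usize,
}

/// Score for secondary locations. It ignores the attached counts (an assumption).
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct SecondaryScore {
    node_total_shards: usize,
}

impl NodeScore for AttachedScore {
    fn compute(node: NodeId, load: &NodeLoad, ctx: &ScheduleContext) -> Self {
        AttachedScore {
            tenant_attached_on_node: ctx.attached_for_tenant.get(&node).copied().unwrap_or(0),
            node_attached_shards: load.attached_shards,
            node_total_shards: load.total_shards,
        }
    }
}

impl NodeScore for SecondaryScore {
    fn compute(_node: NodeId, load: &NodeLoad, _ctx: &ScheduleContext) -> Self {
        SecondaryScore { node_total_shards: load.total_shards }
    }
}

/// Pick the node with the lowest score of type `S`; ties break on the node id.
fn pick_node<S: NodeScore>(
    nodes: &HashMap<NodeId, NodeLoad>,
    ctx: &ScheduleContext,
) -> Option<NodeId> {
    nodes
        .iter()
        .map(|(id, load)| (S::compute(*id, load, ctx), *id))
        .min()
        .map(|(_, id)| id)
}
```

A caller would choose the heuristic through the type parameter, for example `pick_node::<AttachedScore>(&nodes, &ctx)` for an attached location and `pick_node::<SecondaryScore>(&nodes, &ctx)` for a secondary one. Because the context feeds into the attached score, a node that already holds one of the tenant's attached shards ranks behind a node that holds none.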

Closes #8969

We wish to apply different scheduling heuristics depending on whether
the shard is attached or secondary.

This commit achieves that by having separate score types for the
two cases. For attached shards, it also includes the count of attached
shards for the current tenant from the scheduling context. This brings
initial shard scheduling in line with the scheduling done during
optimizations.

github-actions bot commented Sep 20, 2024

5133 tests run: 4969 passed, 0 failed, 164 skipped (full report)


Flaky tests (5)

Postgres 17

Postgres 16

Postgres 14

  • test_ondemand_wal_download_in_replication_slot_funcs: release-x86-64

Code coverage* (full report)

  • functions: 32.1% (7456 of 23228 functions)
  • lines: 49.9% (60041 of 120270 lines)

* collected from Rust tests only


This comment is automatically updated with the latest test results
(7825755 at 2024-09-24T09:14:55.289Z).

@VladLazar VladLazar marked this pull request as ready for review September 23, 2024 09:07
@VladLazar VladLazar requested a review from a team as a code owner September 23, 2024 09:07
@VladLazar VladLazar requested review from yliang412 and jcsp September 23, 2024 09:07
Collaborator

@jcsp jcsp left a comment

I like how this is expressed with types. Use of member order for the contributions to score is much more readable than my original .0, .1, .2 line of code.
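
To make the readability point concrete, here is a small sketch contrasting a named score struct with a positional tuple. The struct and field names are hypothetical and chosen for this example only. In Rust, `#[derive(PartialOrd, Ord)]` on a struct compares fields lexicographically in declaration order, which is the same ordering a tuple of those fields would give.

```rust
/// Hypothetical score with named contributions. The derived `Ord` compares
/// fields top to bottom, so declaration order is the priority order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct NamedScore {
    attached_in_context: u32,
    attached_on_node: u32,
    total_on_node: u32,
}

/// The positional equivalent, where each contribution is reached via .0, .1, .2.
type TupleScore = (u32, u32, u32);

/// Convert a named score to its positional form for comparison.
fn to_tuple(s: NamedScore) -> TupleScore {
    (s.attached_in_context, s.attached_on_node, s.total_on_node)
}

/// Return the lowest (best) score among the candidates, if there are any.
fn best(candidates: &[NamedScore]) -> Option<NamedScore> {
    candidates.iter().copied().min()
}
```

Both forms order candidates identically; the struct simply lets the reader see which contribution takes priority without decoding tuple positions.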

@VladLazar VladLazar enabled auto-merge (squash) September 23, 2024 17:04
@VladLazar VladLazar merged commit 9490360 into main Sep 24, 2024
83 checks passed
@VladLazar VladLazar deleted the vlad/storcon-improve-initial-scheduling branch September 24, 2024 09:03
Linked issue: storage controller: tenant creation sometimes disagrees with background optimization
2 participants