Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage controller: make scheduling more stable wrt optimization #9275

Closed
wants to merge 3 commits into from

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Oct 4, 2024

Problem

During scheduling, if we are scheduling an even number of shards across an odd number of nodes (e.g. 8 on 5), we can end up choosing to put two secondaries on the same node, which makes that node's affinity score high, and causes two attached locations to go onto some other node. Later, the optimizer migrates one of those attachments away, violating our expectation that the initial scheduling of a tenant should agree with the optimizer, to avoid generating spurious migrations.

Closes: #8969

Summary of changes

  • Adjust scoring for secondary locations to prefer scheduling onto nodes that have fewer other secondaries: i.e. given two shards and two nodes, it is preferable to schedule a primary and secondary on each one, rather than two primaries on one and two secondaries on another. This was already the behaviour of attached locations, but not of secondaries.

This change is incomplete -- the change to secondary scheduling has a knock-on effect on what happens when adding nodes, which can be fixed by changing how optimize_secondary works, but that change is not safe wrt AZs. I think we might need a more general re-think on how optimize-secondary works.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

Copy link

github-actions bot commented Oct 4, 2024

5058 tests run: 4872 passed, 0 failed, 186 skipped (full report)


Flaky tests (4)

Postgres 16

Postgres 15

Postgres 14

Code coverage* (full report)

  • functions: 31.4% (7494 of 23901 functions)
  • lines: 49.6% (60307 of 121541 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
62aa8f2 at 2024-10-04T11:04:59.242Z :recycle:

@jcsp
Copy link
Collaborator Author

jcsp commented Nov 28, 2024

better approach in #9916

@jcsp jcsp closed this Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a/tech_debt Area: related to tech debt c/storage/controller Component: Storage Controller
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage controller: tenant creation sometimes disagrees with background optimization
1 participant