storage controller: make scheduling more stable wrt optimization #9275

jcsp · 2024-10-04T10:03:00Z

Problem

During scheduling, if we are scheduling an even number of shards across an odd number of nodes (e.g. 8 on 5), we can end up putting two secondaries on the same node. That raises the node's affinity score and pushes two attached locations onto some other node. Later, the optimizer migrates one of those attachments away. This violates our expectation that a tenant's initial scheduling should agree with the optimizer, which is what prevents spurious migrations.
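To make the arithmetic in the example concrete, assume each shard has one attached and one secondary location, as in the two-shard example under Summary of changes. Then 8 shards on 5 nodes means 8 attached and 8 secondary locations (16 in total), and none of these counts divide evenly by 5. The sketch below computes the most even split possible. `fair_share` is a hypothetical helper for illustration, not part of the storage controller.

```rust
/// Hypothetical helper: split `items` as evenly as possible across `nodes`.
/// Each node gets floor(items / nodes) or ceil(items / nodes), larger shares
/// first. Panics if `nodes` is 0.
fn fair_share(items: usize, nodes: usize) -> Vec<usize> {
    let base = items / nodes;
    let extra = items % nodes;
    (0..nodes).map(|i| if i < extra { base + 1 } else { base }).collect()
}

fn main() {
    let (shards, nodes) = (8, 5);
    // Assumes one attached and one secondary location per shard.
    println!("attached per node:  {:?}", fair_share(shards, nodes)); // [2, 2, 2, 1, 1]
    println!("secondary per node: {:?}", fair_share(shards, nodes)); // [2, 2, 2, 1, 1]
    println!("total per node:     {:?}", fair_share(2 * shards, nodes)); // [4, 3, 3, 3, 3]
}
```

Three nodes must hold two secondaries each, so doubling up secondaries cannot be avoided. One pairing that reaches the [4, 3, 3, 3, 3] total spread is attached [2, 2, 2, 1, 1] with secondaries [2, 1, 1, 2, 2]. What matters is which nodes the scheduler picks for the extra locations, and whether that choice matches the one the optimizer would make.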

Closes: #8969

Summary of changes

Adjust scoring for secondary locations to prefer scheduling onto nodes that have fewer other secondaries: i.e. given two shards and two nodes, it is preferable to schedule a primary and secondary on each one, rather than two primaries on one and two secondaries on another. This was already the behaviour of attached locations, but not of secondaries.
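The following is a minimal sketch of the scoring change. It assumes a simplified model in which each node only tracks how many attached and secondary locations it holds. `NodeLoad`, `pick`, `secondary_score_before` and `secondary_score_after` are hypothetical names, not the storage controller's real scheduler types. The "before" score is only a stand-in for scoring that ignores how many secondaries a node already has. Scores compare lexicographically, and ties go to the lowest node id.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-node counters (not the storage controller's real types).
#[derive(Debug, Default, Clone, Copy)]
struct NodeLoad {
    attached: usize,
    secondary: usize,
}

impl NodeLoad {
    fn total(&self) -> usize {
        self.attached + self.secondary
    }
}

/// Stand-in for secondary scoring that ignores the secondary count:
/// only the node's total number of locations matters.
fn secondary_score_before(load: &NodeLoad) -> usize {
    load.total()
}

/// Secondary scoring in the spirit of this PR: prefer nodes with fewer
/// secondaries, then nodes with fewer locations overall.
fn secondary_score_after(load: &NodeLoad) -> (usize, usize) {
    (load.secondary, load.total())
}

/// Return the id of the lowest-scoring node, skipping `exclude` (e.g. the node
/// that already holds this shard's attached location). Ties go to the lowest id.
fn pick<K: Ord>(
    nodes: &BTreeMap<u32, NodeLoad>,
    exclude: Option<u32>,
    score: impl Fn(&NodeLoad) -> K,
) -> Option<u32> {
    nodes
        .iter()
        .filter(|(id, _)| Some(**id) != exclude)
        .min_by_key(|(id, load)| (score(*load), **id))
        .map(|(id, _)| *id)
}

fn main() {
    // Node 3 holds the attached location for the shard being placed.
    let nodes = BTreeMap::from([
        (1, NodeLoad { attached: 0, secondary: 2 }),
        (2, NodeLoad { attached: 2, secondary: 0 }),
        (3, NodeLoad { attached: 1, secondary: 1 }),
    ]);
    let before = pick(&nodes, Some(3), secondary_score_before);
    let after = pick(&nodes, Some(3), secondary_score_after);
    println!("before: {before:?}, after: {after:?}"); // before: Some(1), after: Some(2)
}
```

Nodes 1 and 2 each hold two locations, so a score that ignores secondaries treats them as equal and falls back to node id, stacking a third secondary on node 1. Scoring on the secondary count first picks node 2 instead. This mirrors how attached locations already prefer nodes with fewer attachments.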

This change is incomplete. The change to secondary scheduling has a knock-on effect on what happens when nodes are added. That could be fixed by changing how optimize_secondary works, but that change is not safe with respect to AZs. I think we may need a more general rethink of how optimize_secondary works.

Checklist before requesting a review

I have performed a self-review of my code.
If it is a core feature, I have added thorough tests.
Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

Do not forget to reformat commit message to not include the above checklist


github-actions · 2024-10-04T11:05:00Z

5058 tests run: 4872 passed, 0 failed, 186 skipped (full report)

Flaky tests (4)

Postgres 16

test_cli_start_stop: release-arm64
test_subscriber_restart: release-x86-64

Postgres 15

test_subscriber_restart: release-x86-64

Postgres 14

test_replica_start_scan_clog_crashed_xids: release-arm64

Code coverage* (full report)

functions: 31.4% (7494 of 23901 functions)
lines: 49.6% (60307 of 121541 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results (last update: 62aa8f2 at 2024-10-04T11:04:59.242Z).

jcsp · 2024-11-28T09:57:03Z

better approach in #9916

jcsp added 3 commits October 3, 2024 11:14

storage controller: fix a disagreement between schedule + optimize

b66bf89

Closes: #8969

DNM: test-log

efc3f1c

hacky way to make tests pass

62aa8f2

jcsp added a/tech_debt Area: related to tech debt c/storage/controller Component: Storage Controller labels Oct 4, 2024

This was referenced Oct 4, 2024

storage controller: tenant creation sometimes disagrees with background optimization #8969

Open

pageserver: stabilize & refine controller scale test #8971

Merged

jcsp closed this Nov 28, 2024
