storage controller: tenant creation sometimes disagrees with background optimization #8969

Open · Tracked by #8264
jcsp opened this issue Sep 9, 2024 · 2 comments · Fixed by #9081 · May be fixed by #9916

Labels: a/tech_debt (Area: related to tech debt), c/storage/controller (Component: Storage Controller)

Comments

jcsp (Collaborator) commented Sep 9, 2024

Tenant creation uses schedule_shard, which computes affinity scores from the absolute number of shards that a tenant has on a node, treating attached and secondary locations the same.

However, the optimize_all logic prefers to move attached locations away from nodes that have lots of existing attachments for the same tenant.

This can lead to a situation where we create a tenant and then immediately start migrating one of its shards, because the node chosen at creation already held several attached locations for the same tenant.

We can fix this by modifying schedule_shard to know whether it is scheduling an attached or secondary location and, when scheduling an attached location, including ScheduleContext::attached_nodes in the affinity score calculation.

This will probably get solved implicitly when implementing AZ-aware scheduling, as that will also need to distinguish attached vs. secondary locations.
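
Below is a minimal sketch of that idea, not the real storage controller code: only schedule_shard, ScheduleContext::attached_nodes, and the attached/secondary distinction come from this issue, while NodeId, ShardKind, and the score arithmetic are simplified stand-ins.

```rust
// Hypothetical, simplified types: illustrates making schedule_shard aware of
// whether it is placing an attached or secondary location, so that attached
// placement weighs attached_nodes the same way optimize_all does.
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct NodeId(u64);

#[derive(Clone, Copy)]
enum ShardKind {
    Attached,
    Secondary,
}

#[derive(Default)]
struct ScheduleContext {
    /// Total locations (attached + secondary) this tenant already has per node.
    nodes: HashMap<NodeId, usize>,
    /// Attached locations this tenant already has per node.
    attached_nodes: HashMap<NodeId, usize>,
}

fn affinity_score(ctx: &ScheduleContext, node: NodeId, kind: ShardKind) -> usize {
    // Previously only the total location count was consulted, so attached and
    // secondary locations scored the same. Adding `attached_nodes` when placing
    // an attached location keeps creation-time placement consistent with
    // optimize_all, which penalizes nodes that already hold attached locations
    // for the same tenant.
    let total = ctx.nodes.get(&node).copied().unwrap_or(0);
    match kind {
        ShardKind::Secondary => total,
        ShardKind::Attached => total + ctx.attached_nodes.get(&node).copied().unwrap_or(0),
    }
}

fn schedule_shard(candidates: &[NodeId], ctx: &ScheduleContext, kind: ShardKind) -> Option<NodeId> {
    // Pick the candidate with the lowest affinity score ("least loaded first").
    candidates
        .iter()
        .copied()
        .min_by_key(|node| affinity_score(ctx, *node, kind))
}
```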

jcsp added the a/tech_debt (Area: related to tech debt) and c/storage/controller (Component: Storage Controller) labels on Sep 9, 2024
VladLazar added a commit that referenced this issue Sep 24, 2024
## Problem

Scheduling on tenant creation uses different heuristics compared to the scheduling done during background optimizations. This results in scenarios where shards are created and then immediately migrated by the optimizer.

## Summary of changes

1. Make the scheduler aware of the type of shard it is scheduling (attached vs. secondary), since we wish to apply different heuristics to each.
2. For attached shards, include the attached shard count from the context in the node score calculation. This brings initial shard scheduling in line with what the optimization passes do.
3. Add a test for (2) (see the sketch after this commit message).

This looks like a bigger change than required, but the refactoring serves as the basis for AZ-aware shard scheduling, where we also need to distinguish between attached and secondary shards.

Closes #8969
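
For point (3), a test along these lines could check that creation-time placement already agrees with the optimizer. This is a hedged sketch built on the hypothetical types in the earlier snippet, not the test that actually landed in the PR.

```rust
// Hypothetical regression test: after scheduling with the attached-aware
// score, the chosen node should not be one that already holds attached
// locations for the tenant, so the optimizer has nothing to move right away.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn creation_agrees_with_optimizer() {
        let candidates = [NodeId(1), NodeId(2), NodeId(3)];
        let mut ctx = ScheduleContext::default();

        // Node 1 already holds two attached locations for this tenant,
        // node 2 holds two secondaries, node 3 holds nothing.
        ctx.nodes.insert(NodeId(1), 2);
        ctx.attached_nodes.insert(NodeId(1), 2);
        ctx.nodes.insert(NodeId(2), 2);

        // With the attached-aware score, a new attached location avoids
        // node 1, matching what optimize_all would prefer.
        let chosen = schedule_shard(&candidates, &ctx, ShardKind::Attached).unwrap();
        assert_ne!(chosen, NodeId(1));
    }
}
```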
jcsp (Collaborator, Author) commented Oct 3, 2024

I think our fix isn't quite complete; I'm analyzing a log where this happened again...

jcsp (Collaborator, Author) commented Oct 4, 2024

There's a unit test that reproduces the unstable case in #9275.
