storage controller: tenant creation sometimes disagrees with background optimization #8969

Open · Tracked by #8264
jcsp opened this issue Sep 9, 2024 · 2 comments · Fixed by #9081 · May be fixed by #9916

Labels: a/tech_debt (Area: related to tech debt), c/storage/controller (Component: Storage Controller)

Comments

jcsp (Collaborator) commented Sep 9, 2024

Tenant creation uses schedule_shard, which computes affinity scores from the absolute number of shards that a tenant has on a node, treating attached and secondary locations the same.

However, the optimize_all logic prefers to move attached locations away from nodes that have lots of existing attachments for the same tenant.

This can lead to a situation where we create a tenant and then immediately start migrating one of its shards, because the node chosen at creation already held several attached locations for the same tenant.

We can fix this by modifying schedule_shard to know whether it is scheduling an attached or secondary location and, when scheduling an attached location, including ScheduleContext::attached_nodes in the affinity score calculation.

This will probably get solved implicitly when implementing AZ-aware scheduling, as that will also need to distinguish attached vs. secondary locations.
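
Below is a minimal sketch of that idea, not the real storage controller code: only schedule_shard, ScheduleContext::attached_nodes, and the attached/secondary distinction come from this issue, while NodeId, ShardKind, and the score arithmetic are simplified stand-ins.

```rust
// Hypothetical, simplified types: illustrates making schedule_shard aware of
// whether it is placing an attached or secondary location, so that attached
// placement weighs attached_nodes the same way optimize_all does.
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct NodeId(u64);

#[derive(Clone, Copy)]
enum ShardKind {
    Attached,
    Secondary,
}

#[derive(Default)]
struct ScheduleContext {
    /// Total locations (attached + secondary) this tenant already has per node.
    nodes: HashMap<NodeId, usize>,
    /// Attached locations this tenant already has per node.
    attached_nodes: HashMap<NodeId, usize>,
}

fn affinity_score(ctx: &ScheduleContext, node: NodeId, kind: ShardKind) -> usize {
    // Previously only the total location count was consulted, so attached and
    // secondary locations scored the same. Adding `attached_nodes` when placing
    // an attached location keeps creation-time placement consistent with
    // optimize_all, which penalizes nodes that already hold attached locations
    // for the same tenant.
    let total = ctx.nodes.get(&node).copied().unwrap_or(0);
    match kind {
        ShardKind::Secondary => total,
        ShardKind::Attached => total + ctx.attached_nodes.get(&node).copied().unwrap_or(0),
    }
}

fn schedule_shard(candidates: &[NodeId], ctx: &ScheduleContext, kind: ShardKind) -> Option<NodeId> {
    // Pick the candidate with the lowest affinity score ("least loaded first").
    candidates
        .iter()
        .copied()
        .min_by_key(|node| affinity_score(ctx, *node, kind))
}
```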

jcsp added the a/tech_debt (Area: related to tech debt) and c/storage/controller (Component: Storage Controller) labels on Sep 9, 2024
VladLazar added a commit that referenced this issue Sep 24, 2024
## Problem

Scheduling on tenant creation uses different heuristics compared to the scheduling done during background optimizations. This results in scenarios where shards are created and then immediately migrated by the optimizer.

## Summary of changes

1. Make the scheduler aware of the type of shard it is scheduling (attached vs. secondary), since we wish to apply different heuristics to each.
2. For attached shards, include the attached shard count from the context in the node score calculation. This brings initial shard scheduling in line with what the optimization passes do.
3. Add a test for (2) (see the sketch after this commit message).

This looks like a bigger change than required, but the refactoring serves as the basis for AZ-aware shard scheduling, where we also need to distinguish between attached and secondary shards.

Closes #8969
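
For point (3), a test along these lines could check that creation-time placement already agrees with the optimizer. This is a hedged sketch built on the hypothetical types in the earlier snippet, not the test that actually landed in the PR.

```rust
// Hypothetical regression test: after scheduling with the attached-aware
// score, the chosen node should not be one that already holds attached
// locations for the tenant, so the optimizer has nothing to move right away.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn creation_agrees_with_optimizer() {
        let candidates = [NodeId(1), NodeId(2), NodeId(3)];
        let mut ctx = ScheduleContext::default();

        // Node 1 already holds two attached locations for this tenant,
        // node 2 holds two secondaries, node 3 holds nothing.
        ctx.nodes.insert(NodeId(1), 2);
        ctx.attached_nodes.insert(NodeId(1), 2);
        ctx.nodes.insert(NodeId(2), 2);

        // With the attached-aware score, a new attached location avoids
        // node 1, matching what optimize_all would prefer.
        let chosen = schedule_shard(&candidates, &ctx, ShardKind::Attached).unwrap();
        assert_ne!(chosen, NodeId(1));
    }
}
```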
jcsp (Collaborator, Author) commented Oct 3, 2024

I think our fix isn't quite complete; I'm analyzing a log where this happened again...

jcsp (Collaborator, Author) commented Oct 4, 2024

There's a unit test that reproduces the unstable case in #9275.
