
storcon: AZ awareness when filling nodes #9947

Closed
wants to merge 2 commits into from

Conversation

@jcsp (Collaborator) commented Nov 29, 2024

(Plan on bundling this into #9916)

Problem

When we fill a node after draining it, we should move back the shards whose home AZ matches the node being filled, rather than arbitrary shards.

Part of #8264

Summary of changes

  • While planning fills, exclude any shards that have an AZ preference that doesn't match the node being filled.

There are edge cases where this might not be ideal, such as if a system is extremely overloaded to the point that many shards are running in non-preferred AZs. However, filling nodes is an optimisation, not something that has to happen, and if a significant number of shards can't be scheduled into their preferred AZ then we have bigger problems.
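The change amounts to a filter on the fill-planning candidate list. A minimal sketch of that idea follows; the names (`AvailabilityZone`, `ShardCandidate`, `plan_fill`) are hypothetical and do not match the storage controller's real types, they only illustrate the behaviour described above.

```rust
// Sketch only: when filling a node, skip shards whose preferred AZ differs
// from the AZ of the node being filled. Shards with no AZ preference remain
// eligible. All names here are illustrative, not the real storcon types.

#[derive(Clone, Debug, PartialEq, Eq)]
struct AvailabilityZone(String);

#[derive(Clone, Debug)]
struct ShardCandidate {
    tenant_shard_id: String,
    preferred_az: Option<AvailabilityZone>,
}

/// Pick up to `fill_count` shards to move onto a node in `node_az` during a fill.
fn plan_fill(
    candidates: &[ShardCandidate],
    node_az: &AvailabilityZone,
    fill_count: usize,
) -> Vec<ShardCandidate> {
    candidates
        .iter()
        .filter(|shard| match &shard.preferred_az {
            // Only move back shards whose home AZ matches the node being filled.
            Some(az) => az == node_az,
            // No preference: still a valid candidate for the fill.
            None => true,
        })
        .take(fill_count)
        .cloned()
        .collect()
}
```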

@jcsp marked this pull request as ready for review November 29, 2024 18:21
@jcsp requested a review from a team as a code owner November 29, 2024 18:21
@jcsp requested reviews from yliang412 and VladLazar November 29, 2024 18:21
@jcsp added the labels t/feature (Issue type: feature, for new features or requests) and c/storage/controller (Component: Storage Controller) Nov 29, 2024

6952 tests run: 6644 passed, 0 failed, 308 skipped (full report)


Flaky tests (2)

  • Postgres 15
  • Postgres 14
Code coverage* (full report)

  • functions: 30.2% (8181 of 27046 functions)
  • lines: 47.7% (64849 of 135995 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results.
ac611a6 at 2024-11-29T19:30:50.259Z

@VladLazar (Contributor) left a comment

Code looks good, but what does this mean for regions where there's only one PS per AZ? We end up with an unbalanced cluster after restart, and the optimizations, which are not AZ-affinity aware, kick in and start moving things anyway.

@jcsp (Collaborator, Author) commented Dec 2, 2024

Hmm, wasn't the existing code going to fight the optimiser too? (it still moves shards around to fill nodes, in ways that might disagree with the optimiser)

If necessary we can combine this one with the big AZ scheduling PR #9916 (still a WIP) which lines up the optimiser with AZ awareness.

@VladLazar (Contributor)

> Hmm, wasn't the existing code going to fight the optimiser too? (it still moves shards around to fill nodes, in ways that might disagree with the optimiser)

Kind of:

  • Existing code: We migrate based on shard count without any AZ awareness. The optimiser looks at node affinity and attachments in the context of a shard. Fill logic already tries not to load the node with too many shards of the same tenant. They shouldn't disagree massively.
  • Proposed code: We end up with an empty node at the end of the rolling restart and the optimiser will fill it up with a slow trickle.

> If necessary we can combine this one with the big AZ scheduling PR #9916 (still a WIP) which lines up the optimiser with AZ awareness.

Perhaps that would be better. wdyt?

github-merge-queue bot pushed a commit that referenced this pull request Dec 12, 2024
## Problem

When we update our scheduler/optimization code to respect AZs properly
(#9916), the choice of AZ
becomes a much higher-stakes decision. We will pretty much always run a
tenant in its preferred AZ, and that AZ is fixed for the lifetime of the
tenant (unless a human intervenes).

Eventually, when we do auto-balancing based on utilization, I anticipate
that part of that will be to automatically change the AZ of tenants if
our original scheduling decisions have caused imbalance, but as an
interim measure, we can at least avoid making this scheduling decision
based purely on which AZ contains the emptiest node.

This is a precursor to #9947

## Summary of changes

- When creating a tenant, instead of scheduling a shard and then reading
its preferred AZ back, make the AZ decision first.
- Instead of choosing AZ based on which node is emptiest, use the median
utilization of nodes in each AZ to pick the AZ to use. This avoids bad
AZ decisions during periods when some node has very low utilization
(such as after replacing a dead node)

I considered also making the selection a weighted pseudo-random choice
based on utilization, but wanted to avoid destabilising tests with that
for now.
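A rough sketch of the median-based AZ choice described above, with made-up names and a plain integer standing in for the node utilization score (the real storage controller types differ):

```rust
// Sketch only: pick the AZ whose nodes have the lowest *median* utilization,
// rather than the AZ containing the single emptiest node. Using the median
// avoids choosing an AZ just because one node in it (e.g. a freshly replaced
// dead node) happens to be nearly empty.

use std::collections::HashMap;

/// `nodes` is a list of (az_name, utilization) per pageserver node.
fn pick_az_by_median_utilization(nodes: &[(String, u64)]) -> Option<String> {
    // Group per-node utilization scores by AZ.
    let mut by_az: HashMap<String, Vec<u64>> = HashMap::new();
    for (az, util) in nodes {
        by_az.entry(az.clone()).or_default().push(*util);
    }

    by_az
        .into_iter()
        .map(|(az, mut utils)| {
            utils.sort_unstable();
            // Median (upper median for even-sized groups) of this AZ's nodes.
            let median = utils[utils.len() / 2];
            (az, median)
        })
        .min_by_key(|(_, median)| *median)
        .map(|(az, _)| az)
}
```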
github-merge-queue bot pushed a commit that referenced this pull request Dec 17, 2024
## Problem

It is unreliable for the control plane to infer the AZ for computes from
where the tenant is currently attached, because if a tenant happens to
be in a degraded state or a release is ongoing while a compute starts,
then the tenant's attached AZ can be a different one to where it will
run long-term, and the control plane doesn't check back later to restart
the compute.

This can land in parallel with
#9947

## Summary of changes

- Thread through the preferred AZ into the compute hook code via the
reconciler
- Include the preferred AZ in the body of compute hook notifications
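As a purely illustrative sketch of the second bullet, the notification body might carry the preferred AZ along the lines below. The struct and field names are guesses, not the storage controller's actual compute hook schema; in practice the body would be serialized (e.g. to JSON) before being sent to the control plane.

```rust
// Illustrative only: what "include the preferred AZ in the body of compute
// hook notifications" could look like. Names are invented for this sketch.

struct ComputeHookNotification {
    tenant_id: String,
    // Per-shard attachment details elided for brevity.
    /// The AZ the tenant is expected to run in long-term, so the control plane
    /// can place computes without inferring the AZ from the current attachment.
    preferred_az: Option<String>,
}
```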
@jcsp (Collaborator, Author) commented Dec 19, 2024

This will land as part of #9916

@jcsp closed this Dec 19, 2024