storcon: AZ awareness when filling nodes #9947
Conversation
6952 tests run: 6644 passed, 0 failed, 308 skipped (full report)

Flaky tests (2): Postgres 15, Postgres 14

Code coverage* (full report)
* collected from Rust tests only

The comment gets automatically updated with the latest test results.
ac611a6 at 2024-11-29T19:30:50.259Z :recycle:
Code looks good, but what does this mean for regions where there's only one PS per AZ? We end up with an unbalanced cluster after a restart, and then optimizations, which are not AZ-affinity aware, kick in and start moving things around anyway.
Hmm, wasn't the existing code going to fight the optimiser too? It still moves shards around to fill nodes, in ways that might disagree with the optimiser. If necessary, we can combine this one with the big AZ scheduling PR #9916 (still a WIP), which lines up the optimiser with AZ awareness.
Kind of:
Perhaps that would be better. wdyt?
## Problem

When we update our scheduler/optimization code to respect AZs properly (#9916), the choice of AZ becomes a much higher-stakes decision. We will pretty much always run a tenant in its preferred AZ, and that AZ is fixed for the lifetime of the tenant (unless a human intervenes).

Eventually, when we do auto-balancing based on utilization, I anticipate that part of that will be to automatically change the AZ of tenants if our original scheduling decisions have caused imbalance, but as an interim measure we can at least avoid making this scheduling decision based purely on which AZ contains the emptiest node.

This is a precursor to #9947.

## Summary of changes

- When creating a tenant, instead of scheduling a shard and then reading its preferred AZ back, make the AZ decision first.
- Instead of choosing the AZ based on which node is emptiest, use the median utilization of nodes in each AZ to pick the AZ to use. This avoids bad AZ decisions during periods when some node has very low utilization (such as after replacing a dead node).

I considered also making the selection a weighted pseudo-random choice based on utilization, but wanted to avoid destabilising tests with that for now.
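As an illustration of the median-based AZ choice described above, here is a minimal, self-contained Rust sketch. It is not the storage controller's actual code: the `Node` struct, the `utilization` score, and `pick_preferred_az` are hypothetical stand-ins.

```rust
use std::collections::HashMap;

struct Node {
    az: String,
    // Hypothetical utilization score: lower means more free capacity.
    utilization: u64,
}

/// Return the AZ whose nodes have the lowest median utilization.
fn pick_preferred_az(nodes: &[Node]) -> Option<String> {
    // Group node utilizations by AZ.
    let mut by_az: HashMap<&str, Vec<u64>> = HashMap::new();
    for node in nodes {
        by_az.entry(node.az.as_str()).or_default().push(node.utilization);
    }

    // Compute the (upper) median utilization per AZ and keep the smallest.
    by_az
        .into_iter()
        .map(|(az, mut utils)| {
            utils.sort_unstable();
            let median = utils[utils.len() / 2];
            (az.to_string(), median)
        })
        .min_by_key(|(_, median)| *median)
        .map(|(az, _)| az)
}

fn main() {
    let nodes = vec![
        Node { az: "az-a".into(), utilization: 10 }, // one nearly empty node
        Node { az: "az-a".into(), utilization: 90 },
        Node { az: "az-b".into(), utilization: 40 },
        Node { az: "az-b".into(), utilization: 50 },
    ];
    // Emptiest-node logic would pick az-a; comparing medians picks az-b.
    assert_eq!(pick_preferred_az(&nodes).as_deref(), Some("az-b"));
}
```

The point of the median (rather than the minimum) is that a single freshly replaced, near-empty node should not pull the whole AZ choice toward its zone.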
## Problem

It is unreliable for the control plane to infer the AZ for computes from where the tenant is currently attached: if a tenant happens to be in a degraded state, or a release is ongoing while a compute starts, then the tenant's attached AZ can be different from where it will run long-term, and the control plane doesn't check back later to restart the compute.

This can land in parallel with #9947.

## Summary of changes

- Thread the preferred AZ through to the compute hook code via the reconciler.
- Include the preferred AZ in the body of compute hook notifications.
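For illustration only, a minimal Rust sketch of what a notification payload carrying the preferred AZ might look like; the struct and field names (`ComputeHookNotification`, `preferred_az`, etc.) are assumptions for this example, not the actual compute hook API.

```rust
// Hypothetical payload shape: the preferred AZ travels with the notification,
// so the control plane does not have to infer it from the current attachment.
struct ShardNotification {
    tenant_shard_id: String,
    node_id: u64,
}

struct ComputeHookNotification {
    tenant_id: String,
    // Newly included field in this sketch: the tenant's preferred AZ.
    preferred_az: Option<String>,
    shards: Vec<ShardNotification>,
}

fn main() {
    let notification = ComputeHookNotification {
        tenant_id: "1234".to_string(),
        preferred_az: Some("us-east-1a".to_string()),
        shards: vec![ShardNotification {
            tenant_shard_id: "1234-0001".to_string(),
            node_id: 7,
        }],
    };
    println!(
        "notify tenant {} (preferred AZ {:?}) across {} shard(s)",
        notification.tenant_id,
        notification.preferred_az,
        notification.shards.len()
    );
}
```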
This will land as part of #9916
(Plan on bundling this into #9916)
Problem
When we fill a node after draining it, we should move back the shards whose home AZ matches the node being filled, rather than arbitrary shards.
Part of #8264
Summary of changes
There are edge cases where this might not be ideal, such as if a system is extremely overloaded to the point that many shards are running in non-preferred AZs. However, filling nodes is an optimisation, not something that has to happen, and if a significant number of shards can't be scheduled into their preferred AZ then we have bigger problems.
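Below is a minimal sketch of the fill-ordering idea described above; it is not the actual storcon implementation, and the `ShardCandidate` type and `order_fill_candidates` helper are hypothetical. The idea is simply that shards whose preferred AZ matches the fill node's AZ are considered before arbitrary shards.

```rust
#[derive(Debug)]
struct ShardCandidate {
    tenant_shard_id: String,
    preferred_az: Option<String>,
}

/// Order fill candidates so that shards homed in the fill node's AZ come first.
fn order_fill_candidates(mut candidates: Vec<ShardCandidate>, node_az: &str) -> Vec<ShardCandidate> {
    // Stable sort on a boolean key: AZ-matching shards (key = false) sort
    // first, everything else keeps its relative order.
    candidates.sort_by_key(|c| c.preferred_az.as_deref() != Some(node_az));
    candidates
}

fn main() {
    let candidates = vec![
        ShardCandidate { tenant_shard_id: "t1-0001".into(), preferred_az: Some("az-b".into()) },
        ShardCandidate { tenant_shard_id: "t2-0001".into(), preferred_az: Some("az-a".into()) },
        ShardCandidate { tenant_shard_id: "t3-0001".into(), preferred_az: None },
    ];
    // Filling a node in az-a: t2 is moved back first, then the others.
    for c in order_fill_candidates(candidates, "az-a") {
        println!("{:?}", c);
    }
}
```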