
storcon: automatically clear Pause/Stop scheduling policies to enable detaches #10011

Merged

5 commits merged into main from jcsp/issue-9957-detach-paused on Dec 7, 2024

Conversation

jcsp
Collaborator

@jcsp jcsp commented Dec 4, 2024

Problem

We saw a tenant get stuck: it had been put into Pause scheduling mode to pin it to a pageserver, then it was left idle for a while, and the control plane tried to detach it.

Close: #9957

Summary of changes

  • When changing policy to Detached or Secondary, set the scheduling policy to Active.
  • Add a test that exercises this
  • When persisting tenant shards, set their generation_pageserver to null if the placement policy is not Attached (this enables consistency checks to work, and avoids leaving state in the DB that could be confusing/misleading in future)
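The first and third bullets could be sketched roughly as follows. This is a minimal illustration with invented types (`TenantShard`, `set_placement`, and the enum variants here are simplified stand-ins, not the actual storage controller structures):

```rust
// Hedged sketch of the rule described above: moving a shard to Detached or
// Secondary clears any Pause/Stop scheduling policy, and non-attached shards
// persist no attachment generation. Names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PlacementPolicy {
    Attached,
    Secondary,
    Detached,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ShardSchedulingPolicy {
    Active,
    Pause,
    Stop,
}

struct TenantShard {
    placement: PlacementPolicy,
    scheduling: ShardSchedulingPolicy,
    // In the DB this would be the generation_pageserver column.
    generation_pageserver: Option<u64>,
}

impl TenantShard {
    fn set_placement(&mut self, new: PlacementPolicy) {
        if matches!(new, PlacementPolicy::Detached | PlacementPolicy::Secondary) {
            // Clear Pause/Stop so the detach/downgrade can actually reconcile.
            self.scheduling = ShardSchedulingPolicy::Active;
            // A non-attached shard has no attachment generation; persisting
            // null keeps consistency checks meaningful.
            self.generation_pageserver = None;
        }
        self.placement = new;
    }
}
```

With this shape, a shard pinned via Pause no longer blocks a detach: the policy change itself resets scheduling to Active before reconciliation runs.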

@jcsp jcsp added t/bug Issue Type: Bug c/storage/controller Component: Storage Controller labels Dec 4, 2024
@jcsp jcsp requested a review from VladLazar December 4, 2024 14:34
@jcsp jcsp marked this pull request as ready for review December 4, 2024 14:34
@jcsp jcsp requested a review from a team as a code owner December 4, 2024 14:34

github-actions bot commented Dec 4, 2024

7051 tests run: 6736 passed, 0 failed, 315 skipped (full report)

Flaky tests (8) across Postgres 14, 15, 16, and 17
Code coverage* (full report)

  • functions: 31.4% (8329 of 26510 functions)
  • lines: 47.8% (65507 of 137138 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results.
294a941 at 2024-12-06T14:45:06.167Z

@jcsp jcsp force-pushed the jcsp/issue-9957-detach-paused branch from 519c449 to d97286e on December 4, 2024 18:42
Contributor

@VladLazar VladLazar left a comment


Code itself looks fine.


Is this the behaviour we want though? We lose the shard scheduling policy whenever cplane detaches (e.g. idle tenant detach). This is also confusing.

Would this alternative work?
When detaching or downgrading to secondary due to an external location conf, the storage controller proxies it to the pageserver irrespective of the shard policy. The pageserver handles it and we preserve the policy on the storcon.

@jcsp
Collaborator Author

jcsp commented Dec 5, 2024

> Is this the behaviour we want though? We lose the shard scheduling policy whenever cplane detaches (e.g. idle tenant detach). This is also confusing.

In common cases, yes... this is perhaps a symptom of the scheduling policies being a bit over-general as a concept. What we're using them for in practice is to use Pause to pin a tenant to a pageserver, and once we are detached, it usually makes sense to forget that pin.

> When detaching or downgrading to secondary due to external location conf, the storage controller proxies it to the pageserver irrespective of the shard policy. The pageserver handles it and we preserve the policy on the storcon.

This could be fragile: it's okay if the pageserver is available, or if the pageserver is marked offline (because when it becomes available it'll get true'd up), but if the pageserver is unavailable but not yet marked offline, then it would prevent us changing the configuration of a tenant. That's arguably tolerable, but it sort of breaks the model that we can change configs in the foreground and reconcile in the background.

Then again, in the pure reconciliation loop model, it's correct to simply not detach a tenant if someone sets its placement policy to detach it but its scheduling policy is paused. For me this just points to the scheduling policies being defined in a way that doesn't quite fit how we want to use them. Maybe it should be a `pin: Option<NodeId>` instead, and then it intuitively makes sense that it gets cleared when you detach.

I do generally agree that there's a bit of "code smell" here... but fudging the scheduling policy is probably a less bad smell than changing the config functions to do pageserver work in the foreground instead of going via reconciliation.
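The `pin: Option<NodeId>` alternative floated in this comment could look roughly like this (a hypothetical sketch with invented names, not the actual storcon types):

```rust
// Hypothetical alternative: model the pin directly rather than reusing the
// scheduling policy for pinning. Names here are illustrative only.
type NodeId = u64;

#[derive(Debug, Clone, Copy, PartialEq)]
enum PlacementPolicy {
    Attached,
    Secondary,
    Detached,
}

struct TenantShard {
    placement: PlacementPolicy,
    // Some(node) means "keep this shard on `node`".
    pin: Option<NodeId>,
}

impl TenantShard {
    fn set_placement(&mut self, new: PlacementPolicy) {
        if new == PlacementPolicy::Detached {
            // A detached shard has no pageserver to be pinned to, so the
            // pin intuitively disappears instead of blocking the detach.
            self.pin = None;
        }
        self.placement = new;
    }
}
```

Under this model there is no separate scheduling-policy state to fudge: clearing the pin on detach is the natural semantics rather than a special case.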

@VladLazar
Contributor

> > Is this the behaviour we want though? We lose the shard scheduling policy whenever cplane detaches (e.g. idle tenant detach). This is also confusing.

> In common cases, yes... this is perhaps a symptom of the scheduling policies being a bit over-general as a concept. What we're using them for in practice is to use Pause to pin a tenant to a pageserver, and once we are detached, it usually makes sense to forget that pin.

Okay, that's fair. Could you please update the comment for the scheduling policies to mention when they get reset?

> but if the pageserver is unavailable but not yet marked offline, then it would prevent us changing the configuration of a tenant

I don't see it. We update the db and in-memory state and will reconcile when the pageserver comes back online.

@jcsp
Collaborator Author

jcsp commented Dec 6, 2024

Added comment about usage of ShardSchedulingPolicy in 294a941

> > but if the pageserver is unavailable but not yet marked offline, then it would prevent us changing the configuration of a tenant

> I don't see it. We update the db and in-memory state and will reconcile when the pageserver comes back online.

But the reconcile wouldn't do anything if scheduling mode was still set to Pause (i.e. for that to work we'd have to also have the change in this PR), thinking specifically of the case of a brief unavailability where we don't do a whole-node reconcile.

@VladLazar
Contributor

> Added comment about usage of ShardSchedulingPolicy in 294a941

> > > but if the pageserver is unavailable but not yet marked offline, then it would prevent us changing the configuration of a tenant

> > I don't see it. We update the db and in-memory state and will reconcile when the pageserver comes back online.

> But the reconcile wouldn't do anything if scheduling mode was still set to Pause (i.e. for that to work we'd have to also have the change in this PR), thinking specifically of the case of a brief unavailability where we don't do a whole-node reconcile.

Right, special-casing detach and downgrade to secondary was implied in my proposal.

Contributor

@VladLazar VladLazar left a comment


I'm fine with this. I was trying to explore alternatives in the PR thread, but we don't have to block this PR.

@jcsp jcsp added this pull request to the merge queue Dec 7, 2024
Merged via the queue into main with commit ec79087 Dec 7, 2024
82 checks passed
@jcsp jcsp deleted the jcsp/issue-9957-detach-paused branch December 7, 2024 13:07
Labels: c/storage/controller (Component: Storage Controller), t/bug (Issue Type: Bug)

Closes: storcon: requests to detach should automatically set scheduling policy to Active