-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: generic shutdown #529
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Motivation ========== We're seeing unclosed connections from VM-based computes to pageservers. They are reproducibly due to `suspend_compute` actions in console. An earlier attempt to mitigate this was made in #523 but it didn't show any effect. Background ========== Busybox Init & Shutdown ------------------------ Our inittab is handled by the busybox init implementation. The poweroff command is the busybox shutdown implementation. The poweroff command knows that busybox init is running and hence simply tells the busybox init to shut down. The busybox init shutdown handling works like so: 0. Receive SIGUSR2 from the poweroff command. 1. Stop waiting for child processes to exit, and stop restarting child processes that are marked `respawn` in the inittab. 2. Run the `shutdown` directives in the inittab, in the order in which they are specified. 3. Send SIGTERM to all processes. 4. Sleep 1 second. 5. (minor details omitted) 6. Call into kernel to poweroff. Use Of This Repo In Neon ------------------------ In Neon's autoscaling, we use vm-builder to generate a NeonVM image that launches the `compute_ctl` process of the Neon compute image, which in turn 1. waits for a spec 2. gets basebackup from compute 3. launches Postgres 4. waits for Postgres to exit 5. does a sync safekeepers 6. exits itself. Neon Control Plane's `suspend_compute` relies on ACPI shutdown signalling for graceful shutdown of the NeonVM. If the NeonVM doesn't shut down timely, the pod that contains the qemu process gets SIGKILLed. The Hypothesis ============== Before this PR, the ACPI shutdown handler for NeonVM would `pg_ctl stop` the postgres that we launched in step 3 and immediately call poweroff. Two problems with this: 1. There is activity inside compute_ctl after postgres exits (step 5) 2. In theory, compute_ctl could exit (step 6) and then get restrted by inittab before we call poweroff. The hypothesis is that either of these cases re-open TCP connections to pageserver (page_service port 6400), and when we `poweroff`, these connections remain open. If they're idle (empty sendq on pageserver) then they will remain open on pageserver. Changes ======= Pageserver will soon switch to more aggressive connection timeouts. While we validate that this is safe, try the change in this PR: We change the ACPI shutdown handler to just call busybox's poweroff. It uses a `shutdown` action in the inittab to gracefully shut down postgres. The `vmstart_command_finished.fifo` is used to wait not just for Postgres to exit, but for the entire `vmstart` to exit. Note that, as explained in the background section, `vmstart` will not be `respawn`ed during poweroff. Hence, once we observe the vmstart_command_finished.fifo write from within vmstart to the fifo, we know that compute_ctl is gone.
TLC throws error: Deadlocks in the following state. When vmshutdown executed vmshutdown_pg_ctl_stop, `postgres_running=NULL. Then postgres started. /\ debug_shutdown_request_observed = TRUE /\ machine_running = TRUE /\ pc = [vmshutdown |-> "vmshutdown_wait_for_running_command", init |-> "wait_for_vmshutdown", respawn_vmstart |-> "respawn_vmstarter_wait_postgres", postgres |-> "postgres_await_shutdown_or_crash"] /\ postgres_exited_pids = {1} /\ postgres_next_pids = <<>> /\ postgres_running = 2 /\ postgres_shutdown_request_pending = NULL /\ postgres_spawn_pending = NULL /\ respawn_current_postgres_pid = 2 /\ shutdown_signal_received = TRUE /\ start_allowed = FALSE /\ start_allowed_locked = TRUE /\ vmshutdown_exited = FALSE /\ vmstarter_sh_running = TRUE Please enter the commit message for your changes. Lines starting
While the vmstarter_sh is running, the flock is held. Exploit that fact in vmshutdown: if trylock fails, we know the workload is still running / running again, so, we need to continue trying to shut it down.
problame
changed the base branch from
main
to
problame/correct-graceful-shutdown
September 25, 2023 10:29
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
stacked atop #525 , don't expect immediate updates