WIP: generic shutdown #529

problame · 2023-09-25T10:29:15Z

Stacked atop #525; don't expect immediate updates.

Motivation
==========

We're seeing unclosed connections from VM-based computes to pageservers. They are reproducibly caused by `suspend_compute` actions in console. An earlier attempt to mitigate this was made in #523, but it didn't show any effect.

Background
==========

Busybox Init & Shutdown
------------------------

Our inittab is handled by the busybox init implementation. The poweroff command is the busybox shutdown implementation. It knows that busybox init is running and hence simply tells busybox init to shut down.

The busybox init shutdown handling works like so:

0. Receive SIGUSR2 from the poweroff command.
1. Stop waiting for child processes to exit, and stop restarting child processes that are marked `respawn` in the inittab.
2. Run the `shutdown` directives in the inittab, in the order in which they are specified.
3. Send SIGTERM to all processes.
4. Sleep 1 second.
5. (minor details omitted)
6. Call into kernel to poweroff.

Use Of This Repo In Neon
------------------------

In Neon's autoscaling, we use vm-builder to generate a NeonVM image that launches the `compute_ctl` process of the Neon compute image, which in turn

1. waits for a spec
2. gets basebackup from compute
3. launches Postgres
4. waits for Postgres to exit
5. does a sync safekeepers
6. exits itself.

Neon Control Plane's `suspend_compute` relies on ACPI shutdown signalling for graceful shutdown of the NeonVM. If the NeonVM doesn't shut down in time, the pod that contains the qemu process gets SIGKILLed.

The Hypothesis
==============

Before this PR, the ACPI shutdown handler for NeonVM would `pg_ctl stop` the postgres that we launched in step 3 and immediately call poweroff. Two problems with this:

1. There is activity inside compute_ctl after postgres exits (step 5).
2. In theory, compute_ctl could exit (step 6) and then get restarted by inittab before we call poweroff.

The hypothesis is that either of these cases re-opens TCP connections to pageserver (page_service port 6400), and when we `poweroff`, these connections remain open. If they're idle (empty sendq on pageserver), then they will remain open on pageserver.

Changes
=======

Pageserver will soon switch to more aggressive connection timeouts. While we validate that this is safe, try the change in this PR:

We change the ACPI shutdown handler to just call busybox's poweroff. Graceful shutdown of postgres now happens in a `shutdown` action in the inittab. The `vmstart_command_finished.fifo` is used to wait not just for Postgres to exit, but for the entire `vmstart` to exit. Note that, as explained in the background section, `vmstart` will not be `respawn`ed during poweroff. Hence, once we observe the write to `vmstart_command_finished.fifo` from within vmstart, we know that compute_ctl is gone.
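The fifo-based wait described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual scripts: the temporary directory, the `vmstart` and `wait_for_vmstart` function names, and the `sleep 1` stand-in workload are all assumptions made for the example. Only the fifo name is taken from the description.

```shell
#!/bin/sh
# Hypothetical sketch: paths, function names and the workload are assumptions.
set -eu

workdir="$(mktemp -d)"
fifo="$workdir/vmstart_command_finished.fifo"
mkfifo "$fifo"

# Stand-in for vmstart: run the workload, then report completion via the fifo.
vmstart() {
    "$@" || true               # run the workload (compute_ctl in the real VM)
    echo finished > "$fifo"    # opening the fifo blocks until a reader shows up
}

# Stand-in for the shutdown side: block until vmstart has written to the fifo.
wait_for_vmstart() {
    read -r msg < "$fifo"      # opening the fifo blocks until a writer shows up
}

vmstart sleep 1 &              # the "workload" here is just a short sleep
wait_for_vmstart
echo "vmstart reported: $msg"
wait                           # reap the background vmstart
rm -rf "$workdir"
```

The point of writing from within `vmstart` rather than from the workload itself is that the write can only happen after the whole command, including any post-Postgres activity, has returned.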


TLC throws an error: it deadlocks in the following state. When vmshutdown executed vmshutdown_pg_ctl_stop, `postgres_running=NULL`. Then postgres started.

    /\ debug_shutdown_request_observed = TRUE
    /\ machine_running = TRUE
    /\ pc = [vmshutdown |-> "vmshutdown_wait_for_running_command", init |-> "wait_for_vmshutdown", respawn_vmstart |-> "respawn_vmstarter_wait_postgres", postgres |-> "postgres_await_shutdown_or_crash"]
    /\ postgres_exited_pids = {1}
    /\ postgres_next_pids = <<>>
    /\ postgres_running = 2
    /\ postgres_shutdown_request_pending = NULL
    /\ postgres_spawn_pending = NULL
    /\ respawn_current_postgres_pid = 2
    /\ shutdown_signal_received = TRUE
    /\ start_allowed = FALSE
    /\ start_allowed_locked = TRUE
    /\ vmshutdown_exited = FALSE
    /\ vmstarter_sh_running = TRUE

While the vmstarter_sh is running, the flock is held. Exploit that fact in vmshutdown: if trylock fails, we know the workload is still running / running again, so, we need to continue trying to shut it down.
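A minimal sketch of that trylock loop, assuming a `flock` utility with the usual `-n` (non-blocking) option and the `flock FILE COMMAND` form, as provided by util-linux and busybox. The lock file path, the `sleep 2` stand-in for the workload, and the polling interval are assumptions for illustration. The real vmshutdown would issue its stop request where the comment indicates.

```shell
#!/bin/sh
# Hypothetical sketch: lock path, workload and timings are assumptions.
set -eu

lockfile="$(mktemp)"

# Stand-in for vmstarter.sh: the lock is held for as long as the workload runs.
flock "$lockfile" sleep 2 &

sleep 1    # sketch only: give the background job time to take the lock

attempts=0
# Trylock: `flock -n` fails immediately while someone else holds the lock.
until flock -n "$lockfile" true; do
    attempts=$((attempts + 1))
    # The workload is still running (or running again): a real script would
    # ask it to stop here, then retry.
    sleep 1
done

echo "lock acquired after $attempts failed attempt(s)"
wait
rm -f "$lockfile"
```

Note that `flock` is always given a command to run (`true` for the probe), which matches the fixup commit's remark that a command always needs to be specified.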

Omrigan · 2024-04-11T08:22:46Z

Talked to @problame, this all won't be important after #728

problame and others added 16 commits September 22, 2023 21:51

powerdown script no longer exists

02ed11b

fixups

389d64e

signal vmstarter.sh exit from vmstart instead of from inside vmstarter.sh resolves #525 (comment)

a41dba3

switch to bug-free & model-checked shutdown impl

e7f4040

fixup: vmshutdown wasn't using flock correctly, always need to specify a command

a31d501

document the process tree inside the VM

25a0d73

fix log message stderr redirection typo

03ae60d

spec: clean up flocking

0e2f0c5

spec: better variable name for vmstart_running

fbc2699

spec: be more precise in what the "bounded number of crashes" means

9727217

spec: remove unused variable

4dde7cc

fix the signal loss condition & update spec to reflect fix

4e58bd3

forgot to update the sequence diagram

9dbd220

WIP: generic vmshutdown

3a11191

problame changed the base branch from main to problame/correct-graceful-shutdown September 25, 2023 10:29

Base automatically changed from problame/correct-graceful-shutdown to main September 25, 2023 15:55

sharnoff mentioned this pull request Oct 21, 2023

Epic: Unify vm-builders #577

Closed

2 tasks

Omrigan closed this Apr 11, 2024
