Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: generic shutdown #529

Closed
wants to merge 16 commits into from
Closed

WIP: generic shutdown #529

wants to merge 16 commits into from

Conversation

problame
Copy link
Contributor

@problame problame commented Sep 25, 2023

stacked atop #525 , don't expect immediate updates

problame and others added 16 commits September 22, 2023 21:51
Motivation
==========

We're seeing unclosed connections from VM-based computes to pageservers.
They are reproducibly due to `suspend_compute` actions in console.

An earlier attempt to mitigate this was made in
#523
but it didn't show any effect.

Background
==========

Busybox Init & Shutdown
------------------------

Our inittab is handled by the busybox init implementation.
The poweroff command is the busybox shutdown implementation.
The poweroff command knows that busybox init is running and
hence simply tells the busybox init to shut down.

The busybox init shutdown handling works like so:
0. Receive SIGUSR2 from the poweroff command.
1. Stop waiting for child processes to exit, and stop restarting child
   processes that are marked `respawn` in the inittab.
2. Run the `shutdown` directives in the inittab, in the order
   in which they are specified.
3. Send SIGTERM to all processes.
4. Sleep 1 second.
5. (minor details omitted)
6. Call into kernel to poweroff.

Use Of This Repo In Neon
------------------------

In Neon's autoscaling, we use vm-builder to generate a NeonVM image that
launches the `compute_ctl` process of the Neon compute image, which in
turn
1. waits for a spec
2. gets basebackup from compute
3. launches Postgres
4. waits for Postgres to exit
5. does a sync safekeepers
6. exits itself.

Neon Control Plane's `suspend_compute` relies on ACPI shutdown
signalling for graceful shutdown of the NeonVM.
If the NeonVM doesn't shut down timely, the pod that contains
the qemu process gets SIGKILLed.

The Hypothesis
==============

Before this PR, the ACPI shutdown handler for NeonVM would `pg_ctl stop`
the postgres that we launched in step 3 and immediately call poweroff.

Two problems with this:
1. There is activity inside compute_ctl after postgres exits (step 5)
2. In theory, compute_ctl could exit (step 6) and then get restrted by
   inittab before we call poweroff.

The hypothesis is that either of these cases re-open TCP connections
to pageserver (page_service port 6400), and when we `poweroff`, these
connections remain open. If they're idle (empty sendq on pageserver)
then they will remain open on pageserver.

Changes
=======

Pageserver will soon switch to more aggressive connection timeouts.
While we validate that this is safe, try the change in this PR:

We change the ACPI shutdown handler to just call busybox's poweroff.
It uses a `shutdown` action in the inittab to gracefully shut down
postgres.

The `vmstart_command_finished.fifo` is used to wait not just for
Postgres to exit, but for the entire `vmstart` to exit.

Note that, as explained in the background section, `vmstart` will
not be `respawn`ed during poweroff.
Hence, once we observe the vmstart_command_finished.fifo write from
within vmstart to the fifo, we know that compute_ctl is gone.
TLC throws error: Deadlocks in the following state.
When vmshutdown executed vmshutdown_pg_ctl_stop, `postgres_running=NULL.
Then postgres started.

/\  debug_shutdown_request_observed = TRUE
/\  machine_running = TRUE
/\  pc = [vmshutdown |-> "vmshutdown_wait_for_running_command", init |-> "wait_for_vmshutdown", respawn_vmstart |-> "respawn_vmstarter_wait_postgres", postgres |-> "postgres_await_shutdown_or_crash"]
/\  postgres_exited_pids = {1}
/\  postgres_next_pids = <<>>
/\  postgres_running = 2
/\  postgres_shutdown_request_pending = NULL
/\  postgres_spawn_pending = NULL
/\  respawn_current_postgres_pid = 2
/\  shutdown_signal_received = TRUE
/\  start_allowed = FALSE
/\  start_allowed_locked = TRUE
/\  vmshutdown_exited = FALSE
/\  vmstarter_sh_running = TRUE

 Please enter the commit message for your changes. Lines starting
While the vmstarter_sh is running, the flock is held.
Exploit that fact in vmshutdown: if trylock fails, we know the workload
is still running / running again, so, we need to continue trying to shut it down.
@problame problame changed the base branch from main to problame/correct-graceful-shutdown September 25, 2023 10:29
Base automatically changed from problame/correct-graceful-shutdown to main September 25, 2023 15:55
@sharnoff sharnoff mentioned this pull request Oct 21, 2023
2 tasks
@Omrigan
Copy link
Contributor

Omrigan commented Apr 11, 2024

Talked to @problame, this all won't be important after #728

@Omrigan Omrigan closed this Apr 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants