Skip to content

Commit

Permalink
compute_ctl: Use 'fast' shutdown for Postgres termination (#8289)
Browse files Browse the repository at this point in the history
## Problem

We currently use 'immediate' mode in the most commonly used shutdown
path, when the control plane calls a `compute_ctl` API to terminate
Postgres inside compute without waiting for the actual pod / VM
termination. Yet, 'immediate' shutdown doesn't create a shutdown
checkpoint and ROs have bad times figuring out the list of running xacts
during next start.

## Summary of changes

Use 'fast' mode, which creates a shutdown checkpoint that is important
for ROs to get a list of running xacts faster instead of going through
the CLOG. On the control plane side, we poll this `compute_ctl`
termination API for 10s, it should be enough as we don't really write
any data at checkpoint time. If it times out, we anyway switch to the
slow k8s-based termination.

See https://www.postgresql.org/docs/current/server-shutdown.html for the
list of modes and signals.

The default VM shutdown hook already uses `fast` mode, see [1]

[1]
https://github.com/neondatabase/neon/blob/c9fd8d76937c2031fd4fea1cdf661d6cf4f00dc3/vm-image-spec.yaml#L30-L31

Related to #6211
  • Loading branch information
ololobus authored Jul 8, 2024
1 parent a68edad commit 84b039e
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions compute_tools/src/compute.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1386,7 +1386,9 @@ pub fn forward_termination_signal() {
let pg_pid = PG_PID.load(Ordering::SeqCst);
if pg_pid != 0 {
let pg_pid = nix::unistd::Pid::from_raw(pg_pid as i32);
// use 'immediate' shutdown (SIGQUIT): https://www.postgresql.org/docs/current/server-shutdown.html
kill(pg_pid, Signal::SIGQUIT).ok();
// Use 'fast' shutdown (SIGINT) because it also creates a shutdown checkpoint, which is important for
// ROs to get a list of running xacts faster instead of going through the CLOG.
// See https://www.postgresql.org/docs/current/server-shutdown.html for the list of modes and signals.
kill(pg_pid, Signal::SIGINT).ok();
}
}

1 comment on commit 84b039e

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3129 tests run: 3001 passed, 1 failed, 127 skipped (full report)


Failures on Postgres 14

  • test_basebackup_with_high_slru_count[github-actions-selfhosted-vectored-10-13-30]: release
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_basebackup_with_high_slru_count[release-pg14-github-actions-selfhosted-vectored-10-13-30]"
Flaky tests (1)

Postgres 16

  • test_multixid_wraparound_import: release

Code coverage* (full report)

  • functions: 32.6% (6939 of 21280 functions)
  • lines: 50.0% (54550 of 109059 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
84b039e at 2024-07-08T19:21:42.366Z :recycle:

Please sign in to comment.