speculative workaround for high unclosed connection count between compute smgr & pageserver (#523)

One possible explanation for our high ESTABLISHED connection count
between computes and pageserver's page_service.rs is that we kill the
VMs before the FIN/RST packets triggered by `pg_ctl stop -m fast` leave
the VM's kernel.

The sleep added in this PR gives those packets a good chance to be sent
before the VM powers off.

Roll this out and observe.

Context: https://neondb.slack.com/archives/C036U0GRMRB/p1694427077510779?thread_ts=1694425364.531029&cid=C036U0GRMRB
problame authored Sep 18, 2023
1 parent d1cf9da commit e5248b0
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions neonvm/tools/vm-builder/main.go
@@ -260,6 +260,13 @@ action=/neonvm/bin/powerdown
scriptPowerDown = `#!/neonvm/bin/sh
su -p postgres --session-command '/usr/local/bin/pg_ctl stop -D /var/db/postgres/compute/pgdata -m fast --wait -t 10'
# The poweroff below is the busybox poweroff which goes straight to the kernel, i.e., LINUX_REBOOT_CMD_POWER_OFF / RB_POWER_OFF.
# Our experiments have shown that, generally, this type of hard shutdown will not FIN/RST existing TCP connections (i.e., state ESTABLISHED).
# Now, for the particular case of NeonVM, we just did 'pg_ctl stop'.
# But, the libpagestore.c is currently not explicitly shutting down the TCP connection to pageservers, relying on the kernel
# to implicitly close the socket on exit.
# This sleep here is to give the kernel time to send the FIN/RST.
sleep 1
/neonvm/bin/poweroff
`
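
The mechanism the script comment relies on can be illustrated with a minimal Go sketch. It is not part of this commit, and the helper name `peerSeesClose` is hypothetical. It assumes only the standard `net` package and a loopback connection: once a socket is closed (explicitly here, or implicitly by the kernel when a process exits), the kernel sends a FIN and the peer's next read returns end-of-stream. A hard poweroff that happens before that FIN leaves the kernel would leave the peer in ESTABLISHED instead, which is the case the `sleep 1` is meant to avoid.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// peerSeesClose (hypothetical helper) opens a loopback TCP connection,
// closes the client side, and reports whether the server side observed
// end-of-stream (io.EOF), i.e. whether the client's FIN reached the peer.
func peerSeesClose() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, err
	}
	server, err := ln.Accept()
	if err != nil {
		client.Close()
		return false, err
	}
	defer server.Close()

	// Closing the socket asks the kernel to send a FIN to the peer.
	if err := client.Close(); err != nil {
		return false, err
	}

	// Bound the wait so the sketch cannot hang if no FIN ever arrives.
	if err := server.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return false, err
	}
	buf := make([]byte, 1)
	_, err = server.Read(buf)
	if errors.Is(err, io.EOF) {
		return true, nil
	}
	return false, err
}

func main() {
	ok, err := peerSeesClose()
	fmt.Println("peer observed close:", ok, "err:", err)
}
```

This only demonstrates the orderly-close path. It does not reproduce the hard-shutdown case described in the script comment, where the VM is powered off before the kernel has sent anything.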
