compute_ctl apply_config sets GUC max_stack_depth, which terminates parallel workers (e.g. create index parallel workers) #10184
Labels
a/performance - Area: relates to performance of the system
c/compute - Component: compute, excluding postgres itself
c/control-plane - Component: Control Plane
c/PostgreSQL - Component: PostgreSQL features and bugs
t/bug - Issue Type: Bug
Steps to reproduce
Run a long-running maintenance task (like creating a btree index) with parallel workers.
Sporadically these tasks fail during the apply_config operation with the error message:
ERROR: parameter "max_stack_depth" cannot be set during a parallel operation
apply_config uses `pg_ctl reload -D` to send SIGHUP to postgres, and somehow this causes max_stack_depth to be set.
And indeed, it seems postgres ALWAYS sets max_stack_depth when it receives a SIGHUP, whenever `stack_rlimit > 0` and `new_limit > 100`, in
https://github.com/neondatabase/postgres/blob/97f9fde349c6de6d573f5ce96db07eca60ce6185/src/backend/utils/misc/guc.c#L1585
I would consider this an upstream bug, because it means that a `pg_ctl reload` command can terminate parallel operations.
For Neon this is extremely bad, as we send SIGHUP in every apply_config operation, which is quite frequent.
Expected result
Long-running tasks complete as in vanilla postgres (where the "system" normally doesn't send frequent SIGHUPs to postgres).
Actual result
Sporadic failure
Environment
Staging, for example endpoint `ep-summer-darkness-w2ldx7r7.us-east-2.aws.neon.build/neondb`
Logs, links
Example of a failing statement, on a 96 GiB table, when using:
-c maintenance_work_mem=8388608 -c max_parallel_maintenance_workers=7
see https://github.com/neondatabase/neon/actions/runs/12293061677/job/34304987910
and discussion here https://neondb.slack.com/archives/C04DGM6SMTM/p1734513223148249?thread_ts=1733997259.898819&cid=C04DGM6SMTM