
[bitnami/pgpool] Enhance pgpool-II Recovery Mechanism #75225

Merged: 3 commits into bitnami:main on Nov 27, 2024

Conversation

seatrain
Contributor

Description of the change
Added the -D flag to pgpool-II's startup configuration. This ensures that pgpool-II discards the existing pgpool_status file upon restart, allowing it to flush node statuses and accurately recognize the new primary PostgreSQL node.
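As a rough sketch of what this means in practice (the entrypoint path and exact flag set below are assumptions for illustration, not the actual Bitnami script):

```shell
#!/bin/sh
# Illustrative sketch only; flag meanings are from the pgpool-II docs.
# -n  run pgpool in the foreground, as expected under a container supervisor
# -D  discard the existing pgpool_status file so node roles are re-detected
PGPOOL_CONF="/opt/bitnami/pgpool/conf/pgpool.conf"   # assumed path
START_CMD="pgpool -n -D -f $PGPOOL_CONF"
# A real entrypoint would run: exec $START_CMD
echo "$START_CMD"
```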

Benefits
Automated Recovery:
Enables pgpool-II to automatically discard stale status information on restart, ensuring it correctly identifies and attaches to the current primary node without manual intervention.

Enhanced High Availability:
Improves the resilience of the PostgreSQL cluster by preventing continuous pod restarts due to outdated pgpool_status data.

Possible drawbacks

Applicable issues
Fixes #30577

Additional information

@github-actions github-actions bot added pgpool triage Triage is needed labels Nov 26, 2024
@github-actions github-actions bot requested a review from javsalgar November 26, 2024 02:04
@seatrain seatrain changed the title [bitnami/postgresql-ha] Enhance pgpool-II Recovery Mechanism [bitnami/pgpool] Enhance pgpool-II Recovery Mechanism Nov 26, 2024
@javsalgar javsalgar added verify Execute verification workflow for these changes in-progress labels Nov 26, 2024
@github-actions github-actions bot removed the triage Triage is needed label Nov 26, 2024
@github-actions github-actions bot removed the request for review from javsalgar November 26, 2024 09:05
@github-actions github-actions bot requested a review from juan131 November 26, 2024 09:05
@juan131
Contributor

juan131 commented Nov 26, 2024

Thanks so much for this great contribution @seatrain ! Do you think this PR addresses the issues below?

The idea of flushing was discussed there with pros and cons.

@seatrain
Contributor Author

Yes, this PR can help resolve the issue where pgpool marks a node as "down" even when it is the new primary. When pgpool mistakenly treats a standby node as the primary, it can redirect INSERT or UPDATE queries to that node, causing the "cannot execute INSERT in a read-only transaction" error. By discarding the stale status file and refreshing the node states, pgpool will be able to recognize the new primary node and avoid this issue.

Regarding custom recovery configurations, I believe there is no need for additional configuration in pgpool itself, as repmgr handles the failover and promotes the new primary. Pgpool’s role is simply to detect and react to these role changes, so it can automatically adjust to the updated cluster state without needing custom recovery logic.

@Stevenpc3

Stevenpc3 commented Nov 26, 2024

Where did you find the flags to pass? Is this configuration similar to detach_false_primary from https://www.pgpool.net/docs/42/en/html/runtime-config-failover.html

I found the flag here https://www.pgpool.net/docs/36/en/html/pgpool.html and a few other places

@seatrain
Contributor Author

I found the -D flag in the pgpool documentation. While the documentation mentions that using -D could potentially cause a split-brain scenario in a multi-node pgpool setup, this concern is only relevant in cluster mode. In our case, we are using a single-node pgpool architecture, so the risk of split-brain does not apply. Furthermore, in our setup, pgpool does not manage failover; that responsibility lies with repmgr, and pgpool simply needs to detect role changes. The documentation's assumption that pgpool controls failover is not aligned with our architecture.

As for the detach_false_primary setting, I believe it is primarily intended to prevent split-brain scenarios by ensuring pgpool does not incorrectly detach a node that it mistakenly identifies as a failed primary. Enabling the detach_false_primary flag can further enhance pgpool’s recovery by allowing it to better handle such situations.
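For reference, enabling that check is a one-line pgpool.conf change (a sketch based on the pgpool-II failover documentation linked above; verify the parameter is supported by your pgpool version and streaming-replication mode):

```
# pgpool.conf (sketch): allow pgpool to detach a node it wrongly
# believes is primary when a real primary exists elsewhere
detach_false_primary = on
```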

@juan131
Contributor

juan131 commented Nov 27, 2024

Thanks so much for the context @seatrain!!

> While the documentation mentions that using -D could potentially cause a split-brain scenario in a multi-node pgpool setup, this concern is only relevant in cluster mode. In our case, we are using a single-node pgpool architecture, so the risk of split-brain does not apply. Furthermore, in our setup, pgpool does not manage failover; that responsibility lies with repmgr, and pgpool simply needs to detect role changes. The documentation's assumption that pgpool controls failover is not aligned with our architecture.

As you can see in the links below, the number of Pgpool replicas can be configured on the PostgreSQL HA chart:

So I guess we have two alternatives:

  • Double-check the -D isn't problematic when running N replicas (N>1) given failover responsibility relies on repmgr.
  • Add some env. var to the container that enables the usage of the flag (PGPOOL_DISCARD_STATUS or sth like this) and ensure the chart only adds this env. var when there's a single replica.

@seatrain
Contributor Author


Hello! In the Bitnami pgpool-II architecture, even if multiple pgpool replicas are deployed, each one operates in single-node mode. This is because the pgpool configuration does not include the watchdog settings that are essential for forming a pgpool-II cluster. As a result, each pgpool-II replica functions independently, without coordinating with the others to manage cluster-wide state.

Additionally, the design of Bitnami's pgpool-II and PostgreSQL HA charts delegates failover responsibilities to repmgr, with pgpool-II solely tasked with detecting role changes. Pgpool-II does not manage failover or build a pgpool cluster. Instead, each pgpool pod functions as an independent single-node pgpool instance rather than a member of a pgpool cluster.

Therefore, there is no need to consider whether to enable the -D flag based on the number of pgpool replicas. The -D flag safely ensures that each pgpool-II instance discards stale status information upon restart, facilitating accurate detection of the new primary node managed by repmgr without introducing risks associated with multi-node cluster configurations.

@Stevenpc3

So what if you add the variable but default to setting -D? Then people have the option to disable it if they want, for whatever reason.

Also, would this be considered a breaking change or a minor bump?

I do think the change seems good.

Contributor

@juan131 juan131 left a comment


Thanks so much! In order to make it as flexible as possible while addressing the bug, I think we could add the PGPOOL_DISCARD_STATUS env. variable with a yes default value.

You'll have to edit the pgpool_env function (see the link below) adding sth like this:

export PGPOOL_DISCARD_STATUS="${PGPOOL_DISCARD_STATUS:-yes}"
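Building on that export, the entrypoint could then gate the flag on the variable; a minimal sketch (the variable name matches the suggestion above, but the flag-wiring is an assumption for illustration, not the merged code):

```shell
#!/bin/sh
# Default to discarding the status file, matching the suggested "yes" default
export PGPOOL_DISCARD_STATUS="${PGPOOL_DISCARD_STATUS:-yes}"

flags="-n"  # foreground mode for the container supervisor
if [ "$PGPOOL_DISCARD_STATUS" = "yes" ]; then
  flags="$flags -D"  # discard pgpool_status on start
fi
echo "$flags"
```

Setting PGPOOL_DISCARD_STATUS=no on the container would then omit -D and preserve the previous behavior.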

@seatrain
Contributor Author

I have made the edits you suggested. Thank you!

Contributor

@juan131 juan131 left a comment


LGTM

@juan131 juan131 merged commit 3cd3a91 into bitnami:main Nov 27, 2024
9 checks passed
@juan131
Contributor

juan131 commented Nov 27, 2024

Triggering a new container release

@juan131
Contributor

juan131 commented Nov 28, 2024

Hi everyone! A new container image version (4.5.5-debian-12-r0) was released including the patch.

@Stevenpc3

Nice! I assume a chart update for postgresql-ha will follow soon?

@juan131
Contributor

juan131 commented Nov 28, 2024

Exactly @Stevenpc3 ! See bitnami/charts#30676
