Fix race condition between detach_false_primary and follow_primary command.
author Tatsuo Ishii <ishii@sraoss.co.jp>
Tue, 11 May 2021 10:51:19 +0000 (19:51 +0900)
committer Tatsuo Ishii <ishii@sraoss.co.jp>
Tue, 11 May 2021 12:41:03 +0000 (21:41 +0900)
commit aece05fbe4baf5c38d7c6d4eb8c159883be717d7
tree 0f0aee1f8cf8ae8fa73ee6095d8f0619a446117e
parent 153aeb7ddfd71d148374317761b591cffa29bcdd

It was reported that when detach_false_primary and follow_primary_command
run concurrently, many problems occur:

https://www.pgpool.net/pipermail/pgpool-general/2021-April/007583.html

A typical problem is that no primary node is found at the end.

I confirmed that this can be easily reproduced:

https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003893.html

This commit introduces two new functions,
pool_acquire_follow_primary_lock(bool block) and
pool_release_follow_primary_lock(void), which acquire and release the
follow primary lock. They are used in three places:

1) find_primary_node

This function is called upon startup and failover in the main pgpool
process to find the new primary node.

2) failover

This function is called in the follow_primary_command subprocess
forked off by the pgpool main process to execute the
follow_primary_command script. The lock should be held until all
follow_primary_command scripts have completed.

3) streaming replication check

Before starting verify_backend_node, which is the workhorse of
detach_false_primary, the lock must be acquired. If acquiring it
fails, that streaming replication check cycle is simply skipped.
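The locking pattern described above can be sketched as follows. This is a minimal illustration, not pgpool-II's implementation: the names follow_primary_lock, acquire_follow_primary_lock, release_follow_primary_lock, verify_backend_node_stub and sr_check_cycle are hypothetical, and a process-local C11 atomic flag stands in for the lock. In pgpool-II the main process, the follow_primary_command subprocess and the streaming replication worker are separate processes, so the real lock has to live in shared memory. Passing block = true models the waiting done by find_primary_node and failover; block = false models the streaming replication check, which skips its cycle instead of waiting.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the follow primary lock. A process-local
 * atomic flag is used only to illustrate blocking vs non-blocking
 * acquisition; the real lock is shared between processes. */
atomic_bool follow_primary_lock = false;

/* Take the lock. With block == true, poll until it becomes free
 * (as find_primary_node and failover would). With block == false,
 * return false immediately if someone else holds it. */
bool acquire_follow_primary_lock(bool block)
{
    for (;;)
    {
        bool expected = false;

        /* Atomically flip false -> true; succeeds only if the lock is free. */
        if (atomic_compare_exchange_strong(&follow_primary_lock, &expected, true))
            return true;

        if (!block)
            return false;

        /* Lock is busy: sleep 10 ms before retrying. */
        struct timespec delay = {0, 10L * 1000 * 1000};
        nanosleep(&delay, NULL);
    }
}

void release_follow_primary_lock(void)
{
    /* Releasing a lock that is not held indicates a caller bug. */
    assert(atomic_load(&follow_primary_lock));
    atomic_store(&follow_primary_lock, false);
}

/* Hypothetical placeholder for verify_backend_node(); counts calls. */
int verify_calls = 0;
void verify_backend_node_stub(void)
{
    verify_calls++;
}

/* One streaming replication check cycle: never waits for the lock.
 * Returns true if verification ran, false if the cycle was skipped. */
bool sr_check_cycle(void)
{
    if (!acquire_follow_primary_lock(false))
        return false; /* follow_primary in progress: skip this cycle */

    verify_backend_node_stub();
    release_follow_primary_lock();
    return true;
}
```

In this sketch the failover side would call acquire_follow_primary_lock(true) before running the follow_primary_command scripts and release the lock only after all of them have finished, so a concurrent check cycle returns early instead of detaching a node while the cluster is being reconfigured.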

The commit also deals with the case when watchdog is enabled.

https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003894.html

When watchdog is enabled, multiple pgpool nodes perform
detach_false_primary concurrently, and this is the cause of the
problem. To fix this, detach_false_primary is performed only on the
leader node. Also, if the quorum is absent, detach_false_primary is
not performed.
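The gating rule above can be expressed as a small predicate. This is a minimal sketch under stated assumptions: the function may_run_detach_false_primary and its three boolean parameters are hypothetical, and the caller is assumed to already know whether watchdog is enabled, whether this node is the watchdog leader, and whether the quorum is present (pgpool-II obtains these from its watchdog subsystem, which is not modeled here). It also assumes that without watchdog no extra gating applies.

```c
#include <stdbool.h>

/* Hypothetical helper: decide whether this pgpool node may run
 * detach_false_primary in the current streaming replication check. */
bool may_run_detach_false_primary(bool watchdog_enabled,
                                  bool is_watchdog_leader,
                                  bool quorum_present)
{
    /* Assumption: with watchdog disabled there is a single pgpool node,
     * so no other node can race with it. */
    if (!watchdog_enabled)
        return true;

    /* With watchdog, only the leader acts, and only while the quorum
     * exists, so multiple nodes never run detach_false_primary at once. */
    return is_watchdog_leader && quorum_present;
}
```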

src/include/pool.h
src/main/pgpool_main.c
src/streaming_replication/pool_worker_child.c
src/test/regression/tests/018.detach_primary/test.sh