Fixing the design of failover command propagation on watchdog cluster
author Muhammad Usama <m.usama@gmail.com>
Mon, 14 Nov 2016 19:32:01 +0000 (00:32 +0500)
committer Muhammad Usama <m.usama@gmail.com>
Mon, 14 Nov 2016 19:32:01 +0000 (00:32 +0500)
commit ef6c0c8fb0899c65d37c2d83150683a952bad756
tree 3b2b931c470d525348082da1eab5f251161b4eda
parent f6ec43456cf91231d92d34fef4b1b44055bc1180

This commit overhauls how the failover, failback and promote node commands are
propagated to the watchdog nodes. Previously, the watchdog on the pgpool-II node
that needed to perform a node command (failover, failback or promote node)
broadcast that command to all attached pgpool-II nodes. This sometimes caused
synchronization issues, especially when the watchdog cluster contained a large
number of nodes, and as a result the command was sometimes executed by more
than one pgpool-II node.

With this commit, all node commands are forwarded to the master/coordinator
watchdog, which in turn propagates them to all standby nodes. In addition, the
commit changes the failover command interlocking mechanism: only the
master/coordinator node can now become the lock holder, so failover commands
are executed only on the master/coordinator node.
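
The flow described above can be illustrated with a minimal, self-contained
sketch. It is not the code from this commit: the types WdNode and WdCluster,
the NodeCommand enum and the function request_node_command() are hypothetical
names invented for illustration, and the "network" is simulated with plain
counters. It only shows the idea that the originating node forwards the
command to the coordinator, the coordinator alone takes the lock and executes
the command, and the standbys are merely notified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical types for illustration; not pgpool-II's real definitions. */
typedef enum { NODE_CMD_FAILOVER, NODE_CMD_FAILBACK, NODE_CMD_PROMOTE } NodeCommand;

typedef struct {
    int  id;
    bool is_coordinator;
    int  executed;   /* how many times this node executed a node command */
    int  notified;   /* how many times this node was told about a command */
} WdNode;

typedef struct {
    WdNode *nodes;
    int     count;
    int     lock_holder;   /* id of the node holding the failover lock, -1 if none */
} WdCluster;

static const char *cmd_name(NodeCommand cmd)
{
    switch (cmd) {
        case NODE_CMD_FAILOVER: return "failover";
        case NODE_CMD_FAILBACK: return "failback";
        default:                return "promote node";
    }
}

static WdNode *find_coordinator(WdCluster *cluster)
{
    for (int i = 0; i < cluster->count; i++)
        if (cluster->nodes[i].is_coordinator)
            return &cluster->nodes[i];
    return NULL;
}

/*
 * Handle a node command raised on node `origin_id`.
 * Returns the id of the node that executed it, or -1 if it was not executed.
 */
int request_node_command(WdCluster *cluster, int origin_id, NodeCommand cmd)
{
    assert(cluster != NULL);

    WdNode *coord = find_coordinator(cluster);
    if (coord == NULL)
        return -1;                      /* no coordinator, nothing to do */

    /* Step 1: a standby forwards the command instead of broadcasting it. */
    if (origin_id != coord->id)
        printf("node %d forwards %s to coordinator %d\n",
               origin_id, cmd_name(cmd), coord->id);

    /* Step 2: only the coordinator may become the lock holder. */
    if (cluster->lock_holder != -1)
        return -1;                      /* another command is in progress */
    cluster->lock_holder = coord->id;
    coord->executed++;                  /* command runs exactly once */

    /* Step 3: the coordinator propagates the command to every standby. */
    for (int i = 0; i < cluster->count; i++)
        if (cluster->nodes[i].id != coord->id)
            cluster->nodes[i].notified++;

    cluster->lock_holder = -1;          /* release the lock */
    return coord->id;
}
```

In this sketch, a request raised on any standby ends up executed once, on the
coordinator, while every other node only records a notification. That is the
property the old broadcast design could not guarantee.
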

15 files changed:
src/include/pool.h
src/include/watchdog/watchdog.h
src/include/watchdog/wd_ipc_commands.h
src/include/watchdog/wd_ipc_defines.h
src/include/watchdog/wd_json_data.h
src/main/pgpool_main.c
src/pcp_con/pcp_worker.c
src/pcp_con/recovery.c
src/protocol/pool_process_query.c
src/protocol/pool_proto_modules.c
src/utils/pool_signal.c
src/utils/pool_stream.c
src/watchdog/watchdog.c
src/watchdog/wd_commands.c
src/watchdog/wd_json_data.c