pgpool2.git
3 years agoDoc: fix typos.
Bo Peng [Thu, 25 Nov 2021 06:19:45 +0000 (15:19 +0900)]
Doc: fix typos.

3 years agoFix redundant code.
Tatsuo Ishii [Mon, 22 Nov 2021 07:31:39 +0000 (16:31 +0900)]
Fix redundant code.

Patch contributed by Lu Chenyang.

3 years agoDoc: fix release note typo V4_2_6 V4_2_6_RPM
Masaya Kawamoto [Thu, 18 Nov 2021 03:23:36 +0000 (03:23 +0000)]
Doc: fix release note typo

3 years agoPrepare 4.2.6
Masaya Kawamoto [Thu, 18 Nov 2021 03:07:58 +0000 (03:07 +0000)]
Prepare 4.2.6

3 years agoFix doc version.
Masaya Kawamoto [Thu, 18 Nov 2021 03:05:45 +0000 (03:05 +0000)]
Fix doc version.

3 years agoAdd release notes for Pgpool-II 4.2.6
Masaya Kawamoto [Thu, 18 Nov 2021 02:28:12 +0000 (02:28 +0000)]
Add release notes for Pgpool-II 4.2.6

3 years agoReject extraneous data after SSL encryption handshake.
Tatsuo Ishii [Wed, 17 Nov 2021 10:26:11 +0000 (19:26 +0900)]
Reject extraneous data after SSL encryption handshake.

In the server side implementation of SSL negotiation
(pool_ssl_negotiate_serverclient()), it was possible for a
man-in-the-middle attacker to inject arbitrary SQL commands. This is
possible if Pgpool-II is configured to use cert authentication or
hostssl + trust. This resembles PostgreSQL's CVE-2021-23214.

Similarly, in the client side implementation of SSL negotiation
(pool_ssl_negotiate_clientserver()), it was possible for a
man-in-the-middle attacker to inject arbitrary responses. This is
possible if PostgreSQL is using trust authentication with a clientcert
requirement. It is not possible with cert authentication because
Pgpool-II does not implement the cert authentication between Pgpool-II
and PostgreSQL. This resembles PostgreSQL's CVE-2021-23222.

To fix these, reject extraneous data in the read buffer after the SSL
encryption handshake.
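
The check can be pictured with a minimal sketch. It is not Pgpool-II's
actual code: ConnBuffer, buffer_unread_bytes() and
check_no_extraneous_data() are hypothetical names. The assumption is
that any plaintext bytes still sitting in the read buffer once the
handshake has finished were injected before encryption started, so the
connection must be refused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read buffer: bytes in [pos, len) were read from the
 * socket in plaintext but have not been consumed yet. */
typedef struct
{
	size_t		pos;
	size_t		len;
} ConnBuffer;

/* Number of plaintext bytes still waiting in the buffer. */
size_t
buffer_unread_bytes(const ConnBuffer *buf)
{
	assert(buf != NULL && buf->pos <= buf->len);
	return buf->len - buf->pos;
}

/*
 * Call right after the SSL handshake has completed.  Returns 0 if the
 * buffer is empty, -1 if unencrypted data was left behind and the
 * connection should be rejected.
 */
int
check_no_extraneous_data(const ConnBuffer *buf)
{
	return buffer_unread_bytes(buf) == 0 ? 0 : -1;
}
```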

3 years agoDeal with PostgreSQL 14 while processing pg_terminate_backend().
Tatsuo Ishii [Tue, 16 Nov 2021 00:45:31 +0000 (09:45 +0900)]
Deal with PostgreSQL 14 while processing pg_terminate_backend().

Do not reject the two-argument form of pg_terminate_backend(), since
PostgreSQL 14 and later accept two arguments.

3 years agoFix occasional 073.pg_terminate_backend regression test failure.
Tatsuo Ishii [Tue, 16 Nov 2021 00:02:47 +0000 (09:02 +0900)]
Fix occasional 073.pg_terminate_backend regression test failure.

The test used "ps -ef" command to find the process which is running
SELECT command.  However in some cases the "ps -ef" command omits part
of "SELECT" in its output and this made the test fail.
So use "ps -efw" instead of "ps -ef" to prevent it.

3 years agoRename regression test 074.
Bo Peng [Mon, 8 Nov 2021 08:27:38 +0000 (17:27 +0900)]
Rename regression test 074.

3 years agoFix application_name array lacking an entry for logger process.
Tatsuo Ishii [Mon, 25 Oct 2021 00:40:36 +0000 (09:40 +0900)]
Fix application_name array lacking an entry for logger process.

Add a logger process entry to the process name array. This was missed
when the logger process was added.
Per Coverity.

3 years agoEnhance SIGCHLD handler of Pgpool-II main process.
Tatsuo Ishii [Sun, 24 Oct 2021 07:22:33 +0000 (16:22 +0900)]
Enhance SIGCHLD handler of Pgpool-II main process.

When a Pgpool-II child was killed by the SIGKILL signal, the SIGCHLD
handler just emitted a LOG level message, as it does for other signals.
But SIGKILL is an important event, for example when the child is killed
by the OOM killer. So emit a WARNING level message instead.

Per suggestion from Michail Alexakis.
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2021-October/007808.html
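
A minimal sketch of the idea, assuming the handler has the wait(2)
status of the reaped child at hand. LogLevel and describe_child_exit()
are hypothetical names used only for illustration; WIFSIGNALED() and
WTERMSIG() are the standard macros from <sys/wait.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical log levels for this sketch. */
typedef enum
{
	LEVEL_LOG,
	LEVEL_WARNING
} LogLevel;

/*
 * Format a message describing how a child ended and return the level
 * it should be logged at.  Death by SIGKILL is raised to WARNING.
 */
LogLevel
describe_child_exit(int status, char *msg, size_t msglen)
{
	assert(msg != NULL && msglen > 0);

	if (WIFSIGNALED(status))
	{
		int			sig = WTERMSIG(status);

		snprintf(msg, msglen, "child was terminated by signal %d", sig);
		/* SIGKILL often comes from outside, e.g. the OOM killer. */
		return sig == SIGKILL ? LEVEL_WARNING : LEVEL_LOG;
	}
	if (WIFEXITED(status))
		snprintf(msg, msglen, "child exited with status %d",
				 WEXITSTATUS(status));
	else
		snprintf(msg, msglen, "child changed state");
	return LEVEL_LOG;
}
```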

3 years agoFix connection counter issue when reserved_connections is 0.
Tatsuo Ishii [Sun, 24 Oct 2021 02:30:25 +0000 (11:30 +0900)]
Fix connection counter issue when reserved_connections is 0.

If reserved_connections is 0, we don't need to manage the connection
counter to check if the count is larger than
(pool_config->num_init_children - pool_config->reserved_connections).
So remove the check.  This will prevent unwanted "Sorry, too many
clients already" errors caused by an accidental counter leak.

For the reserved_connections > 0 case, we still need to fix the counter
leak, but that is a separate issue.

Discussion: https://www.pgpool.net/pipermail/pgpool-general/2021-October/007808.html
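
The resulting decision can be sketched as below. too_many_clients() is
a hypothetical helper, not the function in the Pgpool-II source; only
the comparison against (num_init_children - reserved_connections) is
taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a new client must be refused with "Sorry, too many
 * clients already".  conn_counter is the (possibly leaked) count of
 * connected clients.
 */
bool
too_many_clients(int num_init_children, int reserved_connections,
				 int conn_counter)
{
	assert(num_init_children > 0 && reserved_connections >= 0);

	/*
	 * Feature disabled: do not consult the counter at all, so a leaked
	 * count can never produce a bogus rejection.
	 */
	if (reserved_connections == 0)
		return false;

	return conn_counter > num_init_children - reserved_connections;
}
```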

3 years agoFix signal handler for SIGTERM, SIGINT and SIGQUIT.
Tatsuo Ishii [Tue, 12 Oct 2021 01:40:11 +0000 (10:40 +0900)]
Fix signal handler for SIGTERM, SIGINT and SIGQUIT.

The handler did not save and restore errno, so it could overwrite the
errno value seen by the interrupted code.
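
The usual pattern looks like the sketch below. It is a generic
illustration rather than the Pgpool-II handler itself:
exit_signal_handler() and install_exit_handlers() are made-up names,
and close(-1) merely stands in for handler work that changes errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t exit_requested = 0;

/* Handler for SIGTERM, SIGINT and SIGQUIT. */
static void
exit_signal_handler(int sig)
{
	int			save_errno = errno;	/* interrupted code may still need it */

	(void) sig;
	exit_requested = 1;
	close(-1);					/* stand-in for work that sets errno (EBADF) */

	errno = save_errno;			/* restore before returning */
}

/* Install the handler for all three signals; 0 on success, -1 on error. */
int
install_exit_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = exit_signal_handler;
	sigemptyset(&sa.sa_mask);

	if (sigaction(SIGTERM, &sa, NULL) != 0 ||
		sigaction(SIGINT, &sa, NULL) != 0 ||
		sigaction(SIGQUIT, &sa, NULL) != 0)
		return -1;
	return 0;
}
```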

3 years agoDoc: fix documentation typos.
Bo Peng [Wed, 6 Oct 2021 01:30:24 +0000 (10:30 +0900)]
Doc: fix documentation typos.

3 years agoFix typos in documentation and sample scripts.
Bo Peng [Mon, 4 Oct 2021 10:56:49 +0000 (19:56 +0900)]
Fix typos in documentation and sample scripts.

Patch created by Kazufumi Noto.

3 years agoDoc: fix yum install command typo in configuration example.
Bo Peng [Wed, 29 Sep 2021 14:47:17 +0000 (23:47 +0900)]
Doc: fix yum install command typo in configuration example.

4 years agoFix for bug-732: Segmentation fault at failover ...
Muhammad Usama [Mon, 27 Sep 2021 17:07:48 +0000 (22:07 +0500)]
Fix for bug-732: Segmentation fault at failover ...

trigger_failover_command() assumed that the old primary can never be NULL,
which of course is not the case.

4 years agoFix pg_config command path to avoid test failure.
Bo Peng [Thu, 23 Sep 2021 12:54:00 +0000 (21:54 +0900)]
Fix pg_config command path to avoid test failure.

4 years agoFix psql command path to avoid test failure.
Bo Peng [Thu, 23 Sep 2021 11:44:27 +0000 (20:44 +0900)]
Fix psql command path to avoid test failure.

4 years agoFix occasional hang in COPY FROM.
Tatsuo Ishii [Thu, 16 Sep 2021 06:44:51 +0000 (15:44 +0900)]
Fix occasional hang in COPY FROM.

If an error occurred while doing COPY FROM, Pgpool-II could wait
forever for a response from the backend after the COPY end marker was
sent from the frontend. Pgpool-II expected a new message to arrive on
the socket, but the message (in this case an error message) could
already be in the backend read buffer. The fix is to check whether the
buffer is empty before reading from the socket.
A new test case (07.copy_hang) is also added.

The bug was found by Bo Peng.
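
The "check the buffer first" rule can be sketched as follows.
BufferedConn and conn_read() are hypothetical and much simpler than
Pgpool-II's connection structures; the point is only that a blocking
read(2) on the socket happens after the already-buffered bytes have
been handed out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical buffered connection: data[pos..len) was already read
 * from fd but has not been consumed by the caller. */
typedef struct
{
	int			fd;
	char		data[8192];
	size_t		pos;
	size_t		len;
} BufferedConn;

/*
 * Read up to n bytes.  Returns the number of bytes read, 0 on EOF and
 * -1 on error.
 */
ssize_t
conn_read(BufferedConn *conn, char *out, size_t n)
{
	assert(conn != NULL && out != NULL && conn->pos <= conn->len);

	if (conn->pos < conn->len)
	{
		/* Serve pending data; waiting on the socket here could hang. */
		size_t		avail = conn->len - conn->pos;

		if (n > avail)
			n = avail;
		memcpy(out, conn->data + conn->pos, n);
		conn->pos += n;
		return (ssize_t) n;
	}

	/* Buffer is empty: only now is it safe to block on the socket. */
	return read(conn->fd, out, n);
}
```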

4 years agoDoc: fix incorrect file name in "Pgpool-II on Kubernetes". V4_2_5 V4_2_5_RPM
Bo Peng [Mon, 13 Sep 2021 08:12:56 +0000 (17:12 +0900)]
Doc: fix incorrect file name in "Pgpool-II on Kubernetes".

4 years agoPrepare 4.2.5
Bo Peng [Mon, 13 Sep 2021 07:45:23 +0000 (16:45 +0900)]
Prepare 4.2.5

4 years agoDoc: add release notes for Pgpool-II 4.2.5.
Bo Peng [Mon, 13 Sep 2021 05:57:59 +0000 (14:57 +0900)]
Doc: add release notes for Pgpool-II 4.2.5.

4 years agoFix for bug-731: Fails to execute follow_primary_command..
Muhammad Usama [Fri, 10 Sep 2021 10:49:46 +0000 (15:49 +0500)]
Fix for bug-731: Fails to execute follow_primary_command..

It was a segmentation fault happening because of a silly coding mistake in
parse_wd_exec_cluster_command_json() function.
The problem was the WDExecCommandArg** arg for returning the argument list
was not getting populated properly.
As part of the fix, to get rid of complexity and confusion, I have changed
the function signature to accept List* type argument for returning the
parsed command args instead of an array and its length.

Thanks to Bo Peng, Emond Papegaaij and Tatsuo Ishii for reviewing
and testing the fix.

4 years agoDoc: fixed "PGPOOL SHOW" documentation that was missing in the previous commit 151e8f5...
Bo Peng [Mon, 6 Sep 2021 08:45:07 +0000 (17:45 +0900)]
Doc: fixed "PGPOOL SHOW" documentation that was missing in the previous commit 151e8f54961b1a0394cfc86c2f4ddf715b41ceea.

4 years agoFix incorrect PGPOOL SHOW option.
Bo Peng [Fri, 3 Sep 2021 05:16:01 +0000 (14:16 +0900)]
Fix incorrect PGPOOL SHOW option.

Since other_pgpool was removed in 4.2, the "PGPOOL SHOW other_pgpool" command should be changed to "PGPOOL SHOW watchdog".

4 years agoFix bug of "PGPOOL SHOW heartbeat" and "PGPOOL SHOW ALL" command.
Bo Peng [Fri, 3 Sep 2021 04:58:46 +0000 (13:58 +0900)]
Fix bug of "PGPOOL SHOW heartbeat" and "PGPOOL SHOW ALL" command.

The last member in heartbeat_* array was not displayed in the command result.

4 years agoFix to allow log_rotation_age to be set to 0.
Tatsuo Ishii [Thu, 2 Sep 2021 00:07:15 +0000 (09:07 +0900)]
Fix to allow log_rotation_age to be set to 0.

According to the docs, it is possible to set log_rotation_age to 0 so
that the log rotation feature is disabled. But the allowed minimum
value for it was mistakenly set to 10, which made it impossible to set
log_rotation_age to 0.

4 years agoFix pgpool_setup to do nothing when no new main node is available.
Tatsuo Ishii [Wed, 25 Aug 2021 00:52:39 +0000 (09:52 +0900)]
Fix pgpool_setup to do nothing when no new main node is available.

When no new main node was available because only 1 node remained,
pgpool_setup tried to promote a node with node id -1.  This is
impossible. The fix is to skip the whole failover process if the new
main node id is -1.

4 years agoFix the incorrect display of load balancing node in raw mode.
Bo Peng [Wed, 18 Aug 2021 08:00:28 +0000 (17:00 +0900)]
Fix the incorrect display of load balancing node in raw mode.

In raw mode, Pgpool-II sends all queries to the main node.
The incorrect display is harmless, but it may confuse users.

4 years agoFix pgpool logger process eats 100% cpu.
Tatsuo Ishii [Thu, 12 Aug 2021 07:37:26 +0000 (16:37 +0900)]
Fix pgpool logger process eats 100% cpu.

The select(2) loop in the logger process did not initialize
timeout.tv_usec, which could make select(2) fail because tv_usec was
too big. In this case select(2) returned immediately and the for loop
kept iterating.

Problem reported and patch provided by Fang Jun.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-August/003993.html
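
A sketch of such a loop with the timeout fully initialized is shown
below. read_until_idle() is a hypothetical function, not the logger's
code. Both fields of struct timeval are set on every iteration, which
also covers systems where select(2) modifies the timeout on return.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Read from fd until EOF or until it stays quiet for timeout_sec
 * seconds.  Returns the number of bytes consumed, or -1 on error.
 */
long
read_until_idle(int fd, long timeout_sec)
{
	char		buf[256];
	long		total = 0;

	assert(fd >= 0 && fd < FD_SETSIZE && timeout_sec >= 0);

	for (;;)
	{
		fd_set		rfds;
		struct timeval timeout;
		ssize_t		n;
		int			rc;

		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);

		/* Initialize BOTH fields; never leave tv_usec as stack garbage. */
		timeout.tv_sec = timeout_sec;
		timeout.tv_usec = 0;

		rc = select(fd + 1, &rfds, NULL, NULL, &timeout);
		if (rc < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted by a signal: retry */
			return -1;
		}
		if (rc == 0)
			return total;		/* timed out: nothing more to read */

		n = read(fd, buf, sizeof(buf));
		if (n < 0)
		{
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return total;		/* EOF */
		total += n;
	}
}
```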

4 years agoFix typo in SI related functions.
Tatsuo Ishii [Wed, 11 Aug 2021 00:16:34 +0000 (09:16 +0900)]
Fix typo in SI related functions.

The typo pattern was "aquire" (wrong), which should have been "acquire".
This fix has already been applied to the master branch.

4 years agoFix compiler warning.
Tatsuo Ishii [Wed, 11 Aug 2021 00:07:58 +0000 (09:07 +0900)]
Fix compiler warning.

4 years agoUpdate Makefile.in and configure.
Bo Peng [Tue, 10 Aug 2021 12:04:43 +0000 (21:04 +0900)]
Update Makefile.in and configure.

4 years agoFix "ar: `u' modifier ignored since `D' is the default (see `U')" warnings.
Tatsuo Ishii [Tue, 10 Aug 2021 10:07:59 +0000 (19:07 +0900)]
Fix "ar: `u' modifier ignored since `D' is the default (see `U')" warnings.

The warning can be eliminated by changing arguments of ar command from
"cru" to "cr".  There were two places of the warnings: src/libs/pcp
and src/watchdog.  For the former, we need to fix libtool, which is
generated by configure. So fix configure.ac.  For the latter, just fix
src/watchdog/Makefile.am.

4 years agoUpdate configure and src/config/pool_config.c due to the previous commit.
Bo Peng [Tue, 10 Aug 2021 07:54:04 +0000 (16:54 +0900)]
Update configure and src/config/pool_config.c due to the previous commit.

4 years agoFix compiler warnings generated by newer version of gcc.
Tatsuo Ishii [Tue, 10 Aug 2021 07:12:55 +0000 (16:12 +0900)]
Fix compiler warnings generated by newer version of gcc.

gcc on Ubuntu 20 (9.3.0) generates tons of compiler warnings like:

> In file included from /usr/include/string.h:495,
>                  from pg_md5.c:31:
> In function ‘strncpy’,
>     inlined from ‘update_pool_passwd_from_file’ at pg_md5.c:276:3,
>     inlined from ‘main’ at pg_md5.c:136:3:
> /usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:10: warning: ‘__builtin___strncpy_chk’ output may be truncated copying between 0 and 128 bytes from a string of length 257 [-Wstringop-truncation]
>   106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I have found a message in the PostgreSQL hackers mailing list regarding a similar issue:
https://www.postgresql.org/message-id/flat/21789.1529170195%40sss.pgh.pa.us#525c384d13505fa6f1f25c50b00d7a08

To fix these warnings:

1) fix misuse of strncpy() and friends

2) add -Wno-stringop-truncation to CFLAGS

This commit implements #1 and #2.

Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-August/003990.html
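
As background for item 1: strncpy() does not write a terminating NUL
when the source is at least as long as the given size, which is what
the warning is about. One common replacement is sketched below.
copy_string() is a hypothetical helper and not necessarily what this
commit uses.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy src into dst (capacity dstlen bytes) and always NUL-terminate.
 * Returns 0 if the whole string fit, -1 if it had to be truncated.
 */
int
copy_string(char *dst, size_t dstlen, const char *src)
{
	size_t		srclen;

	assert(dst != NULL && src != NULL && dstlen > 0);

	srclen = strlen(src);
	if (srclen >= dstlen)
	{
		/* Keep what fits and terminate explicitly. */
		memcpy(dst, src, dstlen - 1);
		dst[dstlen - 1] = '\0';
		return -1;
	}

	memcpy(dst, src, srclen + 1);	/* includes the terminating NUL */
	return 0;
}
```

Returning a truncation indicator lets the caller decide whether a
shortened value (for example a password file line) is acceptable.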

4 years agoFix SI mode to acquire a snapshot with an internal transaction.
Tatsuo Ishii [Tue, 10 Aug 2021 04:59:32 +0000 (13:59 +0900)]
Fix SI mode to acquire a snapshot with an internal transaction.

In SI mode a data modifying statement needs to start an internal
transaction.  However, Pgpool-II forgot to acquire a snapshot when
doing so. This commit fixes that in both simple query protocol and
extended query protocol.

4 years agoDoc: fix typo in Japanese release note.
Masaya Kawamoto [Tue, 10 Aug 2021 01:31:40 +0000 (01:31 +0000)]
Doc: fix typo in Japanese release note.

4 years agoDoc: Mention that double quotes are required in "PGPOOL SHOW" command if the paramete...
Bo Peng [Mon, 9 Aug 2021 01:43:47 +0000 (10:43 +0900)]
Doc: Mention that double quotes are required in "PGPOOL SHOW" command if the parameter contains uppercase letters.

4 years agoFix backend_flag* parameter shown twice while executing "pgpool show all".
Tatsuo Ishii [Sun, 8 Aug 2021 04:19:33 +0000 (13:19 +0900)]
Fix backend_flag* parameter shown twice while executing "pgpool show all".

There are two entries of "backend_flag" for "ALLOW_TO_FAILOVER" and
"ALWAYS_PRIMARY" in the config variable struct. This is mostly ok but
the "pgpool show all" command displayed both backend_flag entries,
which looks redundant. The reason is that report_all_variables() shows
grouped variables first, then the other variables except those already
shown as grouped variables.  Unfortunately build_variable_groups() is
not smart enough to build grouped variable data: it only registers the
first backend_flag entry and leaves the second entry. Since the second
entry is not a grouped variable, backend_flag is shown first as a
grouped variable and then shown again as a non-grouped variable in
report_all_variables(). To fix this, mark the second variable as a
grouped variable too (the flag is set by build_config_variables()).
See bug 728 for the report of the problem.

Also add/fix comments.

4 years agoPrepare 4.2.4 V4_2_4 V4_2_4_RPM
Masaya Kawamoto [Tue, 3 Aug 2021 02:31:28 +0000 (02:31 +0000)]
Prepare 4.2.4

4 years agoFix: typo in release note.
Masaya Kawamoto [Tue, 3 Aug 2021 02:07:50 +0000 (02:07 +0000)]
Fix: typo in release note.

4 years agoAdd release notes.
Masaya Kawamoto [Tue, 3 Aug 2021 01:55:34 +0000 (01:55 +0000)]
Add release notes.

4 years agoDoc: Update configuration example "Pgpool-II on Kubernetes".
Bo Peng [Mon, 2 Aug 2021 16:58:31 +0000 (01:58 +0900)]
Doc: Update configuration example "Pgpool-II on Kubernetes".

4 years agoDoc: add more explanation about backend_application_name.
Tatsuo Ishii [Tue, 20 Jul 2021 07:01:08 +0000 (16:01 +0900)]
Doc: add more explanation about backend_application_name.

4 years agoFix: Typo in a function name
Muhammad Usama [Fri, 16 Jul 2021 10:43:12 +0000 (15:43 +0500)]
Fix: Typo in a function name

4 years agoImplementing the follow_primary command-locking over the watchdog channel.
Muhammad Usama [Fri, 16 Jul 2021 07:17:27 +0000 (12:17 +0500)]
Implementing the follow_primary command-locking over the watchdog channel.

Supplementary fix for [pgpool-hackers: 3892] Problem with detach_false_primary..

commit:455f00dd5f5b7b94bd91aa0b6b40aab21dceabb9 fixed a race condition between
detach_false_primary and follow_primary commands. Part of the fix was to make
sure that the detach_false_primary should only be executed on the
leader watchdog node.

The mentioned commit ensures the execution of detach_false_primary on the
watchdog leader by getting the watchdog status from within the main process.
The design is good enough for most cases, but has the potential to fail if
the cluster goes into the election process just after the main process
has read the status.

To fix that, this commit implements the synchronization of follow_primary_command
execution using the distributed locks over the watchdog channel.

The idea is, just before executing the follow_primary during the failover process
we instruct all standby watchdog nodes to acquire a lock on their respective
nodes to block the false primary detection during the period when the
follow_primary is being executed on the leader watchdog node.

Moreover to keep the watchdog process blocked on waiting for the lock the commit
introduced the pending remote lock mechanism, so that remote locks can get
acquired in the background after the completion of the inflight replication checks.

Finally, REQ_DETAIL_CONFIRMED flag is removed from degenerate_backend_set()
request that gets issued to detach the false primary, That means all quorum
and consensus rules must be satisfied for the detach to happen.

4 years agoDoc: Fix documentation typos.
Bo Peng [Wed, 14 Jul 2021 14:40:28 +0000 (23:40 +0900)]
Doc: Fix documentation typos.

4 years agoDoc: Fix documentation typos.
Bo Peng [Wed, 14 Jul 2021 14:05:16 +0000 (23:05 +0900)]
Doc: Fix documentation typos.

4 years agoFix client side hang when describe message is followed by NoData response.
Tatsuo Ishii [Fri, 9 Jul 2021 02:59:33 +0000 (11:59 +0900)]
Fix client side hang when describe message is followed by NoData response.

For certain describe requests, such as one for a query including
"LISTEN", a NoData message can be returned. Since Pgpool-II buffered
NoData, clients could hang waiting for the NoData message in vain. To
fix this, do not buffer the NoData message.

Problem reported and patch provided by Daniel van de Giessen.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-July/003951.html

4 years agoFix query cache to not cache SQLValueFunctions (CURRENT_TIME, CURRENT_USER etc.).
Tatsuo Ishii [Wed, 7 Jul 2021 04:03:59 +0000 (13:03 +0900)]
Fix query cache to not cache SQLValueFunctions (CURRENT_TIME, CURRENT_USER etc.).

The check for whether a query result may be cached missed
SQLValueFunctions, so they were regarded as non-function objects. As a
result they were cached.

Also add more test cases.

4 years agoDoc: fix typo in in memory query cache document.
Tatsuo Ishii [Tue, 6 Jul 2021 23:44:03 +0000 (08:44 +0900)]
Doc: fix typo in in memory query cache document.

4 years agoFix typo in pgpool.conf samples.
Tatsuo Ishii [Sat, 3 Jul 2021 01:08:25 +0000 (10:08 +0900)]
Fix typo in pgpool.conf samples.

4 years agoAdd buffer length check.
Tatsuo Ishii [Fri, 2 Jul 2021 03:50:29 +0000 (12:50 +0900)]
Add buffer length check.

get_info_from_conninfo() did not check the size of the provided
buffers.  Add length parameters so that it can check the buffer size.
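
What a length-checked extraction might look like is sketched below.
The real signature of get_info_from_conninfo() is not shown in this
log, so get_conninfo_value() and its parameters are purely
illustrative. Quoted values are not handled.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the value of "key=value" from a space separated conninfo
 * string into out.  Returns 0 on success, -1 if the key is missing or
 * the value does not fit into outlen bytes (including the NUL).
 */
int
get_conninfo_value(const char *conninfo, const char *key,
				   char *out, size_t outlen)
{
	size_t		keylen;
	const char *p;

	assert(conninfo != NULL && key != NULL && out != NULL && outlen > 0);

	keylen = strlen(key);
	for (p = conninfo; *p != '\0';)
	{
		size_t		toklen = strcspn(p, " ");	/* length of this token */

		if (toklen > keylen && strncmp(p, key, keylen) == 0 &&
			p[keylen] == '=')
		{
			size_t		vallen = toklen - keylen - 1;

			if (vallen >= outlen)
				return -1;		/* would overflow the caller's buffer */
			memcpy(out, p + keylen + 1, vallen);
			out[vallen] = '\0';
			return 0;
		}
		p += toklen;
		p += strspn(p, " ");	/* skip separators */
	}
	return -1;
}
```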

4 years agoFix sending invalid message in SI mode.
Tatsuo Ishii [Thu, 1 Jul 2021 04:41:34 +0000 (13:41 +0900)]
Fix sending invalid message in SI mode.

When a query is aborted for a specific reason such as a serialization
error, Pgpool-II sends an error query to abort the transactions running
on non-main nodes. The message length of the query was incorrect, which
caused an "invalid string in message" error on the backend.

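For reference, the length field of a PostgreSQL Query ('Q') message
counts itself (4 bytes) plus the query string and its terminating NUL,
but not the type byte. The sketch below builds such a message.
build_query_message() is a hypothetical helper, and the commit does
not say which part of the length computation was wrong.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a Query message for `query` in buf.  Returns the total number
 * of bytes written, or 0 if buf is too small.
 */
size_t
build_query_message(const char *query, unsigned char *buf, size_t buflen)
{
	size_t		qlen;
	uint32_t	msglen;

	assert(query != NULL && buf != NULL);

	qlen = strlen(query) + 1;	/* the NUL terminator is part of the message */
	if (buflen < 1 + 4 + qlen)
		return 0;

	msglen = (uint32_t) (4 + qlen);	/* length field + string, not the 'Q' */

	buf[0] = 'Q';
	buf[1] = (unsigned char) (msglen >> 24);	/* network byte order */
	buf[2] = (unsigned char) (msglen >> 16);
	buf[3] = (unsigned char) (msglen >> 8);
	buf[4] = (unsigned char) msglen;
	memcpy(buf + 5, query, qlen);

	return 1 + 4 + qlen;
}
```
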
4 years agoFix rsync parameter in pgpool_setup.
Tatsuo Ishii [Mon, 28 Jun 2021 05:59:06 +0000 (14:59 +0900)]
Fix rsync parameter in pgpool_setup.

It did not exclude the "log" directory, which is the default logging
directory in recent PostgreSQL versions. This made it hard to examine
the PostgreSQL log, since it was copied from the primary server.

4 years agoFix follow primary command creation in pgpool_setup.
Tatsuo Ishii [Fri, 25 Jun 2021 10:58:53 +0000 (19:58 +0900)]
Fix follow primary command creation in pgpool_setup.

If the target node of the follow primary script is primary and the
server version is before PostgreSQL 12, then the follow primary
command fails because there's no recovery.conf (12 or later is fine,
because "standby.signal" and "myrecovery.conf" are created).

To fix this, if server version is before 12, rename recovery.done to
recovery.conf.

Also this should fix 034.promote_node failure with PostgreSQL 11 or
before.

4 years agoFix the width of some struct member of LifeCheckNode.
Tatsuo Ishii [Thu, 24 Jun 2021 07:08:13 +0000 (16:08 +0900)]
Fix the width of some struct member of LifeCheckNode.

They were defined using constant numbers. Use more appropriate #define
symbols to make them consistent with other declarations.

The issue was detected by a Coverity run.

Also fix copyright year.

4 years agoFix follow primary script creation.
Tatsuo Ishii [Thu, 24 Jun 2021 06:09:35 +0000 (15:09 +0900)]
Fix follow primary script creation.

When the PostgreSQL version is lower than 12, the script should edit
recovery.conf rather than myrecovery.conf.

4 years agoFix pgpool_setup to generate portable follow_primary.sh.
Tatsuo Ishii [Mon, 21 Jun 2021 01:09:37 +0000 (10:09 +0900)]
Fix pgpool_setup to generate portable follow_primary.sh.

The script did not provide necessary information to execute pcp
commands: pcp port, PCPPASSFILE and path to pcp commands.

Also fix the failover.sh generation. Currently nothing uses the
information, but to avoid possible confusion in the future it is
added there as well.

These should fix the regression test failure of 034.promote_node.

4 years agoDoc: fix parameter name typo in ldap option
Takuma Hoshiai [Mon, 21 Jun 2021 23:29:01 +0000 (08:29 +0900)]
Doc: fix parameter name typo in ldap option

4 years agoFix 075.detach_primary_left_down_node.
Tatsuo Ishii [Mon, 21 Jun 2021 01:33:41 +0000 (10:33 +0900)]
Fix 075.detach_primary_left_down_node.

The test script executed pcp_detach_node but it always failed because
it tried to connect to UNIX domain socket.

4 years agoFix 031.connection_life_time.
Tatsuo Ishii [Mon, 21 Jun 2021 01:25:49 +0000 (10:25 +0900)]
Fix 031.connection_life_time.

The test script executed pcp_recovery_node but it always failed
because it tried to connect to UNIX domain socket.

4 years agoFix 018.detach_primary error in the log.
Tatsuo Ishii [Mon, 21 Jun 2021 01:19:23 +0000 (10:19 +0900)]
Fix 018.detach_primary error in the log.

The regression test itself passes but execution of pcp_watchdog_info
to get information failed because it tried to connect to UNIX domain
socket.

4 years agoUpdate Copyright year.
Tatsuo Ishii [Thu, 17 Jun 2021 23:03:21 +0000 (08:03 +0900)]
Update Copyright year.

4 years agoFix pgpool_setup in creating base backup script.
Tatsuo Ishii [Tue, 15 Jun 2021 08:10:54 +0000 (17:10 +0900)]
Fix pgpool_setup in creating base backup script.

4 years agoFix orphan process is left when pgpool is going down.
Tatsuo Ishii [Wed, 9 Jun 2021 04:43:44 +0000 (13:43 +0900)]
Fix orphan process is left when pgpool is going down.

When pgpool is going down while a follow primary command is ongoing,
some processes started by the follow primary child process could be
left. The follow primary child calls trigger_failover_command() to run
the follow primary script by using system(3). Unfortunately the process
started by system(3) was not tracked by anyone and it was possible
that the process(es) were left. To reproduce the problem you can edit
the test script (test.sh) for the regression test
075.detach_primary_left_down_node to change the timeout counter to a
small number, say 1, i.e.,
change:
cnt=60
to
cnt=1

To fix the problem, the follow primary child process calls setsid(2)
to set a new session id. When the Pgpool-II main process goes down, the
exit handler kills all the processes, including the process started by
system(3), by specifying kill(-(follow primary child process pid)).

Also in the initialization of the follow primary process, unmask
signals and assign default behavior to each signal (this should have
been done much earlier).
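
The mechanism can be sketched as follows. spawn_session_leader() and
kill_session() are hypothetical names, the child simply pauses where
the real one would run the script via system(3), and SIGTERM is an
assumption since the commit does not name the signal. The pipe makes
sure the parent does not signal the group before setsid(2) has
actually been called.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that becomes a session (and process group) leader.
 * Returns the child's pid once setsid(2) has succeeded, -1 on error.
 */
pid_t
spawn_session_leader(void)
{
	int			ready[2];
	char		c;
	pid_t		pid;

	if (pipe(ready) != 0)
		return -1;

	pid = fork();
	if (pid < 0)
	{
		close(ready[0]);
		close(ready[1]);
		return -1;
	}
	if (pid == 0)
	{
		sigset_t	none;

		close(ready[0]);
		/* Unmask signals and restore default behavior. */
		sigemptyset(&none);
		sigprocmask(SIG_SETMASK, &none, NULL);
		signal(SIGTERM, SIG_DFL);

		if (setsid() < 0 || write(ready[1], "x", 1) != 1)
			_exit(1);
		close(ready[1]);
		/* A real child would call system(3) here; everything it
		 * starts inherits the new process group. */
		for (;;)
			pause();
	}

	close(ready[1]);
	if (read(ready[0], &c, 1) != 1)
	{
		close(ready[0]);
		waitpid(pid, NULL, 0);	/* child failed before becoming leader */
		return -1;
	}
	close(ready[0]);
	return pid;
}

/* Signal the leader and every process in its group. */
int
kill_session(pid_t leader)
{
	assert(leader > 1);
	return kill(-leader, SIGTERM);
}
```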

4 years agoFix pcp_detach_node leaves down node.
Tatsuo Ishii [Tue, 8 Jun 2021 10:06:13 +0000 (19:06 +0900)]
Fix pcp_detach_node leaves down node.

Detaching the primary node using pcp_detach_node leaves a standby node
down after the follow primary command is executed.

This can be reproduced reliably by the following steps:

$ pgpool_setup -n 4
$ ./startall
$ pcp_detach_node -p 11001 0

This happens because pcp_recovery_node is denied by the pcp child process:

2021-06-05 07:22:17: follow_child pid 6593: LOG:  execute command: /home/t-ishii/work/Pgpool-II/current/x/etc/follow_primary.sh 3 /tmp 11005 /home/t-ishii/work/Pgpool-II/current/x/data3 1 0 /tmp 0 11002 /home/t-ishii/work/Pgpool-II/current/x/data0
2021-06-05 07:22:17: pcp_main pid 6848: LOG:  forked new pcp worker, pid=7027 socket=6
2021-06-05 07:22:17: pcp_child pid 7027: ERROR:  failed to process PCP request at the moment
2021-06-05 07:22:17: pcp_child pid 7027: DETAIL:  failback is in progress

It complains that a failback request is still in progress. The failback
does not complete because find_primary_node_repeatedly() is trying to
acquire the follow primary lock. However, the follow primary command
has already acquired the lock and is waiting for the completion of the
failback request. Thus this is a kind of deadlock situation.

How to solve this?

The purpose of the follow primary lock is to prevent concurrent run of
follow primary command and detach false primary by the streaming
replication check. We cannot throw it away. However it is not always
necessary to acquire the lock by find_primary_node_repeatedly(). If
it does not try to acquire the lock, failover/failback will not be
blocked and will finish soon, thus Req_info->switching flags will be
promptly turned to false.

When a primary node is detached, the failover command is called and a
new primary is selected. At this point find_primary_node_repeatedly()
surely needs to run to find the new primary. However, once the follow
primary command starts, the primary will not be changed. So my idea
is that find_primary_node_repeatedly() checks whether the follow
primary command is running or not. If it is running, it just returns
the current primary. Otherwise it acquires the lock.

For this purpose, new shared memory variable
Req_info->follow_primary_ongoing was introduced. The flag is set/unset
by follow primary process.

New regression test 075.detach_primary_left_down_node is added.

Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-June/003916.html

4 years agoFix backend_application_name cannot be changed by reloading.
Tatsuo Ishii [Thu, 3 Jun 2021 05:39:06 +0000 (14:39 +0900)]
Fix backend_application_name cannot be changed by reloading.

The manual explicitly states that it is possible to change
backend_application_name on the fly by reloading the configuration
file, but the code placed an unnecessary restriction on that.

4 years agoDoc: fix wd_life_point description
Masaya Kawamoto [Wed, 2 Jun 2021 07:53:21 +0000 (07:53 +0000)]
Doc: fix wd_life_point description

The description of the default value was missing in the Japanese doc.

4 years agoAdd comment to Req_info->conn_counter.
Tatsuo Ishii [Mon, 31 May 2021 07:56:10 +0000 (16:56 +0900)]
Add comment to Req_info->conn_counter.

4 years agoAdd comments to watchdog.c
Tatsuo Ishii [Mon, 31 May 2021 08:00:02 +0000 (17:00 +0900)]
Add comments to watchdog.c

4 years agoFix occasional failure of 028.watchdog_enable_consensus_with_half_votes test.
Tatsuo Ishii [Mon, 31 May 2021 06:19:40 +0000 (15:19 +0900)]
Fix occasional failure of 028.watchdog_enable_consensus_with_half_votes test.

It seems the failure was caused in this scenario:

Testing total nodes: 4. enable_consensus_with_half_of_the_votes: on
shutdown node pgpool2
2021-05-30 07:41:54: main pid 28819: LOG:  stop request sent to pgpool. waiting for termination....done.
shutdown node pgpool1
2021-05-30 07:41:56: main pid 28825: LOG:  stop request sent to pgpool. waiting for termination....done.
Quorum does not exist. Test failed

So the test failed at 07:41:56. Meanwhile pgpool.log showed:
2021-05-30 07:41:54: watchdog pid 28569: LOG:  We have lost the cluster leader node "localhost:50008 Linux e1aa95e1fe13"
:
:
2021-05-30 07:41:59: watchdog pid 28569: LOG:  watchdog node state changed from [STANDING FOR LEADER] to [LEADER]
2021-05-30 07:41:59: watchdog pid 28569: LOG:  I am announcing my self as leader/coordinator watchdog node

The quorum was established at 07:41:59. That means the test for quorum
existence was too early.

To fix this, insert "sleep 5" after shutting down pgpool.

4 years agoEnhance watchdog_setup script.
Tatsuo Ishii [Sat, 29 May 2021 08:32:38 +0000 (17:32 +0900)]
Enhance watchdog_setup script.

The shutdownall script generated by watchdog_setup shuts nodes down in
node number order, i.e. 0, 1, 2...  As a result the PostgreSQL backends
are shut down when the pgpool0 node goes down, and nodes 1, 2...
trigger a failover event, which is unnecessary when the whole cluster
is being shut down.  Shutting down in the reverse order (...2, 1, 0)
should prevent this and shorten the whole shutdown sequence.

Also this should prevent occasional 018.detach_primary and
028.watchdog_enable_consensus_with_half_votes test timeout (they use
watchdog_setup).

4 years agoFix occasional regression test 018.detach_primary error.
Tatsuo Ishii [Fri, 28 May 2021 07:47:45 +0000 (16:47 +0900)]
Fix occasional regression test 018.detach_primary error.

According to the buildfarm log, the test failed at this command:
$PGPOOL_INSTALL_DIR/bin/pcp_watchdog_info -v -w -p $PCP_PORT
The error message was:
ERROR: connection to socket "/tmp/.s.PGSQL.50001" failed with error "No such file or directory"
This suggests that pcp_watchdog_info failed because the pcp server had not started yet.

To fix this add wait_for_pgpool_startup before pcp_watchdog_info.

4 years agoFix maximum length of hostnames including domain name.
Tatsuo Ishii [Thu, 27 May 2021 10:15:46 +0000 (19:15 +0900)]
Fix maximum length of hostnames including domain name.

The maximum length of hostnames was 128, which is not correct.
Moreover, the maximum length of a hostname was defined in multiple places.
So create a unified definition of it in libpcp_ext.h.

Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003904.html

4 years agoFix watchdog communication race condition.
Tatsuo Ishii [Fri, 21 May 2021 23:00:05 +0000 (08:00 +0900)]
Fix watchdog communication race condition.

Watchdog sends information from the watchdog process to the Pgpool-II
main process using SIGUSR1. To pass detailed messages it uses a shared
memory area. First it sets a message in the shared memory area, then
sends SIGUSR1 to the main process. The main process receives the
signal and the signal handler sets a global variable so that
sigusr1_interrupt_processor() processes it. However, it is possible
that a new signal arrives while sigusr1_interrupt_processor() is
running. In this case the new signal is caught but the global variable
is set to 0 after sigusr1_interrupt_processor() returns. This means
that the new message is not processed until another signal arrives,
which could cause a significant delay before the message is processed.

To fix the problem, sigusr1_interrupt_processor() is repeatedly called
until there's no pending message.

Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003901.html
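
One common way to avoid losing such a signal is sketched below: clear
the flag before processing and loop until it stays clear. This is a
generic pattern under made-up names (sigusr1_pending,
process_messages(), handle_pending_sigusr1()), not a copy of the
Pgpool-II fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sigusr1_pending = 0;
static int	messages_processed = 0;

/* Signal handler: only records that something is waiting. */
static void
sigusr1_handler(int sig)
{
	(void) sig;
	sigusr1_pending = 1;
}

/* Install the SIGUSR1 handler; returns 0 on success. */
int
install_sigusr1_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigusr1_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR1, &sa, NULL);
}

/* Stand-in for reading and handling the shared memory message. */
static void
process_messages(void)
{
	messages_processed++;
}

/*
 * Clear the flag BEFORE processing and repeat while it is set, so a
 * signal that arrives during processing causes one more pass instead
 * of being wiped out afterwards.
 */
void
handle_pending_sigusr1(void)
{
	while (sigusr1_pending)
	{
		sigusr1_pending = 0;
		process_messages();
	}
}
```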

4 years agoFix for bug:684: Watchdog node status not updating after rebooting.
Muhammad Usama [Thu, 20 May 2021 12:30:02 +0000 (17:30 +0500)]
Fix for bug:684: Watchdog node status not updating after rebooting.

A node should broadcast its status to the whole cluster after
joining the cluster as standby.

4 years agoFix "file name is too long (max 99)" error while creating a tarball. V4_2_3 V4_2_3_RPM
Bo Peng [Wed, 19 May 2021 13:52:03 +0000 (22:52 +0900)]
Fix "file name is too long (max 99)" error while creating a tarball.

4 years agoPrepare 4.2.3.
Bo Peng [Wed, 19 May 2021 13:13:33 +0000 (22:13 +0900)]
Prepare 4.2.3.

4 years agoAdd release notes.
Bo Peng [Wed, 19 May 2021 12:53:05 +0000 (21:53 +0900)]
Add release notes.

4 years agoImprove sample scripts.
Bo Peng [Wed, 19 May 2021 11:32:09 +0000 (20:32 +0900)]
Improve sample scripts.

4 years agoImprove sample scripts.
Bo Peng [Tue, 18 May 2021 16:28:59 +0000 (01:28 +0900)]
Improve sample scripts.

Replace dots or hyphens in the replication slot name with underscores.
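
The sample scripts themselves are shell scripts; the same
substitution is sketched here in C purely as an illustration. The
function name sanitize_slot_name is hypothetical. PostgreSQL
replication slot names may contain only lowercase letters, digits
and underscores, which is why dots and hyphens from a host name have
to be mapped to something else.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Replace every '.' or '-' in name with '_', in place.
 * Returns the number of characters replaced.
 */
static int sanitize_slot_name(char *name)
{
    int replaced = 0;

    assert(name != NULL);

    for (char *p = name; *p != '\0'; p++)
    {
        if (*p == '.' || *p == '-')
        {
            *p = '_';
            replaced++;
        }
    }
    return replaced;
}
```

For example, a buffer holding "server-1.example.com" becomes
"server_1_example_com".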

4 years agoFix error message while checking watchdog configuration.
Tatsuo Ishii [Tue, 18 May 2021 05:09:17 +0000 (14:09 +0900)]
Fix error message while checking watchdog configuration.

Since Pgpool-II 4.2 there's no "other watchdog node" concept.
All watchdog nodes are registered on all watchdog nodes.

4 years agoDoc: fix typo.
Tatsuo Ishii [Tue, 18 May 2021 02:23:26 +0000 (11:23 +0900)]
Doc: fix typo.

4 years agoUpdate copyright year.
Tatsuo Ishii [Fri, 14 May 2021 02:44:47 +0000 (11:44 +0900)]
Update copyright year.

4 years agoRevert "Fix memory leak in pcp_node_info."
Tatsuo Ishii [Thu, 13 May 2021 11:35:00 +0000 (20:35 +0900)]
Revert "Fix memory leak in pcp_node_info."

This reverts commit 7576c0e3cae3ffb2fa5735a33358f7ba278206a0.

4 years agoFix regression test 018.detach_primary test.
Tatsuo Ishii [Wed, 12 May 2021 08:55:18 +0000 (17:55 +0900)]
Fix regression test 018.detach_primary test.

The test script did not explicitly specify the path to
pcp_watchdog_info.

4 years agoFix memory leak in pcp_node_info.
Tatsuo Ishii [Wed, 12 May 2021 02:33:56 +0000 (11:33 +0900)]
Fix memory leak in pcp_node_info.

Detected by Coverity.

4 years agoFix race condition between detach_false_primary and follow_primary command.
Tatsuo Ishii [Tue, 11 May 2021 10:51:19 +0000 (19:51 +0900)]
Fix race condition between detach_false_primary and follow_primary command.

It was reported that if detach_false_primary and follow_primary
command are running concurrently, many problems occurred:

https://www.pgpool.net/pipermail/pgpool-general/2021-April/007583.html

A typical problem is that no primary node is found at the end.

I confirmed that this can be easily reproduced:

https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003893.html

In this commit new functions pool_acquire_follow_primary_lock(bool
block) and pool_release_follow_primary_lock(void) are introduced. They
are responsible for acquiring or releasing the lock. There are 3
places where those functions are used:

1) find_primary_node

This function is called upon startup and failover in the main pgpool
process to find the new primary node.

2) failover

This function is called in the follow_primary_command subprocess
forked off by the pgpool main process to execute the
follow_primary_command script. The lock should be held until all
follow_primary_command executions are completed.

3) streaming replication check

Before starting verify_backend_node, which is the workhorse of
detach_false_primary, the lock must be acquired. If acquiring the
lock fails, just skip the streaming replication check cycle.
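
The "try to acquire, otherwise skip this cycle" pattern of case 3 can
be sketched as follows. This is an illustration under assumptions,
not the code of this commit: the *_sketch names are hypothetical, and
a C11 atomic_flag within a single process is swapped in for whatever
inter-process mechanism pool_acquire_follow_primary_lock() actually
uses. The block parameter mirrors the bool block argument named
above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the follow primary lock. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

/* Debug aid for this single-threaded sketch only. */
static bool lock_held = false;

static int checks_run = 0;
static int checks_skipped = 0;

/*
 * Try to take the lock. With block == false, return false at once
 * if it is already held. With block == true, spin until it is free
 * (a real implementation would sleep or wait on a semaphore).
 */
static bool acquire_lock_sketch(bool block)
{
    while (atomic_flag_test_and_set(&lock_flag))
    {
        if (!block)
            return false;
    }
    lock_held = true;
    return true;
}

static void release_lock_sketch(void)
{
    assert(lock_held);      /* releasing without holding is a bug */
    lock_held = false;
    atomic_flag_clear(&lock_flag);
}

/* One streaming replication check cycle. */
static void sr_check_cycle_sketch(void)
{
    if (!acquire_lock_sketch(false))
    {
        /* follow primary work is in progress: skip this cycle */
        checks_skipped++;
        return;
    }

    checks_run++;           /* the node verification would run here */
    release_lock_sketch();
}
```

The non-blocking attempt keeps the periodic check from stalling
behind a long-running follow_primary_command; the check is simply
retried on the next cycle.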

The commit also deals with the case when watchdog is enabled.

https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003894.html

Multiple pgpool nodes perform detach_false_primary concurrently, and
this is the cause of the problem. To fix this, detach_false_primary
is performed only on the leader node. Also, if the quorum is absent,
detach_false_primary is not performed.

4 years agoDoc: update copyright year.
Tatsuo Ishii [Tue, 11 May 2021 05:35:19 +0000 (14:35 +0900)]
Doc: update copyright year.

4 years agoDoc: fix description about heartbeat_device.
Tatsuo Ishii [Tue, 11 May 2021 05:14:33 +0000 (14:14 +0900)]
Doc: fix description about heartbeat_device.

It did not mention that the parameter can only be used if Pgpool-II
is started as root.

4 years agoDoc: enhance description on enable_consensus_with_half_votes.
Tatsuo Ishii [Mon, 10 May 2021 00:02:15 +0000 (09:02 +0900)]
Doc: enhance description on enable_consensus_with_half_votes.

Although this parameter is described in the "Controlling the Failover
behavior" section, it affects not only backend failover but also the
quorum of Pgpool-II itself. So add a note to make this clear.

4 years agoDoc: remove incorrect description about failover_when_quorum_exists.
Tatsuo Ishii [Sun, 9 May 2021 23:36:42 +0000 (08:36 +0900)]
Doc: remove incorrect description about failover_when_quorum_exists.

"Please note that if the number of watchdog nodes is even, we regard
that quorum exists when the number of live nodes is greater than or
equal to half of total watchdog nodes." This has not been correct
since Pgpool-II 4.1, in which enable_consensus_with_half_votes was
introduced.
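
The difference can be made concrete with a small sketch in C. It
assumes, based on the documented meaning of
enable_consensus_with_half_votes, that a strict majority is required
when the parameter is off and that exactly half of the votes is also
accepted when it is on. The function name quorum_exists_sketch and
its parameters are hypothetical; this is not the watchdog code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if live_nodes out of total_nodes form a quorum.
 * half_votes_enough models enable_consensus_with_half_votes.
 */
static bool quorum_exists_sketch(int total_nodes, int live_nodes,
                                 bool half_votes_enough)
{
    assert(total_nodes > 0);
    assert(live_nodes >= 0 && live_nodes <= total_nodes);

    /* A strict majority is always a quorum. */
    if (2 * live_nodes > total_nodes)
        return true;

    /* Exactly half counts only when explicitly allowed. */
    return half_votes_enough && 2 * live_nodes == total_nodes;
}
```

With 4 watchdog nodes and 2 alive, the sketch reports a quorum only
when half_votes_enough is true. With an odd number of nodes the
parameter makes no difference, because exactly half is impossible.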

4 years agoDoc: fix typo.
Tatsuo Ishii [Sun, 9 May 2021 21:55:30 +0000 (06:55 +0900)]
Doc: fix typo.

4 years agoDoc: fix typo.
Tatsuo Ishii [Sat, 8 May 2021 22:04:12 +0000 (07:04 +0900)]
Doc: fix typo.

4 years agoFix broken database/app redirect preference in statement level load balancing mode.
Bo Peng [Wed, 5 May 2021 09:48:15 +0000 (18:48 +0900)]
Fix broken database/app redirect preference in statement level load balancing mode.

Reported in bug707.

4 years agoFix watchdog_setup to not fail when -n is not specified.
Tatsuo Ishii [Wed, 5 May 2021 00:26:58 +0000 (09:26 +0900)]
Fix watchdog_setup to not fail when -n is not specified.

watchdog_setup failed if -n (number of PostgreSQL clusters) was not
specified. Now if -n is not specified, assume "-n = 2", which is the
same as in pgpool_setup.