pgpool2.git
Tatsuo Ishii [Tue, 20 Jul 2021 07:01:08 +0000 (16:01 +0900)]
Doc: add more explanation about backend_application_name.

Tatsuo Ishii [Sat, 17 Jul 2021 07:37:27 +0000 (16:37 +0900)]
Allow to log error messages in extended query mode in certain cases.

In extended query mode, it is possible that error messages are not
logged even after a Sync message is received. With this commit,
whenever an ERROR response arrives from a backend in extended query
mode, the error message is logged.

Muhammad Usama [Fri, 16 Jul 2021 07:17:27 +0000 (12:17 +0500)]
Implementing the follow_primary command-locking over the watchdog channel.

Supplementary fix for [pgpool-hackers: 3892] Problem with detach_false_primary.

Commit 455f00dd5f5b7b94bd91aa0b6b40aab21dceabb9 fixed a race condition
between the detach_false_primary and follow_primary commands. Part of
the fix was to make sure that detach_false_primary is executed only on
the leader watchdog node.

The mentioned commit ensures that detach_false_primary is executed on
the watchdog leader by reading the watchdog status from within the
main process. That design is good enough for most cases, but it can
fail if the cluster goes into the election process just after the main
process has read the status.

To fix that, this commit implements the synchronization of follow_primary_command
execution using the distributed locks over the watchdog channel.

The idea is that, just before executing follow_primary during the
failover process, we instruct all standby watchdog nodes to acquire a
lock on their respective nodes, blocking false primary detection while
follow_primary is being executed on the leader watchdog node.

Moreover, to avoid keeping the watchdog process blocked while waiting
for the lock, the commit introduces a pending remote lock mechanism,
so that remote locks can be acquired in the background after the
in-flight replication checks complete.

Finally, the REQ_DETAIL_CONFIRMED flag is removed from the
degenerate_backend_set() request that is issued to detach the false
primary. That means all quorum and consensus rules must be satisfied
for the detach to happen.

Bo Peng [Wed, 14 Jul 2021 14:40:28 +0000 (23:40 +0900)]
Doc: Fix documentation typos.

Bo Peng [Wed, 14 Jul 2021 14:05:16 +0000 (23:05 +0900)]
Doc: Fix documentation typos.

Tatsuo Ishii [Tue, 13 Jul 2021 13:21:44 +0000 (22:21 +0900)]
Use milliseconds logging for Pgpool-II and PostgreSQL.

This will make analyzing timing-sensitive issues easier. Before, we
could not analyze sub-second timing issues.

Tatsuo Ishii [Tue, 13 Jul 2021 10:34:28 +0000 (19:34 +0900)]
Add support for log time stamp with milliseconds.
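
As a sketch of what this enables (assuming, as in PostgreSQL, that the
%m escape of log_line_prefix carries the millisecond time stamp; the
exact prefix string below is illustrative, not what this commit
installs):

# pgpool.conf / postgresql.conf: %t has second resolution,
# %m adds milliseconds
log_line_prefix = '%m: pid %p: '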

Masaya Kawamoto [Mon, 12 Jul 2021 04:52:40 +0000 (04:52 +0000)]
Fix 033.prefer_lower_standby_delay

Sometimes this test failed because the waiting time after
pcp_reload_conf was too short. Modify the script to confirm the change
via PGPOOL SHOW before running the test.

Tatsuo Ishii [Fri, 9 Jul 2021 02:59:33 +0000 (11:59 +0900)]
Fix client side hang when describe message is followed by NoData response.

For certain Describe requests for a query including "LISTEN", a NoData
message can be returned. Since Pgpool-II buffered NoData, clients
could hang waiting for the NoData message in vain. To fix this, do not
buffer NoData messages.

Problem reported and patch provided by Daniel van de Giessen.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-July/003951.html

Tatsuo Ishii [Wed, 7 Jul 2021 04:03:59 +0000 (13:03 +0900)]
Fix query cache to not cache SQLValueFunctions (CURRENT_TIME, CURRENT_USER etc.).

The check of whether SQLValueFunctions should be cached was missing:
they were regarded as non-function objects and, as a result, were
cached.

Also add more test cases.

Tatsuo Ishii [Tue, 6 Jul 2021 23:44:03 +0000 (08:44 +0900)]
Doc: fix typo in in memory query cache document.

Masaya Kawamoto [Mon, 5 Jul 2021 06:43:01 +0000 (06:43 +0000)]
Revert "Fix 033.prefer_lower_standby_delay"

This reverts commit 61fb18a08d143c41d87ffa81dc1767c371f11917.

The previous commit was meant to deal with configuration reload
sometimes not working. However, that was not a good idea because it
hides the real cause of the test failure.

Masaya Kawamoto [Mon, 5 Jul 2021 04:33:01 +0000 (04:33 +0000)]
Fix 033.prefer_lower_standby_delay

Tatsuo Ishii [Sun, 4 Jul 2021 02:24:25 +0000 (11:24 +0900)]
Doc: fix typo.

Tatsuo Ishii [Sat, 3 Jul 2021 01:08:25 +0000 (10:08 +0900)]
Fix typo in pgpool.conf samples.

Tatsuo Ishii [Fri, 2 Jul 2021 03:50:29 +0000 (12:50 +0900)]
Add buffer length check.

get_info_from_conninfo() did not check the size of the provided
buffers.  Add length parameters so that it can check the buffer sizes.
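
A minimal standalone sketch of the pattern (the helper and its names
are hypothetical, not pgpool's actual code):

#include <string.h>

/* Copy at most dstlen - 1 bytes and always NUL-terminate. */
static void
copy_bounded(char *dst, const char *src, int dstlen)
{
    if (dstlen <= 0)
        return;
    strncpy(dst, src, dstlen - 1);
    dst[dstlen - 1] = '\0';  /* strncpy alone does not guarantee this */
}

/* Callers would then pass each buffer's size, e.g.:
 * get_info_from_conninfo(conninfo, host, sizeof(host), port, sizeof(port));
 */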

Takuma Hoshiai [Thu, 1 Jul 2021 06:32:08 +0000 (15:32 +0900)]
Fix compile warning

Tatsuo Ishii [Thu, 1 Jul 2021 04:41:34 +0000 (13:41 +0900)]
Fix sending invalid message in SI mode.

When a query is aborted for a specific reason such as a serialization
error, Pgpool-II sends an error query to abort transactions running on
non-main nodes. The message length of the query was incorrect, which
caused an "invalid string in message" error on the backend.

Takuma Hoshiai [Tue, 29 Jun 2021 02:29:39 +0000 (11:29 +0900)]
Remove /*NO LOAD BALANCE*/ comment

This comment was used to control query load balancing. Nowadays
allow_sql_comments, primary_routing_query_pattern_list and the like
can be used instead, so this special comment is unnecessary.

Tatsuo Ishii [Mon, 28 Jun 2021 05:59:06 +0000 (14:59 +0900)]
Fix rsync parameter in pgpool_setup.

It did not exclude the "log" directory, which is the default logging
directory in recent PostgreSQL versions. This made it hard to examine
the PostgreSQL log, since it was copied over from the primary server.

Tatsuo Ishii [Sat, 26 Jun 2021 01:11:27 +0000 (10:11 +0900)]
Remove unused static function.

The function free_processInfo() is no longer needed due to refactoring
commit c7690453.

Tatsuo Ishii [Fri, 25 Jun 2021 10:58:53 +0000 (19:58 +0900)]
Fix follow primary command creation in pgpool_setup.

If the target node of the follow primary script is the primary and the
server version is before PostgreSQL 12, the follow primary command
fails because there is no recovery.conf (12 or later is fine, because
"standby.signal" and "myrecovery.conf" are created).

To fix this, if the server version is before 12, rename recovery.done
to recovery.conf.

Also this should fix 034.promote_node failure with PostgreSQL 11 or
before.

Tatsuo Ishii [Thu, 24 Jun 2021 07:08:13 +0000 (16:08 +0900)]
Fix the width of some struct members of LifeCheckNode.

They were defined using bare constant numbers. Use the more
appropriate #define symbols to make them consistent with other
declarations.
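
The shape of the change, sketched (the field and symbol names here are
illustrative only):

/* before: width spelled out as a bare constant */
char hostName[128];

/* after: a shared #define keeps every declaration in sync */
char hostName[WD_MAX_HOST_NAMELEN];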

The issue was detected by a Coverity run.

Also fix copyright year.

Tatsuo Ishii [Thu, 24 Jun 2021 06:09:35 +0000 (15:09 +0900)]
Fix follow primary script creation.

When the PostgreSQL version is lower than 12, the script should edit
recovery.conf rather than myrecovery.conf.

thoshiai [Mon, 21 Jun 2021 23:22:07 +0000 (08:22 +0900)]
Doc: fix parameter name typo in ldap option

Masaya Kawamoto [Mon, 21 Jun 2021 06:09:28 +0000 (06:09 +0000)]
Fix the commit b69e695758dcb2fea0dacfeaf1e40299352ccdaf and c75e1456b1259fc172c127d26b32285424399907

changing the permission of the regression test script and
adding prefer_lower_standby_delay to pool_process_reporting.c

Masaya Kawamoto [Mon, 21 Jun 2021 05:19:19 +0000 (05:19 +0000)]
Fix the commit b69e695758dcb2fea0dacfeaf1e40299352ccdaf

Masaya Kawamoto [Mon, 21 Jun 2021 04:34:03 +0000 (04:34 +0000)]
New feature "prefer_lower_delay_standby"

Currently, in streaming replication mode, when the node selected as
the load balancing node has a streaming delay over delay_threshold,
pgpool sends the query to the primary regardless of the other
standbys.

When this parameter is set to on and the delayed load balancing node
is a standby, pgpool selects a new load balancing node from among the
standbys with the lowest delay.

If all standbys are delayed or the primary is chosen as the load
balancing node, pgpool sends the query to the primary.
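
A configuration sketch of how the two parameters interact (values are
examples only):

# pgpool.conf (illustrative)
delay_threshold = 10000000          # tolerated standby lag, in bytes of WAL
prefer_lower_delay_standby = on     # re-select the least delayed standby
                                    # instead of falling back to the primary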

Tatsuo Ishii [Mon, 21 Jun 2021 01:33:41 +0000 (10:33 +0900)]
Fix 075.detach_primary_left_down_node.

The test script executed pcp_detach_node, but it always failed because
it tried to connect to the UNIX domain socket.

Tatsuo Ishii [Mon, 21 Jun 2021 01:25:49 +0000 (10:25 +0900)]
Fix 031.connection_life_time.

The test script executed pcp_recovery_node, but it always failed
because it tried to connect to the UNIX domain socket.

Tatsuo Ishii [Mon, 21 Jun 2021 01:19:23 +0000 (10:19 +0900)]
Fix 018.detach_primary error in the log.

The regression test itself passes, but the execution of
pcp_watchdog_info to get information failed because it tried to
connect to the UNIX domain socket.

Tatsuo Ishii [Mon, 21 Jun 2021 01:09:37 +0000 (10:09 +0900)]
Fix pgpool_setup to generate portable follow_primary.sh.

The script did not provide the information necessary to execute pcp
commands: the pcp port, PCPPASSFILE and the path to the pcp commands.

Also fix failover.sh generation. Currently nothing uses that
information, but to avoid possible confusion in the future, the
information is added there as well.

These should fix the regression test failure of 034.promote_node.

Tatsuo Ishii [Fri, 18 Jun 2021 23:23:12 +0000 (08:23 +0900)]
Fix 034.promote_node test failure.

It appeared that in the buildfarm environment, pcp_socketdir is set to
/var/run/postgresql, but the pcp client's socket directory is not
changed from /tmp (it is defined at compile time). Probably we should
work around this someday, but for now just give the "-h localhost"
parameter to pcp_promote_node to avoid this problem.

Tatsuo Ishii [Thu, 17 Jun 2021 23:03:21 +0000 (08:03 +0900)]
Update Copyright year.

Tatsuo Ishii [Wed, 16 Jun 2021 13:02:18 +0000 (22:02 +0900)]
Fix regression test 034.promote_node failure.

According to the buildfarm log, the cause of the failure is that the
pcp server was not yet ready.  So wait for up to 10 seconds by calling
pcp_node_info before executing pcp_promote_node.

Tatsuo Ishii [Wed, 16 Jun 2021 08:26:01 +0000 (17:26 +0900)]
Refactor "SHOW pool_pools" and pcp_proc_info.

These commands had a lot of duplicated code and did not use the
existing infrastructure that would make the code shorter. Now they are
fixed, and it is much easier to add new data.

Tatsuo Ishii [Tue, 15 Jun 2021 08:10:54 +0000 (17:10 +0900)]
Fix pgpool_setup in creating base backup script.

Tatsuo Ishii [Tue, 15 Jun 2021 07:55:07 +0000 (16:55 +0900)]
Allow to specify node id to be promoted in pcp_promote_node.

For this purpose a new parameter "-s" or "--switchover" is added.  If
the parameter is specified, the current primary node is detached and
the target node id is passed to the failover script's "main node"
argument. Since most failover scripts promote the "main node", this
results in promoting the node specified in pcp_promote_node.
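
For example, in a pgpool_setup-style cluster this might look like the
following (the host and port are assumptions mirroring the examples
elsewhere in this log):

$ pcp_promote_node -h localhost -p 11001 -s 1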

The idea was initially proposed by Nathan Ward.  Patch was created by
me and reviewed by: Nathan Ward, Umar Hayat and Lachezar Dobrev.

Discussion:
https://www.pgpool.net/pipermail/pgpool-general/2021-April/007532.html
https://www.pgpool.net/pipermail/pgpool-hackers/2021-June/003921.html

Tatsuo Ishii [Tue, 15 Jun 2021 07:52:10 +0000 (16:52 +0900)]
Do not use replication slot.

regress.sh enabled replication slots for the pgpool_setup run.  The
reason is not clear, but this made some tests fail occasionally.

Bo Peng [Tue, 15 Jun 2021 02:46:57 +0000 (11:46 +0900)]
New feature: Allow pcp_node_info to list all backend nodes' information.

Bo Peng [Fri, 11 Jun 2021 16:26:39 +0000 (01:26 +0900)]
Update gram_minimal.c which is generated by gram_minimal.y

Tatsuo Ishii [Wed, 9 Jun 2021 04:43:44 +0000 (13:43 +0900)]
Fix orphan processes being left when pgpool is going down.

When pgpool is going down while a follow primary command is ongoing,
some processes started by the follow primary child process could be
left behind. The follow primary child calls trigger_failover_command()
to run the follow primary script using system(3). Unfortunately, the
processes started by system(3) were not tracked by anyone, and it was
possible that they were left behind. To reproduce the problem, you can
edit the test script (test.sh) of the regression test
075.detach_primary_left_down_node to change the timeout counter to a
small number, say 1, i.e.,
change:
cnt=60
to
cnt=1

To fix the problem, the follow primary child process calls setsid(2)
to set a new session id. When the Pgpool-II main process goes down,
the exit handler kills all the processes, including those started by
system(3), by specifying kill(-(follow primary child process pid)).

Also, in the initialization of the follow primary process, unmask
signals and assign the default behavior to each signal (this should
have been done much earlier).
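
A minimal standalone sketch of the mechanism (not pgpool's actual
code; the signal choice is illustrative):

#include <sys/types.h>
#include <signal.h>
#include <unistd.h>

/* In the follow primary child, right after fork(): become a session
 * (and thus process group) leader, so that processes spawned via
 * system(3) share our process group id. */
static void
become_group_leader(void)
{
    setsid();
}

/* In the main process exit handler: a negative pid makes kill(2)
 * signal the whole process group. */
static void
kill_follow_child(pid_t follow_child_pid)
{
    kill(-follow_child_pid, SIGTERM);
}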

Tatsuo Ishii [Tue, 8 Jun 2021 10:06:13 +0000 (19:06 +0900)]
Fix pcp_detach_node leaves down node.

Detaching the primary node using pcp_detach_node left a standby node
in down status after the follow primary command was executed.

This can be reproduced reliably by the following steps:

$ pgpool_setup -n 4
$ ./startall
$ pcp_detach_node -p 11001 0

This is caused by pcp_recovery_node being denied by the pcp child process:

2021-06-05 07:22:17: follow_child pid 6593: LOG:  execute command: /home/t-ishii/work/Pgpool-II/current/x/etc/follow_primary.sh 3 /tmp 11005 /home/t-ishii/work/Pgpool-II/current/x/data3 1 0 /tmp 0 11002 /home/t-ishii/work/Pgpool-II/current/x/data0
2021-06-05 07:22:17: pcp_main pid 6848: LOG:  forked new pcp worker, pid=7027 socket=6
2021-06-05 07:22:17: pcp_child pid 7027: ERROR:  failed to process PCP request at the moment
2021-06-05 07:22:17: pcp_child pid 7027: DETAIL:  failback is in progress

It complains that a failback request is still in progress. The reason
the failback is not completed is that find_primary_node_repeatedly()
is trying to acquire the follow primary lock. However, the follow
primary command has already acquired the lock and is waiting for the
completion of the failback request. Thus this is a kind of deadlock.

How to solve this?

The purpose of the follow primary lock is to prevent concurrent runs
of the follow primary command and detach false primary by the
streaming replication check. We cannot throw it away. However, it is
not always necessary for find_primary_node_repeatedly() to acquire the
lock. If it does not try to acquire the lock, failover/failback will
not be blocked and will finish soon, and thus the Req_info->switching
flag will promptly be turned to false.

When a primary node is detached, the failover command is called and a
new primary is selected. At this point find_primary_node_repeatedly()
surely needs to run to find the new primary. However, once the follow
primary command starts, the primary will not change. So the idea is
that find_primary_node_repeatedly() checks whether the follow primary
command is running. If it is running, it just returns the current
primary; otherwise it acquires the lock.

For this purpose, a new shared memory variable
Req_info->follow_primary_ongoing was introduced. The flag is set/unset
by the follow primary process.
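
Sketched, the resulting logic in find_primary_node_repeatedly() looks
roughly like this (simplified; the primary node field name is
assumed):

/* simplified sketch, not the actual code */
if (Req_info->follow_primary_ongoing)
    return Req_info->primary_node_id;   /* primary cannot change now */

pool_acquire_follow_primary_lock(true); /* otherwise block until acquired */
/* ... search for the new primary as before, then ... */
pool_release_follow_primary_lock();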

New regression test 075.detach_primary_left_down_node is added.

Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-June/003916.html

Tatsuo Ishii [Thu, 3 Jun 2021 05:39:06 +0000 (14:39 +0900)]
Fix backend_application_name cannot be changed by reloading.

The manual explicitly states that it is possible to change
backend_application_name on the fly by reloading the configuration
file, but the code set an unnecessary restriction on that.

Masaya Kawamoto [Wed, 2 Jun 2021 07:53:21 +0000 (07:53 +0000)]
Doc: fix wd_life_point description

The description of the default value was missing from the Japanese doc.

Tatsuo Ishii [Mon, 31 May 2021 08:00:02 +0000 (17:00 +0900)]
Add comments to watchdog.c

Tatsuo Ishii [Mon, 31 May 2021 07:56:10 +0000 (16:56 +0900)]
Add comment to Req_info->conn_counter.

Tatsuo Ishii [Mon, 31 May 2021 06:19:40 +0000 (15:19 +0900)]
Fix occasional failure of 028.watchdog_enable_consensus_with_half_votes test.

It seems the failure was caused by this scenario:

Testing total nodes: 4. enable_consensus_with_half_of_the_votes: on
shutdown node pgpool2
2021-05-30 07:41:54: main pid 28819: LOG:  stop request sent to pgpool. waiting for termination....done.
shutdown node pgpool1
2021-05-30 07:41:56: main pid 28825: LOG:  stop request sent to pgpool. waiting for termination....done.
Quorum does not exist. Test failed

So the test failed at 07:41:56. Meanwhile, pgpool.log showed:
2021-05-30 07:41:54: watchdog pid 28569: LOG:  We have lost the cluster leader node "localhost:50008 Linux e1aa95e1fe13"
:
:
2021-05-30 07:41:59: watchdog pid 28569: LOG:  watchdog node state changed from [STANDING FOR LEADER] to [LEADER]
2021-05-30 07:41:59: watchdog pid 28569: LOG:  I am announcing my self as leader/coordinator watchdog node

The quorum was established at 07:41:59. That means the test for quorum
existence was too early.

To fix this, insert "sleep 5" after shutting down pgpool.

Tatsuo Ishii [Sat, 29 May 2021 08:32:38 +0000 (17:32 +0900)]
Enhance watchdog_setup script.

The shutdownall script generated by watchdog_setup shuts down nodes in
node number order, i.e. 0, 1, 2...  This causes PostgreSQL backend
shutdowns: when the pgpool0 node goes down, nodes 1, 2... trigger a
failover event, which is unnecessary in a whole-cluster shutdown
sequence.  Shutting down in the reverse order (... 2, 1, 0) should
prevent this and shorten the whole shutdown sequence.

Also this should prevent occasional 018.detach_primary and
028.watchdog_enable_consensus_with_half_votes test timeout (they use
watchdog_setup).

Tatsuo Ishii [Fri, 28 May 2021 07:47:45 +0000 (16:47 +0900)]
Fix occasional regression test 018.detach_primary error.

According to the buildfarm log, the test failed at this command:
$PGPOOL_INSTALL_DIR/bin/pcp_watchdog_info -v -w -p $PCP_PORT
with this error message:
ERROR: connection to socket "/tmp/.s.PGSQL.50001" failed with error "No such file or directory"
This suggests that pcp_watchdog_info failed because the pcp server had not started yet.

To fix this, add wait_for_pgpool_startup before pcp_watchdog_info.

Tatsuo Ishii [Thu, 27 May 2021 10:15:46 +0000 (19:15 +0900)]
Fix maximum length of hostnames including domain name.

The maximum length of hostnames was 128, which is not correct.
Moreover, there were multiple places where the maximum hostname length
was defined. So create a unified definition in libpcp_ext.h.

Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003904.html

Tatsuo Ishii [Fri, 21 May 2021 12:11:27 +0000 (21:11 +0900)]
Fix watchdog communication race condition.

The watchdog sends information from the watchdog process to the
Pgpool-II main process using SIGUSR1. To pass detailed messages it
uses a shared memory area: first it writes a message to the shared
memory area, then it sends SIGUSR1 to the main process. The main
process receives the signal, and the signal handler sets a global
variable so that sigusr1_interrupt_processor() processes the message.
However, it is possible that a new signal arrives while
sigusr1_interrupt_processor() is running. In this case the new signal
is caught, but the global variable is set to 0 after
sigusr1_interrupt_processor() returns. This means that the new message
is not processed until yet another signal arrives, which could cause a
significant delay before the message is processed.

To fix the problem, sigusr1_interrupt_processor() is called repeatedly
until there are no pending messages.
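
Schematically, the processing loop becomes (the flag name is
illustrative):

/* illustrative sketch of the fix */
while (sigusr1_request)
{
    sigusr1_request = 0;            /* consume before processing */
    sigusr1_interrupt_processor();
    /* if another SIGUSR1 arrived meanwhile, the handler has set
     * sigusr1_request again and the loop runs once more */
}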

Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003901.html

Muhammad Usama [Thu, 20 May 2021 12:30:02 +0000 (17:30 +0500)]
Fix for bug:684: Watchdog node status not updating after rebooting.

A node should broadcast its status to the whole cluster after joining
the cluster as a standby.

Bo Peng [Wed, 19 May 2021 13:52:03 +0000 (22:52 +0900)]
Fix "file name is too long (max 99)" error while creating a tarball.

Bo Peng [Wed, 19 May 2021 12:53:05 +0000 (21:53 +0900)]
Add release notes.

Bo Peng [Wed, 19 May 2021 11:32:09 +0000 (20:32 +0900)]
Improve sample scripts.

Bo Peng [Tue, 18 May 2021 16:28:59 +0000 (01:28 +0900)]
Improve sample scripts.

Replace dots and hyphens in replication slot names with underscores.

Tatsuo Ishii [Tue, 18 May 2021 05:09:17 +0000 (14:09 +0900)]
Fix error message while checking watchdog configuration.

Since Pgpool-II 4.2 there's no "other watchdog node" concept.
All watchdog nodes are registered on all watchdog nodes.

Tatsuo Ishii [Tue, 18 May 2021 02:23:26 +0000 (11:23 +0900)]
Doc: fix typo.

Tatsuo Ishii [Fri, 14 May 2021 22:30:43 +0000 (07:30 +0900)]
Fix 018.detach_primary test.

The script tried to extract the primary node id from the results of
show pool_nodes.  From 4.3 on, the output includes the actual backend
status. As a result, the script could accidentally pick up a node id
which, from Pgpool-II's point of view, is actually a standby (but has
primary status as PostgreSQL).

This resulted in a shell syntax error here, because $primary_node
could have the value "0 2", for example.

if [ $primary_node != 0 ];then

Tatsuo Ishii [Fri, 14 May 2021 02:44:47 +0000 (11:44 +0900)]
Update copyright year.

Tatsuo Ishii [Wed, 12 May 2021 08:55:18 +0000 (17:55 +0900)]
Fix regression test 018.detach_primary test.

The test script did not explicitly specify the path to
pcp_watchdog_info.

Tatsuo Ishii [Wed, 12 May 2021 02:33:56 +0000 (11:33 +0900)]
Fix memory leak in pcp_node_info.

Detected by Coverity.

Tatsuo Ishii [Tue, 11 May 2021 10:24:23 +0000 (19:24 +0900)]
Fix race condition between detach_false_primary and follow_primary command.

It was reported that if the detach_false_primary and follow_primary
commands run concurrently, many problems occur:

https://www.pgpool.net/pipermail/pgpool-general/2021-April/007583.html

A typical problem is that no primary node is found at the end.

I confirmed that this can be easily reproduced:

https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003893.html

In this commit, new functions pool_acquire_follow_primary_lock(bool
block) and pool_release_follow_primary_lock(void) are introduced. They
are responsible for acquiring and releasing the lock. These functions
are used in three places:

1) find_primary_node

This function is called upon startup and failover in the main pgpool
process to find the new primary node.

2) failover

This function is called in the follow_primary_command subprocess
forked off by the pgpool main process to execute the
follow_primary_command script. The lock should be held until all
follow_primary_command executions are completed.

3) streaming replication check

Before starting verify_backend_node, which is the workhorse of
detach_false_primary, the lock must be acquired. If that fails, just
skip the streaming replication check cycle.

The commit also deals with the case when watchdog is enabled.

https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003894.html

Multiple pgpool nodes performed detach_false_primary concurrently, and
this was the cause of the problem.  To fix this, detach_false_primary
is performed only on the leader node. Also, if the quorum is absent,
detach_false_primary is not performed.
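
On the streaming replication check side, the flow is roughly as
follows (a sketch assuming the non-blocking call reports failure
through its return value):

if (!pool_acquire_follow_primary_lock(false))   /* non-blocking attempt */
    return;                     /* lock is busy: skip this check cycle */
verify_backend_node();          /* the detach_false_primary workhorse */
pool_release_follow_primary_lock();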

Tatsuo Ishii [Tue, 11 May 2021 05:35:19 +0000 (14:35 +0900)]
Doc: update copyright year.

Tatsuo Ishii [Tue, 11 May 2021 05:14:33 +0000 (14:14 +0900)]
Doc: fix description about heartbeat_device.

It did not mention that the parameter can only be used if Pgpool-II is
started as root.

Tatsuo Ishii [Mon, 10 May 2021 00:02:15 +0000 (09:02 +0900)]
Doc: enhance description on enable_consensus_with_half_votes.

Although this parameter is described in the "Controlling the Failover
behavior" section, it affects not only backend failover but also the
quorum of Pgpool-II itself. So add a note to make that clear.

Tatsuo Ishii [Sun, 9 May 2021 23:36:42 +0000 (08:36 +0900)]
Doc: remove incorrect description about failover_when_quorum_exists.

"Please note that if the number of watchdog nodes is even, we regard
that quorum exists when the number of live nodes is greater than or
equal to half of total watchdog nodes." This has not been correct
since Pgpool-II 4.1, in which enable_consensus_with_half_votes was
introduced.

Tatsuo Ishii [Sun, 9 May 2021 21:55:30 +0000 (06:55 +0900)]
Doc: fix typo.

Tatsuo Ishii [Sat, 8 May 2021 22:04:12 +0000 (07:04 +0900)]
Doc: fix typo.

Bo Peng [Wed, 5 May 2021 09:48:15 +0000 (18:48 +0900)]
Fix broken database/app redirect preference in statement level load balancing mode.

Reported in bug 707.

Bo Peng [Wed, 5 May 2021 08:13:05 +0000 (17:13 +0900)]
Update .gitignore file.

Bo Peng [Wed, 5 May 2021 07:58:44 +0000 (16:58 +0900)]
Update configure file.

Tatsuo Ishii [Wed, 5 May 2021 00:26:58 +0000 (09:26 +0900)]
Fix watchdog_setup to not fail when -n is not specified.

watchdog_setup failed if -n (the number of PostgreSQL clusters) was
not specified. Now if -n is not specified, assume "-n = 2", which is
the same default as pgpool_setup.

Tatsuo Ishii [Tue, 4 May 2021 21:56:13 +0000 (06:56 +0900)]
Doc: fix typo.

Tatsuo Ishii [Tue, 4 May 2021 11:38:31 +0000 (20:38 +0900)]
Doc: fix typo.

Tatsuo Ishii [Mon, 3 May 2021 11:04:38 +0000 (20:04 +0900)]
Allow to build pgpool with PostgreSQL 9.0 or before.

pool_process_reporting.c refers to PQpingParams(), which was
introduced in PostgreSQL 9.1. Check for the existence of the function
in configure, and if it does not exist, set the "pg_status" column of
"show pool_nodes" and "pcp_node_info" to "unknown".
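
The call-site shape, sketched (HAVE_PQPINGPARAMS is the conventional
autoconf macro name and is assumed here, as are the surrounding
variables):

#ifdef HAVE_PQPINGPARAMS
    /* PQpingParams() is available from PostgreSQL 9.1 libpq onward */
    if (PQpingParams(keywords, values, 0) == PQPING_OK)
        status = "up";
    else
        status = "down";
#else
    status = "unknown";         /* libpq is too old to ask */
#endif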

Tatsuo Ishii [Mon, 3 May 2021 07:40:00 +0000 (16:40 +0900)]
Fix pgpool_setup to not show an error.

In streaming replication mode, pgpool_setup showed an error:
:
:
recovery node 2...pcp_recovery_node -- Command Successful
done.
creating follow primary script
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
shutdown all
:
:

While creating followers, pgpool_setup used wait_for_pgpool_startup to
confirm that pgpool comes up online for every node except the last
one. This exception is wrong: it should confirm that pgpool comes up
for the last node as well.

Tatsuo Ishii [Sun, 2 May 2021 10:27:50 +0000 (19:27 +0900)]
Add new option to pgpool_setup.

Add a new option "-e", which omits creating PostgreSQL database
clusters.  This option is intended to be used by watchdog_setup, which
needs to create a set of PostgreSQL database clusters for each
Pgpool-II node.  It actually created identical database clusters for
each Pgpool-II node and then replaced them with symlinks to pgpool
node 0, which was a waste of time. Now watchdog_setup invokes
pgpool_setup with the "-e" option for pgpool node 1 and onward, to
save time and resources.

Tatsuo Ishii [Sat, 1 May 2021 22:41:08 +0000 (07:41 +0900)]
Change show pool_nodes/pcp_node_info to not use pg_isready.

Using the extra PostgreSQL binary pg_isready is inconvenient for
packagers and container users.  Use libpq's PQpingParams() instead.

Tatsuo Ishii [Fri, 30 Apr 2021 05:22:56 +0000 (14:22 +0900)]
Fix watchdog_setup.

watchdog_setup creates the database cluster entities under pgpool0.
The other pgpool nodes' pgpool.conf just has the port numbers for
PostgreSQL in pgpool0, but backend_data_directory still points to
their own PostgreSQL clusters. For example:

backend_data_directory0 = '/home/t-ishii/work/Pgpool-II/current/x/pgpool2/data0'

This is fine until online recovery runs: it refers to that database
cluster directory, which is of course not correct.

Fix this by replacing the database cluster directories with symlinks
to pgpool0/data0 and so on. This will also reduce disk space.

Also fix the usage message: Snapshot Isolation mode is now supported.

Tatsuo Ishii [Tue, 27 Apr 2021 08:24:14 +0000 (17:24 +0900)]
Fix verify_backend_node_status().

It is possible that backend_hostname is a Unix domain socket while the
wal_receiver connects via TCP/IP to localhost.
verify_backend_node_status() should accept this as normal.  This
actually happened in a Pgpool-II cluster created by pgpool_setup; I
found it while testing detach_false_primary.

Tatsuo Ishii [Mon, 26 Apr 2021 05:06:59 +0000 (14:06 +0900)]
Fix default value of log_directory.

The value written in the configuration file was '/tmp/pgpool_log',
which differed from the compile-time built-in value and from the value
explained in the docs.

Patch contributed by KAWAMOTO Masaya.

Tatsuo Ishii [Sun, 25 Apr 2021 08:38:20 +0000 (17:38 +0900)]
Fix copyright.

Pgproto was originally written by me and then contributed to the
PgPool Global Development Group in 2018.

Tatsuo Ishii [Sun, 25 Apr 2021 08:24:36 +0000 (17:24 +0900)]
Set application name to pgproto.

Now that the application name can be used in the pgpool log from 4.2
on, pgproto has its own application name, "pgproto".

Also fix a bug in creating the connection string: while adding the
user name, it used strcat() instead of strncat().
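
The pattern of the fix, sketched (the buffer name and size are
illustrative):

char conninfo[1024];
/* bounded append; sizeof works because conninfo is an array, and the
 * arithmetic leaves room for the terminating NUL */
strncat(conninfo, user, sizeof(conninfo) - strlen(conninfo) - 1);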

Tatsuo Ishii [Tue, 20 Apr 2021 04:10:57 +0000 (13:10 +0900)]
Doc: add missing explanation about clear text password authentication.

Tatsuo Ishii [Tue, 20 Apr 2021 00:31:52 +0000 (09:31 +0900)]
Doc: enhance client authentication document.

Mention that the pg_md5 command requires the --config-file option.

Tatsuo Ishii [Sun, 18 Apr 2021 09:43:24 +0000 (18:43 +0900)]
Doc: enhance in memory query cache document.

Mention that in extended query mode, the cache registration time
varies depending on clustering mode.

Tatsuo Ishii [Tue, 13 Apr 2021 04:28:38 +0000 (13:28 +0900)]
Doc: enhance show pool_cache manual.

Add a note that not all columns show meaningful values when the cache
storage is memcached.

Tatsuo Ishii [Mon, 12 Apr 2021 03:00:54 +0000 (12:00 +0900)]
Fix regression test 072 and 074.

These tests use pgproto. However, the scripts forgot to specify the
path to the command.  Interestingly, test 072 kept reporting ok; this
is because the test script does not care even if pgproto is not found.

Tatsuo Ishii [Sun, 11 Apr 2021 07:33:23 +0000 (16:33 +0900)]
Code restructuring for memory cache.

Add a new function pool_discard_current_temp_query_cache(). We used to
do the same thing in multiple places to discard the temporary query
cache of the current query context. The new function performs that
work, which should prevent mistakes like bug 700.

Since this is just code restructuring, it is applied only to the
master and 4.2 stable branches.

Tatsuo Ishii [Sun, 11 Apr 2021 05:25:31 +0000 (14:25 +0900)]
Add new regression test 074.bug700_memqcache_bug_segfault_at_close_complete which was missed in the previous commit.

The test was missed in commit: a531e783c90a88ab429d0de83fadb7e41c787a92

Tatsuo Ishii [Fri, 9 Apr 2021 10:32:56 +0000 (19:32 +0900)]
Fix pgpool crash when query cache enabled.

Pgpool-II crashed upon receiving CloseComplete.
This only happened in modes other than streaming and logical replication mode.

The minimum test case is as follows:

'P' "S1" "SELECT 1" 0
'B' "P1" "S1" 0 0 0
'E' "P1" 0
'C' 'P' "P1"
'B' "P2" "S1" 0 0 0
'E' "P2" 0
'C' 'P' "P2"
'S'
'Y'
'X'

A query statement S1 is bound to portal P1 and P1 is closed. When the
CloseComplete message arrives, CloseComplete() discards the temporary
query cache buffer corresponding to the query context. Unfortunately,
it forgot to set query_context->temp_cache to NULL. So when another
portal P2, which was also bound to S1, is closed later,
CloseComplete() tries to free memory that was already freed by the
previous CloseComplete. This leads to a segfault.

The fix is to set query_context->temp_cache to NULL when
CloseComplete() is called.
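
In other words, a sketch (the discard function name is assumed,
following the pool_discard_* naming seen elsewhere in this log):

/* inside CloseComplete(), sketched */
if (query_context->temp_cache)
{
    pool_discard_temp_query_cache(query_context->temp_cache);
    query_context->temp_cache = NULL;   /* prevent a double free on the
                                         * next Close bound to the same
                                         * statement */
}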

The reason why this does not occur in streaming and logical
replication mode is that, unlike the other modes, in these modes
query_context->temp_cache is already freed and set to NULL when
CommandComplete arrives.

Also new regression test
074.bug700_memqcache_bug_segfault_at_close_complete is added.

Per bug 700.

Bo Peng [Mon, 5 Apr 2021 16:53:43 +0000 (01:53 +0900)]
Doc: Fix documentation typos.

Bo Peng [Mon, 5 Apr 2021 14:31:18 +0000 (23:31 +0900)]
Improve sample scripts.
- Empty the pg_replslot directory of the standby node after running
  pg_rewind, because the pg_replslot directory may be copied from the
  primary node in old PostgreSQL versions.
- While creating/dropping a replication slot, access the remote
  database using psql instead of using ssh.

Tatsuo Ishii [Mon, 5 Apr 2021 02:03:06 +0000 (11:03 +0900)]
Fix that query cache is not created in other than streaming and logical replication mode.

We used to create the query cache in ReadyForQuery() in extended query
mode in modes other than streaming and logical replication mode.
However, if the following message sequence is sent from the frontend,
the query cache was never created, because pool_is_cache_safe()
returns false in pool_handle_query_cache(). Why? Because
pool_is_cache_safe() examines the current query context, and the
current query context is for the "END" message.

'P' "" "BEGIN" 0
'B' "" "" 0 0 0
'E' "" 0
'P' "S1" "SELECT 1" 0
'B' "S1" "S1" 0 0 0
'E' "S1" 0
'P' "" "END" 0
'B' "" "" 0 0 0
'E' "" 0
'S'
'Y'
'X'

So this commit changes CommandComplete() so that
pool_handle_query_cache() gets called not only in streaming and
logical replication mode. pool_handle_query_cache() will create a
temporary query cache, which will be processed the next time
ReadyForQuery() is called for an END message.

I found the bug while taking care of:
https://www.pgpool.net/mantisbt/view.php?id=700

Note that if the transaction is ended by a simple query message "END",
the bug does not appear because extended query SELECT messages will be
followed by a SYNC message, which will produce a Ready for query
message, and ReadyForQuery() will happily register query cache since
this time pool_is_cache_safe() returns true.

I think this is a long-standing bug. The reason it was not found
earlier is that, although a similar message sequence is created by the
JDBC driver, CommandComplete() already handled that case in the way
described above.

Tatsuo Ishii [Fri, 2 Apr 2021 05:14:08 +0000 (14:14 +0900)]
Fix show pool_nodes when pg_isready is not in command search path.

If pg_isready is not in the command search path, show pool_nodes'
"pg_status" column showed "down" because pg_isready could not be
invoked. To fix this, set the Makefile variable PGSQL_BIN_DIR and use
that path in show pool_nodes.

Tatsuo Ishii [Wed, 31 Mar 2021 07:09:13 +0000 (16:09 +0900)]
Fix pgpool_setup so that it falls back to a full restore if restarting fails.

While taking care of "[pgpool-general: 7456] Expected behaviour after pcp_detach_node ?"

https://www.pgpool.net/pipermail/pgpool-general/2021-March/007514.html

I noticed that restarting the target server in the follow primary
script could fail.  This can happen when the former primary goes to
down status via pcp_detach_node: the former primary will not start due
to timeline and LSN divergence.  To fix this, fall back to a full
restore using pg_recovery if restarting the server fails.

Bo Peng [Wed, 24 Mar 2021 15:25:50 +0000 (00:25 +0900)]
Fix comments in sample configuration files to avoid an error in pgpooladmin.

Tatsuo Ishii [Thu, 18 Mar 2021 05:27:27 +0000 (14:27 +0900)]
Fix hang with asyncpg.

asyncpg (a Python frontend driver with asynchronous I/O) uses the
extended protocol. When it issues a Describe message, it is followed
by a Flush message. Unfortunately, Pgpool-II kept the response in its
send buffer, so the frontend could not receive the message from
pgpool. To fix this, SimpleForwardToFrontend() now flushes the send
buffer while processing a Describe message.

Discussion:
https://www.pgpool.net/pipermail/pgpool-general/2021-March/007495.html
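
In pgproto terms, the triggering shape is roughly the following (a
sketch only; the Describe arguments are written by analogy with the
Close syntax in the test cases above):

'P' "S1" "SELECT 1" 0
'D' 'S' "S1"
'H'
'S'
'X'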