Tatsuo Ishii [Mon, 21 Jun 2021 01:19:23 +0000 (10:19 +0900)]
Fix 018.detach_primary error in the log.
The regression test itself passes, but the execution of pcp_watchdog_info to get information failed because it tried to connect to a UNIX domain socket.
Tatsuo Ishii [Thu, 17 Jun 2021 23:03:21 +0000 (08:03 +0900)]
Update Copyright year.
Tatsuo Ishii [Tue, 15 Jun 2021 08:10:54 +0000 (17:10 +0900)]
Fix pgpool_setup when creating the base backup script.
Tatsuo Ishii [Wed, 9 Jun 2021 04:43:44 +0000 (13:43 +0900)]
Fix orphan processes being left when pgpool is going down.
When pgpool is going down while a follow primary command is ongoing, some processes started by the follow primary child process could be left behind.
The follow primary child calls trigger_failover_command() to run the
follow primary script using system(3). Unfortunately the processes
started by system(3) were not tracked by anyone, and it was possible
that they were left behind. To reproduce the problem you can edit
the test script (test.sh) for the regression test
075.detach_primary_left_down_node, changing the timeout counter to a small
number, say 1, i.e.:
change:
cnt=60
to
cnt=1
To fix the problem, the follow primary child process calls setsid(2)
to set a new session id. When the Pgpool-II main process goes down, the exit
handler kills all the processes, including those started by system(3), by
specifying kill(-(follow primary child process pid)).
Also, in the initialization of the follow primary process, unmask
signals and assign the default behavior to each signal (this should have
been done much earlier).
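As an illustration only (not the actual Pgpool-II source), a minimal standalone C sketch of the setsid()/process-group idea: the forked child becomes a new session/process group leader, so anything it launches via system(3) can later be terminated as a group by kill() with a negative pid.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
    pid_t child = fork();

    if (child == 0)
    {
        /* child: become a new session/process group leader */
        if (setsid() < 0)
            exit(1);
        /* a grandchild started by system(3) inherits the new group */
        system("sleep 60");
        exit(0);
    }

    sleep(1);               /* give the child time to start "sleep 60" */

    /* "exit handler": kill the child's whole process group */
    kill(-child, SIGTERM);
    waitpid(child, NULL, 0);
    printf("process group %d terminated\n", (int) child);
    return 0;
}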
Tatsuo Ishii [Tue, 8 Jun 2021 10:06:13 +0000 (19:06 +0900)]
Fix pcp_detach_node leaving a node in down status.
Detaching the primary node using pcp_detach_node leaves a standby node
in down status after the follow primary command is executed.
This can be reproduced reliably by the following steps:
$ pgpool_setup -n 4
$ ./startall
$ pcp_detach_node -p 11001 0
This is caused by pcp_recovery_node being denied by the pcp child process:
2021-06-05 07:22:17: follow_child pid 6593: LOG: execute command: /home/t-ishii/work/Pgpool-II/current/x/etc/follow_primary.sh 3 /tmp 11005 /home/t-ishii/work/Pgpool-II/current/x/data3 1 0 /tmp 0 11002 /home/t-ishii/work/Pgpool-II/current/x/data0
2021-06-05 07:22:17: pcp_main pid 6848: LOG: forked new pcp worker, pid=7027 socket=6
2021-06-05 07:22:17: pcp_child pid 7027: ERROR: failed to process PCP request at the moment
2021-06-05 07:22:17: pcp_child pid 7027: DETAIL: failback is in progress
It complains that a failback request is still in progress. The reason why the
failback is not completed is that find_primary_node_repeatedly() is trying
to acquire the follow primary lock. However, the follow primary command
has already acquired the lock and is waiting for the completion of
the failback request. Thus this is a kind of deadlock.
How to solve this?
The purpose of the follow primary lock is to prevent the follow primary
command and the detach_false_primary performed by the streaming
replication check from running concurrently. We cannot throw it away.
However, it is not always necessary for find_primary_node_repeatedly() to
acquire the lock. If it does not try to acquire the lock, failover/failback
will not be blocked and will finish soon, and thus the Req_info->switching
flag will promptly be turned to false.
When a primary node is detached, the failover command is called and a new
primary is selected. At this point find_primary_node_repeatedly() surely
needs to run to find the new primary. However, once the follow primary
command starts, the primary will not be changed. So my idea is that
find_primary_node_repeatedly() checks whether the follow primary command
is running or not. If it is running, it just returns the current primary.
Otherwise it acquires the lock.
For this purpose a new shared memory variable,
Req_info->follow_primary_ongoing, was introduced. The flag is set/unset
by the follow primary process.
A new regression test 075.detach_primary_left_down_node is added.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-June/003916.html
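A rough, self-contained C sketch of this control flow (variable and function names follow the description above, but the shared memory area and the follow primary lock are replaced by a plain struct and a pthread mutex; this is not the actual Pgpool-II code):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    bool    follow_primary_ongoing; /* set/unset by the follow primary process */
    int     primary_node_id;
} FakeReqInfo;

static FakeReqInfo Req_info = {false, 0};
static pthread_mutex_t follow_primary_lock = PTHREAD_MUTEX_INITIALIZER;

static int
find_primary_node_repeatedly(void)
{
    /*
     * While the follow primary command is ongoing the primary will not
     * change, so just return the current primary without touching the
     * lock.  This is what breaks the deadlock described above.
     */
    if (Req_info.follow_primary_ongoing)
        return Req_info.primary_node_id;

    /* Otherwise acquire the lock and search for the primary as before. */
    pthread_mutex_lock(&follow_primary_lock);
    /* ... probe the backends and decide the new primary node here ... */
    pthread_mutex_unlock(&follow_primary_lock);
    return Req_info.primary_node_id;
}

int
main(void)
{
    Req_info.follow_primary_ongoing = true;    /* follow primary command running */
    printf("primary = %d\n", find_primary_node_repeatedly());
    return 0;
}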
Tatsuo Ishii [Thu, 3 Jun 2021 05:39:06 +0000 (14:39 +0900)]
Fix backend_application_name not being changeable by reloading.
The manual explicitly states that it is possible to change
backend_application_name on the fly by reloading the configuration
file, but the code set an unnecessary restriction on that.
Masaya Kawamoto [Wed, 2 Jun 2021 07:53:21 +0000 (07:53 +0000)]
Doc: fix wd_life_point description
The description of the default value was missing from the Japanese doc.
Tatsuo Ishii [Mon, 31 May 2021 07:56:10 +0000 (16:56 +0900)]
Add comment to Req_info->conn_counter.
Tatsuo Ishii [Mon, 31 May 2021 08:00:02 +0000 (17:00 +0900)]
Add comments to watchdog.c
Tatsuo Ishii [Mon, 31 May 2021 06:19:40 +0000 (15:19 +0900)]
Fix occasional failure of 028.watchdog_enable_consensus_with_half_votes test.
It seems the failure was caused by this scenario:
Testing total nodes: 4. enable_consensus_with_half_of_the_votes: on
shutdown node pgpool2
2021-05-30 07:41:54: main pid 28819: LOG: stop request sent to pgpool. waiting for termination....done.
shutdown node pgpool1
2021-05-30 07:41:56: main pid 28825: LOG: stop request sent to pgpool. waiting for termination....done.
Quorum does not exist. Test failed
So the test failed at 07:41:56. In the meantime pgpool.log showed:
2021-05-30 07:41:54: watchdog pid 28569: LOG: We have lost the cluster leader node "localhost:50008 Linux
e1aa95e1fe13"
:
:
2021-05-30 07:41:59: watchdog pid 28569: LOG: watchdog node state changed from [STANDING FOR LEADER] to [LEADER]
2021-05-30 07:41:59: watchdog pid 28569: LOG: I am announcing my self as leader/coordinator watchdog node
The quorum was established at 07:41:59. That means the check for quorum
existence ran too early.
To fix this, insert "sleep 5" after shutting down pgpool.
Tatsuo Ishii [Sat, 29 May 2021 08:32:38 +0000 (17:32 +0900)]
Enhance watchdog_setup script.
The shutdownall script generated by watchdog_setup shuts the nodes down in
node number order, i.e. 0, 1, 2... This causes the PostgreSQL backends to
be shut down while pgpool nodes 1, 2... are still running, so they trigger
a failover event, which is not necessary in a whole-cluster shutdown.
Shutting down in the reverse order (... 2, 1, 0) should prevent this and
shorten the whole shutdown sequence.
This should also prevent the occasional 018.detach_primary and
028.watchdog_enable_consensus_with_half_votes test timeouts (they use
watchdog_setup).
Tatsuo Ishii [Fri, 28 May 2021 07:47:45 +0000 (16:47 +0900)]
Fix occasional regression test 018.detach_primary error.
According to the buildfarm log, the test failed after running:
$PGPOOL_INSTALL_DIR/bin/pcp_watchdog_info -v -w -p $PCP_PORT
The error message was:
ERROR: connection to socket "/tmp/.s.PGSQL.50001" failed with error "No such file or directory"
This suggests that pcp_watchdog_info failed because the pcp server had not started yet.
To fix this, add wait_for_pgpool_startup before pcp_watchdog_info.
Tatsuo Ishii [Thu, 27 May 2021 10:15:46 +0000 (19:15 +0900)]
Fix the maximum length of hostnames including domain names.
The maximum length of hostnames was 128, which is not correct.
Moreover, there were multiple places where the maximum hostname length was defined.
So create a unified definition of it in libpcp_ext.h.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003904.html
Tatsuo Ishii [Fri, 21 May 2021 23:00:05 +0000 (08:00 +0900)]
Fix watchdog communication race condition.
The watchdog sends information from the watchdog process to the Pgpool-II
main process using SIGUSR1. To pass detailed messages it uses a shared
memory area: first it sets a message in the shared memory area, then
sends SIGUSR1 to the main process. The main process receives the
signal and the signal handler sets a global variable so that
sigusr1_interrupt_processor() processes it. However, it is possible
that a new signal arrives while sigusr1_interrupt_processor() is running.
In this case the new signal is caught, but the global variable
is reset to 0 after sigusr1_interrupt_processor() returns. This means
that the new message is not processed until yet another signal arrives,
which could cause a significant delay before the message is processed.
To fix the problem, sigusr1_interrupt_processor() is now called repeatedly
until there is no pending message.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003901.html
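The general pattern can be shown with a small standalone C program (again only an illustration, not the pgpool source): the handler merely sets a flag, and the processor is invoked in a loop until no pending message is left, so a signal that arrives mid-processing is not lost.

#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t sigusr1_received = 0;
static volatile sig_atomic_t pending_messages = 0;  /* stand-in for the shared memory queue */

static void
sigusr1_handler(int sig)
{
    (void) sig;
    sigusr1_received = 1;
}

static void
sigusr1_interrupt_processor(void)
{
    printf("processing one pending message\n");
    pending_messages--;
}

int
main(void)
{
    signal(SIGUSR1, sigusr1_handler);

    pending_messages = 2;   /* pretend two messages were queued */
    raise(SIGUSR1);

    if (sigusr1_received)
    {
        sigusr1_received = 0;
        /* keep calling the processor until no pending message remains */
        while (pending_messages > 0)
            sigusr1_interrupt_processor();
    }
    return 0;
}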
Muhammad Usama [Thu, 20 May 2021 12:30:02 +0000 (17:30 +0500)]
Fix for bug:684: Watchdog node status not updating after rebooting.
A node should broadcast its status to the whole cluster after
joining the cluster as standby.
Bo Peng [Wed, 19 May 2021 13:52:03 +0000 (22:52 +0900)]
Fix "file name is too long (max 99)" error while creating a tarball.
Bo Peng [Wed, 19 May 2021 13:13:33 +0000 (22:13 +0900)]
Prepare 4.2.3.
Bo Peng [Wed, 19 May 2021 12:53:05 +0000 (21:53 +0900)]
Add release notes.
Bo Peng [Wed, 19 May 2021 11:32:09 +0000 (20:32 +0900)]
Improve sample scripts.
Bo Peng [Tue, 18 May 2021 16:28:59 +0000 (01:28 +0900)]
Improve sample scripts.
Replace dots and hyphens in replication slot names with underscores.
Tatsuo Ishii [Tue, 18 May 2021 05:09:17 +0000 (14:09 +0900)]
Fix error message while checking watchdog configuration.
Since Pgpool-II 4.2 there's no "other watchdog node" concept.
All watchdog nodes are registered on all watchdog nodes.
Tatsuo Ishii [Tue, 18 May 2021 02:23:26 +0000 (11:23 +0900)]
Doc: fix typo.
Tatsuo Ishii [Fri, 14 May 2021 02:44:47 +0000 (11:44 +0900)]
Update copyright year.
Tatsuo Ishii [Thu, 13 May 2021 11:35:00 +0000 (20:35 +0900)]
Revert "Fix memory leak in pcp_node_info."
This reverts commit
7576c0e3cae3ffb2fa5735a33358f7ba278206a0.
Tatsuo Ishii [Wed, 12 May 2021 08:55:18 +0000 (17:55 +0900)]
Fix regression test 018.detach_primary.
The test script did not explicitly specify the path to
pcp_watchdog_info.
Tatsuo Ishii [Wed, 12 May 2021 02:33:56 +0000 (11:33 +0900)]
Fix memory leak in pcp_node_info.
Detected by Coverity.
Tatsuo Ishii [Tue, 11 May 2021 10:51:19 +0000 (19:51 +0900)]
Fix race condition between detach_false_primary and follow_primary command.
It was reported that if detach_false_primary and the follow_primary
command run concurrently, many problems occur:
https://www.pgpool.net/pipermail/pgpool-general/2021-April/007583.html
A typical problem is that no primary node is found at the end.
I confirmed that this can be easily reproduced:
https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003893.html
In this commit new functions pool_acquire_follow_primary_lock(bool
block) and pool_release_follow_primary_lock(void) are introduced. They
are responsible for acquiring and releasing the lock. There are 3
places where these functions are used:
1) find_primary_node
This function is called upon startup and failover in the main pgpool
process to find the new primary node.
2) failover
This function is called in the follow_primary_command subprocess
forked off by the pgpool main process to execute the follow_primary_command
script. The lock should be held until all follow_primary_command
executions are completed.
3) streaming replication check
Before starting verify_backend_node, which is the workhorse of
detach_false_primary, the lock must be acquired. If this fails, the
streaming replication check cycle is simply skipped (a sketch of this
pattern is given at the end of this entry).
The commit also deals with the case when the watchdog is enabled.
https://www.pgpool.net/pipermail/pgpool-hackers/2021-May/003894.html
Multiple pgpool nodes performing detach_false_primary concurrently is
the cause of the problem. To fix this, detach_false_primary is
performed only on the leader node. Also, if the quorum is absent,
detach_false_primary is not performed.
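A self-contained C sketch of the usage pattern in 3) above (the function names follow this commit's description, but the lock bodies below are simple pthread stand-ins, not the real shared-memory implementation):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t follow_primary_lock = PTHREAD_MUTEX_INITIALIZER;

static bool
pool_acquire_follow_primary_lock(bool block)
{
    if (block)
        return pthread_mutex_lock(&follow_primary_lock) == 0;
    return pthread_mutex_trylock(&follow_primary_lock) == 0;
}

static void
pool_release_follow_primary_lock(void)
{
    pthread_mutex_unlock(&follow_primary_lock);
}

static void
streaming_replication_check(void)
{
    /* non-blocking acquire: if the follow primary command holds the lock,
     * simply skip this check cycle */
    if (!pool_acquire_follow_primary_lock(false))
    {
        printf("follow primary command in progress; skipping this cycle\n");
        return;
    }
    /* ... verify_backend_node (detach_false_primary) would run here ... */
    pool_release_follow_primary_lock();
}

int
main(void)
{
    streaming_replication_check();
    return 0;
}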
Tatsuo Ishii [Tue, 11 May 2021 05:35:19 +0000 (14:35 +0900)]
Doc: update copyright year.
Tatsuo Ishii [Tue, 11 May 2021 05:14:33 +0000 (14:14 +0900)]
Doc: fix description about heartbeat_device.
It did not mention that the parameter can only be used if Pgpool-II is
started as root.
Tatsuo Ishii [Mon, 10 May 2021 00:02:15 +0000 (09:02 +0900)]
Doc: enhance description on enable_consensus_with_half_votes.
Although this parameter is described in the "Controlling the Failover
behavior" section, it affects not only the backend failover but also
the quorum of Pgpool-II itself. So add a note to make this clear.
Tatsuo Ishii [Sun, 9 May 2021 23:36:42 +0000 (08:36 +0900)]
Doc: remove incorrect description about failover_when_quorum_exists.
"Please note that if the number of watchdog nodes is even, we regard
that quorum exists when the number of live nodes is greater than or
equal to half of total watchdog nodes." This is not correct anymore
since Pgpool-II 4.1 in which enable_consensus_with_half_votes has been
introduced.
Tatsuo Ishii [Sun, 9 May 2021 21:55:30 +0000 (06:55 +0900)]
Doc: fix typo.
Tatsuo Ishii [Sat, 8 May 2021 22:04:12 +0000 (07:04 +0900)]
Doc: fix typo.
Bo Peng [Wed, 5 May 2021 09:48:15 +0000 (18:48 +0900)]
Fix broken database/app redirect preference in statement level load balancing mode.
Reported in bug707.
Tatsuo Ishii [Wed, 5 May 2021 00:26:58 +0000 (09:26 +0900)]
Fix watchdog_setup to not fail when -n is not specified.
watchdog_setup failed if -n (the number of PostgreSQL clusters) was not
specified. Now if -n is not specified, assume "-n = 2", which is the same
as in pgpool_setup.
Tatsuo Ishii [Tue, 4 May 2021 21:56:13 +0000 (06:56 +0900)]
Doc: fix typo.
Tatsuo Ishii [Tue, 4 May 2021 11:38:31 +0000 (20:38 +0900)]
Doc: fix typo.
Tatsuo Ishii [Mon, 3 May 2021 07:47:24 +0000 (16:47 +0900)]
Fix pgpool_setup so that it does not show an error.
In the streaming replication mode pgpool_setup showed an error:
:
:
recovery node 2...pcp_recovery_node -- Command Successful
done.
creating follow primary script
psql: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
shutdown all
:
:
While creating follower nodes, pgpool_setup used wait_for_pgpool_startup
to confirm that pgpool comes up online, except for the last node. This
exception is unnecessary; it should confirm that pgpool comes up for the
last node as well.
Tatsuo Ishii [Fri, 30 Apr 2021 05:22:56 +0000 (14:22 +0900)]
Fix watchdog_setup.
watchdog_setup creates the database cluster entities under pgpool0. The
pgpool.conf of the other pgpool nodes just has the port numbers for the
PostgreSQL instances under pgpool0, but backend_data_directory still
points to their own PostgreSQL cluster directories. For example:
backend_data_directory0 = '/home/t-ishii/work/Pgpool-II/current/x/pgpool2/data0'
This is fine until online recovery runs, because online recovery refers
to that database cluster directory, which is of course not correct.
Fix this by replacing the database cluster directories with symlinks to
pgpool0/data0 and so on. This also reduces disk space.
Also fix the usage message to mention that Snapshot Isolation mode is now supported.
Tatsuo Ishii [Tue, 27 Apr 2021 08:24:14 +0000 (17:24 +0900)]
Fix verify_backend_node_status().
It is possible that backend_hostname is a Unix domain socket path while
wal_receiver connects via TCP/IP to localhost.
verify_backend_node_status() should accept this as normal. This
actually happened in a Pgpool-II cluster created by pgpool_setup. I
found this while testing detach_false_primary.
Tatsuo Ishii [Mon, 26 Apr 2021 05:06:59 +0000 (14:06 +0900)]
Fix default value of log_directory.
The value written in the configuration file was '/tmp/pgpool_log',
which was different from the compile-time built-in value and the value
explained in the docs.
Patch contributed by KAWAMOTO Masaya.
Tatsuo Ishii [Sun, 25 Apr 2021 08:38:20 +0000 (17:38 +0900)]
Fix copyright.
Pgproto was originally written by me and then contributed to PgPool
Global Development Group in 2018.
Tatsuo Ishii [Sun, 25 Apr 2021 08:24:36 +0000 (17:24 +0900)]
Set application name to pgproto.
Now that, from 4.2, the application name can be used in the pgpool log,
pgproto has its own application name "pgproto".
Also fix a bug in creating the connection string: while adding the user
name, it used strcat() instead of strncat().
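For illustration (not the pgproto source itself), the difference is simply that the append must be bounded by the space remaining in the fixed-size connection string buffer:

#include <stdio.h>
#include <string.h>

int
main(void)
{
    char        conninfo[64] = "host=/tmp port=11000 dbname=test user=";
    const char *user = "t-ishii";

    /*
     * strcat(conninfo, user) would ignore the buffer size; strncat limits
     * the append to the space still available (leaving room for the NUL).
     */
    strncat(conninfo, user, sizeof(conninfo) - strlen(conninfo) - 1);

    printf("%s\n", conninfo);
    return 0;
}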
Tatsuo Ishii [Tue, 20 Apr 2021 04:10:57 +0000 (13:10 +0900)]
Doc: add missing explanation about clear text password authentication.
Tatsuo Ishii [Tue, 20 Apr 2021 00:31:52 +0000 (09:31 +0900)]
Doc: enhance client authentication document.
Mention that pg_md5 command requires --config-file option.
Tatsuo Ishii [Sun, 18 Apr 2021 09:43:24 +0000 (18:43 +0900)]
Doc: enhance in memory query cache document.
Mention that in extended query mode, the timing of cache registration
varies depending on the clustering mode.
Tatsuo Ishii [Tue, 13 Apr 2021 04:28:38 +0000 (13:28 +0900)]
Doc: enhance show pool_cache manual.
Add a note that not all columns show meaningful values when the cache
storage is memcached.
Tatsuo Ishii [Mon, 12 Apr 2021 03:00:54 +0000 (12:00 +0900)]
Fix regression tests 072 and 074.
These tests use pgproto, but the scripts forgot to specify the path to
the command. Interestingly, test 072 kept reporting ok; this is because
the test script does not care even if pgproto is not found.
Tatsuo Ishii [Sun, 11 Apr 2021 07:33:23 +0000 (16:33 +0900)]
Code restructuring for memory cache.
Add a new function pool_discard_current_temp_query_cache(). We used to
do the same thing in multiple places to discard the temporary query cache
of the current query context. The new function performs the work, which
should prevent mistakes like bug 700.
Since this is just a code restructuring, I only applied it to the master
and 4.2 stable branches.
Tatsuo Ishii [Sun, 11 Apr 2021 05:25:31 +0000 (14:25 +0900)]
Add new regression test 074.bug700_memqcache_bug_segfault_at_close_complete which was missed in the previous commit.
The test was missed in commit:
a531e783c90a88ab429d0de83fadb7e41c787a92
Tatsuo Ishii [Fri, 9 Apr 2021 10:32:56 +0000 (19:32 +0900)]
Fix pgpool crash when query cache enabled.
Pgpool-II crashed upon receiving CloseComplete.
This only happened in modes other than streaming and logical replication mode.
The minimal test case is as follows:
'P' "S1" "SELECT 1" 0
'B' "P1" "S1" 0 0 0
'E' "P1" 0
'C' 'P' "P1"
'B' "P2" "S1" 0 0 0
'E' "P2" 0
'C' 'P' "P2"
'S'
'Y'
'X'
A query statement S1 is bound to portal P1 and P1 is closed. When the
CloseComplete message arrives, CloseComplete() discards the temporary
query cache buffer corresponding to the query context. Unfortunately it
forgot to set query_context->temp_cache to NULL. So when, next time,
another portal P2 which was also bound to S1 is closed, CloseComplete()
tries to free memory which was already freed by the previous
CloseComplete. This leads to a segfault.
The fix is to set query_context->temp_cache to NULL when CloseComplete()
is called.
The reason why this does not occur in streaming and logical replication
mode is that, unlike the other modes, in these modes
query_context->temp_cache is already freed and set to NULL when
CommandComplete arrives.
Also a new regression test
074.bug700_memqcache_bug_segfault_at_close_complete is added.
Per bug 700.
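The bug and the fix boil down to the classic free-without-NULL pattern; a minimal standalone sketch (the structure below is simplified, not pgpool's real query context):

#include <stdlib.h>

typedef struct
{
    char   *temp_cache;     /* simplified stand-in for the temp query cache */
} QueryContext;

static void
close_complete(QueryContext *query_context)
{
    if (query_context->temp_cache)
    {
        free(query_context->temp_cache);
        query_context->temp_cache = NULL;   /* the fix: prevents a later double free */
    }
}

int
main(void)
{
    QueryContext qc;

    qc.temp_cache = malloc(128);
    close_complete(&qc);    /* CloseComplete for portal P1 */
    close_complete(&qc);    /* CloseComplete for portal P2: now a harmless no-op */
    return 0;
}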
Bo Peng [Mon, 5 Apr 2021 16:38:49 +0000 (01:38 +0900)]
Doc: Fix documentation typos.
Bo Peng [Mon, 5 Apr 2021 14:31:18 +0000 (23:31 +0900)]
Improve sample scripts.
- Empty the pg_replslot directory of the standby node after running pg_rewind, because the pg_replslot directory may be copied from the primary node in old PostgreSQL versions.
- While creating/dropping a replication slot, access the remote database using psql instead of using ssh.
Tatsuo Ishii [Mon, 5 Apr 2021 02:03:06 +0000 (11:03 +0900)]
Fix query cache not being created in modes other than streaming and logical replication mode.
We used to create the query cache in ReadyForQuery() in extended query
mode in modes other than streaming and logical replication mode. However, if
the following message sequence is sent from the frontend, the query cache was
never created, because pool_is_cache_safe() returns false in
pool_handle_query_cache(). Why? Because pool_is_cache_safe() examines
the current query context, and the current query context is for the "END"
message.
'P' "" "BEGIN" 0
'B' "" "" 0 0 0
'E' "" 0
'P' "S1" "SELECT 1" 0
'B' "S1" "S1" 0 0 0
'E' "S1" 0
'P' "" "END" 0
'B' "" "" 0 0 0
'E' "" 0
'S'
'Y'
'X'
So this commit changes CommandComplete() so that
pool_handle_query_cache() gets called not only in streaming and
logical replication mode. pool_handle_query_cache() will create a
temporary query cache, which will be processed the next time
ReadyForQuery() is called for an END message.
I found the bug while taking care of:
https://www.pgpool.net/mantisbt/view.php?id=700
Note that if the transaction is ended by a simple query message "END",
the bug does not appear, because the extended query SELECT messages will
be followed by a SYNC message, which will produce a Ready for query
message, and ReadyForQuery() will happily register the query cache since
this time pool_is_cache_safe() returns true.
I think this is a long-standing bug. The reason why it was not found
earlier is that, although a similar message sequence is created by the JDBC
driver, CommandComplete() already handled that case in the way described above.
Tatsuo Ishii [Wed, 31 Mar 2021 07:09:13 +0000 (16:09 +0900)]
Fix pgpool_setup so that it falls back to a full restore if restarting fails.
While taking care of "[pgpool-general: 7456] Expected behaviour after pcp_detach_node ?"
https://www.pgpool.net/pipermail/pgpool-general/2021-March/007514.html
I noticed that restarting the target server in the follow primary script
could fail. This can happen when the former primary was put into down
status using pcp_detach_node: the former primary will not start due to
timeline and LSN divergence. To fix this, fall back to a full restore
using pg_recovery if restarting the server fails.
Bo Peng [Wed, 24 Mar 2021 15:25:50 +0000 (00:25 +0900)]
Fix comments in sample configuration files to avoid an error in pgpooladmin.
Tatsuo Ishii [Thu, 18 Mar 2021 05:27:27 +0000 (14:27 +0900)]
Fix hang with asyncpg.
asyncpg (a Python frontend driver with asynchronous I/O) uses the extended
protocol. When it issues a Describe message, it is followed by a Flush
message. Unfortunately, Pgpool-II just kept buffering despite the Flush
message, and the frontend could not receive the response from pgpool. To
fix this, SimpleForwardToFrontend() now flushes the send buffer while
processing the Describe message.
Discussion:
https://www.pgpool.net/pipermail/pgpool-general/2021-March/007495.html
Tatsuo Ishii [Tue, 16 Mar 2021 01:27:22 +0000 (10:27 +0900)]
Enhance debug message upon receiving startup packet.
While processing a startup packet, the database name, user name and
application name are printed at DEBUG1, but other guc variables (if
any) were not printed. This is not helpful when studying errors like
the "connection exists but startup packet length is not identical" problem
(see https://www.pgpool.net/mantisbt/view.php?id=696). With this
commit guc variables are now printed, looking something like:
2021-03-16 10:21:32: child pid 5155: DEBUG: reading startup packet
2021-03-16 10:21:32: child pid 5155: DETAIL: guc name: client_encoding value: UTF8
Bo Peng [Sun, 14 Mar 2021 14:28:02 +0000 (23:28 +0900)]
Doc: fix typo.
Bo Peng [Sun, 14 Mar 2021 14:01:25 +0000 (23:01 +0900)]
Update comment lines of sample configuration files.
Tatsuo Ishii [Sat, 6 Mar 2021 09:07:23 +0000 (18:07 +0900)]
Doc: update "Installation from RPM" section.
It referred to the old Pgpool-II 4.1 and PostgreSQL 12; it now refers to
Pgpool-II 4.2 and PostgreSQL 13. Also use the dnf command instead of the
yum command.
Tatsuo Ishii [Wed, 24 Feb 2021 01:33:00 +0000 (10:33 +0900)]
Doc: fix typo.
Tatsuo Ishii [Sat, 20 Feb 2021 00:45:16 +0000 (09:45 +0900)]
Doc: enhance pcp_detach_node manual.
Add more detailed description when the command is executed.
Bo Peng [Wed, 17 Feb 2021 09:22:12 +0000 (18:22 +0900)]
Prepare 4.2.2.
Bo Peng [Wed, 17 Feb 2021 05:45:31 +0000 (14:45 +0900)]
Update follow_primary.sh.sample due to the previous commit
7bd103a6a33a2a675bd37f996ab46b7819a731d7.
Bo Peng [Wed, 17 Feb 2021 03:56:54 +0000 (12:56 +0900)]
Add release note.
Tatsuo Ishii [Sun, 14 Feb 2021 01:09:39 +0000 (10:09 +0900)]
Doc: fix example in pgpool_adm pcp_node_info().
Per commit: https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=caf5215479ee7a5b55c1dcdeb00a2fccf0ed7133
Tatsuo Ishii [Sun, 14 Feb 2021 00:28:38 +0000 (09:28 +0900)]
Fix pcp_node_info() in pgpool_adm extension.
The weight info was mistakenly handled by Float8GetDatum although the
function prototype says float4. The oversight made the weight value 0
or some insane value. Change it to Float4GetDatum. It is amazing
that nobody (including me) noticed it until today.
Bo Peng [Sat, 13 Feb 2021 17:27:56 +0000 (02:27 +0900)]
Fix follow_primary.sh.sample to avoid removing recovery.conf.
Tatsuo Ishii [Tue, 9 Feb 2021 06:12:02 +0000 (15:12 +0900)]
Fix follow_primary creation in pgpool_setup.
Since PostgreSQL 13 it is possible to reload a change of
recovery_conninfo, but it is not possible to reload
recovery_command. To fix this, use pg_ctl restart regardless of the
PostgreSQL version.
Also fix an oversight: it forgot to create standby.signal.
Tatsuo Ishii [Mon, 8 Feb 2021 11:30:18 +0000 (20:30 +0900)]
Fix watchdog leader sync process to start health check process.
When the watchdog receives a status change request from another watchdog node
and calls sync_backend_from_watchdog() to sync with the status of the leader
node, it forgot to start the health check process. For example:
1) the initial pgpool_status file indicates DB node 1 is down.
2) pgpool starts up but only starts a health check process for DB node 0,
because node 1 is in down status.
3) pcp_attach_node is issued to a node other than the leader pgpool node.
4) the leader node updates the node status for DB node 1 and the other nodes
sync the status. Since sync_backend_from_watchdog() does not start the
health check process, only the pgpool leader node starts a health
check process; the other nodes do not.
To fix this, start the health check process if necessary in
sync_backend_from_watchdog().
Tatsuo Ishii [Sat, 6 Feb 2021 08:48:09 +0000 (17:48 +0900)]
Doc: fix watchdog_setup manual.
Add snapshot isolation mode to the "mode" parameter; it was forgotten
when the mode was added.
Tatsuo Ishii [Fri, 5 Feb 2021 07:36:41 +0000 (16:36 +0900)]
Fix follow_primary.sh creation in pgpool_setup.
While creating the script, it did not update restore_command to point
to the primary's archive directory.
Tatsuo Ishii [Thu, 4 Feb 2021 03:23:58 +0000 (12:23 +0900)]
Add support for snapshot isolation mode.
Also add "wal_level = archive" to all modes except streaming
replication mode to deal with online recovery enhancement.
Tatsuo Ishii [Thu, 4 Feb 2021 02:28:57 +0000 (11:28 +0900)]
Fix incorrect DB cluster paths in the watchdog_setup installation.
The paths to the DB clusters generated by the watchdog_setup installation
did not correctly point to pgpool0/data0 etc. Correct them using the sed
command.
Tatsuo Ishii [Wed, 3 Feb 2021 08:01:23 +0000 (17:01 +0900)]
Fix typos in message emitted by sync_backend_from_watchdog().
Tatsuo Ishii [Tue, 2 Feb 2021 14:12:37 +0000 (23:12 +0900)]
Enhance pgpool_setup.
"shutdownall" script unconditionally waited for the pgpool.pid file is
gone after "pgpool stop". But if invalid pgpool.pid file exists, it
waits forever. To avoid this check the exit status of "pgpool stop"
and only wait if the exit status is 0.
Bo Peng [Tue, 2 Feb 2021 02:54:56 +0000 (11:54 +0900)]
Fix missing display of backend_clustering_mode in the "show pool_status" command.
Tatsuo Ishii [Tue, 2 Feb 2021 00:56:23 +0000 (09:56 +0900)]
Fix messages when health check process starts.
When the health check process was restarted by reaper(), an incorrect
message "child process with pid: 14639 exited with success and will not be
restarted" was emitted.
Also enhance worker_fork_a_child() to emit a message when a new worker
process starts.
Bo Peng [Mon, 1 Feb 2021 16:10:56 +0000 (01:10 +0900)]
Doc: update failover command parameters order.
Bo Peng [Mon, 1 Feb 2021 14:29:29 +0000 (23:29 +0900)]
Doc: mention that the sample scripts don't support tablespaces in "Pgpool-II + Watchdog Setup Example".
Tatsuo Ishii [Sun, 31 Jan 2021 02:27:47 +0000 (11:27 +0900)]
Fix oversight in commit
3a36284c53c125389c999de5c6c4710973c4cb82.
When si_get_snapshot() is called, the load balance node has not been
decided yet and we cannot use the VALID_BACKEND macro. Use
VALID_BACKEND_RAW instead.
Tatsuo Ishii [Thu, 28 Jan 2021 09:01:28 +0000 (18:01 +0900)]
Fix segfault in snapshot isolation mode.
If one DB node goes down, pgpool segfaulted in this mode. This
happened in si_get_snapshot(), which did not check whether the DB node
is alive or not.
Tatsuo Ishii [Sun, 24 Jan 2021 02:55:12 +0000 (11:55 +0900)]
Fix pgpool_setup so that it creates a separate archive directory for each DB node.
pgpool_setup created a single archive directory for all PostgreSQL
nodes. This only works with streaming replication, because only the
primary node produces archive logs. However, for native replication and
snapshot isolation mode this does not work, as each node produces WAL.
With this fix, dedicated archive directories such as archivedir/data0,
archivedir/data1 and so on are created.
Tatsuo Ishii [Fri, 22 Jan 2021 06:05:33 +0000 (15:05 +0900)]
Doc: enhance online recovery document.
Add an explanation of what should be done in the recovery first stage and
second stage. Also mention snapshot isolation mode where native
replication mode is explained.
Remove an accidentally-left caution about script timeouts in the Japanese
document.
Tatsuo Ishii [Fri, 22 Jan 2021 04:33:07 +0000 (13:33 +0900)]
Fix pgpool_recovery_pitr generation in pgpool_setup.
It generated a redundant psql definition line.
Bo Peng [Thu, 21 Jan 2021 07:46:05 +0000 (16:46 +0900)]
Doc: fix indent.
Bo Peng [Thu, 21 Jan 2021 07:00:40 +0000 (16:00 +0900)]
Fix some variable names in follow_primary.sh.sample.
Change some variable names from "main" to "primary", which should be used in streaming replication mode.
Tatsuo Ishii [Wed, 20 Jan 2021 08:17:02 +0000 (17:17 +0900)]
Enhance follow_primary.sh generation in pgpool_setup.
In the generated follow primary script, pcp_recovery_node was used to
sync a standby with the new primary node. However, in the follow primary
process it is sufficient to just restart the standby with a new recovery
configuration redirected to the new primary, because we use the
"latest" timeline in the configuration file.
Moreover, from PostgreSQL 13, even restarting the standby is not
needed; just changing the recovery configuration and reloading it is
enough. With these changes the follow primary process is expected to be
significantly faster than before.
Tatsuo Ishii [Sat, 16 Jan 2021 10:15:57 +0000 (19:15 +0900)]
Doc: add explanation about the shared memory required by shared relation cache.
Add a description to the "Memory requirement" section explaining that
Pgpool-II 4.1 and later require additional shared memory for the shared
relation cache.
Muhammad Usama [Wed, 13 Jan 2021 10:55:28 +0000 (15:55 +0500)]
Adjusting the allowed range and default value for log_rotation_size
Tatsuo Ishii [Wed, 13 Jan 2021 09:49:51 +0000 (18:49 +0900)]
Fix error while allocating shared memory.
If num_init_children * max_pool is huge, pool_coninfo_size() failed to
calculate the size of the required shared memory, because the data type
used for the variable storing the shared memory size was int, which
overflows for very large shared memory sizes. To fix this, change the
return type of pool_coninfo_size() to size_t and use size_t for the
variable used within the function.
Also use the proper format specifier in
pool_shared_memory_create().
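A standalone sketch of the overflow (the per-connection slot size below is made up; the point is only that the arithmetic must be done in size_t and printed with %zu):

#include <limits.h>
#include <stdio.h>

int
main(void)
{
    int     num_init_children = 10000;
    int     max_pool = 64;
    size_t  per_con = 4096;     /* hypothetical size of one connection info slot */

    /* the fix: do the arithmetic in size_t, not int */
    size_t  size = (size_t) num_init_children * max_pool * per_con;

    printf("required shared memory: %zu bytes\n", size);
    if (size > (size_t) INT_MAX)
        printf("this value would not fit in an int (INT_MAX = %d)\n", INT_MAX);
    return 0;
}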
Bo Peng [Wed, 13 Jan 2021 00:18:56 +0000 (09:18 +0900)]
Doc: fix typos.
Tatsuo Ishii [Tue, 12 Jan 2021 01:40:06 +0000 (10:40 +0900)]
Fix follow_primary script creation in pgpool_setup.
There was some confusion in variable names due to a recent language
fix. This changes some occurrences of "main" to "primary", which should be
used in streaming replication mode.
Tatsuo Ishii [Fri, 1 Jan 2021 02:29:04 +0000 (11:29 +0900)]
Fix bug: child_max_connections is not respected if an ERROR occurs.
When the frontend aborts, the counter for child_max_connections is rewound
because ereport(ERROR) issues a long jump. The fix is to declare the
variable with the volatile qualifier. This is a long-standing bug, present
probably since child_max_connections was introduced.
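The class of problem can be reproduced with a tiny standalone C program (ereport(ERROR) is replaced by a plain siglongjmp here): a local variable modified after sigsetjmp() may not keep its new value when control returns via siglongjmp() unless it is declared volatile.

#include <setjmp.h>
#include <stdio.h>

static sigjmp_buf error_env;

static void
report_error(void)
{
    siglongjmp(error_env, 1);   /* stand-in for ereport(ERROR) */
}

int
main(void)
{
    volatile int connection_count = 0;  /* without volatile the increment may be lost */

    if (sigsetjmp(error_env, 1) == 0)
    {
        connection_count++;     /* accept a connection */
        report_error();         /* frontend aborts -> long jump */
    }

    /* thanks to volatile, the counter reliably keeps the incremented value */
    printf("connection_count = %d\n", connection_count);
    return 0;
}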
Bo Peng [Sun, 27 Dec 2020 12:44:00 +0000 (21:44 +0900)]
Doc: fix typos.
Bo Peng [Sun, 27 Dec 2020 12:24:54 +0000 (21:24 +0900)]
Doc: fix typos in docs.
Tatsuo Ishii [Sat, 26 Dec 2020 05:41:27 +0000 (14:41 +0900)]
Doc: fix a few typos in Japanese release notes.
Muhammad Usama [Fri, 25 Dec 2020 18:25:10 +0000 (23:25 +0500)]
Doc: a few typo fixes.
Tatsuo Ishii [Wed, 23 Dec 2020 11:41:23 +0000 (20:41 +0900)]
Doc: fix typo in the 4.2.1 release note.