Bo Peng [Fri, 11 Nov 2022 14:15:00 +0000 (23:15 +0900)]
Doc: add Japanese documentation of process management.
Muhammad Usama [Wed, 9 Nov 2022 07:43:25 +0000 (12:43 +0500)]
[New-Feature] Dynamic spare process management
This feature allows selecting between static and dynamic process management modes.
Static process management is the same as the existing behavior of Pgpool-II, where it
spawns all child processes at startup. The new Dynamic mode keeps track of idle
processes and forks or kills processes to keep this number within
the specified boundaries.
Four new settings, process_management_mode, process_management_strategy,
min_spare_children, and max_spare_children are added to configure the process
management behavior, while process_management_strategy allows selecting
between three possible scaling-down strategies.
The first version of the patch was shared by "zhoujianshen at highgo.com" and reworked by me
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2022-September/004189.html
Reviewed by: Bo Peng and Tatsuo Ishii
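The bookkeeping described in this commit can be sketched roughly as follows. This is an illustration in Python, not Pgpool-II's C implementation: the helper name spare_adjustment and its return convention are assumptions made for this example, and it ignores the scaling-down strategies. Only min_spare_children and max_spare_children come from the commit message.

```python
def spare_adjustment(idle_children, min_spare_children, max_spare_children):
    """Return how many children to fork (positive) or kill (negative).

    Hypothetical helper: 0 means the idle count is already within bounds.
    """
    if min_spare_children > max_spare_children:
        raise ValueError("min_spare_children must not exceed max_spare_children")
    if idle_children < min_spare_children:
        # Too few idle processes: fork enough to reach the lower bound.
        return min_spare_children - idle_children
    if idle_children > max_spare_children:
        # Too many idle processes: kill the surplus above the upper bound.
        return max_spare_children - idle_children
    return 0
```

For example, with bounds of 5 and 10, two idle children would call for three forks, while twelve idle children would call for two kills.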
Bo Peng [Mon, 7 Nov 2022 06:15:31 +0000 (15:15 +0900)]
Update Makefile and configure files to remove doc.zh-cn, which was deleted in the previous commit
fc2beb28d29667064d0c6156b9db320e1ec415fc
Bo Peng [Mon, 7 Nov 2022 05:27:01 +0000 (14:27 +0900)]
Remove out-of-date doc.zh-cn directory.
doc.zh-cn is not actually translated, it's just a copy of doc.
Bo Peng [Mon, 7 Nov 2022 04:49:06 +0000 (13:49 +0900)]
Improve follow_primary.sh.sample script:
- run checkpoint command on primary to update control file before running pg_rewind
- check if a directory exists before removing it
Bo Peng [Sun, 6 Nov 2022 10:43:01 +0000 (19:43 +0900)]
Test: update src/test/pgpool_setup.in
Bo Peng [Sun, 6 Nov 2022 07:40:27 +0000 (16:40 +0900)]
Update src/redhat/pgpool_socket_dir.patch
Tatsuo Ishii [Sun, 6 Nov 2022 01:20:28 +0000 (10:20 +0900)]
Update comments in pgpool.conf.
Update comments for listen_addresses and pcp_listen_addresses.
This should have been done in commits:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=
fd0efceae011c8d2c2f7c2b26dc0a738f055972e
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=
9f727c1e267f1363012a3af599b7d7515e4ec355
Bo Peng [Sat, 5 Nov 2022 03:42:11 +0000 (12:42 +0900)]
Update src/pgpool.spec:
- Change /lib/tmpfiles.d/ file from /var/run to /run
- Install /etc/sudoers.d/pgpool
- Add new scripts aws_eip_if_cmd.sh.sample and aws_rtb_if_cmd.sh.sample
Takuma Hoshiai [Fri, 4 Nov 2022 02:07:09 +0000 (11:07 +0900)]
Add trusted_server_command parameter
This patch makes it possible to specify the command that trusted_servers
uses to check the upstream connection. Previously, only the ping command was used,
and it was hard-coded.
The default is 'ping -q -c3 %h', which behaves the same as before.
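As a rough illustration of how a %h placeholder might be filled in, here is a minimal Python sketch. The function name build_trusted_server_command is hypothetical, and whether Pgpool-II applies any shell quoting to the host is not stated in the commit; the quoting here is an assumption made for safety in the example.

```python
import shlex


def build_trusted_server_command(template, host):
    """Replace the %h placeholder with the (shell-quoted) host name."""
    return template.replace("%h", shlex.quote(host))


command = build_trusted_server_command("ping -q -c3 %h", "192.168.0.1")
print(command)
```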
Bo Peng [Thu, 27 Oct 2022 03:20:06 +0000 (12:20 +0900)]
Doc: update Pgpool-II 4.4 release note.
Bo Peng [Thu, 27 Oct 2022 03:03:01 +0000 (12:03 +0900)]
Doc: Update documentation of AWS configuration example and add sample scripts used for AWS.
Tatsuo Ishii [Wed, 26 Oct 2022 23:45:17 +0000 (08:45 +0900)]
Deal with idle_session_timeout.
If idle_session_timeout (added in PostgreSQL 14) is enabled and the
timeout fires, the following happens:
- If failover_on_backend_error is on (the default), Pgpool-II will
trigger failover.
- If only one of PostgreSQL servers enables idle_session_timeout,
Pgpool-II could hang.
To deal with idle_session_timeout, detect_idle_session_timeout_error()
is added to detect the error code for idle_session_timeout. If the
error is detected, Pgpool-II returns the error code to the frontend as a
fatal error and disconnects the session. This is similar to the fix
implemented for idle_in_transaction_session_timeout.
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=
3f5986eee360f12e6a0bb77aa46f95abf5f6bc10
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2022-October/004209.html
Backpatch-through: 4.0
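The detection step can be pictured as a check on the SQLSTATE carried in the backend's ErrorResponse. The Python sketch below is only an illustration: the helper names are hypothetical and are not Pgpool-II functions, and the SQLSTATE value is taken from the PostgreSQL error codes appendix rather than from this commit.

```python
# SQLSTATE for idle_session_timeout, as listed in the PostgreSQL error codes appendix.
IDLE_SESSION_TIMEOUT = "57P05"


def is_idle_session_timeout(error_fields):
    """error_fields maps ErrorResponse field codes to values ('C' is SQLSTATE)."""
    return error_fields.get("C") == IDLE_SESSION_TIMEOUT


def action_for_backend_error(error_fields):
    # Hypothetical decision helper: report the error to the client and close
    # the session instead of treating the backend as failed.
    if is_idle_session_timeout(error_fields):
        return "fatal_to_frontend"
    return "default_handling"
```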
Tatsuo Ishii [Fri, 7 Oct 2022 02:26:28 +0000 (11:26 +0900)]
Doc: enhance description about memqcache_method.
Add explanation which method should be used.
Backpatch-through: 3.7.
Tatsuo Ishii [Wed, 5 Oct 2022 01:34:18 +0000 (10:34 +0900)]
Doc: first cut of Pgpool-II 4.4 release note.
Tatsuo Ishii [Mon, 3 Oct 2022 10:21:40 +0000 (19:21 +0900)]
Fix comment in is_select_query().
This function returns false for SELECT INTO, rather than plain SELECT.
Muhammad Usama [Fri, 30 Sep 2022 20:33:12 +0000 (01:33 +0500)]
Fix: Setting memory cache size greater than 2GB causes a segfault.
The problem was in the block_address() function, which returns the memory address
for a given cache block. It was using 32-bit integers to calculate the offset of
the block within the shared memory space, which only works up to the 2GB limit.
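The arithmetic behind the 2GB limit can be reproduced with a small Python sketch. The block size of 1 MiB is an assumption chosen to make the numbers easy to follow, and the function names are hypothetical; ctypes is used only to mimic how a signed 32-bit integer wraps on typical platforms.

```python
import ctypes

BLOCK_SIZE = 1024 * 1024  # assumed cache block size (1 MiB), for illustration only


def offset_32bit(block_id):
    """Offset as it would come out of a signed 32-bit computation."""
    return ctypes.c_int32(block_id * BLOCK_SIZE).value


def offset_64bit(block_id):
    """Offset computed with a 64-bit integer, which has room past 2 GiB."""
    return ctypes.c_int64(block_id * BLOCK_SIZE).value
```

With these assumptions, block 2047 still yields a valid offset, while block 2048 (exactly 2 GiB) wraps to a negative value in the 32-bit version, which would point outside the shared memory area.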
Tatsuo Ishii [Mon, 26 Sep 2022 02:31:42 +0000 (11:31 +0900)]
Fix to use NIL to judge whether a List is empty or not.
Previously the code mistakenly checked whether the pointer itself was
NULL, which is not allowed by our coding standard.
Tatsuo Ishii [Sat, 24 Sep 2022 08:00:12 +0000 (17:00 +0900)]
Fix query cache.
The commit
dc559c07ee5affc7035efa6e0f00185e211079a0 introduced shared
locking by using flock(2). However, it opened the lock file in the parent process,
so the file descriptor was shared by the child processes. This is
wrong. The lock file needs to be opened by each child process.
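Why a shared descriptor defeats the lock can be shown with a short Python sketch. flock(2) locks belong to the open file description, so a descriptor duplicated with dup (or inherited across fork) refers to the same lock, while a separate open() gets its own. The temporary directory and variable names are assumptions for this example; only the file name memq_lock_file comes from the related commit.

```python
import fcntl
import os
import tempfile


def try_exclusive(fd):
    """Try to take an exclusive flock without blocking; report success."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


lock_dir = tempfile.mkdtemp()
lock_path = os.path.join(lock_dir, "memq_lock_file")

holder = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
fcntl.flock(holder, fcntl.LOCK_EX)

shared_fd = os.dup(holder)              # same open file description, as after fork
own_fd = os.open(lock_path, os.O_RDWR)  # separate open file description

shared_fd_locked = try_exclusive(shared_fd)  # succeeds: it is the holder's own lock
own_fd_locked = try_exclusive(own_fd)        # fails: properly excluded

for fd in (holder, shared_fd, own_fd):
    os.close(fd)
os.remove(lock_path)
os.rmdir(lock_dir)
```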
Tatsuo Ishii [Sat, 24 Sep 2022 02:55:03 +0000 (11:55 +0900)]
Remove unused semaphore number.
Commit
dc559c07ee5affc7035efa6e0f00185e211079a0 replaced use of
semaphore with flock. Thus the semaphore number is no longer
necessary.
Tatsuo Ishii [Fri, 23 Sep 2022 06:46:33 +0000 (15:46 +0900)]
Fix rare segfaults in pcp_proc_info, SHOW pool_pools and SHOW pool_processes.
The segfaults were in get_pools() and get_processes(). They first
extracted the pid of a particular process info slot in shared memory, then
searched for the slot again using the pid as the key. Because these steps
were not protected by any locking, the search
using the pid could fail and return NULL if the process id had been
overwritten by the pgpool parent, which is responsible for forking a new
child process after a process exits. As a result, any subsequent
reference to the NULL pointer generated a segfault.
The solution is to first get the pointer to the process info slot and then
extract the process id member from the pointer. This way, concurrent
updates to the shared memory info by the parent process are still
possible (which may lead to strange results in the output), but at
least we can avoid segfaults.
Tatsuo Ishii [Thu, 22 Sep 2022 11:49:24 +0000 (20:49 +0900)]
Fix coding style.
Variable declarations must come before any executable statements.
Bo Peng [Thu, 22 Sep 2022 09:13:13 +0000 (18:13 +0900)]
Remove unused variable.
Tatsuo Ishii [Thu, 22 Sep 2022 04:37:09 +0000 (13:37 +0900)]
Fix coding style.
Variable declarations must come before any executable statements.
Bo Peng [Wed, 21 Sep 2022 15:23:53 +0000 (00:23 +0900)]
Feature: Import PostgreSQL 15 BETA4 new parser.
Major changes of PostgreSQL 15 parser include:
- Add new SQL MERGE command
MERGE INTO ... USING ...
- Add new option HEADER MATCH to COPY FROM
COPY ... FROM stdin WITH (HEADER MATCH);
- Allow foreign key ON DELETE SET actions
CREATE TABLE t1 (
...
FOREIGN KEY (c1, c2) REFERENCES t2 ON DELETE SET NULL (c2)
);
- Allow SET ACCESS METHOD in ALTER TABLE
ALTER TABLE ... SET ACCESS METHOD ...;
Tatsuo Ishii [Mon, 19 Sep 2022 05:35:02 +0000 (14:35 +0900)]
Replace exclusive locking with shared locking in query cache.
Previously the query cache module used a semaphore to protect the critical
region in the query cache module. However, this had a serious downside
because locking with a semaphore is exclusive. This
introduced unnecessary waits for cache-reading clients and performance
degradation when there were many concurrent clients.
To overcome the issue, replace the locking implementation with flock(2),
which allows shared locking. Now the cache-reading clients do not need
to fight each other to obtain a lock, which increases
concurrency.
For this purpose the pgpool main process creates a dummy file,
"logdir/memq_lock_file", and opens it. The file descriptor is inherited
by all child processes so that they can issue flock(2) on the file.
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2022-September/008442.html
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2022-September/004196.html
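The difference between shared and exclusive flock(2) locks can be sketched in Python as below. This is an illustration only: the readers and the writer are simulated by separately opened descriptors in one process, and the temporary path and variable names are assumptions made for the example.

```python
import fcntl
import os
import tempfile


def try_lock(fd, mode):
    """Attempt a non-blocking flock in the given mode; report success."""
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "memq_lock_file")

reader1 = os.open(demo_path, os.O_CREAT | os.O_RDWR, 0o600)
reader2 = os.open(demo_path, os.O_RDWR)
writer = os.open(demo_path, os.O_RDWR)

# Two cache readers can hold the shared lock at the same time.
reader1_ok = try_lock(reader1, fcntl.LOCK_SH)
reader2_ok = try_lock(reader2, fcntl.LOCK_SH)

# A cache writer needs the exclusive lock and must wait for the readers.
writer_while_reading = try_lock(writer, fcntl.LOCK_EX)

fcntl.flock(reader1, fcntl.LOCK_UN)
fcntl.flock(reader2, fcntl.LOCK_UN)
writer_after_release = try_lock(writer, fcntl.LOCK_EX)

for fd in (reader1, reader2, writer):
    os.close(fd)
os.remove(demo_path)
os.rmdir(demo_dir)
```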
Tatsuo Ishii [Wed, 14 Sep 2022 02:16:03 +0000 (11:16 +0900)]
Deal with SSL error SSL_ERROR_ZERO_RETURN.
Previously this caused failover, which was actually unnecessary because
it means the server is just going to close the connection.
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2022-September/008425.html
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2022-September/004194.html
Masaya Kawamoto [Wed, 14 Sep 2022 00:15:26 +0000 (00:15 +0000)]
Fix memory leak.
This was added in the previous commit
a94451901c9be0627c5e9db04e05ef2d6835dcd7
Masaya Kawamoto [Mon, 12 Sep 2022 09:17:08 +0000 (09:17 +0000)]
Fix to not allow Unix-domain socket paths with invalid lengths.
Tatsuo Ishii [Mon, 12 Sep 2022 04:48:26 +0000 (13:48 +0900)]
Use $HOME/tmp or $HOME for the second Unix domain path.
The socket API does not allow overly long Unix domain socket paths.
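A length check of this kind might look like the Python sketch below. The limit of 108 bytes is the size of sun_path on Linux and is an assumption here (other systems use different sizes); the helper name and the ".s.PGSQL." file name prefix are likewise assumptions for illustration.

```python
import os

SUN_PATH_SIZE = 108  # assumption: sizeof(sun_path) on Linux; other systems differ


def socket_path_fits(directory, port, prefix=".s.PGSQL."):
    """Return True if the socket path fits, leaving room for the trailing NUL."""
    path = os.path.join(directory, f"{prefix}{port}")
    return len(os.fsencode(path)) < SUN_PATH_SIZE
```

A short directory such as /tmp fits easily, while a deeply nested directory can exceed the limit, which is one reason to fall back to a shorter location.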
Bo Peng [Mon, 29 Aug 2022 01:46:23 +0000 (10:46 +0900)]
Apply changes introduced in the previous commit
50cc6f18d742c76fcc3d0ba60d2b5058267effb8 to src/parser/gram_template.y and src/parser/Makefile.in.
Tatsuo Ishii [Sun, 28 Aug 2022 03:54:51 +0000 (12:54 +0900)]
Doc: mention that health check process may use SSL.
Also mention that streaming replication check may use SSL too.
This should have been documented since 2010.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2022-August/004188.html
Bo Peng [Sat, 27 Aug 2022 14:51:38 +0000 (23:51 +0900)]
Apply changes introduced in the previous commit to src/parser/gram_template.y and src/parser/Makefile.in.
Tatsuo Ishii [Fri, 26 Aug 2022 05:23:59 +0000 (14:23 +0900)]
Retire pool_string module.
Currently there are two string manipulation modules: pool_string and
StringInfo. StringInfo is more feature rich and PostgreSQL compatible
because it was imported from PostgreSQL. So replace all usages of
pool_string by StringInfo. This also solves a problem reported by
Peng: i.e. struct name "String" collision: pool_string uses "String"
and PostgreSQL 15's parser also uses "String".
Bo Peng [Thu, 18 Aug 2022 05:18:20 +0000 (14:18 +0900)]
Doc: fix typo.
Bo Peng [Tue, 16 Aug 2022 05:45:17 +0000 (14:45 +0900)]
Doc: add release notes.
Masaya Kawamoto [Thu, 4 Aug 2022 08:17:08 +0000 (08:17 +0000)]
Fix possible memory overrun in allocation of fds.
Masaya Kawamoto [Thu, 4 Aug 2022 00:39:56 +0000 (00:39 +0000)]
Fix memory leak pointed out by Coverity
Tatsuo Ishii [Tue, 26 Jul 2022 06:23:40 +0000 (15:23 +0900)]
Doc: enhance description in the memory requirement section.
Add explanation about memory usage while pgpool child process is running.
Tatsuo Ishii [Tue, 26 Jul 2022 02:31:44 +0000 (11:31 +0900)]
Doc: fix typo.
Masaya Kawamoto [Fri, 22 Jul 2022 06:18:24 +0000 (06:18 +0000)]
Rename regression test 035.
Masaya Kawamoto [Wed, 20 Jul 2022 07:13:27 +0000 (07:13 +0000)]
Support for unix_socket_directories and related parameters
Support for unix_socket_directories, unix_socket_group and
unix_socket_permissions.
Tatsuo Ishii [Tue, 19 Jul 2022 13:14:41 +0000 (22:14 +0900)]
Doc: mention that certain SELECTs are not cached.
SELECTs including TIMESTAMP WITH TIMEZONE or TIME WITH TIMEZONE
SELECTs including CAST to TIMESTAMP WITH TIMEZONE or TIME WITH TIMEZONE
SELECTs including SQLValueFunction (CURRENT_TIME, CURRENT_USER etc.)
Tatsuo Ishii [Sun, 10 Jul 2022 07:04:49 +0000 (16:04 +0900)]
Test: print Pgpool-II version in the regression test.
Tatsuo Ishii [Sun, 10 Jul 2022 01:28:06 +0000 (10:28 +0900)]
Fix pgpool_recovery extension script.
It lacked the 6-argument form of pgpool_recovery (used by v4.2). As a
result, if the 4.3 extension is already installed, pgpool_setup fails
because it wants the 6-argument form of pgpool_recovery.
Tatsuo Ishii [Sat, 9 Jul 2022 09:41:18 +0000 (18:41 +0900)]
Test: Fix regression test script to look for pgpool.conf in the proper install directory.
regress.sh set the PGPOOLDIR environment variable even though
pgpool_setup looks for pgpool.conf sample files at $PGPOOLDIR/etc. As a
result, pgpool_setup looked for pgpool.conf.sample in the default install
directory (usually /usr/local/etc) even if "-m noinstall" was not
set. If inappropriate pgpool.conf sample files are in
/usr/local/etc, the regression test fails.
Tatsuo Ishii [Wed, 6 Jul 2022 12:31:08 +0000 (21:31 +0900)]
Fix to not cache SELECTs having functions whose return type is timestamptz or timetz.
Functions whose return type is timestamptz or timetz are affected by the time zone setting.
Consider the following scenario:
1) A SELECT having such functions gets called and a cache entry is created.
2) The time zone is changed.
3) The same SELECT is called and the cache is used. The cached value is
not correct any more because of the time zone change.
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2022-July/008367.html
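The scenario can be replayed with a toy cache in Python. Everything here is a stand-in: run_query imitates a SELECT whose timestamptz result is rendered in the session time zone, and the cache is keyed by query text only, which is the assumption that makes the stale result visible. None of the names are Pgpool-II functions.

```python
from datetime import datetime, timedelta, timezone

INSTANT = datetime(2022, 7, 6, 12, 0, tzinfo=timezone.utc)


def run_query(session_tz):
    """Stand-in for a SELECT returning timestamptz, rendered in the session zone."""
    return INSTANT.astimezone(session_tz).isoformat()


cache = {}


def cached_query(sql, session_tz):
    # The cache key is the query text only, so the time zone is not considered.
    if sql not in cache:
        cache[sql] = run_query(session_tz)
    return cache[sql]


utc = timezone.utc
jst = timezone(timedelta(hours=9))
sql = "SELECT some_timestamptz_function()"  # hypothetical query text

first = cached_query(sql, utc)   # 1) cache entry created under UTC
second = cached_query(sql, jst)  # 2) + 3) zone changed, stale entry returned
fresh = run_query(jst)           # what the backend would return now
```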
Tatsuo Ishii [Tue, 5 Jul 2022 05:06:10 +0000 (14:06 +0900)]
Fix query cache to not cache timestamptz and timetz cast.
Even if the query is a constant plus a cast (like '2022-07-05
14:07:00'::timestamptz), the result can be changed by the SET TIME ZONE
command etc., so the result should not be cached.
Regression test data is also added.
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2022-July/008353.html
Tatsuo Ishii [Mon, 4 Jul 2022 05:23:48 +0000 (14:23 +0900)]
Fix bug in query cache.
[pgpool-general: 8285] Timestamp cast not cached
reported that a query like "Select '2022-02-18
07:00:00.006547'::timestamp" is not cached.
The function non_immutable_function_call_walker(), which judges whether
the query contains non-immutable functions, mistakenly assumed that any
query including a cast to timestamp etc. should not be cached. This
code was originally added to detect CURRENT_TIMESTAMP etc., as they
are transformed to type casts in the raw parser. Unfortunately this is
overkill since "'2022-02-18 07:00:00.006547'::timestamp" is also
transformed to a type cast.
Fortunately Pgpool-II 3.7 and after imported PostgreSQL 10 or newer
parser, which transforms CURRENT_TIMESTAMP etc. to SQLValueFunction.
As a result, the type cast handling code in
non_immutable_function_call_walker() is not necessary anymore. So this
commit removed the code.
Note: an interesting thing in the report is that "Select '2022-02-18
07:00:00.006547'::timestamp" is not cached while "Select '2022-02-18
07:00:00.006547'::date" is cached. Why? Well,
non_immutable_function_call_walker() (wrongly) assumes that the type
name is always created by SystemTypeName (a parser function), which
always adds the pg_catalog schema. This only happens when the type name is
a reserved keyword. TIMESTAMP is a reserved keyword (and TIME too). So
non_immutable_function_call_walker() catches the TIMESTAMP cast as
expected, but DATE is not a reserved keyword and it is transformed to a
type name without the "pg_catalog" schema. So
non_immutable_function_call_walker() misses it, and it is cached.
Bo Peng [Fri, 1 Jul 2022 05:41:00 +0000 (14:41 +0900)]
Add ssh options to restore_command in sample scripts.
Patch is created by Jon SCHEWE and updated by Bo Peng.
Masaya Kawamoto [Thu, 30 Jun 2022 03:58:37 +0000 (03:58 +0000)]
Doc: fix description about using the PCP password file when connecting to a Unix domain socket
The localhost entry in pcppass matches only for the default PCP socket
directory path, not all Unix socket connections. This behavior is the
same as pgpass.
Tatsuo Ishii [Thu, 23 Jun 2022 01:01:35 +0000 (10:01 +0900)]
Allow pgpool_setup to run on PostgreSQL 15.
basebackup.sh created by pgpool_setup uses pg_start_backup() and
pg_stop_backup(). PostgreSQL 15 renamed them. But there is a further
problem: the renamed functions (pg_backup_start() and
pg_backup_stop()) must be called within the same connection. Adapting
to this requires non-trivial changes. So I decided to rewrite the code
so that it does not use pg_start_backup() and pg_stop_backup() and uses the
pg_basebackup command instead.
Per https://www.pgpool.net/mantisbt/view.php?id=757
Tatsuo Ishii [Sat, 21 May 2022 09:09:00 +0000 (18:09 +0900)]
Enable debug1 while executing 074.bug700_memqcache_segfault.
This is necessary to pursue the occasional timeout in the test. Also add
debug logging to pool_push_pending_data() for the same purpose.
ereport for debugging to
Bo Peng [Mon, 13 Jun 2022 14:04:57 +0000 (23:04 +0900)]
Allow pgpool_setup to test sample scripts and sample config files contained in RPMs.
Currently, pgpool_setup generates scripts (e.g. failover.sh, follow_primary.sh) to test pgpool.
This commit adds a new option "-c" in src/test/regression/regress.sh
and a new environment variable TEST_SAMPLES in pgpool_setup.
Option "-c" enables TEST_SAMPLES and allows pgpool_setup to test sample scripts and
sample config files contained in RPMs.
This commit also changes "backend_hostnameX = '/tmp'" to "backend_hostnameX = 'localhost'".
Bo Peng [Mon, 13 Jun 2022 11:48:46 +0000 (20:48 +0900)]
Rename configuration parameter delegate_IP to delegate_ip.
For compatibility with the old versions, the old parameter delegate_IP still works.
If the old parameter delegate_IP is used, Pgpool-II will set the value to
delegate_ip and emit a warning message.
This commit also fixes segfault of fail_over_on_backend_error
and changes the behavior to set the value to failover_on_backend_error
if fail_over_on_backend_error is used.
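The compatibility behavior described above amounts to mapping an old parameter name to its replacement and warning about it. The Python sketch below illustrates the idea only; the table and the helper name normalize_parameter are hypothetical and do not reflect how Pgpool-II's configuration parser is written.

```python
import warnings

# Old parameter name -> current parameter name, as named in the commit message.
DEPRECATED_NAMES = {
    "delegate_IP": "delegate_ip",
    "fail_over_on_backend_error": "failover_on_backend_error",
}


def normalize_parameter(name):
    """Return the current name for a parameter, warning if an old name was used."""
    new_name = DEPRECATED_NAMES.get(name)
    if new_name is None:
        return name
    warnings.warn(f"{name} is deprecated; use {new_name}", DeprecationWarning)
    return new_name
```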
Tatsuo Ishii [Mon, 13 Jun 2022 07:09:46 +0000 (16:09 +0900)]
Add pool_config.c to src/tools/pgmd5/.gitignore.
This should avoid the error:
error: The following untracked working tree files would be overwritten by checkout:
src/tools/pgmd5/pool_config.c
Tatsuo Ishii [Mon, 6 Jun 2022 13:14:04 +0000 (22:14 +0900)]
Add debug logs to print each initial shared memory allocation.
Tatsuo Ishii [Mon, 6 Jun 2022 13:06:37 +0000 (22:06 +0900)]
Doc: fix memory requirement section.
- enhance the formula to calculate shared memory requirement so that it computes more accurate result.
- fix shared memory requirement for shared rel cache. The old value 64MB was simply wrong.
- fix process memory requirement. Previously the formula was based on
RSS. However PSS should be used because RSS includes shared memory
such as the memory used for libraries. This results in a much smaller
memory requirement than before.
Bo Peng [Sun, 5 Jun 2022 16:26:15 +0000 (01:26 +0900)]
Remove spaces in sample scripts.
Bo Peng [Sun, 5 Jun 2022 13:45:21 +0000 (22:45 +0900)]
Allow to rewrite archive_command in sample scripts.
Bo Peng [Sun, 5 Jun 2022 06:54:12 +0000 (15:54 +0900)]
Update PSQL connection information in sample scripts.
Tatsuo Ishii [Sat, 4 Jun 2022 04:23:22 +0000 (13:23 +0900)]
Revert "Enable debug1 while executing 074.bug700_memqcache_segfault."
This reverts commit
ce669517c946f36e1663891408ce67b90e0bf605.
Now that the problem was fixed by the commit:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=
4725950d19ca81d78e3a3994f665fe9a095ddb4c
those debugging aids are not necessary anymore.
Tatsuo Ishii [Thu, 2 Jun 2022 07:55:10 +0000 (16:55 +0900)]
Fix segfault in end_internal_transaction().
This was introduced by the commit:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=
050be475d738ab4f8268dce21e2e5361b7dbcbee
In raw mode, end_internal_transaction() should not be called.
Tatsuo Ishii [Thu, 2 Jun 2022 01:03:12 +0000 (10:03 +0900)]
Doc: fix wrong explanation on memqcache_maxcache, memqcache_expire.
Those parameters cannot be changed by reloading config file. Restarting
pgpool is required.
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2022-June/008254.html
Tatsuo Ishii [Tue, 31 May 2022 12:23:48 +0000 (21:23 +0900)]
When CloseComplete is received, forward it to the frontend without buffering.
An occasional timeout error in 074.bug700_memqcache_segfault appears in the buildfarm log as of
[pgpool-buildfarm: 2158] Pgpool-II buildfarm results CentOS7
* master PostgreSQL 12 CentOS7
testing 074.bug700_memqcache_segfault...timeout.
From the pgpool.log:
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: pool_discard_temp_query_cache: cache discarded: 0x1be53e8
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: memcache reset buffer
2022-05-28 17:14:52.612: pgproto pid 45886: DETAIL: discard: 0x1be57a8
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: memcache discarding query cache array
2022-05-28 17:14:52.612: pgproto pid 45886: DETAIL: num_caches: 0
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: memcache reset buffer
2022-05-28 17:14:52.612: pgproto pid 45886: DETAIL: create: 0x1be57a8
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: memcache reset buffer
2022-05-28 17:14:52.612: pgproto pid 45886: DETAIL: discard temp buffer of 0x1be7c08 (SELECT 1)
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: CloseComplete: remove sent message. kind:B, name:P1
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: processing command complete
2022-05-28 17:14:52.612: pgproto pid 45886: DETAIL: set transaction state to T
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: memcache reset buffer
2022-05-28 17:14:52.612: pgproto pid 45886: DETAIL: discard: 0x1be57a8
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: memcache discarding query cache array
2022-05-28 17:14:52.612: pgproto pid 45886: DETAIL: num_caches: 0
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: memcache reset buffer
2022-05-28 17:14:52.612: pgproto pid 45886: DETAIL: create: 0x1be57a8
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: memcache reset buffer
2022-05-28 17:14:52.612: pgproto pid 45886: DETAIL: discard temp buffer of 0x1be7c08 (SELECT 1)
2022-05-28 17:14:52.612: pgproto pid 45886: DEBUG: CloseComplete: remove sent message. kind:B, name:P2
It seems the CloseComplete message was received from the backend, but results.txt shows:
FE=> Query (query="SET statement_timeout TO 1000")
<= BE CommandComplete(SET)
<= BE ReadyForQuery(I)
FE=> Parse(stmt="S1", query="SELECT 1")
FE=> Bind(stmt="S1", portal="P1")
FE=> Execute(portal="P1")
FE=> Close(portal="P1")
FE=> Bind(stmt="S1", portal="P2")
FE=> Execute(portal="P2")
FE=> Close(portal="P2")
FE=> Sync
<= BE ParseComplete
<= BE BindComplete
<= BE DataRow
<= BE CommandComplete(SELECT 1)
<= BE CloseComplete
<= BE BindComplete
<= BE DataRow
<= BE CommandComplete(SELECT 1)
It seems the frontend did not receive the CloseComplete
message. SimpleForwardToFrontend() did buffering when it forwarded the
CloseComplete message. I guess this is the cause of the timeout. So
let's change SimpleForwardToFrontend() so that it forwards
CloseComplete message without buffering and see what buildfarm says.
Muhammad Usama [Tue, 31 May 2022 11:19:31 +0000 (16:19 +0500)]
Skipping useless ereport/elog calls.
Although ereport() and elog() themselves are quite cheap when the error message
level is too low to be printed, some places need to do substantial work before
they can call those macros at all. message_level_is_interesting() can be handy
to allow optimizing away such setup work when nothing is to be printed.
Function message_level_is_interesting(elevel) is borrowed from PostgreSQL
source that reports whether ereport/elog will do anything.
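The same idea exists in Python's standard logging module, where Logger.isEnabledFor() plays a role comparable to message_level_is_interesting(). The sketch below is an analogy rather than Pgpool-II code; the logger name and the expensive_dump helper are made up for the example.

```python
import logging

logger = logging.getLogger("message_level_demo")
logger.setLevel(logging.WARNING)

calls = {"expensive": 0}


def expensive_dump():
    """Stand-in for costly setup work needed only to build a debug message."""
    calls["expensive"] += 1
    return ", ".join(str(n) for n in range(1000))


def log_state():
    # Skip the setup work entirely when the message would not be printed.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("state: %s", expensive_dump())


log_state()
```

With the logger at WARNING, expensive_dump() is never called; it runs only once the level is lowered to DEBUG.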
Tatsuo Ishii [Tue, 31 May 2022 10:46:16 +0000 (19:46 +0900)]
Fix internal transaction handling bug in snapshot isolation mode.
When SELECT is executed in snapshot isolation mode, it is surrounded
in an "internal transaction". For example if there are 3 backends,
each backend starts an internal transaction with "BEGIN" command.
However, there was an oversight in end_internal_transaction(), which
is responsible for closing the transaction. It only closed the
transaction on "load balance node". This causes a problem with certain
query following SELECT, for example VACUUM. Suppose the SELECT is
executed on backend node 2, then transaction on node 0 and node 1 were
not closed. If VACUUM is executed on node 0, it fails because the
transaction is not closed on node 0 (remember, VACUUM cannot be
executed in an explicit transaction).
Also add a test case for this to the 030.snapshot_isolation test.
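The oversight and its fix can be modeled with a toy simulation in Python. The Backend class and the three functions below are simplified stand-ins written for this example; they only track whether a transaction is open on each node and are not Pgpool-II's data structures.

```python
class Backend:
    """Toy stand-in for one backend connection."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.in_transaction = False

    def execute(self, sql):
        if sql == "BEGIN":
            self.in_transaction = True
        elif sql == "COMMIT":
            self.in_transaction = False
        elif sql == "VACUUM" and self.in_transaction:
            raise RuntimeError("VACUUM cannot run inside a transaction block")


def start_internal_transaction(backends):
    for backend in backends:
        backend.execute("BEGIN")


def end_on_load_balance_node_only(backends, load_balance_node):
    # The oversight: only one node gets its transaction closed.
    backends[load_balance_node].execute("COMMIT")


def end_on_all_nodes(backends):
    # The fix: close the internal transaction wherever it is still open.
    for backend in backends:
        if backend.in_transaction:
            backend.execute("COMMIT")
```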
Bo Peng [Mon, 30 May 2022 16:10:06 +0000 (01:10 +0900)]
Updated the sample scripts to allow ssh login user and ssh key file to be set using variables.
Tatsuo Ishii [Wed, 25 May 2022 01:07:19 +0000 (10:07 +0900)]
Fix accepting INET fd bug.
This was introduced in commit:
9f727c1e267f1363012a3af599b7d7515e4ec355.
While pgpool_main initializes itself, the accepting UNIX/INET domain
sockets are first set in the "fds" array. Then it forks child processes. So far so
good. Later on pgpool_main initializes the pcp sockets. But at this point
it accidentally initialized the "fds" array again. As a result, if a child
process was forked again, the wrong fds were used by that child
process. The immediate result was that the frontend could not connect to pgpool via
INET domain listen addresses (-h localhost).
Problem found by Bo Peng.
Tatsuo Ishii [Sat, 21 May 2022 09:09:00 +0000 (18:09 +0900)]
Enable debug1 while executing 074.bug700_memqcache_segfault.
This is necessary to pursue the occasional timeout in the test. Also add
debug logging to pool_push_pending_data() for the same purpose.
ereport for debugging to
Tatsuo Ishii [Fri, 20 May 2022 04:55:20 +0000 (13:55 +0900)]
Add volatile modifier to a variable used in the query cache module.
The "sts" variable used in pool_fetch_memory_cache() did not have the volatile
modifier although it is used in a PG_TRY() block. If an exception arises,
the value set to the variable will be lost in the PG_CATCH block. The bad
effect of this will not be triggered unless an error occurs, and I think
the bad effect has rarely been observed in the wild (as far as I know,
I have never heard such a report). Anyway, a bug is a bug.
Bo Peng [Wed, 18 May 2022 02:13:54 +0000 (11:13 +0900)]
Doc: add release notes.
Bo Peng [Wed, 18 May 2022 00:14:28 +0000 (09:14 +0900)]
Doc: move the example of "Pgpool-II on Kubernetes" to https://github.com/pgpool/pgpool2_on_k8s/tree/master/docs.
Tatsuo Ishii [Sat, 14 May 2022 00:44:21 +0000 (09:44 +0900)]
Enhance stopping of the pgpool main process.
If "pgpool stop" cannot terminate the main process within a certain
period (currently 5 seconds), send the signal again. The reason why
pgpool does not accept the stopping signal is not clear at the moment, but
I expect this to reduce buildfarm timeout errors. See if this works.
Tatsuo Ishii [Tue, 10 May 2022 05:20:26 +0000 (14:20 +0900)]
Fix comment.
Tatsuo Ishii [Sun, 8 May 2022 09:00:20 +0000 (18:00 +0900)]
Fix not to abort session while in failed transaction.
When an explicit transaction fails, subsequent commands should be
ignored until commit or rollback command comes in. However pgpool
sometimes aborted the session if relcache lookup happened. example:
test=# begin;
BEGIN
test=*# insert into ttt values(1);
ERROR: relation "ttt" does not exist
LINE 1: insert into ttt values(1);
^
test=!# select * from t1;
FATAL: Backend throw an error message
DETAIL: Exiting current session because of an error from backend
HINT: BACKEND Error: "current transaction is aborted, commands ignored until end of transaction block"
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
To prevent this, check the transaction state and if it's already in
aborted state, do not try to process the command, but just reply back
to frontend with an error message:
ERROR: current transaction is aborted, commands ignored until end of transaction block
By the way, while implementing the fix, I found old bugs with replication mode.
- ReadyForQuery() mistakenly called end_internal_transaction() even if
it's not in replication/si mode. Moreover,
end_internal_transaction() is called even if the internal
transaction has not started.
- end_internal_transaction() did not call pool_unset_failed_transaction();
These bugs were harmless until I tried to fix the issue (session
aborting).
A new regression test 078 is added for this.
Discussion: https://www.pgpool.net/pipermail/pgpool-general/2022-April/008155.html
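The state check described in this commit can be outlined as follows. This Python sketch is an illustration under stated assumptions: the function name handle_command and its return convention are hypothetical, the transaction state uses the ReadyForQuery status characters ('I', 'T', 'E'), and the list of commands that end a transaction is a simplification.

```python
ABORTED_MESSAGE = (
    "current transaction is aborted, "
    "commands ignored until end of transaction block"
)

# Simplified set of commands that end a failed transaction.
TRANSACTION_END_COMMANDS = {"COMMIT", "ROLLBACK", "END", "ABORT"}


def handle_command(transaction_state, sql):
    """Return ("error", message) or ("forward", sql).

    transaction_state: 'I' idle, 'T' in transaction, 'E' failed transaction.
    """
    words = sql.strip().rstrip(";").split()
    first_word = words[0].upper() if words else ""
    if transaction_state == "E" and first_word not in TRANSACTION_END_COMMANDS:
        # Reply with an error instead of processing the command further.
        return ("error", ABORTED_MESSAGE)
    return ("forward", sql)
```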
Masaya Kawamoto [Mon, 9 May 2022 04:54:16 +0000 (04:54 +0000)]
Doc: update configuration example.
Bo Peng [Mon, 2 May 2022 05:15:32 +0000 (14:15 +0900)]
Change the PID length of pcp_proc_count command to 7 characters long.
Bo Peng [Thu, 28 Apr 2022 09:05:54 +0000 (18:05 +0900)]
Doc: update the example output of pcp_watchdog_info.
Bo Peng [Thu, 28 Apr 2022 08:34:39 +0000 (17:34 +0900)]
Doc: update the example output of pcp_watchdog_info.
Bo Peng [Thu, 28 Apr 2022 04:22:52 +0000 (13:22 +0900)]
Doc: mention that escaping is required if you are providing the password as an argument to pg_enc and the password contains a "$" character.
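The reason for the escaping can be seen with Python's shlex.quote(), used here purely as an illustration of shell quoting. The password value is an example, and no pg_enc options are shown or implied.

```python
import shlex

password = "pa$$word"  # example value only

# Unquoted, a POSIX shell would expand "$$" (its own process ID) before the
# program ever sees the argument. shlex.quote() wraps the value in single
# quotes, inside which "$" has no special meaning.
shell_argument = shlex.quote(password)
print(shell_argument)
```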
Muhammad Usama [Wed, 27 Apr 2022 10:30:58 +0000 (15:30 +0500)]
Fix for [pgpool-general: 7896] Possible race condition..
Watchdog does not allow the remote nodes reported lost by life-check to rejoin
the cluster until the life-check process confirms the existence of life in the
previously lost nodes. This is good enough except for the case when the
(lost by life-check) node tries to rejoin the cluster after it was restarted
(Pgpool-II service restarted).
What happens is the cluster keeps rejecting the restarted node because
the cluster's life-check doesn't agree while the restarted node's life-check
waits to be added to cluster before it can start sending the heart-beats.
The fix is to allow the previously lost remote node to become part of the
cluster after restart, no matter the lost-reason.
Issue report:
https://www.pgpool.net/pipermail/pgpool-general/2021-November/007954.html
Tatsuo Ishii [Mon, 25 Apr 2022 07:54:13 +0000 (16:54 +0900)]
Revert "Attempt to fix buildfarm timeout error."
This reverts commit
480c0ce0b76828428fd4823160012bb44d5eb53f.
It seems the fix is useless and causes a problem with failover.
Tatsuo Ishii [Thu, 21 Apr 2022 06:24:26 +0000 (15:24 +0900)]
Attempt to fix buildfarm timeout error.
After inspecting the buildfarm log, I noticed it seems as if the terminate
signal was ignored. My theory is that system(3), which is used in
trigger_failover(), ignores SIGINT and SIGQUIT. To deal with this,
check the return value of system(3) to determine whether a signal was
sent while executing system(3). If so, call exit_handler after
failover.
Tatsuo Ishii [Thu, 21 Apr 2022 05:27:27 +0000 (14:27 +0900)]
Add more logging to exit_handler in pgpool main process.
Bo Peng [Mon, 18 Apr 2022 15:36:54 +0000 (00:36 +0900)]
Improve regression test to detect segmentation fault.
Tatsuo Ishii [Sun, 17 Apr 2022 12:48:51 +0000 (21:48 +0900)]
Test: stabilize 018.detach_primary.
The test occasionally failed with a timeout. The cause seems to be that the test
proceeds before the watchdogs are ready. So wait until the watchdogs are ready
using pcp_watchdog_info.
Tatsuo Ishii [Sun, 17 Apr 2022 04:52:14 +0000 (13:52 +0900)]
Test: enhance 074.bug700_memqcache_segfault
Set statement_timeout at the beginning of the test to avoid the time
out error because it takes long time before failure.
Tatsuo Ishii [Thu, 14 Apr 2022 11:01:56 +0000 (20:01 +0900)]
Fix exit_handler in pgpool main process.
It was allowed to be interrupted by signals (SIGTERM, SIGINT and
SIGQUIT) while exit_handler was running. As a result, exit_handler could be executed
while already executing exit_handler. This could cause an infinite wait in
terminate_childrens(), which is called from exit_handler. To prevent
this, protect the variable "exiting" using a semaphore to make sure that
only one instance of exit_handler runs at a time.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2022-April/004149.html
Masaya Kawamoto [Mon, 11 Apr 2022 07:20:58 +0000 (07:20 +0000)]
Fix memory leak pointed out by Coverity.
Tatsuo Ishii [Sat, 9 Apr 2022 06:54:45 +0000 (15:54 +0900)]
Fix pgpool child process to obtain process information.
ProcessInfo was obtained by using pool_get_process_info(). But this API
is not suitable for child processes because:
- it does an inefficient linear search over all ProcessInfo slots (there are
num_init_children slots).
- due to a race condition, the search key pid might not be set, or might have been
removed, in the slot. I think it is possible that by the time the child
process starts execution, the pid is not yet set in the slot in
shared memory. Also, when a child process is killed by the parent process,
the parent may set the pid to 0 before the child process receives the kill signal.
So use pool_get_my_process_info() instead of pool_get_process_info().
It just returns the slot by using the global variable my_proc_id as a
key. my_proc_id was set by the parent
process when the child process was spawned.
The call to pool_get_process_info() in child.c was added in v4.3. So
back patch to V4_3_STABLE.
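The two lookup styles can be contrasted with a small Python model. The ProcessInfo class and both functions are simplified stand-ins written for this example, not the C structures or signatures used in Pgpool-II; the point is only that an index-based lookup does not depend on the pid stored in the slot.

```python
class ProcessInfo:
    """Toy stand-in for one process info slot in shared memory."""

    def __init__(self):
        self.pid = 0


def get_process_info_by_pid(slots, pid):
    # Linear search keyed by pid: fails if the pid is not (or no longer) set.
    for slot in slots:
        if slot.pid == pid:
            return slot
    return None


def get_my_process_info(slots, my_proc_id):
    # Direct lookup by slot index: independent of the pid stored in the slot.
    return slots[my_proc_id]
```

If the parent has not yet written the child's pid into slot 2, or has already reset it to 0, the pid-based search finds nothing, while the index-based lookup still returns slot 2.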
Tatsuo Ishii [Sat, 9 Apr 2022 06:05:26 +0000 (15:05 +0900)]
Revert "Fix pgpool child process to obtain process information."
This reverts commit
06f69d19030deb1d72230ce489c5a4d800ad593c.
Tatsuo Ishii [Sat, 9 Apr 2022 00:08:34 +0000 (09:08 +0900)]
Fix pgpool child process to obtain process information.
ProcessInfo was obtained by using pool_get_process_info(). But this API
is not suitable for child processes because:
- it does an inefficient linear search over all ProcessInfo slots (there are
num_init_children slots).
- due to a race condition, the search key pid might not be set, or might have been
removed, in the slot. I think it is possible that by the time the child
process starts execution, the pid is not yet set in the slot in
shared memory. Also, when a child process is killed by the parent process,
the parent may set the pid to 0 before the child process receives the kill signal.
So add a new API pool_get_process_info_by_process_id(), which just
returns the slot by using the global variable my_proc_id as a key, and let
child processes use it. my_proc_id was set by the parent process when
the child process was spawned.
The call to pool_get_process_info() in child.c was added in v4.3. So
back patch to V4_3_STABLE.
Tatsuo Ishii [Wed, 6 Apr 2022 07:30:35 +0000 (16:30 +0900)]
Fix shared memory allocation function.
pool_shared_memory_segment_get_chunk(), which is responsible for shared
memory allocation, failed to consider request size alignment. If the
requested size is not a multiple of MAXALIGN (typically 8) bytes, it could overrun
the shared memory area. Probably harmless in the wild but better to
fix.
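Rounding a request up to the alignment boundary is a small piece of arithmetic, sketched here in Python. The constant value 8 follows the "typically 8" in the commit message; the function name maxalign is chosen for this example and only mirrors the idea of the C macro.

```python
MAXIMUM_ALIGNOF = 8  # typical value, as mentioned in the commit message


def maxalign(size):
    """Round size up to the next multiple of MAXIMUM_ALIGNOF."""
    return (size + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)
```

A 13-byte request therefore consumes 16 bytes of the segment, so the space accounted for matches the space actually used.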
Tatsuo Ishii [Wed, 6 Apr 2022 05:50:56 +0000 (14:50 +0900)]
Fix possible null pointer dereference per Coverity.
Tatsuo Ishii [Mon, 4 Apr 2022 07:23:44 +0000 (16:23 +0900)]
Revert "Prevent hang in terminate_all_childrens()."
This reverts commit
a5e2e0411dc3ef91aafc8f4c5e1d7369e7eb3b46.
Tatsuo Ishii [Fri, 1 Apr 2022 10:54:49 +0000 (19:54 +0900)]
Prevent hang in terminate_all_childrens().
waitpid() was called in the function without WNOHANG being set.
This could cause a hang in waitpid().
Also fix typo. Rename terminate_all_childrens to terminate_all_children.
Muhammad Usama [Tue, 29 Mar 2022 14:31:49 +0000 (19:31 +0500)]
Fix logging for disabled pool_passwd
Refrain from emitting 'password file descriptor is NULL' warning
and error messages when pool_passwd is disabled.
Tatsuo Ishii [Sat, 26 Mar 2022 07:32:59 +0000 (16:32 +0900)]
Add pending signal check in check_requests().
Still struggling to understand why the shutdown signal is not delivered. For this
purpose add sigpending() before releasing the signal mask.
Tatsuo Ishii [Sat, 19 Mar 2022 09:25:15 +0000 (18:25 +0900)]
Allow shutdown interrupt while processing SIGCHLD in pgpool main.
Currently most signals are blocked in the pgpool main loop. In some
situations the SIGCHLD handler (reaper()) takes a long time or is blocked
in a wait system call. I suspect that this could cause occasional
timeouts in some regression tests. So allow interrupts while executing
reaper(). Also re-implement the CHECK_REQUEST macro as a function. There's
no point in implementing CHECK_REQUEST as a macro.