Tatsuo Ishii [Sat, 4 May 2019 09:00:12 +0000 (18:00 +0900)]
Doc: add performance Japanese doc.
Also fix typos in English performance doc.
Tatsuo Ishii [Fri, 3 May 2019 23:26:55 +0000 (08:26 +0900)]
Doc: add useful link how to create pcp.conf in the pcp reference page.
Also fix some typos.
Tatsuo Ishii [Fri, 3 May 2019 00:02:29 +0000 (09:02 +0900)]
Speed up failover when all backends are down.
Pgpool-II tries to find the primary node until search_primary_node_timeout
expires even if all of the backends are in down status. This is not
only a waste of time but also makes Pgpool-II look like it is hung, because
while searching for the primary node the failover process is suspended and all of
the Pgpool-II child processes are in defunct state, thus there's no
process which accepts connection requests from clients. Since the
default value of search_primary_node_timeout is 300 seconds, typically this
keeps on for 300 seconds. This is not comfortable for users.
So immediately give up finding the primary node regardless of
search_primary_node_timeout and promptly finish the failover process
if all of the backends are in down status.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2019-May/003321.html
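The early exit described in this commit can be sketched roughly as below. This is only an illustration under assumptions: NodeStatus, all_backends_down() and primary_search_budget() are hypothetical names, not Pgpool-II's actual code, and a return value of 0 here simply means "do not search at all".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status values; Pgpool-II's real definitions differ. */
typedef enum { NODE_UP, NODE_DOWN } NodeStatus;

/* Return true only when every backend is marked down. */
bool all_backends_down(const NodeStatus *status, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (status[i] != NODE_DOWN)
            return false;
    return true;
}

/*
 * Seconds to spend searching for a primary node: 0 when the search is
 * pointless because no backend is alive, otherwise the configured timeout.
 */
int primary_search_budget(const NodeStatus *status, size_t n, int timeout_sec)
{
    assert(timeout_sec >= 0);
    return all_backends_down(status, n) ? 0 : timeout_sec;
}
```

With three backends all down and a timeout of 300, the sketch gives up at once instead of waiting the full 300 seconds.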
Tatsuo Ishii [Mon, 29 Apr 2019 23:49:48 +0000 (08:49 +0900)]
Deal with PostgreSQL 12.
recovery.conf cannot be used anymore. Standby's recovery configuration
is now in postgresql.conf. Also "standby.signal" file is needed in
PostgreSQL database cluster directory to start postmaster as a standby
server.
Tatsuo Ishii [Mon, 29 Apr 2019 23:46:06 +0000 (08:46 +0900)]
Deal with PostgreSQL 12.
HeapTupleGetOid() is not available any more in PostgreSQL 12. Use
GETSTRUCT() and refer to oid column of Form_pg_proc.
Tatsuo Ishii [Fri, 26 Apr 2019 22:17:49 +0000 (07:17 +0900)]
Doc: first release of performance section.
Tatsuo Ishii [Wed, 24 Apr 2019 04:16:37 +0000 (13:16 +0900)]
Set backend_application_nameN in pgpool.conf.
Tatsuo Ishii [Wed, 24 Apr 2019 04:05:48 +0000 (13:05 +0900)]
Add backend_application_nameN.
Tatsuo Ishii [Wed, 24 Apr 2019 03:58:48 +0000 (12:58 +0900)]
Doc: fix typo.
Takuma Hoshiai [Wed, 24 Apr 2019 02:34:23 +0000 (11:34 +0900)]
Remove unused .sgml file.
basic-config-example.sgml, written in English, exists only in the doc.ja directory
and is not used by the documentation.
Tatsuo Ishii [Wed, 24 Apr 2019 00:52:44 +0000 (09:52 +0900)]
Add explanation about newly added columns.
Those should have been added when English doc was updated.
Tatsuo Ishii [Tue, 23 Apr 2019 08:32:59 +0000 (17:32 +0900)]
Add "replication_state" and "replication_sync_state" columns to "show pool_nodes" and friends.
This allows showing important information from pg_stat_replication,
which is available from PostgreSQL 9.1 (except replication_sync_state,
which is available since 9.2).
For this purpose new "backend_application_name" parameter is added.
pg_stat_replication is called from pool_worker_process at the same
timing of replication delay checking.
Also modify following commands to add those new columns:
- pcp_node_info
- pgpool_adm's pcp_node_info function
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2019-April/003315.html
Tatsuo Ishii [Sun, 21 Apr 2019 06:57:22 +0000 (15:57 +0900)]
Avoid exit/fork storm of pool_worker_child process.
pool_worker_child issues a query to get the WAL position using do_query(),
which could throw a FATAL error. In this case the pool_worker_child process
exits and the Pgpool-II parent immediately forks a new process. This cycle
repeats indefinitely and puts high load on the system.
This could easily happen. For example if ALWAYS_MASTER flag is
mistakenly set to standby node, it will cause an error:
ERROR: recovery is in progress
HINT: WAL control functions cannot be executed during recovery.
STATEMENT: SELECT pg_current_wal_lsn()
To avoid the exit/fork storm, sleep sr_check_period.
Tatsuo Ishii [Wed, 17 Apr 2019 22:52:56 +0000 (07:52 +0900)]
Fix black_function_list's broken default value.
I accidentally broke the entry of pgpool.conf.sample when
database_redirect_preference_list and
app_name_redirect_preference_list were introduced.
Also fix mistake of the entry of pgpool.conf.sample-replication as
well.
Issue reported by Sebastiaan Alexander Mannem.
Tatsuo Ishii [Wed, 17 Apr 2019 13:11:00 +0000 (22:11 +0900)]
Fix "not enough space in buffer" error.
The error occurred while processing error message returned from
backend and the cause is that the query string in question is too
big. Problem is, the buffer is in fixed size (8192 bytes). From the
programming point of view there's absolutely no need to use fixed size
buffer. So eliminate the fixed size buffer and use palloced buffer
instead. This also saves some memory copy work.
Per bug 499.
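The idea of sizing the buffer from the input rather than using a fixed 8192 bytes can be sketched as follows. This is a minimal illustration, not the actual patch: build_error_context() is a hypothetical helper, and malloc() stands in for Pgpool-II's palloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Concatenate prefix and query into a buffer sized from the inputs,
 * so a very long query string can never overflow it.
 * The caller frees the result. Returns NULL on allocation failure.
 */
char *build_error_context(const char *prefix, const char *query)
{
    size_t plen, qlen;
    char *buf;

    assert(prefix != NULL && query != NULL);
    plen = strlen(prefix);
    qlen = strlen(query);

    buf = malloc(plen + qlen + 1);
    if (buf == NULL)
        return NULL;

    memcpy(buf, prefix, plen);
    memcpy(buf + plen, query, qlen + 1);    /* includes the terminator */
    return buf;
}
```

A query longer than 8192 bytes is handled the same way as a short one because the allocation follows the actual length.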
Tatsuo Ishii [Wed, 17 Apr 2019 02:41:11 +0000 (11:41 +0900)]
Set application_name in recovery_conf.
This allows to show each cluster name as application_name. Currently
each application_name is set to: server0, server1, server2 and so on,
where the numeric part is the cluster number. This is especially
useful to use pg_stat_replication because it does not provide any useful
information regarding UNIX socket path or port number.
Tatsuo Ishii [Tue, 16 Apr 2019 06:48:44 +0000 (15:48 +0900)]
Fix DROP DATABASE failure.
When DROP DATABASE gets executed, SIGUSR1 is sent to the Pgpool-II
child process issuing the command. In its SIGUSR1 handler, the
MASTER macro is called while closing all idle connections. The macro
checks whether we are in the failover process, and surely we are. As a result,
the process exits and the DROP DATABASE command is never issued.
Per bug 486. However the reason of segfault in the report is not
clear. After commit:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=
66b5aacfcc045ec1485921a5884b637fcfb6fd73
Things could be different. Let the user test the latest version in the
git repo and see if the problem is solved...
Tatsuo Ishii [Thu, 11 Apr 2019 08:32:43 +0000 (17:32 +0900)]
Doc: add load balancing description.
More will come...
Tatsuo Ishii [Thu, 11 Apr 2019 08:32:19 +0000 (17:32 +0900)]
Doc: fix typo.
Tatsuo Ishii [Tue, 9 Apr 2019 09:39:09 +0000 (18:39 +0900)]
Fix typos.
Takuma Hoshiai [Wed, 10 Apr 2019 02:57:20 +0000 (11:57 +0900)]
Fix comparison against the wrong variable when reading an old pgpool_status file.
Since Pgpool-II 3.4 the pgpool_status format has changed, and both the old and new formats are supported.
Pgpool-II might read the status in the file incorrectly when reading the old format.
This is a rare case, and it causes no problem even if it happens.
Tatsuo Ishii [Tue, 9 Apr 2019 08:15:30 +0000 (17:15 +0900)]
Fix typo.
Tatsuo Ishii [Tue, 9 Apr 2019 07:14:55 +0000 (16:14 +0900)]
Add more sections to performance chapter.
Tatsuo Ishii [Tue, 9 Apr 2019 05:17:57 +0000 (14:17 +0900)]
Doc: add description about multi-statement queries to restrictions section.
Bo Peng [Sun, 7 Apr 2019 15:01:50 +0000 (00:01 +0900)]
Add test/watchdog_setup to EXTRA_DIST.
See bug470: https://www.pgpool.net/mantisbt/view.php?id=470
Tatsuo Ishii [Sun, 7 Apr 2019 00:42:31 +0000 (09:42 +0900)]
Doc: mention that multi-statement queries are sent to primary node only.
Even if the multi-statement query includes a SET command, it should be
sent to the primary node only. This is not explicitly mentioned anywhere
in the doc.
Per bug 492.
Tatsuo Ishii [Fri, 5 Apr 2019 07:53:51 +0000 (16:53 +0900)]
Fix md5 auth broken in raw mode.
In raw mode, pool_passwd is not necessary to authenticate PostgreSQL
using md5 auth. However if there's more than 1 backend,
authenticating with md5 fails.
The fix is to check whether we are operating in raw mode while doing md5
authentication. This was broken in 4.0, so back-patched to only 4.0.
Per bug 491.
Tatsuo Ishii [Fri, 5 Apr 2019 02:42:58 +0000 (11:42 +0900)]
Fix occasional regression test error of 008.dbredirect.
The test runs pgbench to create test data. Under slow environment
replication delay could become too much, and this would prevent load
balancing because the default value for delay_threshold is set to
10000000 in the default pgpool.conf-stream and the tests would fail.
The fix is to disable delay_threshold. Also add "show pool_nodes" to
confirm that replication delay actually happens.
Tatsuo Ishii [Wed, 3 Apr 2019 03:04:58 +0000 (12:04 +0900)]
Fix occasional regression test failure of 014.watchdog_test_quorum_bypass.
The test script does not retry psql while failover happens and
failed. So replace psql with wait_for_pgpool_startup.
Tatsuo Ishii [Tue, 2 Apr 2019 03:56:01 +0000 (12:56 +0900)]
Abort session if failover/failback is ongoing.
If failover/failback is ongoing, there would be a risk that MASTER
node macro cannot be used. If used, it could raise a segfault because
connection to the master node is NULL or bogus.
There are several reports suspected to be caused by this (see bug 481,
482 for example).
Now the guts of the MASTER* macro (pool_virtual_master_db_node_id())
is modified to check Req_info->switching which is true while
failover/failback is ongoing. If true, emit warning message and exit
the process. There's still a small window I know, but this should
greatly reduce the chance to access bogus MASTER connection without
using any locking.
Bo Peng [Tue, 2 Apr 2019 01:14:26 +0000 (10:14 +0900)]
New feature "statement_level_load_balancing".
This feature enables selecting load balancing node per statement.
With the current load balancing feature, the load balancing node is decided
at the session start time and will not be changed until the session ends.
When statement_level_load_balance = on is set, the load balancing node
is decided for each read query.
For example, in applications that use connection pooling and keep connections
open to the backend server, the session may be held for a long time, so
the load balancing node does not change until the session ends.
In such applications, when statement_level_load_balance is enabled,
it is possible to decide the load balancing node per query, not per session.
Bo Peng [Tue, 2 Apr 2019 00:21:44 +0000 (09:21 +0900)]
Generate Makefile.in by automake 1.13.4.
Tatsuo Ishii [Sat, 30 Mar 2019 13:34:45 +0000 (22:34 +0900)]
Suppress "ar: `u' modifier ignored since `D' is the default (see `U')".
This is actually a bug with libtools. To deal with this, add ARFLAGS
to parser's Makefile.am.
Tatsuo Ishii [Sat, 30 Mar 2019 12:56:03 +0000 (21:56 +0900)]
Suppress useless truncation warnings from gcc 8+.
For this purpose update c-compiler.m4 (borrowed from PostgreSQL's
config/c-compiler.m4) and add PGAC_PROG_CC_VAR_OPT(NOT_THE_CFLAGS,
[-Wformat-truncation]) to configure.ac to generate -Wformat-truncation
compiler option.
Tatsuo Ishii [Sat, 30 Mar 2019 12:29:41 +0000 (21:29 +0900)]
Suppress compiler warnings.
Suppress compiler warnings regarding write(2) return values being
ignored. Since they are used in signal handlers, it's impossible to
print info about errors. To shut up the warnings, create a static
variable and assign the return values from write().
Tatsuo Ishii [Sat, 30 Mar 2019 11:56:29 +0000 (20:56 +0900)]
Fix compiler warnings.
To deal with compiler warnings regarding that the return value of
write(2) is ignored, replace it with write_it() which calls write()
and uses the return value from it to print error string when write()
returns error.
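A wrapper of the kind this commit describes might look like the sketch below. The signature and the message format are assumptions, not the actual Pgpool-II definition of write_it(). Note that fprintf() is not async-signal-safe, so a wrapper like this is only suitable outside signal handlers.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Call write(2) and report a failure instead of silently ignoring it.
 * Returns whatever write() returned.
 */
ssize_t write_it(int fd, const void *buf, size_t count)
{
    ssize_t rc;

    assert(buf != NULL);
    rc = write(fd, buf, count);
    if (rc < 0)
        fprintf(stderr, "write on fd %d failed: %s\n", fd, strerror(errno));
    return rc;
}
```

Because the return value is consumed inside the wrapper, the compiler no longer warns about an ignored result at each call site.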
Tatsuo Ishii [Sat, 30 Mar 2019 01:33:58 +0000 (10:33 +0900)]
Fix wrong usage of volatile declaration.
From a PostgreSQL commit message:
Variables used after a longjmp() need to be declared volatile. In
case of a pointer, it's the pointer itself that needs to be declared
volatile, not the pointed-to value.
Same thing can be said to:
volatile StartupPacket *sp;
This should have been:
StartupPacket *volatile sp;
This also suppresses a compiler warning.
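The difference can be seen in a small self-contained sketch. Packet is a hypothetical stand-in for StartupPacket, and the function names are illustrative only. The local pointer is modified between setjmp() and longjmp(), so the pointer itself must be volatile for its value to be well defined afterwards.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef struct { int major; } Packet;   /* stand-in for StartupPacket */

static jmp_buf env;

static void fail(void)
{
    longjmp(env, 1);
}

int packet_after_longjmp(void)
{
    static Packet first = {2}, second = {3};
    Packet *volatile sp = &first;   /* the pointer itself is volatile */

    if (setjmp(env) == 0)
    {
        sp = &second;   /* changed after setjmp(), before longjmp() */
        fail();
    }

    /* Reached via longjmp(); sp is reliable only because it is volatile. */
    assert(sp != NULL);
    return sp->major;
}
```

Writing "volatile Packet *sp" instead would qualify the pointed-to struct, leaving the pointer variable free to be cached in a register across the jump.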
Bo Peng [Thu, 28 Mar 2019 11:56:35 +0000 (20:56 +0900)]
Change pgpool.spec.
Bo Peng [Thu, 28 Mar 2019 11:29:36 +0000 (20:29 +0900)]
Update pgpool_socket_dir.patch file.
Bo Peng [Thu, 28 Mar 2019 09:18:10 +0000 (18:18 +0900)]
Doc: Add release-notes 4.0.4-3.4.23.
Tatsuo Ishii [Thu, 28 Mar 2019 04:58:27 +0000 (13:58 +0900)]
Fix memory leak in "batch" mode in extended query.
In "batch" mode, a sync message does not follow every execute
message. Unfortunately Pgpool-II only discards the memory of the query
context for the last execute message while processing the ready for
query message. For example if 3 execute messages are sent before the
sync message, 2 of the query context memories will not be freed and this
leads to a serious memory leak.
To fix the problem, now the query context memory is possibly discarded
when a command complete message is returned from backend if the query
context is not referenced either by sent messages or pending messages.
If it is not referenced at all, we can discard the query context.
Also even if it is referenced, it is ok to discard the query context
if it is either an unnamed statement or an unnamed portal because it
will be discarded anyway when next unnamed statement or portal is
created.
Per bug 468.
Tatsuo Ishii [Wed, 27 Mar 2019 10:16:49 +0000 (19:16 +0900)]
Doc: add ssl_prefer_server_ciphers parameter to Japanese doc.
Muhammad Usama [Wed, 27 Mar 2019 07:51:20 +0000 (12:51 +0500)]
Add new configuration option ssl_prefer_server_ciphers
Add the new setting "ssl_prefer_server_ciphers" to let users configure if they
want client's or server's cipher order to take preference.
Yugo Nagata [Wed, 27 Mar 2019 01:08:32 +0000 (10:08 +0900)]
Specify default value of ssl_ciphers
Tatsuo Ishii [Sat, 23 Mar 2019 04:04:21 +0000 (13:04 +0900)]
Allow to set a client cipher list.
For this purpose new parameter "ssl_ciphers" is added. This is already
implemented in PostgreSQL and useful to enhance security when SSL is
enabled.
Takuma Hoshiai [Fri, 22 Mar 2019 04:30:30 +0000 (13:30 +0900)]
Doc: update the 'Watchdog' chapter of 'Tutorial' and some sections of 'Server Administration'.
Reviewed by Tatsuo Ishii and Bo Peng.
Tatsuo Ishii [Fri, 22 Mar 2019 02:45:42 +0000 (11:45 +0900)]
Doc: enhance watchdog/Pgpool-II example so that it mentions about pg_monitor role.
Per bug 469.
Tatsuo Ishii [Mon, 18 Mar 2019 00:34:02 +0000 (09:34 +0900)]
Fix unnecessary fsync to pgpool_status file.
Whenever new connections are created to PostgreSQL backend, fsync()
was issued to pgpool_status file, which could generate excessive I/O
in certain conditions, for example num_init_children is large and
connections to backend have certain life time limit.
So reduce the chance of issuing fsync() so that it is issued only when
backend status is changed from CON_CONNECT_WAIT or others to CON_UP.
If the status is already CON_UP, we don't need to write to
pgpool_status.
Discussion: [pgpool-general: 6436] High I/O Usage on PGPool nodes
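The "write only on a real status change" rule can be sketched as below. CON_CONNECT_WAIT and CON_UP come from the commit message. The other enum members and the mark_backend_up() helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout; only CON_CONNECT_WAIT and CON_UP are named in the commit. */
typedef enum { CON_UNUSED, CON_CONNECT_WAIT, CON_UP, CON_DOWN } BackendStatus;

/*
 * Mark a backend as up. Returns true when the status actually changed,
 * which is the only case where the caller needs to write and fsync()
 * the status file.
 */
bool mark_backend_up(BackendStatus *status)
{
    assert(status != NULL);
    if (*status == CON_UP)
        return false;   /* already up: skip the write and the fsync() */
    *status = CON_UP;
    return true;
}
```

Each new connection to a backend that is already CON_UP then costs no disk I/O at all.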
Bo Peng [Thu, 14 Mar 2019 05:21:59 +0000 (14:21 +0900)]
Add "tags" to gitignore file.
Bo Peng [Thu, 7 Mar 2019 01:27:31 +0000 (10:27 +0900)]
Fix indent of pgpool.conf sample files.
Tatsuo Ishii [Tue, 5 Mar 2019 15:13:24 +0000 (00:13 +0900)]
Doc: add more explanation to follow_master_command.
Add description how follow_master_command is executed etc.
Tatsuo Ishii [Tue, 5 Mar 2019 14:59:49 +0000 (23:59 +0900)]
Doc: add more explanation to follow_master_command.
Add description how follow_master_command is executed etc.
Tatsuo Ishii [Sun, 3 Mar 2019 00:38:29 +0000 (09:38 +0900)]
Docs: add performance doc.
Forgot to push the modification to pgpool.sgml
Tatsuo Ishii [Thu, 28 Feb 2019 00:36:24 +0000 (09:36 +0900)]
Doc: add "performance" chapter to doc.
This is just a starting up and more contents will be coming.
After completing the English doc, Japanese doc will come.
Tatsuo Ishii [Wed, 27 Feb 2019 00:38:15 +0000 (09:38 +0900)]
Fix write_status_file()'s signature.
It was mistakenly declared as write_status_file(). Of course this
should be: write_status_file(void).
Takuma Hoshiai [Mon, 25 Feb 2019 07:32:44 +0000 (16:32 +0900)]
Add new 'enable_shared_relcache' parameter.
The relation cache was stored in the local cache of each child process, so all child processes executed the same query to get the relation cache.
If enable_shared_relcache is on, the relation cache is stored in the memory cache and all child processes share it.
This is expected to reduce the load caused by executing the same query repeatedly.
Tatsuo Ishii [Sun, 24 Feb 2019 12:35:40 +0000 (21:35 +0900)]
Docs: add note to detach_false_primary configuration parameter.
To use this feature, sr_check_user must be super user or in pg_monitor
group.
Per bug 469.
Tatsuo Ishii [Sun, 24 Feb 2019 07:46:38 +0000 (16:46 +0900)]
Doc: enhance explanation about fail_over command and follow_master command.
Mention that there's complete examples of fail_over_command and
follow_master_command.
Tatsuo Ishii [Fri, 22 Feb 2019 01:09:09 +0000 (10:09 +0900)]
Eliminate select(2) system calls when they are not necessary.
The idea is checking select(2) timeout parameter set in a static
variable in pool_read() and pool_read2(). If it's -1, it means no
select timeout will be set in pool_check_fd(), which implies we can
avoid calling pool_check_fd().
Also I moved pool_check_fd() and friends to pool_stream.c from a
modularity point of view.
This gives slight performance improvement according to Jesper
Pedersen.
Bottleneck analysis and suggestions from Jesper Pedersen.
Discussion: [pgpool-hackers: 3247] https://www.pgpool.net/pipermail/pgpool-hackers/2019-February/003247.html
Bo Peng [Thu, 21 Feb 2019 00:53:07 +0000 (09:53 +0900)]
doc: Change release-note.
Bo Peng [Wed, 20 Feb 2019 11:17:10 +0000 (20:17 +0900)]
doc: Change release-note.
Bo Peng [Wed, 20 Feb 2019 10:55:21 +0000 (19:55 +0900)]
Add release-notes 4.0.3-3.4.22.
Bo Peng [Wed, 20 Feb 2019 10:52:37 +0000 (19:52 +0900)]
doc: Update configuration example "Pgpool-II + Watchdog Setup Example".
Bo Peng [Mon, 18 Feb 2019 06:18:15 +0000 (15:18 +0900)]
Skip over "host=" when getting info from conninfo string.
Patch provided by Nathan Ward.
Takuma Hoshiai [Fri, 15 Feb 2019 07:20:43 +0000 (16:20 +0900)]
Fix regression test 068
It was not working correctly because the test case used a function of the old JDBC driver and some fixed variables.
Also fix a typo.
Tatsuo Ishii [Fri, 15 Feb 2019 05:26:55 +0000 (14:26 +0900)]
Fix configuration change timing regarding memory_cache_enabled.
This parameter must not be changed after Pgpool-II start but it was
possible to change by reloading.
Tatsuo Ishii [Tue, 12 Feb 2019 07:59:35 +0000 (16:59 +0900)]
Fix unwanted recovery timeout in certain cases.
In the second stage of online recovery in replication mode, it is
possible that it fails with a timeout (message: "wait_connection_closed:
existing connections did not close in %d sec.") if the connection counter
is left in a wrong state because a child process aborted with SIGKILL, a segfault, etc.
This could be detected by checking if client_idle_limit_in_recovery is
enabled and has a smaller value than recovery_timeout, because all
clients must have been kicked out by the time
client_idle_limit_in_recovery expires. If so, we should reset
conn_counter to 0 also.
Per bug 431.
Tatsuo Ishii [Sat, 9 Feb 2019 11:55:23 +0000 (20:55 +0900)]
Enhance performance while sending message to frontend.
SimpleForwardToFrontend(), which is responsible for sending message to
frontend, does write buffering only if it is either 'D' (DataRow) or
'd' (CopyData). Other message types were immediately written to
socket. But actually this was not necessary. So if the message is
not a critical one ("Command Complete", "Ready For query", "Error
response" and "Notice message" messages), just write to buffer. With
this a 10-17% performance enhancement was observed.
Discussion: [pgpool-hackers: 3233] https://www.pgpool.net/pipermail/pgpool-hackers/2019-February/003233.html
Tatsuo Ishii [Sat, 9 Feb 2019 11:34:40 +0000 (20:34 +0900)]
Avoid error or notice message analysis if it's not necessary.
After sending query to backend, Pgpool-II always calls
pool_extract_error_message() via per_node_error_log(). In the function
memory allocation is performed even if no error or notice message is
returned from backend. To avoid the waste of CPU cycles, check message
kind and avoid calling pool_extract_error_message() if it's not error
or notice message.
Discussion: [pgpool-hackers: 3230] https://www.pgpool.net/pipermail/pgpool-hackers/2019-February/003230.html
Takuma Hoshiai [Thu, 7 Feb 2019 02:21:31 +0000 (11:21 +0900)]
Doc: update 'Getting Started' of 'Tutorial' chapter for version 4.1.
Reviewed by Tatsuo Ishii and Bo Peng.
Tatsuo Ishii [Tue, 5 Feb 2019 11:59:38 +0000 (20:59 +0900)]
Reduce memory usage when large data set is returned from backend.
In commit
8640abfc41ff06b1e6d31315239292f4d3d4191d,
pool_wait_till_ready_for_query() was introduced to retrieve all
messages into buffer from backend until it found a "ready for query"
message when extended query protocol is used in streaming replication
mode. It could hit memory allocation limit of palloc(), which is 1GB.
This could be easily reproduced by using pgbench and pgproto for
example.
pgbench -s 100
pgproto data:
'P' "" "SELECT * FROM pgbench_accounts" 0
'B' "" "" 0 0 0
'E' "" 0
'S'
'Y'
To reduce the memory usage, introduce "suspend_reading_from_frontend"
flag in session context so that Pgpool-II does not read any message
after sync message is received. The flag is turned off when a "ready
for query" message is received from backend. In the meantime, Pgpool-II
reads messages from backend and forwards them to frontend as usual. This way
we could eliminate the necessity to store messages from backend in
buffer, thus it reduces the memory footprint.
Per bug 462.
Tatsuo Ishii [Tue, 5 Feb 2019 09:51:40 +0000 (18:51 +0900)]
Fix syntax error in extended query test script.
Checking "Some process remains" needed double quotes around
a variable.
Tatsuo Ishii [Tue, 29 Jan 2019 08:20:41 +0000 (17:20 +0900)]
Fix corner case bug with strip_quote().
strip_quote(), which is called by pattern_compare() did not properly
handle empty query string case. In the worst case it could wipe out
memory after a pointer returned from malloc(), which could cause a
segmentation fault in free() called in pattern_compare().
Per bug 458.
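The empty-string corner case is easy to get wrong when indexing with "length - 1". The sketch below shows one safe way to strip surrounding double quotes. It is a hypothetical helper written for illustration, not the real strip_quote(), whose exact behavior is not described in this log.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of str with one pair of surrounding
 * double quotes removed, if present. Safe for the empty string: no
 * index is ever computed as len - 1 unless len >= 2.
 * The caller frees the result. Returns NULL on allocation failure.
 */
char *strip_quote_sketch(const char *str)
{
    size_t len, start = 0, end, j = 0;
    char *out;

    assert(str != NULL);
    len = strlen(str);
    end = len;

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    if (len >= 2 && str[0] == '"' && str[len - 1] == '"')
    {
        start = 1;
        end = len - 1;
    }

    while (start < end)
        out[j++] = str[start++];
    out[j] = '\0';
    return out;
}
```

For an empty input the function allocates one byte, writes only the terminator, and never touches memory outside the allocation.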
Tatsuo Ishii [Sun, 27 Jan 2019 02:03:14 +0000 (11:03 +0900)]
Mention that schema qualifications cannot be used in white/black_function_list.
Tatsuo Ishii [Fri, 18 Jan 2019 06:22:21 +0000 (15:22 +0900)]
Fix typo in a figure.
Takuma Hoshiai [Wed, 23 Jan 2019 06:38:40 +0000 (15:38 +0900)]
Doc: update 'Preface' chapter for version 4.1.
Reviewed by Tatsuo Ishii and Bo Peng.
Takuma Hoshiai [Wed, 23 Jan 2019 00:33:18 +0000 (09:33 +0900)]
Fix typo about wd_priority in watchdog_setup.
Muhammad Usama [Fri, 18 Jan 2019 14:32:49 +0000 (19:32 +0500)]
Fixing
0000455: watchdog lifecheck process has segfault in query mode
The issue is caused by the "AES password support in wd_lifecheck_password" commit
Since the query mode lifecheck uses threads and our MemoryManager api is not
thread safe, so when get_pgpool_config_user_password() is called from a thread
function the pstrdup() call in the function confuses the MemoryManager and
allocates the invalid address.
This commit fixes the issue by taking get_pgpool_config_user_password() function
call to outside the thread function and passing in the password as an argument
to the thread
The issue was reported by Yugo Nagata
The fix was proposed by me and some adjustments to the patch
and testing is done by Yugo Nagata
Tatsuo Ishii [Thu, 10 Jan 2019 03:20:07 +0000 (12:20 +0900)]
Fix Pgpool child segfault in a race condition.
1) frontend tries to connect to Pgpool-II
2) there's no existing connection cache
3) try to create new backend connections by calling connect_backend()
4) inside connect_backend(), pool_create_cp() gets called
5) pool_create_cp() calls new_connection()
6) failover occurs and the global backend status is set to down, but
the pgpool main does not send kill signal to the child process yet
7) inside new_connection() after checking VALID_BACKEND, it checks the
global backend status and finds it is set to down status, so that
it returns without creating new connection slot
8) connect_backend() continues and accesses the down connection slot
because local status says it's alive, which results in a segfault.
Since there's already checking for the global status in
new_connection(), a fix could be syncing the local status with the
global status there.
See [pgpool-hackers: 3214] for discussion.
Tatsuo Ishii [Thu, 3 Jan 2019 08:23:52 +0000 (17:23 +0900)]
Doc: fix typo in logdir description.
Per bug 453.
Tatsuo Ishii [Wed, 26 Dec 2018 05:54:55 +0000 (14:54 +0900)]
Enhance performance of CopyData message handling.
When COPY XX FROM STDIN gets executed (typical client is pg_dump),
each copy row data is sent from Pgpool-II to frontend using CopyData
message. Previously, one CopyData message was followed by a flush,
which cost a lot. Instead, now flush is done in subsequent Command
Complete, Notice message or Error message. A quick test reveals that
this change brings a 2.5x speedup.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2018-December/003199.html
Takuma Hoshiai [Tue, 25 Dec 2018 07:50:14 +0000 (16:50 +0900)]
Fix PAM authentication failed with Pgpool-II 4.0.2.
Pgpool-II 4.0.2 was outputting an unnecessary error while doing PAM authentication.
This problem was reported by Mouhamadou DIAW in [pgpool-general: 6353].
Tatsuo Ishii [Tue, 11 Dec 2018 22:42:31 +0000 (07:42 +0900)]
Fix occasional extended query hang.
If a client sends an extended query message such as close after sync
message but before next simple query, Pgpool-II could hang.
:
<= BE ParseComplete
<= BE BindComplete
<= BE CommandComplete(COMMIT)
<= BE ReadyForQuery(I)
FE=> Close(stmt="S2")
FE=> Close(stmt="S1")
FE=> Query (query="BEGIN") [0]
<= BE CloseComplete [1]
<= BE CloseComplete [1]
<= BE CommandComplete(BEGIN) [2]
Because of [1], query in progress flag was reset, then [2] hangs
trying to read from backend which did not receive message from
Pgpool-II because it does not refer to the query context set by [0].
Sending close after sync is not recommended according to the official
document but some sloppy drivers seem to do it. To deal with the
problem, check the doing extended query message flag before resetting
the query in progress flag.
Problem reported by Muhammad Usama.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2018-December/003164.html
Tatsuo Ishii [Thu, 6 Dec 2018 06:35:06 +0000 (15:35 +0900)]
Deal with "terminating connection due to idle-in-transaction timeout" error.
If the idle_in_transaction_session_timeout parameter is set to a reasonably
short value in postgresql.conf, the fatal error easily occurs and the
connection from Pgpool-II to backend is terminated. This leads to
Pgpool-II either hanging (if only one of the PostgreSQL servers has the
parameter set) or an unwanted failover (if all PostgreSQL servers have the
parameter set), and both are not good. So intercept the message and send
the same message to frontend then exit to terminate the connection to
frontend. This is similar treatment as the error "connection was
terminated due to conflict with recovery, User was holding a relation
lock for too long."
Per bug 448.
Tatsuo Ishii [Tue, 4 Dec 2018 07:20:48 +0000 (16:20 +0900)]
Implement new feature not to accept incoming connections.
Currently Pgpool-II accepts up to num_init_children frontends and
queues up more connection requests until one of the child processes becomes
free. This mostly works well, but if each session takes a long time, the
queue grows long and the whole system does not work smoothly.
To overcome the problem, I propose a new way to deal with many
connection requests from frontends: just complain and refuse to
connect to Pgpool-II if num_init_children connections are already in
use. This is exactly the same behavior as PostgreSQL.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2018-November/003153.html
Patch by Tatsuo Ishii.
Tatsuo Ishii [Mon, 26 Nov 2018 01:28:09 +0000 (10:28 +0900)]
Fix memory leak pointed out by Coverity.
Bo Peng [Wed, 21 Nov 2018 08:22:23 +0000 (17:22 +0900)]
Add release notes.
Takuma Hoshiai [Wed, 21 Nov 2018 08:09:43 +0000 (17:09 +0900)]
Change sort algorithm from bubble sort to quick sort.
This is used to sort startup packet's parameters.
Takuma Hoshiai [Wed, 21 Nov 2018 02:49:06 +0000 (11:49 +0900)]
Fix to sort startup packet's parameters sent by client.
If the order of the startup packet's parameters differed between a cached connection pool and a connection request, Pgpool-II did not use the cached connection pool and created a new one.
Per bug 444.
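Sorting the parameters before comparing makes the match independent of the order in which a client sent them. The following is a sketch of that idea using the standard qsort(). The Param struct and function names are hypothetical and do not reflect Pgpool-II's internal representation of a startup packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name/value pair from a startup packet. */
typedef struct { const char *name; const char *value; } Param;

static int cmp_param(const void *a, const void *b)
{
    return strcmp(((const Param *) a)->name, ((const Param *) b)->name);
}

/* Sort parameters by name so that the client's ordering does not matter. */
void sort_params(Param *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n > 1)
        qsort(p, n, sizeof(Param), cmp_param);
}

/* Compare two parameter lists of equal length, ignoring their order. */
bool same_params(Param *a, Param *b, size_t n)
{
    sort_params(a, n);
    sort_params(b, n);
    for (size_t i = 0; i < n; i++)
        if (strcmp(a[i].name, b[i].name) != 0 ||
            strcmp(a[i].value, b[i].value) != 0)
            return false;
    return true;
}
```

Both lists are sorted in place, so a cached entry only needs to be sorted once when it is stored.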
Tatsuo Ishii [Mon, 19 Nov 2018 08:17:21 +0000 (17:17 +0900)]
Change function name and messages of detect_error.
This function name and its messages cause enough misunderstanding that
there's something wrong. Actually this is harmless until the caller
throws an error message, such as the "admin shutdown" message. So change
the function name and messages to "extract_message" to eliminate such
misunderstandings.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2018-November/003115.html
Muhammad Usama [Thu, 15 Nov 2018 20:48:31 +0000 (01:48 +0500)]
Fix broken authentication for Pgpool's internal connections
The issue is caused by a mistake in "SCRAM and Certificate authentication
support" commit.
The problem is that while authenticating against backend in connection_do_auth(), it
returns to the caller as soon as backend returns the auth ok response. So
authentication itself is established fine. However connection_do_auth()
does not proceed until it receives "Ready for query", which is required
according to the frontend/backend protocol.
The fix is to keep processing the data after receiving auth_ok response
until we get ready for query.
Patch provided by Tatsuo Ishii <ishii@sraoss.co.jp> and a tiny modification
made by me
Takuma Hoshiai [Thu, 15 Nov 2018 04:49:41 +0000 (13:49 +0900)]
Add missed src/tools/pgenc/pg_enc.c by commit
bc9119f61b187577d64b63ebffb6f5412202f24f.
Tatsuo Ishii [Thu, 15 Nov 2018 00:21:30 +0000 (09:21 +0900)]
Fix memory leak found by Coverity.
This is actually harmless since in the situation pgpool child process
exits and the leaked memory is gone anyway. I just want to shut off
Coverity's complaint.
Takuma Hoshiai [Tue, 13 Nov 2018 06:54:57 +0000 (15:54 +0900)]
Fix compiler warnings with gcc 8.x.
Bo Peng [Tue, 13 Nov 2018 01:50:50 +0000 (10:50 +0900)]
Fix segmentation fault occurs when a certain Bind message is sent in native replication mode.
If the number of parameter format codes is specified to one, but the number of the original query's
parameter is zero, bind_rewrite_timestamp() will call memcpy with a negative value for size_t.
This causes segmentation fault.
Patch is provided by Yugo Nagata.
Per bug 443.
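The hazard is that a negative int passed as memcpy()'s size_t argument is converted to a huge unsigned value. A guard of the following kind avoids it. This is a generic sketch with a hypothetical helper, not the actual fix applied to bind_rewrite_timestamp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the part of src that follows a header of header_len bytes.
 * Returns the number of bytes copied, or -1 when the computed length
 * is negative or does not fit in dst, in which case nothing is copied.
 */
int copy_tail(char *dst, size_t dst_size,
              const char *src, int total_len, int header_len)
{
    int remaining;

    assert(dst != NULL && src != NULL);
    assert(header_len >= 0);

    remaining = total_len - header_len;
    if (remaining < 0 || (size_t) remaining > dst_size)
        return -1;      /* would have become a huge size_t in memcpy() */

    memcpy(dst, src + header_len, (size_t) remaining);
    return remaining;
}
```

Checking the signed length before the cast keeps a malformed message from turning into an out-of-bounds copy.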
Tatsuo Ishii [Thu, 8 Nov 2018 05:37:18 +0000 (14:37 +0900)]
Fix a query passed to relcache so that it uses schema qualified table name.
This should have been done for all similar queries to follow PostgreSQL's schema usage pattern.
However there was one missed at that time.
Bo Peng [Thu, 8 Nov 2018 06:02:10 +0000 (15:02 +0900)]
Start 4.1 development.
- Set version string to "4.1devel".
- Set new code name "karasukiboshi".
Tatsuo Ishii [Mon, 5 Nov 2018 12:14:58 +0000 (21:14 +0900)]
Fix query cache invalidation bug.
When a DML is executed in an explicit transaction, the table oid
buffer is wiped out by pool_reset_memqcache_buffer() and query cache
is not invalidated at commit time because no DML table oid remains
to invalidate the query cache. To fix this, add a new bool
parameter to pool_reset_memqcache_buffer() to specify whether to reset
table oid buffer or not. When a DML is executed in an explicit
transaction, call pool_reset_memqcache_buffer(false) to preserve the
table oid buffer.
Issue reported at https://github.com/pgpool/pgpool2/issues/19.
Tatsuo Ishii [Sat, 3 Nov 2018 13:07:31 +0000 (22:07 +0900)]
Fix segfault in extended query + query cache case.
If extended query is used when query cache enabled, Pgpool-II could
crash in certain case.
- parse before bind fires.
- a bind message is sent with previously parsed named statement and
unnamed portal.
- an explicit transaction is used.
When parse before bind fires, a new sent message corresponding to the
re-parsed message was not created.
When a bind message is sent from frontend using previously parsed
message and unnamed portal, Pgpool-II tries to add the unnamed portal
to the sent message list by calling
pool_add_sent_message(). pool_add_sent_message() removes the old
unnamed portal and add the new unnamed portal. Also it tries to remove
corresponding query context if it's not used by 2 or more sent
messages. Usually a query context is used by a named statement and an
unnamed portal and the query context will not be removed.
However because when parse before bind fires a new sent message
corresponding to the re-parsed message was not created, the reference
count is 1, which causes the query context to be removed.
When the transaction ends, temporary buffer for query cache needed to
be removed. Unfortunately since the pointer to the temporary buffer
for query cache is stored in the query context which was just removed,
the pointer to the buffer points to a random address, and segfault
occurs.
For this reason, if query cache is not enabled, the segfault
does not occur.
Note that this bug is easily reproduced by using "pgbench -M
prepared". ("pgbench -M extended" does not trigger the bug because it
does not use named statements.)
Fix is, When parse before bind fires, add a new sent message
corresponding to the re-parsed message.
Tatsuo Ishii [Thu, 1 Nov 2018 00:22:46 +0000 (09:22 +0900)]
Fix memory leak in extended query + query cache enabled.
If a bind message is sent again to an existing prepared statement, it
is possible that the previously allocated bind parameter string
remains and newly allocated bind parameter string's pointer is set to
there, which leads to a memory leak.
Note that if a statement is parsed again in usual way, the parameter
string will be freed along with the old query context. So the leak
does not happen.
I suspect the use case for the memory leak (bind, execute is repeated
against a same prepared statement) is actually rare in the
field. Probably that's why the problem has not been reported until
today although the leak had existed since day 0.
The leak case can be easily reproduced by "pgbench -M prepared" by the
way.