From 3c09ac1d40315e9c67b73355073533e17d30ce81 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <ishii@sraoss.co.jp>
Date: Thu, 9 Jul 2020 09:11:03 +0900
Subject: [PATCH] Fix pgpool hang in a corner case.

It is possible that an "out of band" message from a backend has
already been read into the buffer at the time a ready for query
message is processed. If the messages come from all backends, there
should be no problem, because ProcessBackendResponse() reads the
messages from all backends by using read_kind_from_backend(). However,
there is a corner case: 1) the message comes from only one of the
backends (this can happen on a recovery conflict, when a backend
receives SIGTERM, and so on), and 2) the message is already in that
backend's read buffer. In this case pgpool hangs in pool_read()
called by read_kind_from_backend() at either: 1)
read_kind_from_one_backend(frontend, backend, (char *) &kind,
MASTER_NODE_ID) (when the message did not come from the master
backend), or 2) pool_read(CONNECTION(backend, i), &kind, 1) (when the
message came only from the master).
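Both hang sites can be reproduced in a toy model. The sketch below is
hypothetical: mock_backend, mock_read_byte(), read_kind_all() and
simulate() are illustrative names, not pgpool functions, and a read
that would block inside pool_read() is reported as WOULD_BLOCK instead
of actually blocking. read_kind_all() only mirrors the "every valid
backend has a message" assumption of read_kind_from_backend(), reading
the master first and then the other nodes, in the order described
above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_BACKENDS 2
#define MASTER_NODE_ID 0
#define WOULD_BLOCK (-1)

/* Hypothetical stand-in for a backend connection: only the bytes that
 * pgpool has already pulled into its read buffer are modeled. */
typedef struct
{
	char		buf[16];
	size_t		len;			/* bytes available in the buffer */
	size_t		pos;			/* next byte to consume */
} mock_backend;

/* A real pool_read() would block here when the buffer is empty and the
 * backend sends nothing more; the model reports WOULD_BLOCK instead. */
static int
mock_read_byte(mock_backend *b, char *out)
{
	if (b->pos >= b->len)
		return WOULD_BLOCK;
	*out = b->buf[b->pos++];
	return 0;
}

/* Mirrors the assumption behind read_kind_from_backend(): every valid
 * backend has a message, so a kind byte is read from the master first and
 * then from every other node.  Returns the node that would block, or -1
 * when all reads succeed. */
static int
read_kind_all(mock_backend *backends, char *kinds)
{
	if (mock_read_byte(&backends[MASTER_NODE_ID], &kinds[MASTER_NODE_ID]) == WOULD_BLOCK)
		return MASTER_NODE_ID;

	for (int i = 0; i < NUM_BACKENDS; i++)
	{
		if (i == MASTER_NODE_ID)
			continue;
		if (mock_read_byte(&backends[i], &kinds[i]) == WOULD_BLOCK)
			return i;
	}
	return -1;
}

/* Put an out of band ErrorResponse kind byte ('E') into the buffers of the
 * selected nodes only, then read a kind byte from all of them. */
static int
simulate(bool oob_on_node0, bool oob_on_node1)
{
	mock_backend backends[NUM_BACKENDS];
	char		kinds[NUM_BACKENDS];
	bool		has_oob[NUM_BACKENDS] = {oob_on_node0, oob_on_node1};

	memset(backends, 0, sizeof(backends));
	for (int i = 0; i < NUM_BACKENDS; i++)
	{
		if (has_oob[i])
		{
			backends[i].buf[0] = 'E';
			backends[i].len = 1;
		}
	}
	return read_kind_all(backends, kinds);
}
```

In this model simulate(false, true) stops at node 0 (case 1, stuck on
the master), simulate(true, false) stops at node 1 (case 2), and only
simulate(true, true), where every backend really sent something,
returns -1.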

Note that if the message is not in the buffer, there should be no
problem, since read_packets_and_process() takes care of such "out of
band" messages.
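Why the buffered case differs can be seen with plain POSIX sockets. A
readiness check such as select() (assuming read_packets_and_process()
waits on the backend sockets this way) reports only the backend that
actually sent something, but it cannot see bytes that have already
been moved into a user-space read buffer. socket_readable(),
probe_result and probe_backends() are hypothetical names, and
socketpair() stands in for the real backend connections.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Zero-timeout readiness probe: true if the kernel has bytes queued for fd. */
static bool
socket_readable(int fd)
{
	fd_set		readmask;
	struct timeval timeout = {0, 0};

	FD_ZERO(&readmask);
	FD_SET(fd, &readmask);
	return select(fd + 1, &readmask, NULL, NULL, &timeout) > 0 &&
		FD_ISSET(fd, &readmask);
}

typedef struct
{
	bool		node0_readable;
	bool		node1_readable;
	bool		node1_readable_after_buffering;
} probe_result;

/* Two simulated backends; only node 1 sends an out of band NoticeResponse
 * kind byte.  Returns 0 on success, -1 on a system call failure. */
static int
probe_backends(probe_result *r)
{
	int			b0[2];			/* [0]: pgpool side, [1]: backend side */
	int			b1[2];
	char		userbuf[8];		/* plays the role of pgpool's read buffer */
	int			rc = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, b0) < 0)
		return -1;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, b1) < 0)
	{
		close(b0[0]);
		close(b0[1]);
		return -1;
	}

	if (write(b1[1], "N", 1) == 1)
	{
		r->node0_readable = socket_readable(b0[0]);
		r->node1_readable = socket_readable(b1[0]);

		/* Move the byte into user space, as buffered reading does; the
		 * message is still pending, but select() can no longer see it. */
		if (read(b1[0], userbuf, sizeof(userbuf)) == 1)
		{
			r->node1_readable_after_buffering = socket_readable(b1[0]);
			rc = 0;
		}
	}

	close(b0[0]);
	close(b0[1]);
	close(b1[0]);
	close(b1[1]);
	return rc;
}
```

Before buffering, only node 1 is reported readable, so a select()
driven loop can handle that node alone. After buffering, the socket
looks idle even though a message is still waiting, which is why the
buffered case needs the explicit check.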

The solution is to read and discard such a message in
ReadyForQuery(), emitting a log message, so as to make sure that the
read buffer is empty after returning from ReadyForQuery(). (Remember
that until the ready for query message is returned to the frontend,
the frontend will not issue the next query, and there should be no
response from the backend except out of band messages.)
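A minimal sketch of the fix under a simplified model of the read
buffer. mock_conn, buffer_is_empty(), push_message(),
discard_one_message() and drain_out_of_band() are hypothetical
stand-ins for CONNECTION(backend, i), pool_read_buffer_is_empty() and
per_node_error_log(). The framing (kind byte, then a big-endian int32
length that counts itself but not the kind byte) follows the
PostgreSQL frontend/backend protocol, but the body is a plain string
rather than real ErrorResponse or NoticeResponse fields. As in the
patch, at most one message is discarded per valid backend.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_BACKENDS 2

/* Hypothetical backend connection: raw protocol bytes already read from
 * the socket but not yet processed. */
typedef struct
{
	bool		valid;			/* stand-in for VALID_BACKEND(i) */
	uint8_t		buf[256];
	size_t		len;			/* bytes stored in buf */
	size_t		pos;			/* next unread byte */
} mock_conn;

/* Stand-in for pool_read_buffer_is_empty(). */
static bool
buffer_is_empty(const mock_conn *c)
{
	return c->pos >= c->len;
}

/* Append one message: kind byte, big-endian int32 length, NUL-terminated
 * body.  Returns false if it does not fit in the buffer. */
static bool
push_message(mock_conn *c, char kind, const char *body)
{
	size_t		body_len = strlen(body) + 1;
	uint32_t	msglen = (uint32_t) (4 + body_len);

	if (c->len + 1 + msglen > sizeof(c->buf))
		return false;
	c->buf[c->len++] = (uint8_t) kind;
	c->buf[c->len++] = (uint8_t) (msglen >> 24);
	c->buf[c->len++] = (uint8_t) (msglen >> 16);
	c->buf[c->len++] = (uint8_t) (msglen >> 8);
	c->buf[c->len++] = (uint8_t) msglen;
	memcpy(&c->buf[c->len], body, body_len);
	c->len += body_len;
	return true;
}

/* Consume one complete message and return its kind, or 0 if the buffer
 * holds only part of a message. */
static char
discard_one_message(mock_conn *c)
{
	const uint8_t *p = &c->buf[c->pos];
	uint32_t	msglen;

	if (c->len - c->pos < 5)
		return 0;
	msglen = ((uint32_t) p[1] << 24) | ((uint32_t) p[2] << 16) |
		((uint32_t) p[3] << 8) | (uint32_t) p[4];
	if (msglen < 4 || c->len - c->pos < 1 + (size_t) msglen)
		return 0;
	c->pos += 1 + msglen;
	return (char) p[0];
}

/* Sketch of the fix: before ReadyForQuery is forwarded, discard and log
 * one leftover message per valid backend.  Returns how many were dropped. */
static int
drain_out_of_band(mock_conn *conns)
{
	int			discarded = 0;

	for (int i = 0; i < NUM_BACKENDS; i++)
	{
		char		kind;

		if (!conns[i].valid || buffer_is_empty(&conns[i]))
			continue;
		kind = discard_one_message(&conns[i]);
		if (kind == 0)
			continue;			/* partial message: left for a later read */
		fprintf(stderr, "node %d: out of band message '%c' discarded\n", i, kind);
		discarded++;
	}
	return discarded;
}
```

After drain_out_of_band() returns, a valid backend whose buffer held
one complete out of band message is empty again, so the next
read_kind_from_backend() style read no longer assumes a message that
only some nodes sent.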

If the message was FATAL, the backend will disconnect from pgpool, so
pgpool should notice that the connection is closed the next time it
reads from that backend anyway.
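The same socketpair() setup illustrates why discarding a FATAL message
is harmless: once the backend closes its end, the bytes it already
sent are still delivered, and the following read() returns 0 (end of
file), which is how a closed connection shows up. closed_after_fatal()
is a hypothetical name, and only the kind byte of the ErrorResponse is
sent.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the connection is seen as closed after the pending message
 * is consumed, 0 if not, -1 on a system call failure. */
static int
closed_after_fatal(void)
{
	int			sv[2];			/* sv[0]: pgpool side, sv[1]: backend side */
	char		kind;
	int			result = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* The backend sends the kind byte of a FATAL ErrorResponse, then exits. */
	if (write(sv[1], "E", 1) != 1)
	{
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	close(sv[1]);

	/* The already-sent byte is still delivered (and can be discarded) ... */
	if (read(sv[0], &kind, 1) == 1 && kind == 'E')
	{
		/* ... after which read() returns 0: end of file, peer closed. */
		result = (read(sv[0], &kind, 1) == 0) ? 1 : 0;
	}

	close(sv[0]);
	return result;
}
```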

For the master branch, we should probably treat that kind of FATAL
message in the same way as read_packets_and_process() already does.
This requires some code refactoring, and I would like to leave that
work to a separate commit.
---
 src/protocol/pool_proto_modules.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/src/protocol/pool_proto_modules.c b/src/protocol/pool_proto_modules.c
index da9addcbe..beb2056ea 100644
--- a/src/protocol/pool_proto_modules.c
+++ b/src/protocol/pool_proto_modules.c
@@ -2073,6 +2073,27 @@ ReadyForQuery(POOL_CONNECTION * frontend,
 		}
 	}
 
+	/*
+	 * Make sure that no message remains in the backend buffer.  If something
+	 * remains, it could be an "out of band" ERROR, FATAL or NOTICE message
+	 * generated by the backend itself for reasons such as a recovery
+	 * conflict or receipt of SIGTERM. If so, consume it and emit a log
+	 * message so that the next read_kind_from_backend() will not hang trying
+	 * to read from a backend that may not have produced such a message.
+	 */
+	if (pool_is_query_in_progress())
+	{
+		for (i = 0; i < NUM_BACKENDS; i++)
+		{
+			if (!VALID_BACKEND(i))
+				continue;
+			if (!pool_read_buffer_is_empty(CONNECTION(backend, i)))
+				per_node_error_log(backend, i,
+								   "(out of band message)",
+								   "ReadyForQuery: Error or notice message from backend: ", false);
+		}
+	}
+
 	if (send_ready)
 	{
 		pool_write(frontend, "Z", 1);
-- 
2.39.5