Peter Geoghegan [Wed, 10 Dec 2025 20:35:30 +0000 (15:35 -0500)]
Return TIDs in desc order during backwards scans.
Always return TIDs in descending order when returning groups of TIDs
from an nbtree posting list tuple during nbtree backwards scans. This
makes backwards scans tend to require fewer buffer hits, since the scan
is less likely to repeatedly pin and unpin the same heap page/buffer
(we'll get exactly as many buffer hits as we get with a similar forwards
scan case).
Commit 0d861bbb, which added nbtree deduplication, originally did things
this way to avoid interfering with _bt_killitems's approach to setting
LP_DEAD bits on posting list tuples. _bt_killitems makes a soft
assumption that it can always iterate through posting lists in ascending
TID order, finding corresponding killItems[]/so->currPos.items[] entries
in that same order. This worked out because of the prior _bt_readpage
backwards scan behavior. If we just changed the backwards scan posting
list logic in _bt_readpage, without altering _bt_killitems itself, it
would break its soft assumption.
Avoid that problem by sorting the so->killedItems[] array at the start
of _bt_killitems. That way the order that dead items are saved in from
btgettuple can't matter; so->killedItems[] will always be in the same
order as so->currPos.items[] in the end. Since so->currPos.items[] is
now always in leaf page order, regardless of the scan direction used
within _bt_readpage, and since so->killedItems[] is always in that same
order, the _bt_killitems loop can continue to make a uniform assumption
about everything being in page order. In fact, sorting like this makes
the previous soft assumption about item order into a hard invariant.
Also deduplicate the so->killedItems[] array after it is sorted. That
way there's no risk of the _bt_killitems loop becoming confused by a
duplicate dead item/TID. This was possible in cases that involved a
scrollable cursor that encountered the same dead TID more than once
(within the same leaf page/so->currPos context). This doesn't come up
very much in practice, but it seems best to be as consistent as possible
about how and when _bt_killitems will LP_DEAD-mark index tuples.
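The sort-then-deduplicate step described above can be sketched in C as follows. This is a minimal sketch, assuming killedItems[] holds plain int indexes into so->currPos.items[]. sort_and_dedup_killed() is a hypothetical helper, not the actual _bt_killitems code.

```c
#include <stdlib.h>

/* qsort comparator: ascending int order */
static int
cmp_int(const void *a, const void *b)
{
    int ia = *(const int *) a;
    int ib = *(const int *) b;

    return (ia > ib) - (ia < ib);
}

/*
 * Sort killed[] into ascending (page) order, then squeeze out duplicate
 * entries in place. Returns the new number of entries.
 */
static int
sort_and_dedup_killed(int *killed, int nkilled)
{
    int nunique = 0;

    if (nkilled == 0)
        return 0;

    qsort(killed, nkilled, sizeof(int), cmp_int);

    for (int i = 1; i < nkilled; i++)
    {
        /* keep an entry only if it differs from the last one kept */
        if (killed[i] != killed[nunique])
            killed[++nunique] = killed[i];
    }
    return nunique + 1;
}
```

Once the array is in ascending order with no repeats, a single forward pass can match it against the page-ordered so->currPos.items[] without ever needing to back up.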
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=Wut2pKvbW-u3hJ_LXwsYeiXHiW8oN1GfbKPavcGo8Ow@mail.gmail.com
Jeff Davis [Wed, 10 Dec 2025 19:56:11 +0000 (11:56 -0800)]
Add pg_iswcased().
True if character has multiple case forms. Will be a useful
multibyte-aware replacement for char_is_cased().
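To illustrate what "has multiple case forms" means, here is a rough approximation using the standard <wctype.h> mappings. It is a hypothetical sketch: example_iswcased() is not the real pg_iswcased(), which goes through PostgreSQL's locale providers. Its results also depend on the C library's current LC_CTYPE.

```c
#include <stdbool.h>
#include <wctype.h>

/*
 * Hypothetical approximation: a wide character counts as "cased" if mapping
 * it to upper or lower case yields a different character.
 */
static bool
example_iswcased(wint_t wc)
{
    return towupper(wc) != wc || towlower(wc) != wc;
}
```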
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Jeff Davis [Wed, 10 Dec 2025 19:55:59 +0000 (11:55 -0800)]
Remove char_tolower() API.
It's only useful for an ILIKE optimization for the libc provider using
a single-byte encoding and a non-C locale, but it creates significant
internal complexity.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Heikki Linnakangas [Wed, 10 Dec 2025 17:38:07 +0000 (19:38 +0200)]
Fix bogus extra arguments to query_safe in test
The test seemed to incorrectly think that query_safe() takes an
argument that describes what the query does, similar to e.g.
command_ok(). Until commit bd8d9c9bdf the extra arguments were
harmless and were just ignored, but when commit bd8d9c9bdf introduced
a new optional argument to query_safe(), the extra arguments started
clashing with that, causing the test to fail.
Backpatch to v17, which is the oldest branch where the test exists. The
extra arguments didn't cause any trouble on the older branches, but
they were clearly bogus anyway.
Heikki Linnakangas [Wed, 10 Dec 2025 17:27:02 +0000 (19:27 +0200)]
Improve DDL deparsing test
1. The test initially focuses on the "parent" table, then switches to
the "part" table, and goes back to the "parent" table. That seems a
little weird, so move the tests around so that all the commands on the
"parent" table are done first, followed by the "part" table.
2. ALTER TABLE ALTER COLUMN SET EXPRESSION was not tested, so add
that.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACJufxFDi7fnwB-8xXd_ExML7-7pKbTaK4j46AJ=4-14DXvtVg@mail.gmail.com
Melanie Plageman [Wed, 10 Dec 2025 16:10:13 +0000 (11:10 -0500)]
Add comment about keeping PD_ALL_VISIBLE and VM in sync
The comment above heap_xlog_visible() about the critical integrity
requirement for PD_ALL_VISIBLE and the visibility map should also be in
heap_xlog_prune_freeze() where we set PD_ALL_VISIBLE.
Oversight in add323da40a6bf9e.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
Melanie Plageman [Wed, 10 Dec 2025 16:10:01 +0000 (11:10 -0500)]
Simplify vacuum visibility assertion
Phase I vacuum gives the page a once-over after pruning and freezing to
check that the values of all_visible and all_frozen agree with the
result of heap_page_is_all_visible(). This is meant to keep the logic in
phase I for determining visibility in sync with the logic in phase III.
Rewrite the assertion to avoid an Assert(false).
Suggested by Andres Freund.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/mhf4vkmh3j57zx7vuxp4jagtdzwhu3573pgfpmnjwqa6i6yj5y%40sy4ymcdtdklo
Heikki Linnakangas [Wed, 10 Dec 2025 13:33:29 +0000 (15:33 +0200)]
Fix comment in GetPublicationRelations
This function gets the list of relations associated with the
publication but the comment said the opposite.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CANhcyEV3C_CGBeDtjvKjALDJDMH-Uuc9BWfSd=eck8SCXnE=fQ@mail.gmail.com
Heikki Linnakangas [Wed, 10 Dec 2025 09:43:16 +0000 (11:43 +0200)]
Fix some near-bugs related to ResourceOwner function arguments
These functions took a ResourceOwner argument, but only checked if it
was NULL, and then used CurrentResourceOwner for the actual work.
Surely the intention was to use the passed-in resource owner. All
current callers passed CurrentResourceOwner or NULL, so this has no
consequences at the moment, but it's an accident waiting to happen for
future callers and extensions.
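The pattern being fixed looks roughly like the sketch below. The types and function names are hypothetical stand-ins, not the real resource owner API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a resource owner */
typedef struct Owner
{
    int         nresources;
} Owner;

static Owner *CurrentOwner = NULL;  /* stand-in for CurrentResourceOwner */

/* Buggy shape: checks the argument, then ignores it */
static void
remember_resource_buggy(Owner *owner)
{
    if (owner == NULL)
        return;
    CurrentOwner->nresources++;     /* wrong owner whenever owner != CurrentOwner */
}

/* Fixed shape: the passed-in owner is the one that gets updated */
static void
remember_resource_fixed(Owner *owner)
{
    if (owner == NULL)
        return;
    owner->nresources++;
}
```

As long as every caller passes the current owner, both versions behave identically, which is why the problem has no consequences at the moment.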
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEze2Whnfv8VuRZaohE-Af+GxBA1SNfD_rXfm84Jv-958UCcJA@mail.gmail.com
Backpatch-through: 17
Michael Paquier [Wed, 10 Dec 2025 04:56:33 +0000 (13:56 +0900)]
libpq: Authorize pthread_exit() in libpq_check
pthread_exit() is added to the list of symbols allowed when building
libpq. A reference to it was reported to appear when libpq is statically
linked with libcrypto, which can call pthread_exit().
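Conceptually, the check looks at the undefined symbols that nm reports for libpq and fails if any of them mention "exit", unless the symbol is explicitly allowed. Below is a minimal C sketch of that filter, assuming substring matching. The allowlist here (pthread_exit, plus __cxa_atexit as an assumed example) is illustrative only, and the real list lives in the check script.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed allowlist; pthread_exit is the entry this commit adds */
static const char *const allowed_exit_symbols[] = {
    "__cxa_atexit",
    "pthread_exit",
};

/* True if an undefined symbol mentions "exit" and is not allowlisted */
static bool
symbol_is_forbidden(const char *sym)
{
    size_t      nallowed = sizeof(allowed_exit_symbols) /
        sizeof(allowed_exit_symbols[0]);

    if (strstr(sym, "exit") == NULL)
        return false;
    for (size_t i = 0; i < nallowed; i++)
    {
        if (strcmp(sym, allowed_exit_symbols[i]) == 0)
            return false;
    }
    return true;
}
```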
Reported-by: Torsten Rupp <torsten.rupp@gmx.net>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19095-6d8256d0c37d4be2@postgresql.org
Michael Paquier [Wed, 10 Dec 2025 03:46:45 +0000 (12:46 +0900)]
Fix failures with cross-version pg_upgrade tests
Buildfarm members skimmer and crake have reported that pg_upgrade
from v18 fails due to the changes of d52c24b0f808: the upgrade expects
the objects removed from the test module injection_points to still be
present after upgrading, but the test module no longer has them.
The origin of the issue is that the following test modules depend on
injection_points, but they do not drop the extension once the tests
finish, leaving its traces in the dumps used for the upgrades:
- gin, down to v17
- typcache, down to v18
- nbtree, HEAD-only
Test modules have no upgrade requirements, as they are used only for
tests, so there is no point in keeping them around.
An alternative solution would be to drop the databases created by these
modules in AdjustUpgrade.pm, but the solution of this commit to drop the
extension is simpler. Note that there would be a catch if using a
solution based on AdjustUpgrade.pm as the database name used for the
test runs differs between configure and meson:
- configure relies on USE_MODULE_DB to make the database name unique,
which builds a database name based on the *first* entry of REGRESS, the
list of all the SQL tests.
- meson relies on a "name" field.
For example, for the test module "gin", the regression database is named
"regression_gin" under meson, while it is more complex for configure, as
in "contrib_regression_gin_incomplete_splits". So an AdjustUpgrade.pm-based
solution would need a set of DROP DATABASE IF EXISTS commands to cope
with each build system.
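Based only on the example names above, the two naming schemes can be sketched as follows. regress_dbname() is a hypothetical helper and ignores any other rules either build system may apply.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the regression database name for a test module.
 *   configure (USE_MODULE_DB): "contrib_regression_" + first REGRESS entry
 *   meson:                     "regression_" + the test's "name" field
 */
static void
regress_dbname(char *buf, size_t buflen, bool use_meson,
               const char *meson_name, const char *first_regress_entry)
{
    if (use_meson)
        snprintf(buf, buflen, "regression_%s", meson_name);
    else
        snprintf(buf, buflen, "contrib_regression_%s", first_regress_entry);
}
```

An AdjustUpgrade.pm-based cleanup would have to know both names for every module, which is part of why dropping the extension in the test modules themselves is simpler.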
The failure has been caused by d52c24b0f808, and the problem can happen
with upgrade dumps from v17 and v18 to HEAD. This problem is not
currently reachable in the back-branches, but it could be possible that
a future change in injection_points in stable branches invalidates this
theory, so this commit is applied down to v17 in the test modules that
matter.
Per discussion with Tom Lane and Heikki Linnakangas.
Discussion: https://postgr.es/m/2899652.1765167313@sss.pgh.pa.us
Backpatch-through: 17
Michael Paquier [Wed, 10 Dec 2025 02:56:42 +0000 (11:56 +0900)]
Fix two issues with recently-introduced nbtree test
REGRESS omitted the test nbtree_half_dead_pages, and the module was
missing a .gitignore.
Oversights in c085aab27819 for REGRESS and 1e4e5783e7d7 for the missing
.gitignore.
Discussion: https://postgr.es/m/aTipJA1Y1zVSmH3H@paquier.xyz
Michael Paquier [Tue, 9 Dec 2025 23:10:28 +0000 (08:10 +0900)]
Fix meson warning due to missing declaration of NM
The warning was showing up in the early stages of the meson build, when
the contents of Makefile.global are generated based on the configuration
of meson for PGXS.
NM is added to pgxs_empty. This declaration is only used internally for
the libpq sanity check, so there is no point in exposing it in PGXS.
Oversight in 4a8e6f43a6b5.
Reported-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/4423e01f-1e52-4f47-a6ca-05cc8081c888@eisentraut.org
Heikki Linnakangas [Tue, 9 Dec 2025 23:06:03 +0000 (01:06 +0200)]
Fix typo in comment
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CABPTF7V8CbOXGePqrad6EH3Om7DRhNiO3C0rQ-62UuT7RdU-GQ@mail.gmail.com
David Rowley [Tue, 9 Dec 2025 23:01:14 +0000 (12:01 +1300)]
Fix misleading comment in tuplesort.c
A comment in tuplesort.c was claiming that the code was defining
INITIAL_MEMTUPSIZE so that it *does not* exceed
ALLOCSET_SEPARATE_THRESHOLD, but the code actually ensures that we
purposefully *do* exceed ALLOCSET_SEPARATE_THRESHOLD for the initial
allocation of the tuples array, as per reasons detailed in the
commentary of grow_memtuples().
Also, there's not much need to repeat the mention about
ALLOCSET_SEPARATE_THRESHOLD in each location where INITIAL_MEMTUPSIZE is
used, so remove those comments.
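The arithmetic behind "purposefully exceed" can be checked with a small sketch. It assumes ALLOCSET_SEPARATE_THRESHOLD is 8192 bytes and uses a stand-in for SortTuple. The macro shape, Max(1024, threshold / sizeof(SortTuple) + 1), mirrors our understanding of tuplesort.c and is not a verbatim copy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of ALLOCSET_SEPARATE_THRESHOLD, in bytes */
#define EXAMPLE_SEPARATE_THRESHOLD 8192
#define EXAMPLE_MAX(x, y) ((x) > (y) ? (x) : (y))

/* Stand-in with roughly the shape of tuplesort's SortTuple */
typedef struct ExampleSortTuple
{
    void       *tuple;
    uintptr_t   datum1;         /* stand-in for Datum */
    bool        isnull1;
    int         srctape;
} ExampleSortTuple;

/*
 * One element more than fits under the threshold, and never fewer than
 * 1024, so the initial array allocation always exceeds the threshold.
 */
#define EXAMPLE_INITIAL_MEMTUPSIZE \
    EXAMPLE_MAX((size_t) 1024, \
                EXAMPLE_SEPARATE_THRESHOLD / sizeof(ExampleSortTuple) + 1)
```

On a typical 64-bit platform the stand-in struct is 24 bytes, so 8192 / 24 + 1 is 342. The 1024 floor wins, and the first allocation is 24576 bytes, well above the threshold.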
Author: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/tencent_6FA14F85D6B5B5291532D6789E07F4765C08%40qq.com
Michael Paquier [Tue, 9 Dec 2025 22:36:46 +0000 (07:36 +0900)]
Use palloc_object() and palloc_array() in backend code
The idea is to encourage wider use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc().
This batch of changes includes most of the trivial changes suggested by
the author for src/backend/.
A total of 334 files are updated here. Of these, 48 produce slightly
different builds, due only to line number changes: the new allocation
formulas are simpler, shaving around 100 lines of code in total.
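These macros derive both the allocation size and the result type from a single type name. As a result, a mismatch between the pointer being assigned and the size computation shows up as a compiler warning instead of going unnoticed. The following standalone sketch uses malloc() as a stand-in for palloc(), and the example_ names are hypothetical.

```c
#include <stdlib.h>

/* Stand-ins modeled on palloc_object()/palloc_array(); malloc replaces palloc */
#define example_palloc_object(type) ((type *) malloc(sizeof(type)))
#define example_palloc_array(type, count) ((type *) malloc(sizeof(type) * (count)))

typedef struct Point
{
    int         x;
    int         y;
} Point;

/* Old style: the size expression and the cast are written separately */
static Point *
make_points_old(int n)
{
    return (Point *) malloc(n * sizeof(Point));
}

/* New style: one type name drives both the size and the result type */
static Point *
make_points_new(int n)
{
    return example_palloc_array(Point, n);
}
```

Unlike palloc(), malloc() can return NULL, so real callers of these stand-ins would need to check for that.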
Similar work has been done in 0c3c5c3b06a3 and 31d3847a37be.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
Thomas Munro [Tue, 9 Dec 2025 20:01:35 +0000 (09:01 +1300)]
Fix O_CLOEXEC flag handling in Windows port.
PostgreSQL's src/port/open.c has always set bInheritHandle = TRUE
when opening files on Windows, making all file descriptors inheritable
by child processes. This meant the O_CLOEXEC flag, added to many call
sites by commit 1da569ca1f (v16), was silently ignored.
The original commit included a comment suggesting that our open()
replacement doesn't create inheritable handles, but it was a
misunderstanding of the code path. In practice, the code was creating
inheritable handles in all cases.
This hasn't caused widespread problems because most child processes
(archive_command, COPY PROGRAM, etc.) operate on file paths passed as
arguments rather than inherited file descriptors. Even if a child
wanted to use an inherited handle, it would need to learn the numeric
handle value, which isn't passed through our IPC mechanisms.
Nonetheless, the current behavior is wrong. It violates documented
O_CLOEXEC semantics, contradicts our own code comments, and makes
PostgreSQL behave differently on Windows than on Unix. It also creates
potential issues with future code or security auditing tools.
To fix, define O_CLOEXEC as _O_NOINHERIT in master, a value previously
used by O_DSYNC. The back branches use different values so that existing
flag values are preserved. In pgwin32_open_handle() we set bInheritHandle
according to whether O_CLOEXEC is specified, for the same atomic
semantics as POSIX in multi-threaded programs that create processes.
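On POSIX systems, the semantics that the Windows port now matches can be observed with fcntl(). A descriptor opened with O_CLOEXEC carries the FD_CLOEXEC flag and is closed across exec. The sketch below is POSIX-only and is not the Windows port code, which instead sets bInheritHandle from the flag. opened_with_cloexec() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Open path read-only, optionally with O_CLOEXEC, and report whether the
 * resulting descriptor is marked close-on-exec (1), not marked (0), or
 * whether an error occurred (-1).
 */
static int
opened_with_cloexec(const char *path, bool cloexec)
{
    int         flags = O_RDONLY | (cloexec ? O_CLOEXEC : 0);
    int         fd = open(path, flags);
    int         fdflags;

    if (fd < 0)
        return -1;
    fdflags = fcntl(fd, F_GETFD);
    close(fd);
    if (fdflags < 0)
        return -1;
    return (fdflags & FD_CLOEXEC) != 0;
}
```

Setting the flag at open time, rather than with a later fcntl() call, is what gives the atomicity mentioned above. No other thread can fork and exec in the window between the open and the flag change.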
Backpatch-through: 16
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com> (minor adjustments)
Discussion: https://postgr.es/m/e2b16375-7430-4053-bda3-5d2194ff1880%40gmail.com
Nathan Bossart [Tue, 9 Dec 2025 19:34:22 +0000 (13:34 -0600)]
vacuumdb: Add --dry-run.
This new option instructs vacuumdb to print, but not execute, the
VACUUM and ANALYZE commands that would've been sent to the server.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com
Nathan Bossart [Tue, 9 Dec 2025 19:34:22 +0000 (13:34 -0600)]
Add ParallelSlotSetIdle().
This commit moves the code for marking a ParallelSlot as idle into a
new static inline function. It can be used to mark a slot obtained via
ParallelSlotGetIdle() as idle again when we don't intend to actually
use it for a query.
This is preparatory work for a follow-up commit that will add a
--dry-run option to vacuumdb.
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com
Nathan Bossart [Tue, 9 Dec 2025 19:34:22 +0000 (13:34 -0600)]
vacuumdb: Move some variables to the vacuumingOptions struct.
Presently, the "echo" and "quiet" variables are carted around to
various functions, which is a bit tedious. To simplify things,
this commit moves them into the vacuumingOptions struct and removes
the related function parameters. While at it, remove some
redundant initialization code in vacuumdb's main() function.
This is preparatory work for a follow-up commit that will add a
--dry-run option to vacuumdb.
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com
Masahiko Sawada [Tue, 9 Dec 2025 19:23:45 +0000 (11:23 -0800)]
Add started_by column to pg_stat_progress_analyze view.
The new column, started_by, indicates the initiator of the
analyze ('manual' or 'autovacuum'), helping users and monitoring tools
to better understand ANALYZE behavior.
Bump catalog version.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Discussion: https://postgr.es/m/CAA5RZ0suoicwxFeK_eDkUrzF7s0BVTaE7M%2BehCpYcCk5wiECpw%40mail.gmail.com
Masahiko Sawada [Tue, 9 Dec 2025 18:51:14 +0000 (10:51 -0800)]
Add mode and started_by columns to pg_stat_progress_vacuum view.
The new columns, mode and started_by, indicate the vacuum
mode ('normal', 'aggressive', or 'failsafe') and the initiator of the
vacuum ('manual', 'autovacuum', or 'autovacuum_wraparound'),
respectively. This allows users and monitoring tools to better
understand VACUUM behavior.
Bump catalog version.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Discussion: https://postgr.es/m/CAOzEurQcOY-OBL_ouEVfEaFqe_md3vB5pXjR_m6L71Dcp1JKCQ@mail.gmail.com
Nathan Bossart [Tue, 9 Dec 2025 17:01:38 +0000 (11:01 -0600)]
doc: Fix titles of some pg_buffercache functions.
As in commit 59d6c03956, use <function> rather than <structname> in
the <title> to be consistent with how other functions in this
module are documented.
Oversights in commits dcf7e1697b and 9ccc049dfe.
Author: Noboru Saito <noborusai@gmail.com>
Discussion: https://postgr.es/m/CAAM3qn%2B7KraFkCyoJCHq6m%3DurxcoHPEPryuyYeg%3DQ0EjJxjdTA%40mail.gmail.com
Backpatch-through: 18
Tom Lane [Tue, 9 Dec 2025 16:43:25 +0000 (11:43 -0500)]
Support "j" length modifier in snprintf.c.
POSIX has for a long time defined the "j" length modifier for
printf conversions as meaning the size of intmax_t or uintmax_t.
We got away without supporting that so far, because we were not
using intmax_t anywhere. However, commit e6be84356 re-introduced
upstream's use of intmax_t and PRIdMAX into zic.c. It emerges
that on some platforms (at least FreeBSD and macOS), <inttypes.h>
defines PRIdMAX as "jd", so that snprintf.c falls over if that is
used. (We hadn't noticed yet because it would only be apparent
if bad data is fed to zic, resulting in an error report, and even
then the only visible symptom is a missing line number in the
error message.)
We could revert that decision from our copy of zic.c, but
on the whole it seems better to update snprintf.c to support
this standard modifier. There might well be extensions,
now or in future, that expect it to work.
I did this in the lazy man's way of translating "j" to either
"l" or "ll" depending on a compile-time sizeof() check, just
as was done long ago to support "z" for size_t. One could
imagine promoting intmax_t to have full support in snprintf.c,
for example converting fmtint()'s value argument and internal
arithmetic to use [u]intmax_t not [unsigned] long long. But
that'd be more work and I'm hesitant to do it anyway: if there
are any platforms out there where intmax_t is actually wider
than "long long", this would doubtless result in a noticeable
speed penalty to snprintf(). Let's not go there until we have
positive evidence that there's a reason to, and some way to
measure what size of penalty we're taking.
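The translation described above amounts to choosing between the "l" and "ll" modifiers by comparing sizeof(intmax_t) with sizeof(long). A minimal sketch of that choice follows. The helper names are hypothetical and not snprintf.c's internals, and the code assumes intmax_t is no wider than long long.

```c
#include <inttypes.h>           /* intmax_t, PRIdMAX */
#include <stdio.h>
#include <string.h>

/* The length modifier that "j" is treated as */
static const char *
intmax_modifier(void)
{
    return sizeof(intmax_t) == sizeof(long) ? "l" : "ll";
}

/* Format an intmax_t by casting it to the type the chosen modifier expects */
static int
format_intmax(char *buf, size_t buflen, intmax_t value)
{
    if (strcmp(intmax_modifier(), "l") == 0)
        return snprintf(buf, buflen, "%ld", (long) value);
    return snprintf(buf, buflen, "%lld", (long long) value);
}
```

Because the sizeof() comparison is a compile-time constant, a compiler can fold the branch away. This is what keeps the approach cheap compared with widening all of fmtint()'s arithmetic to intmax_t.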
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3210703.1765236740@sss.pgh.pa.us
Heikki Linnakangas [Tue, 9 Dec 2025 15:06:40 +0000 (17:06 +0200)]
Add wait event for the group commit delay before WAL flush
Author: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BFpmFf-hWXtrC0Q3Cr_Xo78zuP_M_VC5xgWPOYOkwqOD0T8eg@mail.gmail.com
Heikki Linnakangas [Tue, 9 Dec 2025 12:05:13 +0000 (14:05 +0200)]
Fix warning about wrong format specifier for off_t type
Per OS X buildfarm members.
Heikki Linnakangas [Tue, 9 Dec 2025 11:53:03 +0000 (13:53 +0200)]
Widen MultiXactOffset to 64 bits
This eliminates MultiXactOffset wraparound and the 2^32 limit on the
total number of multixid members. Multixids are still limited to 2^31,
but this is a nice improvement because 'members' can grow much faster
than the number of multixids. On systems where that happens, you can now
run longer before hitting hard limits or triggering anti-wraparound vacuums.
Not having to deal with MultiXactOffset wraparound also simplifies the
code and removes some gnarly corner cases.
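The effect of widening the type is easy to see with unsigned arithmetic. A 32-bit offset wraps back to a small value once the member count passes 2^32, while a 64-bit offset keeps increasing. The sketch below uses stand-in typedefs, and the real MultiXactOffset definitions may differ.

```c
#include <stdint.h>

typedef uint32_t OldMultiXactOffset;    /* stand-in for the old 32-bit type */
typedef uint64_t NewMultiXactOffset;    /* stand-in for the new 64-bit type */

/* Advance an offset by nmembers, as allocating new member slots would */
static OldMultiXactOffset
advance_old(OldMultiXactOffset off, uint32_t nmembers)
{
    return off + nmembers;      /* wraps modulo 2^32 */
}

static NewMultiXactOffset
advance_new(NewMultiXactOffset off, uint32_t nmembers)
{
    return off + nmembers;      /* no practical wraparound */
}
```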
We no longer need to perform emergency anti-wraparound freezing
because of running out of 'members' space, so the offset stop limit is
gone. But you might still not want 'members' to consume huge amounts
of disk space. For that reason, I kept the logic for lowering vacuum's
multixid freezing cutoff if a large amount of 'members' space is
used. The thresholds for that are roughly the same as the "safe" and
"danger" thresholds used before, 2 billion transactions and 4 billion
transactions. This keeps the behavior for the freeze cutoff roughly
the same as before. It might make sense to make this smarter or
configurable, now that the threshold is only needed to manage disk
usage, but that's left for the future.
Add code to pg_upgrade to convert multitransactions from the old to
the new format, rewriting the pg_multixact SLRU files. Because
pg_upgrade now rewrites the files, we can get rid of some hacks we had
put in place to deal with old bugs and upgraded clusters. Bump catalog
version for the pg_multixact/offsets format change.
Author: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3DezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com
Heikki Linnakangas [Tue, 9 Dec 2025 11:45:01 +0000 (13:45 +0200)]
Move pg_multixact SLRU page format definitions to a separate header
This makes them accessible from pg_upgrade, needed by the next commit.
I'm doing this mechanical move as a separate commit to make the next
commit's changes to these definitions more obvious.
Author: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3DezbZo_3_fnx%3DS5BfepwRftzrpJ%2B7WET4EkTU6wnjDTsnjg@mail.gmail.com
Dean Rasheed [Tue, 9 Dec 2025 10:49:16 +0000 (10:49 +0000)]
doc: Fix statement about ON CONFLICT and deferrable constraints.
The description of deferrable constraints in create_table.sgml states
that deferrable constraints cannot be used as conflict arbitrators in
an INSERT with an ON CONFLICT DO UPDATE clause, but in fact this
restriction applies to all ON CONFLICT clauses, not just those with DO
UPDATE. Fix this, and while at it, change the word "arbitrators" to
"arbiters", to match the terminology used elsewhere.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWsybvZP3ce8rGcVNx-QHuDOJZDz8y=p1SzqHwjRXyV4Q@mail.gmail.com
Backpatch-through: 14
Richard Guo [Tue, 9 Dec 2025 08:09:27 +0000 (17:09 +0900)]
Fix distinctness check for queries with grouping sets
query_is_distinct_for() is intended to determine whether a query never
returns duplicates of the specified columns. For queries using
grouping sets, if there are no grouping expressions, the query may
contain one or more empty grouping sets. The goal is to detect
whether there is exactly one empty grouping set, in which case the
query would return a single row and thus be distinct.
The previous logic in query_is_distinct_for() was incomplete, so it
could return false in cases where it could have returned true. It
failed to consider cases where DISTINCT is applied to the GROUP BY
(GROUP BY DISTINCT), in which case duplicate empty
grouping sets are removed, leaving only one. It also did not
correctly handle all possible structures of GroupingSet nodes that
represent a single empty grouping set.
To fix, add a check for the groupDistinct flag, and expand the query's
groupingSets tree into a flat list, then verify that the expanded list
contains only one element.
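A simplified sketch of the new check follows. It is restricted to trees whose leaves are all empty grouping sets. The sketch counts how many sets the tree expands to, and treats the query as distinct if exactly one set remains or if GROUP BY DISTINCT collapses the identical empty sets. The node representation is hypothetical and covers only empty sets and GROUPING SETS lists (no ROLLUP or CUBE), unlike the planner's real expansion logic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node kinds: only empty sets and GROUPING SETS lists */
typedef enum
{
    GS_EMPTY,                   /* () */
    GS_SETS                     /* GROUPING SETS (child, child, ...) */
} GsKind;

typedef struct GsNode
{
    GsKind      kind;
    int         nchildren;
    const struct GsNode *const *children;
} GsNode;

/* Number of (empty) grouping sets a node expands to */
static long
count_sets(const GsNode *node)
{
    long        total = 0;

    if (node->kind == GS_EMPTY)
        return 1;
    for (int i = 0; i < node->nchildren; i++)
        total += count_sets(node->children[i]);
    return total;
}

/*
 * Top-level items combine by cross product, as in
 * GROUP BY GROUPING SETS (...), GROUPING SETS (...).
 */
static bool
single_empty_grouping_set(const GsNode *const *items, int nitems,
                          bool group_distinct)
{
    long        total = 1;

    for (int i = 0; i < nitems; i++)
        total *= count_sets(items[i]);

    /* GROUP BY DISTINCT collapses identical empty sets into one */
    return total == 1 || (group_distinct && total > 1);
}
```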
No backpatch as this could result in plan changes.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMbWs480Z04NtP8-O55uROq2Zego309+h3hhaZhz6ztmgWLEBw@mail.gmail.com
Richard Guo [Tue, 9 Dec 2025 07:56:26 +0000 (16:56 +0900)]
Fix const-simplification for index expressions and predicate
Similar to the issue with constraint and statistics expressions fixed
in 317c117d6, index expressions and predicate can also suffer from
incorrect reduction of NullTest clauses during const-simplification,
due to unfixed varnos and the use of a NULL root. It has been
reported that this issue can cause the planner to fail to pick up a
partial index that it previously matched successfully.
Because we need to cache the const-simplified index expressions and
predicate in the relcache entry, we cannot fix the Vars before
applying eval_const_expressions. To ensure proper reduction of
NullTest clauses, this patch runs eval_const_expressions a second time
-- after the Vars have been fixed and with a valid root.
It could be argued that the additional call to eval_const_expressions
might increase planning time, but I don't think that's a concern. It
only runs when index expressions and predicate are present; it is
relatively cheap when run on small expression trees (which is
typically the case for index expressions and predicate), and it runs
on expressions that have already been const-simplified once, making
the second pass even cheaper. In return, in cases like the one
reported, it allows the planner to match and use partial indexes,
which can lead to significant execution-time improvements.
Bug: #19007
Reported-by: Bryan Fox <bryfox@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/19007-4cc6e252ed8aa54a@postgresql.org
Amit Kapila [Tue, 9 Dec 2025 07:25:20 +0000 (07:25 +0000)]
Fix LOCK_TIMEOUT handling in slotsync worker.
Previously, the slotsync worker relied on SIGINT for graceful shutdown
during promotion. However, SIGINT is also used by the LOCK_TIMEOUT handler
to cancel queries. Since the slotsync worker can lock catalog tables while
parsing libpq tuples, this overlap caused it to ignore LOCK_TIMEOUT
signals and potentially wait indefinitely on locks.
This patch replaces the slotsync worker's SIGINT handler with
StatementCancelHandler to correctly process query-cancel interrupts.
Additionally, the startup process now uses SIGUSR1 to signal the slotsync
worker to stop during promotion. The worker exits after detecting that the
shared memory flag stopSignaled is set.
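The new shutdown handshake follows a common pattern. The requester first sets a flag in shared state and then sends a signal whose only job is to wake the worker. The worker decides to exit by checking the flag, not by which signal arrived, so a query cancel can no longer be mistaken for a stop request. Below is a single-process POSIX sketch of that pattern. stop_signaled and the handler are hypothetical stand-ins for the shared-memory flag and PostgreSQL's SIGUSR1 machinery.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static volatile sig_atomic_t stop_signaled = 0;    /* stand-in for the shmem flag */
static volatile sig_atomic_t got_wakeup = 0;

/* SIGUSR1 handler: only records that the worker was woken up */
static void
wakeup_handler(int signo)
{
    (void) signo;
    got_wakeup = 1;
}

static int
install_wakeup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wakeup_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Requester side: set the flag first, then send the wakeup signal */
static void
request_stop(void)
{
    stop_signaled = 1;
    raise(SIGUSR1);
}

/* Worker side, once per loop iteration: true if the worker should exit */
static bool
worker_should_exit(void)
{
    if (got_wakeup)
    {
        got_wakeup = 0;
        return stop_signaled != 0;
    }
    return false;
}
```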
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/TY4PR01MB169078F33846E9568412D878C94A2A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Peter Eisentraut [Tue, 9 Dec 2025 05:58:39 +0000 (06:58 +0100)]
Remove useless casts in format arguments
There were a number of useless casts in format arguments, either
where the input to the cast was already in the right type, or
seemingly uselessly casting between types instead of just using the
right format placeholder to begin with.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org
Peter Eisentraut [Tue, 9 Dec 2025 05:58:39 +0000 (06:58 +0100)]
Clean up int64-related format strings
Remove some gratuitous uses of INT64_FORMAT. Make use of
PRIu64/PRId64 where appropriate, and remove unnecessary casts.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org
Peter Eisentraut [Tue, 9 Dec 2025 05:58:39 +0000 (06:58 +0100)]
Remove unnecessary casts in printf format arguments (%zu/%zd)
Many of these are probably left over from before use of %zu/%zd was
portable.
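The resulting style can be sketched with the standard C99 facilities. PRIu64 and PRId64 from <inttypes.h> handle fixed-width 64-bit integers, and %zu handles size_t, with no casts in the argument list. INT64_FORMAT is a PostgreSQL macro and is not used here. format_sizes() is a hypothetical helper.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format 64-bit and size_t values with matching specifiers, no casts */
static int
format_sizes(char *buf, size_t buflen, uint64_t nbytes, int64_t delta,
             size_t nitems)
{
    return snprintf(buf, buflen,
                    "%" PRIu64 " bytes, delta %" PRId64 ", %zu items",
                    nbytes, delta, nitems);
}
```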
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org
Michael Paquier [Tue, 9 Dec 2025 05:53:17 +0000 (14:53 +0900)]
Use palloc_object() and palloc_array() in more areas of the tree
The idea is to encourage wider use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc().
The following paths are included in this batch, covering all the areas
where the author proposed the most trivial changes, except src/backend
(by far the largest batch):
src/bin/
src/common/
src/fe_utils/
src/include/
src/pl/
src/test/
src/tutorial/
Similar work has been done in 31d3847a37be.
The code compiles the same before and after this commit, with the
following exceptions due to changes in line numbers because some of the
new allocation formulas are shorter:
blkreftable.c
pgfnames.c
pl_exec.c
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
Andres Freund [Tue, 9 Dec 2025 04:03:54 +0000 (23:03 -0500)]
Improve documentation for pg_atomic_unlocked_write_u32()
After my recent commit 7902a47c20b, Nathan noticed that
pg_atomic_unlocked_write_u64() was not accurately described by the comments
for the 32bit version. Turns out the 32bit version has suffered from
copy-and-paste-itis since its introduction. Fix.
Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/aTGt7q4Jvn97uGAx@nathan
David Rowley [Tue, 9 Dec 2025 01:41:30 +0000 (14:41 +1300)]
Doc: fix typo in hash index documentation
Plus a similar fix to the README.
Backpatch as far back as the sgml issue exists. The README issue does
exist in v14, but that seems unlikely to harm anyone.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ed3db7ea-55b4-4809-86af-81ad3bb2c7d3@gmail.com
Backpatch-through: 15
Michael Paquier [Tue, 9 Dec 2025 01:39:08 +0000 (10:39 +0900)]
libpq: Refactor logic checking for exit() in shared library builds
This commit refactors the sanity check done by libpq to ensure that
there is no exit() reference in the build, moving the check from a
standalone Makefile rule to a perl script.
Platform-specific checks are now part of the script, avoiding most of
the duplication created by the introduction of this check for meson, but
not all of them:
- Solaris and Windows are skipped in the script.
- The whitelist of symbols is in the script.
- nm availability: its path is given as an option to the script, and its
execution is checked in the script.
- Check is disabled if coverage reports are enabled. This part is not
pushed down to the script.
- Check is disabled for static builds of libpq. This part is filtered
out in each build script.
A trick is required for the stamp file, in the shape of an optional
argument that can be given to the script. Meson expects the stamp file as
an output, so it passes this argument and the script generates the stamp file.
Meson is able to handle the removal of the stamp file internally when
libpq needs to be rebuilt and the check done again.
This refactoring piece has come up while discussing the addition of more
items in the symbols considered as acceptable.
This sanity check has never been run by meson since its introduction in
dc227eb82ea8, so it is possible that this fails in some of the buildfarm
members. At least the CI is happy with it, but let's see how it goes.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: VASUKI M <vasukim1992002@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19095-6d8256d0c37d4be2@postgresql.org
Tom Lane [Tue, 9 Dec 2025 00:06:36 +0000 (19:06 -0500)]
Fix minor portability issue in pg_resetwal.c.
The argument of isspace() (like other <ctype.h> functions)
must be cast to unsigned char to ensure portable results.
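As a minimal illustration, the cast looks like this. The sketch assumes a hypothetical helper named skip_leading_space(), not code from pg_resetwal.c. Where plain char is signed, passing a byte of 0x80 or above straight to isspace() is undefined behavior.

```c
#include <ctype.h>

/*
 * Hypothetical helper: skip leading whitespace.
 * The <ctype.h> functions take an int that must be representable as an
 * unsigned char or equal EOF.  Where plain char is signed, a byte >= 0x80
 * is negative, so cast it to unsigned char before the call.
 */
static const char *
skip_leading_space(const char *s)
{
    while (*s != '\0' && isspace((unsigned char) *s))
        s++;
    return s;
}
```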
Per NetBSD buildfarm members. Oversight in
636c1914b.
Peter Geoghegan [Mon, 8 Dec 2025 18:48:09 +0000 (13:48 -0500)]
Avoid pointer chasing in _bt_readpage inner loop.
Make _bt_readpage pass down the current scan direction to various
utility functions within its pstate variable. Also have _bt_readpage
work off of a local copy of scan->ignore_killed_tuples within its
per-tuple loop (rather than using scan->ignore_killed_tuples directly).
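The following is a rough sketch of the "local copy" idea. It uses hypothetical ScanSketch and ItemSketch structs rather than the real nbtree types. Whether a compiler could hoist the load by itself depends on what else the loop body does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real nbtree structs. */
typedef struct ScanSketch
{
    bool        ignore_killed_tuples;
} ScanSketch;

typedef struct ItemSketch
{
    bool        killed;
} ItemSketch;

/*
 * Count the items a scan would return.  The flag is read once into a
 * local before the loop.  In a real loop body that calls other functions
 * or stores through pointers, the compiler often cannot prove that
 * scan->ignore_killed_tuples is unchanged, so reading it through the
 * pointer would force a reload on every iteration.
 */
static size_t
count_returnable(const ScanSketch *scan, const ItemSketch *items,
                 size_t nitems)
{
    const bool  ignore_killed = scan->ignore_killed_tuples;
    size_t      nreturned = 0;

    for (size_t i = 0; i < nitems; i++)
    {
        if (ignore_killed && items[i].killed)
            continue;           /* skip items marked dead */
        nreturned++;
    }
    return nreturned;
}
```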
Testing has shown that this significantly benefits large range scans,
which are naturally able to take full advantage of the pstate.startikey
optimization added by commit
8a510275. Running a pgbench script with a
"SELECT abalance FROM pgbench_accounts WHERE aid BETWEEN ..." query
shows an increase in transaction throughput of over 5%. There also
appears to be a small performance benefit when running pgbench's
built-in select-only script.
Follow-up to commit
65d6acbc.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzmwMwcwKFgaf+mYPwiz3iL4AqpXnwtW_O0vqpWPXRom9Q@mail.gmail.com
Álvaro Herrera [Mon, 8 Dec 2025 18:23:38 +0000 (19:23 +0100)]
Unify some more messages
No backpatch here because of message wording changes.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/202512081537.ahw5gwoencou@alvherre.pgsql
Peter Geoghegan [Mon, 8 Dec 2025 18:15:00 +0000 (13:15 -0500)]
Relocate _bt_readpage and related functions.
Quite a bit of code within nbtutils.c is only called by _bt_readpage.
Move _bt_readpage and all of the nbtutils.c functions it depends on into
a new .c file, nbtreadpage.c. Also reorder some of the functions within
the new file for clarity.
This commit has no functional impact. It is strictly mechanical.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzmwMwcwKFgaf+mYPwiz3iL4AqpXnwtW_O0vqpWPXRom9Q@mail.gmail.com
Álvaro Herrera [Mon, 8 Dec 2025 15:30:52 +0000 (16:30 +0100)]
Unify error messages
No visible changes, just refactor how messages are constructed.
Heikki Linnakangas [Mon, 8 Dec 2025 14:54:54 +0000 (16:54 +0200)]
pg_resetwal: Use separate flags for whether an option is given
Currently, we use special values that are otherwise invalid for each
option to indicate "option was not given". Replace that with separate
boolean variables for each option. It seems more clear to be explicit.
We were already doing that for the -m option, because there were no
invalid values for nextMulti that we could use (since commit
94939c5f3a).
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/81adf5f3-36ad-4bcd-9ba5-1b95c7b7a807@iki.fi
Heikki Linnakangas [Mon, 8 Dec 2025 14:54:50 +0000 (16:54 +0200)]
pg_resetwal: Reject negative and out of range arguments
The strtoul() function that we used to parse many of the options
accepts negative values, and silently wraps them to the equivalent
unsigned values. For example, -1 becomes 0xFFFFFFFF, on platforms
where unsigned long is 32 bits wide. Also, on platforms where
"unsigned long" is 64 bits wide, we silently cast values larger than
UINT32_MAX to the equivalent 32-bit value. Both of those behaviors
seem undesirable, so tighten up the parsing to reject them.
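Here is a minimal sketch of stricter parsing. It assumes a hypothetical parse_uint32() helper rather than pg_resetwal's actual code. It rejects a leading minus sign before calling strtoul() and range-checks the result against UINT32_MAX.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse str as an unsigned 32-bit value.
 * strtoul() alone accepts "-1" and negates it in unsigned arithmetic,
 * and where unsigned long is 64 bits it returns values above UINT32_MAX,
 * so both cases are rejected here.
 */
static bool
parse_uint32(const char *str, uint32_t *result)
{
    char       *end;
    unsigned long val;

    /* strtoul() skips leading whitespace, so look past it for a sign */
    while (isspace((unsigned char) *str))
        str++;
    if (*str == '-')
        return false;

    errno = 0;
    val = strtoul(str, &end, 0);    /* base 0: decimal, 0x hex, 0 octal */
    if (end == str || *end != '\0' || errno == ERANGE)
        return false;           /* empty, trailing junk, or overflow */
    if (val > UINT32_MAX)
        return false;           /* only reachable with a 64-bit long */

    *result = (uint32_t) val;
    return true;
}
```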
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/81adf5f3-36ad-4bcd-9ba5-1b95c7b7a807@iki.fi
Peter Eisentraut [Mon, 8 Dec 2025 14:53:52 +0000 (15:53 +0100)]
Make ecpg parse.pl more robust with braces
When parse.pl processes braces, it does not take into account that
braces could also be their own token if single quoted ('{', '}').
This is not currently used but a future patch wants to make use of it.
This fixes that by using lookaround assertions to detect the quotes.
To make sure all Perl versions in play support this and to avoid
surprises later on, let's give this a spin on the buildfarm now. It
can exist independently of future work.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
Peter Eisentraut [Mon, 8 Dec 2025 12:52:42 +0000 (13:52 +0100)]
Use PGAlignedXLogBlock for some code simplification
The code in BootStrapXLOG() and in pg_test_fsync.c tried to align WAL
buffers in complicated ways. Also, they still used XLOG_BLCKSZ for
the alignment, even though that should now be PG_IO_ALIGN_SIZE. This
can now be simplified and made more consistent by using
PGAlignedXLogBlock, either directly (in BootStrapXLOG()) or via
alignas in pg_test_fsync.c.
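Below is a small sketch of the alignas approach. The constants SKETCH_XLOG_BLCKSZ and SKETCH_IO_ALIGN_SIZE are assumed stand-ins for XLOG_BLCKSZ and PG_IO_ALIGN_SIZE, and the type name is a hypothetical stand-in for PGAlignedXLogBlock.

```c
#include <stdalign.h>
#include <stdint.h>

/*
 * Assumed values for illustration: 8192 and 4096 match PostgreSQL's
 * usual defaults, but XLOG_BLCKSZ is a build-time option.
 */
#define SKETCH_XLOG_BLCKSZ       8192
#define SKETCH_IO_ALIGN_SIZE     4096

/* A WAL-block-sized buffer whose start is aligned for direct I/O. */
typedef struct SketchAlignedXLogBlock
{
    alignas(SKETCH_IO_ALIGN_SIZE) char data[SKETCH_XLOG_BLCKSZ];
} SketchAlignedXLogBlock;

/* No manual pointer rounding or over-allocation is needed. */
static SketchAlignedXLogBlock wal_buffer;
```

The compiler handles the alignment, so the code no longer needs to over-allocate and round a pointer up by hand.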
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f462a175-b608-44a1-b428-bdf351e914f4%40eisentraut.org
Michael Paquier [Mon, 8 Dec 2025 06:23:09 +0000 (15:23 +0900)]
test_custom_stats: Test module for custom cumulative statistics
This test module replaces the custom pgstats code that existed in the
test module injection_points prior to d52c24b0f808. It uses a more
flexible structure than its ancestor:
- Two libraries are built, one for fixed-sized stats and one for
variable-sized stats.
- No GUCs required. The stats are enabled only if one or both libraries
are loaded with shared_preload_libraries.
- Same kind IDs reserved: 25 (variable-sized) and 26 (fixed-sized)
The goal of this redesign is to make it easier to extend the code
coverage provided by this module for other changes that are currently
under discussion, for which injection_points was not suited.
Injection points are also now widely used in the tree, so further
extending the test coverage for custom pgstats in the test module
injection_points would be a riskier long-term move.
The new code is mostly a copy of what existed previously in the test
module injection_points, with the same callbacks defined for fixed-sized
and variable-sized stats, but a simpler overall structure in terms of
the stats counters updated.
The test coverage should remain the same as previously: one TAP test is
used to check data reports, crash recovery and clean restart scenarios.
Tests are added for the manual reset of fixed-sized stats, something
not tested until now.
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAA5RZ0sJgO6GAwgFxmzg9MVP=rM7Us8KKcWpuqxe-f5qxmpE0g@mail.gmail.com
Amit Kapila [Mon, 8 Dec 2025 05:19:28 +0000 (05:19 +0000)]
Prevent invalidation of newly created replication slots.
A race condition could cause a newly created replication slot to become
invalidated between WAL reservation and a checkpoint.
Previously, if the required WAL was removed, we retried the reservation
process. However, the slot could still be invalidated before the retry if
the WAL was not yet removed but the checkpoint advanced the redo pointer
beyond the slot's intended restart LSN and computed the minimum LSN that
needs to be preserved for the slots.
The fix is to acquire an exclusive lock on ReplicationSlotAllocationLock
during WAL reservation to serialize WAL reservation and checkpoint's
minimum restart_lsn computation. This ensures that, if WAL reservation
occurs first, the checkpoint waits until restart_lsn is updated before
removing WAL. If the checkpoint runs first, subsequent WAL reservations
pick a position at or after the latest checkpoint's redo pointer.
We can't use the same fix for branch 17 and prior because commit
2090edc6f3 changed checkpoints to compute the minimum restart_lsn among
slots at the beginning of the checkpoint (or restart point). The fix for
17 and prior branches is under discussion and will be committed separately.
Reported-by: suyu.cmj <mengjuan.cmj@alibaba-inc.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/5e045179-236f-4f8f-84f1-0f2566ba784c.mengjuan.cmj@alibaba-inc.com
Michael Paquier [Mon, 8 Dec 2025 03:45:20 +0000 (12:45 +0900)]
injection_points: Remove portions related to custom pgstats
The test module injection_points has been used as a landing spot to
provide coverage for the custom pgstats APIs, for both fixed-sized and
variable-sized stats kinds. Some recent work related to pgstats is
proving that this structure makes the implementation of new tests
harder.
This commit removes the code related to pgstats from injection_points,
and an equivalent will be reintroduced as a separate test module in a
follow-up commit. This removal is done in its own commit for clarity.
Using injection_points for this test coverage was perhaps not the best
way to design things, but this was good enough while working on the
first flavor of the custom pgstats APIs. A new test module will make it
easier to introduce new tests, and we will not need to worry about the
impact that new changes related to custom pgstats could have on the
internals of injection_points.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0sJgO6GAwgFxmzg9MVP=rM7Us8KKcWpuqxe-f5qxmpE0g@mail.gmail.com
Michael Paquier [Mon, 8 Dec 2025 01:23:48 +0000 (10:23 +0900)]
Improve error messages of input functions for pg_dependencies and pg_ndistinct
The error details updated in this commit can be reached in the
regression tests. They did not follow the project style, and they
should be written as full sentences.
Some of the errors are switched to use an elog(), for cases that involve
paths that cannot be reached based on the previous state of the parser
processing the input data (array start, object end, etc.). The error
messages for these cases now use a more consistent style across the
board, with the state of the parser reported for debugging.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/1353179.1764901790@sss.pgh.pa.us
Tom Lane [Sun, 7 Dec 2025 19:32:36 +0000 (14:32 -0500)]
ecpg: refactor to eliminate cast-away-const in find_variable().
find_variable() and its subroutines transiently scribble on the
passed-in "name" string, even though we've declared that "const".
The string is in fact temporary, so this is not very harmful,
but it's confusing and will produce compiler warnings with
late-model gcc. Rearrange the code so that instead of modifying
the given string, we make temporary copies of the parts that we
need separated out. (I used loc_alloc so that the copies are
short-lived and don't need to be freed explicitly.)
This code is poorly structured and confusing, to the point where
my first attempt to fix it was wrong. It is also under-tested,
allowing the broken v1 patch to nonetheless pass regression.
I'll restrain myself from rewriting it completely, and just add
some comments and more test cases.
We will probably want to back-patch this once gcc 15.2 becomes
more widespread, but for now just put it in master.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1324889.1764886170@sss.pgh.pa.us
Tom Lane [Sun, 7 Dec 2025 16:54:33 +0000 (11:54 -0500)]
Micro-optimize datatype conversions in datum_to_jsonb_internal.
The general case for converting to a JSONB numeric value is to run the
source datatype's output function and then numeric_in, but we can do
substantially better than that for integer and numeric source values.
This patch improves the speed of jsonb_agg by 30% for integer input,
and nearly 2X for numeric input.
Sadly, the obvious idea of using float4_numeric and float8_numeric
to speed up those cases doesn't work: they are actually slower than
the generic coerce-via-I/O method, and not by a small amount.
They might round off differently than this code has historically done,
too. Leave that alone pending possible changes in those functions.
We can also do better than the existing code for text/varchar/bpchar
source data; this optimization is similar to one that already exists
in the json_agg() code. That saves 20% or so for such inputs.
Also make a couple of other minor improvements, such as not giving
JSONTYPE_CAST its own special case outside the switch when it could
perfectly well be handled inside, and not using dubious string hacking
to detect infinity and NaN results.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
Tom Lane [Sun, 7 Dec 2025 16:52:22 +0000 (11:52 -0500)]
Remove fundamentally-redundant processing in jsonb_agg() et al.
The various variants of jsonb_agg() operate as follows,
for each aggregate input value:
1. Build a JsonbValue tree representation of the input value.
2. Flatten the JsonbValue tree into a Jsonb in on-disk format.
3. Iterate through the Jsonb, building a JsonbValue that is part
of the aggregate's state stored in aggcontext, but is otherwise
identical to what phase 1 built.
This is very slightly less silly than it sounds, because phase 1
involves calling non-JSONB code such as datatype output functions,
which are likely to leak memory, and we don't want to leak into the
aggcontext. Nonetheless, phases 2 and 3 are accomplishing exactly
nothing that is useful if we can make phase 1 put the JsonbValue
tree where we need it. We could probably do that with a bunch of
MemoryContextSwitchTo's, but what seems more robust is to give
pushJsonbValue the responsibility of building the JsonbValue tree
in a specified non-current memory context. The previous patch
created the infrastructure for that, and this patch simply makes
the aggregate functions use it and then rips out phases 2 and 3.
For me, this makes jsonb_agg() with a text column as input run
about 2X faster than before. It's not yet on par with json_agg(),
but this removes a whole lot of the difference.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
Tom Lane [Sun, 7 Dec 2025 16:46:49 +0000 (11:46 -0500)]
Revise APIs for pushJsonbValue() and associated routines.
Instead of passing "JsonbParseState **" to pushJsonbValue(),
pass a pointer to a JsonbInState, which will contain the
parseState stack pointer as well as other useful fields.
Also, instead of returning a JsonbValue pointer that is often
meaningless/ignored, return the top-level JsonbValue pointer
in the "result" field of the JsonbInState.
This involves a lot of (mostly mechanical) edits, but I think
the results are notationally cleaner and easier to understand.
Certainly the business with sometimes capturing the result of
pushJsonbValue() and sometimes not was bug-prone and incapable of
mechanical verification. In the new arrangement, JsonbInState.result
remains null until we've completed a valid sequence of pushes, so
that an incorrect sequence will result in a null-pointer dereference,
not mistaken use of a partial result.
However, this isn't simply an exercise in prettier notation.
The real reason for doing it is to provide a mechanism whereby
pushJsonbValue() can be told to construct the JsonbValue tree
in a context that is not CurrentMemoryContext. That happens
when a non-null "outcontext" is specified in the JsonbInState.
No callers exercise that option in this patch, but the next
patch in the series will make use of it.
I tried to improve the comments in this area too.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
Tom Lane [Sun, 7 Dec 2025 16:33:35 +0000 (11:33 -0500)]
Add a macro for the declared typlen of type timetz.
pg_type.typlen says 12 for the size of timetz, but sizeof(TimeTzADT)
will be 16 on most platforms due to alignment padding. Using the
sizeof number is no problem for usages such as palloc'ing a result
datum, but in usages such as datumCopy we really ought to match
what pg_type says. Add a macro TIMETZ_TYPLEN so that we have a
symbolic way to write that rather than hard-coding "12".
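This sketch shows why the declared length and sizeof() differ. It uses a hypothetical TimeTzSketch struct with the same two members as TimeTzADT (an int64 time and an int32 zone) and an assumed SKETCH_TIMETZ_TYPLEN macro in place of TIMETZ_TYPLEN.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in with the same members as PostgreSQL's TimeTzADT. */
typedef struct TimeTzSketch
{
    int64_t     time;           /* time of day, in microseconds */
    int32_t     zone;           /* time zone offset, in seconds */
} TimeTzSketch;

/* The declared length: 8 + 4 bytes, without trailing padding. */
#define SKETCH_TIMETZ_TYPLEN 12

/*
 * Copy a value by its declared length, as a datumCopy-style routine
 * should.  sizeof(TimeTzSketch) is commonly 16 because int64_t is usually
 * 8-byte aligned, and a stored source value may be only 12 bytes long.
 */
static void
copy_timetz(void *dst, const void *src)
{
    memcpy(dst, src, SKETCH_TIMETZ_TYPLEN);
}
```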
I cannot find any place where we've needed this so far, but an
upcoming patch requires it.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2329959.1765047648@sss.pgh.pa.us
Tom Lane [Sat, 6 Dec 2025 23:31:26 +0000 (18:31 -0500)]
Handle constant inputs to corr() and related aggregates more precisely.
The SQL standard says that corr() and friends should return NULL in
the mathematically-undefined case where all the inputs in one of
the columns have the same value. We were checking that by seeing
if the sums Sxx and Syy were zero, but that approach is very
vulnerable to roundoff error: if a sum is close to zero but not
exactly that, we'd come out with a pretty silly non-NULL result.
Instead, directly track whether the inputs are all equal by
remembering the common value in each column. Once we detect
that a new input is different from before, represent that by
storing NaN for the common value. (An objection to this scheme
is that if the inputs are all NaN, we will consider that they
were not all equal. But under IEEE float arithmetic rules,
one NaN is never equal to another, so this behavior is arguably
correct. Moreover it matches what we did before in such cases.)
Then, leave the sums at their exact value of zero for as long
as we haven't detected different input values.
This solution requires the aggregate transition state to contain
8 float values not 6, which is not problematic, and it seems to add
less than 1% to the aggregates' runtime, which seems acceptable.
While we're here, improve corr()'s final function to cope with
overflow/underflow in the final calculation, and to clamp its
result to [-1, 1] in case of roundoff error.
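Here is a minimal sketch of the common-value tracking, under several assumptions. The names are hypothetical. The sums use an incremental (Youngs-Cramer style) update. The final function is a regr_slope()-style one (Sxy / Sxx, NULL when every X is equal) rather than corr(), which keeps the sketch short. It is not the committed code. Until a column has held two different values, its sums stay exactly zero instead of accumulating roundoff.

```c
#include <math.h>
#include <stdbool.h>

/*
 * Hypothetical transition state: row count, running sums, and the common
 * value seen so far in each column (NaN once the column has varied).
 */
typedef struct RegrState
{
    double      N, Sx, Sy, Sxx, Syy, Sxy;
    double      commonX, commonY;
} RegrState;

static void
regr_init(RegrState *st)
{
    st->N = st->Sx = st->Sy = st->Sxx = st->Syy = st->Sxy = 0.0;
    st->commonX = st->commonY = NAN;
}

static void
regr_accum(RegrState *st, double x, double y)
{
    st->N += 1.0;
    st->Sx += x;
    st->Sy += y;

    if (st->N == 1.0)
    {
        /* first row: it defines the common value of each column */
        st->commonX = x;
        st->commonY = y;
        return;
    }

    /* a differing value (or any NaN, since NaN != NaN) ends "constant" */
    if (x != st->commonX)
        st->commonX = NAN;
    if (y != st->commonY)
        st->commonY = NAN;

    {
        double      tmpX = x * st->N - st->Sx;
        double      tmpY = y * st->N - st->Sy;
        double      scale = 1.0 / (st->N * (st->N - 1.0));
        bool        variedX = isnan(st->commonX);
        bool        variedY = isnan(st->commonY);

        /*
         * While a column has been constant, its true increments are zero;
         * skipping them keeps the sums exactly zero.
         */
        if (variedX)
            st->Sxx += tmpX * tmpX * scale;
        if (variedY)
            st->Syy += tmpY * tmpY * scale;
        if (variedX && variedY)
            st->Sxy += tmpX * tmpY * scale;
    }
}

/* regr_slope-style final function: returns false to mean SQL NULL. */
static bool
regr_slope_final(const RegrState *st, double *slope)
{
    if (st->N < 1.0 || !isnan(st->commonX))
        return false;           /* no rows, or every X was equal */
    *slope = st->Sxy / st->Sxx;
    return true;
}
```

With three rows whose X is 0.1 each time, Sxx stays exactly 0.0 and the result is NULL. A sum test alone could be fooled by roundoff in that case.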
Although this is arguably a bug fix, it requires a catversion bump
due to the change in aggregates' initial states, so it can't be
back-patched.
Patch written by me, but many of the ideas are due to Dean Rasheed,
who also did a good deal of testing.
Bug: #19340
Reported-by: Oleg Ivanov <o15611@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/19340-6fb9f6637f562092@postgresql.org
Tom Lane [Sat, 6 Dec 2025 18:34:48 +0000 (13:34 -0500)]
Doc: include JSON in the list of SQL-standard types.
Oversight I guess, it's been in the standard for a while.
Reported-by: Bob Kline <bkline@rksystems.com>
Discussion: https://postgr.es/m/CAGjKmVoP4qVeJgkaBtQ6L46+OLARzmym53uQGhp5COw4wp65yQ@mail.gmail.com
Michael Paquier [Sat, 6 Dec 2025 05:41:29 +0000 (14:41 +0900)]
Improve error reporting of recovery test 027_stream_regress
Previously, the 027_stream_regress test reported the full contents of
regression.diffs upon a test failure, when the standby and the primary
were still alive. If a test fails quite badly, the amount of
information reported can be really high, bloating the reports in the
buildfarm, the CI, or even local runs.
In most cases, we have noticed that having all this information is not
necessary when attempting to identify the source of a problem in this
test. This commit changes the situation by including the head and tail
of regression.diffs in the reports generated on failure rather than its
full contents, building upon
b93f4e2f98b3 to optionally control the size
of the reports with the new environment variable
PG_TEST_FILE_READ_LINES.
This will perhaps require some more tuning, but the hope is to reduce
some of the buildfarm report bloat while making the information good
enough to deduce what is happening when something is going wrong, be it
in the buildfarm or some tests run in the CI, at least.
Suggested-by: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE=bA@mail.gmail.com
Michael Paquier [Sat, 6 Dec 2025 05:27:53 +0000 (14:27 +0900)]
Add PostgreSQL::Test::Cluster::read_head_tail() helper to PostgreSQL/Utils.pm
This function reads the lines from a file and filters its contents to
report its head and tail contents. The amount of contents to read from
a file can be tuned by the environment variable PG_TEST_FILE_READ_LINES,
that can be used to override the default of 50 lines. If the file has
fewer than twice PG_TEST_FILE_READ_LINES lines, the whole file is
returned.
This will be used in a follow-up commit to limit the amount of
information reported by some of the TAP tests on failure, where we have
noticed that the contents reported by the buildfarm can be heavily
bloated in some cases, with the head and tail contents of a report being
able to provide enough information to be useful for debugging.
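The head/tail selection rule can be sketched as follows. The real helper is written in Perl. This C version uses a hypothetical head_tail_indexes() function and only illustrates which line indexes get reported.

```c
#include <stddef.h>

/*
 * Hypothetical helper: for a file of nlines lines and a window of k
 * lines, write the 0-based indexes of the lines to report into out
 * (capacity >= min(nlines, 2 * k)) and return how many were written.
 * Files shorter than 2 * k lines are reported whole.
 */
static size_t
head_tail_indexes(size_t nlines, size_t k, size_t *out)
{
    size_t      n = 0;

    if (nlines < 2 * k)
    {
        for (size_t i = 0; i < nlines; i++)
            out[n++] = i;
        return n;
    }
    for (size_t i = 0; i < k; i++)
        out[n++] = i;           /* head */
    for (size_t i = nlines - k; i < nlines; i++)
        out[n++] = i;           /* tail */
    return n;
}
```

For a 120-line file with the default window of 50, this reports lines 0 to 49 and 70 to 119.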
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE=bA@mail.gmail.com
Tom Lane [Sat, 6 Dec 2025 01:10:33 +0000 (20:10 -0500)]
Fix text substring search for non-deterministic collations.
Due to an off-by-one error, the code failed to find matches at the
end of the haystack. Fix by rewriting the loop.
While at it, fix a comment that claimed that the function could find
a zero-length match. Such a match could send a caller into an endless
loop. However, zero-length matches only make sense with an empty
search string, and that case is explicitly excluded by all callers.
To make sure it stays that way, add an Assert and a comment.
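Below is a simplified sketch of the corrected loop bound. It assumes a hypothetical find_substring() and uses memcmp() in place of the collation-aware comparison. Real non-deterministic collations can match strings of different byte lengths, and this fixed-length version does not model that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: return the first position in hay[0..hlen) where
 * needle[0..nlen) matches, or -1.  The bound uses "<=" so that a match
 * ending exactly at the end of the haystack is still tested; "<" here
 * would be the off-by-one that misses it.
 */
static long
find_substring(const char *hay, size_t hlen, const char *needle, size_t nlen)
{
    /* a zero-length needle would "match" everywhere; callers exclude it */
    assert(nlen > 0);

    for (size_t pos = 0; pos + nlen <= hlen; pos++)
    {
        if (memcmp(hay + pos, needle, nlen) == 0)
            return (long) pos;
    }
    return -1;
}
```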
Bug: #19341
Reported-by: Adam Warland <adam.warland@infor.com>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19341-1d9a22915edfec58@postgresql.org
Backpatch-through: 18
Heikki Linnakangas [Fri, 5 Dec 2025 21:39:01 +0000 (23:39 +0200)]
Fix test to work with non-8kB block sizes
Author: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3Dezbtm%2BLOzEMyLX7rzGcAv3ez3F6nNpSJjvZeMzed0Oe6Pw%40mail.gmail.com
Nathan Bossart [Fri, 5 Dec 2025 20:21:12 +0000 (14:21 -0600)]
Add commit 86b276a4a9 to .git-blame-ignore-revs.
Robert Haas [Fri, 5 Dec 2025 16:05:12 +0000 (11:05 -0500)]
Don't reset the pathlist of partitioned joinrels.
apply_scanjoin_target_to_paths wants to avoid useless work and
platform-specific dependencies by throwing away the path list created
prior to applying the final scan/join target and constructing a whole
new one using the final scan/join target. However, this is only valid
when we'll consider all the same strategies after the pathlist reset
as before.
After resetting the path list, we reconsider Append and MergeAppend
paths with the modified target list; therefore, it's only valid for a
partitioned relation. However, what the previous coding missed is that
it cannot be a partitioned join relation, because that also has paths
that are not Append or MergeAppend paths and will not be reconsidered.
Thus, before this patch, we'd sometimes choose a partitionwise strategy
with a higher total cost than the cheapest non-partitionwise strategy,
which is not good.
We had a surprising number of test cases that were relying on this
bug to work as they did. A big part of the reason for this is that row
counts in regression test cases tend to be low, which brings the cost
of partitionwise and non-partitionwise strategies very close together,
especially for merge joins, where the real and perceived advantages of
a partitionwise approach are minimal. In addition, one test case
included a row-count-inflating join. In such cases, a partitionwise
join can easily be a loser on cost, because the total number of tuples
passing through an Append node is much higher than it is with a
non-partitionwise strategy. That test case is adjusted by adding
additional join clauses to avoid the row count inflation.
Although the failure of the planner to choose the lowest-cost path is a
bug, we generally do not back-patch fixes of this type, because planning
is not an exact science and there is always a possibility that some user
will end up with a plan that has a lower estimated cost but actually
runs more slowly. Hence, no backpatch here, either.
The code change here is exactly what was originally proposed by
Ashutosh, but the changes to the comments and test cases have been
very heavily rewritten by me, helped along by some very useful advice
from Richard Guo.
Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Author: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Arne Roland <arne.roland@malkut.net>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: http://postgr.es/m/CAExHW5toze58+jL-454J3ty11sqJyU13Sz5rJPQZDmASwZgWiA@mail.gmail.com
Tom Lane [Fri, 5 Dec 2025 16:17:14 +0000 (11:17 -0500)]
Fix some cases of indirectly casting away const.
Newest versions of gcc are able to detect cases where code implicitly
casts away const by assigning the result of strchr() or a similar
function applied to a "const char *" value to a target variable
that's just "char *". This of course creates a hazard of not getting
a compiler warning about scribbling on a string one was not supposed
to, so fixing up such cases is good.
This patch fixes a dozen or so places where we were doing that.
Most are trivial additions of "const" to the target variable,
since no actually-hazardous change was occurring. There is one
place in ecpg.trailer where we were indeed violating the intention
of not modifying a string passed in as "const char *". I believe
that's harmless, not a live bug, but let's fix it by copying the
string before modifying it.
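Here is a minimal sketch of the const-safe pattern, using a hypothetical copy_prefix() helper. It keeps strchr()'s result in a const pointer and modifies a private copy rather than the caller's string. The classic strchr() signature takes a const char * but returns a plain char *, which is how const gets dropped silently.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd copy of the part of "name"
 * before the first '.', or NULL on allocation failure.  The input is
 * never written to; only the private copy is modified.
 */
static char *
copy_prefix(const char *name)
{
    const char *dot = strchr(name, '.');    /* keep the result const */
    size_t      len = dot ? (size_t) (dot - name) : strlen(name);
    char       *copy = malloc(len + 1);

    if (copy == NULL)
        return NULL;
    memcpy(copy, name, len);
    copy[len] = '\0';           /* terminate the copy, not the input */
    return copy;
}
```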
There is a remaining trouble spot in ecpg/preproc/variable.c,
which requires more complex surgery. I've left that out of this
commit because I want to study that code a bit more first.
We probably will want to back-patch this once compilers that detect
this pattern get into wider circulation, but for now I'm just
going to apply it to master to see what the buildfarm says.
Thanks to Bertrand Drouvot for finding a couple more spots than
I had.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/1324889.1764886170@sss.pgh.pa.us
Álvaro Herrera [Fri, 5 Dec 2025 15:16:27 +0000 (16:16 +0100)]
Stabilize tests some more
Tests added by commits 90eae926abbb, 2bc7e886fc1b, and bc32a12e0db2
have occasionally failed, depending on timing. Add some dependency
markers to the spec to try and remove the instability.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Discussion: https://postgr.es/m/202512041739.sgg3tb2yobe2@alvherre.pgsql
Heikki Linnakangas [Fri, 5 Dec 2025 09:32:38 +0000 (11:32 +0200)]
Fix setting next multixid's offset at offset wraparound
In commit
789d65364c, we started updating the next multixid's offset
too when recording a multixid, so that it can always be used to
calculate the number of members. I got it wrong at offset wraparound:
we need to skip over offset 0. Fix that.
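The following sketch illustrates the rule described above. It assumes a hypothetical next_multixact_offset() helper and a 32-bit offset type. The actual multixact.c logic involves more state than this.

```c
#include <stdint.h>

typedef uint32_t MultiXactOffsetSketch;  /* 32-bit, wraps around */

/*
 * Hypothetical helper: the starting offset to record for the next
 * multixid, given this multixid's starting offset and member count.
 * Offset 0 is never used as a starting offset, so a result that wraps
 * around to exactly 0 is bumped to 1.
 */
static MultiXactOffsetSketch
next_multixact_offset(MultiXactOffsetSketch offset, uint32_t nmembers)
{
    MultiXactOffsetSketch next = offset + nmembers;  /* unsigned wraparound */

    if (next == 0)
        next = 1;               /* skip over offset 0 */
    return next;
}
```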
Discussion: https://www.postgresql.org/message-id/d9996478-389a-4340-8735-bfad456b313c@iki.fi
Backpatch-through: 14
Michael Paquier [Fri, 5 Dec 2025 07:40:26 +0000 (16:40 +0900)]
Use more palloc_object() and palloc_array() in contrib/
The idea is to further encourage the use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc(). In
an ideal world, palloc() would then act as an internal routine of these
flavors, whose footprint in the tree is minimal.
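Here is a sketch of the macro shape. It uses malloc() in place of palloc() and hypothetical names sketch_alloc_object() and sketch_alloc_array(). PostgreSQL's own palloc_object() and palloc_array() are assumed to follow a similar pattern. Because the type is named once, the size and the result's pointer type cannot disagree.

```c
#include <stdlib.h>

/*
 * Type-safe allocation macros, with malloc() standing in for palloc().
 * The result has type "type *", so assigning it to a pointer of a
 * different type draws a compiler diagnostic, unlike a bare void *.
 */
#define sketch_alloc_object(type)        ((type *) malloc(sizeof(type)))
#define sketch_alloc_array(type, count)  ((type *) malloc(sizeof(type) * (count)))

typedef struct Point
{
    int         x;
    int         y;
} Point;
```

Unlike palloc(), malloc() can return NULL, so callers of this sketch must check the result.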
The patch sent by the author is very large, and this chunk of changes
represents something like 10% of the overall patch submitted.
The compiled code is the same before and after this commit, as validated
by diffing objdump output. The only differences, caused by changes in
line numbers because some of the new allocation formulas are shorter,
are in the following files: trgm_regexp.c, xpath.c and pg_walinspect.c.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
Michael Paquier [Fri, 5 Dec 2025 05:15:21 +0000 (14:15 +0900)]
Improve test output of extended statistics for ndistinct and dependencies
Corey Huinker has come up with a recipe that is more compact and more
pleasant to the eye for extended stats because we know that all of them
are 1-dimensional JSON arrays. This commit switches the extended stats
tests to use replace() instead of jsonb_pretty(), splitting the data so
that one line is used for each item in the extended stats object.
This removes a good chunk of test output, which is now easier to debug
with one line used for each item in a stats object.
This patch was not submitted by Corey. It is post-commit cleanup work
that I noticed was worth doing on its own while reviewing the rest of
the patch set Corey has posted.
Discussion: https://postgr.es/m/CADkLM=csMd52i39Ye8-PUUHyzBb3546eSCUTh-FBQ7bzT2uZ4Q@mail.gmail.com
Amit Kapila [Fri, 5 Dec 2025 04:12:55 +0000 (04:12 +0000)]
Rename column slotsync_skip_at to slotsync_last_skip.
Commit
76b78721ca introduced two new columns in pg_stat_replication_slots
to improve monitoring of slot synchronization. One of these columns was
named slotsync_skip_at, which is inconsistent with the naming convention
used for similar columns in other system views.
Columns that store timestamps of the most recent event typically include
'last_' in the column name (e.g., last_autovacuum, checksum_last_failure).
Renaming slotsync_skip_at to slotsync_last_skip aligns with this pattern,
making the purpose of the column clearer and improving overall consistency
across the views.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/20251128091552.GB13635@p46.dedyn.io;lightning.p46.dedyn.io
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
Michael Paquier [Fri, 5 Dec 2025 03:30:43 +0000 (12:30 +0900)]
Fix some compiler warnings
Some buildfarm members running old gcc versions have been complaining
about an always-true test for a NULL pointer caused by a combination
of SOFT_ERROR_OCCURRED() and a local ErrorSaveContext variable.
These warnings are taken care of by removing SOFT_ERROR_OCCURRED(),
switching to a direct variable check, like 56b1e88c804b.
Oversights in e1405aa5e3ac and 44eba8f06e55.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1341064.1764895052@sss.pgh.pa.us
Michael Paquier [Fri, 5 Dec 2025 00:21:13 +0000 (09:21 +0900)]
Show version of nodes in output of TAP tests
This commit adds the version of each node initialized by Cluster.pm to
the test output; the version may vary depending on the install_path
given by the test. Previously, the node information, which includes
the version number, was dumped before the version number was set.
This is particularly useful for the pg_upgrade TAP tests, that may mix
several versions for cross-version runs. The TAP infrastructure also
allows mixing nodes with different versions, so this information can be
useful for out-of-core tests.
Backpatch down to v15, where Cluster.pm and the pg_upgrade TAP tests
were introduced.
Author: Potapov Alexander <a.potapov@postgrespro.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/e59bb-692c0a80-5-6f987180@170377126
Backpatch-through: 15
Melanie Plageman [Thu, 4 Dec 2025 23:55:02 +0000 (18:55 -0500)]
Suppress spurious Coverity warning in prune freeze logic
Adjust the prune_freeze_setup() parameter types of new_relfrozen_xid and
new_relmin_mxid to prevent misleading Coverity analysis.
heap_page_prune_and_freeze() compared these values against NULL when
passing them to prune_freeze_setup(), causing Coverity to assume they
could be NULL and flag a possible null-pointer dereference later, even
though it occurs inside a directly related conditional.
Reported-by: Coverity
Author: Melanie Plageman <melanieplageman@gmail.com>
Nathan Bossart [Thu, 4 Dec 2025 21:42:18 +0000 (15:42 -0600)]
Fix key size of PrivateRefCountHash.
The key is the first member of PrivateRefCountEntry, which has type
Buffer. This commit changes the key size from sizeof(int32) to
sizeof(Buffer). This appears to be an oversight in commit
4b4b680c3d, but it's of no consequence because Buffer has been a
signed 32-bit integer for a long time.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aS77DTpl0fOkIKSZ%40ip-10-97-1-34.eu-west-3.compute.internal
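An illustrative sketch of why the key size should come from the key's own type. SketchPrivateRefCountEntry is a simplified, hypothetical stand-in for PrivateRefCountEntry; only the typedef of Buffer as a plain int mirrors PostgreSQL. The static assertions spell out the two facts the commit relies on: the key is the first member, and sizeof(Buffer) equals sizeof(int32), which is why the old code happened to work.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors storage/buf.h: Buffer is a signed 32-bit integer. */
typedef int Buffer;

/* Simplified, hypothetical entry: the hash key must be the first member. */
typedef struct SketchPrivateRefCountEntry
{
	Buffer		buffer;			/* hash key */
	int32_t		refcount;
} SketchPrivateRefCountEntry;

/* The key occupies the leading bytes of the entry ... */
_Static_assert(offsetof(SketchPrivateRefCountEntry, buffer) == 0,
			   "hash key must be the first member");
/* ... and the old sizeof(int32) matched only because Buffer is 32-bit. */
_Static_assert(sizeof(Buffer) == sizeof(int32_t),
			   "Buffer is a 32-bit integer");

/* Deriving the key size from the key's type avoids a silent mismatch. */
static size_t
sketch_refcount_keysize(void)
{
	return sizeof(Buffer);
}

static size_t
sketch_refcount_entrysize(void)
{
	return sizeof(SketchPrivateRefCountEntry);
}
```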
Peter Eisentraut [Thu, 4 Dec 2025 19:44:52 +0000 (20:44 +0100)]
Remove no longer needed casts from Pointer
These casts used to be required when Pointer was char *, but now it's
void * (commit 1b2bb5077e9), so they are not needed anymore.
Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
Peter Eisentraut [Thu, 4 Dec 2025 18:40:08 +0000 (19:40 +0100)]
Remove no longer needed casts to Pointer
These casts used to be required when Pointer was char *, but now it's
void * (commit 1b2bb5077e9), so they are not needed anymore.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
Álvaro Herrera [Thu, 4 Dec 2025 17:12:08 +0000 (18:12 +0100)]
amcheck: Fix snapshot usage in bt_index_parent_check
We were using SnapshotAny to do some index checks, but that's wrong and
causes spurious errors when used on indexes created by CREATE INDEX
CONCURRENTLY. Fix it to use an MVCC snapshot, and add a test for it.
This problem came in with commit 5ae2087202af, which introduced the
uniqueness check. Backpatch to 17.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Backpatch-through: 17
Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com
Peter Eisentraut [Thu, 4 Dec 2025 10:23:23 +0000 (11:23 +0100)]
headerscheck ccache support
Currently, headerscheck and cpluspluscheck are very slow, and they
defeat use of ccache. This fixes that, and now they are much faster.
The problem was that the test files are created in a randomly-named
directory (`mktemp -d /tmp/$me.XXXXXX`), and this directory is
mentioned on the compiler command line, which is part of the cache
key.
The solution is to create the test files in the build directory. For
example, for src/include/storage/ipc.h, we generate
tmp_headerscheck_c/src_include_storage_ipc_h.c (or .cpp).
Now ccache works. (And it's also a bit easier to debug everything
with this naming.)
(The subdirectory is used to keep the cleanup trap simple.)
The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck
is from about 1min 20s to only 20s. In local use, the speedups are
similar.
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/b49e74d4-3cf9-4d1c-9dce-09f75e55d026%40eisentraut.org
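The real headerscheck and cpluspluscheck are shell scripts; this C sketch only illustrates the naming scheme described above, under the assumption that every '/' and '.' in the header path becomes '_'. The function name and parameters are hypothetical. Because the generated path no longer contains a random component, the same header always yields the same compiler command line, which is what lets ccache get cache hits.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<dir>/<mangled header>.<ext>", replacing '/' and '.' in the header
 * path with '_'.  Returns 0 on success, -1 if buf is too small.
 */
static int
sketch_test_file_name(const char *dir, const char *header, const char *ext,
					  char *buf, size_t buflen)
{
	int			n = snprintf(buf, buflen, "%s/", dir);
	size_t		pos;

	if (n < 0 || (size_t) n >= buflen)
		return -1;
	pos = (size_t) n;

	for (const char *p = header; *p != '\0'; p++)
	{
		/* keep room for at least the terminating NUL */
		if (pos + 1 >= buflen)
			return -1;
		buf[pos++] = (*p == '/' || *p == '.') ? '_' : *p;
	}

	/* append the extension, which also NUL-terminates the result */
	n = snprintf(buf + pos, buflen - pos, ".%s", ext);
	if (n < 0 || (size_t) n >= buflen - pos)
		return -1;
	return 0;
}
```

For example, mapping "src/include/storage/ipc.h" into "tmp_headerscheck_c" with extension "c" yields the file name quoted in the commit message.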
Peter Eisentraut [Fri, 28 Nov 2025 10:36:41 +0000 (11:36 +0100)]
headerscheck: Use LLVM_CPPFLAGS
Otherwise, headerscheck will fail if the LLVM headers are in a
location not reached by the normal CFLAGS/CPPFLAGS.
Discussion: https://www.postgresql.org/message-id/flat/b49e74d4-3cf9-4d1c-9dce-09f75e55d026%40eisentraut.org
Alexander Korotkov [Thu, 4 Dec 2025 08:38:12 +0000 (10:38 +0200)]
Fix incorrect assertion bound in WaitForLSN()
The assertion checking MyProcNumber used MaxBackends as the upper
bound, but the procInfos array is allocated with size
MaxBackends + NUM_AUXILIARY_PROCS. This inconsistency would cause
a false assertion failure if an auxiliary process calls WaitForLSN().
Author: Xuneng Zhou <xunengzhou@gmail.com>
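A simplified sketch of the mismatch, with hypothetical names (sketchMaxBackends, SKETCH_NUM_AUXILIARY_PROCS, SketchProcInfo) and arbitrary sizes standing in for the real ones. The point is that the assertion bound and the allocation size come from the same expression, so a slot belonging to an auxiliary process passes the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical sizes; PostgreSQL derives the real ones from its config. */
static int	sketchMaxBackends = 100;
#define SKETCH_NUM_AUXILIARY_PROCS 8

typedef struct SketchProcInfo
{
	unsigned long waitLSN;		/* hypothetical per-process state */
} SketchProcInfo;

static SketchProcInfo *procInfos;

/* One slot per regular backend plus one per auxiliary process. */
static int
sketch_num_proc_slots(void)
{
	return sketchMaxBackends + SKETCH_NUM_AUXILIARY_PROCS;
}

static bool
sketch_init_proc_infos(void)
{
	procInfos = calloc((size_t) sketch_num_proc_slots(), sizeof(SketchProcInfo));
	return procInfos != NULL;
}

static SketchProcInfo *
sketch_get_proc_info(int procNumber)
{
	/*
	 * Bounding by sketchMaxBackends alone would reject valid auxiliary
	 * process slots; the bound must match the allocation size.
	 */
	assert(procNumber >= 0 && procNumber < sketch_num_proc_slots());
	return &procInfos[procNumber];
}
```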
Andres Freund [Wed, 3 Dec 2025 23:38:20 +0000 (18:38 -0500)]
Rename BUFFERPIN wait event class to BUFFER
In an upcoming patch more wait events will be added to the wait event
class (for buffer locking), making the current name too
specific. Alternatively we could introduce a dedicated wait event class for
those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait
event class.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Andres Freund [Wed, 3 Dec 2025 23:38:20 +0000 (18:38 -0500)]
Add pg_atomic_unlocked_write_u64
The 64bit equivalent of pg_atomic_unlocked_write_u32(), to be used in an
upcoming patch converting BufferDesc.state into a 64bit atomic.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Andres Freund [Wed, 3 Dec 2025 23:38:20 +0000 (18:38 -0500)]
bufmgr: Turn BUFFER_LOCK_* into an enum
It seems cleaner to use an enum to tie the different values together. It also
helps to have a more descriptive type in the argument to various functions.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
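A sketch of what grouping the lock modes into an enum looks like, assuming the three modes previously defined as BUFFER_LOCK_UNLOCK, BUFFER_LOCK_SHARE and BUFFER_LOCK_EXCLUSIVE. The type name SketchBufferLockMode and the conflict helper are hypothetical; the actual enum in bufmgr.h may be named and populated differently.

```c
#include <stdbool.h>

/* Hypothetical enum tying the related lock-mode values together. */
typedef enum SketchBufferLockMode
{
	SKETCH_BUFFER_LOCK_UNLOCK,
	SKETCH_BUFFER_LOCK_SHARE,
	SKETCH_BUFFER_LOCK_EXCLUSIVE,
} SketchBufferLockMode;

/*
 * Does a lock held in 'held' mode block a request in 'requested' mode?
 * The enum parameter types document what the function expects, unlike a
 * bare int.
 */
static bool
sketch_lock_modes_conflict(SketchBufferLockMode held,
						   SketchBufferLockMode requested)
{
	if (held == SKETCH_BUFFER_LOCK_UNLOCK ||
		requested == SKETCH_BUFFER_LOCK_UNLOCK)
		return false;
	/* shared lockers are compatible with each other; exclusive is not */
	return held == SKETCH_BUFFER_LOCK_EXCLUSIVE ||
		requested == SKETCH_BUFFER_LOCK_EXCLUSIVE;
}
```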
Tom Lane [Wed, 3 Dec 2025 18:23:45 +0000 (13:23 -0500)]
Make stats_ext test faster under cache-clobbering test conditions.
Commit 1eccb9315 added a test case that will cause a large number
of evaluations of a plpgsql function. With -DCLOBBER_CACHE_ALWAYS,
that takes an unreasonable amount of time (hours) because the
function's cache entries are repeatedly deleted and rebuilt.
That doesn't add any useful test coverage --- other test cases
already exercise plpgsql well enough --- and it's not part of what
this test intended to cover. We can get the same planner coverage,
if not more, by making the test directly invoke numeric_lt().
Reported-by: Tomas Vondra <tomas@vondra.me>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/baf1ae02-83bd-4f5d-872a-1d04f11a9073@vondra.me
Heikki Linnakangas [Wed, 3 Dec 2025 17:39:34 +0000 (19:39 +0200)]
Add test for multixid wraparound
Author: Andrey Borodin <amborodin@acm.org>
Discussion: https://www.postgresql.org/message-id/7de697df-d74d-47db-9f73-e069b7349c4b@iki.fi
Heikki Linnakangas [Wed, 3 Dec 2025 17:15:08 +0000 (19:15 +0200)]
Set next multixid's offset when creating a new multixid
With this commit, the next multixid's offset will always be set on the
offsets page, by the time that a backend might try to read it, so we
no longer need the waiting mechanism with the condition variable. In
other words, this eliminates "corner case 2" mentioned in the
comments.
The waiting mechanism was broken in a few scenarios:
- When nextMulti was advanced without WAL-logging the next
multixid. For example, if a later multixid was already assigned and
WAL-logged before the previous one was WAL-logged, and then the
server crashed. In that case the next offset would never be set in
the offsets SLRU, and a query trying to read it would get stuck
waiting for it. The same thing could happen if pg_resetwal was used to
forcibly advance nextMulti.
- In hot standby mode, a deadlock could happen where one backend waits
for the next multixid assignment record, but WAL replay is not
advancing because of a recovery conflict with the waiting backend.
The old TAP test used carefully placed injection points to exercise
the old waiting code, but now that the waiting code is gone, much of
the old test is no longer relevant. Rewrite the test to reproduce the
IPC/MultixactCreation hang after crash recovery instead, and to verify
that previously recorded multixids stay readable.
Backpatch to all supported versions. In back-branches, we still need
to be able to read WAL that was generated before this fix, so in the
back-branches this includes a hack to initialize the next offsets page
when replaying XLOG_MULTIXACT_CREATE_ID for the last multixid on a
page. On 'master', bump XLOG_PAGE_MAGIC instead to indicate that the
WAL is not compatible.
Author: Andrey Borodin <amborodin@acm.org>
Reviewed-by: Dmitry Yurichev <dsy.075@yandex.ru>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Ivan Bykov <i.bykov@modernsys.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/172e5723-d65f-4eec-b512-14beacb326ce@yandex.ru
Backpatch-through: 14
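A toy model of the invariant this commit establishes, assuming a flat in-memory offsets array in place of the offsets SLRU and ignoring paging, offset wraparound, and the reserved invalid offset. A multixid's member count is the difference between its own offset and the next one's; because creating a multixid now also writes the next offset, a reader never has to wait for the following multixid to be created. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_MULTIS 16

typedef uint32_t SketchMultiXactId;
typedef uint32_t SketchMultiXactOffset;

/* offsets[m] is where multixid m's members start in the members area. */
static SketchMultiXactOffset offsets[SKETCH_MAX_MULTIS + 1];
static SketchMultiXactId nextMulti = 1;
static SketchMultiXactOffset nextOffset = 0;

static SketchMultiXactId
sketch_create_multixact(uint32_t nmembers)
{
	SketchMultiXactId multi = nextMulti;

	assert(multi < SKETCH_MAX_MULTIS);
	offsets[multi] = nextOffset;
	nextOffset += nmembers;
	/* The change: also record where the *next* multixid's members begin. */
	offsets[multi + 1] = nextOffset;
	nextMulti++;
	return multi;
}

/* Readable as soon as 'multi' exists; no waiting on multi + 1. */
static uint32_t
sketch_get_nmembers(SketchMultiXactId multi)
{
	assert(multi >= 1 && multi < nextMulti);
	return offsets[multi + 1] - offsets[multi];
}
```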
Nathan Bossart [Wed, 3 Dec 2025 16:54:37 +0000 (10:54 -0600)]
Use "foo(void)" for definitions of functions with no parameters.
Standard practice in PostgreSQL is to use "foo(void)" instead of
"foo()", as the latter looks like an "old-style" function
declaration. Similar changes were made in commits cdf4b9aff2,
0e72b9d440, 7069dbcc31, f1283ed6cc, 7b66e2c086, e95126cf04, and
9f7c527af3.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/aTBObQPg%2Bps5I7vl%40ip-10-97-1-34.eu-west-3.compute.internal
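A small illustration of the convention; the function and counter are hypothetical. In C before C23, an empty parameter list in a definition does not provide a prototype, so the compiler is not required to reject calls that pass arguments; "(void)" states explicitly that the function takes none.

```c
/* Hypothetical state for the example. */
static int	sketch_counter = 0;

/*
 * Preferred style: "(void)" gives a real prototype with no parameters.
 * Writing "sketch_next_id()" instead would look like an old-style
 * declaration.
 */
static int
sketch_next_id(void)
{
	return ++sketch_counter;
}
```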
Álvaro Herrera [Wed, 3 Dec 2025 15:37:06 +0000 (16:37 +0100)]
Put back alternative-output expected files
These were removed in 5dee7a603f66, but that was too optimistic, per
buildfarm member prion as reported by Tom Lane. Mea (Álvaro's) culpa.
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Discussion: https://postgr.es/m/570630.1764737028@sss.pgh.pa.us
Daniel Gustafsson [Wed, 3 Dec 2025 14:22:38 +0000 (15:22 +0100)]
doc: Consistently use restartpoint in the documentation
The majority of cases already used "restartpoint" with just a few
instances of "restart point". Changing the latter spelling to the
former ensures consistency in the user-facing documentation. Code
comments are not affected by this since it is not worth the churn
to change anything there.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/0F6E38D0-649F-4489-B2C1-43CD937E6636@yesql.se
Peter Eisentraut [Wed, 3 Dec 2025 13:41:12 +0000 (14:41 +0100)]
Fix stray references to SubscriptRef
This type never existed. SubscriptingRef was meant instead.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/2eaa45e3-efc5-4d75-b082-f8159f51445f%40eisentraut.org
Peter Eisentraut [Wed, 3 Dec 2025 09:22:17 +0000 (10:22 +0100)]
Change Pointer to void *
The comment for the Pointer type said 'XXX Pointer arithmetic is done
with this, so it can't be void * under "true" ANSI compilers.'. This
has been fixed in the previous commit 756a4368932. This now changes
the definition of the type from char * to void *, as envisaged by that
comment.
Extension code that relies on using Pointer for pointer arithmetic
will need to make changes similar to commit 756a4368932, but those
changes would be backward compatible.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
Peter Eisentraut [Wed, 3 Dec 2025 08:54:15 +0000 (09:54 +0100)]
Don't rely on pointer arithmetic with Pointer type
The comment for the Pointer type says 'XXX Pointer arithmetic is done
with this, so it can't be void * under "true" ANSI compilers.'. This
fixes that. Change from Pointer to use char * explicitly where
pointer arithmetic is needed. This makes the meaning of the code
clearer locally and removes a dependency on the actual definition of
the Pointer type. (The definition of the Pointer type is not changed
in this commit.)
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
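A minimal sketch of the coding pattern these Pointer commits move toward, with hypothetical helper names. With Pointer defined as void *, arithmetic on it is not standard C (GCC accepts it only as an extension), so offsets are applied through an explicit char *; functions such as memcpy() already accept void * and need no cast.

```c
#include <stddef.h>
#include <string.h>

/* As described for commit 1b2bb5077e9: Pointer is now void *, not char *. */
typedef void *Pointer;

/* Advance by a byte offset through char *, which is portable C. */
static Pointer
sketch_advance(Pointer base, size_t offset)
{
	return (char *) base + offset;
}

/* memcpy() takes void * arguments, so no cast to Pointer is needed. */
static void
sketch_copy_field(Pointer dest, const void *src, size_t len)
{
	memcpy(dest, src, len);
}
```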
Peter Eisentraut [Wed, 3 Dec 2025 07:52:28 +0000 (08:52 +0100)]
Use more appropriate DatumGet* function
Use DatumGetCString() instead of DatumGetPointer() for returning a C
string. Right now, they are the same, but that doesn't always have to
be so.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
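A sketch of the distinction, assuming a simplified Datum modeled as a pointer-sized unsigned integer; the Sketch* functions are hypothetical stand-ins, not the real postgres.h definitions. Both accessors currently reduce to the same cast, but calling the C-string one documents the intent at each call site and keeps callers correct if the two ever diverge.

```c
#include <stdint.h>
#include <string.h>

/* Simplified assumption for this sketch: a pointer-sized Datum. */
typedef uintptr_t SketchDatum;

static void *
SketchDatumGetPointer(SketchDatum d)
{
	return (void *) d;
}

/* Same cast today, but the name says "this Datum holds a C string". */
static char *
SketchDatumGetCString(SketchDatum d)
{
	return (char *) SketchDatumGetPointer(d);
}

static SketchDatum
SketchCStringGetDatum(const char *s)
{
	return (SketchDatum) s;
}

/* Example caller that needs the value as a C string. */
static size_t
sketch_datum_strlen(SketchDatum d)
{
	return strlen(SketchDatumGetCString(d));
}
```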
Peter Eisentraut [Wed, 3 Dec 2025 07:40:33 +0000 (08:40 +0100)]
Remove useless casts to Pointer
in arguments of memcpy() and memmove() calls
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
Amit Kapila [Wed, 3 Dec 2025 03:31:31 +0000 (03:31 +0000)]
Fix shadow variable warning in subscriptioncmds.c.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAHut+PsF8R0Bt4J3c92+T2F0mun0rRfK=-GH+iBv2s-O8ahJJw@mail.gmail.com
Nathan Bossart [Tue, 2 Dec 2025 22:40:23 +0000 (16:40 -0600)]
Use LW_SHARED in dsa.c where possible.
Both dsa_get_total_size() and dsa_get_total_size_from_handle() take
an exclusive lock just to read a variable. This commit reduces the
lock level to LW_SHARED in those functions.
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/aS8fMzWs9e8iHxk2%40nathan
Heikki Linnakangas [Tue, 2 Dec 2025 19:11:15 +0000 (21:11 +0200)]
Fix amcheck's handling of half-dead B-tree pages
amcheck incorrectly reported the following error if there were any
half-dead pages in the index:
ERROR: mismatch between parent key and child high key in index
"amchecktest_id_idx"
It's expected that a half-dead page does not have a downlink in the
parent level, so skip the test.
Reported-by: Konstantin Knizhnik <knizhnik@garret.ru>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Discussion: https://www.postgresql.org/message-id/33e39552-6a2a-46f3-8b34-3f9f8004451f@garret.ru
Backpatch-through: 14
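A deliberately tiny sketch of the decision this fix adds, using a hypothetical page-flags struct and flag values (not those in nbtree.h). The real amcheck code does much more; this only shows that the parent-key comparison is skipped for a half-dead child, since such a page is expected to have no downlink in the parent level.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch only. */
#define SKETCH_BTP_LEAF      (1 << 0)
#define SKETCH_BTP_HALF_DEAD (1 << 1)

typedef struct SketchPageOpaque
{
	uint16_t	flags;
} SketchPageOpaque;

static bool
sketch_page_is_half_dead(const SketchPageOpaque *opaque)
{
	return (opaque->flags & SKETCH_BTP_HALF_DEAD) != 0;
}

/*
 * A half-dead page has already lost its downlink, so there is no parent
 * key to compare with its high key; skip the check instead of reporting
 * a mismatch.
 */
static bool
sketch_should_compare_parent_key(const SketchPageOpaque *child)
{
	return !sketch_page_is_half_dead(child);
}
```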
Heikki Linnakangas [Tue, 2 Dec 2025 19:11:05 +0000 (21:11 +0200)]
Add a test for half-dead pages in B-tree indexes
To increase our test coverage in general, and because I will use this
in the next commit to test a bug we currently have in amcheck.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/33e39552-6a2a-46f3-8b34-3f9f8004451f@garret.ru
Heikki Linnakangas [Tue, 2 Dec 2025 19:10:51 +0000 (21:10 +0200)]
Fix amcheck's handling of incomplete root splits in B-tree
When the root page is being split, it's normal that the root page
according to the metapage is not marked BTP_ROOT. Fix a bogus error in
amcheck about that case.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/abd65090-5336-42cc-b768-2bdd66738404@iki.fi
Backpatch-through: 14