git.postgresql.org Git - postgresql.git/commit

author	Jeff Davis <jdavis@postgresql.org>
	Mon, 1 Dec 2025 19:06:17 +0000 (11:06 -0800)
committer	Jeff Davis <jdavis@postgresql.org>
	Mon, 1 Dec 2025 19:06:17 +0000 (11:06 -0800)
commit	19b966243c38196a33b033fb0c259dcf760c0d69
tree	5588be32dc7672158eec3b82947586539681b18d
parent	99cd8890becacf9d7059297c3d75cd388ad83ac0

Make regex "max_chr" depend on encoding, not provider.

The regex mechanism scans through the first "max_chr" character values
to cache character property ranges (isalpha, etc.). For single-byte
encodings, there's no sense in scanning beyond UCHAR_MAX; but for
UTF-8 it makes sense to cache higher code point values (though not all
of them; only up to MAX_SIMPLE_CHR).

Prior to 5a38104b36, the logic about how many character values to scan
was based on the pg_regex_strategy, which was dependent on the
provider. Commit 5a38104b36 preserved that logic exactly, allowing
different providers to define the "max_chr".

Now, change it to depend only on the encoding and whether
ctype_is_c. For this specific calculation, distinguishing between
providers creates more complexity than it's worth.
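
The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name, the enum, and the `SKETCH_` prefix are hypothetical, the value 0x7FF for MAX_SIMPLE_CHR is assumed, and the 7-bit limit for the ctype_is_c case is an assumption that the message does not state. Multibyte encodings other than UTF-8 are left out because the message does not discuss them.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumed value of MAX_SIMPLE_CHR; the real definition lives in the regex headers. */
#define SKETCH_MAX_SIMPLE_CHR 0x7FF

/* Hypothetical stand-in for the database encoding; only two cases are modeled. */
typedef enum
{
	SKETCH_ENC_UTF8,
	SKETCH_ENC_SINGLE_BYTE
} sketch_encoding;

/*
 * Return the highest character value to scan when caching character
 * property ranges.  The result depends only on the encoding and on
 * ctype_is_c, never on the locale provider.
 */
static unsigned int
sketch_regex_max_chr(sketch_encoding enc, bool ctype_is_c)
{
	unsigned int limit;

	if (ctype_is_c)
		limit = 127;			/* assumption: C ctype only classifies ASCII */
	else if (enc == SKETCH_ENC_UTF8)
		limit = SKETCH_MAX_SIMPLE_CHR;	/* cache higher code points, but not all */
	else
		limit = UCHAR_MAX;		/* single-byte: nothing beyond UCHAR_MAX */

	/* Never scan past MAX_SIMPLE_CHR, whatever the encoding. */
	return (limit < SKETCH_MAX_SIMPLE_CHR) ? limit : SKETCH_MAX_SIMPLE_CHR;
}
```

With this shape, two providers that share an encoding and a ctype_is_c setting always produce the same scan limit, which is the simplification the commit aims for.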

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>

src/backend/regex/regc_pg_locale.c
src/backend/utils/adt/pg_locale_libc.c
src/include/utils/pg_locale.h