Skip to content

[test] Cbdb postgres merge test5#1728

Open
chenjinbao1989 wants to merge 5756 commits intoapache:mainfrom
chenjinbao1989:cbdb-postgres-merge-test5
Open

[test] Cbdb postgres merge test5#1728
chenjinbao1989 wants to merge 5756 commits intoapache:mainfrom
chenjinbao1989:cbdb-postgres-merge-test5

Conversation

@chenjinbao1989
Copy link
Copy Markdown
Contributor

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


MasahikoSawada and others added 30 commits November 15, 2024 17:06
…kwards.

Previously LogicalIncreaseRestartDecodingForSlot() accidentally
accepted any LSN as the candidate_lsn and candidate_valid after the
restart_lsn of the replication slot was updated, so it potentially
caused the restart_lsn to move backwards.

A scenario where this could happen in logical replication is: after a
logical replication restart, based on previous candidate_lsn and
candidate_valid values in memory, the restart_lsn advances upon
receiving a subscriber acknowledgment. Then, logical decoding restarts
from an older point, setting candidate_lsn and candidate_valid based
on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments
then update the restart_lsn to an LSN older than the current value.

In the reported case, after WAL files were removed by a checkpoint,
the retreated restart_lsn prevented logical replication from
restarting due to missing WAL segments.

This change essentially modifies the 'if' condition to 'else if'
condition within the function. The previous code had an asymmetry in
this regard compared to LogicalIncreaseXminForSlot(), which does
almost the same thing for different fields.

The WAL removal issue was reported by Hubert Depesz Lubaczewski.

Backpatch to all supported versions, since the bug exists since 9.4
where logical decoding was introduced.

Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit Kapila
Discussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com
Discussion: https://postgr.es/m/85fff40e-148b-4e86-b921-b4b846289132%40vondra.me
Backpatch-through: 13
After commit 5a2fed911a85ed6d8a015a6bafe3a0d9a69334ae, the catalog state
resulting from these commands ceased to affect sessions.  Restore the
longstanding behavior, which is like beginning the session with a SET
ROLE command.  If cherry-picking the CVE-2024-10978 fixes, default to
including this, too.  (This fixes an unintended side effect of fixing
CVE-2024-10978.)  Back-patch to v12, like that commit.  The release team
decided to include v12, despite the original intent to halt v12 commits
earlier this week.

Tom Lane and Noah Misch.  Reported by Etienne LAFARGE.

Discussion: https://postgr.es/m/CADOZwSb0UsEr4_UTFXC5k7=fyyK8uKXekucd+-uuGjJsGBfxgw@mail.gmail.com
Commits aac2c9b4f et al. added a bool field to struct ResultRelInfo.
That's no problem in the master branch, but in released branches
care must be taken when modifying publicly-visible structs to avoid
an ABI break for extensions.  Frequently we solve that by adding the
new field at the end of the struct, and that's what was done here.
But ResultRelInfo has stricter constraints than just about any other
node type in Postgres.  Some executor APIs require extensions to index
into arrays of ResultRelInfo, which means that any change whatever in
sizeof(ResultRelInfo) causes a fatal ABI break.

Fortunately, this is easy to fix, because the new field can be
squeezed into available padding space instead --- indeed, that's where
it was put in master, so this fix also removes a cross-branch coding
variation.

Per report from Pavan Deolasee.  Patch v14-v17 only; earlier versions
did not gain the extra field, nor is there any problem in master.

Discussion: https://postgr.es/m/CABOikdNmVBC1LL6pY26dyxAS2f+gLZvTsNt=2XbcyG7WxXVBBQ@mail.gmail.com
In the dim past we figured it was okay to ignore collations
when combining UNION set-operation nodes into a single N-way
UNION operation.  I believe that was fine at the time, but
it stopped being fine when we added nondeterministic collations:
the semantics of distinct-ness are affected by those.  v17 made
it even less fine by allowing per-child sorting operations to
be merged via MergeAppend, although I think we accidentally
avoided any live bug from that.

Add a check that collations match before deciding that two
UNION nodes are equivalent.  I also failed to resist the
temptation to comment plan_union_children() a little better.

Back-patch to all supported branches (v13 now), since they
all have nondeterministic collations.

Discussion: https://postgr.es/m/3605568.1731970579@sss.pgh.pa.us
In 17~, age(xid) and mxid_age(xid) were listed as deprecated.  Based on
the discussion that led to 48b5aa3143, this is not intentional as this
could break many existing monitoring queries.  Note that vacuumdb also
uses both of them.

In 16, both functions were listed under "Control Data Functions", which
is incorrect, so let's move them to the list of functions related to
transaction IDs and snapshots.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/Zzr2zZFyeFKXWe8a@ip-10-97-1-34.eu-west-3.compute.internal
Discussion: https://postgr.es/m/20231114013224.4z6oxa6p6va33rxr@awork3.anarazel.de
Backpatch-through: 16
Ordinarily transformSetOperationTree will collect all UNION/
INTERSECT/EXCEPT steps into the setOperations tree of the topmost
Query, so that leaf queries do not contain any setOperations.
However, it cannot thus flatten a subquery that also contains
WITH, ORDER BY, FOR UPDATE, or LIMIT.  I (tgl) forgot that in
commit 07b4c48 and wrote an assertion in rule deparsing that
a leaf's setOperations would always be empty.

If it were nonempty then we would want to parenthesize the subquery
to ensure that the output represents the setop nesting correctly
(e.g. UNION below INTERSECT had better get parenthesized).  So
rather than just removing the faulty Assert, let's change it into
an additional case to check to decide whether to add parens.  We
don't expect that the additional case will ever fire, but it's
cheap insurance.

Man Zeng and Tom Lane

Discussion: https://postgr.es/m/tencent_7ABF9B1F23B0C77606FC5FE3@qq.com
RelationSyncCache, the hash table in charge of tracking the relation
schemas sent through pgoutput, was forgetting to free the TupleDesc
associated to the two slots used to store the new and old tuples,
causing some memory to be leaked each time a relation is invalidated
when the slots of an existing relation entry are cleaned up.

This is rather hard to notice as the bloat is pretty minimal, but a
long-running WAL sender would be in trouble over time depending on the
workload.  sysbench has proved to be pretty good at showing the problem,
coupled with some memory monitoring of the WAL sender.

Issue introduced in 52e4f0c, that has added row filters for tables
logically replicated.

Author: Boyu Yang
Reviewed-by: Michael Paquier, Hou Zhijie
Discussion: https://postgr.es/m/DM3PR84MB3442E14B340E553313B5C816E3252@DM3PR84MB3442.NAMPRD84.PROD.OUTLOOK.COM
Backpatch-through: 15
Apparently this information has been outdated since first committed,
because we adopted a different implementation during development per
reviews and this detail was not updated in the README.

This has been wrong since commit 0ac5ad5 introduced the file in
2013.  Backpatch to all live branches.

Reported-by: Will Mortensen <will@extrahop.com>
Discussion: https://postgr.es/m/CAMpnoC6yEQ=c0Rdq-J7uRedrP7Zo9UMp6VZyP23QMT68n06cvA@mail.gmail.com
It failed to set the archive_command as it desired because of a syntax
problem.  Oversight in commit 90bcc7c2db1d.

This bug doesn't cause the test to fail, because the test only checks
pg_rewind's output messages, not the actual outcome (and the outcome in
both cases is that the file is kept, not deleted).  But in either case
the message about the file being kept is there, so it's hard to get
excited about doing much more.

Reported-by: Antonin Houska <ah@cybertec.at>
Author: Alexander Kukushkin <cyberdemn@gmail.com>
Discussion: https://postgr.es/m/7822.1732167825@antos
If the executable's .o files were produced by a compiler (probably gcc)
not using -moutline-atomics, and the corresponding .bc files were
produced by clang using -moutline-atomics (probably by default), then
the generated bitcode functions would have the target attribute
"+outline-atomics", and could fail at runtime when inlined.  If the
target ISA at bitcode generation time was armv8-a (the most conservative
aarch64 target, no LSE), then LLVM IR atomic instructions would generate
calls to functions in libgcc.a or libclang_rt.*.a that switch between
LL/SC and faster LSE instructions depending on a runtime AT_HWCAP check.
Since the corresponding .o files didn't need those functions, they
wouldn't have been included in the executable, and resolution would
fail.

At least Debian and Ubuntu are known to ship gcc and clang compilers
that target armv8-a but differ on the use of outline atomics by default.

Fix, by suppressing the outline atomics attribute in bitcode explicitly.
Inline LL/SC instructions will be generated for atomic operations in
bitcode built for armv8-a.  Only configure scripts are adjusted for now,
because the meson build system doesn't generate bitcode yet.

This doesn't seem to be a new phenomenon, so real cases of functions
using atomics that are inlined by JIT must be rare in the wild given how
long it took for a bug report to arrive.  The reported case could be
reduced to:

postgres=# set jit_inline_above_cost = 0;
SET
postgres=# set jit_above_cost = 0;
SET
postgres=# select pg_last_wal_receive_lsn();
WARNING:  failed to resolve name __aarch64_swp4_acq_rel
FATAL:  fatal llvm error: Program used external function
'__aarch64_swp4_acq_rel' which could not be resolved!

The change doesn't affect non-ARM systems or later target ISAs.

Back-patch to all supported releases.

Reported-by: Alexander Kozhemyakin <a.kozhemyakin@postgrespro.ru>
Discussion: https://postgr.es/m/18610-37bf303f904fede3%40postgresql.org
psql's --help was missed the description of the \pset variable
xheader_width, that should be listed when using \? or --help=commands,
and described for --help=variables.

Oversight in a45388d.

Author: Pavel Luzanov
Discussion: https://postgr.es/m/1e3e06d6-0807-4e62-a9f6-c11481e6eb10@postgrespro.ru
Backpatch-through: 16
Per PEP 3114, iterator.next() has been renamed to iterator.__next__(),
and one example in the documentation still used next().  This caused the
example provided to fail the function creation since Python 2 is not
supported anymore since 19252e8.

Author: Erik Wienhold
Discussion: https://postgr.es/m/173209043143.2092749.13692266486972491694@wrigleys.postgresql.org
Backpatch-through: 15
Previously we checked "for <stdbool.h> that conforms to C99" using
autoconf's AC_HEADER_STDBOOL macro.  We've required C99 since PostgreSQL
12, so the test was redundant, and under C23 it was broken: autoconf
2.69's implementation doesn't understand C23's new empty header (the
macros it's looking for went away, replaced by language keywords).
Later autoconf versions fixed that, but let's just remove the
anachronistic test.

HAVE_STDBOOL_H and HAVE__BOOL will no longer be defined, but they
weren't directly tested in core or likely extensions (except in 11, see
below).  PG_USE_STDBOOL (or USE_STDBOOL in 11 and 12) is still defined
when sizeof(bool) is 1, which should be true on all modern systems.
Otherwise we define our own bool type and values of size 1, which would
fail to compile under C23 as revealed by the broken test.  (We'll
probably clean that dead code up in master, but here we want a minimal
back-patchable change.)

This came to our attention when GCC 15 recently started using using C23
by default and failed to compile the replacement code, as reported by
Sam James and build farm animal alligator.

Back-patch to all supported releases, and then two older versions that
also know about <stdbool.h>, per the recently-out-of-support policy[1].
12 requires C99 so it's much like the supported releases, but 11 only
assumes C89 so it now uses AC_CHECK_HEADERS instead of the overly picky
AC_HEADER_STDBOOL.  (I could find no discussion of which historical
systems had <stdbool.h> but failed the conformance test; if they ever
existed, they surely aren't relevant to that policy's goals.)

[1] https://wiki.postgresql.org/wiki/Committing_checklist#Policies

Reported-by: Sam James <sam@gentoo.org>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org> (master version)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (approach)
Discussion: https://www.postgresql.org/message-id/flat/87o72eo9iu.fsf%40gentoo.org
Tcl 9 changed several API functions to take Tcl_Size, which is
ptrdiff_t, instead of int, for 64-bit enablement.  We have to change a
few local variables to be compatible with that.  We also provide a
fallback typedef of Tcl_Size for older Tcl versions.

The affected variables are used for quantities that will not approach
values beyond the range of int, so this doesn't change any
functionality.

Reviewed-by: Tristan Partin <tristan@partin.io>
Discussion: https://www.postgresql.org/message-id/flat/bce0fe54-75b4-438e-b42b-8e84bc7c0e9c%40eisentraut.org
On ARM platforms where the baseline CPU target lacks CRC instructions,
we need to supply a -march flag to persuade the compiler to compile
such instructions.  It turns out that our existing choice of
"-march=armv8-a+crc" has not worked for some time, because recent gcc
will interpret that as selecting software floating point, and then
will spit up if the platform requires hard-float ABI, as most do
nowadays.  The end result was to silently fall back to software CRC,
which isn't very desirable since in practice almost all currently
produced ARM chips do have hardware CRC.

We can fix this by using "-march=armv8-a+crc+simd" to enable the
correct ABI choice.  (This has no impact on the code actually
generated, since neither of the files we compile with this flag
does any floating-point stuff, let alone SIMD.)  Keep the test for
"-march=armv8-a+crc" since that's required for soft-float ABI,
but try that second since most platforms we're likely to build on
use hard-float.

Since this isn't working as-intended on the last several years'
worth of gcc releases, back-patch to all supported branches.

Discussion: https://postgr.es/m/4496616.iHFcN1HehY@portable-bastien
Commit bc5a4dfc accidentally left a check for <stdbool.h> in
meson.build's header_checks.  Synchronize with configure, which no
longer defines HAVE_STDBOOL_H.

There is still a reference to <stdbool.h> in an earlier test to see if
we need -std=c99 to get C99 features, like autoconf 2.69's
AC_PROG_CC_C99.  (Therefore the test remove by this commit was
tautological since day one: you'd have copped "C compiler does not
support C99" before making it this far.)

Back-patch to 16, where meson begins.
This WARNING appeared because SearchSysCacheLocked1() read
cc_relisshared before catcache initialization, when the field is false
unconditionally.  On the basis of reading false there, it constructed a
locktag as though pg_tablespace weren't relisshared.  Only shared
catalogs could be affected, and only GRANT TABLESPACE was affected in
practice.  SearchSysCacheLocked1() callers use one other shared-relation
syscache, DATABASEOID.  DATABASEOID is initialized by the end of
CheckMyDatabase(), making the problem unreachable for pg_database.

Back-patch to v13 (all supported versions).  This has no known impact
before v16, where ExecGrant_common() first appeared.  Earlier branches
avoid trouble by having a separate ExecGrant_Tablespace() that doesn't
use LOCKTAG_TUPLE.  However, leaving this unfixed in v15 could ensnare a
future back-patch of a SearchSysCacheLocked1() call.

Reported by Aya Iwata.

Discussion: https://postgr.es/m/OS7PR01MB11964507B5548245A7EE54E70EA212@OS7PR01MB11964.jpnprd01.prod.outlook.com
If passed a read-write expanded object pointer, the EEOP_NULLIF
code would hand that same pointer to the equality function
and then (unless equality was reported) also return the same
pointer as its value.  This is no good, because a function that
receives a read-write expanded object pointer is fully entitled
to scribble on or even delete the object, thus corrupting the
NULLIF output.  (This problem is likely unobservable with the
equality functions provided in core Postgres, but it's easy to
demonstrate with one coded in plpgsql.)

To fix, make sure the pointer passed to the equality function
is read-only.  We can still return the original read-write
pointer as the NULLIF result, allowing optimization of later
operations.

Per bug #18722 from Alexander Lakhin.  This has been wrong
since we invented expanded objects, so back-patch to all
supported branches.

Discussion: https://postgr.es/m/18722-fd9e645448cc78b4@postgresql.org
constexpr is a keyword in C23.  Rename a conflicting identifier for
future-proofing.

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/08abc832-1384-4aca-a535-1a79765b565e%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/flat/87o72eo9iu.fsf%40gentoo.org
The approach of declaring a function pointer with an empty argument
list and hoping that the compiler will not complain about casting it
to another type no longer works with C23, because foo() is now
equivalent to foo(void).

We don't need to do this here.  With a few struct forward declarations
we can supply a correct argument list without having to pull in
another header file.

(This is the only new warning with C23.  Together with the previous
fix a67a49648d9, this makes the whole code compile cleanly under C23.)

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/95c6a9bf-d306-43d8-b880-664ef08f2944%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/flat/87o72eo9iu.fsf%40gentoo.org
This matches the behavior of the makefiles and the old MSVC build
system.  The main effect is that the build result gets installed into
pkglibdir rather than bindir.  The documentation says to locate the
library in pkglibdir, so this makes the code match the documentation
again.

Reviewed-by: Ryohei Takahashi (Fujitsu) <r.takahashi_2@fujitsu.com>
Discussion: https://www.postgresql.org/message-id/flat/TY3PR01MB118912125614599641CA881B782522%40TY3PR01MB11891.jpnprd01.prod.outlook.com
The OSSP uuid.h file includes unistd.h, so to use it with MSVC we need to
include the postgres include directories so it picks up our version of
that in src/include/port/win32_msvc. Adjust the meson test accordingly.

Backported from commit 7c655a04a2, so we can build release 16 with UUID

per request from Marina Polyakova
When using a pipeline, a transaction starts from the first command and
is committed with a Sync message or when the pipeline ends.

Functions like IsInTransactionBlock() or PreventInTransactionBlock()
were already able to understand a pipeline as being in a transaction
block, but it was not the case of CheckTransactionBlock().  This
function is called for example to generate a WARNING for SET LOCAL,
complaining that it is used outside of a transaction block.

The current state of the code caused multiple problems, like:
- SET LOCAL executed at any stage of a pipeline issued a WARNING, even
if the command was at least second in line where the pipeline is in a
transaction state.
- LOCK TABLE failed when invoked at any step of a pipeline, even if it
should be able to work within a transaction block.

The pipeline protocol assumes that the first command of a pipeline is
not part of a transaction block, and that any follow-up commands is
considered as within a transaction block.

This commit changes the backend so as an implicit transaction block is
started each time the first Execute message of a pipeline has finished
processing, with this implicit transaction block ended once a sync is
processed.  The checks based on XACT_FLAGS_PIPELINING in the routines
checking if we are in a transaction block are not necessary: it is
enough to rely on the existing ones.

Some tests are added to pgbench, that can be backpatched down to v17
when \syncpipeline is involved and down to v14 where \startpipeline and
\endpipeline are available.  This is unfortunately limited regarding the
error patterns that can be checked, but it provides coverage for various
pipeline combinations to check if these succeed or fail.  These tests
are able to capture the case of SET LOCAL's WARNING.  The author has
proposed a different feature to improve the coverage by adding similar
meta-commands to psql where error messages could be checked, something
more useful for the cases where commands cannot be used in transaction
blocks, like REINDEX CONCURRENTLY or VACUUM.  This is considered as
future work for v18~.

Author: Anthonin Bonnefoy
Reviewed-by: Jelte Fennema-Nio, Michael Paquier
Discussion: https://postgr.es/m/CAO6_XqrWO8uNBQrSu5r6jh+vTGi5Oiyk4y8yXDORdE2jbzw8xw@mail.gmail.com
Backpatch-through: 13
Commit 9044fc1d45a added some files from upstream LLVM.  These files
have different whitespace rules, which make the git whitespace checks
powered by gitattributes fail.  To fix, add those files to the exclude
list.
from commit 9044fc1d45a
…ing.

During pgbench's table initialization, progress updates could display
leftover characters from the previous message if the new message
was shorter. This commit resolves the issue by appending spaces to
the current message to fully overwrite any remaining characters from
the previous line.

Back-patch to all the supported versions.

Author: Yushi Ogiwara, Tatsuo Ishii, Fujii Masao
Reviewed-by: Tatsuo Ishii, Fujii Masao
Discussion: https://postgr.es/m/9a9b8b95b6a709877ae48ad5b0c59bb9@oss.nttdata.com
1.  The error reporting of "port setrequested list-of-packages..."
changed, hiding errors we were relying on to know if all packages in our
list were already installed.  Use a loop instead.

2.  The cached MacPorts installation was shared between PostgreSQL
major branches, though each branch wanted different packages.  Add the
list of packages to cache key, so that different branches, when tested
in one github account/repo such as postgres/postgres, stop fighting with
each other, adding and removing packages.

Back-patch to 15 where CI began.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/au2uqfuy2nf43nwy2txmc5t2emhwij7kzupygto3d2ffgtrdgr%40ckvrlwyflnh2
This reverts commit d77f91214fb7 on all stable branches, due to concerns
regarding the compatility side effects this could create in a minor
release.  The change still exists on HEAD.

Discussion: https://postgr.es/m/CA+TgmoZqRgeFTg4+Yf_CMRRXiHuNz1u6ZC4FvVk+rxw0RmOPnw@mail.gmail.com
Backpatch-through: 13
nmisch and others added 30 commits May 5, 2025 04:52
Start the file with static functions not specific to pe_test_vectors
tests.  This way, new tests can use them without disrupting the file's
layout.  Change report_result() PQExpBuffer arguments to plain strings.
Back-patch to v13 (all supported versions), for the next commit.

Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207
With GB18030 as source encoding, applications could crash the server via
SQL functions convert() or convert_from().  Applications themselves
could crash after passing unterminated GB18030 input to libpq functions
PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or
PQescapeString().  Extension code could crash by passing unterminated
GB18030 input to jsonapi.h functions.  All those functions have been
intended to handle untrusted, unterminated input safely.

A crash required allocating the input such that the last byte of the
allocation was the last byte of a virtual memory page.  Some malloc()
implementations take measures against that, making the SIGSEGV hard to
reach.  Back-patch to v13 (all supported versions).

Author: Noah Misch <noah@leadboat.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207
…ill cause the database to be in an abnormal state"

This reverts commit 3af9962.
This reverts commit fbd5e23.
…, fix duplicate totalExecuted

Three issues fixed:

1. pgstat_report_resgroup was commented out during PG16 merge (MERGE16_FIXME).
   Restore all call sites so pg_stat_activity correctly shows rsgid/rsgname.

2. In check_and_unassign_from_resgroup, the Assign-to-Bypass transition calls
   UnassignResGroup which clears st_rsgid, but pgstat_report_resgroup was not
   called afterward to restore it. This caused pg_stat_activity to show
   rsgname='unknown' for queries that switch from normal assign to bypass mode.

3. is_session_in_group plpython function used `ps -ef | grep con{session_id}`
   to find process PIDs, but PG16 removed con{session_id} from process titles.
   Empty grep result meant empty set, and empty.issubset(any) = True, so the
   function always returned true. Fixed to use gp_stat_activity JOIN
   gp_segment_configuration instead.

4. Remove duplicate totalExecuted++ in check_and_unassign_from_resgroup that
   caused double-counting when a query transitions from Assign to Bypass state.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.