[test] Cbdb postgres merge test5 #1729
Open
chenjinbao1989 wants to merge 804 commits into apache:cbdb-postgres-merge from
Conversation
* Change the max value of diskquota.max_workers to 20. If the max value of diskquota.max_workers were max_worker_processes, then setting diskquota.max_workers higher than max_worker_processes (when max_worker_processes is less than 10) would crash the cluster. With the max value fixed at 20, diskquota keeps working even when max_worker_processes is smaller than diskquota.max_workers; only some databases cannot be monitored, because diskquota cannot start bgworkers for them. * Modify the diskquota worker schedule test to cover the case where diskquota.max_workers exceeds the number of available bgworkers. Co-authored-by: Zhang Hao <hzhang2@vmware.com>
…e#266) Change the default value of `diskquota.max_active_tables` from 1M to 300K; the memory usage related to it is reduced from 300MB to 90MB.
Refactor the structure of TableSizeEntry to reduce memory usage.
Previously, the size of each table on each segment was maintained in its own TableSizeEntry, which wasted a lot of memory. In this PR, we refactor TableSizeEntry to:
struct TableSizeEntry
{
	Oid    reloid;
	int    segid;
	Oid    tablespaceoid;
	Oid    namespaceoid;
	Oid    owneroid;
	uint32 flag;
	int64  totalsize[SEGMENT_SIZE_ARRAY_LENGTH];
};
In this way, one TableSizeEntry holds multiple segment sizes, which substantially reduces memory usage.
For 50 segments: reduced by 65%.
For 100 segments: reduced by 82.5%.
For 101 segments: reduced by 65.3%.
For 1000 segments: reduced by 82.5%.
Fix a bug: TableSizeEntry was removed from table_size_map by oid, but the hash map key is actually a TableKeyEntry.
apache#264 caused some segment ratio tests to fail. The entry's relevant fields need to be set at the end of the iteration; otherwise, only the first segment passes the condition check.
Use diskquota.max_table_segments to define the max number of table segments in the cluster. The value equals (segment_number + 1) * max_table_number. Since a hashmap in shared memory can take over other structures' memory even after it exceeds its limit, a counter is added to track how many tables have been added to table_size_map, preventing too many entries from being created. Co-authored-by: Xiaoran Wang <wxiaoran@vmware.com> Co-authored-by: Chen Mulong <chenmulong@gmail.com>
Avoid dispatching reject map to segments when it is not changed
…pache#279) This commit fixed two bugs: - Previously, refresh_rejectmap() cleared all entries in rejectmap, including other databases' entries, which prevented the hard limit from working correctly. - Soft-limit rejectmap entries should not be added to disk_quota_reject_map on segments; otherwise, these entries may remain on segments and trigger the soft limit incorrectly. Co-authored-by: Chen Mulong <chenmulong@gmail.com>
When a database's diskquota bgworker is killed and the database is dropped, the diskquota scheduler cannot work properly. The cause: if the scheduler fails to start a bgworker for a database, it retries forever. Now a distinct status code is returned when starting a bgworker fails. If the failure is due to a dropped database (or any other reason the database name cannot be retrieved from its id), the bgworker is simply skipped. For other failures, starting a bgworker for a database is limited to 3 attempts; once the limit is reached, it is skipped and the next one is picked.
- Drop the tablespace before removing its directory. - Use '-f' so rm always forces removal. - '-- start-ignore' doesn't work with retcode, since retcode automatically adds a '-- start/stop-ignore' pair to ignore the output, and nested start/stop-ignore is not handled well by the ancient perl script; refer to 'src/test/isolation2/sql_isolation_testcase.py'. Flaky test seen as below: root@96831b9f-9150-4424-63a7-abe8f18c144e:/tmp# cat /home/gpadmin/diskquota_artifacts/tests/isolation2/regression.diffs --- \/tmp\/build\/4eceba44\/bin_diskquota\/tests\/isolation2\/expected\/test_fast_quota_view\.out 2022-12-12 13:20:56.729354016 +0000 +++ \/tmp\/build\/4eceba44\/bin_diskquota\/tests\/isolation2\/results\/test_fast_quota_view\.out 2022-12-12 13:20:56.733354401 +0000 @@ -175,9 +175,11 @@ (exited with code 0) !\retcode rm -r /tmp/spc2; GP_IGNORE:-- start_ignore +GP_IGNORE:rm: cannot remove '/tmp/spc2/6/GPDB_6_301908232/16384/16413': No such file or directory +GP_IGNORE:rm: cannot remove '/tmp/spc2/5/GPDB_6_301908232/16384/16413': No such file or directory GP_IGNORE: GP_IGNORE:-- end_ignore -(exited with code 0) +(exited with code 1) -- end_ignore DROP TABLESPACE IF EXISTS spc1; DROP
Currently, isolation2/test_rejectmap.sql is flaky if we run the isolation2 test multiple times. That's because we set the GUC 'diskquota.hard_limit' to 'on' in test_postmaster_restart.sql and forget to set it back to 'off'. In subsequent runs, the hard limit is enabled and the QD continuously dispatches the reject map to segment servers. However, test_rejectmap.sql requires the hard limit to be disabled, because we dispatch the rejectmap manually by UDF, and the rejectmap dispatched by the UDF would otherwise be cleared by the QD. This patch adds a new injection point to prevent the QD from dispatching the rejectmap, making test_rejectmap.sql stateless. This patch also sets 'diskquota.hard_limit' to 'off' when test_postmaster_restart.sql finishes.
* Fix diskquota on gpdb7. - Fix some compile issues; in particular, relstorage has been removed on gpdb7, so use relam to get the relation's storage type. - Modify the diskquota hash function flag. - Fix diskquota_relation_open(): NoLock is disabled on gpdb7. - Add test schedules and expected results for gpdb7. - Update some test expectations on gpdb7: due to AO/CO table changes, their sizes have changed. - Disable some tests on gpdb7. - Disable upgrade tests. - Upgrade to diskquota 2.2: add attribute relam to type relation_cache_detail and add a parameter to function relation_size_local. - Add setup.sql and setup.out for the isolation2 test. - Fix bug: gpstart timeout on gpdb7. We used to set `Gp_role = GP_ROLE_DISPATCH` in disk_quota_launcher_main() even though postmaster boots in utility mode. This is harmless on gpdb6 but causes an infinite loop when booting gpdb7. In fact, there is nothing for the diskquota launcher to do in utility mode, so if `Gp_role != GP_ROLE_DISPATCH`, disk_quota_launcher_main() now simply exits. - Add gpdb7 pipeline support: build gpdb7 with rocky8 and test the same build on rocky8 and rhel8; 'res_test_images' has been changed to a list to support this. - Add the gpdb version to the task name. - 'passwd' is unnecessary and doesn't exist in the rocky8 build image. - Use `cmake -DENABLE_UPGRADE_TEST=OFF` to disable the upgrade test. - TODO: add the upgrade test to the CI pipeline. Fix the activate-standby error on the CI pipeline. Fix tests for gpdb7. Co-authored-by: Xiaoran Wang <wxiaoran@vmware.com> Co-authored-by: Xing Guo <higuoxing@gmail.com>
Co-authored-by: Hao Zhang <hzhang2@vmware.com>
…#287) Usage: ``` cmake -DDISKQUOTA_FAULT_INJECTOR=ON/OFF [default: OFF] ``` Co-authored-by: Hao Zhang <hzhang2@vmware.com>
If fault injector is disabled, isolation2 will be disabled.
This reverts commit e3e73d2. Revert "Add an option to control whether compile with fault injector. (apache#287)" This reverts commit ae4ab48. Co-authored-by: Hao Zhang <hzhang2@vmware.com>
- Switch GPDB binary to release-candidate for release build. - Remove test_task from the release pipeline. Co-authored-by: Xing Guo <higuoxing@gmail.com>
The isolation2 compilation command was removed by apache#285; this commit adds it back to Regress.cmake. Co-authored-by: Xing Guo <higuoxing@gmail.com>
- Fix flaky test test_ctas_before_set_quota. pg_type will be an active table after `CREATE TABLE`. It does not affect the function of diskquota but makes the test results unstable. In fact, we do not care about the table size of the system catalog table. So we simply skip the active table oid of these tables. - Fix test_vacuum/test_truncate. gp_wait_until_triggered_fault should be called after gp_inject_fault_infinite with suspend flag. Co-authored-by: Xing Guo <higuoxing@gmail.com> Co-authored-by: Xiaoran Wang <wxiaoran@vmware.com>
When creating a new table, pg_type appears in the active tables. Filter out system catalog tables, and remove the pause in the test.
Fix the following bugs: - The judgment condition for update_relation_cache should be `&&` instead of `||`. - The lock for relation_open/relation_close should be AccessShareLock for gpdb7. - For `TRUNCATE TABLE`, we cannot map the table's new relfilenode to its oid immediately after `file_create_hook` finishes, so we keep the relfilenode in active_table_file_map and wait for the next loop to calculate the correct size for the table.
Revert some modification from apache#285. - Add last_released_diskquota_bin back for CI. - Enable upgradecheck. - Add -DDISKQUOTA_LAST_RELEASE_PATH for cmake.
Due to the release build change for GP7, the fault injector doesn't work with release builds, so all tests were temporarily disabled for release pipelines. Since we switched to the `--disable-debug-extensions` gpdb build, the fault injector is not available in the release pipeline. - Add 'EXCLUDE_FAULT_INJECT_TEST' to Regress.cmake so it is smart enough to check whether there are any fault-injector cases in the given test set, and ignore them if so. - Skip the fault injector tests for the release pipeline. - Enable the CI test task for GP7.
ORCA's window frame translation always emits a BETWEEN frame (start + end bound), so include FRAMEOPTION_BETWEEN alongside FRAMEOPTION_NONDEFAULT to match the executor's expectations.
…ated host (apache#1702) * Fix null dereference on dedicated hot standby coordinator getCdbComponentInfo() populates hostPrimaryCountHash with primary hosts only. When IS_HOT_STANDBY_QD() is true, mirror and standby hosts are also looked up in the hash but return NULL on dedicated standby nodes that host no primary segments. Replace Assert(found) with a null-safe check to prevent SIGSEGV.
…ill cause the database to be in an abnormal state" This reverts commit 3af9962.
This reverts commit fbd5e23.
…, fix duplicate totalExecuted
Four issues fixed:
1. pgstat_report_resgroup was commented out during PG16 merge (MERGE16_FIXME).
Restore all call sites so pg_stat_activity correctly shows rsgid/rsgname.
2. In check_and_unassign_from_resgroup, the Assign-to-Bypass transition calls
UnassignResGroup which clears st_rsgid, but pgstat_report_resgroup was not
called afterward to restore it. This caused pg_stat_activity to show
rsgname='unknown' for queries that switch from normal assign to bypass mode.
3. is_session_in_group plpython function used `ps -ef | grep con{session_id}`
to find process PIDs, but PG16 removed con{session_id} from process titles.
Empty grep result meant empty set, and empty.issubset(any) = True, so the
function always returned true. Fixed to use gp_stat_activity JOIN
gp_segment_configuration instead.
4. Remove duplicate totalExecuted++ in check_and_unassign_from_resgroup that
caused double-counting when a query transitions from Assign to Bypass state.
Fixes #ISSUE_Number
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
CI Skip Instructions