
[test] Cbdb postgres merge test5 #1729

Open
chenjinbao1989 wants to merge 804 commits into apache:cbdb-postgres-merge from chenjinbao1989:cbdb-postgres-merge-test5

@chenjinbao1989 (Contributor)

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


Xiaoran Wang and others added 30 commits February 27, 2026 17:31
* Change the max value of diskquota.max_workers to 20

If the maximum value of diskquota.max_workers is tied to
max_worker_processes, the cluster crashes when max_worker_processes is
less than 10 and diskquota.max_workers is set higher than
max_worker_processes.

Set the maximum value to 20 instead. When max_worker_processes is less
than diskquota.max_workers, diskquota still works; only some of the
databases cannot be monitored, because diskquota cannot start bgworkers
for them.

* Modify diskquota worker schedule test

Test the case where diskquota.max_workers is greater than the number of available bgworkers.

Co-authored-by: Zhang Hao <hzhang2@vmware.com>
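A minimal sketch of the behavior described above, not the diskquota source: it assumes a lower bound of 1 for diskquota.max_workers and uses hypothetical helper names (`max_workers_is_valid`, `startable_worker_count`). It shows how capping the GUC at 20 lets diskquota degrade gracefully instead of crashing: when fewer bgworker slots are free than requested, only that many databases get a worker at a time.

```c
#include <stdbool.h>

/* Upper bound of diskquota.max_workers introduced by this change. */
#define DISKQUOTA_MAX_WORKERS_UPPER 20
/* Assumed lower bound, for this sketch only. */
#define DISKQUOTA_MAX_WORKERS_LOWER 1

/* Hypothetical range check that the GUC definition would enforce. */
static bool
max_workers_is_valid(int value)
{
	return value >= DISKQUOTA_MAX_WORKERS_LOWER &&
		   value <= DISKQUOTA_MAX_WORKERS_UPPER;
}

/*
 * Hypothetical helper: how many diskquota bgworkers can actually start.
 * If fewer background-worker slots are free than diskquota.max_workers
 * requests, the remaining databases are simply not monitored.
 */
static int
startable_worker_count(int max_workers, int free_bgworker_slots)
{
	if (free_bgworker_slots < 0)
		free_bgworker_slots = 0;
	return max_workers < free_bgworker_slots ? max_workers : free_bgworker_slots;
}
```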
…e#266)

Change the default value of `diskquota.max_active_tables` from 1M
to 300K, which reduces the related memory usage from 300MB to 90MB.
Refactor the structure of TableSizeEntry to reduce memory usage.

Previously, the size of each table on each segment was kept in its own TableSizeEntry, which wasted a lot of memory. This PR refactors TableSizeEntry to:

```
struct TableSizeEntry
{
	Oid    reloid;
	int    segid;
	Oid    tablespaceoid;
	Oid    namespaceoid;
	Oid    owneroid;
	uint32 flag;
	int64  totalsize[SEGMENT_SIZE_ARRAY_LENGTH];
};
```
This way, one TableSizeEntry holds the sizes of a table on multiple segments, which saves a significant amount of memory.

For 50 segments: reduced by 65%.
For 100 segments: reduced by 82.5%.
For 101 segments: reduced by 65.3%.
For 1000 segments: reduced by 82.5%.
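To make the packing concrete, here is a sketch of how a segment id could map to an entry and a slot in `totalsize[]`. The value 100 for SEGMENT_SIZE_ARRAY_LENGTH, the meaning given to `segid`, and the helper names are assumptions for illustration, and segment ids are assumed to be non-negative; the real constant and mapping live in the diskquota sources.

```c
#include <stdint.h>

/* Assumed value for illustration; not taken from the diskquota sources. */
#define SEGMENT_SIZE_ARRAY_LENGTH 100

typedef unsigned int Oid;

/* Same fields as the refactored struct in the commit message. */
struct TableSizeEntry
{
	Oid      reloid;
	int      segid;		/* assumed: which chunk of segments this entry covers */
	Oid      tablespaceoid;
	Oid      namespaceoid;
	Oid      owneroid;
	uint32_t flag;
	int64_t  totalsize[SEGMENT_SIZE_ARRAY_LENGTH];
};

/* Hypothetical: index of the entry holding the size for segment `segid` (>= 0). */
static int
entry_index_for_segment(int segid)
{
	return segid / SEGMENT_SIZE_ARRAY_LENGTH;
}

/* Hypothetical: slot within totalsize[] for segment `segid` (>= 0). */
static int
slot_for_segment(int segid)
{
	return segid % SEGMENT_SIZE_ARRAY_LENGTH;
}

/* Entries needed per table: one per SEGMENT_SIZE_ARRAY_LENGTH segments, rounded up. */
static int
entries_per_table(int nsegments)
{
	return (nsegments + SEGMENT_SIZE_ARRAY_LENGTH - 1) / SEGMENT_SIZE_ARRAY_LENGTH;
}
```

Under these assumptions, a table on 1000 segments needs 10 entries instead of one entry per segment.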
Fix a bug where TableSizeEntry was removed from table_size_map by OID,
even though the hash map key is TableKeyEntry.
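A sketch of why removing by OID alone is wrong when the map key is a composite struct. The `reloid`/`segid` fields of TableKeyEntry below are assumed for illustration, and `make_table_key`/`keys_equal` are hypothetical helpers. Assuming table_size_map hashes and compares raw key bytes (as a dynahash table created with HASH_BLOBS and keysize = sizeof(TableKeyEntry) does), passing a pointer to a bare `Oid` makes the lookup read bytes past the OID, so the removal targets the wrong key or none at all.

```c
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;

/* Assumed layout of the hash key; the real definition is in diskquota. */
typedef struct TableKeyEntry
{
	Oid reloid;
	int segid;
} TableKeyEntry;

/* Build a fully initialized key (padding zeroed) so byte-wise hashing is stable. */
static TableKeyEntry
make_table_key(Oid reloid, int segid)
{
	TableKeyEntry key;

	memset(&key, 0, sizeof(key));
	key.reloid = reloid;
	key.segid = segid;
	return key;
}

/* Byte-wise comparison, as a hash table keyed on raw bytes would do. */
static bool
keys_equal(const TableKeyEntry *a, const TableKeyEntry *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```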
apache#264 caused some segment ratio tests to fail. The entry's relevant
fields need to be set at the end of the iteration; otherwise, only the
first segment passes the condition check.
Use diskquota.max_table_segments to define the maximum number of table segments in
the cluster. Its value equals (segment_number + 1) * max_table_number.

Because a hash map in shared memory can take over other structures' memory
even after it exceeds its limit, a counter is added to track how many tables
have been added to table_size_map and to prevent too many entries from being
created.

Co-authored-by: Xiaoran Wang <wxiaoran@vmware.com>
Co-authored-by: Chen Mulong <chenmulong@gmail.com>
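The limit and the counter can be sketched as follows. The names `compute_max_table_segments`, `TableSizeMapCounter`, `try_reserve_table_entry`, and `release_table_entry` are hypothetical, and the sketch leaves out the locking that a counter shared between processes in shared memory would need.

```c
#include <stdbool.h>
#include <stdint.h>

/* diskquota.max_table_segments = (segment_number + 1) * max_table_number */
static int64_t
compute_max_table_segments(int segment_number, int64_t max_table_number)
{
	return (int64_t) (segment_number + 1) * max_table_number;
}

/* Hypothetical stand-in for the counter kept next to table_size_map. */
typedef struct TableSizeMapCounter
{
	int64_t num_entries;	/* entries currently in table_size_map */
	int64_t max_entries;	/* limit, assumed to be derived from max_table_segments */
} TableSizeMapCounter;

/*
 * Reserve room for one new entry. Returns false once the limit is reached,
 * so the caller skips the insert instead of letting the shared hash table
 * grow into memory reserved for other structures.
 */
static bool
try_reserve_table_entry(TableSizeMapCounter *counter)
{
	if (counter->num_entries >= counter->max_entries)
		return false;
	counter->num_entries++;
	return true;
}

/* Give the slot back when an entry is removed from the map. */
static void
release_table_entry(TableSizeMapCounter *counter)
{
	if (counter->num_entries > 0)
		counter->num_entries--;
}
```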
Avoid dispatching reject map to segments when it is not changed
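One way to implement "dispatch only when changed" is a version counter, sketched below under the assumption that every rejectmap modification goes through a single helper. The names are hypothetical and not necessarily how diskquota tracks changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical change tracking for the rejectmap on the QD. */
typedef struct RejectMapState
{
	uint64_t version;			/* bumped on every modification */
	uint64_t dispatched_version;	/* version last sent to segments */
} RejectMapState;

static void
rejectmap_mark_changed(RejectMapState *state)
{
	state->version++;
}

/* True only if the map changed since the last successful dispatch. */
static bool
rejectmap_needs_dispatch(const RejectMapState *state)
{
	return state->version != state->dispatched_version;
}

static void
rejectmap_mark_dispatched(RejectMapState *state)
{
	state->dispatched_version = state->version;
}
```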
…pache#279)

This commit fixes two bugs:
- Previously, refresh_rejectmap() cleared all entries in the rejectmap, including other databases' entries, which prevented the hard limit from working correctly.
- Soft-limit rejectmap entries should not be added to disk_quota_reject_map on segments; otherwise, these entries may remain on segments and trigger the soft limit incorrectly.

Co-authored-by: Chen Mulong <chenmulong@gmail.com>
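Both fixes can be sketched with a simplified entry type: clearing is restricted to the current database, and only hard-limit entries are stored on segments. The `RejectEntry` fields and the helper names are assumptions; the real rejectmap is a shared hash table.

```c
#include <stdbool.h>

typedef unsigned int Oid;

/* Hypothetical simplified rejectmap entry. */
typedef struct RejectEntry
{
	Oid  dbid;			/* database the entry belongs to */
	bool hard_limit;	/* true for hard-limit entries, false for soft-limit */
	bool in_use;
} RejectEntry;

/* Clear only the entries owned by `dbid`; other databases' entries survive. */
static void
clear_rejectmap_for_db(RejectEntry *entries, int n, Oid dbid)
{
	for (int i = 0; i < n; i++)
	{
		if (entries[i].in_use && entries[i].dbid == dbid)
			entries[i].in_use = false;
	}
}

/* Only hard-limit entries are added to the segment-side reject map. */
static bool
should_store_on_segment(const RejectEntry *entry)
{
	return entry->hard_limit;
}
```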
When a database's diskquota bgworker is killed and the database is dropped,
the diskquota scheduler cannot work properly. The cause: if the scheduler
fails to start a bgworker for a database, it retries forever.

A distinct status code is now returned when starting a bgworker fails. If
the failure is caused by a dropped database (or any other reason the database
name cannot be retrieved from the database id), just skip this bgworker for now.
For other failure reasons, limit the attempts to start a bgworker for a database
to 3. If the limit is reached, skip it and pick the next one.
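The retry policy above can be sketched as a small state machine. The status codes, `DbWorkerState`, and `handle_start_status` are hypothetical names for illustration; only the limit of 3 attempts and the skip-on-dropped-database rule come from the commit message.

```c
#include <stdbool.h>

/* Maximum attempts to start a bgworker for one database. */
#define MAX_BGWORKER_START_ATTEMPTS 3

/* Hypothetical status codes for a start attempt. */
typedef enum StartWorkerStatus
{
	START_WORKER_OK,
	START_WORKER_DB_NOT_FOUND,	/* database name cannot be resolved, e.g. dropped */
	START_WORKER_FAILED			/* any other failure */
} StartWorkerStatus;

/* Hypothetical per-database scheduling state. */
typedef struct DbWorkerState
{
	int  failed_attempts;
	bool skipped;
} DbWorkerState;

/*
 * Update state after a start attempt. Returns true when the scheduler should
 * move on to the next database, false when it should retry this one.
 */
static bool
handle_start_status(DbWorkerState *state, StartWorkerStatus status)
{
	switch (status)
	{
		case START_WORKER_OK:
			state->failed_attempts = 0;
			return true;
		case START_WORKER_DB_NOT_FOUND:
			/* Skip for now instead of retrying forever. */
			state->skipped = true;
			return true;
		case START_WORKER_FAILED:
			state->failed_attempts++;
			if (state->failed_attempts >= MAX_BGWORKER_START_ATTEMPTS)
			{
				state->skipped = true;
				return true;
			}
			return false;
	}
	return true;
}
```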
- Drop the tablespace before removing the directory.
- Use '-f' to always force rm.
- '-- start-ignore' doesn't seem to work with retcode, since retcode
  adds a '-- start/stop-ignore' pair automatically to ignore the
  output, and the nested start/stop ignore doesn't seem to be handled
  well by the ancient perl script. Refer to
  'src/test/isolation2/sql_isolation_testcase.py'.

Flaky test output observed:

```
root@96831b9f-9150-4424-63a7-abe8f18c144e:/tmp# cat /home/gpadmin/diskquota_artifacts/tests/isolation2/regression.diffs
--- \/tmp\/build\/4eceba44\/bin_diskquota\/tests\/isolation2\/expected\/test_fast_quota_view\.out	2022-12-12 13:20:56.729354016 +0000
+++ \/tmp\/build\/4eceba44\/bin_diskquota\/tests\/isolation2\/results\/test_fast_quota_view\.out	2022-12-12 13:20:56.733354401 +0000
@@ -175,9 +175,11 @@
 (exited with code 0)
 !\retcode rm -r /tmp/spc2;
 GP_IGNORE:-- start_ignore
+GP_IGNORE:rm: cannot remove '/tmp/spc2/6/GPDB_6_301908232/16384/16413': No such file or directory
+GP_IGNORE:rm: cannot remove '/tmp/spc2/5/GPDB_6_301908232/16384/16413': No such file or directory
 GP_IGNORE:
 GP_IGNORE:-- end_ignore
-(exited with code 0)
+(exited with code 1)
 -- end_ignore
 DROP TABLESPACE IF EXISTS spc1;
 DROP
```
Currently, isolation2/test_rejectmap.sql is flaky if the isolation2
tests are run multiple times. That's because test_postmaster_restart.sql
sets the GUC 'diskquota.hard_limit' to 'on' and never sets it back to
'off'. In subsequent runs, the hard limit is enabled and the QD
continuously dispatches the reject map to segment servers. However,
test_rejectmap.sql requires the hard limit to be disabled, because it
dispatches the rejectmap manually through a UDF; otherwise, the
dispatched rejectmap would be cleared by the QD.

This patch adds a new injection point that prevents the QD from dispatching
the rejectmap, making test_rejectmap.sql stateless. This patch also sets
'diskquota.hard_limit' to 'off' when test_postmaster_restart.sql
finishes.
* Fix diskquota on gpdb7

- Fix some compile issues, especially relstorage has been removed
on gpdb7. Using relam to get the relation's storage type.

- Modify diskquota hash function flag.

- Fix diskquota_relation_open(). NoLock is disabled on gpdb7.

- Add tests schedule and expected results for gpdb7.

- Update some test expectations on gpdb7 due to an AO/CO issue: because
the AO/CO table implementation changed, the sizes of these tables changed.

- Disable some tests on gpdb7.

- Disable upgrade tests.

- Upgrade to diskquota 2.2. Add attribute relam to type
relation_cache_detail and add a param to function relation_size_local.

- Add setup.sql and setup.out for isolation2 test.

- Fix bug: gpstart timeout for gpdb7. We used to set
`Gp_role = GP_ROLE_DISPATCH` in disk_quota_launcher_main(), even
though the postmaster boots in utility mode. This appears harmless
in gpdb6, but it causes an infinite loop when booting gpdb7. In fact,
the diskquota launcher has nothing to do in utility mode. With
this commit, if `Gp_role != GP_ROLE_DISPATCH`,
disk_quota_launcher_main() simply exits.

- Add gpdb7 pipeline support. Build gpdb7 on rocky8, and test the
same build on rocky8 and rhel8. 'res_test_images' has been changed
to a list to support this.

- Add the gpdb version to the task name. 'passwd' is unnecessary and
doesn't exist in the rocky8 build image.

- Use `cmake -DENABLE_UPGRADE_TEST=OFF` to disable the upgrade test.

- TODO: Add upgrade test to CI pipeline. Fix activate standby error on the
CI pipeline. Fix tests for gpdb7.


Co-authored-by: Xiaoran Wang <wxiaoran@vmware.com>
Co-authored-by: Xing Guo <higuoxing@gmail.com>
Co-authored-by: Hao Zhang <hzhang2@vmware.com>
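The gpstart fix boils down to an early exit in the launcher instead of forcing the role. A self-contained sketch: the role enum is redeclared locally so the block compiles on its own (it mirrors the role names GPDB uses), and `launcher_should_run` is a hypothetical stand-in for the check at the top of disk_quota_launcher_main().

```c
#include <stdbool.h>

/* Local copy for a standalone sketch; mirrors the role names used by GPDB. */
typedef enum GpRoleValue
{
	GP_ROLE_UNDEFINED = 0,
	GP_ROLE_UTILITY,
	GP_ROLE_DISPATCH,
	GP_ROLE_EXECUTE
} GpRoleValue;

/*
 * Hypothetical check: rather than setting Gp_role to GP_ROLE_DISPATCH, the
 * launcher runs only on a dispatcher and otherwise exits (for example, when
 * the postmaster boots in utility mode).
 */
static bool
launcher_should_run(GpRoleValue role)
{
	return role == GP_ROLE_DISPATCH;
}
```

In the launcher itself, a false result corresponds to the "simply exits" path described above.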
…#287)

Usage:

```
cmake -DDISKQUOTA_FAULT_INJECTOR=ON/OFF [default: OFF]
```

Co-authored-by: Hao Zhang <hzhang2@vmware.com>
If the fault injector is disabled, the isolation2 tests will be disabled.
This reverts commit e3e73d2.

Revert "Add an option to control whether compile with fault injector. (apache#287)"

This reverts commit ae4ab48.

Co-authored-by: Hao Zhang <hzhang2@vmware.com>
- Switch GPDB binary to release-candidate for release build.
- Remove test_task from the release pipeline.

Co-authored-by: Xing Guo <higuoxing@gmail.com>
The isolation2 compilation command was removed by apache#285. This commit
adds it back to Regress.cmake.

Co-authored-by: Xing Guo <higuoxing@gmail.com>
- Fix flaky test test_ctas_before_set_quota. pg_type becomes an active table
after `CREATE TABLE`. This does not affect diskquota's functionality but
makes the test results unstable. Since we do not care about the size of
system catalog tables, we simply skip the active table OIDs of these
tables.

- Fix test_vacuum/test_truncate. gp_wait_until_triggered_fault
should be called after gp_inject_fault_infinite with the suspend flag.

Co-authored-by: Xing Guo <higuoxing@gmail.com>
Co-authored-by: Xiaoran Wang <wxiaoran@vmware.com>
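A sketch of skipping system catalog tables by OID. It assumes that catalog tables created by initdb have OIDs below PostgreSQL's FirstNormalObjectId (16384, from access/transam.h); this is one plausible filter, since the commit message does not say which check diskquota actually uses, and `skip_active_table_oid` is a hypothetical name.

```c
#include <stdbool.h>

typedef unsigned int Oid;

/* Value of FirstNormalObjectId in PostgreSQL's access/transam.h. */
#define FIRST_NORMAL_OBJECT_ID 16384

/* pg_type's relation OID (TypeRelationId) in PostgreSQL. */
#define PG_TYPE_RELATION_OID 1247

/*
 * Hypothetical filter: ignore active-table OIDs of system catalogs, whose
 * size changes (e.g. pg_type after CREATE TABLE) only add noise to tests.
 */
static bool
skip_active_table_oid(Oid reloid)
{
	return reloid < FIRST_NORMAL_OBJECT_ID;
}
```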
When a new table is created, pg_type appears in the active tables.
Filter out system catalog tables, and remove the pause in the test.
Fix the following bugs:
- The condition for update_relation_cache should use `&&` instead of `||`.
- The lock for relation_open/relation_close should be AccessShareLock for gpdb7.
- For `truncate table`, we cannot get the table's oid by new relfilenode
immediately after `file_create_hook` is finished. So we should keep
the relfilenode in active_table_file_map and wait for the next loop to
calculate the correct size for this table.
Revert some modifications from apache#285.
- Add last_released_diskquota_bin back for CI.
- Enable upgradecheck.
- Add -DDISKQUOTA_LAST_RELEASE_PATH for cmake.
Due to the release build change for GP7, the fault injector doesn't work
with the release build. So all the tests were temporarily disabled for
the release pipelines.

Since we switched to use the `--disable-debug-extensions` gpdb build, the
fault injector is not available for the release pipeline.

- Add 'EXCLUDE_FAULT_INJECT_TEST' to Regress.cmake so that it can check
  whether the given test set contains any fault injector cases, and ignore
  them if so.
- Skip the fault injector tests for the release pipeline.
- Enable the CI test task for GP7.
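The real filtering lives in Regress.cmake; this C sketch only illustrates the idea. It assumes a test counts as a fault-injector case when its SQL calls gp_inject_fault (which also matches gp_inject_fault_infinite), and `uses_fault_injector`/`exclude_fault_inject_tests` are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical detector: does this test's SQL use the fault injector? */
static bool
uses_fault_injector(const char *test_sql)
{
	return test_sql != NULL && strstr(test_sql, "gp_inject_fault") != NULL;
}

/*
 * Compact `tests` in place, keeping only tests that can run without the
 * fault injector. Returns the number of tests kept.
 */
static int
exclude_fault_inject_tests(const char **tests, int ntests)
{
	int kept = 0;

	for (int i = 0; i < ntests; i++)
	{
		if (!uses_fault_injector(tests[i]))
			tests[kept++] = tests[i];
	}
	return kept;
}
```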
Leonid Borchuk and others added 29 commits April 27, 2026 19:16
ORCA's window frame translation always emits a BETWEEN frame
(start + end bound), so include FRAMEOPTION_BETWEEN alongside
FRAMEOPTION_NONDEFAULT to match the executor's expectations.
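A sketch of the flag combination. The bit values are redefined locally so the block compiles standalone and match PostgreSQL's nodes/parsenodes.h; `orca_window_frame_options` is a hypothetical helper, not the actual ORCA translator code.

```c
/* Bit values as defined in PostgreSQL's nodes/parsenodes.h. */
#define FRAMEOPTION_NONDEFAULT                0x00001
#define FRAMEOPTION_ROWS                      0x00004
#define FRAMEOPTION_BETWEEN                   0x00010
#define FRAMEOPTION_START_UNBOUNDED_PRECEDING 0x00020
#define FRAMEOPTION_END_CURRENT_ROW           0x00400

/*
 * Hypothetical helper: ORCA always produces both a start and an end bound,
 * so the translated frame must carry BETWEEN as well as NONDEFAULT.
 */
static int
orca_window_frame_options(int mode_and_bounds)
{
	return mode_and_bounds | FRAMEOPTION_NONDEFAULT | FRAMEOPTION_BETWEEN;
}
```

For `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, this yields 0x435.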
…ated host (apache#1702)

* Fix null dereference on dedicated hot standby coordinator

getCdbComponentInfo() populates hostPrimaryCountHash with primary hosts only.
When IS_HOT_STANDBY_QD() is true, mirror and standby hosts are also looked up
in the hash but return NULL on dedicated standby nodes that host no primary
segments. Replace Assert(found) with a null-safe check to prevent SIGSEGV.
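A sketch of the null-safe pattern. The entry type and `find_host` are hypothetical stand-ins for hostPrimaryCountHash and its lookup, and treating a missing host as having 0 primaries is one plausible handling; the PR text only says that Assert(found) is replaced with a null-safe check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an entry in hostPrimaryCountHash. */
typedef struct HostPrimaryCount
{
	const char *hostname;
	int         primary_count;
} HostPrimaryCount;

/* Linear lookup standing in for a hash lookup; returns NULL if absent. */
static const HostPrimaryCount *
find_host(const HostPrimaryCount *entries, int n, const char *hostname)
{
	for (int i = 0; i < n; i++)
	{
		if (strcmp(entries[i].hostname, hostname) == 0)
			return &entries[i];
	}
	return NULL;
}

/*
 * Null-safe lookup: a dedicated standby host has no primaries and therefore
 * no entry, so report 0 instead of asserting and dereferencing NULL.
 */
static int
primaries_on_host(const HostPrimaryCount *entries, int n, const char *hostname)
{
	const HostPrimaryCount *entry = find_host(entries, n, hostname);

	return entry != NULL ? entry->primary_count : 0;
}
```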
…ill cause the database to be in an abnormal state"

This reverts commit 3af9962.
This reverts commit fbd5e23.
…, fix duplicate totalExecuted

Four issues fixed:

1. pgstat_report_resgroup was commented out during PG16 merge (MERGE16_FIXME).
   Restore all call sites so pg_stat_activity correctly shows rsgid/rsgname.

2. In check_and_unassign_from_resgroup, the Assign-to-Bypass transition calls
   UnassignResGroup which clears st_rsgid, but pgstat_report_resgroup was not
   called afterward to restore it. This caused pg_stat_activity to show
   rsgname='unknown' for queries that switch from normal assign to bypass mode.

3. is_session_in_group plpython function used `ps -ef | grep con{session_id}`
   to find process PIDs, but PG16 removed con{session_id} from process titles.
   Empty grep result meant empty set, and empty.issubset(any) = True, so the
   function always returned true. Fixed to use gp_stat_activity JOIN
   gp_segment_configuration instead.

4. Remove duplicate totalExecuted++ in check_and_unassign_from_resgroup that
   caused double-counting when a query transitions from Assign to Bypass state.
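Item 3 hinges on vacuous truth: a subset check over an empty set succeeds for any superset. The original helper is a plpython function; this C rendering with a hypothetical `is_subset` only illustrates why an empty grep result made `is_session_in_group` always return true.

```c
#include <stdbool.h>

/* Returns true if every element of `subset` also appears in `superset`. */
static bool
is_subset(const int *subset, int nsubset, const int *superset, int nsuperset)
{
	for (int i = 0; i < nsubset; i++)
	{
		bool found = false;

		for (int j = 0; j < nsuperset && !found; j++)
			found = (subset[i] == superset[j]);
		if (!found)
			return false;
	}
	/* An empty subset never enters the loop, so the result is vacuously true. */
	return true;
}
```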
@chenjinbao1989 changed the title from Cbdb postgres merge test5 to [test] Cbdb postgres merge test5 on May 10, 2026
