[test] Cbdb postgres merge test5 #1729
Open
chenjinbao1989 wants to merge 804 commits into apache:cbdb-postgres-merge from
Conversation
* Change the max value of diskquota.max_workers to 20. If the max value of diskquota.max_workers were max_worker_processes, then setting diskquota.max_workers higher than max_worker_processes (when max_worker_processes is less than 10) would crash the cluster. With the max value fixed at 20, diskquota keeps working even when max_worker_processes is smaller than diskquota.max_workers; only some databases cannot be monitored, because diskquota cannot start bgworkers for them. * Modify the diskquota worker schedule test to cover the case where diskquota.max_workers exceeds the number of available bgworkers. Co-authored-by: Zhang Hao <hzhang2@vmware.com>
…e#266) Change the default value of `diskquota.max_active_tables` from 1M to 300K; the memory usage related to it is reduced from 300MB to 90MB.
Refactor the structure of TableSizeEntry to reduce memory usage.
Previously, the size of each table on each segment was maintained in its own TableSizeEntry, which wasted a lot of memory. In this PR, we refactor TableSizeEntry to:
struct TableSizeEntry
{
	Oid    reloid;
	int    segid;
	Oid    tablespaceoid;
	Oid    namespaceoid;
	Oid    owneroid;
	uint32 flag;
	int64  totalsize[SEGMENT_SIZE_ARRAY_LENGTH];
};
In this way, one TableSizeEntry holds multiple segment sizes, which substantially reduces memory usage.
For 50 segments: reduced by 65%.
For 100 segments: reduced by 82.5%.
For 101 segments: reduced by 65.3%.
For 1000 segments: reduced by 82.5%.
Fix a bug: TableSizeEntry was removed from table_size_map by oid, but the hash map key is actually a TableKeyEntry.
apache#264 caused some segment ratio tests to fail. The entry's relevant fields need to be set at the end of the iteration; otherwise, only the first segment passes the condition check.
Use diskquota.max_table_segments to define the max number of table segments in the cluster. The value equals (segment_number + 1) * max_table_number. Since a hashmap in shared memory can take over other structures' memory even after it exceeds its limit, a counter is added to track how many tables have been added to table_size_map, preventing too many entries from being created. Co-authored-by: Xiaoran Wang <wxiaoran@vmware.com> Co-authored-by: Chen Mulong <chenmulong@gmail.com>
Avoid dispatching reject map to segments when it is not changed
…pache#279) This commit fixed two bugs: - Previously, refresh_rejectmap() cleared all entries in rejectmap, including other databases' entries, which prevented the hard limit from working correctly. - Soft-limit rejectmap entries should not be added to disk_quota_reject_map on segments; otherwise, these entries may remain on segments and trigger the soft limit incorrectly. Co-authored-by: Chen Mulong <chenmulong@gmail.com>
When a database's diskquota bgworker is killed and the database is dropped, the diskquota scheduler cannot work properly. The cause: if the scheduler fails to start a bgworker for a database, it retries forever. Now a distinct status code is returned when starting a bgworker fails. If the failure is due to a dropped database (or any other reason the database name cannot be retrieved from its id), the bgworker is simply skipped. For other failures, starting a bgworker for a database is limited to 3 attempts; once the limit is reached, it is skipped and the next one is picked.
- Drop the tablespace before removing its directory. - Use '-f' so rm always forces removal. - '-- start-ignore' doesn't work with retcode, since retcode automatically adds a '-- start/stop-ignore' pair to ignore the output, and nested start/stop-ignore is not handled well by the ancient perl script; refer to 'src/test/isolation2/sql_isolation_testcase.py'. Flaky test seen as below: root@96831b9f-9150-4424-63a7-abe8f18c144e:/tmp# cat /home/gpadmin/diskquota_artifacts/tests/isolation2/regression.diffs --- \/tmp\/build\/4eceba44\/bin_diskquota\/tests\/isolation2\/expected\/test_fast_quota_view\.out 2022-12-12 13:20:56.729354016 +0000 +++ \/tmp\/build\/4eceba44\/bin_diskquota\/tests\/isolation2\/results\/test_fast_quota_view\.out 2022-12-12 13:20:56.733354401 +0000 @@ -175,9 +175,11 @@ (exited with code 0) !\retcode rm -r /tmp/spc2; GP_IGNORE:-- start_ignore +GP_IGNORE:rm: cannot remove '/tmp/spc2/6/GPDB_6_301908232/16384/16413': No such file or directory +GP_IGNORE:rm: cannot remove '/tmp/spc2/5/GPDB_6_301908232/16384/16413': No such file or directory GP_IGNORE: GP_IGNORE:-- end_ignore -(exited with code 0) +(exited with code 1) -- end_ignore DROP TABLESPACE IF EXISTS spc1; DROP
Currently, isolation2/test_rejectmap.sql is flaky if we run the isolation2 test multiple times. That's because we set the GUC 'diskquota.hard_limit' to 'on' in test_postmaster_restart.sql and forget to set it back to 'off'. In subsequent runs, the hard limit is enabled and the QD continuously dispatches the reject map to segment servers. However, test_rejectmap.sql requires the hard limit to be disabled, because we dispatch the rejectmap manually by UDF, and the rejectmap dispatched by the UDF would otherwise be cleared by the QD. This patch adds a new injection point to prevent the QD from dispatching the rejectmap, making test_rejectmap.sql stateless. This patch also sets 'diskquota.hard_limit' to 'off' when test_postmaster_restart.sql finishes.
* Fix diskquota on gpdb7. - Fix some compile issues; in particular, relstorage has been removed on gpdb7, so use relam to get the relation's storage type. - Modify the diskquota hash function flag. - Fix diskquota_relation_open(): NoLock is disabled on gpdb7. - Add test schedules and expected results for gpdb7. - Update some test expectations on gpdb7: due to AO/CO table changes, their sizes have changed. - Disable some tests on gpdb7. - Disable upgrade tests. - Upgrade to diskquota 2.2: add attribute relam to type relation_cache_detail and add a parameter to function relation_size_local. - Add setup.sql and setup.out for the isolation2 test. - Fix bug: gpstart timeout on gpdb7. We used to set `Gp_role = GP_ROLE_DISPATCH` in disk_quota_launcher_main() even though postmaster boots in utility mode. This is harmless on gpdb6 but causes an infinite loop when booting gpdb7. In fact, there is nothing for the diskquota launcher to do in utility mode, so if `Gp_role != GP_ROLE_DISPATCH`, disk_quota_launcher_main() now simply exits. - Add gpdb7 pipeline support: build gpdb7 with rocky8 and test the same build on rocky8 and rhel8; 'res_test_images' has been changed to a list to support this. - Add the gpdb version to the task name. - 'passwd' is unnecessary and doesn't exist in the rocky8 build image. - Use `cmake -DENABLE_UPGRADE_TEST=OFF` to disable the upgrade test. - TODO: add the upgrade test to the CI pipeline. Fix the activate-standby error on the CI pipeline. Fix tests for gpdb7. Co-authored-by: Xiaoran Wang <wxiaoran@vmware.com> Co-authored-by: Xing Guo <higuoxing@gmail.com>
Co-authored-by: Hao Zhang <hzhang2@vmware.com>
…#287) Usage: ``` cmake -DDISKQUOTA_FAULT_INJECTOR=ON/OFF [default: OFF] ``` Co-authored-by: Hao Zhang <hzhang2@vmware.com>
If fault injector is disabled, isolation2 will be disabled.
This reverts commit e3e73d2. Revert "Add an option to control whether compile with fault injector. (apache#287)" This reverts commit ae4ab48. Co-authored-by: Hao Zhang <hzhang2@vmware.com>
- Switch GPDB binary to release-candidate for release build. - Remove test_task from the release pipeline. Co-authored-by: Xing Guo <higuoxing@gmail.com>
The isolation2 compilation command was removed by apache#285; this commit adds it back to Regress.cmake. Co-authored-by: Xing Guo <higuoxing@gmail.com>
- Fix flaky test test_ctas_before_set_quota. pg_type will be an active table after `CREATE TABLE`. It does not affect the function of diskquota but makes the test results unstable. In fact, we do not care about the table size of the system catalog table. So we simply skip the active table oid of these tables. - Fix test_vacuum/test_truncate. gp_wait_until_triggered_fault should be called after gp_inject_fault_infinite with suspend flag. Co-authored-by: Xing Guo <higuoxing@gmail.com> Co-authored-by: Xiaoran Wang <wxiaoran@vmware.com>
When creating a new table, pg_type appears in the active tables. Filter out system catalog tables, and remove the pause in the test.
Fix the following bugs: - The judgment condition for update_relation_cache should be `&&` instead of `||`. - The lock for relation_open/relation_close should be AccessShareLock for gpdb7. - For `TRUNCATE TABLE`, we cannot map the table's new relfilenode to its oid immediately after `file_create_hook` finishes, so we keep the relfilenode in active_table_file_map and wait for the next loop to calculate the correct size for the table.
Revert some modification from apache#285. - Add last_released_diskquota_bin back for CI. - Enable upgradecheck. - Add -DDISKQUOTA_LAST_RELEASE_PATH for cmake.
Due to the release build change for GP7, the fault injector doesn't work with release builds, so all tests were temporarily disabled for release pipelines. Since we switched to the `--disable-debug-extensions` gpdb build, the fault injector is not available in the release pipeline. - Add 'EXCLUDE_FAULT_INJECT_TEST' to Regress.cmake so it is smart enough to check whether there are any fault-injector cases in the given test set, and ignore them if so. - Skip the fault injector tests for the release pipeline. - Enable the CI test task for GP7.
ORCA's window frame translation always emits a BETWEEN frame (start + end bound), so include FRAMEOPTION_BETWEEN alongside FRAMEOPTION_NONDEFAULT to match the executor's expectations.
…ated host (apache#1702) * Fix null dereference on dedicated hot standby coordinator getCdbComponentInfo() populates hostPrimaryCountHash with primary hosts only. When IS_HOT_STANDBY_QD() is true, mirror and standby hosts are also looked up in the hash but return NULL on dedicated standby nodes that host no primary segments. Replace Assert(found) with a null-safe check to prevent SIGSEGV.
…ill cause the database to be in an abnormal state" This reverts commit 3af9962.
This reverts commit fbd5e23.
…, fix duplicate totalExecuted
Four issues fixed:
1. pgstat_report_resgroup was commented out during PG16 merge (MERGE16_FIXME).
Restore all call sites so pg_stat_activity correctly shows rsgid/rsgname.
2. In check_and_unassign_from_resgroup, the Assign-to-Bypass transition calls
UnassignResGroup which clears st_rsgid, but pgstat_report_resgroup was not
called afterward to restore it. This caused pg_stat_activity to show
rsgname='unknown' for queries that switch from normal assign to bypass mode.
3. is_session_in_group plpython function used `ps -ef | grep con{session_id}`
to find process PIDs, but PG16 removed con{session_id} from process titles.
Empty grep result meant empty set, and empty.issubset(any) = True, so the
function always returned true. Fixed to use gp_stat_activity JOIN
gp_segment_configuration instead.
4. Remove duplicate totalExecuted++ in check_and_unassign_from_resgroup that
caused double-counting when a query transitions from Assign to Bypass state.
Fixes #ISSUE_Number
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
CI Skip Instructions