tici: increase timeout, and resource for e2e test#4164
tici: increase timeout, and resource for e2e test#4164CalvinNeo wants to merge 1 commit intoPingCAP-QE:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello @CalvinNeo, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the stability and reliability of the end-to-end (e2e) tests by adjusting their execution environment. The changes primarily focus on preventing test failures due to resource constraints or insufficient execution time, ensuring a more robust continuous integration process. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
I have already done a preliminary review for you, and I hope to help you do a better job.
Summary:
This PR increases the timeout duration from 80 minutes to 100 minutes and raises the CPU resource requests and limits for the make ci-e2e presubmit job from 6 to 8 CPUs while keeping memory constant at 24Gi. The changes are straightforward adjustments to the Prow job configuration, likely to improve test reliability and performance under heavier workloads. The modifications are minimal and focused, with no obvious issues in syntax or structure.
Code Improvements
-
Resource Request and Limit Consistency (prow-jobs/pingcap-inc/tici/presubmits.yaml, lines ~110-116):
- Increasing both CPU requests and limits to 8 is fine, but verify that the cluster node capacity and quotas can accommodate this increase. If other jobs run concurrently, this could cause resource contention.
- Suggest adding a comment explaining why this increase is necessary (e.g., to reduce flaky test failures or handle increased workload), improving maintainability and clarity for future reviewers.
-
Timeout Increase Justification (prow-jobs/pingcap-inc/tici/presubmits.yaml, line ~10):
- Similarly, the timeout increase from 80m to 100m should ideally be accompanied by a comment stating the rationale. This avoids confusion about why the timeout was extended.
Best Practices
-
Documentation:
-
Add inline comments near the changed configuration lines, for example:
timeout: 100m # Increased timeout to reduce flaky failures in e2e tests under heavier load ... cpu: "8" # Increased CPU allocation to speed up e2e tests and avoid resource starvation
-
-
Testing Coverage:
- Although this is a config file change, confirm that there is a process to validate these resource and timeout changes (e.g., monitoring job stability after this change). Consider adding a note in PR description or commit message about planned validation steps.
No critical issues or bugs were detected given the nature of this PR. The changes are sound but would benefit from added context to improve long-term maintainability.
There was a problem hiding this comment.
Code Review
This pull request increases the global job timeout and the CPU resources for the e2e test job. The resource increase for the e2e test is specific and looks good. However, the timeout is increased for all jobs, which might not be ideal. I've left a comment with a suggestion to apply the timeout increase only to the e2e job to avoid masking potential performance issues in other jobs.
|
|
||
| decoration_config: &decoration_config | ||
| timeout: 80m | ||
| timeout: 100m |
There was a problem hiding this comment.
The PR title suggests the timeout increase is for the e2e test, but this change increases the timeout for all jobs (pull-verify, pull-it, and pull-e2e). A global timeout increase could mask performance regressions in other jobs. It would be better to apply this change only to the pull-e2e job.
You can achieve this by reverting this line to timeout: 80m and adding a specific decoration_config override to the pull-e2e job like this:
decoration_config:
<<: *decoration_config
timeout: 100m
No description provided.