
fix: increase bottlecap compile timeout and add retries to integration tests #1177

Merged
jchrostek-dd merged 2 commits into main from jc/increase-gitlab-ci-timeouts-and-retries
Apr 8, 2026

jchrostek-dd commented Apr 8, 2026

Summary

  • Increases the bottlecap compile job timeout from 10m to 20m. The job appears to time out frequently.
  • Adds retry: 2 to integration-suite to handle transient failures from AWS/Datadog API throttling.

…n tests

- Increase bottlecap compile job timeout from 10m to 20m to reduce false timeout failures
- Add retry: 2 to integration-suite to handle transient AWS/Datadog API throttling

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 8, 2026 18:30
jchrostek-dd requested a review from a team as a code owner April 8, 2026 18:30
timeout: 10m
retry:
  max: 2
  when:
Contributor

Don't remove the when

Copilot AI left a comment

Pull request overview

This pull request aims to improve CI/CD reliability by addressing timeout issues in the bottlecap compilation job and adding retry logic to the integration-suite job. Specifically, it increases the bottlecap compilation timeout from 10 minutes to 20 minutes (to accommodate cold builds of the large Rust workspace) and adds retry configuration to the integration-suite job to handle transient AWS/Datadog API throttling failures.

Changes:

  • Increases bottlecap compile job timeout from 10m to 20m to prevent spurious timeouts on cold builds
  • Simplifies bottlecap retry configuration from explicit structured format to shorthand format (retry: 2)
  • Adds retry: 2 to the integration-suite job to handle transient API failures

# This job sometimes times out on GitLab for unclear reasons.
# Set a timeout with retries to work around this.
timeout: 20m
retry: 2

Copilot AI Apr 8, 2026

The simplified retry configuration retry: 2 changes the retry behavior from the previous implementation. The original configuration only retried on stuck_or_timeout_failure, runner_system_failure, and script_failure. The simplified form retries on ALL failure conditions, which could cause unintended retries on failures that shouldn't be retried (e.g., configuration errors or other job failures). Consider explicitly specifying the when conditions to maintain the original behavior of retrying only on infrastructure/timeout issues.
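For reference, a minimal sketch of what the explicit form could look like with the new timeout. The job name and the script line are placeholders, since the real job definition is not shown in this conversation; the when values are the three this comment attributes to the original configuration.

```yaml
# Hypothetical job name; the real job key is not shown in this conversation.
compile-bottlecap:
  # Placeholder command so the fragment is a complete job; not the real build step.
  script:
    - echo "compile bottlecap here"
  timeout: 20m
  retry:
    max: 2
    # Retry only on the failure types listed for the original configuration.
    when:
      - stuck_or_timeout_failure
      - runner_system_failure
      - script_failure
```

With this form the job is still retried up to twice, but only when the failure reason matches one of the listed conditions, whereas the shorthand retry: 2 retries on any failure.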

stage: integration-tests
tags: ["arch:amd64"]
image: ${CI_DOCKER_TARGET_IMAGE}:${CI_DOCKER_TARGET_VERSION}
retry: 2

Copilot AI Apr 8, 2026

The integration-suite job uses retry: 2 without specifying retry conditions. This will retry on ALL failure types, including legitimate test failures that shouldn't be retried. Given that the PR description indicates retries are intended to handle "transient failures from AWS/Datadog API throttling", consider using a more specific retry configuration with when conditions that targets infrastructure/timeout failures rather than test script failures.
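A rough sketch of the narrower configuration suggested here, assuming the job key is integration-suite as named in the PR description. The script line is a placeholder, and the choice of when values is an illustration of "infrastructure/timeout failures", not something taken from the repository.

```yaml
integration-suite:
  stage: integration-tests
  tags: ["arch:amd64"]
  image: ${CI_DOCKER_TARGET_IMAGE}:${CI_DOCKER_TARGET_VERSION}
  # Placeholder command; the real test invocation is not shown in this conversation.
  script:
    - echo "run integration tests here"
  retry:
    max: 2
    # Assumed selection: retry on runner and timeout problems only,
    # so ordinary test failures (script_failure) are not retried.
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
```

One trade-off to keep in mind: throttling errors from AWS or Datadog would most likely surface as a failing test script, which GitLab reports as script_failure, so this narrower form would probably not retry them. If those retries are the main goal, adding script_failure back (or having the test runner exit with a dedicated code and matching it via retry:exit_codes, where the GitLab version supports it) may be closer to the intent.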

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
jchrostek-dd merged commit edeb5ab into main Apr 8, 2026
52 checks passed
jchrostek-dd deleted the jc/increase-gitlab-ci-timeouts-and-retries branch April 8, 2026 19:57
zarirhamza pushed a commit that referenced this pull request Apr 9, 2026
…n tests (#1177)

## Summary
- Increases bottlecap compile job timeout from 10m to 20m. This seems to
be frequently timing out.
- Adds `retry: 2` to `integration-suite` to handle transient failures
from AWS/Datadog API throttling

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

3 participants