19 changes: 14 additions & 5 deletions docs.json
@@ -96,7 +96,8 @@
"merge-queue/using-the-queue/monitor-queue-status",
"merge-queue/using-the-queue/handle-failed-pull-requests",
"merge-queue/using-the-queue/stacked-pull-requests",
"merge-queue/using-the-queue/emergency-pull-requests"
"merge-queue/using-the-queue/emergency-pull-requests",
"merge-queue/using-the-queue/force-merge"
]
},
{
@@ -111,7 +112,8 @@
"pages": [
"merge-queue/administration",
"merge-queue/administration/advanced-settings",
"merge-queue/administration/metrics"
"merge-queue/administration/metrics",
"merge-queue/administration/terraform"
]
},
{
@@ -198,13 +200,16 @@
"pages": [
"flaky-tests/detection",
"flaky-tests/detection/pass-on-retry-monitor",
"flaky-tests/detection/threshold-monitor",
"flaky-tests/detection/failure-rate-monitor",
"flaky-tests/detection/flag-as-flaky",
"flaky-tests/infrastructure-failure-protection",
"flaky-tests/the-importance-of-pr-test-results",
"flaky-tests/quarantining",
"flaky-tests/quarantine-service-availability",
"flaky-tests/github-pull-request-comments"
"flaky-tests/github-pull-request-comments",
"flaky-tests/test-labels",
"flaky-tests/autofix-flaky-tests",
"flaky-tests/autofix-ci-failures"
]
},
{
@@ -239,8 +244,11 @@
"flaky-tests/use-mcp-server/configuration/github-copilot-ide",
"flaky-tests/use-mcp-server/configuration/claude-code-cli",
"flaky-tests/use-mcp-server/configuration/gemini-cli",
"flaky-tests/use-mcp-server/configuration/bearer-auth",
"flaky-tests/use-mcp-server/mcp-tool-reference",
"flaky-tests/use-mcp-server/mcp-tool-reference/get-root-cause-analysis",
"flaky-tests/use-mcp-server/mcp-tool-reference/fix-flaky-test",
"flaky-tests/use-mcp-server/mcp-tool-reference/search-test",
"flaky-tests/use-mcp-server/mcp-tool-reference/investigate-ci-failure",
"flaky-tests/use-mcp-server/mcp-tool-reference/set-up-test-uploads"
]
}
@@ -258,6 +266,7 @@
"pages": [
"setup-and-administration/managing-your-organization",
"setup-and-administration/github-app-permissions",
"setup-and-administration/trunk-sudo-app",
"setup-and-administration/support",
"setup-and-administration/billing",
"setup-and-administration/security"
77 changes: 77 additions & 0 deletions flaky-tests/autofix-ci-failures.mdx
@@ -0,0 +1,77 @@
---
title: "Autofix CI Failures"
description: "Automatically analyze and fix CI failures using AI agents and Trunk's MCP server"
---
Trunk can return targeted information about CI failures, enabling AI agents and automation tools to analyze and fix issues automatically.

### Prerequisites

To use the Autofix CI Failures feature, you'll need to have:

* Your repository set up to [upload test results to Trunk](/flaky-tests/get-started)

### Cursor CI Autofix

You can set up a [Cursor Automation](https://cursor.com/automations) to automatically fix CI failures by connecting to Trunk's CI failure investigation data via MCP. This is an extension of the Cursor `CI Autofix` template.

Set up the Trunk MCP using [Bearer Authentication](/flaky-tests/use-mcp-server/configuration/bearer-auth).
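
A minimal sketch of what the resulting Cursor `mcp.json` entry might look like is below. The server name `trunk` matches the automation config later on this page; the URL and header values are placeholders, so follow the Bearer Authentication guide for the exact values:

```json
{
  "mcpServers": {
    "trunk": {
      "url": "<trunk-mcp-server-url>",
      "headers": {
        "Authorization": "Bearer <your-trunk-api-token>"
      }
    }
  }
}
```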

<Accordion title="Recommended Cursor Automation Prompt">

```json
{
  "name": "CI Autofix v1",
  "description": "Detect CI failures on main and automatically open PRs",
  "triggers": [
    {
      "git": {
        "ciCompleted": {
          "repos": [
            "https://github.com/<repo>"
          ],
          "condition": 1,
          "ignoreBaseFailures": true
        }
      }
    }
  ],
  "actions": [
    {
      "gitPr": {}
    },
    {
      "mcp": {
        "server": {
          "name": "trunk"
        }
      }
    }
  ],
  "prompts": [
    {
      "prompt": "Your task is to fix CI failures on PRs.\n\n# Deduplication\n\nTo avoid racing against other agents, before any investigation:\n1. Collect the names of ALL failing CI jobs/checks from the CI Status Report above.\n2. Calculate your memory filename: sort the failing jobs alphabetically, join with \"_\", then remove any characters that are not letters, digits, hyphens, underscores, or dots. Prepend \"ci-fail-\" and truncate to 64 characters total. This is the filename.\n3. Read the memory file with this filename.\n - If it exists and the timestamp inside is less than 30 minutes old, stop immediately — no branch, no Slack, no output.\n4. Else, write the memory file with the current unix timestamp.\n - If the write SUCCEEDS: you claimed this failure. Proceed with the investigation below.\n - If the write FAILS (version conflict): another agent claimed it first. Stop immediately — no branch, no Slack, no output.\n\n# Investigation\n\nRoot cause the CI failure. Call investigate-ci-failure on the trunk MCP in order to get information about the failing test by passing in the workflow URL. Use that to identify which tests to fix. Look at the error output returned by this tool. ONLY IF you need additional information, look at the CI run's logs.\n\n- If the CI failure is due to a bug introduced on that commit, create a new PR that fixes the bug. The PR should be stacked on the PR with the failure. Modify/ensure the base branch of the PR you create is the branch of the PR you are fixing.\n- If the CI failure is due to a flaky test, create a new PR that skips that test.\n- If you are not confident in either of these outcomes, then do nothing.\n\n# Output\n\nOutput your results in the following format:\n**CI Autofix Automation**\n\n**Failure logs**: <link to failing CI job>\n**Broken by**: <link to PR> (cc @prAuthor)\n**Reason**: <1-2 sentence explanation of why CI broke>\n**Fixed by**: <1-2 sentence explanation of what fixed it>\n\nMake sure to push the PR but don't include a PR link in your output — the system will generate that for you."
    }
  ],
  "memoryEnabled": true,
  "scope": "team_editable_user",
  "templateId": "ci-autofix"
}
```

</Accordion>
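
The deduplication step in the prompt above describes a deterministic filename rule. A small sketch of that rule (a hypothetical helper, not part of any Cursor or Trunk API) makes the behavior concrete:

```javascript
// Hypothetical helper illustrating the memory-filename rule from the prompt:
// sort failing job names, join with "_", strip disallowed characters,
// prepend "ci-fail-", and truncate the whole name to 64 characters.
function memoryFilename(failingJobs) {
  const joined = [...failingJobs].sort().join("_");
  const cleaned = joined.replace(/[^A-Za-z0-9._-]/g, "");
  return ("ci-fail-" + cleaned).slice(0, 64);
}

// Two agents that see the same failing checks compute the same filename,
// so only the first successful memory write claims the failure.
console.log(memoryFilename(["test (unit)", "lint / eslint"]));
// -> "ci-fail-linteslint_testunit"
```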

We recommend the following conventions:

* Version your Automation names for clarity (e.g., "CI Autofix v1")
* Refine the prompt to avoid scanning GitHub logs in order to save time and tokens
* Be specific about your repository's conventions and common failure patterns

<Info>
Currently, Cursor will create a pull request with a base of `main`. You will need to adjust the pull request base if you want to merge the fix into your PR.
</Info>

### Claude Code Routines

<Info>
**Coming soon.** Set up Claude Routines to autofix CI failures.
</Info>
72 changes: 72 additions & 0 deletions flaky-tests/autofix-flaky-tests.mdx
@@ -0,0 +1,72 @@
---
title: "Autofix Flaky Tests"
description: "Automatically investigate flaky tests and raise fix pull requests with suggested solutions"
---
Trunk can automatically investigate flaky tests in your codebase and raise fix pull requests with suggested solutions.

### Prerequisites

To use the Autofix Flaky Tests feature, you'll need:

1. Beta access via waitlist (reach out to us on [Slack](https://slack.trunk.io))
2. The "Investigate Flaky Tests" setting enabled in your workspace
3. Active installation of the [Trunk GitHub App](/setup-and-administration/github-app-permissions)

### Auto-Investigate Flaky Tests

Once enabled, whenever Trunk [detects a flaky test](/flaky-tests/detection), it analyzes the test's failure patterns, failure output, and git history to produce a set of insights.

Flaky tests can also be analyzed manually via the UI and via the [MCP server](/flaky-tests/use-mcp-server/mcp-tool-reference/fix-flaky-test).

### Autofix with Cursor Automations

Whenever an investigation completes, Trunk emits a [webhook](/flaky-tests/webhooks) for `test_case.investigation_completed`. Enable webhooks via [Svix](/flaky-tests/webhooks).
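
The exact payload envelope may differ, but a trimmed, hypothetical example of the fields the automation prompt below relies on (`markdown_summary`, `facts`, and the repository `html_url`) looks roughly like this:

```json
{
  "type": "test_case.investigation_completed",
  "data": {
    "repository": { "html_url": "https://github.com/<repo>" },
    "markdown_summary": "<key insights and suggested first steps>",
    "facts": ["<historical findings about this test's runs>"]
  }
}
```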

You can then set up a [Cursor Automation](https://cursor.com/automations) to trigger when webhooks are received.

<Accordion title="Recommended Cursor Automation Config">

```json
{
  "name": "Autofix Flaky Tests v1",
  "triggers": [
    {
      "webhook": {}
    }
  ],
  "actions": [],
  "prompts": [
    {
      "prompt": "Your task is to fix flaky tests in this repo using provided insights.\n\n# Filter\n\nIf the test does not include the repository html_url \"https://github.com/<repo>\", exit early and do nothing.\n\n# Root Cause\n\nThe payload will include metadata about the failing test as well as some insights about the flakiness.\n\n1. The markdown_summary field includes the most important insights and the first steps you should take to root cause the flaky tests.\n2. The facts field includes more findings from historical data about running the test.\n3. Remember that the test is flaky. Sometimes it passes and sometimes it fails. Use the investigation payload to target your analysis.\n4. Use the memory tool to capture any important findings as you analyze the codebase to root cause the flakiness, such as codebase structure or test patterns.\n\n## Antipatterns\n\n1. Identify the root cause of the flakiness of the test. Do not simply increase the test's timeout or change the assertion to be more generic.\n2. Do not attempt to fix flakiness in other tests, limit your analysis to this single test.\n3. Do not add new tests, fix the flaky test in the payload.\n4. If the test is not present on your stable branch, exit early.\n5. When modifying end to end tests, do not wait on internal API calls to resolve. Focus on the page state and what the end user sees.\n6. There may be additional reasons for test flakiness, such as nondeterministic seed data, noisy neighbors, or test order issues. Conduct a deep analysis for necessary evidence, do not terminate your analysis early.\n\n## Output\n\n1. Once you have identified the root cause of the test's flakiness, open a pull request to fix the test.\n2. Title the Pull Request: \"[Cursor Fix Flaky Test]: <Description of Fix>\".\n3. Include 1 short paragraph about the fix and the supporting evidence in the pull request body. Include links to relevant files/pages that were relevant from the webhook payload and its facts.\n4. In a collapsible summary of the PR description, include the entire webhook payload you received."
    }
  ],
  "memoryEnabled": true,
  "scope": "private",
  "gitConfig": {
    "repo": "https://github.com/<repo>",
    "repos": [
      "https://github.com/<repo>"
    ],
    "branch": "main"
  }
}
```

</Accordion>

We recommend the following conventions:

* Version your Automation names for clarity.
* Configure the Svix endpoint with the Cursor Bearer token.
* Webhooks are configured for your entire organization, so you will need to use [Svix transformations](https://docs.svix.com/transformations) (see the sketch after this list) or otherwise filter out events that are not for your intended repository.
* Be specific about conventions and antipatterns for your repository. You will need to refine the Automation prompt to suit your needs.
* If your CI setup allows it, prompt Cursor to run the tests to verify them.
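
As a sketch of that filtering, a Svix transformation is a small JavaScript `handler` that can modify or cancel a message before delivery. The payload path below is an assumption based on the repository `html_url` check in the automation prompt; adjust it to your actual payload shape:

```javascript
// Hedged sketch: drop investigation webhooks for repositories other than
// the one this Automation is meant to fix.
function handler(webhook) {
  const repoUrl = webhook.payload?.repository?.html_url; // assumed payload path
  if (repoUrl !== "https://github.com/<repo>") {
    webhook.cancel = true; // Svix skips delivery for cancelled messages
  }
  return webhook;
}
```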

### What's next?

* Continue to monitor your tests to confirm the flaky test fixes are effective
* Investigations can be triggered and applied via [MCP](/flaky-tests/use-mcp-server/mcp-tool-reference/fix-flaky-test)

<Info>
**Coming soon.** Set up Claude Routines to autofix flaky tests.
</Info>
12 changes: 6 additions & 6 deletions flaky-tests/detection.mdx
@@ -10,8 +10,8 @@ Each monitor independently observes your test runs and tracks two states per test:

| Priority | Status | Condition |
| -------- | ----------- | --------------------------------------------------------------------- |
| Highest | **Broken** | Any enabled broken-type threshold monitor is active for this test |
| Middle | **Flaky** | Any enabled flaky-type monitor (threshold or pass-on-retry) is active |
| Highest | **Broken** | Any enabled broken-type failure rate monitor is active for this test |
| Middle | **Flaky** | Any enabled flaky-type monitor (failure rate or pass-on-retry) is active |
| Lowest | **Healthy** | No active monitors |

If a test triggers both a broken monitor and a flaky monitor simultaneously, it shows as **Broken**. When the broken monitor resolves (e.g., you fix the regression and the failure rate drops), the test transitions to **Flaky** if a flaky monitor is still active, or to **Healthy** if no monitors remain active.
@@ -22,24 +22,24 @@ A test stays in its detected state until every relevant monitor that flagged it

When you disable or delete a monitor, it is immediately set to **resolved** for every test case in the repo. This triggers a status re-evaluation for all affected tests. If the disabled monitor was the only active monitor for a test, that test transitions to healthy. If other monitors are still active, the test remains in the most severe active state.

For example, if you have a broken threshold monitor and a flaky pass-on-retry monitor, and you disable the broken monitor, any test that was only flagged by the broken monitor will become healthy. A test flagged by both will transition from broken to flaky (because pass-on-retry is still active).
For example, if you have a broken failure rate monitor and a flaky pass-on-retry monitor, and you disable the broken monitor, any test that was only flagged by the broken monitor will become healthy. A test flagged by both will transition from broken to flaky (because pass-on-retry is still active).

## Monitor Types

| Monitor | What it detects | Detection type | Plan availability | Default state |
| -------------------------------------------------------------------------------------- | ----------------------------------------------------------------- | --------------- | ----------------- | ------------- |
| [**Pass-on-Retry**](/flaky-tests/detection/pass-on-retry-monitor) | A test fails then passes on the same commit (retry after failure) | Flaky | Team and above | Enabled |
| [**Threshold**](/flaky-tests/detection/threshold-monitor) | Failure rate exceeds a configured percentage over a time window | Flaky or Broken | Paid plans | Disabled |
| [**Failure Rate**](/flaky-tests/detection/failure-rate-monitor) | Failure rate exceeds a configured percentage over a time window | Flaky or Broken | Paid plans | Disabled |

You can run multiple monitors simultaneously. For example, you might use pass-on-retry to catch classic retry-based flakiness while also running threshold monitors scoped to different branches. A common pattern is to pair a broken-type threshold monitor (catching consistently failing tests) with a flaky-type threshold monitor (catching intermittently failing tests). See [Threshold Monitor: Recommended Configurations](/flaky-tests/detection/threshold-monitor#recommended-configurations) for details.
You can run multiple monitors simultaneously. For example, you might use pass-on-retry to catch classic retry-based flakiness while also running failure rate monitors scoped to different branches. A common pattern is to pair a broken-type failure rate monitor (catching consistently failing tests) with a flaky-type failure rate monitor (catching intermittently failing tests). See [Failure Rate Monitor: Recommended Configurations](/flaky-tests/detection/failure-rate-monitor#recommended-configurations) for details.

If you need to manually flag a test that automated monitors haven't caught, use [Flag as Flaky](/flaky-tests/detection/flag-as-flaky) from the test detail page.

## Branch-Aware Detection

Tests often behave differently depending on where they run. Failures on `main` are usually unexpected and signal flakiness. Failures on PR branches may be expected during active development. Merge queue failures are suspicious because the code has already passed PR checks.

Rather than applying a single set of branch rules automatically, Trunk gives you control over how detection treats different branches through **branch scoping** on threshold monitors. You can create separate monitors with different thresholds and windows for your stable branch, PR branches, and merge queue branches. See [Threshold Monitor: Recommended configurations](/flaky-tests/detection/threshold-monitor#recommended-configurations) for specific guidance.
Rather than applying a single set of branch rules automatically, Trunk gives you control over how detection treats different branches through **branch scoping** on failure rate monitors. You can create separate monitors with different thresholds and windows for your stable branch, PR branches, and merge queue branches. See [Failure Rate Monitor: Recommended configurations](/flaky-tests/detection/failure-rate-monitor#recommended-configurations) for specific guidance.

Pass-on-retry detection is branch-agnostic. It flags any test that fails and passes on the same commit, regardless of which branch the test ran on.
