Merged
2 changes: 1 addition & 1 deletion welcome.md
@@ -28,7 +28,7 @@ That problem has existed for years. What makes it urgent now is volume. AI agent

At tens of thousands of tests, even a 1% flake rate means false failures on nearly every run. Each flake costs 10 to 15 minutes: the developer waits, reads logs, reruns, confirms it was noise. If your CI target is five-minute PR jobs, every flake doubles or triples that.

-Trunk detects flakes through branch-aware analysis that treats main, PRs, and merge queues differently. We fingerprint failure modes using stack trace embeddings, which means quarantine decisions are based on the actual failure pattern, not just the test name. If a quarantined test fails with a known flaky pattern, CI passes. If it fails in a new way, CI fails normally. Business-critical tests can be pinned as never-quarantine. Developers see all of this in PR comments: what failed, why, and whether it's their code or a known issue. No code changes required.
+Trunk detects flakes through branch-aware analysis that treats main, PRs, and merge queues differently. We fingerprint failure modes using stack trace embeddings and surface those differences for quick triage. We quarantine flaky tests so that if a quarantined test fails, CI passes. Business-critical tests can be pinned as never-quarantine. Developers see all of this in PR comments: what failed, why, and whether it's their code or a known issue. No code changes required.

On the repair side, we're working with design partners on AI-powered fixing through MCP integration with the likes of Claude Code, Codex, or Cursor. Trunk provides the failure data and CI context, and the agent uses that to iterate on the actual fix. Think of it as a Roomba for flaky tests. Teams in the program are already running it to detect flakes, figure out root causes, and submit fixes without a human in the loop.
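
The one-line change in this hunk alters the quarantine behavior the paragraph describes: the removed wording lets a quarantined test pass CI only when the failure matches a known flaky pattern, while the added wording lets any failure of a quarantined test pass. A minimal Python sketch of the two readings follows. Every name in it (`QuarantinePolicy`, `blocks_ci_removed_wording`, `blocks_ci_added_wording`, the string fingerprints) is hypothetical and chosen for illustration; none of it is Trunk's API, and the plain strings merely stand in for whatever matching the stack trace embeddings perform. It also assumes a never-quarantine pin overrides quarantine status, which the text implies but does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class QuarantinePolicy:
    """Hypothetical per-test settings; not a real Trunk data structure."""
    quarantined: bool = False
    never_quarantine: bool = False  # assumed to override `quarantined`
    # Plain strings stand in for matches against stack trace embeddings.
    known_flaky_fingerprints: set = field(default_factory=set)


def _is_effectively_quarantined(policy: QuarantinePolicy) -> bool:
    # Assumption: a never-quarantine pin always wins.
    return policy.quarantined and not policy.never_quarantine


def blocks_ci_removed_wording(policy: QuarantinePolicy, fingerprint: str) -> bool:
    """Removed line: only a known flaky failure pattern is forgiven."""
    if not _is_effectively_quarantined(policy):
        return True
    return fingerprint not in policy.known_flaky_fingerprints


def blocks_ci_added_wording(policy: QuarantinePolicy, fingerprint: str) -> bool:
    """Added line: any failure of a quarantined test is forgiven."""
    # `fingerprint` is unused here; it is kept so both functions share a signature.
    return not _is_effectively_quarantined(policy)


if __name__ == "__main__":
    policy = QuarantinePolicy(
        quarantined=True,
        known_flaky_fingerprints={"timeout-in-setup"},
    )
    for fp in ("timeout-in-setup", "null-deref"):
        print(
            fp,
            "removed wording blocks CI:", blocks_ci_removed_wording(policy, fp),
            "| added wording blocks CI:", blocks_ci_added_wording(policy, fp),
        )
```

Under these assumptions the two readings differ only for a quarantined test that fails with an unrecognized fingerprint: the removed wording would fail CI, the added wording would pass it.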
