docs(flaky-tests): document monitor action type selection for failure rate and failure count monitors #655
When creating a failure rate or failure count monitor, users now choose an action type first -- either classify test status (flaky/broken) or apply labels. Add an Action Type section to both monitor pages explaining the two options and moving Detection Type into context as a classify-only setting. Shipped in v174 via trunk2#3945. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude finished @samgutentag's task in 1m 43s
Review
The new Action Type / Detection Type structure is solid and mirrors cleanly between the two pages. The link to
Nothing blocking — the substantive content is right and the changes are consistent with what's in trunk2#3945. Note that the companion update in
@@ -1,5 +1,5 @@
 ---
-description: Detect flaky or broken tests as soon as they accumulate a configured number of failures
+description: Detect and classify or label tests as soon as they accumulate a configured number of failures
Frontmatter inconsistency between the two pages.
This page's description was updated to mention labeling ("Detect and classify or label tests…"), but failure-rate-monitor.md:2 still reads "Detect flaky or broken tests…" and does not reflect the new apply-labels action. Either both should mention the labels path or neither should. As it stands, the two sibling pages disagree about whether labeling is part of the monitor's purpose.
Also, the new phrasing "Detect and classify or label tests" parses awkwardly. A cleaner version that mirrors the H1 ("Failure Count Monitor") without the parse-ambiguity:
-description: Detect and classify or label tests as soon as they accumulate a configured number of failures
+description: Classify or label tests as soon as they accumulate a configured number of failures
-Each failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:
+Applies only to monitors with the **Classify test status** action type.
+
+Each classify-action failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:
"classify-action" reads as a coined term that is never defined.
The previous line already scopes this paragraph to monitors with the Classify test status action type, so the adjective here is redundant and a bit jargony — readers may wonder if "classify-action monitor" is a formal name they should know. The matching line in failure-rate-monitor.md:24 has the same phrasing.
Suggest dropping the qualifier since the section intro already establishes the scope:
-Each classify-action failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:
+Each such failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:
(And similarly on the failure-rate page.)
+When creating a failure count monitor, choose what action it takes when a test is flagged:
+
+- **Classify test status** — marks the test as flaky or broken. This is the default and integrates with quarantine workflows and status-based filtering.
+- **Apply labels** — tags matching tests with one or more labels when the monitor activates. Use this when you want to categorize tests automatically without changing their status. See [Automatic labeling from monitors](../management/test-labels.md#automatic-labeling-from-monitors) for details.
Minor wording mismatch: the intro says "when a test is flagged" but the Apply labels bullet then says "when the monitor activates." The label path doesn't flag tests — it labels them — so "flagged" is misleading for one of the two options. Consider:
-When creating a failure count monitor, choose what action it takes when a test is flagged:
+When creating a failure count monitor, choose what action it takes when the monitor matches a test:
 - **Classify test status** — marks the test as flaky or broken. This is the default and integrates with quarantine workflows and status-based filtering.
 - **Apply labels** — tags matching tests with one or more labels when the monitor activates. Use this when you want to categorize tests automatically without changing their status. See [Automatic labeling from monitors](../management/test-labels.md#automatic-labeling-from-monitors) for details.
Same wording exists at failure-rate-monitor.md:13 and could be updated in parallel.
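For readers of this thread, the relationship between the two settings can be sketched in Python. This is not Trunk's schema or API: `ActionType`, `DetectionType`, `MonitorConfig`, and every field name are hypothetical, chosen only to illustrate what the docs describe. Classify is the default action and needs a detection type, while the apply-labels action takes labels and has no detection type.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ActionType(Enum):
    # Hypothetical identifiers for the two documented action types.
    CLASSIFY = "classify_test_status"
    APPLY_LABELS = "apply_labels"


class DetectionType(Enum):
    FLAKY = "flaky"
    BROKEN = "broken"


@dataclass
class MonitorConfig:
    # The docs state that classify is the default action.
    action_type: ActionType = ActionType.CLASSIFY
    detection_type: Optional[DetectionType] = None
    labels: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.action_type is ActionType.CLASSIFY:
            # Detection type is a classify-only setting.
            if self.detection_type is None:
                raise ValueError("classify monitors need a detection type")
            if self.labels:
                raise ValueError("classify monitors do not take labels")
        else:
            if self.detection_type is not None:
                raise ValueError("detection type applies only to classify monitors")
            if not self.labels:
                raise ValueError("label monitors need at least one label")


classify_monitor = MonitorConfig(detection_type=DetectionType.FLAKY)
label_monitor = MonitorConfig(action_type=ActionType.APPLY_LABELS, labels=["slow"])
```

Modeling the settings this way makes the scoping explicit, which is the same point the "Applies only to monitors with the **Classify test status** action type" line makes in prose.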
 The failure rate monitor detects tests based on failure rate over a rolling time window. Unlike pass-on-retry, which looks for a specific pattern on a single commit, the failure rate monitor identifies tests that fail too often over a period of time, even if no individual failure looks like a retry.
 
-You can create multiple failure rate monitors with different configurations. This is how you tailor detection to different branches, test volumes, sensitivity levels, and detection types.
+You can create multiple failure rate monitors with different configurations. This is how you tailor detection to different branches, test volumes, sensitivity levels, and action types.
Related to the description-frontmatter comment on the failure-count page: this file's description (line 2, outside the diff) still reads "Detect flaky or broken tests based on failure rate over a configurable time window" and was not updated to mention the new apply-labels action. The sibling page's description was updated. For consistency, consider updating this page's description here too, or reverting the failure-count one — they should agree.
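As background for the "failure rate over a rolling time window" wording in the diff above, here is a minimal Python sketch of that idea. It is an illustration under assumptions, not the monitor's implementation: the `failure_rate` helper, the `(timestamp, passed)` run shape, and the 7-day window are all hypothetical, and the PR does not describe how the real monitor handles thresholds or minimum sample sizes.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def failure_rate(
    runs: Iterable[Tuple[datetime, bool]],
    now: datetime,
    window: timedelta,
) -> Optional[float]:
    """Return the fraction of failed runs inside [now - window, now].

    `runs` is a hypothetical shape: (timestamp, passed) pairs for one test.
    Returns None when no runs fall inside the window.
    """
    cutoff = now - window
    recent = [passed for ts, passed in runs if cutoff <= ts <= now]
    if not recent:
        return None
    failures = sum(1 for passed in recent if not passed)
    return failures / len(recent)


now = datetime(2024, 1, 8)
runs = [
    (now - timedelta(days=10), False),  # outside the 7-day window, ignored
    (now - timedelta(days=3), False),
    (now - timedelta(days=2), True),
    (now - timedelta(days=1), True),
    (now - timedelta(hours=1), False),
]
rate = failure_rate(runs, now, timedelta(days=7))
```

Here two of the four runs inside the window failed, so `rate` is 0.5, even though no single commit shows a pass-on-retry pattern.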
superseded by #656
Summary
Shipped in v174 via trunk-io/trunk2#3945. Pairs with #651 which documents automatic labeling from the test-labels side.
Test plan
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Generated by Claude Code