docs(flaky-tests): document monitor action type selection for failure rate and failure count monitors#655

Closed
samgutentag wants to merge 1 commit into main from sam-gutentag/monitor-action-types-v174

Conversation

@samgutentag
Member

Summary

  • Adds an Action Type section to both the Failure Rate Monitor and Failure Count Monitor pages explaining the two options: Classify test status vs. Apply labels
  • Moves Detection Type (flaky/broken) into context as a setting that applies only to classify-action monitors
  • Updates the configuration tables and subsections in both pages to reflect the new creation form flow
  • Links to the Test Labels page for the apply-labels path

Shipped in v174 via trunk-io/trunk2#3945. Pairs with #651, which documents automatic labeling from the test-labels side.

Test plan

  • Verify Action Type section appears in correct location on both pages
  • Verify Detection Type section is clearly scoped to classify-action monitors
  • Check links to test-labels.md#automatic-labeling-from-monitors resolve

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


Generated by Claude Code

When creating a failure rate or failure count monitor, users now choose
an action type first -- either classify test status (flaky/broken) or
apply labels. Add an Action Type section to both monitor pages explaining
the two options and moving Detection Type into context as a classify-only
setting.

Shipped in v174 via trunk2#3945.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@trunk-staging-io

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

@claude
Contributor

claude Bot commented May 13, 2026

Claude finished @samgutentag's task in 1m 43s.


Review

  • Read changed files
  • Check for typos, grammar, and formatting issues
  • Verify links resolve
  • Check structure and consistency between failure-rate and failure-count pages
  • Post review feedback

The new Action Type / Detection Type structure is solid and mirrors cleanly between the two pages. The link to test-labels.md#automatic-labeling-from-monitors resolves to an existing section. A few small things to address (posted as inline comments):

  1. Frontmatter description inconsistency between the two pages. Only failure-count-monitor.md's description was updated to mention labeling; failure-rate-monitor.md's description still says "Detect flaky or broken tests…". Pick one direction and apply it to both. The new failure-count phrasing ("Detect and classify or label tests") also parses awkwardly — a small reword would help.
  2. "classify-action" is a coined adjective that is never defined (appears once on each page right after the section's own scoping sentence). Since the section header already scopes the discussion, the qualifier is redundant — suggest simplifying.
  3. Wording mismatch in the Action Type intros. The intro says "when a test is flagged" but the Apply labels bullet says "when the monitor activates." The labels path doesn't flag, so a more neutral verb ("matches a test") avoids implying that labeling produces a flag.

Nothing blocking — the substantive content is right and the changes are consistent with what's in trunk2#3945. Note that the companion update in test-labels.md (the "Coming soon" hint at lines 35-37) presumably gets removed in #651.

Out of scope but worth flagging separately: failure-count-monitor.md has two ## Preview Panel sections (lines 117 and 129). That predates this PR.

@@ -1,5 +1,5 @@
 ---
-description: Detect flaky or broken tests as soon as they accumulate a configured number of failures
+description: Detect and classify or label tests as soon as they accumulate a configured number of failures

Frontmatter inconsistency between the two pages.

This page's description was updated to mention labeling ("Detect and classify or label tests…") but failure-rate-monitor.md:2 was left as the original "Detect flaky or broken tests…" and never updated to reflect the new apply-labels action. Either both should mention the labels path or neither should — currently the two sibling pages disagree about whether labeling is part of the monitor's purpose.

Also, the new phrasing "Detect and classify or label tests" parses awkwardly. A cleaner version that mirrors the H1 ("Failure Count Monitor") without the parse-ambiguity:

Suggested change:
-description: Detect and classify or label tests as soon as they accumulate a configured number of failures
+description: Classify or label tests as soon as they accumulate a configured number of failures

-Each failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:
+Applies only to monitors with the **Classify test status** action type.
+
+Each classify-action failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:

"classify-action" reads as a coined term that is never defined.

The previous line already scopes this paragraph to monitors with the Classify test status action type, so the adjective here is redundant and a bit jargony — readers may wonder if "classify-action monitor" is a formal name they should know. The matching line in failure-rate-monitor.md:24 has the same phrasing.

Suggest dropping the qualifier since the section intro already establishes the scope:

Suggested change:
-Each classify-action failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:
+Each such failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:

(And similarly on the failure-rate page.)

Comment on lines +23 to +26
+When creating a failure count monitor, choose what action it takes when a test is flagged:
+
+- **Classify test status** — marks the test as flaky or broken. This is the default and integrates with quarantine workflows and status-based filtering.
+- **Apply labels** — tags matching tests with one or more labels when the monitor activates. Use this when you want to categorize tests automatically without changing their status. See [Automatic labeling from monitors](../management/test-labels.md#automatic-labeling-from-monitors) for details.

Minor wording mismatch: the intro says "when a test is flagged" but the Apply labels bullet then says "when the monitor activates." The label path doesn't flag tests — it labels them — so "flagged" is misleading for one of the two options. Consider:

Suggested change:
-When creating a failure count monitor, choose what action it takes when a test is flagged:
+When creating a failure count monitor, choose what action it takes when the monitor matches a test:
 - **Classify test status** — marks the test as flaky or broken. This is the default and integrates with quarantine workflows and status-based filtering.
 - **Apply labels** — tags matching tests with one or more labels when the monitor activates. Use this when you want to categorize tests automatically without changing their status. See [Automatic labeling from monitors](../management/test-labels.md#automatic-labeling-from-monitors) for details.

Same wording exists at failure-rate-monitor.md:13 and could be updated in parallel.

 The failure rate monitor detects tests based on failure rate over a rolling time window. Unlike pass-on-retry, which looks for a specific pattern on a single commit, the failure rate monitor identifies tests that fail too often over a period of time, even if no individual failure looks like a retry.
 
-You can create multiple failure rate monitors with different configurations. This is how you tailor detection to different branches, test volumes, sensitivity levels, and detection types.
+You can create multiple failure rate monitors with different configurations. This is how you tailor detection to different branches, test volumes, sensitivity levels, and action types.

Related to the description-frontmatter comment on the failure-count page: this file's description (line 2, outside the diff) still reads "Detect flaky or broken tests based on failure rate over a configurable time window" and was not updated to mention the new apply-labels action. The sibling page's description was updated. For consistency, consider updating this page's description here too, or reverting the failure-count one — they should agree.

@samgutentag
Member Author

superseded by #656
