
Document automatic label application from monitors#656

Merged
trunk-staging-io[bot] merged 4 commits into main from max/monitor-label-actions on May 14, 2026

Conversation

@max-trunk
Contributor

@max-trunk max-trunk commented May 13, 2026

Documents automatic label application from monitors and cleans up other tangential areas to bring them in line with the new decoupled monitor/action framework.
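As a rough mental model of the decoupled framework this PR documents, the sketch below separates *detection* (when a monitor fires) from its *action* (what happens when it fires). All names here are hypothetical illustrations, not Trunk's actual code or schema:

```python
from dataclasses import dataclass, field

@dataclass
class Test:
    name: str
    status: str = "healthy"
    labels: set = field(default_factory=set)

# An action decides what happens when a monitor fires; the monitor itself
# only decides *when* to fire. Classifying monitors change health status;
# labeling monitors only attach labels and never touch status.
def classify_action(test, verdict):
    test.status = verdict

def apply_labels_action(test, verdict, labels=("would-be-flaky",)):
    test.labels.update(labels)

class Monitor:
    def __init__(self, detect, action):
        self.detect = detect
        self.action = action

    def evaluate(self, test, runs):
        verdict = self.detect(runs)
        if verdict:
            self.action(test, verdict)

# Example detection rule: a commit where a test both passed and failed.
def pass_and_fail_on_same_commit(runs):
    outcomes = {r["outcome"] for r in runs}
    return "flaky" if {"pass", "fail"} <= outcomes else None

runs = [{"outcome": "pass"}, {"outcome": "fail"}]
t = Test("test_login")
# Same detection, non-destructive action: status stays "healthy",
# the test just picks up a label.
Monitor(pass_and_fail_on_same_commit, apply_labels_action).evaluate(t, runs)
```

Swapping `apply_labels_action` for `classify_action` on the same `Monitor` is what "a monitor uses one action or the other, not both" means in the docs.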

https://docs.trunk.io/~/revisions/YYsUK1OsdCh4dBJDroBD/

@trunk-staging-io

trunk-staging-io Bot commented May 13, 2026

😎 Merged successfully - details.

@claude
Contributor

claude Bot commented May 13, 2026

Claude finished @max-trunk's task in 1m 59s —— View job


Claude PR Review

  • Gather context from changed files
  • Review for typos, grammar, formatting
  • Check documentation best practices
  • Post inline comments and final summary

Verdict: ✅ Looks good to merge. Nicely scoped, internally consistent, cross-links all resolve, and the new ## Action section follows a uniform template across all three monitor pages. No typos. No blocking issues.

Inline comments posted

  • README.md L19 — terminology nit: define "classifying monitor" explicitly on first use
  • failure-count-monitor.md L23 — `--` vs `—` style consistency with the equivalent sentence in failure-rate-monitor.md
  • test-labels.md L44 — minor redundancy between adjacent sentences about Apply labels not changing health status

All three are optional polish, not blockers.

One pre-existing issue worth surfacing

flaky-tests/detection/failure-count-monitor.md has two ## Preview Panel headings (around lines 97 and 113 on main). This already existed before this PR, but the new ## Action section now sits between them, which makes the duplication more visible. Worth cleaning up in a follow-up: the second block has the richer content (Status Filter, Large Repo Truncation), so the fix is to merge that detail into the first ## Preview Panel and delete the second heading. Fix in a follow-up →

What's strong about this PR

  • The new content in test-labels.md covers the right things: how to choose the action, how to configure labels (including inline label creation), how labels surface in the UI, and the multi-source removal semantics. Each sub-question a reader is likely to have is answered in its own #### subsection — easy to skim.
  • The README.md rewrite handles the conceptual shift well: monitors now have an "action," and the status-calculation table is properly scoped to classifying monitors. The hint at L23 ("Monitors configured to apply labels do not contribute to this status calculation") preempts the natural follow-up question.
  • Reverse links from each monitor doc back to test-labels.md#automatic-labeling-from-monitors are consistent in placement and wording across all three monitor pages.
  • The deletion-restriction hint in test-labels.md (a referenced label can't be deleted) is a good defensive UX detail to document.
    · branch max/monitor-label-actions

Comment thread flaky-tests/detection/README.md
Comment thread flaky-tests/detection/failure-count-monitor.md Outdated
Comment thread flaky-tests/management/test-labels.md Outdated

@github-actions github-actions Bot left a comment


Auto-approved: Claude code review passed.

@max-trunk max-trunk changed the title from "monitor label actions" to "Document automatic label application from monitors" May 13, 2026
@claude
Contributor

claude Bot commented May 13, 2026

Claude finished @max-trunk's task in 1m 28s —— View job


Claude PR Review (re-review after pr feedback commit)

  • Diff 543bd68 against prior review point
  • Re-read changed files
  • Check whether prior review comments were addressed
  • Look for new typos, grammar, formatting, doc issues

Verdict: ✅ Looks good to merge. The pr feedback commit cleanly addresses every comment from the previous review and goes a step further by fixing the pre-existing issue I flagged as a follow-up.

What was addressed

| Previous comment | Fix | Status |
| --- | --- | --- |
| README.md:13 — define "classifying monitor" on first use | Added "(referred to below as _classifying monitors_)" parenthetical at first mention. Italic style matches the existing _fails on retry on PR branches_ example in test-labels.md:35. | ✅ |
| failure-count-monitor.md:23 — `--` vs `—` style consistency with failure-rate-monitor.md:13 | Replaced `--` with `—`. The two sentences are now matched templates. | ✅ |
| test-labels.md:44 — redundancy between the two sentences | Dropped the redundant second sentence; line now reads cleanly: "A monitor uses one action or the other, not both." | ✅ |
| (Pre-existing, flagged as follow-up) duplicate ## Preview Panel headings in failure-count-monitor.md | Fixed in this PR — second heading removed, Status Filter + Large Repo Truncation are now ### subsections under a single Preview Panel, and the ## Action section is correctly placed after. | ✅ Bonus |

No new issues

I re-scanned the diff for typos, broken cross-links, dangling refs, and style drift — nothing to flag. All five files are in a clean shippable state.
· branch max/monitor-label-actions


@github-actions github-actions Bot left a comment


Auto-approved: Claude code review passed.

@claude
Contributor

claude Bot commented May 13, 2026

Claude finished @max-trunk's task in 2m 2s —— View job


Claude PR Review (re-review after cleanup commit)

  • Diff 607ce7e against prior review point
  • Re-read changed files
  • Check for new typos, grammar, formatting, doc issues
  • Post final summary

Verdict: ✅ Looks good to merge. The cleanup commit is a structural improvement — it removes redundancy and reorganizes "Detection type" as a sub-attribute of "Action: Classify test status", which is the correct conceptual hierarchy now that actions exist.

What changed in 607ce7e

| File | Change | Assessment |
| --- | --- | --- |
| README.md:27 | Disable/delete paragraph rewritten to cover both classifying and labeling monitor cases (incl. Remove these labels when the monitor resolves mention) | ✅ Reads cleanly, more accurate |
| failure-count-monitor.md | Removed standalone ## Detection Type and ## Resolution Behavior sections; folded detection types under ### Action › #### Classify test status; merged resolution-behavior prose into ### Resolution Timeout | ✅ Less duplication, Action is now the proper parent concept |
| failure-rate-monitor.md | Same restructure: removed ## Detection Type and ## Resolution Behavior; Action moved under Configuration with detection types as sub-bullets | ✅ Consistent with failure-count |
| pass-on-retry-monitor.md | Added Action row to the config table; ## Action → ### Action with bullets for the two options | ✅ Now matches the other monitors' structure |

Notes (all non-blocking)

  • failure-count-monitor.md L31 — The example config table lists Detection type \| Broken but no Action row. With the restructure, "Detection type" is now logically a sub-attribute of "Action: Classify test status". Adding Action \| Classify test status above the detection-type row would make the table match the new conceptual hierarchy. Tiny nit — fine either way since Classify is the default.
  • Action subsection style differs across monitors: failure-count and failure-rate use #### Classify test status / #### Apply labels subheadings; pass-on-retry uses bullet items inside ### Action. Justified by content asymmetry (pass-on-retry's options are one-liners, the others have detection-type detail), so I'd leave as-is — flagging only so you know it's a deliberate choice.

Pre-existing follow-up (not from this PR)

failure-count-monitor.md L128 and L130 in the Choosing Between Monitors table reference **Threshold** (the old name for what is now the failure rate monitor). This predates this PR — it's on main already — but it's the only remaining spot where the old name leaks through. Fix in a follow-up →

Verdict on this PR

Five files are in a clean, internally consistent, shippable state. No typos, no broken cross-links or anchors (#action resolves correctly in all three monitor files). Ready to merge.
· branch max/monitor-label-actions


@github-actions github-actions Bot left a comment


Auto-approved: Claude code review passed.

@max-trunk max-trunk marked this pull request as ready for review May 13, 2026 19:42
@claude
Contributor

claude Bot commented May 13, 2026

Claude finished @max-trunk's task in 2m 6s —— View job


Claude PR Review (re-review on ready_for_review)

  • Diff current HEAD against origin/main
  • Re-read all five changed files end-to-end
  • Check for typos, grammar, formatting, doc style
  • Verify cross-links and anchors resolve
  • Post final summary

Verdict: ✅ Looks good to merge. Three prior review passes have already caught and resolved the substantive issues. This pass surfaced one small new style nit and re-flags one pre-existing item; neither blocks.

New observation from this pass

  • pass-on-retry-monitor.md:13 — the rewritten "How It Works" intro now uses a colon (:) before "by default", while the parallel sentences in failure-rate-monitor.md:13 and failure-count-monitor.md:23 use an em dash (—). It also flips the construction ("the test is flagged…" vs the gerund form the other two use). Since these three sentences are now near-identical templates, aligning them would make the family read consistently. Inline comment with suggestion →

Pre-existing items not introduced by this PR

  • failure-count-monitor.md:31 — the example configuration table lists Detection type | Broken but no Action row. After the cleanup commit's restructure, "Detection type" is now logically a sub-attribute of Action: Classify test status. Adding an Action | Classify test status row above the detection-type row would make the example match the new conceptual hierarchy. Fine either way since Classify is the default. Fix in a follow-up →
  • failure-count-monitor.md L128 & L130 — the "Choosing Between Monitors" table still references **Threshold** (the old name for the failure rate monitor). Pre-existing on main; flagged in the prior review as a follow-up. Fix in a follow-up →

What's in good shape

  • No typos.
  • All #action and #automatic-labeling-from-monitors anchors resolve.
  • "Classifying monitor" terminology is introduced explicitly on first use (README.md:13) per the earlier review.
  • failure-count and failure-rate share a near-identical ### Action › #### Classify test status / #### Apply labels structure. pass-on-retry uses a flatter bullet form, which is justified by its single-option asymmetry.
  • Reverse links from each monitor doc back to test-labels.md#automatic-labeling-from-monitors are consistent in placement and wording.
  • Heading-case mix (Title Case in README.md + monitor docs vs. sentence case in test-labels.md) is pre-existing per-file style, not drift introduced here.
    · branch max/monitor-label-actions

## How It Works

- The monitor continuously scans your test runs looking for commits where a test has both a failure and a success. When it finds one, the test is flagged as flaky.
+ The monitor continuously scans your test runs looking for commits where a test has both a failure and a success. When it finds one, the monitor activates on that test and runs its configured [action](#action): by default, the test is flagged as flaky.
Contributor


Minor style consistency: this sentence introduces the "by default" clause with a colon (:), but the parallel sentences in the other two monitor docs use an em dash (—):

  • failure-rate-monitor.md:13 — "…runs its configured action — by default, flagging the test as flaky or broken."
  • failure-count-monitor.md:23 — "…runs its configured action — by default, flagging the test as flaky or broken."

Now that all three "How It Works" intros are written as the same template, aligning punctuation (and ideally construction — flagging the test as flaky vs the test is flagged as flaky) would make them read as siblings.

Suggested change
- The monitor continuously scans your test runs looking for commits where a test has both a failure and a success. When it finds one, the monitor activates on that test and runs its configured [action](#action): by default, the test is flagged as flaky.
+ The monitor continuously scans your test runs looking for commits where a test has both a failure and a success. When it finds one, the monitor activates on that test and runs its configured [action](#action) — by default, flagging the test as flaky.

Non-blocking polish.


@github-actions github-actions Bot left a comment


Auto-approved: Claude code review passed.

Contributor

@TylerJang27 TylerJang27 left a comment


Love how explicit we're being here

Contributor


We should add a section here or to the "how monitors work" detection/README.md on dry-running classify monitors with labels as a supported use case

@claude
Contributor

claude Bot commented May 14, 2026

Claude finished @max-trunk's task in 1m 30s —— View job


Claude PR Review (re-review after add section on dry-running commit)

  • Diff 38f94ad against prior review point
  • Re-read changed files
  • Check for typos, grammar, formatting, doc issues
  • Verify cross-links and anchors resolve

Verdict: ✅ Looks good to merge. The add section on dry-running commit cleanly addresses TylerJang27's review comment by adding a ## Dry-Running with Labels section to flaky-tests/detection/README.md and reverse-linking it from test-labels.md#automatic-labeling-from-monitors.

What changed in 38f94ad

| File | Change | Assessment |
| --- | --- | --- |
| README.md (+13) | New ## Dry-Running with Labels section between "Monitor Types" and "Branch-Aware Detection" | ✅ Logical placement — it's a workflow-level guide that pairs naturally with the monitor-types overview |
| test-labels.md (+1/-1) | Appended a sentence to the "Automatic labeling from monitors" intro pointing at the new section | ✅ Good cross-reference; keeps the dry-run framing discoverable from the management side |

What works well in the new section

  • The four-step numbered flow (create with Apply labels → observe → refine → switch to Classify) is concrete and easy to follow.
  • The example label would-be-flaky is a nice, self-explanatory placeholder.
  • The closing paragraph contrasts the Preview Panel (static, configuration-time) against a label dry-run (live, ongoing) — that's exactly the right framing for a reader wondering "why not just use the preview?".
  • The cross-link anchor #dry-running-with-labels resolves correctly given the ## Dry-Running with Labels heading.
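The four-step flow could be sketched as follows. This is purely illustrative: the dict fields (`action`, `labels`, `threshold`) and function names are invented for the example and are not Trunk's real configuration schema.

```python
# Hypothetical sketch of the dry-run flow: deploy with the non-destructive
# "Apply labels" action first, then promote to "Classify test status".

def start_dry_run(monitor, label="would-be-flaky"):
    """Step 1: attach the observe-only labeling action, so firings only
    add a label and never change a test's health status."""
    monitor["action"] = {"type": "apply_labels", "labels": [label]}
    return monitor

def promote(monitor):
    """Step 4: once the labeled set looks right, switch the same monitor
    over to the classifying action."""
    assert monitor["action"]["type"] == "apply_labels", "promote only after a dry run"
    monitor["action"] = {"type": "classify_test_status"}
    return monitor

monitor = start_dry_run({"name": "flaky-detector", "threshold": 3})
# Steps 2-3 happen out of band: observe which tests pick up the label,
# then tune thresholds until the labeled set matches expectations.
monitor = promote(monitor)
```

The point of the shape: detection settings (`threshold`) are untouched by the promotion, which is exactly why a label dry-run is a faithful preview of the eventual classifying behavior.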

Notes (all non-blocking)

  • README.md L45 — the new section uses the noun phrase "labeling monitor" implicitly, in the same way "classifying monitor" was used implicitly before the earlier review pass. The earlier feedback led to defining classifying monitor parenthetically on first use at L13 ("(referred to below as _classifying monitors_)"). For symmetry, you could do the same the first time "labeling monitor" appears here — e.g. "…by deploying it as a monitor with the Apply labels action (a labeling monitor) first." Self-evident enough to skip if you prefer; flagging only because the analogous treatment exists for the other term.
  • README.md L52 — "Let the monitor run for a few cycles": "cycles" isn't defined anywhere else in the doc. A reader familiar with monitors may parse this fine, but "for a few days" or "for a few evaluation runs" is slightly more concrete. Tiny nit.

Pre-existing items still outstanding (not from this PR)

  • failure-count-monitor.md:31 — example config table lists Detection type but no Action row after the restructure. Fine either way since Classify is the default. Fix in a follow-up →
  • failure-count-monitor.md L128 & L130 — "Choosing Between Monitors" table still references **Threshold** (the old name for failure rate). Pre-existing on main. Fix in a follow-up →
  • pass-on-retry-monitor.md:13 — minor punctuation/construction drift from failure-rate-monitor.md:13 and failure-count-monitor.md:23. Already raised inline in the prior review, left to author discretion.

What's in good shape across the full PR

  • No typos.
  • All cross-links and anchors resolve (#action, #automatic-labeling-from-monitors, #dry-running-with-labels, #manage-labels).
  • "Classifying monitor" terminology is introduced explicitly on first use.
  • Action subsection structure is consistent between failure-count and failure-rate; pass-on-retry uses a flatter form justified by its single-option asymmetry.
    · branch max/monitor-label-actions


@github-actions github-actions Bot left a comment


Auto-approved: Claude code review passed.

@trunk-staging-io trunk-staging-io Bot merged commit 7546810 into main May 14, 2026
5 of 7 checks passed
@trunk-staging-io trunk-staging-io Bot deleted the max/monitor-label-actions branch May 14, 2026 17:57