
Document automatic label application from monitors#656

Merged
trunk-staging-io[bot] merged 4 commits into main from max/monitor-label-actions on May 14, 2026

Conversation

@max-trunk
Contributor

@max-trunk max-trunk commented May 13, 2026

Documents automatic label application from monitors and cleans up other tangential areas to bring them in line with the new decoupled monitor/action framework.
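As a rough mental model of the decoupled framework this PR documents, the sketch below separates *detection* (when a monitor fires) from its *action* (what happens when it fires). All names here are hypothetical illustrations, not Trunk's actual code or schema:

```python
from dataclasses import dataclass, field

@dataclass
class Test:
    name: str
    status: str = "healthy"
    labels: set = field(default_factory=set)

# An action decides what happens when a monitor fires; the monitor itself
# only decides *when* to fire. Classifying monitors change health status;
# labeling monitors only attach labels and never touch status.
def classify_action(test, verdict):
    test.status = verdict

def apply_labels_action(test, verdict, labels=("would-be-flaky",)):
    test.labels.update(labels)

class Monitor:
    def __init__(self, detect, action):
        self.detect = detect
        self.action = action

    def evaluate(self, test, runs):
        verdict = self.detect(runs)
        if verdict:
            self.action(test, verdict)

# Example detection rule: a commit where a test both passed and failed.
def pass_and_fail_on_same_commit(runs):
    outcomes = {r["outcome"] for r in runs}
    return "flaky" if {"pass", "fail"} <= outcomes else None

runs = [{"outcome": "pass"}, {"outcome": "fail"}]
t = Test("test_login")
# Same detection, non-destructive action: status stays "healthy",
# the test just picks up a label.
Monitor(pass_and_fail_on_same_commit, apply_labels_action).evaluate(t, runs)
```

Swapping `apply_labels_action` for `classify_action` on the same `Monitor` is what "a monitor uses one action or the other, not both" means in the docs.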

https://docs.trunk.io/~/revisions/YYsUK1OsdCh4dBJDroBD/

@trunk-staging-io

trunk-staging-io Bot commented May 13, 2026

😎 Merged successfully - details.

@claude
Contributor

claude Bot commented May 13, 2026

Claude finished @max-trunk's task in 1m 59s —— View job


Claude PR Review

  • Gather context from changed files
  • Review for typos, grammar, formatting
  • Check documentation best practices
  • Post inline comments and final summary

Verdict: ✅ Looks good to merge. Nicely scoped, internally consistent, cross-links all resolve, and the new ## Action section follows a uniform template across all three monitor pages. No typos. No blocking issues.

Inline comments posted

  • README.md L19 — terminology nit: define "classifying monitor" explicitly on first use
  • failure-count-monitor.md L23 — `--` vs `—` style consistency with the equivalent sentence in failure-rate-monitor.md
  • test-labels.md L44 — minor redundancy between adjacent sentences about Apply labels not changing health status

All three are optional polish, not blockers.

One pre-existing issue worth surfacing

flaky-tests/detection/failure-count-monitor.md has two ## Preview Panel headings (around lines 97 and 113 on main). This already existed before this PR, but the new ## Action section now sits between them, which makes the duplication more visible. Worth cleaning up in a follow-up: the second block has the richer content (Status Filter, Large Repo Truncation), so the fix is to merge that detail into the first ## Preview Panel and delete the second heading. Fix in a follow-up →

What's strong about this PR

  • The new content in test-labels.md covers the right things: how to choose the action, how to configure labels (including inline label creation), how labels surface in the UI, and the multi-source removal semantics. Each sub-question a reader is likely to have is answered in its own #### subsection — easy to skim.
  • The README.md rewrite handles the conceptual shift well: monitors now have an "action," and the status-calculation table is properly scoped to classifying monitors. The hint at L23 ("Monitors configured to apply labels do not contribute to this status calculation") preempts the natural follow-up question.
  • Reverse links from each monitor doc back to test-labels.md#automatic-labeling-from-monitors are consistent in placement and wording across all three monitor pages.
  • The deletion-restriction hint in test-labels.md (a referenced label can't be deleted) is a good defensive UX detail to document.
    · branch max/monitor-label-actions

Comment thread flaky-tests/detection/README.md
Comment thread flaky-tests/detection/failure-count-monitor.md Outdated
Comment thread flaky-tests/management/test-labels.md Outdated

@github-actions github-actions Bot left a comment


Auto-approved: Claude code review passed.

@max-trunk max-trunk changed the title from "monitor label actions" to "Document automatic label application from monitors" May 13, 2026
@claude
Contributor

claude Bot commented May 13, 2026

Claude finished @max-trunk's task in 1m 28s —— View job


Claude PR Review (re-review after pr feedback commit)

  • Diff 543bd68 against prior review point
  • Re-read changed files
  • Check whether prior review comments were addressed
  • Look for new typos, grammar, formatting, doc issues

Verdict: ✅ Looks good to merge. The pr feedback commit cleanly addresses every comment from the previous review and goes a step further by fixing the pre-existing issue I flagged as a follow-up.

What was addressed

| Previous comment | Fix | Status |
| --- | --- | --- |
| README.md:13 — define "classifying monitor" on first use | Added "(referred to below as _classifying monitors_)" parenthetical at first mention. Italic style matches the existing _fails on retry on PR branches_ example in test-labels.md:35. | ✅ |
| failure-count-monitor.md:23 — `--` vs `—` style consistency with failure-rate-monitor.md:13 | Replaced `--` with `—`. The two sentences are now matched templates. | ✅ |
| test-labels.md:44 — redundancy between the two sentences | Dropped the redundant second sentence; line now reads cleanly: "A monitor uses one action or the other, not both." | ✅ |
| (Pre-existing, flagged as follow-up) duplicate ## Preview Panel headings in failure-count-monitor.md | Fixed in this PR — second heading removed, Status Filter + Large Repo Truncation are now ### subsections under a single Preview Panel, and the ## Action section is correctly placed after. | ✅ Bonus |

No new issues

I re-scanned the diff for typos, broken cross-links, dangling refs, and style drift — nothing to flag. All five files are in a clean shippable state.
· branch max/monitor-label-actions


@github-actions github-actions Bot left a comment


Auto-approved: Claude code review passed.

@claude
Contributor

claude Bot commented May 13, 2026

Claude finished @max-trunk's task in 2m 2s —— View job


Claude PR Review (re-review after cleanup commit)

  • Diff 607ce7e against prior review point
  • Re-read changed files
  • Check for new typos, grammar, formatting, doc issues
  • Post final summary

Verdict: ✅ Looks good to merge. The cleanup commit is a structural improvement — it removes redundancy and reorganizes "Detection type" as a sub-attribute of "Action: Classify test status", which is the correct conceptual hierarchy now that actions exist.

What changed in 607ce7e

| File | Change | Assessment |
| --- | --- | --- |
| README.md:27 | Disable/delete paragraph rewritten to cover both classifying and labeling monitor cases (incl. Remove these labels when the monitor resolves mention) | ✅ Reads cleanly, more accurate |
| failure-count-monitor.md | Removed standalone ## Detection Type and ## Resolution Behavior sections; folded detection types under ### Action › #### Classify test status; merged resolution-behavior prose into ### Resolution Timeout | ✅ Less duplication, Action is now the proper parent concept |
| failure-rate-monitor.md | Same restructure: removed ## Detection Type and ## Resolution Behavior; Action moved under Configuration with detection types as sub-bullets | ✅ Consistent with failure-count |
| pass-on-retry-monitor.md | Added Action row to the config table; ## Action → ### Action with bullets for the two options | ✅ Now matches the other monitors' structure |

Notes (all non-blocking)

  • failure-count-monitor.md L31 — The example config table lists Detection type \| Broken but no Action row. With the restructure, "Detection type" is now logically a sub-attribute of "Action: Classify test status". Adding Action \| Classify test status above the detection-type row would make the table match the new conceptual hierarchy. Tiny nit — fine either way since Classify is the default.
  • Action subsection style differs across monitors: failure-count and failure-rate use #### Classify test status / #### Apply labels subheadings; pass-on-retry uses bullet items inside ### Action. Justified by content asymmetry (pass-on-retry's options are one-liners, the others have detection-type detail), so I'd leave as-is — flagging only so you know it's a deliberate choice.

Pre-existing follow-up (not from this PR)

failure-count-monitor.md L128 and L130 in the Choosing Between Monitors table reference **Threshold** (the old name for what is now the failure rate monitor). This predates this PR — it's on main already — but it's the only remaining spot where the old name leaks through. Fix in a follow-up →

Verdict on this PR

Five files are in a clean, internally consistent, shippable state. No typos, no broken cross-links or anchors (#action resolves correctly in all three monitor files). Ready to merge.
· branch max/monitor-label-actions


@github-actions github-actions Bot left a comment


Auto-approved: Claude code review passed.

@max-trunk max-trunk marked this pull request as ready for review May 13, 2026 19:42
@claude
Contributor

claude Bot commented May 13, 2026

Claude finished @max-trunk's task in 2m 6s —— View job


Claude PR Review (re-review on ready_for_review)

  • Diff current HEAD against origin/main
  • Re-read all five changed files end-to-end
  • Check for typos, grammar, formatting, doc style
  • Verify cross-links and anchors resolve
  • Post final summary

Verdict: ✅ Looks good to merge. Three prior review passes have already caught and resolved the substantive issues. This pass surfaced one small new style nit and re-flags one pre-existing item; neither blocks.

New observation from this pass

  • pass-on-retry-monitor.md:13 — the rewritten "How It Works" intro now uses a colon (:) before "by default", while the parallel sentences in failure-rate-monitor.md:13 and failure-count-monitor.md:23 use an em dash (—). It also flips the construction ("the test is flagged…" vs the gerund form the other two use). Since these three sentences are now near-identical templates, aligning them would make the family read consistently. Inline comment with suggestion →

Pre-existing items not introduced by this PR

  • failure-count-monitor.md:31 — the example configuration table lists Detection type | Broken but no Action row. After the cleanup commit's restructure, "Detection type" is now logically a sub-attribute of Action: Classify test status. Adding an Action | Classify test status row above the detection-type row would make the example match the new conceptual hierarchy. Fine either way since Classify is the default. Fix in a follow-up →
  • failure-count-monitor.md L128 & L130 — the "Choosing Between Monitors" table still references **Threshold** (the old name for the failure rate monitor). Pre-existing on main; flagged in the prior review as a follow-up. Fix in a follow-up →

What's in good shape

  • No typos.
  • All #action and #automatic-labeling-from-monitors anchors resolve.
  • "Classifying monitor" terminology is introduced explicitly on first use (README.md:13) per the earlier review.
  • failure-count and failure-rate share a near-identical ### Action › #### Classify test status / #### Apply labels structure. pass-on-retry uses a flatter bullet form, which is justified by its single-option asymmetry.
  • Reverse links from each monitor doc back to test-labels.md#automatic-labeling-from-monitors are consistent in placement and wording.
  • Heading-case mix (Title Case in README.md + monitor docs vs. sentence case in test-labels.md) is pre-existing per-file style, not drift introduced here.
    · branch max/monitor-label-actions

## How It Works

- The monitor continuously scans your test runs looking for commits where a test has both a failure and a success. When it finds one, the test is flagged as flaky.
+ The monitor continuously scans your test runs looking for commits where a test has both a failure and a success. When it finds one, the monitor activates on that test and runs its configured [action](#action): by default, the test is flagged as flaky.
Contributor


Minor style consistency: this sentence introduces the "by default" clause with a colon (:), but the parallel sentences in the other two monitor docs use an em dash (—):

  • failure-rate-monitor.md:13 — "…runs its configured action — by default, flagging the test as flaky or broken."
  • failure-count-monitor.md:23 — "…runs its configured action — by default, flagging the test as flaky or broken."

Now that all three "How It Works" intros are written as the same template, aligning punctuation (and ideally construction — flagging the test as flaky vs the test is flagged as flaky) would make them read as siblings.

Suggested change
- The monitor continuously scans your test runs looking for commits where a test has both a failure and a success. When it finds one, the monitor activates on that test and runs its configured [action](#action): by default, the test is flagged as flaky.
+ The monitor continuously scans your test runs looking for commits where a test has both a failure and a success. When it finds one, the monitor activates on that test and runs its configured [action](#action) — by default, flagging the test as flaky.

Non-blocking polish.


@github-actions github-actions Bot left a comment


Auto-approved: Claude code review passed.

Contributor

@TylerJang27 TylerJang27 left a comment


Love how explicit we're being here

Contributor


We should add a section here or to the "how monitors work" detection/README.md on dry-running classify monitors with labels as a supported use case

@claude
Contributor

claude Bot commented May 14, 2026

Claude finished @max-trunk's task in 1m 30s —— View job


Claude PR Review (re-review after add section on dry-running commit)

  • Diff 38f94ad against prior review point
  • Re-read changed files
  • Check for typos, grammar, formatting, doc issues
  • Verify cross-links and anchors resolve

Verdict: ✅ Looks good to merge. The add section on dry-running commit cleanly addresses TylerJang27's review comment by adding a ## Dry-Running with Labels section to flaky-tests/detection/README.md and reverse-linking it from test-labels.md#automatic-labeling-from-monitors.

What changed in 38f94ad

| File | Change | Assessment |
| --- | --- | --- |
| README.md (+13) | New ## Dry-Running with Labels section between "Monitor Types" and "Branch-Aware Detection" | ✅ Logical placement — it's a workflow-level guide that pairs naturally with the monitor-types overview |
| test-labels.md (+1/-1) | Appended a sentence to the "Automatic labeling from monitors" intro pointing at the new section | ✅ Good cross-reference; keeps the dry-run framing discoverable from the management side |

What works well in the new section

  • The four-step numbered flow (create with Apply labels → observe → refine → switch to Classify) is concrete and easy to follow.
  • The example label would-be-flaky is a nice, self-explanatory placeholder.
  • The closing paragraph contrasts the Preview Panel (static, configuration-time) against a label dry-run (live, ongoing) — that's exactly the right framing for a reader wondering "why not just use the preview?".
  • The cross-link anchor #dry-running-with-labels resolves correctly given the ## Dry-Running with Labels heading.
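The four-step flow could be sketched as follows. This is purely illustrative: the dict fields (`action`, `labels`, `threshold`) and function names are invented for the example and are not Trunk's real configuration schema.

```python
# Hypothetical sketch of the dry-run flow: deploy with the non-destructive
# "Apply labels" action first, then promote to "Classify test status".

def start_dry_run(monitor, label="would-be-flaky"):
    """Step 1: attach the observe-only labeling action, so firings only
    add a label and never change a test's health status."""
    monitor["action"] = {"type": "apply_labels", "labels": [label]}
    return monitor

def promote(monitor):
    """Step 4: once the labeled set looks right, switch the same monitor
    over to the classifying action."""
    assert monitor["action"]["type"] == "apply_labels", "promote only after a dry run"
    monitor["action"] = {"type": "classify_test_status"}
    return monitor

monitor = start_dry_run({"name": "flaky-detector", "threshold": 3})
# Steps 2-3 happen out of band: observe which tests pick up the label,
# then tune thresholds until the labeled set matches expectations.
monitor = promote(monitor)
```

The point of the shape: detection settings (`threshold`) are untouched by the promotion, which is exactly why a label dry-run is a faithful preview of the eventual classifying behavior.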

Notes (all non-blocking)

  • README.md L45 — the new section uses the noun phrase "labeling monitor" implicitly, in the same way "classifying monitor" was used implicitly before the earlier review pass. The earlier feedback led to defining classifying monitor parenthetically on first use at L13 ("(referred to below as _classifying monitors_)"). For symmetry, you could do the same the first time "labeling monitor" appears here — e.g. "…by deploying it as a monitor with the Apply labels action (a labeling monitor) first." Self-evident enough to skip if you prefer; flagging only because the analogous treatment exists for the other term.
  • README.md L52 — "Let the monitor run for a few cycles": "cycles" isn't defined anywhere else in the doc. A reader familiar with monitors may parse this fine, but "for a few days" or "for a few evaluation runs" is slightly more concrete. Tiny nit.

Pre-existing items still outstanding (not from this PR)

  • failure-count-monitor.md:31 — example config table lists Detection type but no Action row after the restructure. Fine either way since Classify is the default. Fix in a follow-up →
  • failure-count-monitor.md L128 & L130 — "Choosing Between Monitors" table still references **Threshold** (the old name for failure rate). Pre-existing on main. Fix in a follow-up →
  • pass-on-retry-monitor.md:13 — minor punctuation/construction drift from failure-rate-monitor.md:13 and failure-count-monitor.md:23. Already raised inline in the prior review, left to author discretion.

What's in good shape across the full PR

  • No typos.
  • All cross-links and anchors resolve (#action, #automatic-labeling-from-monitors, #dry-running-with-labels, #manage-labels).
  • "Classifying monitor" terminology is introduced explicitly on first use.
  • Action subsection structure is consistent between failure-count and failure-rate; pass-on-retry uses a flatter form justified by its single-option asymmetry.
    · branch max/monitor-label-actions


@github-actions github-actions Bot left a comment


Auto-approved: Claude code review passed.

@trunk-staging-io trunk-staging-io Bot merged commit 7546810 into main May 14, 2026
5 of 7 checks passed
@trunk-staging-io trunk-staging-io Bot deleted the max/monitor-label-actions branch May 14, 2026 17:57