Document automatic label application from monitors #656
Conversation
😎 Merged successfully.
Claude finished @max-trunk's task in 1m 59s — Claude PR Review
Verdict: ✅ Looks good to merge. Nicely scoped, internally consistent, and all cross-links resolve.
Inline comments posted — all three are optional polish, not blockers. One pre-existing issue is worth surfacing.
Claude finished @max-trunk's task in 1m 28s — Claude PR Review (re-review)
| Previous comment | Fix | Status |
|---|---|---|
| README.md:13 — define "classifying monitor" on first use | Added the "(referred to below as _classifying monitors_)" parenthetical at first mention. Italic style matches the existing _fails on retry on PR branches_ example in test-labels.md:35. | ✅ |
| failure-count-monitor.md:23 — `--` vs `—` style consistency with failure-rate-monitor.md:13 | Replaced `--` with `—`. The two sentences are now matched templates. | ✅ |
| test-labels.md:44 — redundancy between the two sentences | Dropped the redundant second sentence; the line now reads cleanly: "A monitor uses one action or the other, not both." | ✅ |
| (Pre-existing, flagged as follow-up) duplicate `## Preview Panel` headings in failure-count-monitor.md | Fixed in this PR — the second heading is removed, Status Filter and Large Repo Truncation are now `###` subsections under a single Preview Panel, and the `## Action` section is correctly placed after. | ✅ Bonus |
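The "one action or the other, not both" rule from the fix above can be sketched as a simple dispatch. This is a purely illustrative Python sketch — `Monitor`, `run_action`, and the field names are hypothetical, not Trunk's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Monitor:
    # A monitor uses one action or the other, not both.
    action: str                                  # "classify" or "apply_labels"
    labels: list = field(default_factory=list)   # only used by apply_labels
    remove_labels_on_resolve: bool = False

def run_action(monitor, test, test_labels):
    """Apply the monitor's single configured action to an activated test."""
    if monitor.action == "classify":
        test["status"] = "flaky"                 # classify: flag the test
    elif monitor.action == "apply_labels":
        test_labels.update(monitor.labels)       # label it instead of classifying
    else:
        raise ValueError(f"unknown action: {monitor.action}")

m = Monitor(action="apply_labels", labels=["would-be-flaky"])
labels = set()
run_action(m, {"name": "test_login"}, labels)
print(labels)  # → {'would-be-flaky'}
```

The point of the sketch is the exclusivity: an activated monitor takes exactly one branch, never both.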
No new issues
I re-scanned the diff for typos, broken cross-links, dangling refs, and style drift — nothing to flag. All five files are in a clean, shippable state.
· branch max/monitor-label-actions
Claude finished @max-trunk's task in 2m 2s — Claude PR Review (re-review)
| File | Change | Assessment |
|---|---|---|
| README.md:27 | Disable/delete paragraph rewritten to cover both classifying and labeling monitor cases (incl. the "Remove these labels when the monitor resolves" mention) | ✅ Reads cleanly, more accurate |
| failure-count-monitor.md | Removed standalone `## Detection Type` and `## Resolution Behavior` sections; folded detection types under `### Action` › `#### Classify test status`; merged resolution-behavior prose into `### Resolution Timeout` | ✅ Less duplication; Action is now the proper parent concept |
| failure-rate-monitor.md | Same restructure: removed `## Detection Type` and `## Resolution Behavior`; Action moved under Configuration with detection types as sub-bullets | ✅ Consistent with failure-count |
| pass-on-retry-monitor.md | Added an Action row to the config table; `## Action` → `### Action` with bullets for the two options | ✅ Now matches the other monitors' structure |
Notes (all non-blocking)
- failure-count-monitor.md L31 — The example config table lists `Detection type | Broken` but no `Action` row. With the restructure, "Detection type" is now logically a sub-attribute of "Action: Classify test status". Adding `Action | Classify test status` above the detection-type row would make the table match the new conceptual hierarchy. Tiny nit — fine either way since Classify is the default.
- Action subsection style differs across monitors — `failure-count` and `failure-rate` use `#### Classify test status` / `#### Apply labels` subheadings; `pass-on-retry` uses bullet items inside `### Action`. Justified by content asymmetry (pass-on-retry's options are one-liners, the others have detection-type detail), so I'd leave as-is — flagging only so you know it's a deliberate choice.
Pre-existing follow-up (not from this PR)
failure-count-monitor.md L128 and L130 in the Choosing Between Monitors table reference **Threshold** (the old name for what is now the failure rate monitor). This predates this PR — it's on `main` already — but it's the only remaining spot where the old name leaks through. Fix in a follow-up.
Verdict on this PR
Five files are in a clean, internally consistent, shippable state. No typos, no broken cross-links or anchors (`#action` resolves correctly in all three monitor files). Ready to merge.
Claude finished @max-trunk's task in 2m 6s — Claude PR Review (re-review)
Inline comment on the `## How It Works` intro:

> Before: The monitor continuously scans your test runs looking for commits where a test has both a failure and a success. When it finds one, the test is flagged as flaky.
>
> After: The monitor continuously scans your test runs looking for commits where a test has both a failure and a success. When it finds one, the monitor activates on that test and runs its configured [action](#action): by default, the test is flagged as flaky.

Minor style consistency: this sentence introduces the "by default" clause with a colon (`:`), but the parallel sentences in the other two monitor docs use an em dash (`—`):

- failure-rate-monitor.md:13 — "…runs its configured action — by default, flagging the test as flaky or broken."
- failure-count-monitor.md:23 — "…runs its configured action — by default, flagging the test as flaky or broken."

Now that all three "How It Works" intros are written as the same template, aligning punctuation (and ideally construction — _flagging the test as flaky_ vs _the test is flagged as flaky_) would make them read as siblings. Suggested change:

> …runs its configured [action](#action) — by default, flagging the test as flaky.
Non-blocking polish.
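The detection rule the quoted sentence describes — a test is flaky when a single commit has both a failing and a passing run of it — can be sketched in a few lines. This is an illustrative Python sketch under that stated rule; `flag_flaky_tests` and the tuple shape are hypothetical names, not Trunk's implementation:

```python
from collections import defaultdict

def flag_flaky_tests(runs):
    """Return the set of test names that both failed and passed on the same commit.

    `runs` is an iterable of (commit_sha, test_name, passed) tuples.
    """
    outcomes = defaultdict(set)  # (commit, test) -> set of observed pass/fail outcomes
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(passed)
    # Flaky: the same commit saw both a failure and a success for the test.
    return {test for (commit, test), seen in outcomes.items() if {True, False} <= seen}

runs = [
    ("abc123", "test_login", False),     # failed on first attempt
    ("abc123", "test_login", True),      # passed on retry -> flaky
    ("abc123", "test_checkout", True),
    ("def456", "test_checkout", False),  # fails only; not flaky under this rule
]
print(flag_flaky_tests(runs))  # → {'test_login'}
```

Note that a test which only ever fails (like `test_checkout` on `def456`) is not flagged — mixed outcomes on one commit are what distinguish flaky from broken.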
TylerJang27 left a comment:

Love how explicit we're being here.

We should add a section here, or to the "how monitors work" detection/README.md, on dry-running classify monitors with labels as a supported use case.
Claude finished @max-trunk's task in 1m 30s — Claude PR Review (re-review)
| File | Change | Assessment |
|---|---|---|
| README.md (+13) | New `## Dry-Running with Labels` section between "Monitor Types" and "Branch-Aware Detection" | ✅ Logical placement — it's a workflow-level guide that pairs naturally with the monitor-types overview |
| test-labels.md (+1/−1) | Appended a sentence to the "Automatic labeling from monitors" intro pointing at the new section | ✅ Good cross-reference; keeps the dry-run framing discoverable from the management side |
What works well in the new section
- The four-step numbered flow (create with Apply labels → observe → refine → switch to Classify) is concrete and easy to follow.
- The example label `would-be-flaky` is a nice, self-explanatory placeholder.
- The closing paragraph contrasts the Preview Panel (static, configuration-time) against a label dry-run (live, ongoing) — that's exactly the right framing for a reader wondering "why not just use the preview?".
- The cross-link anchor `#dry-running-with-labels` resolves correctly given the `## Dry-Running with Labels` heading.
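The four-step dry-run flow could be sketched as a config that starts in label-only mode and is flipped to classify once the labeled set matches expectations. Everything here is hypothetical — the dict keys, `promote_dry_run`, and the monitor types are illustrative stand-ins, not Trunk's actual config schema:

```python
def promote_dry_run(monitor, labeled_tests, expected):
    """Hypothetical helper: flip a label dry-run monitor to classify mode
    once the set of labeled tests matches what the team expected."""
    if labeled_tests != expected:
        return monitor                 # keep observing / refining (steps 2-3)
    promoted = dict(monitor, action="classify")
    promoted.pop("labels", None)       # classify and apply-labels are exclusive
    return promoted

# Step 1: create the monitor with the Apply labels action (the dry run).
monitor = {
    "type": "failure_count",
    "action": "apply_labels",
    "labels": ["would-be-flaky"],      # placeholder label from the doc
}

# Step 4: after observation the labeled set looks right -> switch to Classify.
monitor = promote_dry_run(monitor, labeled_tests={"test_login"}, expected={"test_login"})
print(monitor["action"])  # → classify
```

The design point the section makes survives the sketch: the dry run is live and ongoing (real tests accumulate the label over time), whereas the Preview Panel is a one-shot, configuration-time estimate.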
Notes (all non-blocking)
- README.md L45 — the new section uses the noun phrase "labeling monitor" implicitly, in the same way "classifying monitor" was used implicitly before the earlier review pass. The earlier feedback led to defining classifying monitor parenthetically on first use at L13 ("(referred to below as _classifying monitors_)"). For symmetry, you could do the same the first time "labeling monitor" appears here — e.g. "…by deploying it as a monitor with the Apply labels action (a _labeling monitor_) first." Self-evident enough to skip if you prefer; flagging only because the analogous treatment exists for the other term.
- README.md L52 — "Let the monitor run for a few cycles" — "cycles" isn't defined anywhere else in the doc. A reader familiar with monitors may parse this fine, but "for a few days" or "for a few evaluation runs" is slightly more concrete. Tiny nit.
Pre-existing items still outstanding (not from this PR)
- failure-count-monitor.md:31 — the example config table lists `Detection type` but no `Action` row after the restructure. Fine either way since Classify is the default. Fix in a follow-up.
- failure-count-monitor.md L128 & L130 — the "Choosing Between Monitors" table still references **Threshold** (the old name for failure rate). Pre-existing on `main`. Fix in a follow-up.
- pass-on-retry-monitor.md:13 — minor punctuation/construction drift from failure-rate-monitor.md:13 and failure-count-monitor.md:23. Already raised inline in the prior review; left to author discretion.
What's in good shape across the full PR
- No typos.
- All cross-links and anchors resolve (`#action`, `#automatic-labeling-from-monitors`, `#dry-running-with-labels`, `#manage-labels`).
- "Classifying monitor" terminology is introduced explicitly on first use.
- Action subsection structure is consistent between `failure-count` and `failure-rate`; `pass-on-retry` uses a flatter form justified by its single-option asymmetry.
Documents automatic labeling of tests from monitors and cleans up other tangential areas to bring them in line with the new decoupled monitor/action framework.
https://docs.trunk.io/~/revisions/YYsUK1OsdCh4dBJDroBD/