[feature not live] docs(merge-queue): add Testing Duration chart and drill-down to metrics page #662

Draft
samgutentag wants to merge 1 commit into main from sam-gutentag/mq-testing-duration-metrics-v175

@samgutentag
Member

Adds documentation for two related Merge Queue metrics features shipped in v175.

Changes

  • Adds a Testing Duration section to the metrics page covering the chart, filters (Outcome, Cycle ended in), and statistical measures
  • Documents drill-down into individual test runs from the Testing Duration chart (trunk2 PR #3936)
  • Updates the "Drill down into metrics" section to include Testing Duration alongside Conclusion count and Time in queue
  • Removes the hint that drill-down was only available on Conclusion count and Time in queue

Context


Generated by Claude Code

…to metrics page

Documents two related features that ship in v175:
- Testing Duration chart (trunk2 PR #3919 + context): shows per-testing-cycle
  durations with Outcome and Cycle ended in filters and statistical measures
- Testing metrics drill-down (trunk2 PR #3936): clicking into a range on the
  Testing Duration chart shows individual PR test runs with duration, outcome,
  and cycle conclusion columns

Also updates the "Drill down into metrics" intro to include Testing Duration
alongside Conclusion count and Time in queue, and removes the hint that said
drill-down was not yet available on additional charts.

https://github.com/trunk-io/trunk2/pull/3936

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@trunk-staging-io

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

@claude
Contributor

claude Bot commented May 14, 2026

Claude finished @samgutentag's task in 1m 45s


PR review

  • Read the changed file
  • Analyze the diff for typos, grammar, and formatting
  • Check documentation best practices
  • Post review feedback

Summary

Solid additions overall — the section is well-structured, clearly distinguishes Testing duration from Time in queue, and properly updates the Drill-down section and removes the now-stale hint. A few prose-level issues are worth tightening before merge; none are blocking.

Inline comments posted

  1. Line 110 (clarity): "uses separate data bucketing" is awkward, and the trailing hover sentence reads as a non-sequitur. Suggested a single combined sentence.
  2. Line 115 (tense): "if its testing cycle restarted" mixes a present-tense conditional with a past-tense verb; suggest "was restarted".
  3. Line 129 (grammar): "preempt" is used as a noun in a list of nouns; it should be "preemption".
  4. Line 130 (spelling consistency): this PR adds ~6 instances of "Cancelled" (British), but the body of the doc primarily uses "canceled" (US). Pre-existing inconsistency, but worth standardizing.
  5. Line 162 (parallelism): "tests passed, failed, interrupted, or were cancelled" mixes voice. Suggested "passed, failed, were interrupted, or were cancelled".
  6. Line 163 (terminology): the filter is "Cycle ended in" but the drill-down column is "Cycle conclusion". Verify against the UI; align if they're the same.

Not flagged inline (FYI)

  • Heading case: existing #### Why Drill Down? and #### Select Data Points use Title Case, while the new #### Filters, #### Statistical measures, #### Drill down into individual test runs use sentence case. Pre-existing inconsistency — not introduced by this PR.
  • Table column-width padding on the new tables doesn't match the padded style used for the Time in queue table above. Renders identically; purely cosmetic.

| All Outcomes | Include all testing cycles (default) |
| Passed | Cycles where tests passed |
| Failed | Cycles where tests failed |
| Interrupted | Test runs cut short by a restart, preempt, or base-branch change |

Grammar: "preempt" is being used as a noun here, but it's a verb. Use "preemption" (or "preemption event") for parallelism with the other nouns in the list.

Suggested change
| Interrupted | Test runs cut short by a restart, preempt, or base-branch change |
| Interrupted | Test runs cut short by a restart, preemption, or base-branch change |


* **PR number** — links directly to the pull request on GitHub.
* **Testing duration** — how long that PR's testing cycle took.
* **Outcome** — whether tests passed, failed, interrupted, or were cancelled.

Parallelism: the list mixes active and passive voice ("tests passed, failed, interrupted, were cancelled"). "interrupted" reads awkwardly as a passive without "were". Restructure for consistent voice:

Suggested change
* **Outcome** — whether tests passed, failed, interrupted, or were cancelled.
* **Outcome** — whether tests passed, failed, were interrupted, or were cancelled.


### Testing duration

Testing duration shows how long each PR spends in the Testing state within the Merge Queue — measured from when testing begins to when the testing cycle reaches its final outcome. Unlike the Conclusion count and Time in queue charts, testing duration uses separate data bucketing. Hovering over a data point does not highlight corresponding points on the other charts.

Clarity: "uses separate data bucketing" is awkward and a bit jargony, and the closing sentence about hover behavior is a non-sequitur after it. Consider rephrasing so the reader knows why this matters, namely that the time buckets don't sync up with the other charts:

Suggested change
Testing duration shows how long each PR spends in the Testing state within the Merge Queue — measured from when testing begins to when the testing cycle reaches its final outcome. Unlike the Conclusion count and Time in queue charts, testing duration uses separate data bucketing. Hovering over a data point does not highlight corresponding points on the other charts.
Testing duration shows how long each PR spends in the Testing state within the Merge Queue — measured from when testing begins to when the testing cycle reaches its final outcome. Unlike the Conclusion count and Time in queue charts, the Testing duration chart uses its own time buckets, so hovering over a data point does not highlight corresponding points on the other charts.

This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.

{% hint style="info" %}
Each data point represents one testing-to-final-state transition. A single PR can contribute multiple data points if its testing cycle restarted.

Tense: "if its testing cycle restarted" mixes a present-tense conditional with a past-tense verb. Either present ("restarts") or passive ("was restarted") reads more cleanly:

Suggested change
Each data point represents one testing-to-final-state transition. A single PR can contribute multiple data points if its testing cycle restarted.
Each data point represents one testing-to-final-state transition. A single PR can contribute multiple data points if its testing cycle was restarted.

* **PR number** — links directly to the pull request on GitHub.
* **Testing duration** — how long that PR's testing cycle took.
* **Outcome** — whether tests passed, failed, interrupted, or were cancelled.
* **Cycle conclusion** — the PR's overall outcome (Merged, Failed, Cancelled, or In Flight).

Terminology: the filter is documented as "Cycle ended in" (line 132), but the drill-down column here is "Cycle conclusion". If those really are two different labels in the UI, this is fine, but it is worth double-checking, since readers will assume the column shows the value of the filter they just used. If the UI label is the same, align the docs.

| Passed | Cycles where tests passed |
| Failed | Cycles where tests failed |
| Interrupted | Test runs cut short by a restart, preempt, or base-branch change |
| Cancelled | Cycles cancelled mid-test |

Spelling consistency: the new content uses "Cancelled" / "cancelled" (British) throughout, but the body of this doc primarily uses "canceled" / "Canceled" (US); see lines 82, 88, 92, 177. Pre-existing inconsistency means this isn't a regression introduced by the PR, but since you're adding ~6 new instances, it might be worth standardizing in this PR (or a follow-up). "canceled" matches the bulk of the existing prose; "cancelled" matches the Prometheus metric label.
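
One way to check the split before standardizing is to count both spellings in the doc. The sketch below is only an illustration: the helper name `count_cancel_spellings` and the sample string are made up, and in practice you would read the actual Markdown file instead.

```python
import re
from collections import Counter


def count_cancel_spellings(text: str) -> Counter:
    """Count US ("canceled") vs British ("cancelled") spellings, ignoring case."""
    counts = Counter()
    for match in re.finditer(r"\bcancel(l?)ed\b", text, flags=re.IGNORECASE):
        # Group 1 captures the optional second "l" of the British spelling.
        counts["british" if match.group(1) else "us"] += 1
    return counts


# Made-up sample text; substitute the contents of the real doc.
sample = "Cycles cancelled mid-test. The PR was canceled. Cancelled or In Flight."
counts = count_cancel_spellings(sample)
print(counts)
```

Whichever spelling the doc settles on, keep any literal metric label names unchanged, since those have to match what the system emits.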

@github-actions github-actions Bot left a comment

Auto-approved: Claude code review passed.
