65 changes: 60 additions & 5 deletions merge-queue/administration/metrics.md
@@ -105,9 +105,68 @@ The time in queue can be displayed as different statistical measures. You can sh
| P95 | The value below which 95% of the time in queue falls. |
| P99 | The value below which 99% of the time in queue falls. |

### Testing duration

Testing duration shows how long each PR spends in the Testing state within the Merge Queue — measured from when testing begins to when the testing cycle reaches its final outcome. Unlike the Conclusion count and Time in queue charts, testing duration uses separate data bucketing. Hovering over a data point does not highlight corresponding points on the other charts.

Clarity: `uses separate data bucketing` is awkward and a bit jargony, and the closing sentence about hover behavior reads as a non sequitur after it. Consider rephrasing so the reader knows why this matters, namely that the time buckets don't sync up with the other charts:

Suggested change
Testing duration shows how long each PR spends in the Testing state within the Merge Queue — measured from when testing begins to when the testing cycle reaches its final outcome. Unlike the Conclusion count and Time in queue charts, the Testing duration chart uses its own time buckets, so hovering over a data point does not highlight corresponding points on the other charts.


This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.

{% hint style="info" %}
Each data point represents one testing-to-final-state transition. A single PR can contribute multiple data points if its testing cycle restarted.

Tense: `if its testing cycle restarted` mixes a present-tense conditional with a past-tense verb. Either present (`restarts`) or passive (`was restarted`) reads more cleanly:

Suggested change
Each data point represents one testing-to-final-state transition. A single PR can contribute multiple data points if its testing cycle was restarted.

{% endhint %}

#### Filters

Two dropdowns let you narrow the data shown in the chart.

**Outcome** filters by how each testing cycle ended:

| Value | Meaning |
| ----- | ------- |
| All Outcomes | Include all testing cycles (default) |
| Passed | Cycles where tests passed |
| Failed | Cycles where tests failed |
| Interrupted | Test runs cut short by a restart, preempt, or base-branch change |

Grammar: `preempt` is being used as a noun here, but it's a verb. Use `preemption` (or `preemption event`) for parallelism with the other nouns in the list.

Suggested change
| Interrupted | Test runs cut short by a restart, preemption, or base-branch change |

| Cancelled | Cycles cancelled mid-test |

Spelling consistency: the new content uses `Cancelled` / `cancelled` (British) throughout, but the body of this doc primarily uses `canceled` / `Canceled` (US); see lines 82, 88, 92, 177. The inconsistency is pre-existing, so this isn't a regression introduced by the PR, but since you're adding ~6 new instances, it might be worth standardizing in this PR (or a follow-up). `canceled` matches the bulk of the existing prose; `cancelled` matches the Prometheus metric label.


**Cycle ended in** filters by how the PR's overall merge cycle resolved:

| Value | Meaning |
| ----- | ------- |
| All | Include all PR cycles (default) |
| Merged | PR was ultimately merged |
| Failed | PR ultimately failed out of the queue |
| Cancelled | PR was cancelled |
| In Flight | PR cycle is still in progress |

Combine the two filters to isolate specific patterns. For example, set **Outcome** to Passed and **Cycle ended in** to Merged to see testing durations for PRs that ultimately merged — giving you a clean baseline for CI speed without noise from cancelled or failed runs.
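
The way the two dropdowns combine can be sketched as a pair of independent predicates that must both match. This is a minimal illustration assuming a hypothetical `DataPoint` record whose string values mirror the dropdown labels above; `None` stands in for the "All" option, and none of these names come from the product.

```python
from dataclasses import dataclass

# Hypothetical record for one testing-duration data point.
@dataclass
class DataPoint:
    pr: int
    duration_min: float
    outcome: str         # "Passed", "Failed", "Interrupted", "Cancelled"
    cycle_ended_in: str  # "Merged", "Failed", "Cancelled", "In Flight"

def apply_filters(points, outcome=None, cycle_ended_in=None):
    """Keep points matching both dropdowns; None means "All" for that dropdown."""
    return [
        p for p in points
        if (outcome is None or p.outcome == outcome)
        and (cycle_ended_in is None or p.cycle_ended_in == cycle_ended_in)
    ]

points = [
    DataPoint(1, 12.0, "Passed", "Merged"),
    DataPoint(2, 30.0, "Failed", "Failed"),
    DataPoint(3, 4.0, "Interrupted", "Merged"),
    DataPoint(3, 14.0, "Passed", "Merged"),
    DataPoint(4, 9.0, "Passed", "Cancelled"),
]
baseline = apply_filters(points, outcome="Passed", cycle_ended_in="Merged")
print([p.pr for p in baseline])  # [1, 3]
```

PR 3's interrupted attempt and PR 4's passing run (whose cycle was cancelled) both drop out, which is what makes the Passed + Merged combination a clean CI-speed baseline.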

#### Statistical measures

| Measure | Explanation |
| ------- | ----------- |
| Average | Average testing duration during the time bucket |
| Minimum | The shortest testing duration in the time bucket |
| Maximum | The longest testing duration in the time bucket |
| Sum | The total of all testing durations added together |
| P50 | The value below which 50% of testing durations fall |
| P95 | The value below which 95% of testing durations fall |
| P99 | The value below which 99% of testing durations fall |
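
A rough sketch of how these measures could be computed per time bucket, assuming data points arrive as hypothetical `(timestamp_min, duration_min)` pairs and a fixed 60-minute bucket width. The product's actual bucket sizes and percentile method aren't documented here; nearest-rank is used as a simple stand-in.

```python
import math
import statistics
from collections import defaultdict

def percentile(values, p):
    """Nearest-rank percentile (a stand-in; the product's exact method is not documented here)."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def bucket_stats(points, bucket_minutes=60):
    """Summarize (timestamp_min, duration_min) pairs per fixed-width time bucket."""
    buckets = defaultdict(list)
    for ts, duration in points:
        buckets[int(ts // bucket_minutes)].append(duration)
    return {
        b: {
            "Average": statistics.mean(d),
            "Minimum": min(d),
            "Maximum": max(d),
            "Sum": sum(d),
            "P50": percentile(d, 50),
            "P95": percentile(d, 95),
            "P99": percentile(d, 99),
        }
        for b, d in sorted(buckets.items())
    }

# Hypothetical data: three testing cycles in the first hour, one in the second.
points = [(5, 10.0), (20, 12.0), (40, 30.0), (70, 8.0)]
stats = bucket_stats(points)
print(stats[0]["P50"], stats[0]["Sum"], stats[1]["Maximum"])  # 12.0 52.0 8.0
```

With only three points in a bucket, P95 and P99 both collapse to the single slowest run, which is why a lone outlier can dominate the high percentiles in sparse windows.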

#### Drill down into individual test runs

Click and drag on the Testing duration chart to select a time range, then click **View PRs** to see the individual PRs that contributed data points in that window. The drill-down list shows:

* **PR number** — links directly to the pull request on GitHub.
* **Testing duration** — how long that PR's testing cycle took.
* **Outcome** — whether tests passed, failed, interrupted, or were cancelled.

Parallelism: the list mixes active and passive voice (`tests passed, failed, interrupted, were cancelled`). `interrupted` reads awkwardly as a passive without `were`. Restructure for consistent voice:

Suggested change
* **Outcome** — whether tests passed, failed, were interrupted, or were cancelled.

* **Cycle conclusion** — the PR's overall outcome (Merged, Failed, Cancelled, or In Flight).

Terminology: the filter is documented as `Cycle ended in` (line 132), but the drill-down column here is `Cycle conclusion`. If those really are two different labels in the UI, this is fine, but it's worth double-checking, since readers will assume the column shows the value of the filter they just used. If the UI label is the same, align the docs.


The list is sortable by any column. Use it to identify outlier PRs that dragged P95 or P99 up, or to audit testing times across a specific time window.
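
The outlier hunt amounts to sorting by duration and keeping the tail. Here is a minimal sketch assuming the drill-down rows are available as simple records; the `pr` / `duration_min` keys are hypothetical (not a documented export format), and the percentile is a nearest-rank stand-in for whatever the product actually computes.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile; an assumed stand-in for the product's calculation."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def tail_rows(rows, p=95):
    """Rows at or above the p-th percentile duration, longest first."""
    threshold = percentile([r["duration_min"] for r in rows], p)
    tail = [r for r in rows if r["duration_min"] >= threshold]
    return sorted(tail, key=lambda r: r["duration_min"], reverse=True)

# Hypothetical drill-down rows for one selected window.
durations = [10, 11, 12, 9, 13, 45, 10, 12, 11, 38]
rows = [{"pr": 101 + i, "duration_min": d} for i, d in enumerate(durations)]
print([r["pr"] for r in tail_rows(rows, p=90)])  # [106, 110]
```

In this sample, two PRs sit far above the rest, so they alone determine the P90 and P95 values for the window.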

### Drill down into metrics

From the **Conclusion count** and **Time in queue** charts, you can drill into any point or window on the graph to see the exact pull requests that made up those numbers.
From the **Conclusion count**, **Time in queue**, and **Testing duration** charts, you can drill into any point or window on the graph to see the exact pull requests that made up those numbers.

#### Why Drill Down?

@@ -145,10 +204,6 @@ Both columns are sortable, so you can quickly surface the longest-running PRs in

The PR list page shows the selected date range as a subtitle and a **Back to Health** link to return to the charts. If the selection contains more than 2,500 PRs, the list shows the first 2,500 with a notice indicating the total. Narrow the time bucket on the chart to drill into a smaller window.

{% hint style="info" %}
Drill down and range selection are currently available on the Conclusion count and Time in queue charts. Additional Health charts will support the same interactions as they land in the UI.
{% endhint %}

***

### Prometheus metrics endpoint