26 changes: 23 additions & 3 deletions flaky-tests/detection/failure-count-monitor.md
@@ -1,5 +1,5 @@
---
description: Detect flaky or broken tests as soon as they accumulate a configured number of failures
description: Detect and classify or label tests as soon as they accumulate a configured number of failures
Contributor


Frontmatter inconsistency between the two pages.

This page's description was updated to mention labeling ("Detect and classify or label tests…") but failure-rate-monitor.md:2 was left as the original "Detect flaky or broken tests…" and never updated to reflect the new apply-labels action. Either both should mention the labels path or neither should — currently the two sibling pages disagree about whether labeling is part of the monitor's purpose.

Also, the new phrasing "Detect and classify or label tests" parses awkwardly. A cleaner version that mirrors the H1 ("Failure Count Monitor") without the parse-ambiguity:

Suggested change
description: Detect and classify or label tests as soon as they accumulate a configured number of failures
description: Classify or label tests as soon as they accumulate a configured number of failures

---

# Failure Count Monitor
@@ -18,14 +18,25 @@ Use the failure count monitor when you want immediate visibility into test failu

If you need to detect patterns of intermittent failure over time (e.g., a test that fails 20% of the time), use a [failure rate monitor](failure-rate-monitor.md) instead. If you want to catch tests that fail and then pass on retry within a single commit, [pass-on-retry](pass-on-retry-monitor.md) handles that automatically.

## Action Type

When creating a failure count monitor, choose what action it takes when a test is flagged:

- **Classify test status** — marks the test as flaky or broken. This is the default and integrates with quarantine workflows and status-based filtering.
- **Apply labels** — tags matching tests with one or more labels when the monitor activates. Use this when you want to categorize tests automatically without changing their status. See [Automatic labeling from monitors](../management/test-labels.md#automatic-labeling-from-monitors) for details.
Comment on lines +23 to +26
Contributor


Minor wording mismatch: the intro says "when a test is flagged" but the Apply labels bullet then says "when the monitor activates." The label path doesn't flag tests — it labels them — so "flagged" is misleading for one of the two options. Consider:

Suggested change
When creating a failure count monitor, choose what action it takes when a test is flagged:
- **Classify test status** — marks the test as flaky or broken. This is the default and integrates with quarantine workflows and status-based filtering.
- **Apply labels** — tags matching tests with one or more labels when the monitor activates. Use this when you want to categorize tests automatically without changing their status. See [Automatic labeling from monitors](../management/test-labels.md#automatic-labeling-from-monitors) for details.
When creating a failure count monitor, choose what action it takes when the monitor matches a test:
- **Classify test status** — marks the test as flaky or broken. This is the default and integrates with quarantine workflows and status-based filtering.
- **Apply labels** — tags matching tests with one or more labels when the monitor activates. Use this when you want to categorize tests automatically without changing their status. See [Automatic labeling from monitors](../management/test-labels.md#automatic-labeling-from-monitors) for details.

Same wording exists at failure-rate-monitor.md:13 and could be updated in parallel.


The action type is set at creation and cannot be changed afterward. If you need to switch a monitor's action type, create a new monitor with the desired type and disable the old one.

## Detection Type

Each failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:
Applies only to monitors with the **Classify test status** action type.

Each classify-action failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:
Contributor


"classify-action" reads as a coined term that is never defined.

The previous line already scopes this paragraph to monitors with the Classify test status action type, so the adjective here is redundant and a bit jargony — readers may wonder if "classify-action monitor" is a formal name they should know. The matching line in failure-rate-monitor.md:24 has the same phrasing.

Suggest dropping the qualifier since the section intro already establishes the scope:

Suggested change
Each classify-action failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:
Each such failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it:

(And similarly on the failure-rate page.)


- **Flaky monitors** are appropriate when failures on the monitored branch are likely non-deterministic. A test that fails once on `main` but passes on retry is probably flaky.
- **Broken monitors** are appropriate when failures indicate a real regression. If a test fails on `main` and you expect it to keep failing until someone fixes it, broken is the right classification.

The detection type is set at creation and cannot be changed afterward. If you need to switch a monitor's type, create a new monitor with the desired type and disable the old one.
The detection type is set at creation and cannot be changed afterward. If you need to switch a monitor's detection type, create a new monitor with the desired type and disable the old one.

## How It Works

@@ -37,6 +48,7 @@ You configure a failure count monitor with:

| Setting | Value |
|---|---|
| Action type | Classify test status |
| Detection type | Broken |
| Failure count | 1 |
| Window | 30 minutes |
@@ -56,6 +68,14 @@ If another test, `test_signup`, also failed during that window, it would be flag

## Configuration

### Action Type

Choose **Classify test status** or **Apply labels**. See [Action Type](#action-type) above for details. This cannot be changed after the monitor is created.

### Detection Type

Appears only when the action type is **Classify test status**. Choose **Flaky** or **Broken**. This determines the status a test receives when the monitor flags it. See [Detection Type](#detection-type) above for guidance.

### Failure Count

The number of failures required to trigger detection. The default is **1**, meaning any single failure on a monitored branch flags the test.
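The rolling-window counting described above can be sketched in a few lines. This is an illustrative model only, with hypothetical names (`FailureCountMonitor`, `record_failure`) invented for the sketch; it is not the product's actual implementation:

```python
from collections import defaultdict, deque


class FailureCountMonitor:
    """Illustrative sketch of a failure count monitor (not the real implementation).

    Flags a test as soon as it accumulates `failure_count` failures
    within a rolling window of `window_seconds`.
    """

    def __init__(self, failure_count=1, window_seconds=30 * 60, detection_type="broken"):
        self.failure_count = failure_count
        self.window_seconds = window_seconds
        self.detection_type = detection_type
        self._failures = defaultdict(deque)  # test name -> recent failure timestamps

    def record_failure(self, test_name, timestamp):
        """Record one failure; return the assigned status if the test is now flagged."""
        failures = self._failures[test_name]
        failures.append(timestamp)
        # Drop failures that have aged out of the rolling window.
        while failures and timestamp - failures[0] > self.window_seconds:
            failures.popleft()
        if len(failures) >= self.failure_count:
            return self.detection_type  # e.g. "broken"
        return None
```

With the values from the example table (failure count 1, broken detection, 30-minute window), a single failure is enough to flag a test; a higher failure count requires repeated failures inside the window before any status is assigned.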
23 changes: 19 additions & 4 deletions flaky-tests/detection/failure-rate-monitor.md
@@ -6,16 +6,27 @@ description: Detect flaky or broken tests based on failure rate over a configura

The failure rate monitor detects tests based on failure rate over a rolling time window. Unlike pass-on-retry, which looks for a specific pattern on a single commit, the failure rate monitor identifies tests that fail too often over a period of time, even if no individual failure looks like a retry.

You can create multiple failure rate monitors with different configurations. This is how you tailor detection to different branches, test volumes, sensitivity levels, and detection types.
You can create multiple failure rate monitors with different configurations. This is how you tailor detection to different branches, test volumes, sensitivity levels, and action types.
Contributor


Related to the description-frontmatter comment on the failure-count page: this file's description (line 2, outside the diff) still reads "Detect flaky or broken tests based on failure rate over a configurable time window" and was not updated to mention the new apply-labels action. The sibling page's description was updated. For consistency, consider updating this page's description here too, or reverting the failure-count one — they should agree.


## Action Type

When creating a failure rate monitor, choose what action it takes when a test is flagged:

- **Classify test status** — marks the test as flaky or broken. This is the default and integrates with quarantine workflows and status-based filtering.
- **Apply labels** — tags matching tests with one or more labels when the monitor activates. Use this when you want to categorize tests automatically without changing their status. See [Automatic labeling from monitors](../management/test-labels.md#automatic-labeling-from-monitors) for details.

The action type is set at creation and cannot be changed afterward. If you need to switch a monitor's action type, create a new monitor with the desired type and disable the old one.

## Detection Type

Each failure rate monitor has a **detection type** — either **flaky** or **broken** — which controls what status a test receives when the monitor flags it:
Applies only to monitors with the **Classify test status** action type.

Each classify-action failure rate monitor has a **detection type** — either **flaky** or **broken** — which controls what status a test receives when the monitor flags it:

- **Flaky monitors** catch tests that fail intermittently (e.g., 20–50% failure rate). These are typically caused by timing issues, shared state, or non-deterministic behavior.
- **Broken monitors** catch tests that fail consistently at a high rate (e.g., 80%+ failure rate). These usually indicate a real regression — something in the code or environment is genuinely broken and needs a fix.

The detection type is set at creation and cannot be changed afterward. If you need to switch a monitor's type, create a new monitor with the desired type and disable the old one.
The detection type is set at creation and cannot be changed afterward. If you need to switch a monitor's detection type, create a new monitor with the desired type and disable the old one.

This distinction matters because the two problems call for different responses. Flaky tests might be quarantined while you investigate the root cause. Broken tests represent real failures that should be fixed, not hidden.
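The page's example values further down (80% activation, 60% resolution) describe a standard hysteresis loop: a test activates above one rate and only resolves below a lower one. A minimal sketch under assumed, hypothetical names (`FailureRateMonitor`, `evaluate`), not the product's code:

```python
class FailureRateMonitor:
    """Illustrative sketch of a failure rate monitor (not the real implementation).

    Flags a test when its failure rate over a window's runs crosses
    `activation_rate`, and clears the flag only once the rate drops below
    `resolution_rate`. Windows with fewer than `min_sample` runs keep the
    previous state rather than triggering a change.
    """

    def __init__(self, activation_rate=0.8, resolution_rate=0.6,
                 min_sample=50, detection_type="broken"):
        self.activation_rate = activation_rate
        self.resolution_rate = resolution_rate
        self.min_sample = min_sample
        self.detection_type = detection_type
        self.active = False

    def evaluate(self, failures, runs):
        """Return the status ("broken"/"flaky") or None for one window's counts."""
        if runs < self.min_sample:
            return self.detection_type if self.active else None
        rate = failures / runs
        if not self.active and rate >= self.activation_rate:
            self.active = True
        elif self.active and rate < self.resolution_rate:
            self.active = False
        return self.detection_type if self.active else None
```

Keeping the resolution threshold below the activation threshold prevents a test hovering near a single cutoff from flapping between flagged and unflagged on every evaluation.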

@@ -53,9 +64,13 @@ stale timeout, and branch scope. Capture it with realistic example values filled
in (e.g., "Broken on main", Broken detection type, 80% activation, 60% resolution,
6 hour window, 50 min sample, main branch). -->

### Action Type

Choose **Classify test status** or **Apply labels**. See [Action Type](#action-type) above for details. This cannot be changed after the monitor is created.

### Detection Type

Choose **Flaky** or **Broken**. This determines the status a test receives when the monitor flags it. See [Detection Type](#detection-type) above for guidance on which to use.
Appears only when the action type is **Classify test status**. Choose **Flaky** or **Broken**. This determines the status a test receives when the monitor flags it. See [Detection Type](#detection-type) above for guidance on which to use.

### Activation Threshold
