docs(flaky-tests): document monitor action type selection for failure rate and failure count monitors #655
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,5 +1,5 @@ | ||||||||||||||||||
| --- | ||||||||||||||||||
| description: Detect flaky or broken tests as soon as they accumulate a configured number of failures | ||||||||||||||||||
| description: Detect and classify or label tests as soon as they accumulate a configured number of failures | ||||||||||||||||||
| --- | ||||||||||||||||||
|
|
||||||||||||||||||
| # Failure Count Monitor | ||||||||||||||||||
|
|
@@ -18,14 +18,25 @@ Use the failure count monitor when you want immediate visibility into test failu | |||||||||||||||||
|
|
||||||||||||||||||
| If you need to detect patterns of intermittent failure over time (e.g., a test that fails 20% of the time), use a [failure rate monitor](failure-rate-monitor.md) instead. If you want to catch tests that fail and then pass on retry within a single commit, [pass-on-retry](pass-on-retry-monitor.md) handles that automatically. | ||||||||||||||||||
|
|
||||||||||||||||||
| ## Action Type | ||||||||||||||||||
|
|
||||||||||||||||||
| When creating a failure count monitor, choose what action it takes when a test is flagged: | ||||||||||||||||||
|
|
||||||||||||||||||
| - **Classify test status** — marks the test as flaky or broken. This is the default and integrates with quarantine workflows and status-based filtering. | ||||||||||||||||||
| - **Apply labels** — tags matching tests with one or more labels when the monitor activates. Use this when you want to categorize tests automatically without changing their status. See [Automatic labeling from monitors](../management/test-labels.md#automatic-labeling-from-monitors) for details. | ||||||||||||||||||
|
Comment on lines +23 to +26
Contributor
Minor wording mismatch: the intro says "when a test is flagged" but the Apply labels bullet then says "when the monitor activates." The label path doesn't flag tests; it labels them. So "flagged" is misleading for one of the two options. Consider:
Suggested change
Same wording exists at |
||||||||||||||||||
|
|
||||||||||||||||||
| The action type is set at creation and cannot be changed afterward. If you need to switch a monitor's action type, create a new monitor with the desired type and disable the old one. | ||||||||||||||||||
|
|
||||||||||||||||||
| ## Detection Type | ||||||||||||||||||
|
|
||||||||||||||||||
| Each failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it: | ||||||||||||||||||
| Applies only to monitors with the **Classify test status** action type. | ||||||||||||||||||
|
|
||||||||||||||||||
| Each classify-action failure count monitor has a **detection type** -- either **flaky** or **broken** -- which controls what status a test receives when the monitor flags it: | ||||||||||||||||||
|
Contributor
"classify-action" reads as a coined term that is never defined. The previous line already scopes this paragraph to monitors with the Classify test status action type, so the adjective here is redundant and a bit jargony; readers may wonder if "classify-action monitor" is a formal name they should know. The matching line in Suggest dropping the qualifier since the section intro already establishes the scope:
Suggested change
(And similarly on the failure-rate page.) |
||||||||||||||||||
|
|
||||||||||||||||||
| - **Flaky monitors** are appropriate when failures on the monitored branch are likely non-deterministic. A test that fails once on `main` but passes on retry is probably flaky. | ||||||||||||||||||
| - **Broken monitors** are appropriate when failures indicate a real regression. If a test fails on `main` and you expect it to keep failing until someone fixes it, broken is the right classification. | ||||||||||||||||||
|
|
||||||||||||||||||
| The detection type is set at creation and cannot be changed afterward. If you need to switch a monitor's type, create a new monitor with the desired type and disable the old one. | ||||||||||||||||||
| The detection type is set at creation and cannot be changed afterward. If you need to switch a monitor's detection type, create a new monitor with the desired type and disable the old one. | ||||||||||||||||||
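A minimal sketch, not taken from the PR, of the constraints the Action Type and Detection Type sections describe: the action type is fixed at creation, and a detection type exists only for monitors that classify test status. The class name, field names, and string values are hypothetical; the real product's data model is not shown in this diff.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional, Tuple

# Hypothetical identifiers for the two documented action types.
ACTION_TYPES = {"classify_test_status", "apply_labels"}
DETECTION_TYPES = {"flaky", "broken"}


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class FailureCountMonitor:
    name: str
    action_type: str = "classify_test_status"  # documented default
    detection_type: Optional[str] = None
    labels: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"unknown action type: {self.action_type}")
        if self.action_type == "classify_test_status":
            # Classifying monitors must say which status they assign.
            if self.detection_type not in DETECTION_TYPES:
                raise ValueError("detection type must be 'flaky' or 'broken'")
        elif self.detection_type is not None:
            # Label monitors do not change status, so no detection type.
            raise ValueError("detection type applies only to classify monitors")


broken_on_main = FailureCountMonitor("Broken on main", detection_type="broken")
label_monitor = FailureCountMonitor(
    "Label failures", action_type="apply_labels", labels=("needs-triage",)
)
```

Because the instance is frozen, switching types means constructing a new monitor, which mirrors the "create a new monitor and disable the old one" guidance.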
|
|
||||||||||||||||||
| ## How It Works | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
@@ -37,6 +48,7 @@ You configure a failure count monitor with: | |||||||||||||||||
|
|
||||||||||||||||||
| | Setting | Value | | ||||||||||||||||||
| |---|---| | ||||||||||||||||||
| | Action type | Classify test status | | ||||||||||||||||||
| | Detection type | Broken | | ||||||||||||||||||
| | Failure count | 1 | | ||||||||||||||||||
| | Window | 30 minutes | | ||||||||||||||||||
|
|
@@ -56,6 +68,14 @@ If another test, `test_signup`, also failed during that window, it would be flag | |||||||||||||||||
|
|
||||||||||||||||||
| ## Configuration | ||||||||||||||||||
|
|
||||||||||||||||||
| ### Action Type | ||||||||||||||||||
|
|
||||||||||||||||||
| Choose **Classify test status** or **Apply labels**. See [Action Type](#action-type) above for details. This cannot be changed after the monitor is created. | ||||||||||||||||||
|
|
||||||||||||||||||
| ### Detection Type | ||||||||||||||||||
|
|
||||||||||||||||||
| Appears only when the action type is **Classify test status**. Choose **Flaky** or **Broken**. This determines the status a test receives when the monitor flags it. See [Detection Type](#detection-type) above for guidance. | ||||||||||||||||||
|
|
||||||||||||||||||
| ### Failure Count | ||||||||||||||||||
|
|
||||||||||||||||||
| The number of failures required to trigger detection. The default is **1**, meaning any single failure on a monitored branch flags the test. | ||||||||||||||||||
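As a rough illustration of the failure count setting (not part of the PR), the check can be thought of as counting failures inside the window and comparing against the configured count. The function name and the exact window semantics (a rolling window ending at `now`) are assumptions; the defaults mirror the example values in the diff (count 1, 30 minutes).

```python
from datetime import datetime, timedelta
from typing import Iterable


def is_flagged(
    failure_times: Iterable[datetime],
    now: datetime,
    failure_count: int = 1,
    window: timedelta = timedelta(minutes=30),
) -> bool:
    """Return True when at least `failure_count` failures fall inside
    the window that ends at `now` (assumed rolling-window semantics)."""
    window_start = now - window
    recent = [t for t in failure_times if window_start <= t <= now]
    return len(recent) >= failure_count


now = datetime(2024, 1, 1, 12, 0)
failures = [now - timedelta(minutes=45), now - timedelta(minutes=10)]

flagged_default = is_flagged(failures, now)               # one failure in window
flagged_two = is_flagged(failures, now, failure_count=2)  # needs two in window
```

With the default count of 1, the single failure ten minutes ago is enough; raising the count to 2 is not satisfied because the older failure falls outside the 30-minute window.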
|
|
||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,16 +6,27 @@ description: Detect flaky or broken tests based on failure rate over a configura | |
|
|
||
| The failure rate monitor detects tests based on failure rate over a rolling time window. Unlike pass-on-retry, which looks for a specific pattern on a single commit, the failure rate monitor identifies tests that fail too often over a period of time, even if no individual failure looks like a retry. | ||
|
|
||
| You can create multiple failure rate monitors with different configurations. This is how you tailor detection to different branches, test volumes, sensitivity levels, and detection types. | ||
| You can create multiple failure rate monitors with different configurations. This is how you tailor detection to different branches, test volumes, sensitivity levels, and action types. | ||
|
Contributor
Related to the description-frontmatter comment on the failure-count page: this file's |
||
|
|
||
| ## Action Type | ||
|
|
||
| When creating a failure rate monitor, choose what action it takes when a test is flagged: | ||
|
|
||
| - **Classify test status** — marks the test as flaky or broken. This is the default and integrates with quarantine workflows and status-based filtering. | ||
| - **Apply labels** — tags matching tests with one or more labels when the monitor activates. Use this when you want to categorize tests automatically without changing their status. See [Automatic labeling from monitors](../management/test-labels.md#automatic-labeling-from-monitors) for details. | ||
|
|
||
| The action type is set at creation and cannot be changed afterward. If you need to switch a monitor's action type, create a new monitor with the desired type and disable the old one. | ||
|
|
||
| ## Detection Type | ||
|
|
||
| Each failure rate monitor has a **detection type** — either **flaky** or **broken** — which controls what status a test receives when the monitor flags it: | ||
| Applies only to monitors with the **Classify test status** action type. | ||
|
|
||
| Each classify-action failure rate monitor has a **detection type** — either **flaky** or **broken** — which controls what status a test receives when the monitor flags it: | ||
|
|
||
| - **Flaky monitors** catch tests that fail intermittently (e.g., 20–50% failure rate). These are typically caused by timing issues, shared state, or non-deterministic behavior. | ||
| - **Broken monitors** catch tests that fail consistently at a high rate (e.g., 80%+ failure rate). These usually indicate a real regression — something in the code or environment is genuinely broken and needs a fix. | ||
|
|
||
| The detection type is set at creation and cannot be changed afterward. If you need to switch a monitor's type, create a new monitor with the desired type and disable the old one. | ||
| The detection type is set at creation and cannot be changed afterward. If you need to switch a monitor's detection type, create a new monitor with the desired type and disable the old one. | ||
|
|
||
| This distinction matters because the two problems call for different responses. Flaky tests might be quarantined while you investigate the root cause. Broken tests represent real failures that should be fixed, not hidden. | ||
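The flaky and broken examples above differ mainly in the failure rate at which a monitor activates. A minimal sketch, not taken from the PR, assuming a monitor activates once the failure rate over the window reaches its activation threshold and enough runs have been observed; the function name, the `min_sample` parameter, and its value of 50 are assumptions for illustration.

```python
def is_active(
    failures: int,
    runs: int,
    activation_threshold: float,
    min_sample: int = 50,
) -> bool:
    """Return True when the observed failure rate meets the threshold.

    Monitors with too few runs in the window stay inactive (assumed
    behavior) so that a handful of runs cannot trigger detection.
    """
    if runs < min_sample:
        return False
    return failures / runs >= activation_threshold


FLAKY_THRESHOLD = 0.20   # example: intermittent failures
BROKEN_THRESHOLD = 0.80  # example: consistent failures

# 15 failures in 50 runs is a 30% failure rate.
intermittent_hits_flaky = is_active(15, 50, FLAKY_THRESHOLD)
intermittent_hits_broken = is_active(15, 50, BROKEN_THRESHOLD)

# 45 failures in 50 runs is a 90% failure rate.
consistent_hits_broken = is_active(45, 50, BROKEN_THRESHOLD)
```

A 30% failure rate activates the 20% monitor but not the 80% one, while a 90% rate activates the 80% monitor.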
|
|
||
|
|
@@ -53,9 +64,13 @@ stale timeout, and branch scope. Capture it with realistic example values filled | |
| in (e.g., "Broken on main", Broken detection type, 80% activation, 60% resolution, | ||
| 6 hour window, 50 min sample, main branch). --> | ||
|
|
||
| ### Action Type | ||
|
|
||
| Choose **Classify test status** or **Apply labels**. See [Action Type](#action-type) above for details. This cannot be changed after the monitor is created. | ||
|
|
||
| ### Detection Type | ||
|
|
||
| Choose **Flaky** or **Broken**. This determines the status a test receives when the monitor flags it. See [Detection Type](#detection-type) above for guidance on which to use. | ||
| Appears only when the action type is **Classify test status**. Choose **Flaky** or **Broken**. This determines the status a test receives when the monitor flags it. See [Detection Type](#detection-type) above for guidance on which to use. | ||
|
|
||
| ### Activation Threshold | ||
|
|
||
|
|
||
Frontmatter inconsistency between the two pages.
This page's `description` was updated to mention labeling ("Detect and classify or label tests…") but `failure-rate-monitor.md:2` was left as the original "Detect flaky or broken tests…" and never updated to reflect the new apply-labels action. Either both should mention the labels path or neither should; currently the two sibling pages disagree about whether labeling is part of the monitor's purpose.
Also, the new phrasing "Detect and classify or label tests" parses awkwardly. A cleaner version that mirrors the H1 ("Failure Count Monitor") without the parse-ambiguity: