|
| 1 | +--- |
| 2 | +name: flashduty-alert |
| 3 | +version: 1.0.0 |
| 4 | +description: "Flashduty alert and alert event investigation: search, filter, and inspect alerts (Layer 1 deduplicated) and raw alert events (Layer 0 signals). Commands: alert list, get, events, timeline, merge; alert-event list. Use when drilling down from incidents to root cause alerts, tracing deduplication history, viewing alert state transitions, merging alerts into incidents, searching global alert events by severity or integration, or analyzing alert noise patterns." |
| 5 | +metadata: |
| 6 | + requires: |
| 7 | + bins: ["flashduty"] |
| 8 | + cliHelp: "flashduty alert --help" |
| 9 | +--- |
| 10 | + |
| 11 | +# flashduty-alert |
| 12 | + |
| 13 | +**CRITICAL** — Before using this skill, read [`../flashduty-shared/SKILL.md`](../flashduty-shared/SKILL.md) for authentication, the 3-layer noise reduction model, global flags, and safety rules. |
| 14 | + |
| 15 | +--- |
| 16 | + |
| 17 | +## Overview |
| 18 | + |
| 19 | +This skill covers **Layer 0 (Alert Events)** and **Layer 1 (Alerts)** of the Flashduty 3-layer noise reduction model. |
| 20 | + |
| 21 | +- **Layer 0 -- Alert Events**: Raw signals pushed by monitoring systems (Prometheus, Zabbix, Datadog, etc.) via an `integration_key`. These are immutable records of every firing/recovery signal received. |
| 22 | +- **Layer 1 -- Alerts**: Deduplicated from Alert Events using `alert_key`. Multiple raw events with the same `alert_key` within a channel collapse into a single alert, incrementing its `EventCnt`. |
| 23 | + |
| 24 | +Use this skill for **investigation** -- drilling down from incidents to their root alert signals. |
| 25 | + |
| 26 | +--- |
| 27 | + |
| 28 | +## Quick Decision |
| 29 | + |
| 30 | +| User wants to... | Command | |
| 31 | +|---|---| |
| 32 | +| Search alerts by severity/status | `alert list --severity Critical --active` | |
| 33 | +| Inspect a specific alert | `alert get <alert_id>` | |
| 34 | +| See raw events behind an alert | `alert events <alert_id>` | |
| 35 | +| View alert state transitions | `alert timeline <alert_id>` | |
| 36 | +| Correlate alerts to an incident | `alert merge <ids> --incident <id>` | |
| 37 | +| Search all raw alert events globally | `alert-event list --since 1h` | |
| 38 | + |
| 39 | +--- |
| 40 | + |
| 41 | +## Commands |
| 42 | + |
| 43 | +### alert list |
| 44 | + |
| 45 | +List alerts with filtering and pagination. |
| 46 | + |
| 47 | +```bash |
| 48 | +flashduty alert list [flags] |
| 49 | +``` |
| 50 | + |
| 51 | +| Flag | Type | Default | Description | |
| 52 | +|------|------|---------|-------------| |
| 53 | +| `--severity` | string | | Filter by severity: `Critical`, `Warning`, `Info` | |
| 54 | +| `--active` | bool | false | Show active alerts only | |
| 55 | +| `--recovered` | bool | false | Show recovered alerts only | |
| 56 | +| `--channel` | string | | Comma-separated channel IDs | |
| 57 | +| `--muted` | bool | false | Show ever-muted alerts only | |
| 58 | +| `--title` | string | | Search by title keyword | |
| 59 | +| `--since` | string | `24h` | Start time (duration or absolute) | |
| 60 | +| `--until` | string | `now` | End time (duration or absolute) | |
| 61 | +| `--limit` | int | 20 | Max results per page | |
| 62 | +| `--page` | int | 1 | Page number | |
| 63 | + |
| 64 | +**Constraint**: `--active` and `--recovered` are mutually exclusive. Specifying both produces an error. |
| 65 | + |
| 66 | +Output columns: ID, TITLE, SEVERITY, STATUS, EVENTS, CHANNEL, STARTED. |
| 67 | + |
| 68 | +Examples: |
| 69 | +```bash |
| 70 | +# Active critical alerts in the last 24 hours |
| 71 | +flashduty alert list --severity Critical --active |
| 72 | + |
| 73 | +# Warnings from a specific channel in the last 6 hours |
| 74 | +flashduty alert list --severity Warning --channel 12345 --since 6h |
| 75 | + |
| 76 | +# Search by title keyword |
| 77 | +flashduty alert list --title "disk usage" --active |
| 78 | +``` |
| 79 | + |
| 80 | +### alert get |
| 81 | + |
| 82 | +Show full detail for a single alert. |
| 83 | + |
| 84 | +```bash |
| 85 | +flashduty alert get <alert_id> |
| 86 | +``` |
| 87 | + |
| 88 | +Displays a vertical detail view including: ID, Title, Severity, Status, Alert Key, Channel, Integration (name and type), Event Count, Start/Last/End times, Muted status, linked Incident (ID and progress), Labels, and Description. |
| 89 | + |
| 90 | +### alert events |
| 91 | + |
| 92 | +List all raw alert events (Layer 0) that were deduplicated into a specific alert. |
| 93 | + |
| 94 | +```bash |
| 95 | +flashduty alert events <alert_id> |
| 96 | +``` |
| 97 | + |
| 98 | +Output columns: EVENT_ID, SEVERITY, STATUS, TIME, TITLE. |
| 99 | + |
| 100 | +This shows the **dedup history** for one alert -- how many raw signals were collapsed into it. Use this to understand event volume and timing for a single alert. |
| 101 | + |
| 102 | +### alert timeline |
| 103 | + |
| 104 | +View the timeline/feed for a specific alert, showing state transitions and operator actions. |
| 105 | + |
| 106 | +```bash |
| 107 | +flashduty alert timeline <alert_id> [flags] |
| 108 | +``` |
| 109 | + |
| 110 | +| Flag | Type | Default | Description | |
| 111 | +|------|------|---------|-------------| |
| 112 | +| `--limit` | int | 20 | Max timeline events | |
| 113 | +| `--page` | int | 1 | Page number | |
| 114 | + |
| 115 | +Output columns: TIME, TYPE, OPERATOR, DETAIL. Operator names are enriched (resolved to person names). |
| 116 | + |
| 117 | +### alert merge |
| 118 | + |
| 119 | +Merge one or more alerts into an existing incident. **This operation is IRREVERSIBLE.** |
| 120 | + |
| 121 | +```bash |
| 122 | +flashduty alert merge <alert_id> [<alert_id2> ...] --incident <incident_id> [--comment <text>] |
| 123 | +``` |
| 124 | + |
| 125 | +| Flag | Type | Required | Description | |
| 126 | +|------|------|----------|-------------| |
| 127 | +| `--incident` | string | Yes | Target incident ID | |
| 128 | +| `--comment` | string | No | Merge comment | |
| 129 | + |
| 130 | +Example: |
| 131 | +```bash |
| 132 | +flashduty alert merge abc123 def456 --incident inc789 --comment "Related disk alerts" |
| 133 | +``` |
| 134 | + |
| 135 | +### alert-event list (global) |
| 136 | + |
| 137 | +Search across ALL alert events globally (Layer 0). This is a separate top-level command, not a subcommand of `alert`. |
| 138 | + |
| 139 | +```bash |
| 140 | +flashduty alert-event list [flags] |
| 141 | +``` |
| 142 | + |
| 143 | +| Flag | Type | Default | Description | |
| 144 | +|------|------|---------|-------------| |
| 145 | +| `--severity` | string | | Filter by severity: `Critical`, `Warning`, `Info` (comma-separated) | |
| 146 | +| `--channel` | string | | Comma-separated channel IDs | |
| 147 | +| `--integration-type` | string | | Comma-separated integration types | |
| 148 | +| `--since` | string | `1h` | Start time (duration or absolute) | |
| 149 | +| `--until` | string | `now` | End time (duration or absolute) | |
| 150 | +| `--limit` | int | 20 | Max results per page | |
| 151 | +| `--page` | int | 1 | Page number | |
| 152 | + |
| 153 | +Output columns: EVENT_ID, ALERT_ID, SEVERITY, STATUS, TIME, TITLE. |
| 154 | + |
| 155 | +**Important**: The default time window is `1h`, which is shorter than `alert list`'s default of `24h`. This is intentional because raw event volume can be very high. |
| 156 | + |
| 157 | +Examples: |
| 158 | +```bash |
| 159 | +# All critical events in the last hour |
| 160 | +flashduty alert-event list --severity Critical |
| 161 | + |
| 162 | +# Events from a specific integration type in the last 30 minutes |
| 163 | +flashduty alert-event list --integration-type Prometheus --since 30m |
| 164 | + |
| 165 | +# Events from multiple severity levels |
| 166 | +flashduty alert-event list --severity Critical,Warning --since 2h |
| 167 | +``` |
| 168 | + |
| 169 | +--- |
| 170 | + |
| 171 | +## Workflows |
| 172 | + |
| 173 | +### Workflow 1: Investigate an Incident's Root Cause |
| 174 | + |
| 175 | +Drill down from an incident through its contributing alerts to the raw signals. |
| 176 | + |
| 177 | +```bash |
| 178 | +# 1. See all alerts contributing to this incident |
| 179 | +flashduty incident alerts <incident_id> |
| 180 | + |
| 181 | +# 2. Pick a suspicious alert and view its full detail |
| 182 | +flashduty alert get <alert_id> |
| 183 | + |
| 184 | +# 3. Trace the raw events that were deduplicated into this alert |
| 185 | +flashduty alert events <alert_id> |
| 186 | + |
| 187 | +# 4. View the alert's state transition history |
| 188 | +flashduty alert timeline <alert_id> |
| 189 | +``` |
| 190 | + |
| 191 | +### Workflow 2: Find Noisy Alert Sources |
| 192 | + |
| 193 | +Identify which alerts or integrations are generating the most noise. |
| 194 | + |
| 195 | +```bash |
| 196 | +# 1. Find active warnings in the last 24 hours |
| 197 | +flashduty alert list --since 24h --active --severity Warning |
| 198 | + |
| 199 | +# 2. Check recent critical event volume (raw Layer 0 signals) |
| 200 | +flashduty alert-event list --since 1h --severity Critical |
| 201 | + |
| 202 | +# 3. For aggregate analysis, use the insight command (see flashduty-insight skill) |
| 203 | +flashduty insight top-alerts --label integration_name |
| 204 | +``` |
| 205 | + |
| 206 | +### Workflow 3: Manually Correlate Alerts to an Incident |
| 207 | + |
| 208 | +Find related alerts and merge them into a single incident for unified response. |
| 209 | + |
| 210 | +```bash |
| 211 | +# 1. Find alerts matching a pattern |
| 212 | +flashduty alert list --title "disk" --active |
| 213 | + |
| 214 | +# 2. Merge selected alerts into an existing incident (IRREVERSIBLE) |
| 215 | +flashduty alert merge <alert_id1> <alert_id2> --incident <incident_id> --comment "Related disk alerts" |
| 216 | +``` |
| 217 | + |
| 218 | +--- |
| 219 | + |
| 220 | +## Key Concepts |
| 221 | + |
| 222 | +### alert events vs alert-event list |
| 223 | + |
| 224 | +These are different commands with different scopes: |
| 225 | + |
| 226 | +| | `alert events <alert_id>` | `alert-event list` | |
| 227 | +|---|---|---| |
| 228 | +| Scope | Events for ONE specific alert | ALL events globally | |
| 229 | +| Purpose | Dedup history of a single alert | Global raw signal search | |
| 230 | +| Filters | None (alert_id is the filter) | Severity, channel, integration type, time | |
| 231 | +| Default window | N/A (all events for the alert) | 1 hour | |
| 232 | +| Use case | "How many raw events hit this alert?" | "What raw signals arrived recently?" | |
| 233 | + |
| 234 | +### Alert States |
| 235 | + |
| 236 | +- **Active**: The alert is currently firing. No recovery signal has been received. |
| 237 | +- **Recovered**: A recovery signal was received (or the alert was manually resolved). |
| 238 | +- The `--active` and `--recovered` flags on `alert list` are mutually exclusive boolean filters. Omitting both returns all alerts regardless of state. |
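The mutual-exclusion rule can be checked before the CLI is invoked at all. The following is a minimal sketch of such a guard; `check_state_flags` is a hypothetical helper name, not part of the `flashduty` CLI, and the error text is illustrative rather than the CLI's actual message.

```shell
# Hypothetical pre-flight check (not part of the flashduty CLI).
# Returns 0 if the arguments are safe to pass to `flashduty alert list`,
# 1 if they contain both --active and --recovered.
check_state_flags() {
  has_active=0
  has_recovered=0
  for arg in "$@"; do
    case "$arg" in
      --active) has_active=1 ;;
      --recovered) has_recovered=1 ;;
    esac
  done
  if [ "$has_active" -eq 1 ] && [ "$has_recovered" -eq 1 ]; then
    echo "error: --active and --recovered are mutually exclusive" >&2
    return 1
  fi
  return 0
}

# Validate first; run the real command only if the check passes, e.g.:
#   check_state_flags --severity Critical --active && \
#     flashduty alert list --severity Critical --active
check_state_flags --severity Critical --active && echo "ok: flags accepted"
```

Passing neither flag also succeeds, which matches the behavior described above: omitting both returns alerts in any state.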
| 239 | + |
| 240 | +### Muted Status |
| 241 | + |
| 242 | +- `EverMuted` indicates whether the alert was muted at any point during its lifecycle (via noise reduction rules or manual muting). |
| 243 | +- The `--muted` flag on `alert list` filters to alerts that have been muted at least once. |
| 244 | + |
| 245 | +### Deduplication via alert_key |
| 246 | + |
| 247 | +Multiple raw alert events with the same `alert_key` within a channel are deduplicated into a single alert. The alert's `EventCnt` reflects how many raw events were collapsed. Use `alert events <alert_id>` to see the individual raw events. |
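To make the collapsing concrete, here is a small sketch that groups sample events by channel and `alert_key` and counts each group, in the way `EventCnt` is described above. The event IDs, channel IDs, keys, and the three-column layout are invented for illustration; they are not the CLI's output format, and the real deduplication happens server-side.

```shell
# Hypothetical Layer 0 sample: event_id, channel_id, alert_key.
# All values and the column layout are assumptions for illustration.
sample_events() {
  cat <<'EOF'
ev1 12345 disk-usage-host-a
ev2 12345 disk-usage-host-a
ev3 12345 disk-usage-host-a
ev4 12345 cpu-load-host-b
ev5 67890 disk-usage-host-a
EOF
}

# Group events that share (channel_id, alert_key) and count each group.
# Each output line stands for one Layer 1 alert and its event count.
dedup_counts() {
  awk '{ count[$2 " " $3]++ } END { for (k in count) print k, count[k] }' | LC_ALL=C sort
}

sample_events | dedup_counts
```

In this sample the same `alert_key` appears in two channels and yields two separate groups, reflecting the "within a channel" scope stated above.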
| 248 | + |
| 249 | +--- |
| 250 | + |
| 251 | +## Safety Notes |
| 252 | + |
| 253 | +- **`alert merge` is IRREVERSIBLE.** Once alerts are merged into an incident, they cannot be separated. Always confirm the target incident ID before merging. |
| 254 | +- **`alert-event list` defaults to a 1-hour window**, which is shorter than `alert list`'s 24-hour default. This is by design due to potentially high raw event volume. Widen the window explicitly with `--since` if needed, but be aware of large result sets. |
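Because a merge cannot be undone, one cautious habit is to print the exact command for review before running it. The sketch below assumes a hypothetical helper named `merge_preview`; it only echoes the command, using the `alert merge` syntax documented above, and never calls the CLI.

```shell
# Hypothetical helper (not part of the flashduty CLI): print the merge
# command instead of executing it, so the IDs can be reviewed first.
# Usage: merge_preview <incident_id> <alert_id> [<alert_id> ...]
merge_preview() {
  if [ "$#" -lt 2 ]; then
    echo "usage: merge_preview <incident_id> <alert_id> [<alert_id> ...]" >&2
    return 1
  fi
  incident_id=$1
  shift
  # Remaining arguments are the alert IDs to merge.
  echo "flashduty alert merge $* --incident $incident_id"
}

merge_preview inc789 abc123 def456
```

Once the printed command has been checked against the intended incident, it can be copied and run as-is.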
| 255 | + |
| 256 | +--- |
| 257 | + |
| 258 | +## Cross-References |
| 259 | + |
| 260 | +| Relation | Skill | Purpose | |
| 261 | +|----------|-------|---------| |
| 262 | +| Prerequisites | `flashduty-shared` | Authentication, configuration, shared flags | |
| 263 | +| Parent layer | `flashduty-incident` | Incidents (Layer 2) contain alerts | |
| 264 | +| Analytics | `flashduty-insight` | Alert noise analytics, top-alerts aggregation | |
| 265 | +| Rules | `flashduty-channel` | Noise reduction rules, aggregation configuration | |