diff --git a/docs/_snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx b/docs/_snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx index 578c45146..8bf65ca6b 100644 --- a/docs/_snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx +++ b/docs/_snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx @@ -1,10 +1,14 @@ - Monitors updates to tables and how frequently a table is updated, - and fails if there is an unexpected delay. + Monitors updates to tables and how frequently a table is updated, + and fails if there is an unexpected delay. Monitors how many rows were added or removed to a table on each update, - and fails if there is an unexpected drop or spike in rows. + and fails if there is an unexpected drop or spike in rows. + + + Monitors execution duration of dbt models, seeds, and snapshots, and fails + on unexpected slowdowns or SLA breaches. \ No newline at end of file diff --git a/docs/_snippets/quickstart-package-install.mdx b/docs/_snippets/quickstart-package-install.mdx index 654cbbb60..666d3bf66 100644 --- a/docs/_snippets/quickstart-package-install.mdx +++ b/docs/_snippets/quickstart-package-install.mdx @@ -39,7 +39,7 @@ Some packages we recommend you check out: [dbt_utils](https://github.com/dbt-lab ```yml packages.yml packages: - package: elementary-data/elementary - version: 0.16.1 + version: 0.16.3 ## Docs: https://docs.elementary-data.com ``` diff --git a/docs/data-tests/performance-monitoring.mdx b/docs/data-tests/performance-monitoring.mdx new file mode 100644 index 000000000..5874a61bd --- /dev/null +++ b/docs/data-tests/performance-monitoring.mdx @@ -0,0 +1,104 @@ +--- +title: Performance Monitoring (dbt-package) +sidebarTitle: "Performance monitoring" +--- + + +**For Elementary Cloud users:** the recommended approach for pipeline performance monitoring is the automated [`pipeline_task_performance`](/features/anomaly-detection/automated-pipeline-task-performance) monitor — 
see [Performance Alerts](/features/performance-monitoring/performance-alerts). This page documents the manual dbt-test alternative using the Elementary dbt package. + + +## Overview + +Elementary's dbt package exposes the `model_run_results` view, which contains run results for every dbt model enriched with model metadata. You can write dbt singular tests against this view to assert performance SLAs directly in your dbt project. + +This approach requires no additional infrastructure — it runs as part of your existing dbt test invocations. + +## `model_run_results` schema + +`model_run_results` is a view that joins `dbt_run_results` with `dbt_models`. Key columns: + +| Column | Type | Description | +|---|---|---| +| `unique_id` | string | dbt node unique ID | +| `alias` | string | Model alias (table name) | +| `schema_name` | string | Schema the model is materialized in | +| `execution_time` | float | Execution duration in seconds | +| `status` | string | Run status (`success`, `error`, etc.) | +| `invocation_id` | string | dbt invocation ID | +| `generated_at` | timestamp | When the run result was recorded | + +Full schema is documented in [Elementary package models](/dbt/package-models#model_run_results). + +## Writing a performance SLA test + +Create a singular test that queries `model_run_results` and returns rows where execution time exceeds your threshold. A dbt singular test fails when any rows are returned. 
+ +```sql tests/assert_model_performance_sla.sql +-- Fail if any model exceeded its execution time SLA in the last run +with latest_run as ( + select max(generated_at) as latest_ts + from {{ ref('model_run_results') }} +), + +violations as ( + select + r.unique_id, + r.alias, + r.execution_time, + r.generated_at + from {{ ref('model_run_results') }} r + cross join latest_run lr + where r.generated_at >= lr.latest_ts + and r.status = 'success' + and r.execution_time > 600 -- 10 minute SLA +) + +select * from violations +``` + +Adjust the threshold (`600`) per model by parameterising the test or writing per-model singular tests. + +## Per-model SLA tests + +To enforce different thresholds per model, write a separate singular test for each: + +```sql tests/assert_orders_model_sla.sql +with latest_run as ( + select max(generated_at) as latest_ts + from {{ ref('model_run_results') }} +) + +select + r.unique_id, + r.execution_time +from {{ ref('model_run_results') }} r +cross join latest_run lr +where r.generated_at >= lr.latest_ts + and r.alias = 'orders' + and r.execution_time > 300 -- 5 minute SLA for orders model +``` + +## Alerts + +Failures from these singular tests appear in Elementary's alert pipeline alongside other dbt test failures. Use [alert rules](/features/alerts-and-incidents/alert-rules) and tags/owners to route them to the right channel. + +Add a tag to the test to make routing easier: + +```yaml tests/schema.yml +version: 2 + +singular_tests: + - name: assert_orders_model_sla + tags: + - performance_sla + config: + severity: warn +``` + +## Limitations + +- Runs only as part of `dbt test` invocations — not evaluated in real time after each model run. +- Requires manual threshold configuration per model. +- No anomaly detection or baseline learning — purely static thresholds. 
+ +For automated, baseline-aware performance monitoring without manual configuration, use the [pipeline_task_performance automated monitor](/features/performance-monitoring/performance-alerts) in Elementary Cloud. diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index f72276e03..0ccbb911f 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -96,7 +96,7 @@ If you only want to disable dbt_columns but leave all other artifacts, configure ```yaml dbt_project.yml vars: - disable_dbt_columns_autoupload: true + columns_upload_strategy: "none" ``` diff --git a/docs/features/anomaly-detection/automated-pipeline-task-performance.mdx b/docs/features/anomaly-detection/automated-pipeline-task-performance.mdx new file mode 100644 index 000000000..5c80be3fe --- /dev/null +++ b/docs/features/anomaly-detection/automated-pipeline-task-performance.mdx @@ -0,0 +1,57 @@ +--- +title: Pipeline Task Performance Monitor +sidebarTitle: "Pipeline task performance" +--- + + + +The pipeline task performance monitor tracks the **execution duration** of dbt models, seeds, and snapshots on each run, and alerts when a node takes significantly longer than expected — or exceeds a fixed SLA threshold. + +This is an automated monitor: Elementary creates it for every dbt node in your project, with no configuration required to get started. + +## Monitoring modes + +### Anomaly detection (default) + +Elementary uses a z-score seasonal model to learn normal execution duration for each node. The model accounts for time-of-day and day-of-week patterns, so a model that reliably runs longer on Monday mornings will not be flagged during that window. + +No configuration is required. Elementary builds the baseline from your pipeline run history automatically. + +### Static SLA + +Set a fixed duration threshold. The monitor fails whenever the observed execution time exceeds that threshold, regardless of historical norms. 
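
To contrast the two modes described above, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Elementary's implementation: it uses a plain z-score over the whole history with no time-of-day or day-of-week bucketing, and the `Z_CUTOFFS` values mapped to each sensitivity level are hypothetical, since this page does not document the actual cutoffs.

```python
import math
from statistics import mean, stdev

# Hypothetical z-score cutoffs per sensitivity level (assumption: the real
# values used by the monitor are not documented here).
Z_CUTOFFS = {"low": 4.0, "medium": 3.0, "high": 2.0}


def is_anomalous(history, observed, sensitivity="medium", direction="spike"):
    """Flag `observed` (seconds) when its z-score against `history` passes the cutoff.

    Simplification: a plain z-score, without the seasonal component.
    """
    if len(history) < 2:
        return False  # not enough runs to estimate the spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # Every historical run took the same time; any deviation is infinite in z terms.
        z = 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    else:
        z = (observed - mu) / sigma
    cutoff = Z_CUTOFFS[sensitivity]
    if direction == "spike":
        return z > cutoff
    if direction == "drop":
        return z < -cutoff
    return abs(z) > cutoff  # "both"


def breaches_sla(observed, fixed_threshold):
    """Static SLA check: fail whenever the duration exceeds the threshold."""
    return observed > fixed_threshold


history = [100, 110, 95, 105, 90, 100, 108]  # past durations in seconds
slow_run_flagged = is_anomalous(history, 180)
slow_run_breaches = breaches_sla(180, fixed_threshold=600)
```

In this sketch a 180 second run is flagged by the anomaly check because it sits far outside the learned range, yet it passes a 600 second static SLA. That difference is the practical reason to pick one mode over the other.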
+ +Use static SLA when you have a hard operational or contractual limit (e.g., "this model must finish within 10 minutes"). + +## Understand the monitor result + +The result shows the execution duration of the dbt node for the latest run alongside the historical baseline. + +- **Anomaly detection** — data points outside the expected range (grey band) are flagged. The expected range is derived from the seasonal model trained on historical durations. +- **Static SLA** — a horizontal line marks the threshold. Any run that crosses it is a failure. + +Use the **Anomaly settings** button to adjust mode, sensitivity, or training period after the monitor is created. + +## Anomaly settings + +| Setting | Description | Default | +|---|---|---| +| `mode` | `"anomaly"` or `"static"` | `"anomaly"` | +| `sensitivity` | `"low"`, `"medium"`, or `"high"` (anomaly mode only) | `"medium"` | +| `training_period` | Days of history used to train the model (anomaly mode only) | `14` | +| `anomaly_direction` | `"spike"` (duration longer than expected), `"drop"`, or `"both"` (anomaly mode only) | `"spike"` | +| `excluded_time_ranges` | Time ranges excluded from training and detection (anomaly mode only) | `[]` | +| `fixed_threshold` | Max allowed execution duration in seconds (static SLA mode only) | — | + + + +## Alerts + +Failures raise alerts in the **`pipeline_performance`** alert category, routed through [alert rules](/features/alerts-and-incidents/alert-rules) to all configured integrations (Slack, PagerDuty, Teams, OpsGenie, email). + +By default, alerts are not active. To enable them, go to **Setup → Alert Rules** and add **Pipeline performance** to the relevant rule's alert categories. 
+ +## Related + +- [Performance Alerts](/features/performance-monitoring/performance-alerts) — setup guide and API reference for creating `pipeline_task_performance` monitors +- [Automated monitors overview](/features/anomaly-detection/automated-monitors) — how automated monitors work across freshness, volume, and performance diff --git a/docs/features/performance-monitoring/performance-alerts.mdx b/docs/features/performance-monitoring/performance-alerts.mdx new file mode 100644 index 000000000..cd5cfdebd --- /dev/null +++ b/docs/features/performance-monitoring/performance-alerts.mdx @@ -0,0 +1,123 @@ +--- +title: Pipeline Performance Alerts +sidebarTitle: "Performance alerts" +--- + + + +Elementary Cloud automatically monitors the execution duration of your dbt models, seeds, and snapshots — detecting unexpected slowdowns before they affect your data consumers. + +The `pipeline_task_performance` monitor runs after each dbt job execution and compares the observed duration against a learned baseline or a fixed SLA threshold. + +## Monitoring modes + +### Anomaly detection (default) + +The default mode uses a z-score seasonal model to learn normal execution duration for each dbt node. No configuration is required — Elementary automatically builds a baseline from your pipeline history. + +The model accounts for time-of-day and day-of-week seasonality. For example, a model that normally runs longer on Monday morning will not be flagged during that window. + +### Static SLA + +Static SLA mode lets you define a fixed duration threshold. The monitor fails whenever the observed execution time exceeds that threshold, regardless of historical behavior. + +Use static SLA when you have a contractual or operational hard limit (e.g., "this model must complete within 10 minutes"). + +## Creating a monitor + +### Add Test wizard + + + The Add Test wizard supports **models only**. To add a `pipeline_task_performance` monitor for seeds or snapshots, use the [API](#api). + + +1. 
Open the model in Elementary Cloud. +2. Click **Add Test** → select **Pipeline Task Performance**. +3. Choose the mode: **Anomaly detection** or **Static SLA**. +4. Configure parameters (see [Config reference](#config-reference) below). +5. Save. + +### API + +```http +POST /{env_id}/tests +``` + +```json +{ + "node_id": "", + "test_type": "pipeline_task_performance", + "params": { + "mode": "anomaly", + "sensitivity": "medium" + } +} +``` + +For static SLA: + +```json +{ + "node_id": "", + "test_type": "pipeline_task_performance", + "params": { + "mode": "static", + "fixed_threshold": 600 + } +} +``` + +## Config reference + +### Anomaly detection mode + +| Parameter | Description | Default | +|---|---|---| +| `mode` | `"anomaly"` | `"anomaly"` | +| `sensitivity` | Detection sensitivity: `"low"`, `"medium"`, `"high"` | `"medium"` | +| `training_period` | Number of days used to build the baseline | `14` | +| `anomaly_direction` | `"spike"` (duration longer than expected), `"drop"`, or `"both"` | `"spike"` | +| `excluded_time_ranges` | List of time ranges to exclude from training and detection | `[]` | + +### Static SLA mode + +| Parameter | Description | Default | +|---|---|---| +| `mode` | `"static"` | — | +| `fixed_threshold` | Maximum allowed execution duration in seconds | Required | + +## Alerts and incidents + +Failures from `pipeline_task_performance` raise alerts in the **`pipeline_performance`** alert category. + +Alerts are routed through [alert rules](/features/alerts-and-incidents/alert-rules) and delivered to all configured integrations: +- [Slack](/cloud/integrations/alerts/slack) +- [Microsoft Teams](/cloud/integrations/alerts/ms-teams) +- [PagerDuty](/cloud/integrations/alerts/pagerduty) +- [OpsGenie](/cloud/integrations/alerts/opsgenie) +- Email + +Incidents are created and managed through the standard [incidents workflow](/features/alerts-and-incidents/incidents). 
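
The request bodies shown in the API section above can be assembled and checked client-side before sending. The sketch below is one possible helper in Python, not part of any Elementary SDK: the function name `build_monitor_payload`, the example node ID `model.my_project.orders`, and the client-side validation rules are assumptions made for illustration. Only the field names and allowed values come from the config reference.

```python
import json

# Parameters the config reference lists for anomaly detection mode.
ANOMALY_KEYS = {
    "sensitivity",
    "training_period",
    "anomaly_direction",
    "excluded_time_ranges",
}


def build_monitor_payload(node_id, mode="anomaly", **params):
    """Build a request body for POST /{env_id}/tests (hypothetical helper)."""
    if mode == "anomaly":
        unknown = set(params) - ANOMALY_KEYS
        if unknown:
            raise ValueError(f"unsupported anomaly parameters: {sorted(unknown)}")
    elif mode == "static":
        # fixed_threshold is marked as required in static SLA mode.
        if set(params) != {"fixed_threshold"}:
            raise ValueError("static mode takes exactly one parameter: fixed_threshold")
        if params["fixed_threshold"] <= 0:
            raise ValueError("fixed_threshold must be a positive number of seconds")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "node_id": node_id,
        "test_type": "pipeline_task_performance",
        "params": {"mode": mode, **params},
    }


# Hypothetical node ID, used only to make the example concrete.
static_body = build_monitor_payload(
    "model.my_project.orders", mode="static", fixed_threshold=600
)
static_json = json.dumps(static_body, indent=2)
```

Validating before the request keeps mistakes such as a static monitor without `fixed_threshold` from reaching the API. Whether the API itself rejects such bodies is not stated in this page.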
+ +### Activating alerts + +By default, `pipeline_task_performance` alerts are not active. To enable them: + +1. Go to **Setup → Alert Rules**. +2. Edit the default rule (or create a new one) and add **Pipeline performance** to the alert categories. + + +
+ Alert categories in alert rules +
+ + +## Related + +- [Pipeline task performance monitor](/features/anomaly-detection/automated-pipeline-task-performance) — full monitor reference under Automated Monitors +- [Performance monitoring dashboard](/features/performance-monitoring/performance-monitoring) — view execution trends in the UI +- [Manual performance monitoring (dbt-package)](/data-tests/performance-monitoring) — manual dbt-test alternative using `model_run_results` diff --git a/docs/mint.json b/docs/mint.json index 14893e524..9d134a226 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -97,7 +97,8 @@ "pages": [ "features/anomaly-detection/automated-monitors", "features/anomaly-detection/automated-freshness", - "features/anomaly-detection/automated-volume" + "features/anomaly-detection/automated-volume", + "features/anomaly-detection/automated-pipeline-task-performance" ] }, "features/anomaly-detection/metrics", @@ -147,7 +148,8 @@ { "group": "Performance & Cost", "pages": [ - "features/performance-monitoring/performance-monitoring" + "features/performance-monitoring/performance-monitoring", + "features/performance-monitoring/performance-alerts" ] }, { @@ -354,7 +356,8 @@ { "group": "Other Tests", "pages": [ - "data-tests/python-tests" + "data-tests/python-tests", + "data-tests/performance-monitoring" ] }, { @@ -482,6 +485,10 @@ } }, "redirects": [ + { + "source": "/cloud/features/performance-monitoring/performance-alerts", + "destination": "/features/performance-monitoring/performance-alerts" + }, { "source": "/features/lineage", "destination": "/features/data-lineage/lineage"