@@ -1,10 +1,14 @@
<CardGroup cols={2}>
<Card title="Automated Freshness" href="/features/anomaly-detection/automated-freshness">
Monitors updates to tables and how frequently a table is updated,
and fails if there is an unexpected delay.
Monitors updates to tables and how frequently a table is updated,
and fails if there is an unexpected delay.
</Card>
<Card title="Automated Volume" href="/features/anomaly-detection/automated-volume">
Monitors how many rows were added to or removed from a table on each update,
and fails if there is an unexpected drop or spike in rows.
and fails if there is an unexpected drop or spike in rows.
</Card>
<Card title="Pipeline Task Performance" href="/features/anomaly-detection/automated-pipeline-task-performance">
Monitors execution duration of dbt models, seeds, and snapshots, and fails
on unexpected slowdowns or SLA breaches.
</Card>
</CardGroup>
2 changes: 1 addition & 1 deletion docs/_snippets/quickstart-package-install.mdx
@@ -39,7 +39,7 @@ Some packages we recommend you check out: [dbt_utils](https://github.com/dbt-lab
```yml packages.yml
packages:
- package: elementary-data/elementary
version: 0.16.1
version: 0.16.3
## Docs: https://docs.elementary-data.com
```
</Step>
104 changes: 104 additions & 0 deletions docs/data-tests/performance-monitoring.mdx
@@ -0,0 +1,104 @@
---
title: Performance Monitoring (dbt-package)
sidebarTitle: "Performance monitoring"
---

<Info>
**For Elementary Cloud users:** the recommended approach for pipeline performance monitoring is the automated [`pipeline_task_performance`](/features/anomaly-detection/automated-pipeline-task-performance) monitor — see [Performance Alerts](/features/performance-monitoring/performance-alerts). This page documents the manual dbt-test alternative using the Elementary dbt package.
</Info>

## Overview

Elementary's dbt package exposes the `model_run_results` view, which contains the run results of every dbt model, enriched with model metadata. You can write dbt singular tests against this view to assert performance SLAs directly in your dbt project.

This approach requires no additional infrastructure — it runs as part of your existing dbt test invocations.

## `model_run_results` schema

`model_run_results` is a view that joins `dbt_run_results` with `dbt_models`. Key columns:

| Column | Type | Description |
|---|---|---|
| `unique_id` | string | dbt node unique ID |
| `alias` | string | Model alias (table name) |
| `schema_name` | string | Schema the model is materialized in |
| `execution_time` | float | Execution duration in seconds |
| `status` | string | Run status (`success`, `error`, etc.) |
| `invocation_id` | string | dbt invocation ID |
| `generated_at` | timestamp | When the run result was recorded |

Full schema is documented in [Elementary package models](/dbt/package-models#model_run_results).

## Writing a performance SLA test

Create a singular test that queries `model_run_results` and returns rows where execution time exceeds your threshold. A dbt singular test fails when any rows are returned.

```sql tests/assert_model_performance_sla.sql
-- Fail if any model exceeded its execution time SLA in the last run
with latest_run as (
    select max(generated_at) as latest_ts
    from {{ ref('model_run_results') }}
),

violations as (
    select
        r.unique_id,
        r.alias,
        r.execution_time,
        r.generated_at
    from {{ ref('model_run_results') }} r
    cross join latest_run lr
    where r.generated_at >= lr.latest_ts
      and r.status = 'success'
      and r.execution_time > 600 -- 10 minute SLA
)

select * from violations
```

Adjust the threshold (`600`) per model by parameterising the test or writing per-model singular tests.
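
To try the query logic without a warehouse, the following is a minimal sketch that runs the same filter against an in-memory SQLite table. The table is a hypothetical stand-in for `model_run_results` with only the columns the test reads, and the rows and the `SLA_SECONDS` name are invented for illustration; real data comes from the Elementary package.

```python
import sqlite3

# Hypothetical stand-in for the model_run_results view (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table model_run_results (
        unique_id text, alias text, execution_time real,
        status text, generated_at text
    )
    """
)
conn.executemany(
    "insert into model_run_results values (?, ?, ?, ?, ?)",
    [
        ("model.shop.orders", "orders", 720.0, "success", "2024-01-02 06:00:00"),
        ("model.shop.customers", "customers", 45.0, "success", "2024-01-02 06:00:00"),
        # Older run: slow, but not part of the latest run, so it is ignored.
        ("model.shop.orders", "orders", 900.0, "success", "2024-01-01 06:00:00"),
    ],
)

SLA_SECONDS = 600  # same threshold as the singular test above

violations = conn.execute(
    """
    with latest_run as (
        select max(generated_at) as latest_ts from model_run_results
    )
    select r.alias, r.execution_time
    from model_run_results r
    cross join latest_run lr
    where r.generated_at >= lr.latest_ts
      and r.status = 'success'
      and r.execution_time > ?
    """,
    (SLA_SECONDS,),
).fetchall()

# A non-empty result is what dbt treats as a failing singular test.
print(violations)
```

Only the `orders` row from the latest timestamp is returned, because the filter keeps rows whose `generated_at` equals the maximum value in the view.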

## Per-model SLA tests

To enforce different thresholds per model, write a separate singular test for each:

```sql tests/assert_orders_model_sla.sql
with latest_run as (
    select max(generated_at) as latest_ts
    from {{ ref('model_run_results') }}
)

select
    r.unique_id,
    r.execution_time
from {{ ref('model_run_results') }} r
cross join latest_run lr
where r.generated_at >= lr.latest_ts
  and r.alias = 'orders'
  and r.execution_time > 300 -- 5 minute SLA for orders model
```
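
If you maintain many per-model tests, writing each file by hand gets repetitive. The sketch below is one possible way to render them from a mapping of alias to threshold. The `SLA_SECONDS` mapping, the `render_sla_tests` helper, and the file naming are hypothetical and not part of Elementary or dbt.

```python
from string import Template

# Hypothetical per-model SLAs in seconds, keyed by model alias.
SLA_SECONDS = {"orders": 300, "customers": 120}

# $alias and $threshold are filled in by Python; the Jinja ref() calls are
# left untouched so that dbt renders them later.
TEST_TEMPLATE = Template("""\
with latest_run as (
    select max(generated_at) as latest_ts
    from {{ ref('model_run_results') }}
)

select
    r.unique_id,
    r.execution_time
from {{ ref('model_run_results') }} r
cross join latest_run lr
where r.generated_at >= lr.latest_ts
  and r.alias = '$alias'
  and r.execution_time > $threshold
""")


def render_sla_tests(slas):
    """Return a mapping of test file name to rendered SQL text."""
    return {
        f"assert_{alias}_model_sla.sql": TEST_TEMPLATE.substitute(
            alias=alias, threshold=threshold
        )
        for alias, threshold in slas.items()
    }


rendered = render_sla_tests(SLA_SECONDS)
print(rendered["assert_orders_model_sla.sql"])
```

Writing each rendered string to your project's `tests/` directory would give one singular test per model while keeping the thresholds in a single place.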

## Alerts

Failures from these singular tests appear in Elementary's alert pipeline alongside other dbt test failures. Use [alert rules](/features/alerts-and-incidents/alert-rules) and tags/owners to route them to the right channel.

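Add a tag to the test to make routing easier. On dbt versions that accept properties for singular tests, they go under the top-level `data_tests` key, with `tags` nested under `config`:

```yaml tests/schema.yml
version: 2

data_tests:
  - name: assert_orders_model_sla
    config:
      tags:
        - performance_sla
      severity: warn
```

Alternatively, set the same values in the test's SQL file with `{{ config(tags=['performance_sla'], severity='warn') }}`.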

## Limitations

- Runs only as part of `dbt test` invocations — not evaluated in real time after each model run.
- Requires manual threshold configuration per model.
- No anomaly detection or baseline learning — purely static thresholds.

For automated, baseline-aware performance monitoring without manual configuration, use the [pipeline_task_performance automated monitor](/features/performance-monitoring/performance-alerts) in Elementary Cloud.
2 changes: 1 addition & 1 deletion docs/dbt/on-run-end_hooks.mdx
@@ -96,7 +96,7 @@ If you only want to disable dbt_columns but leave all other artifacts, configure

```yaml dbt_project.yml
vars:
disable_dbt_columns_autoupload: true
columns_upload_strategy: "none"
```

<Warning>
@@ -0,0 +1,57 @@
---
title: Pipeline Task Performance Monitor
sidebarTitle: "Pipeline task performance"
---

<Snippet file="cloud/cloud-feature-tag.mdx" />

The pipeline task performance monitor tracks the **execution duration** of dbt models, seeds, and snapshots on each run, and alerts when a node takes significantly longer than expected — or exceeds a fixed SLA threshold.

This is an automated monitor: Elementary creates it for every dbt model, seed, and snapshot in your project, with no configuration required to get started.

## Monitoring modes

### Anomaly detection (default)

Elementary uses a z-score seasonal model to learn normal execution duration for each node. The model accounts for time-of-day and day-of-week patterns, so a model that reliably runs longer on Monday mornings will not be flagged during that window.

No configuration is required. Elementary builds the baseline from your pipeline run history automatically.
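
To make the idea concrete, here is a minimal sketch of a seasonal z-score check. It is not Elementary's implementation: the (day of week, hour) bucketing, the `Z_THRESHOLDS` mapping from sensitivity to cutoff, and the function names are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

# Assumed mapping from sensitivity to z-score cutoff; the values Elementary
# actually uses are not stated in this document.
Z_THRESHOLDS = {"low": 4.0, "medium": 3.0, "high": 2.0}


def seasonal_bucket(ts):
    """Group runs by (day of week, hour of day)."""
    return (ts.weekday(), ts.hour)


def is_duration_anomaly(history, run_at, duration, sensitivity="medium"):
    """history: list of (datetime, duration_seconds) from past runs."""
    by_bucket = defaultdict(list)
    for ts, seconds in history:
        by_bucket[seasonal_bucket(ts)].append(seconds)

    samples = by_bucket[seasonal_bucket(run_at)]
    if len(samples) < 2:
        return False  # not enough history to form a baseline
    spread = stdev(samples)
    if spread == 0:
        return duration > mean(samples)
    z_score = (duration - mean(samples)) / spread
    return z_score > Z_THRESHOLDS[sensitivity]  # "spike" direction only


# Four Mondays of slow runs and four Tuesdays of fast runs, all at 06:00.
history = [
    (datetime(2024, 1, day, 6), seconds)
    for day, seconds in [(1, 590), (8, 600), (15, 610), (22, 605)]
] + [
    (datetime(2024, 1, day, 6), seconds)
    for day, seconds in [(2, 95), (9, 100), (16, 105), (23, 100)]
]

monday_run = is_duration_anomaly(history, datetime(2024, 1, 29, 6), 620)
tuesday_run = is_duration_anomaly(history, datetime(2024, 1, 30, 6), 620)
print(monday_run, tuesday_run)
```

The same 620-second duration is within the normal range for the Monday bucket but far outside it for the Tuesday bucket, which is the behavior the seasonal baseline is meant to capture.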

### Static SLA

Set a fixed duration threshold. The monitor fails whenever the observed execution time exceeds that threshold, regardless of historical norms.

Use static SLA when you have a hard operational or contractual limit (e.g., "this model must finish within 10 minutes").

## Understand the monitor result

The result shows the execution duration of the dbt node for the latest run alongside the historical baseline.

- **Anomaly detection** — data points outside the expected range (grey band) are flagged. The expected range is derived from the seasonal model trained on historical durations.
- **Static SLA** — a horizontal line marks the threshold. Any run that crosses it is a failure.

Use the **Anomaly settings** button to adjust mode, sensitivity, or training period after the monitor is created.

## Anomaly settings

| Setting | Description | Default |
|---|---|---|
| `mode` | `"anomaly"` or `"static"` | `"anomaly"` |
| `sensitivity` | `"low"`, `"medium"`, or `"high"` (anomaly mode only) | `"medium"` |
| `training_period` | Days of history used to train the model (anomaly mode only) | `14` |
| `anomaly_direction` | `"spike"` (duration longer than expected), `"drop"` (duration shorter than expected), or `"both"` (anomaly mode only) | `"spike"` |
| `excluded_time_ranges` | Time ranges excluded from training and detection (anomaly mode only) | `[]` |
| `fixed_threshold` | Max allowed execution duration in seconds (static SLA mode only) | — |

<Snippet file="cloud/features/anomaly-detection/all-anomalies-configuration.mdx" />

## Alerts

Failures raise alerts in the **`pipeline_performance`** alert category, routed through [alert rules](/features/alerts-and-incidents/alert-rules) to all configured integrations (Slack, PagerDuty, Teams, OpsGenie, email).

By default, alerts are not active. To enable them, go to **Setup → Alert Rules** and add **Pipeline performance** to the relevant rule's alert categories.

## Related

- [Performance Alerts](/features/performance-monitoring/performance-alerts) — setup guide and API reference for creating `pipeline_task_performance` monitors
- [Automated monitors overview](/features/anomaly-detection/automated-monitors) — how automated monitors work across freshness, volume, and performance
123 changes: 123 additions & 0 deletions docs/features/performance-monitoring/performance-alerts.mdx
@@ -0,0 +1,123 @@
---
title: Pipeline Performance Alerts
sidebarTitle: "Performance alerts"
---

<Snippet file="cloud/cloud-feature-tag.mdx" />

Elementary Cloud automatically monitors the execution duration of your dbt models, seeds, and snapshots — detecting unexpected slowdowns before they affect your data consumers.

The `pipeline_task_performance` monitor runs after each dbt job execution and compares the observed duration against a learned baseline or a fixed SLA threshold.

## Monitoring modes

### Anomaly detection (default)

The default mode uses a z-score seasonal model to learn normal execution duration for each dbt node. No configuration is required — Elementary automatically builds a baseline from your pipeline history.

The model accounts for time-of-day and day-of-week seasonality. For example, a model that normally runs longer on Monday morning will not be flagged during that window.

### Static SLA

Static SLA mode lets you define a fixed duration threshold. The monitor fails whenever the observed execution time exceeds that threshold, regardless of historical behavior.

Use static SLA when you have a contractual or operational hard limit (e.g., "this model must complete within 10 minutes").

## Creating a monitor

### Add Test wizard

<Note>
The Add Test wizard supports **models only**. To add a `pipeline_task_performance` monitor for seeds or snapshots, use the [API](#api).
</Note>

1. Open the model in Elementary Cloud.
2. Click **Add Test** → select **Pipeline Task Performance**.
3. Choose the mode: **Anomaly detection** or **Static SLA**.
4. Configure parameters (see [Config reference](#config-reference) below).
5. Save.

### API

```http
POST /{env_id}/tests
```

```json
{
"node_id": "<dbt_node_id>",
"test_type": "pipeline_task_performance",
"params": {
"mode": "anomaly",
"sensitivity": "medium"
}
}
```

For static SLA:

```json
{
"node_id": "<dbt_node_id>",
"test_type": "pipeline_task_performance",
"params": {
"mode": "static",
"fixed_threshold": 600
}
}
```
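
As a rough sketch of calling this endpoint from a script, the example below builds the request with the Python standard library and does not send it. The API host, the environment ID, the bearer-token header, and the node ID are placeholders: none of them is specified in this document, so check them against your Elementary Cloud account before use.

```python
import json
from urllib.request import Request

# Placeholders: host, env id, and auth scheme are NOT given in this document.
BASE_URL = "https://api.example.com"
ENV_ID = "my-env-id"
API_TOKEN = "replace-me"


def build_create_monitor_request(node_id, params):
    """Build (but do not send) the POST /{env_id}/tests request."""
    body = {
        "node_id": node_id,
        "test_type": "pipeline_task_performance",
        "params": params,
    }
    return Request(
        url=f"{BASE_URL}/{ENV_ID}/tests",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_create_monitor_request(
    "seed.my_project.country_codes",  # hypothetical seed node id
    {"mode": "static", "fixed_threshold": 600},
)
print(request.get_method(), request.full_url)
# Send with urllib.request.urlopen(request) once the placeholders are real.
```

Keeping request construction separate from sending makes it easy to inspect the payload, which is useful for seeds and snapshots that the Add Test wizard does not cover.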

## Config reference

### Anomaly detection mode

| Parameter | Description | Default |
|---|---|---|
| `mode` | `"anomaly"` | `"anomaly"` |
| `sensitivity` | Detection sensitivity: `"low"`, `"medium"`, `"high"` | `"medium"` |
| `training_period` | Number of days used to build the baseline | `14` |
| `anomaly_direction` | `"spike"` (duration longer than expected), `"drop"` (duration shorter than expected), or `"both"` | `"spike"` |
| `excluded_time_ranges` | List of time ranges to exclude from training and detection | `[]` |

### Static SLA mode

| Parameter | Description | Default |
|---|---|---|
| `mode` | `"static"` | — |
| `fixed_threshold` | Maximum allowed execution duration in seconds | Required |
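
The two tables can be read as a small set of rules: anomaly mode fills in defaults, and static mode requires `fixed_threshold`. The sketch below applies those rules on the client side before a request is sent. The `resolve_params` helper is hypothetical, and rejecting parameters from the other mode is an assumption about sensible input, not documented API behavior.

```python
# Defaults copied from the anomaly detection table above.
ANOMALY_DEFAULTS = {
    "mode": "anomaly",
    "sensitivity": "medium",
    "training_period": 14,
    "anomaly_direction": "spike",
    "excluded_time_ranges": [],
}


def resolve_params(params):
    """Apply the documented defaults and reject mixed-mode params."""
    mode = params.get("mode", "anomaly")
    if mode == "static":
        if "fixed_threshold" not in params:
            raise ValueError("static mode requires fixed_threshold (seconds)")
        extra = set(params) - {"mode", "fixed_threshold"}
        if extra:
            raise ValueError(f"not valid in static mode: {sorted(extra)}")
        return dict(params)
    if mode != "anomaly":
        raise ValueError(f"unknown mode: {mode!r}")
    if "fixed_threshold" in params:
        raise ValueError("fixed_threshold is only valid in static mode")
    # Caller-supplied values override the defaults.
    return {**ANOMALY_DEFAULTS, **params}


resolved = resolve_params({"sensitivity": "high"})
print(resolved)
```

Running a check like this before calling the API surfaces a missing `fixed_threshold` early instead of relying on the server's error response.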

## Alerts and incidents

Failures from `pipeline_task_performance` raise alerts in the **`pipeline_performance`** alert category.

Alerts are routed through [alert rules](/features/alerts-and-incidents/alert-rules) and delivered to all configured integrations:
- [Slack](/cloud/integrations/alerts/slack)
- [Microsoft Teams](/cloud/integrations/alerts/ms-teams)
- [PagerDuty](/cloud/integrations/alerts/pagerduty)
- [OpsGenie](/cloud/integrations/alerts/opsgenie)
- Email

Incidents are created and managed through the standard [incidents workflow](/features/alerts-and-incidents/incidents).

### Activating alerts

By default, `pipeline_task_performance` alerts are not active. To enable them:

1. Go to **Setup → Alert Rules**.
2. Edit the default rule (or create a new one) and add **Pipeline performance** to the alert categories.

<Frame>
<div className="dark:bg-white rounded-md p-1">
<img
src="https://res.cloudinary.com/diuctyblm/image/upload/v1725443568/alert_categories_ikzjqr.png"
alt="Alert categories in alert rules"
width="420"
/>
</div>
</Frame>

## Related

- [Pipeline task performance monitor](/features/anomaly-detection/automated-pipeline-task-performance) — full monitor reference under Automated Monitors
- [Performance monitoring dashboard](/features/performance-monitoring/performance-monitoring) — view execution trends in the UI
- [Manual performance monitoring (dbt-package)](/data-tests/performance-monitoring) — manual dbt-test alternative using `model_run_results`
13 changes: 10 additions & 3 deletions docs/mint.json
@@ -97,7 +97,8 @@
"pages": [
"features/anomaly-detection/automated-monitors",
"features/anomaly-detection/automated-freshness",
"features/anomaly-detection/automated-volume"
"features/anomaly-detection/automated-volume",
"features/anomaly-detection/automated-pipeline-task-performance"
]
},
"features/anomaly-detection/metrics",
@@ -147,7 +148,8 @@
{
"group": "Performance & Cost",
"pages": [
"features/performance-monitoring/performance-monitoring"
"features/performance-monitoring/performance-monitoring",
"features/performance-monitoring/performance-alerts"
]
},
{
@@ -354,7 +356,8 @@
{
"group": "Other Tests",
"pages": [
"data-tests/python-tests"
"data-tests/python-tests",
"data-tests/performance-monitoring"
]
},
{
@@ -482,6 +485,10 @@
}
},
"redirects": [
{
"source": "/cloud/features/performance-monitoring/performance-alerts",
"destination": "/features/performance-monitoring/performance-alerts"
},
{
"source": "/features/lineage",
"destination": "/features/data-lineage/lineage"