Changes from all commits (23 commits)
7f24059
Fix n8n metric mappings and add full v2 metric coverage
AAraKKe May 8, 2026
2991295
Add changelog for PR #23635
AAraKKe May 8, 2026
12f3122
Refine n8n metric coverage and e2e setup
AAraKKe May 8, 2026
8523188
Document raw_metric_prefix requirement when customizing N8N_METRICS_P…
AAraKKe May 8, 2026
1be3b3d
Reformat changelog so towncrier renders sub-bullets correctly
AAraKKe May 8, 2026
af60d11
Add tests/lab traffic generator for n8n
AAraKKe May 8, 2026
43e7fc8
Add missing n8n event metric mappings
AAraKKe May 8, 2026
3db752d
Merge branch 'master' into aarakke/fix-n8n-metrics
AAraKKe May 11, 2026
fc4db3d
Add VM-isolated expression engine metrics (n8n 2.x)
AAraKKe May 11, 2026
66e5dc3
Split lab into its own compose, mount workflows by bind
AAraKKe May 11, 2026
7d4e58c
Address PR review feedback
AAraKKe May 11, 2026
8c3703a
Fix e2e test referencing removed drop_rare_event_metrics helper
AAraKKe May 11, 2026
e8dfb08
Proofread n8n README against the Datadog style guide
AAraKKe May 11, 2026
9418e2d
Move workflow setup back into docker_run conditions to fix e2e
AAraKKe May 11, 2026
8ec545d
Address second-round PR review feedback
AAraKKe May 11, 2026
d0b3a90
Address third-round PR review feedback
AAraKKe May 11, 2026
1f407d5
Wait for webhook registration after n8n restart on v2
AAraKKe May 11, 2026
fbe1dd8
Map n8n event-bus dynamic counters
AAraKKe May 12, 2026
2844832
Drop technical hyphen-rejection paragraph from n8n README
AAraKKe May 12, 2026
c5cae88
Tighten n8n changelog to one-line themes
AAraKKe May 12, 2026
a67c26b
Tone down n8n changelog lead-in
AAraKKe May 12, 2026
cc56e02
Reframe n8n changelog from user perspective
AAraKKe May 12, 2026
a0e259b
Treat any 2xx response as ready, bump n8n to a major release
AAraKKe May 13, 2026
127 changes: 106 additions & 21 deletions n8n/README.md
@@ -2,15 +2,15 @@

## Overview

This check monitors [n8n][1] through the Datadog Agent.

Collect n8n metrics including:
- Cache metrics: Hit and miss statistics.
- Message event bus metrics: Event-related metrics.
- Workflow metrics: Can include workflow ID labels.
- Node metrics: Can include node type labels.
- Credential metrics: Can include credential type labels.
- Queue metrics
- Cache metrics: hit, miss, and update counts.
- Workflow metrics: started, success, failed counters, audit workflow lifecycle counters; in n8n 2.x, an execution-duration histogram.
- Node metrics: per-node started and finished counters emitted by worker processes in queue mode.
- Queue metrics: queue depth; enqueued, dequeued, completed, failed, and stalled counters; and scaling-mode worker gauges.
- HTTP metrics: request duration histograms tagged with status code.
- Process and Node.js runtime metrics.


## Setup
@@ -40,13 +40,79 @@
N8N_METRICS_INCLUDE_CACHE_METRICS=true
N8N_METRICS_INCLUDE_MESSAGE_EVENT_BUS_METRICS=true
N8N_METRICS_INCLUDE_WORKFLOW_ID_LABEL=true
N8N_METRICS_INCLUDE_API_ENDPOINTS=true
N8N_METRICS_INCLUDE_QUEUE_METRICS=true

# Optional: n8n 2.x adds workflow_statistics gauges (workflows, users, executions, ...) - opt in
N8N_METRICS_INCLUDE_WORKFLOW_STATISTICS=true

# Optional: Customize the metric prefix (default is 'n8n_')
N8N_METRICS_PREFIX=n8n_
```

For more details, see the n8n documentation on [enabling Prometheus metrics][10].

If you change `N8N_METRICS_PREFIX` from its default of `n8n_`, you **must** also set `raw_metric_prefix` in the integration's `conf.yaml` to the same value. Otherwise the check will not recognize the exposed metric names and will silently submit nothing:

```yaml
instances:
  - openmetrics_endpoint: http://localhost:5678/metrics
    raw_metric_prefix: my_custom_prefix_
```

#### Event-driven counters

Most n8n counters are registered dynamically the first time their underlying event fires. The integration ships mappings for around 70 of these event-bus counters, including:

- Workflow lifecycle: `n8n.workflow.started.count`, `n8n.workflow.success.count`, `n8n.workflow.failed.count`, `n8n.workflow.cancelled.count`
- Audit (workflow, user, credentials, package, variable, execution data): `n8n.audit.workflow.executed.count`, `n8n.audit.user.login.success.count`, `n8n.audit.user.credentials.created.count`, and similar
- AI nodes: `n8n.ai.tool.called.count`, `n8n.ai.llm.generated.count`, `n8n.ai.vector.store.searched.count`, and similar
- Runner, queue, and node lifecycle: `n8n.runner.task.requested.count`, `n8n.queue.job.completed.count`, `n8n.node.started.count`, `n8n.node.finished.count`

These counters do not appear on the `/metrics` endpoint until the corresponding event has occurred. A healthy idle deployment will not produce data points for them until that activity fires. The complete list is in [`metadata.csv`][7].

If a future n8n release exposes a new event-driven counter that is not yet covered by this integration, add it to the `extra_metrics` option in your instance configuration:

```yaml
instances:
  - openmetrics_endpoint: http://n8n:5678/metrics
    extra_metrics:
      - some_new_n8n_event_total: some.new.n8n.event
```

The left-hand side is the Prometheus counter name as n8n exposes it (keep the `_total` suffix). The right-hand side is the dotted Datadog metric name to submit it as.

#### Queue mode and workers

In queue mode, n8n runs separate worker processes that execute jobs picked up from a Redis-backed queue. Each worker exposes its own `/metrics` endpoint and emits a different subset of metrics than the main process. Worker-observed metrics include `n8n.queue.job.dequeued.count`, `n8n.queue.job.stalled.count`, `n8n.node.started.count`, `n8n.node.finished.count`, and `n8n.runner.task.requested.count`. Main-only metrics include `n8n.instance.role.leader` and the `n8n.scaling.mode.queue.jobs.*` family.

To expose worker metrics, set `QUEUE_HEALTH_CHECK_ACTIVE=true` and `QUEUE_HEALTH_CHECK_PORT=<port>` on each worker. **In n8n 2.x, port `5679` is reserved for the task runner broker, so pick a different port (for example `5680`).**
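For example, a minimal Docker Compose sketch of a worker with metrics exposed on port `5680` (the service name and image tag are illustrative):

```yaml
services:
  n8n-worker:
    image: n8nio/n8n    # illustrative; pin the version you actually deploy
    command: worker
    environment:
      # Expose the worker's health and /metrics endpoint
      - QUEUE_HEALTH_CHECK_ACTIVE=true
      # Avoid 5679: it is reserved for the task runner broker in n8n 2.x
      - QUEUE_HEALTH_CHECK_PORT=5680
    ports:
      - "5680:5680"
```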

For full coverage in queue deployments, configure one Datadog instance per n8n process exposing `/metrics`, including main and worker processes:

```yaml
instances:
  - openmetrics_endpoint: http://n8n-main:5678/metrics
  - openmetrics_endpoint: http://n8n-worker:5680/metrics
```

#### Version-specific metrics

Several metric families were introduced in n8n 2.x and are not emitted on n8n 1.x:

- `n8n.workflow.execution.duration.seconds.*` (histogram). Gated by `N8N_METRICS_INCLUDE_WORKFLOW_EXECUTION_DURATION`, which defaults to `true` in n8n 2.x.
- `n8n.audit.workflow.activated.count`, `n8n.audit.workflow.deactivated.count`, `n8n.audit.workflow.executed.count`, `n8n.audit.workflow.resumed.count`, `n8n.audit.workflow.version.updated.count`, and `n8n.audit.workflow.waiting.count`
- `n8n.embed.login.requests.count` (tagged with `result:success` or `result:failure`), `n8n.embed.login.failures.count` (tagged with `reason`)
- `n8n.token.exchange.requests.count` (tagged with `result:success` or `result:failure`), `n8n.token.exchange.failures.count` (tagged with `reason`), `n8n.token.exchange.identity.linked.count`, `n8n.token.exchange.jit.provisioning.count`
- `n8n.process.pss.bytes` (Linux only)
- The `n8n.{production,manual,production.root}.executions`, `n8n.users.total`, `n8n.enabled.users`, `n8n.workflows.total`, and `n8n.credentials.total` family. Only emitted when `N8N_METRICS_INCLUDE_WORKFLOW_STATISTICS=true` is set.
- The `n8n.expression.*` family (`evaluation.duration.seconds`, `code.cache.{hit,miss,eviction,size}`, `pool.{acquired,replenish.failed,scaled.up,scaled.to.zero}`). Only emitted when n8n is running the new VM-isolated expression engine *and* observability for it is enabled: set `N8N_EXPRESSION_ENGINE=vm` and `N8N_EXPRESSION_ENGINE_OBSERVABILITY_ENABLED=true` on the n8n process. Both default to off (the engine defaults to `legacy`); see the sketch after this list. These metrics surface per-expression evaluation latency, the compiled-expression LRU cache hit and miss rates, and the V8-isolate pool's idle scaling behavior. They are most useful for troubleshooting workflow latency that traces back to slow `{{ ... }}` evaluation.
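
A minimal Docker Compose sketch of enabling the expression engine metrics (the service name and image tag are illustrative; `N8N_METRICS=true` assumes the metrics endpoint is enabled as described earlier):

```yaml
services:
  n8n:
    image: n8nio/n8n    # n8n 2.x is required for these metrics
    environment:
      # The metrics endpoint must be enabled, as described above
      - N8N_METRICS=true
      # Switch from the default 'legacy' engine to the VM-isolated engine
      - N8N_EXPRESSION_ENGINE=vm
      # Turn on observability for the n8n.expression.* family
      - N8N_EXPRESSION_ENGINE_OBSERVABILITY_ENABLED=true
```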

Some metrics only emit samples after the corresponding runtime event occurs. For example, failures-only counters (`*.failures.count`) need an authentication failure, audit workflow counters need the matching workflow state transition, and the libuv `n8n.nodejs.active.requests` gauge needs an in-flight libuv request. A healthy idle deployment may not produce data points for these metrics until that activity occurs.

#### Tag cardinality

When `N8N_METRICS_INCLUDE_WORKFLOW_ID_LABEL=true`, HTTP and workflow execution histograms are tagged with `workflow_id` (with similar labels for nodes). On deployments with many distinct workflows or nodes, this can produce high-cardinality metrics. Drop the label via `exclude_labels`, as shown below, or omit `N8N_METRICS_INCLUDE_WORKFLOW_ID_LABEL` to keep tag cardinality bounded.
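
For example, a sketch of keeping the label enabled on the n8n side while dropping it at the Agent (the endpoint value is illustrative):

```yaml
instances:
  - openmetrics_endpoint: http://n8n:5678/metrics
    # Drop the high-cardinality label before it becomes a tag
    exclude_labels:
      - workflow_id
```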

#### Configure the Datadog Agent

1. Edit the `n8n.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory to start collecting your n8n performance data. See the [sample n8n.d/conf.yaml][4] for all available configuration options.
@@ -59,27 +59,32 @@
_Available for Agent versions >6.0_

#### Enable n8n logging

Configure n8n to output logs by setting the following environment variables:
Configure n8n application logs by setting the following environment variables:

```bash
# Set the log level (error, warn, info, debug)
N8N_LOG_LEVEL=info

# Output logs to console (for containerized environments) or file
# Output application logs to console or file
N8N_LOG_OUTPUT=console

# If using file output, specify the log file location
# Use JSON formatting so Datadog can parse n8n application log attributes
N8N_LOG_FORMAT=json

# If using file output, specify the application log file location
N8N_LOG_FILE_LOCATION=/var/log/n8n/n8n.log
```

#### Structured event logs

n8n can output structured JSON logs to `n8nEventLog.log` containing detailed workflow execution events. Enable this by setting the log output to file:
n8n also writes structured event bus logs to `n8nEventLog*.log`. These logs contain workflow, node, queue, runner, and audit events and are separate from the application logs controlled by `N8N_LOG_OUTPUT` and `N8N_LOG_FILE_LOCATION`.

```bash
N8N_LOG_OUTPUT=file
N8N_LOG_FILE_LOCATION=/var/log/n8n/
```
By default, event bus log files are written under the n8n user folder, for example:

- Host installations: `~/.n8n/n8nEventLog*.log`
- Official Docker image: `/home/node/.n8n/n8nEventLog*.log`

If you use a custom n8n user folder, collect the event bus logs from that folder instead. If you customize the event bus log file base name with `N8N_EVENTBUS_LOGWRITER_LOGBASENAME`, update the Datadog log path to match.
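
For example, if the base name were set to `custom-events` (a hypothetical value), the matching Agent log entry would be:

```yaml
logs:
  - type: file
    # Matches files like custom-events.log, custom-events-1.log, ...
    path: /home/node/.n8n/custom-events*.log
    source: n8n
    service: <SERVICE>
```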

The event log includes the following event types:

@@ -102,32 +102,46 @@
Each event contains rich metadata including `executionId`, `workflowId`, `workfl
logs_enabled: true
```

2. Add this configuration block to your `n8n.d/conf.yaml` file to start collecting your n8n logs:
2. Add log collection entries to your `n8n.d/conf.yaml` file.

For a host-based n8n installation where the Agent can read local files, collect the application log file and the event bus log files:

```yaml
logs:
  - type: file
    path: /var/log/n8n/*.log
    source: n8n
    service: n8n
    service: <SERVICE>
  - type: file
    path: /home/n8n/.n8n/n8nEventLog*.log
    source: n8n
    service: <SERVICE>
```

For containerized environments using Docker, use the following configuration instead:
Adjust `/home/n8n/.n8n/n8nEventLog*.log` to the n8n user folder on your host.

For a containerized n8n deployment, collect stdout and stderr from the n8n container for application logs, and make the n8n user folder available to the Agent for event bus file logs. For example, if the n8n data directory is mounted on the host at `/var/lib/n8n`, configure:

```yaml
logs:
  - type: docker
    source: n8n
    service: n8n
    service: <SERVICE>
  - type: file
    path: /var/lib/n8n/n8nEventLog*.log
    source: n8n
    service: <SERVICE>
```

If the Agent runs in a container, mount the n8n data volume or host directory into the Agent container and use the path as seen from inside the Agent container.
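
For example, a Docker Compose sketch of mounting a host-side n8n data directory into the Agent container, assuming the layout from the configuration above (paths are illustrative):

```yaml
services:
  datadog-agent:
    image: gcr.io/datadoghq/agent:7
    volumes:
      # Mount read-only so the Agent can tail the event bus log files;
      # the 'path' in n8n.d/conf.yaml must match this in-container path
      - /var/lib/n8n:/var/lib/n8n:ro
```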

3. [Restart the Agent][5].

### Validation

[Run the Agent's status subcommand][6] and look for `n8n` under the Checks section.

## Data Collected
## Data collected
Member:
Suggested change: replace `## Data collected` with `## Data Collected`.

Contributor Author:
This is actually a fix: the docs guidelines say the capitalization here should be the one I changed it to. I got a similar comment on my READMEs for KrakenD, unless the guideline has been modified and the docs are not up to date (Docs). The content support skill still mentions this guideline (Skill).

Member:
In that case, we'd need to fix it in the template and for all integrations. This is what all the integration READMEs follow, and that's the way it's displayed in the public integration docs. Not sure if we do some preprocessing on the docs side (can take a look later).

Contributor Author:
> In that case, we'd need to fix it in the template and for all integrations.

We do; until this I didn't realize the template was doing this.

Member:
Alright, so I guess we could handle it separately.


### Metrics

@@ -137,7 +222,7 @@
See [metadata.csv][7] for a list of metrics provided by this integration.

The n8n integration does not include any events.

### Service Checks
### Service checks
Member:
Suggested change: replace `### Service checks` with `### Service Checks`.

Contributor Author:
Same as for the comment above.


See [service_checks.json][8] for a list of service checks provided by this integration.

2 changes: 1 addition & 1 deletion n8n/assets/configuration/spec.yaml
@@ -12,7 +12,7 @@ files:
        openmetrics_endpoint.required: true
        openmetrics_endpoint.hidden: false
        openmetrics_endpoint.display_priority: 1
        openmetrics_endpoint.value.example: http://localhost:5678
        openmetrics_endpoint.value.example: http://localhost:5678/metrics
        openmetrics_endpoint.description: |
          Endpoint exposing n8n's metrics in the OpenMetrics format. For more information, refer to:
          https://docs.n8n.io/hosting/logging-monitoring/monitoring/
6 changes: 6 additions & 0 deletions n8n/changelog.d/23635.changed
@@ -0,0 +1,6 @@
Improve the n8n metric coverage:

- Correct missing or incorrect metrics.
- Add metrics introduced in n8n 2.x (workflow execution duration, audit events, authentication, workflow and user statistics, expression engine, and process memory).
- Track n8n's dynamic events (workflow cancellations, audit activity, AI nodes, user and credential changes, package and variable changes).
- Add support for monitoring n8n worker processes alongside the main process.
69 changes: 33 additions & 36 deletions n8n/datadog_checks/n8n/check.py
@@ -2,58 +2,55 @@
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)

from urllib.parse import urljoin
from functools import cached_property
from typing import Any
from urllib.parse import urljoin, urlparse

from requests.exceptions import RequestException

from datadog_checks.base import OpenMetricsBaseCheckV2
from datadog_checks.n8n.metrics import METRIC_MAP, RENAME_LABELS_MAP

from .config_models import ConfigMixin

DEFAULT_READY_ENDPOINT = '/healthz/readiness'
DEFAULT_READY_PATH = '/healthz/readiness'


class N8nCheck(OpenMetricsBaseCheckV2, ConfigMixin):
    __NAMESPACE__ = 'n8n'
    DEFAULT_METRIC_LIMIT = 0

    def __init__(self, name, init_config, instances=None):
        super(N8nCheck, self).__init__(
            name,
            init_config,
            instances,
        )
        self.openmetrics_endpoint = self.instance["openmetrics_endpoint"]
        self.tags = self.instance.get('tags', [])
        self._ready_endpoint = DEFAULT_READY_ENDPOINT

    def get_default_config(self):
    def get_default_config(self) -> dict[str, Any]:
        return {
            'metrics': [METRIC_MAP],
            'rename_labels': RENAME_LABELS_MAP,
            'raw_metric_prefix': 'n8n_',
        }

    def _check_n8n_readiness(self):
        endpoint = urljoin(self.openmetrics_endpoint, self._ready_endpoint)
        response = self.http.get(endpoint)

        # Determine metric value and status_code tag
        if response.status_code is None:
            self.log.warning("The readiness endpoint did not return a status code")
            metric_value = 0
            metric_tags = self.tags + ['status_code:null']
        elif response.status_code == 200:
            # Ready - submit 1
            metric_value = 1
            metric_tags = self.tags + [f'status_code:{response.status_code}']
        else:
            # Not ready - submit 0
            metric_value = 0
            metric_tags = self.tags + [f'status_code:{response.status_code}']

        # Submit metric with appropriate value and status_code tag
        self.gauge('readiness.check', metric_value, tags=metric_tags)

    def check(self, instance):
        super().check(instance)
    @cached_property
    def _readiness_endpoint(self) -> str:
        parsed = urlparse(self.config.openmetrics_endpoint)
        base = f'{parsed.scheme}://{parsed.netloc}'
        return urljoin(base, DEFAULT_READY_PATH)

    def _check_n8n_readiness(self) -> None:
        endpoint = self._readiness_endpoint
        tags = list(self.config.tags or ())

        try:
            response = self.http.get(endpoint)
        except RequestException as e:
            self.log.warning("Could not reach n8n readiness endpoint %s: %s", endpoint, e)
            self.gauge('readiness.check', 0, tags=tags + ['status_code:none'])
Member:
Suggested change: replace

    self.gauge('readiness.check', 0, tags=tags + ['status_code:none'])

with

    response = getattr(e, "response", None)
    status_code = getattr(response, "status_code", None) or "none"
    self.gauge('readiness.check', 0, tags=tags + [f"status_code:{status_code}"])

nit: could be good to add the status_code when it's available (HTTP error).

Contributor Author:
It is, isn't it? Or am I misunderstanding your suggestion?

        is_ready = response.status_code == 200
        self.gauge(
            'readiness.check',
            1 if is_ready else 0,
            tags=tags + [f'status_code:{response.status_code}'],
        )

Member:
I'm suggesting we set it on the failure path, inside the:

    except RequestException as e:
        self.log.warning("Could not reach n8n readiness endpoint %s: %s", endpoint, e)
        self.gauge('readiness.check', 0, tags=tags + ['status_code:none'])

Contributor Author:
Aaah, ok. When there is a RequestException, we do not get a response because the connection was never established (docs). We could get one if we called raise_for_status, which we do not do here; the only exceptions raised are ConnectionError and TimeoutError, which do not carry a status code. Any other result (non-2xx) goes through the other branch, where we add the code to the tag.

I checked, just in case the Wrapper was doing raise_for_status internally; it does, but only when we have an auth_token that needs refreshing, and the final request is still made without the raise_for_status call (here).

The failure path here carries no response. I can apply your suggestion since it is non-breaking, but I believe it is dead code and an extra attribute lookup we could save on every run. Let me know if you still prefer to apply it.

Member:
Ah I see, I missed that! In that case, we could change is_ready to accept anything in the 2xx range?

Contributor Author:
Sounds good, updated now.

            return

        is_ready = 200 <= response.status_code < 300
        self.gauge(
            'readiness.check',
            1 if is_ready else 0,
            tags=tags + [f'status_code:{response.status_code}'],
        )

    def check(self, instance: dict[str, Any]) -> None:
        self._check_n8n_readiness()
        super().check(instance)
2 changes: 1 addition & 1 deletion n8n/datadog_checks/n8n/data/conf.yaml.example
@@ -18,7 +18,7 @@ instances:
    ## https://docs.n8n.io/hosting/logging-monitoring/monitoring/
    ## https://docs.n8n.io/hosting/configuration/environment-variables/endpoints/
    #
  - openmetrics_endpoint: http://localhost:5678
  - openmetrics_endpoint: http://localhost:5678/metrics

    ## @param raw_metric_prefix - string - optional - default: n8n_
    ## The prefix prepended to all metrics from n8n.