[otel-advisor] OTel improvement: add exception.type to exception span events to enable error classification #25937

### 📡 OTel Instrumentation Improvement: add `exception.type` to exception span events

**Analysis Date**: 2026-04-12
**Priority**: High
**Effort**: Small (< 2h)

### Problem

Exception span events emitted by `sendJobConclusionSpan` (in `send_otlp_span.cjs`) include only `exception.message`; the `exception.type` attribute is never set. The OpenTelemetry semantic conventions define `exception.type` as the type of the exception (its fully-qualified class name, if applicable), which makes it the natural key for classifying exceptions in traces. Without it, every exception event in every failed job span is anonymous: backends cannot group by error class, create type-specific alerts, or populate exception views that key on the type.

The gap is at `actions/setup/js/send_otlp_span.cjs` lines 768–781:

```javascript
// Current — only exception.message, no exception.type
.map(msg => ({
  timeUnixNano: errorTimeNano,
  name: "exception",
  attributes: [buildAttr("exception.message", msg.slice(0, MAX_ATTR_VALUE_LENGTH))],
}))
```

A second, related gap: `sanitizeOTLPPayload` (lines 285–300) sanitizes `span.attributes` but **does not** sanitize `span.events[].attributes`. As a result, `exception.message` values bypass the sensitive-key redaction logic before being sent to the OTLP endpoint **and** before being written to the JSONL mirror artifact. If an agent failure message inadvertently echoes a token or key, it is stored unredacted in both the backend and the artifact.

### Why This Matters (DevOps Perspective)

Without `exception.type`:

- **Grafana Tempo / Jaeger exception viewer is blind**: These backends index exception events by `exception.type`. Without it, no grouping is shown — every failure looks like a raw string.
- **Honeycomb "Top errors" is empty**: Honeycomb uses `exception.type` to populate its error analysis panel. Spans without it don't appear in error breakdowns.
- **Datadog APM "Error Tracking" doesn't group**: Datadog groups errors by `exception.type` + `exception.message` fingerprint. Without the type, grouping degrades to per-message fingerprinting, which fragments high-cardinality messages into hundreds of "issue" entries instead of one.
- **Alerting gaps**: You cannot create an alert like "alert when `gh-aw.AgentTimeout` error rate > 5%" because the type dimension doesn't exist.
- **MTTR impact**: On-call engineers filtering for "timeout failures" in a trace search cannot do `exception.type = "gh-aw.AgentTimeout"` — they must resort to substring matching on `exception.message`.

The secondary gap (span events not sanitized) means that if an error message ever contains a credential (e.g., an API error that echoes back request headers), it would be stored unredacted in the OTLP backend and exposed in the artifact download.

### Current Behavior

```javascript
// actions/setup/js/send_otlp_span.cjs (lines 768–781)
const spanEvents = isAgentFailure
  ? outputErrors
      .map(e => (e && typeof e.message === "string" ? e.message : String(e)))
      .filter(Boolean)
      .map(msg => ({
        timeUnixNano: errorTimeNano,
        name: "exception",
        attributes: [buildAttr("exception.message", msg.slice(0, MAX_ATTR_VALUE_LENGTH))],
        // ❌ exception.type is absent — backends cannot classify errors
      }))
  : [];
```

And in `sanitizeOTLPPayload` (lines 285–300):

```javascript
// Current: sanitizes span.attributes only — span.events[].attributes bypass redaction
spans: Array.isArray(ss.spans) ? ss.spans.map(span => ({
  ...span,
  attributes: sanitizeAttrs(span.attributes),
  // ❌ events: span.events — NOT sanitized
})) : ss.spans,
```
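
To make the sanitization gap concrete, here is a minimal, self-contained sketch. The `sanitizeAttrs` below is a hypothetical stand-in (a key-based redactor written for this example, not the real implementation), and the span name, attribute keys, and token values are invented for illustration.

```javascript
// Hypothetical stand-in for the real sanitizeAttrs: redacts values whose KEY looks sensitive.
const SENSITIVE_KEY = /token|secret|password|authorization|api[-_.]?key/i;

function sanitizeAttrs(attrs) {
  if (!Array.isArray(attrs)) return attrs;
  return attrs.map(a => (SENSITIVE_KEY.test(a.key) ? { ...a, value: { stringValue: "[REDACTED]" } } : a));
}

// Mirrors the current mapping: only span.attributes pass through sanitizeAttrs,
// while span.events is carried over untouched by the spread.
function sanitizeSpanCurrent(span) {
  return { ...span, attributes: sanitizeAttrs(span.attributes) };
}

// Invented example span with one sensitive span attribute and one sensitive event attribute.
const span = {
  name: "job-conclusion",
  attributes: [{ key: "api.token", value: { stringValue: "ghs_example" } }],
  events: [
    {
      name: "exception",
      attributes: [{ key: "request.authorization", value: { stringValue: "Bearer ghs_example" } }],
    },
  ],
};

const sanitized = sanitizeSpanCurrent(span);
console.log(sanitized.attributes[0].value.stringValue); // [REDACTED]
console.log(sanitized.events[0].attributes[0].value.stringValue); // Bearer ghs_example
```

Note that a purely key-based redactor like this stand-in would only catch event attributes whose key looks sensitive. Whether a secret embedded in an `exception.message` value is caught depends on what the real `sanitizeAttrs` matches, which this sketch does not model.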

### Proposed Change

**Part 1 — Add `exception.type` to exception events**

Agent errors often follow the pattern `"push_to_pull_request_branch:Cannot push..."` (colon-separated `type:message`). Extract the type prefix when present; fall back to a stable sentinel `"gh-aw.AgentError"`.

```javascript
// Proposed: actions/setup/js/send_otlp_span.cjs (replace lines 768–781)
const spanEvents = isAgentFailure
  ? outputErrors
      .map(e => (e && typeof e.message === "string" ? e.message : String(e)))
      .filter(Boolean)
      .map(msg => {
        // Extract colon-prefixed type when available ("push_to_pull_request_branch:...")
        const colonIdx = msg.indexOf(":");
        const hasTypePrefix =
          colonIdx > 0 && colonIdx < 64 && /^[a-z_][a-z0-9_.]*$/i.test(msg.slice(0, colonIdx));
        const exceptionType = hasTypePrefix ? `gh-aw.${msg.slice(0, colonIdx)}` : "gh-aw.AgentError";
        // Strip the prefix only when it was accepted as a type; otherwise keep the
        // full message so text before an unrelated colon is not lost.
        const exceptionMessage = (hasTypePrefix ? msg.slice(colonIdx + 1).trim() : msg).slice(0, MAX_ATTR_VALUE_LENGTH);
        return {
          timeUnixNano: errorTimeNano,
          name: "exception",
          attributes: [
            buildAttr("exception.type", exceptionType),
            buildAttr("exception.message", exceptionMessage),
          ],
        };
      })
  : [];
```
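
The extraction rule can also be exercised in isolation. The following is a sketch only: `classifyAgentError` is a hypothetical helper name, and the value of `MAX_ATTR_VALUE_LENGTH` is assumed here because the real constant is defined in `send_otlp_span.cjs`. In this sketch the prefix is removed from the message only when it was accepted as a type.

```javascript
const MAX_ATTR_VALUE_LENGTH = 1024; // assumed value for this sketch only
const FALLBACK_TYPE = "gh-aw.AgentError";
const TYPE_PREFIX = /^[a-z_][a-z0-9_.]*$/i;

// Hypothetical helper: splits "type:message" into an exception type and message.
function classifyAgentError(msg) {
  const colonIdx = msg.indexOf(":");
  // An empty prefix never matches TYPE_PREFIX, so messages without a usable colon fall back.
  const prefix = colonIdx > 0 && colonIdx < 64 ? msg.slice(0, colonIdx) : "";
  const hasTypePrefix = TYPE_PREFIX.test(prefix);
  return {
    type: hasTypePrefix ? `gh-aw.${prefix}` : FALLBACK_TYPE,
    message: (hasTypePrefix ? msg.slice(colonIdx + 1).trim() : msg).slice(0, MAX_ATTR_VALUE_LENGTH),
  };
}

const typed = classifyAgentError("push_to_pull_request_branch:Cannot push to protected branch");
const spaced = classifyAgentError("Cannot push: permission denied");
const plain = classifyAgentError("agent exited with code 1");
const url = classifyAgentError("https://example.com timed out");

console.log(typed.type); // gh-aw.push_to_pull_request_branch
console.log(spaced.type, "|", spaced.message); // gh-aw.AgentError | Cannot push: permission denied
console.log(plain.type); // gh-aw.AgentError
console.log(url.type); // gh-aw.https
```

The last case is a caveat of the prefix pattern rather than a desired result: any message that begins with an identifier-like token followed by a colon (a URL scheme, `Error: ...`) is treated as typed. If that matters in practice, an allowlist of known prefixes may be a safer choice than a regular expression.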

**Part 2 — Extend `sanitizeOTLPPayload` to cover span event attributes**

```javascript
// Proposed: actions/setup/js/send_otlp_span.cjs (replace lines 289–298)
spans: Array.isArray(ss.spans)
  ? ss.spans.map(span => ({
      ...span,
      attributes: sanitizeAttrs(span.attributes),
      // Also sanitize event attributes so exception.message is redacted if it
      // accidentally contains a token or key.
      events: Array.isArray(span.events)
        ? span.events.map(ev => ({ ...ev, attributes: sanitizeAttrs(ev.attributes) }))
        : span.events,
    }))
  : ss.spans,
```

### Expected Outcome

After this change:

- **In Grafana / Honeycomb / Datadog**: Exception events appear in the type-grouped error views. Engineers can query `exception.type = "gh-aw.AgentError"` or more specific types like `exception.type = "gh-aw.push_to_pull_request_branch"` to filter all push-failure traces.
- **In the JSONL mirror**: Exception event attributes include `exception.type` alongside `exception.message`, making grep-based local triage (`grep exception.type otel.jsonl`) immediately useful.
- **For on-call engineers**: "Why did this job fail?" is answerable by exception type in under 30 seconds — no manual string parsing required. Type-based alert rules become possible.
- **Security**: Error messages that accidentally echo credentials are now redacted in the OTLP export and the artifact, consistent with the existing span-attribute redaction policy.

<details>
<summary><b>Implementation Steps</b></summary>

- [ ] Open `actions/setup/js/send_otlp_span.cjs`
- [ ] Replace the `spanEvents` construction block (lines 768–781) with the proposed version that extracts `exception.type` from the colon-prefix and falls back to `"gh-aw.AgentError"`
- [ ] Replace the `spans` mapping inside `sanitizeOTLPPayload` (lines 289–298) with the extended version that also maps over `span.events`
- [ ] Update `actions/setup/js/send_otlp_span.test.cjs` to assert:
  - Each exception event has an `exception.type` attribute (e.g. `{ key: "exception.type", value: { stringValue: "gh-aw.AgentError" } }`)
  - The colon-prefix extraction works for `"push_to_pull_request_branch:message"` → `exception.type = "gh-aw.push_to_pull_request_branch"`
  - `sanitizeOTLPPayload` now redacts sensitive keys in span event attributes
- [ ] Run `cd actions/setup/js && npx vitest run` to confirm tests pass
- [ ] Run `make fmt` to ensure formatting
- [ ] Open a PR referencing this issue

</details>

### Evidence from Static Analysis

The gap is confirmed by direct code inspection — `send_otlp_span.cjs:779` shows the exception event construction emits only `exception.message`:

```
attributes: [buildAttr("exception.message", msg.slice(0, MAX_ATTR_VALUE_LENGTH))],
````

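No `exception.type` attribute is built anywhere in the file. The OTel specification ([Semantic Conventions for Exceptions](opentelemetry.io/redacted)) requires at least one of `exception.type` or `exception.message` on an exception event and recommends `exception.type` whenever the message is present. Emitting the message alone therefore meets the minimum requirement but departs from the recommendation, and it silently degrades backend exception views that key on the type.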

The sanitization gap is confirmed at line 295: the span mapping in `sanitizeOTLPPayload` overrides only `attributes`, so `events` is carried over unchanged by the `...span` spread and event attributes are left unredacted.

### Related Files

- `actions/setup/js/send_otlp_span.cjs` — primary change (exception event construction + `sanitizeOTLPPayload`)
- `actions/setup/js/send_otlp_span.test.cjs` — test assertions for both changes

---

*Generated by the [Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/24316622621) workflow*







> Generated by [Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/24316622621/agentic_workflow) · ● 163.5K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-otel-instrumentation-advisor%22&type=issues)
> - [x] expires on Apr 19, 2026, 9:23 PM UTC
