📡 OTel Instrumentation Improvement: add exception.type to exception span events
Analysis Date: 2026-04-12
Priority: High
Effort: Small (< 2h)
### Problem
Exception span events emitted by `sendJobConclusionSpan` (in `send_otlp_span.cjs`) include only `exception.message`; the `exception.type` attribute is never set. The OpenTelemetry semantic conventions define `exception.type` as the primary identifier for classifying exceptions in traces. Without it, every exception event in every failed job span is anonymous: backends cannot group by error class, create type-specific alerts, or display the exception explorer views they are designed for.

The gap is at `actions/setup/js/send_otlp_span.cjs` lines 768–781:

```javascript
// Current: only exception.message, no exception.type
.map(msg => ({
  timeUnixNano: errorTimeNano,
  name: "exception",
  attributes: [buildAttr("exception.message", msg.slice(0, MAX_ATTR_VALUE_LENGTH))],
}))
```
A second, related gap: `sanitizeOTLPPayload` (lines 285–300) sanitizes `span.attributes` but not `span.events[].attributes`. As a result, `exception.message` values bypass the sensitive-key redaction logic before being sent to the OTLP endpoint and before being written to the JSONL mirror artifact. If an agent failure message inadvertently echoes a token or key, it is stored unredacted in both the backend and the artifact.
### Why This Matters (DevOps Perspective)
Without `exception.type`:
- **Grafana Tempo / Jaeger exception viewer is blind**: These backends index exception events by `exception.type`. Without it, no grouping is shown, and every failure looks like a raw string.
- **Honeycomb "Top errors" is empty**: Honeycomb uses `exception.type` to populate its error analysis panel. Spans without it don't appear in error breakdowns.
- **Datadog APM "Error Tracking" doesn't group**: Datadog groups errors by an `exception.type` + `exception.message` fingerprint. Without the type, grouping degrades to per-message fingerprinting, which fragments high-cardinality messages into hundreds of "issue" entries instead of one.
- **Alerting gaps**: You cannot create an alert like "alert when `gh-aw.AgentTimeout` error rate > 5%" because the type dimension doesn't exist.
- **MTTR impact**: On-call engineers filtering for "timeout failures" in a trace search cannot query `exception.type = "gh-aw.AgentTimeout"`; they must resort to substring matching on `exception.message`.

The secondary gap (span events not sanitized) means that if an error message ever contains a credential (e.g., an API error that echoes back request headers), it would be stored unredacted in the OTLP backend and exposed in the artifact download.
### Current Behavior

```javascript
// actions/setup/js/send_otlp_span.cjs (lines 768–781)
const spanEvents = isAgentFailure
  ? outputErrors
      .map(e => (e && typeof e.message === "string" ? e.message : String(e)))
      .filter(Boolean)
      .map(msg => ({
        timeUnixNano: errorTimeNano,
        name: "exception",
        attributes: [buildAttr("exception.message", msg.slice(0, MAX_ATTR_VALUE_LENGTH))],
        // ❌ exception.type is absent, so backends cannot classify errors
      }))
  : [];
```

And in `sanitizeOTLPPayload` (lines 285–300):

```javascript
// Current: sanitizes span.attributes only; span.events[].attributes bypass redaction
spans: Array.isArray(ss.spans) ? ss.spans.map(span => ({
  ...span,
  attributes: sanitizeAttrs(span.attributes),
  // ❌ span.events is passed through, NOT sanitized
})) : ss.spans,
```
### Proposed Change
#### Part 1: Add `exception.type` to exception events
Agent errors often follow the pattern `"push_to_pull_request_branch:Cannot push..."` (colon-separated `type:message`). Extract the type prefix when present; fall back to a stable sentinel `"gh-aw.AgentError"`.
```javascript
// Proposed: actions/setup/js/send_otlp_span.cjs (replace lines 768–781)
const spanEvents = isAgentFailure
  ? outputErrors
      .map(e => (e && typeof e.message === "string" ? e.message : String(e)))
      .filter(Boolean)
      .map(msg => {
        // Extract colon-prefixed type when available ("push_to_pull_request_branch:...")
        const colonIdx = msg.indexOf(":");
        const hasTypePrefix =
          colonIdx > 0 && colonIdx < 64 && /^[a-z_][a-z0-9_.]*$/i.test(msg.slice(0, colonIdx));
        const exceptionType = hasTypePrefix ? `gh-aw.${msg.slice(0, colonIdx)}` : "gh-aw.AgentError";
        // Strip the prefix only when it was accepted as the type, so that
        // fallback events keep their full message (e.g. "Request failed: timeout").
        const exceptionMessage = (hasTypePrefix ? msg.slice(colonIdx + 1).trim() : msg).slice(0, MAX_ATTR_VALUE_LENGTH);
        return {
          timeUnixNano: errorTimeNano,
          name: "exception",
          attributes: [
            buildAttr("exception.type", exceptionType),
            buildAttr("exception.message", exceptionMessage),
          ],
        };
      })
  : [];
```
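
The prefix rule can be exercised in isolation. The following is a minimal standalone sketch, assuming a hypothetical helper name `splitExceptionType` and constant names that do not exist in `send_otlp_span.cjs`; in this sketch the prefix is removed from the message only when it was accepted as the type.

```javascript
// Hypothetical helper for illustration; not part of send_otlp_span.cjs.
const TYPE_PREFIX_RE = /^[a-z_][a-z0-9_.]*$/i;
const FALLBACK_TYPE = "gh-aw.AgentError";

function splitExceptionType(msg) {
  const colonIdx = msg.indexOf(":");
  // Only short, identifier-like prefixes are treated as a type.
  const prefix = colonIdx > 0 && colonIdx < 64 ? msg.slice(0, colonIdx) : "";
  if (prefix && TYPE_PREFIX_RE.test(prefix)) {
    return { type: `gh-aw.${prefix}`, message: msg.slice(colonIdx + 1).trim() };
  }
  // No usable prefix: keep the whole message and use the sentinel type.
  return { type: FALLBACK_TYPE, message: msg };
}

const typed = splitExceptionType("push_to_pull_request_branch:Cannot push to branch");
console.log(typed.type); // gh-aw.push_to_pull_request_branch
console.log(typed.message); // Cannot push to branch

const untyped = splitExceptionType("Request failed: connection reset");
console.log(untyped.type); // gh-aw.AgentError
console.log(untyped.message); // Request failed: connection reset
```

One caveat of the pattern: any identifier-like prefix is accepted, so a message that begins with a URL such as `https://...` would be classified as `gh-aw.https`. An allow-list of known prefixes may be worth considering if that turns out to matter in practice.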
#### Part 2: Extend `sanitizeOTLPPayload` to cover span event attributes

```javascript
// Proposed: actions/setup/js/send_otlp_span.cjs (replace lines 289–298)
spans: Array.isArray(ss.spans)
  ? ss.spans.map(span => ({
      ...span,
      attributes: sanitizeAttrs(span.attributes),
      // Also sanitize event attributes so exception.message is redacted if it
      // accidentally contains a token or key.
      events: Array.isArray(span.events)
        ? span.events.map(ev => ({ ...ev, attributes: sanitizeAttrs(ev.attributes) }))
        : span.events,
    }))
  : ss.spans,
```
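
To illustrate the effect of mapping over events, here is a self-contained sketch. The `sanitizeAttrs` below is a stand-in written for this example (its key pattern and the `[REDACTED]` marker are assumptions, not the real implementation), and `sanitizeSpan` is a hypothetical wrapper around the proposed mapping.

```javascript
// Stand-in for the real sanitizeAttrs; the key pattern and the replacement
// value are assumptions made for this example only.
const SENSITIVE_KEY_RE = /token|secret|password|authorization/i;

function sanitizeAttrs(attrs) {
  if (!Array.isArray(attrs)) return attrs;
  return attrs.map(attr =>
    SENSITIVE_KEY_RE.test(attr.key) ? { ...attr, value: { stringValue: "[REDACTED]" } } : attr
  );
}

// Hypothetical wrapper: same shape as the proposed span mapping.
function sanitizeSpan(span) {
  return {
    ...span,
    attributes: sanitizeAttrs(span.attributes),
    events: Array.isArray(span.events)
      ? span.events.map(ev => ({ ...ev, attributes: sanitizeAttrs(ev.attributes) }))
      : span.events,
  };
}

const span = {
  name: "job",
  attributes: [{ key: "api.token", value: { stringValue: "abc123" } }],
  events: [
    {
      name: "exception",
      attributes: [
        { key: "exception.type", value: { stringValue: "gh-aw.AgentError" } },
        { key: "request.authorization", value: { stringValue: "Bearer abc123" } },
      ],
    },
  ],
};

const clean = sanitizeSpan(span);
console.log(clean.events[0].attributes[1].value.stringValue); // [REDACTED]
console.log(span.events[0].attributes[1].value.stringValue); // Bearer abc123 (input is not mutated)
```

Whether a credential embedded inside an `exception.message` value gets redacted depends on how the real `sanitizeAttrs` matches. If it matches on attribute keys only, as this stand-in does, the message value would still pass through, and value-level scanning would be needed in addition.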
### Expected Outcome
After this change:
- **In Grafana / Honeycomb / Datadog**: Exception events appear in the type-grouped error views. Engineers can query `exception.type = "gh-aw.AgentError"` or more specific types like `exception.type = "gh-aw.push_to_pull_request_branch"` to filter all push-failure traces.
- **In the JSONL mirror**: Exception event attributes include `exception.type` alongside `exception.message`, making grep-based local triage (`grep exception.type otel.jsonl`) immediately useful.
- **For on-call engineers**: "Why did this job fail?" is answerable by exception type in under 30 seconds — no manual string parsing required. Type-based alert rules become possible.
- **Security**: Error messages that accidentally echo credentials are now redacted in the OTLP export and the artifact, consistent with the existing span-attribute redaction policy.
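
For local triage beyond `grep`, exception types in the JSONL mirror can be tallied with a few lines of script. This is a sketch under an assumption: each line of the mirror is taken to be one OTLP/JSON payload (`resourceSpans` → `scopeSpans` → `spans` → `events`), and the function name `countExceptionTypes` is hypothetical.

```javascript
// Assumes one OTLP/JSON payload per line; adjust the traversal if the
// mirror file uses a different shape.
function countExceptionTypes(jsonl) {
  const counts = {};
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    const payload = JSON.parse(line);
    for (const rs of payload.resourceSpans ?? []) {
      for (const ss of rs.scopeSpans ?? []) {
        for (const span of ss.spans ?? []) {
          for (const ev of span.events ?? []) {
            if (ev.name !== "exception") continue;
            const typeAttr = (ev.attributes ?? []).find(a => a.key === "exception.type");
            // Events recorded before this change have no type attribute.
            const type = typeAttr?.value?.stringValue ?? "(missing)";
            counts[type] = (counts[type] ?? 0) + 1;
          }
        }
      }
    }
  }
  return counts;
}

// Small in-memory sample standing in for the contents of otel.jsonl.
const wrap = events => JSON.stringify({ resourceSpans: [{ scopeSpans: [{ spans: [{ events }] }] }] });
const sample = [
  wrap([
    { name: "exception", attributes: [{ key: "exception.type", value: { stringValue: "gh-aw.AgentError" } }] },
    { name: "exception", attributes: [{ key: "exception.message", value: { stringValue: "boom" } }] },
  ]),
  wrap([
    { name: "exception", attributes: [{ key: "exception.type", value: { stringValue: "gh-aw.push_to_pull_request_branch" } }] },
  ]),
].join("\n");

console.log(countExceptionTypes(sample));
```

With a real file, the string would come from `require("node:fs").readFileSync("otel.jsonl", "utf8")`; a nonzero `(missing)` count indicates events emitted without `exception.type`.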
<details>
<summary><b>Implementation Steps</b></summary>
- [ ] Open `actions/setup/js/send_otlp_span.cjs`
- [ ] Replace the `spanEvents` construction block (lines 768–781) with the proposed version that extracts `exception.type` from the colon-prefix and falls back to `"gh-aw.AgentError"`
- [ ] Replace the `spans` mapping inside `sanitizeOTLPPayload` (lines 289–298) with the extended version that also maps over `span.events`
- [ ] Update `actions/setup/js/send_otlp_span.test.cjs` to assert:
- Each exception event has an `exception.type` attribute (e.g. `{ key: "exception.type", value: { stringValue: "gh-aw.AgentError" } }`)
- The colon-prefix extraction works for `"push_to_pull_request_branch:message"` → `exception.type = "gh-aw.push_to_pull_request_branch"`
- `sanitizeOTLPPayload` now redacts sensitive keys in span event attributes
- [ ] Run `cd actions/setup/js && npx vitest run` to confirm tests pass
- [ ] Run `make fmt` to ensure formatting
- [ ] Open a PR referencing this issue
</details>
### Evidence from Static Analysis
The gap is confirmed by direct code inspection — `send_otlp_span.cjs:779` shows the exception event construction emits only `exception.message`:
```javascript
attributes: [buildAttr("exception.message", msg.slice(0, MAX_ATTR_VALUE_LENGTH))],
```

No `exception.type` attribute is built anywhere in the file. The OTel [Semantic Conventions for Exceptions](opentelemetry.io/redacted) require at least one of `exception.type` or `exception.message` on an exception event and recommend `exception.type` even when the message is set. Omitting it is therefore not a strict violation, but it leaves the events unclassifiable and silently degrades backend exception views.

The sanitization gap is confirmed at line 295: the `sanitizeOTLPPayload` mapping of spans does not include an `events` field, leaving event attributes unredacted.
### Related Files
- `actions/setup/js/send_otlp_span.cjs`: primary change (exception event construction and `sanitizeOTLPPayload`)
- `actions/setup/js/send_otlp_span.test.cjs`: test assertions for both changes

Generated by the Daily OTel Instrumentation Advisor workflow