Add failed capability IDs to workflow execution failed metric by vreff · Pull Request #21735 · smartcontractkit/chainlink

vreff · 2026-03-26T19:40:48Z

Track which capabilities failed during a workflow execution and include them as a comma-joined capabilityID label on the platform_engine_workflow_execution_failed_count metric and in the "Workflow execution failed" log line.

Changes

capability_executor.go: Add failedCapabilityIDs sync.Map to ExecutionHelper, store capability ID on each callCapability error, expose via FailedCapabilityIDs() getter
engine.go: Add capabilityID label to IncrementWorkflowExecutionFailedCounter and to the error log line in the result.GetError() path
capability_executor_test.go: Unit tests for empty, single, dedup, and multiple capability ID tracking

- Track failed capability IDs in ExecutionHelper via sync.Map
- Store capability ID on each callCapability error
- Expose FailedCapabilityIDs() as comma-joined string
- Add capabilityID label to IncrementWorkflowExecutionFailedCounter
- Add capabilityID to 'Workflow execution failed' log line
- Add unit tests for FailedCapabilityIDs
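The tracking described in this first commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `failedIDTracker`, `record`, and `joined` are hypothetical names standing in for the `failedCapabilityIDs` field and the `FailedCapabilityIDs()` getter, and the sort step is an assumption added here because `sync.Map.Range` has no defined iteration order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// failedIDTracker is a hypothetical stand-in for the failedCapabilityIDs
// field on ExecutionHelper; it is not the PR's actual code.
type failedIDTracker struct {
	ids sync.Map // capability ID (string) -> struct{}
}

// record stores a capability ID; repeated IDs collapse to a single entry.
func (t *failedIDTracker) record(id string) {
	t.ids.Store(id, struct{}{})
}

// joined returns the recorded IDs as a comma-joined string.
// The IDs are sorted (an assumption of this sketch) so the result is stable
// across calls even though Range visits keys in no particular order.
func (t *failedIDTracker) joined() string {
	var out []string
	t.ids.Range(func(key, _ any) bool {
		out = append(out, key.(string))
		return true
	})
	sort.Strings(out)
	return strings.Join(out, ",")
}

func main() {
	var t failedIDTracker
	fmt.Printf("%q\n", t.joined()) // nothing recorded yet

	t.record("write-chain@1.0.0") // illustrative IDs, not real capability names
	t.record("http-fetch@1.0.0")
	t.record("write-chain@1.0.0") // duplicate, deduplicated by the map
	fmt.Println(t.joined())
}
```

The four cases the unit tests are described as covering (empty, single, dedup, multiple) map directly onto the calls in `main`.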

github-actions · 2026-03-26T19:41:05Z

👋 vreff, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

github-actions · 2026-03-26T19:41:55Z

✅ No conflicts with other open PRs targeting develop

github-actions · 2026-03-26T19:41:58Z

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in the changeset text:

#added For any new functionality added.
#breaking_change For any functionality that requires manual action for the node to boot.
#bugfix For bug fixes.
#changed For any change to the existing functionality.
#db_update For any feature that introduces updates to database schema.
#deprecation_notice For any upcoming deprecation functionality.
#internal For changesets that need to be excluded from the final changelog.
#nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
#removed For any functionality/config that is removed.
#updated For any functionality that is updated.
#wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

Copilot

Pull request overview

Risk Rating: MEDIUM — changes affect observability/metrics dimensions on a workflow-level counter and introduce new concurrent state tracking that can impact metric cardinality and operational cost.

This PR aims to attribute workflow execution failures to the capability IDs that failed during the run, surfacing that attribution in both logs and the platform_engine_workflow_execution_failed_count metric.

Changes:

Track failed capability IDs during execution via ExecutionHelper.
Add a capabilityID label (comma-joined IDs) to the workflow execution failed metric and error log line.
Add unit tests covering empty/single/dedup/multiple failed capability ID tracking.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `core/services/workflows/v2/engine.go` | Adds failed capability ID label to workflow execution failure metric/log. |
| `core/services/workflows/v2/capability_executor.go` | Tracks capability IDs on execution errors and exposes a getter for failed IDs. |
| `core/services/workflows/v2/capability_executor_test.go` | Adds tests for the failed capability ID aggregation behavior. |

Areas requiring scrupulous human review:

The determinism and cardinality impact of using a comma-joined list as a metric label value.
Ensuring failed capability IDs are recorded for all relevant error paths (not only capability.Execute errors).
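To make the cardinality concern concrete, here is a small back-of-the-envelope sketch. It assumes the joined label is built from a sorted, deduplicated set of IDs, so each non-empty subset of n capability IDs yields one distinct label value (2^n - 1 in total), whereas a per-ID label yields at most n values. The function name `joinedLabelValues` is hypothetical.

```go
package main

import "fmt"

// joinedLabelValues returns how many distinct non-empty, sorted,
// comma-joined label values n capability IDs can produce: 2^n - 1.
// Only meaningful for n < 64, since the result is a uint64.
func joinedLabelValues(n uint) uint64 {
	return (uint64(1) << n) - 1
}

func main() {
	for _, n := range []uint{1, 3, 10} {
		fmt.Printf("n=%d per-ID=%d joined=%d\n", n, n, joinedLabelValues(n))
	}
}
```

For 10 capability IDs this gives 1023 possible joined values against 10 per-ID values. If the join order were not stable, the number of distinct label strings could be higher still.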

Recommended reviewers (per CODEOWNERS):

@smartcontractkit/keystone (owners for /core/services/workflows)

Comments suppressed due to low confidence (1)

core/services/workflows/v2/capability_executor.go:217

failedCapabilityIDs is only updated when capability.Execute(...) returns an error. Errors returned earlier in callCapability (e.g., GetExecutable/Info/ConfigForCapability failures) or in CallCapability (rate limit / semaphore waits) won't be attributed to the capability ID, so the new workflow-execution-failed metric label may be empty/misleading in those cases. Consider recording request.Id for any non-nil error returned from CallCapability/callCapability (e.g., via a deferred function that checks the named return err).

	executionDuration := time.Since(executionStart)
	c.metrics.With(platform.KeyCapabilityID, request.Id).UpdateCapabilityExecutionDurationHistogram(ctx, int64(executionDuration.Seconds()))
	if err != nil {
		c.failedCapabilityIDs.Store(request.Id, struct{}{})
		var capabilityError caperrors.Error
		if errors.As(err, &capabilityError) {
			if capabilityError.Origin() == caperrors.OriginUser {
				execLogger.Debugw("Capability execution failed with user error", "userErr", err)
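The deferred-function suggestion in the comment above could look something like the following sketch. It is illustrative only: `recorder` and the boolean parameters are hypothetical, and a plain map is used for brevity where the PR uses a sync.Map for concurrent access.

```go
package main

import (
	"errors"
	"fmt"
)

// recorder is a hypothetical sink for failed capability IDs.
// It is not safe for concurrent use; real code would need a sync.Map or a mutex.
type recorder struct{ failed map[string]struct{} }

// callCapability mimics a multi-step call. The deferred closure reads the
// named return value err, so every non-nil exit path records the ID,
// including failures that happen before the capability is executed.
func (r *recorder) callCapability(id string, lookupOK, execOK bool) (err error) {
	defer func() {
		if err != nil {
			r.failed[id] = struct{}{}
		}
	}()
	if !lookupOK {
		return errors.New("capability lookup failed") // early exit, before execution
	}
	if !execOK {
		return errors.New("capability execution failed")
	}
	return nil
}

func main() {
	r := &recorder{failed: map[string]struct{}{}}
	_ = r.callCapability("cap-a", false, true) // fails during lookup
	_ = r.callCapability("cap-b", true, false) // fails during execution
	_ = r.callCapability("cap-c", true, true)  // succeeds
	fmt.Println(len(r.failed))
}
```

With this shape, a new early-return path added later is covered automatically, which is the main argument for the defer over a per-branch Store call.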

- Add new platform_engine_workflow_execution_failed_capability_error_count metric
- Emit one metric increment per failed capability ID (avoids high-cardinality label)
- Change FailedCapabilityIDs() to return sorted []string
- Keep comma-joined list in log line for debugging
- Remove capabilityID label from existing execution_failed_count metric
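The approach in this second commit, one increment per failed capability ID with the joined list kept only for logging, might be sketched as below. The `counter` type and `reportFailure` function are hypothetical stand-ins for the real metrics client and engine code, which are not shown on this page.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// counter is a hypothetical metric sink keyed by the capabilityID label value.
type counter map[string]int

// reportFailure increments the counter once per failed ID, so each label
// value is a single capability ID, and returns the comma-joined list that
// is intended for the log line only.
func reportFailure(c counter, failedIDs []string) string {
	ids := append([]string(nil), failedIDs...) // copy so the caller's slice is not reordered
	sort.Strings(ids)
	for _, id := range ids {
		c[id]++
	}
	return strings.Join(ids, ",")
}

func main() {
	c := counter{}
	fmt.Println(reportFailure(c, []string{"cap-b", "cap-a"}))
	fmt.Println(reportFailure(c, []string{"cap-a"}))
	fmt.Println(c["cap-a"], c["cap-b"], len(c))
}
```

The number of distinct label values is then bounded by the number of capability IDs, while the log line still shows which IDs failed together in one execution.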

cl-sonarqube-production · 2026-03-26T20:23:51Z

Quality Gate passed

Issues
2 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

vreff requested a review from a team as a code owner March 26, 2026 19:40

Copilot AI review requested due to automatic review settings March 26, 2026 19:40

product-security-plaid-production bot requested a review from jmank88 March 26, 2026 19:41

Copilot started reviewing on behalf of vreff March 26, 2026 19:41

Copilot AI reviewed Mar 26, 2026

Fix goimports formatting in monitoring.go

36f67fa

vreff requested a review from bolekk March 26, 2026 20:52

vreff closed this Mar 27, 2026

vreff deleted the feat/capability-id-in-execution-failed-metric-v2 branch March 27, 2026 16:03

vreff wants to merge 3 commits into develop from feat/capability-id-in-execution-failed-metric-v2

2 participants
