Vault better instrumentation by cedric-cordenier · Pull Request #21732 · smartcontractkit/chainlink

cedric-cordenier · 2026-03-26T19:08:33Z

Requires

Supports

Instrument all public KVStore methods with a histogram metric (platform_vault_plugin_kv_operation_duration_seconds) to help diagnose high P95 Observation latencies in production. Also adds a max-batch Observation test exercising 10 GetSecrets requests with 10 encryption keys each plus 20 pending queue items.

github-actions · 2026-03-26T19:08:47Z

👋 cedric-cordenier, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

github-actions · 2026-03-26T19:09:41Z

✅ No conflicts with other open PRs targeting develop

github-actions · 2026-03-26T19:09:43Z

I see you updated files related to core. Please run `make gocs` in the root directory to add a changeset, and include at least one of the following tags in its text:

#added For any new functionality added.
#breaking_change For any functionality that requires manual action for the node to boot.
#bugfix For bug fixes.
#changed For any change to the existing functionality.
#db_update For any feature that introduces updates to database schema.
#deprecation_notice For any upcoming deprecation functionality.
#internal For changesets that need to be excluded from the final changelog.
#nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
#removed For any functionality/config that is removed.
#updated For any functionality that is updated.
#wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

Copilot

Pull request overview

Risk Rating: MEDIUM

This PR improves observability for Vault and OCR3.1 reporting plugins by adding new latency metrics around KV and blob operations, expanding histogram bucket ranges, and updating the Vault KVStore API to propagate context.Context through reads/writes (with accompanying test updates).

Changes:

Add OCR3.1 beholderwrapper histograms and wrappers to record KV/blob operation durations.
Add Vault plugin KV operation duration histogram and plumb context.Context through KVStore APIs/callers.
Expand histogram buckets (plugin duration + data size) and extend tests to cover new wrapping/instrumentation paths.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

File	Description
core/services/ocr3_1/beholderwrapper/types.go	Adds KV/blob duration instruments and views; expands bucket ranges.
core/services/ocr3_1/beholderwrapper/plugin.go	Wraps KV readers/writers and blob fetchers/broadcasters with instrumented implementations.
core/services/ocr3_1/beholderwrapper/instrumented_kv.go	New KV reader/readwriter wrappers that record per-method duration.
core/services/ocr3_1/beholderwrapper/instrumented_blob.go	New blob fetcher/broadcast-fetcher wrappers that record per-method duration.
core/services/ocr3_1/beholderwrapper/plugin_test.go	Adds tests validating delegation, wrapping behavior, and error propagation for new wrappers.
core/services/ocr2/plugins/vault/metrics.go	Adds a KV operation duration histogram metric.
core/services/ocr2/plugins/vault/kvstore.go	Updates KVStore interfaces to accept `context.Context` and records per-method duration metrics.
core/services/ocr2/plugins/vault/plugin.go	Plumbs `ctx` into KVStore calls and passes metrics into KVStore constructors.
core/services/ocr2/plugins/vault/kvstore_test.go	Updates tests for new KVStore signatures and adds test helpers that provide metrics.
core/services/ocr2/plugins/vault/plugin_test.go	Updates plugin tests for new KVStore signatures/metrics; adds a max-batch observation test.

Areas needing scrupulous human review:

Vault KVStore API changes (ctx plumbing) to ensure no call sites are missed and that behavior is unchanged aside from instrumentation.
Histogram bucket changes (especially data size) to confirm the intended max ranges match actual limits/operational expectations.
The newly added “max batch” Vault observation test for runtime/flakiness (crypto + large loops).

Reviewer recommendations (from .github/CODEOWNERS):

/core/services/ocr* → @smartcontractkit/foundations and @smartcontractkit/core

Copilot · 2026-03-26T19:13:51Z

core/services/ocr2/plugins/vault/plugin_test.go

+	// the observation also includes the maximum number of pending queue items
+	// to be broadcast as blobs.
+	numLocalItems := 2 * batchSize
+	for i := range numLocalItems {


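The loop uses `for i := range numLocalItems`. Ranging over an int is valid only from Go 1.22 onward, so this fails to compile on older toolchains. If the module must build with an earlier Go version, use a standard index loop (e.g., `for i := 0; i < numLocalItems; i++ { ... }`).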

Suggested change

-	for i := range numLocalItems {
+	for i := 0; i < numLocalItems; i++ {

Copilot · 2026-03-26T19:13:51Z

core/services/ocr2/plugins/vault/plugin_test.go

+		require.NoError(t, err)
+		gotLocalIDs[gotMsg.Id] = true
+	}
+	for i := range numLocalItems {


Same issue here: `for i := range numLocalItems` ranges over an int, which compiles only with Go 1.22 or newer. Switch to an index-based for loop if older toolchains must be supported.

Suggested change

-	for i := range numLocalItems {
+	for i := 0; i < numLocalItems; i++ {

Copilot · 2026-03-26T19:13:51Z

core/services/ocr3_1/beholderwrapper/types.go

 				// 512KB is the max value possible
-				// 1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB
-				Boundaries: prometheus.ExponentialBuckets(1024, 2, 10),
+				// 1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1024KB, 2048KB, 4096KB, 8192KB
+				Boundaries: prometheus.ExponentialBuckets(1024, 2, 14),


The comment says "512KB is the max value possible", but the bucket boundaries were expanded up to 8192KB. Please update/remove the max-value comment so it matches the new histogram range (or adjust the buckets if 512KB truly is the max).
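
The arithmetic behind this comment can be checked with a small sketch. The local `exponentialBuckets` helper is an assumption written for this example: it reproduces the start-times-factor progression (each boundary is the previous one multiplied by the factor) rather than calling the Prometheus client, so the block stays self-contained.

```go
package main

import "fmt"

// exponentialBuckets returns count boundaries: start, start*factor, start*factor^2, ...
// It is a local helper for illustration, not the Prometheus function itself.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	old := exponentialBuckets(1024, 2, 10)
	expanded := exponentialBuckets(1024, 2, 14)
	fmt.Printf("old max: %.0f KB\n", old[len(old)-1]/1024)           // prints "old max: 512 KB"
	fmt.Printf("new max: %.0f KB\n", expanded[len(expanded)-1]/1024) // prints "new max: 8192 KB"
}
```

With a start of 1024 and a factor of 2, the last of 10 boundaries is 1024 × 2^9 bytes (512KB) and the last of 14 is 1024 × 2^13 bytes (8192KB), which is why the "512KB is the max" comment no longer matches the range.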

Copilot · 2026-03-26T19:13:52Z

core/services/ocr2/plugins/vault/plugin_test.go

 		Key:       "my_secret",
 	}
-	err = NewWriteStore(rdr).WriteSecret(createdID, &vaultcommon.StoredSecret{
+	err = newTestWriteStore(t, rdr).WriteSecret(t.Context(), createdID,&vaultcommon.StoredSecret{


This line (and a few similar ones nearby) appears not to be gofmt'd (missing spaces after commas, e.g. createdID,&...). Please run gofmt on this file to keep formatting consistent.

Suggested change

-	err = newTestWriteStore(t, rdr).WriteSecret(t.Context(), createdID,&vaultcommon.StoredSecret{
+	err = newTestWriteStore(t, rdr).WriteSecret(t.Context(), createdID, &vaultcommon.StoredSecret{

Copilot · 2026-03-26T19:13:52Z

core/services/ocr2/plugins/vault/kvstore_test.go

+	err = store.WriteSecret(t.Context(), id,ss)
 	require.NoError(t, err)


This call isn't gofmt'd (missing a space after the comma: id,ss). Please run gofmt on the file to ensure standard Go formatting throughout.
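
To illustrate what gofmt changes here, the sketch below runs the standard library's `go/format` package over a tiny source string. The snippet being formatted and the `gofmtSource` helper are made up for the example; formatting works on syntax alone, so the undefined identifiers in the snippet do not matter.

```go
package main

import (
	"fmt"
	"go/format"
)

// gofmtSource formats Go source text the same way gofmt does.
func gofmtSource(src string) (string, error) {
	out, err := format.Source([]byte(src))
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// The call below is missing the space after the comma.
	src := "package p\n\nfunc f() {\n\twrite(id,ss)\n}\n"
	out, err := gofmtSource(src)
	if err != nil {
		fmt.Println("format error:", err)
		return
	}
	fmt.Print(out) // the call is printed as write(id, ss)
}
```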

cl-sonarqube-production · 2026-03-26T19:24:53Z

Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

cedric-cordenier added 3 commits March 26, 2026 17:38

Increase buckets to account for production usage

eb65caf

Instrument KV and BlobBroadcaster/Fetcher interfaces

cd79b58

Copilot AI review requested due to automatic review settings March 26, 2026 19:08

cedric-cordenier requested review from a team as code owners March 26, 2026 19:08

product-security-plaid-production bot requested a review from ilija42 March 26, 2026 19:08

Copilot started reviewing on behalf of cedric-cordenier March 26, 2026 19:09 View session

cedric-cordenier wants to merge 3 commits into develop from vault-better-instrumentation
