Conversation
Instrument all public KVStore methods with a histogram metric (platform_vault_plugin_kv_operation_duration_seconds) to help diagnose high P95 Observation latencies in production. Also adds a max-batch Observation test exercising 10 GetSecrets requests with 10 encryption keys each plus 20 pending queue items.
|
👋 cedric-cordenier, thanks for creating this pull request! To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team. Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks! |
|
✅ No conflicts with other open PRs targeting |
|
I see you updated files related to
|
There was a problem hiding this comment.
Pull request overview
Risk Rating: MEDIUM
This PR improves observability for Vault and OCR3.1 reporting plugins by adding new latency metrics around KV and blob operations, expanding histogram bucket ranges, and updating the Vault KVStore API to propagate context.Context through reads/writes (with accompanying test updates).
Changes:
- Add OCR3.1 beholderwrapper histograms and wrappers to record KV/blob operation durations.
- Add Vault plugin KV operation duration histogram and plumb
context.Contextthrough KVStore APIs/callers. - Expand histogram buckets (plugin duration + data size) and extend tests to cover new wrapping/instrumentation paths.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| core/services/ocr3_1/beholderwrapper/types.go | Adds KV/blob duration instruments and views; expands bucket ranges. |
| core/services/ocr3_1/beholderwrapper/plugin.go | Wraps KV readers/writers and blob fetchers/broadcasters with instrumented implementations. |
| core/services/ocr3_1/beholderwrapper/instrumented_kv.go | New KV reader/readwriter wrappers that record per-method duration. |
| core/services/ocr3_1/beholderwrapper/instrumented_blob.go | New blob fetcher/broadcast-fetcher wrappers that record per-method duration. |
| core/services/ocr3_1/beholderwrapper/plugin_test.go | Adds tests validating delegation, wrapping behavior, and error propagation for new wrappers. |
| core/services/ocr2/plugins/vault/metrics.go | Adds a KV operation duration histogram metric. |
| core/services/ocr2/plugins/vault/kvstore.go | Updates KVStore interfaces to accept context.Context and records per-method duration metrics. |
| core/services/ocr2/plugins/vault/plugin.go | Plumbs ctx into KVStore calls and passes metrics into KVStore constructors. |
| core/services/ocr2/plugins/vault/kvstore_test.go | Updates tests for new KVStore signatures and adds test helpers that provide metrics. |
| core/services/ocr2/plugins/vault/plugin_test.go | Updates plugin tests for new KVStore signatures/metrics; adds a max-batch observation test. |
Areas needing scrupulous human review:
- Vault KVStore API changes (
ctxplumbing) to ensure no call sites are missed and that behavior is unchanged aside from instrumentation. - Histogram bucket changes (especially data size) to confirm the intended max ranges match actual limits/operational expectations.
- The newly added “max batch” Vault observation test for runtime/flakiness (crypto + large loops).
Reviewer recommendations (from .github/CODEOWNERS):
/core/services/ocr*→@smartcontractkit/foundationsand@smartcontractkit/core
| // the observation also includes the maximum number of pending queue items | ||
| // to be broadcast as blobs. | ||
| numLocalItems := 2 * batchSize | ||
| for i := range numLocalItems { |
There was a problem hiding this comment.
The loop uses for i := range numLocalItems, but range can't be used on an int (this won't compile). Use a standard index loop (e.g., for i := 0; i < numLocalItems; i++ { ... }).
| for i := range numLocalItems { | |
| for i := 0; i < numLocalItems; i++ { |
| require.NoError(t, err) | ||
| gotLocalIDs[gotMsg.Id] = true | ||
| } | ||
| for i := range numLocalItems { |
There was a problem hiding this comment.
Same issue here: for i := range numLocalItems attempts to range over an int and will not compile. Switch to an index-based for loop.
| for i := range numLocalItems { | |
| for i := 0; i < numLocalItems; i++ { |
| // 512KB is the max value possible | ||
| // 1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB | ||
| Boundaries: prometheus.ExponentialBuckets(1024, 2, 10), | ||
| // 1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1024KB, 2048KB, 4096KB, 8192KB | ||
| Boundaries: prometheus.ExponentialBuckets(1024, 2, 14), |
There was a problem hiding this comment.
The comment says "512KB is the max value possible", but the bucket boundaries were expanded up to 8192KB. Please update/remove the max-value comment so it matches the new histogram range (or adjust the buckets if 512KB truly is the max).
| Key: "my_secret", | ||
| } | ||
| err = NewWriteStore(rdr).WriteSecret(createdID, &vaultcommon.StoredSecret{ | ||
| err = newTestWriteStore(t, rdr).WriteSecret(t.Context(), createdID,&vaultcommon.StoredSecret{ |
There was a problem hiding this comment.
This line (and a few similar ones nearby) appears not to be gofmt'd (missing spaces after commas, e.g. createdID,&...). Please run gofmt on this file to keep formatting consistent.
| err = newTestWriteStore(t, rdr).WriteSecret(t.Context(), createdID,&vaultcommon.StoredSecret{ | |
| err = newTestWriteStore(t, rdr).WriteSecret(t.Context(), createdID, &vaultcommon.StoredSecret{ |
| err = store.WriteSecret(t.Context(), id,ss) | ||
| require.NoError(t, err) |
There was a problem hiding this comment.
This call isn't gofmt'd (missing a space after the comma: id,ss). Please run gofmt on the file to ensure standard Go formatting throughout.
|




Requires
Supports