[RPC Metric Part 1] Support two basic metrics in RPC client : Latency and error rate #89

Open
guandali wants to merge 11 commits into main from lli/rpc-beholder-metric

@guandali commented Mar 13, 2026

Description

Allow the RPC client to capture two metrics, latency and error rate, so that they can later be exported to Beholder.

Requires Dependencies

Resolves Dependencies

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3174e7ea3

@guandali requested a review from a team as a code owner March 13, 2026 08:19
@guandali changed the title from "publish two basic metrics from the RPC client into Beholder: Latency and error rate" to "[RPC Metric Part 1] publish two basic metrics from the RPC client into Beholder: Latency and error rate" Mar 17, 2026
@guandali changed the title from "[RPC Metric Part 1] publish two basic metrics from the RPC client into Beholder: Latency and error rate" to "[RPC Metric Part 1] Support two basic metrics in RPC client : Latency and error rate" Mar 17, 2026
@vlfig left a comment

A couple of comments; grab me if you need to.

// RPCClientMetricsConfig holds labels for RPC client metrics.
// Empty strings are allowed; they will still be emitted as labels for filtering.
type RPCClientMetricsConfig struct {
Env string // e.g. "staging", "production"

I don't think this label should ever be populated by the application itself.

Env string // e.g. "staging", "production"
Network string // chain/network name
ChainID string // chain ID
RPCProvider string // RPC provider or node name (optional)

Where does this come from? I'd imagine this being called from logResult in rpc_client.go.

@@ -0,0 +1,125 @@
// RPC client observability using Beholder.

Instead of a new module I'd imagine this becoming an expansion of metrics/client.go, which already has a (promauto) latency metric, to 1) include beholder as a "target" like in metrics/multinode.go; and 2) add the request error rate metric.

@@ -0,0 +1,54 @@
# RPC Observability (Beholder)

Not sure you intended to commit this. I do like the idea of having a /docs folder in this style, but I think that's better pursued as part of a broader effort. Leaving such a slim slice here would end up being more confusing, I think.


Create `RPCClientMetrics` with `metrics.NewRPCClientMetrics(metrics.RPCClientMetricsConfig{...})` and pass it as the last argument to `multinode.NewRPCClientBase(...)`. The follow-up interface refactor will make it easier for multinode/chain integrations to supply `env`, `network`, `chain_id`, and `rpc_provider`.

## Follow-up: multinode integration (PR 2)

Better not to mix the plan of what to do with the description of what is.

@guandali requested a review from vlfig March 20, 2026 14:45
@vlfig left a comment

Couple of comments. Do tag Dmytro once you feel we're past these. I think we'll need approval from someone other than me.

  RPCCallLatency = promauto.NewHistogramVec(prometheus.HistogramOpts{
  Name: "rpc_call_latency",
- Help: "The duration of an RPC call in milliseconds",
+ Help: "The duration of an RPC call in seconds",

You sure about this?

Comment on lines +40 to +41
rpcCallLatencyBeholder = "rpc_call_latency"
rpcCallErrorsTotalBeholder = "rpc_call_errors_total"

If we're defining these, let's use them above.

}

// RPCClientMetricsConfig holds fixed labels for an RPC client instance.
type RPCClientMetricsConfig struct {

If this is going to be called from chainlink-evm's logResult and I'm reading this correctly, the only "static" fields are ChainFamily and ChainID, no? The others come from the request and would be passed on each increment, not when creating the metrics instance. Did you test locally with logResult calling RecordRequest instead of using RPCCallLatency directly?

Comment on lines +85 to +86
latency: latency,
errorsTotal: errorsTotal,

Maybe call them latencyHist and errorsCounter for clarity?

Comment on lines +123 to +128
// NoopRPCClientMetrics is a no-op implementation for when metrics are disabled.
type NoopRPCClientMetrics struct{}

func (NoopRPCClientMetrics) RecordRequest(context.Context, string, time.Duration, error) {}

var _ RPCClientMetrics = NoopRPCClientMetrics{}

I'm not exactly a Golang whiz, but is this necessary?

@guandali (Member, Author) replied:

That's a compile-time interface check; it's recommended here.
