Skip to content

add basic metrics for NRI framework#277

Open
chrishenzie wants to merge 1 commit intocontainerd:mainfrom
chrishenzie:feat/prometheus-metrics
Open

add basic metrics for NRI framework#277
chrishenzie wants to merge 1 commit intocontainerd:mainfrom
chrishenzie:feat/prometheus-metrics

Conversation

@chrishenzie
Copy link
Contributor

@chrishenzie chrishenzie commented Mar 5, 2026

Provides necessary hooks for container runtimes to collect basic runtime metrics regarding NRI plugin activity. This improves observability and helps satisfy v1.0 requirements.

Introduces a Metrics interface that allows us to collect plugin counts, invocations, adjustments, and latency information.

To avoid burying state change events under a generic operation name, StateChange invocations append the specific event type using human-readable strings (e.g., "StateChange/CreateContainer").

These hooks are called during plugin invocations and container lifecycle events. Runtimes can provide their own implementation of the Metrics interface (e.g. via a WithMetrics option) to route these data points into their existing telemetry setups.

Ref: #272
Signed-off-by: Chris Henzie chrishenzie@gmail.com
Assisted-by: gemini-cli

@chrishenzie chrishenzie requested review from klihub and samuelkarp March 5, 2026 00:13
@chrishenzie chrishenzie force-pushed the feat/prometheus-metrics branch from 5c1dcef to 2df634d Compare March 5, 2026 00:14
@samuelkarp
Copy link
Member

Marking as draft per offline conversation with @chrishenzie. Some unstructured thoughts (haven't taken a deep look yet):

  • CI is failing
  • Bringing the whole Prometheus library stack into the go.mod is a lot
  • Clients will have to plug in Prometheus themselves anyway, so a couple alternatives:
    1. Build tags (no_prometheus) for consumers that don't want it, like plugins. But that still ends up in the vendor folder.
    2. A Metrics interface with members like RecordPluginInvocation, RecordPluginAdjustments, etc, and then have runtimes provide an impl tied to their own Prometheus setup as desired (or, could be something really, really simple like a stdout logger)

@chrishenzie chrishenzie force-pushed the feat/prometheus-metrics branch from 2df634d to fd1117c Compare March 5, 2026 00:31
@samuelkarp samuelkarp marked this pull request as draft March 5, 2026 00:31
@chrishenzie chrishenzie force-pushed the feat/prometheus-metrics branch 2 times, most recently from ed0c551 to fe82293 Compare March 5, 2026 20:19
@chrishenzie chrishenzie changed the title add prometheus metrics for NRI framework add basic metrics for NRI framework Mar 5, 2026
@samuelkarp
Copy link
Member

Looking more, I think a Metrics interface would be helpful. containerd uses https://github.com/docker/go-metrics as a wrapper around the Prometheus libraries anyway, and the interface would allow that to still be used for NRI metrics.

@chrishenzie chrishenzie force-pushed the feat/prometheus-metrics branch 2 times, most recently from e18c21a to eda0942 Compare March 5, 2026 20:36
@chrishenzie
Copy link
Contributor Author

chrishenzie commented Mar 5, 2026

Refactored to introduce a Metrics interface, and provide more fine-grained labels for StateChange invocations.

@chrishenzie chrishenzie marked this pull request as ready for review March 5, 2026 20:39
defer cancel()

if err = p.impl.StateChange(ctx, evt); err != nil {
err = p.impl.StateChange(ctx, evt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this will cause some conflicts with #274

@chrishenzie chrishenzie force-pushed the feat/prometheus-metrics branch from eda0942 to 9a2a1c8 Compare March 6, 2026 00:43
@klihub
Copy link
Member

klihub commented Mar 6, 2026

@chrishenzie Thank you for picking this up ! With this updated interface-based approach this looks pretty good to me.

@klihub
Copy link
Member

klihub commented Mar 6, 2026

@chrishenzie I think the CDO check gets confused by the Assisted-by: being at the very end.

@chrishenzie chrishenzie force-pushed the feat/prometheus-metrics branch from 9a2a1c8 to 7f874b3 Compare March 6, 2026 20:43
Provides necessary hooks for container runtimes to collect basic runtime
metrics regarding NRI plugin activity. This improves observability and
helps satisfy v1.0 requirements.

Introduces a `Metrics` interface that allows us to collect plugin
counts, invocations, adjustments, and latency information.

To avoid burying state change events under a generic operation name,
`StateChange` invocations append the specific event type using
human-readable strings (e.g., "StateChange/CreateContainer").

These hooks are called during plugin invocations and container lifecycle
events. Runtimes can provide their own implementation of the `Metrics`
interface (e.g. via a `WithMetrics` option) to route these data points
into their existing telemetry setups.

Ref: containerd#272
Signed-off-by: Chris Henzie <chrishenzie@gmail.com>
@chrishenzie chrishenzie force-pushed the feat/prometheus-metrics branch from 7f874b3 to b31cb47 Compare March 6, 2026 20:43
@chrishenzie
Copy link
Contributor Author

chrishenzie commented Mar 6, 2026

Updated to also collect latency metrics, and refactored test suite to collect metrics via the dummyPlugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants