feat(metrics): add NodesByState, ReconciliationLatency, and BootstrapDuration metrics #238

Open
Shreya2005-2005 wants to merge 1 commit into kubernetes-sigs:main from Shreya2005-2005:feat/add-missing-metrics

Shreya2005-2005 commented May 10, 2026

Description

Add three Prometheus metrics missing from the controller, required for the SLO dashboard proposed in #182:

  • node_readiness_nodes_by_state{rule, state}: gauge tracking per-rule node counts by readiness state (ready/not_ready)
  • node_readiness_reconciliation_latency_seconds{rule, operation}: histogram tracking end-to-end latency of taint add/remove operations per rule
  • node_readiness_bootstrap_duration_seconds{rule}: histogram tracking time elapsed from bootstrap start to completion per rule
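
As a rough illustration of the values these metrics would carry, here is a minimal standard-library sketch. It does not reproduce the PR's code and does not use the Prometheus client. The names `CountNodesByState`, `ElapsedSeconds`, `StateReady`, and `StateNotReady` are hypothetical; only the `ready`/`not_ready` state values and the seconds unit come from the metric descriptions above.

```go
package main

import "time"

// State label values as listed in the PR description.
// The constant names themselves are hypothetical.
const (
	StateReady    = "ready"
	StateNotReady = "not_ready"
)

// CountNodesByState tallies, for a single rule, how many nodes fall into
// each readiness state. The input maps a node name to whether that node
// is considered ready under the rule (an assumed representation).
// Both states are always present in the result, so a state with no nodes
// is reported as 0 rather than omitted.
func CountNodesByState(readyByNode map[string]bool) map[string]float64 {
	counts := map[string]float64{StateReady: 0, StateNotReady: 0}
	for _, isReady := range readyByNode {
		if isReady {
			counts[StateReady]++
		} else {
			counts[StateNotReady]++
		}
	}
	return counts
}

// ElapsedSeconds converts a start/end pair into seconds, the unit implied
// by the _seconds suffix on the two histogram metrics.
func ElapsedSeconds(start, end time.Time) float64 {
	return end.Sub(start).Seconds()
}
```

In a real controller, each entry returned by `CountNodesByState` would presumably be written to the gauge under the matching `state` label, and the result of `ElapsedSeconds` would be observed on the relevant histogram. How the PR actually wires this up is not shown on this page.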

Related Issue

Relates to #182

Type of Change

/kind feature

Testing

  • go build ./... passes
  • go test ./internal/... passes (45 controller specs + webhook specs)
  • Verified that all three metrics are registered and wired at the correct call sites in node_controller.go and nodereadinessrule_controller.go

Does this PR introduce a user-facing change?

Add three new Prometheus metrics: node_readiness_nodes_by_state,
node_readiness_reconciliation_latency_seconds, and
node_readiness_bootstrap_duration_seconds to support SLO dashboards
for the node-readiness-controller.

…Duration metrics

Add three Prometheus metrics missing from the controller, required
for the SLO dashboard proposed in kubernetes-sigs#182:

- node_readiness_nodes_by_state{rule, state}: gauge tracking per-rule
  node counts by readiness state (ready/not_ready)
- node_readiness_reconciliation_latency_seconds{rule, operation}: histogram
  tracking end-to-end latency of taint add/remove operations per rule
- node_readiness_bootstrap_duration_seconds{rule}: histogram tracking
  time elapsed from bootstrap start to completion per rule

Signed-off-by: Shreya Bhakat <bhakatmistu2005@gmail.com>
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Shreya2005-2005
Once this PR has been reviewed and has the lgtm label, please assign sergeykanzhelev for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify Bot commented May 10, 2026

👷 Deploy Preview for node-readiness-controller processing.

| Name | Link |
|------|------|
| 🔨 Latest commit | 03b3994 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/node-readiness-controller/deploys/6a00e31756864700081bd0e5 |

netlify Bot commented May 10, 2026

Deploy Preview for node-readiness-controller canceled.

| Name | Link |
|------|------|
| 🔨 Latest commit | 03b3994 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/node-readiness-controller/deploys/6a00e31756864700081bd0e5 |

k8s-ci-robot added the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test.) May 10, 2026
@k8s-ci-robot

Hi @Shreya2005-2005. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) labels May 10, 2026
@Shreya2005-2005

@ajaysundark, could you review this whenever you get a chance?

Labels

  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
  • size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)
