feat(metrics): add NodesByState, ReconciliationLatency, and BootstrapDuration metrics (#238)
…Duration metrics

Add three Prometheus metrics missing from the controller, required for the SLO dashboard proposed in kubernetes-sigs#182:

- node_readiness_nodes_by_state{rule, state}: gauge tracking per-rule node counts by readiness state (ready/not_ready)
- node_readiness_reconciliation_latency_seconds{rule, operation}: histogram tracking end-to-end latency of taint add/remove operations per rule
- node_readiness_bootstrap_duration_seconds{rule}: histogram tracking time elapsed from bootstrap start to completion per rule

Signed-off-by: Shreya Bhakat <bhakatmistu2005@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Shreya2005-2005

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
Hi @Shreya2005-2005. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@ajaysundark, could you review this whenever you get a chance?
Description
Add three Prometheus metrics missing from the controller, required for the SLO dashboard proposed in #182:
- node_readiness_nodes_by_state{rule, state}: gauge tracking per-rule node counts by readiness state (ready/not_ready)
- node_readiness_reconciliation_latency_seconds{rule, operation}: histogram tracking end-to-end latency of taint add/remove operations per rule
- node_readiness_bootstrap_duration_seconds{rule}: histogram tracking time elapsed from bootstrap start to completion per rule

Related Issue
Relates to #182
Type of Change
/kind feature
Testing
- go build ./... passes
- go test ./internal/... passes (45 controller specs + webhook specs)
- node_controller.go and nodereadinessrule_controller.go

Does this PR introduce a user-facing change?