fix: pre-register auto top-up Prometheus counters #812
Closed
The metrics crate lazily registers counters — they only appear in the Prometheus /metrics endpoint after the first .increment() call. Two of the four auto top-up counters (credit_failures and errors) have never been triggered in production, so they don't exist in Prometheus at all. This breaks Grafana alerts that reference these metrics (they evaluate to "no data" instead of 0) and causes NaN in dashboard panels that divide by a sum including missing metrics. Fix by obtaining Counter handles at the top of process_auto_topups(), which registers them with the Prometheus exporter as 0 on the first poll cycle, well before any error path needs to fire.
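The lazy-registration behavior described above can be illustrated with a toy model. This is only a sketch: `ToyRegistry` and its methods are hypothetical stand-ins written with the standard library, not the `metrics` crate's API or the code in this PR. It shows why a series that has never been touched is absent from the rendered output, and why obtaining a handle without incrementing is enough to make it appear as 0.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a lazily populated metrics registry.
/// Hypothetical type for illustration; not the `metrics` crate.
#[derive(Default)]
struct ToyRegistry {
    counters: BTreeMap<String, u64>,
}

impl ToyRegistry {
    /// Obtaining a handle creates the series at 0 if it does not exist yet.
    fn counter(&mut self, name: &str) -> &mut u64 {
        self.counters.entry(name.to_string()).or_insert(0)
    }

    /// Render one "name value" line per series the registry knows about.
    fn render(&self) -> String {
        self.counters
            .iter()
            .map(|(name, value)| format!("{} {}\n", name, value))
            .collect()
    }
}

fn main() {
    let mut registry = ToyRegistry::default();

    // Only the success path has ever run, so only that series exists.
    *registry.counter("dwctl_auto_topup_success_total") += 1;
    assert!(!registry
        .render()
        .contains("dwctl_auto_topup_credit_failures_total"));

    // Pre-registration: obtain the handle without incrementing it.
    let _ = registry.counter("dwctl_auto_topup_credit_failures_total");
    assert!(registry
        .render()
        .contains("dwctl_auto_topup_credit_failures_total 0"));

    print!("{}", registry.render());
}
```

In this model the pre-registration step is a single handle lookup per counter, which matches the shape of the fix described here: touch each counter once at the start of the poll loop and never increment it outside its real error path.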
Deploying control-layer with Cloudflare Pages
| Latest commit: | ebb0a98 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://75300c5e.control-layer.pages.dev |
| Branch Preview URL: | https://fix-pre-register-auto-topup.control-layer.pages.dev |
Pull request overview
This PR ensures auto top-up Prometheus counters are visible immediately (as 0) by pre-registering the dwctl_auto_topup_* counter series when process_auto_topups() starts, preventing “no data” alert/panel behavior in Grafana/Prometheus when certain error paths haven’t occurred yet.
Changes:
- Pre-register the dwctl_auto_topup_success_total, dwctl_auto_topup_charge_failures_total, dwctl_auto_topup_credit_failures_total, and dwctl_auto_topup_errors_total{stage=...} counter series at the start of process_auto_topups().
- Ensure the two previously never-triggered error counters are exported from the first poll cycle onward.
Summary
- Pre-register the dwctl_auto_topup_* counters at the top of process_auto_topups() so they appear in Prometheus as 0 from the first poll cycle
- This covers the two counters (credit_failures_total, errors_total) that were invisible to Prometheus because their error paths have never been triggered in production

Problem
The metrics crate lazily registers counters: they only exist in the /metrics endpoint after the first .increment() call. Since dwctl_auto_topup_credit_failures_total (P1: user charged but credits not recorded) and dwctl_auto_topup_errors_total (idempotency/payment method lookup failures) have never fired in production, they don't exist in Prometheus at all.

This causes two issues:
- Grafana alerts that reference these metrics evaluate to "no data" instead of 0
- Dashboard panels that divide by a sum including the missing metrics show NaN

Fix
Obtain Counter handles (without incrementing) at the top of process_auto_topups(). This registers them with the Prometheus exporter as 0 on the first poll cycle. The counters then exist continuously and will correctly increment when error paths are hit.

Test plan
- cargo check passes cleanly (no warnings)
- dwctl_auto_topup_* metrics appear in Prometheus with value 0
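
The second test-plan item could be checked mechanically against the text served by the /metrics endpoint. The following is a minimal sketch, not part of this PR: `sample_value` is a hypothetical helper, the sample body is made up, and the `stage` label value `example` is a placeholder because the real label values are not listed here. It also assumes sample lines carry no trailing timestamp.

```rust
/// The four counter names listed in this PR.
const METRICS: [&str; 4] = [
    "dwctl_auto_topup_success_total",
    "dwctl_auto_topup_charge_failures_total",
    "dwctl_auto_topup_credit_failures_total",
    "dwctl_auto_topup_errors_total",
];

/// Return the value of the first sample whose series name (labels stripped)
/// equals `metric`. Hypothetical helper; assumes "name{labels} value" lines
/// with no trailing timestamp.
fn sample_value(exposition: &str, metric: &str) -> Option<f64> {
    exposition
        .lines()
        // Skip "# HELP" / "# TYPE" comment lines and blank lines.
        .filter(|line| !line.starts_with('#') && !line.trim().is_empty())
        .find_map(|line| {
            let (series, value) = line.rsplit_once(' ')?;
            let name = series.split('{').next()?;
            if name == metric {
                value.trim().parse::<f64>().ok()
            } else {
                None
            }
        })
}

fn main() {
    // Made-up /metrics body as it might look after the first poll cycle.
    let body = "\
# TYPE dwctl_auto_topup_success_total counter
dwctl_auto_topup_success_total 0
dwctl_auto_topup_charge_failures_total 0
dwctl_auto_topup_credit_failures_total 0
dwctl_auto_topup_errors_total{stage=\"example\"} 0
";

    for metric in METRICS.iter() {
        assert_eq!(sample_value(body, metric), Some(0.0), "{} missing", metric);
    }
    println!("all four counters are present with value 0");
}
```

A check like this fails when a series is absent (the helper returns `None`), which is exactly the condition that made the Grafana alerts evaluate to "no data".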