feat(metrics): add Prometheus instrumentation and expose /metrics end… #124

dharapandya85 wants to merge 4 commits into AnkanMisra:main from
Conversation
No actionable comments were generated in the recent review. 🎉

📝 Walkthrough

Adds Prometheus-based metrics to Gateway and Verifier: metric declarations, a Gin middleware and /metrics endpoint in the gateway, cache hit/miss and verification instrumentation in

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Gateway as Gateway (Go)
    participant Verifier as Verifier (Rust)
    participant Prom as Prometheus
    Client->>Gateway: HTTP request
    activate Gateway
    Gateway->>Gateway: MetricsMiddleware: Inc activeRequests, start timer
    alt Cache HIT
        Gateway->>Gateway: cacheHits(path)++
        Gateway->>Verifier: forward verification request
        activate Verifier
        Verifier->>Verifier: verify_signature (measure duration)
        Verifier->>Prom: record_verification(duration, valid/error)
        Verifier-->>Gateway: verification result
        deactivate Verifier
        Gateway->>Prom: verificationTotal(success|error)++
    else Cache MISS
        Gateway->>Prom: cacheMisses(path)++
    end
    Gateway->>Gateway: stop timer, compute status
    Gateway->>Prom: requestsTotal(method,path,status)++, requestsDuration.observe(...)
    Gateway->>Gateway: Dec activeRequests
    Gateway-->>Client: HTTP response
    deactivate Gateway
    Note over Prom: Prometheus scrapes /metrics from Gateway and Verifier
```
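The gateway leg of the diagram — bump an in-flight gauge, time the request, record totals on completion — can be sketched with the standard library alone. This is a minimal stand-in, not the PR's Gin code: the names `activeRequests`, `requestCount`, and `serveOnce` are hypothetical, with atomic counters in place of Prometheus collectors.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// Hypothetical stand-ins for the Prometheus gauge and counter.
var (
	activeRequests int64
	requestCount   int64
)

// metricsMiddleware mirrors the shape of a metrics middleware:
// increment the in-flight gauge, time the handler, record on completion.
func metricsMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&activeRequests, 1)
		defer atomic.AddInt64(&activeRequests, -1)
		start := time.Now()
		next.ServeHTTP(w, r)
		atomic.AddInt64(&requestCount, 1)
		_ = time.Since(start) // a real middleware would observe this in a histogram
	})
}

// serveOnce sends one request through the instrumented handler and
// returns (requestsTotal, activeRequests) afterwards.
func serveOnce() (int64, int64) {
	atomic.StoreInt64(&requestCount, 0)
	atomic.StoreInt64(&activeRequests, 0)
	h := metricsMiddleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/verify", nil))
	return atomic.LoadInt64(&requestCount), atomic.LoadInt64(&activeRequests)
}

func main() {
	total, active := serveOnce()
	fmt.Println(total, active) // one request recorded, gauge back to zero
}
```

The deferred decrement guarantees the gauge returns to zero even if the wrapped handler panics, which is the same property the gauge/defer pairing in the real middleware relies on.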
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (2 warnings)
```go
r := gin.Default()

r.Use(CorrelationIDMiddleware())

if os.Getenv("METRICS_ENABLED") != "false" {
    r.Use(MetricsMiddleware())

    path := os.Getenv("METRICS_PATH")
    if path == "" {
        path = "/metrics"
    }
    r.GET(path, gin.WrapH(promhttp.Handler()))
}
```
Duplicate CorrelationIDMiddleware registration
CorrelationIDMiddleware() is registered twice — once at line 208 (newly added) and again at line 223 (the original). Every request will run through the middleware twice, which means a new UUID is potentially generated and then overwritten on the second pass. The duplicate at line 208 should be removed, as line 223 is the intended registration point (as indicated by the existing comment "VIBE FIX: Register the Correlation ID Middleware immediately").
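The double-registration effect is easy to reproduce with plain `net/http` as a stand-in for Gin. Everything here is hypothetical illustration — `correlationID`, the counter, and the header value are not the project's code — but it shows the mechanism: wrapping a handler twice with the same ID-generating middleware runs the generator twice per request, and the second pass overwrites the first.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// correlationID is a stand-in for CorrelationIDMiddleware: it always
// generates a fresh ID and overwrites any existing header.
func correlationID(counter *int, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		*counter++
		r.Header.Set("X-Correlation-ID", fmt.Sprintf("id-%d", *counter))
		next.ServeHTTP(w, r)
	})
}

// doubleRegistered wires the middleware twice, as in the reviewed code,
// and reports how many IDs were generated and which one the handler saw.
func doubleRegistered() (int, string) {
	generated := 0
	var seen string
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen = r.Header.Get("X-Correlation-ID")
	})
	h := correlationID(&generated, correlationID(&generated, final))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return generated, seen
}

func main() {
	generated, seen := doubleRegistered()
	fmt.Println(generated, seen) // two IDs generated, only the second survives
}
```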
Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/main.go
Line: 206:220
```suggestion
r := gin.Default()
if os.Getenv("METRICS_ENABLED") != "false" {
r.Use(MetricsMiddleware())
path := os.Getenv("METRICS_PATH")
if path == "" {
path = "/metrics"
}
r.GET(path, gin.WrapH(promhttp.Handler()))
}
```
How can I resolve this? If you propose a fix, please make it concise.

```go
routePath := c.FullPath()
if routePath == "" {
    routePath = "unknown"
}
cacheHits.WithLabelValues(routePath).Inc()
```
Cache miss incorrectly increments cacheHits
On a cache MISS, this code increments cacheHits instead of cacheMisses. The metric cacheMisses is declared in metrics.go but is never used anywhere in the codebase.
Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/cache.go
Line: 181:185
```suggestion
routePath := c.FullPath()
if routePath == "" {
routePath = "unknown"
}
cacheMisses.WithLabelValues(routePath).Inc()
```
How can I resolve this? If you propose a fix, please make it concise.

```rust
use metrics::{counter, histogram};

pub fn record_verification(valid: bool, duration: f64, reason: Option<&str>) {
    counter!("verifier_requests_total", 1);
    histogram!("verifier_request_duration_seconds", duration);

    if valid {
        counter!("verifier_signature_valid_total", 1);
    } else {
        let label = reason.unwrap_or("unknown").to_string();
        counter!(
            "verifier_signature_invalid_total",
            1,
            "reason" => label
        );
    }
}
```
Incorrect metrics 0.21 macro API — will not compile
The metrics crate v0.21 changed its macro API. The old syntax counter!("name", 1) and histogram!("name", value) from v0.17–0.20 is no longer valid. In v0.21, the macros return handle objects, and you call .increment() or .record() on them.
This code will fail to compile. The correct v0.21 API is:
Prompt To Fix With AI
This is a comment left during a code review.
Path: verifier/src/metrics.rs
Line: 1:17
```suggestion
use metrics::{counter, histogram};
pub fn record_verification(valid: bool, duration: f64, reason: Option<&str>){
counter!("verifier_requests_total").increment(1);
histogram!("verifier_request_duration_seconds").record(duration);
if valid {
counter!("verifier_signature_valid_total").increment(1);
} else {
let label = reason.unwrap_or("unknown").to_string();
counter!(
"verifier_signature_invalid_total",
"reason" => label
).increment(1);
}
}
```
How can I resolve this? If you propose a fix, please make it concise.

```go
// r := gin.New()

// r.use(MetricsMiddleware())
```
Leftover commented-out code
These commented-out lines appear to be debug/development remnants and should be removed before merging.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/main.go
Line: 1191:1194
How can I resolve this? If you propose a fix, please make it concise.

```go
rateLimitHits = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "gateway_rate_limit_hits_total",
        Help: "Rate limit rejections",
    },
    []string{"path"},
)
verificationTotal = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "gateway_verification_total",
        Help: "Signature verification results",
    },
    []string{"result"},
)
```
Declared metrics are never used
rateLimitHits and verificationTotal are declared but never referenced anywhere in the changed or existing code. If these are intended for future use, they should be added when the corresponding instrumentation is implemented. Otherwise, they add unused metric registrations to the Prometheus output.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/metrics.go
Line: 38:51
How can I resolve this? If you propose a fix, please make it concise.
Additional Comments (1)
Prompt To Fix With AI
This is a comment left during a code review.
Path: verifier/src/main.rs
Line: 55:79
Comment:
**Dead code: Prometheus setup is unreachable**
`axum::serve(listener, app).await.unwrap()` on line 65 blocks indefinitely while serving requests. All code after it (lines 67-78) — including the Prometheus recorder installation, the `/metrics` route definition, and the new `Router` — will **never execute**. This means:
1. The metrics recorder is never installed, so calls to `metrics::record_verification()` will silently no-op.
2. The `/metrics` endpoint is never served.
3. A second `Router` is constructed (shadowing `app`) but never bound to a listener.
The Prometheus setup and the `/metrics` route need to be moved **before** `axum::serve`, and the routes should be added to the original `app` router that is actually served.
```suggestion
let recorder = PrometheusBuilder::new()
.install_recorder()
.expect("failed to install recorder");
let metrics_route = {
let handle = recorder.clone();
move || async move { handle.render() }
};
let app = Router::new()
.route("/health", get(health))
.route("/verify", post(verify_signature))
.route("/metrics", get(metrics_route))
.layer(DefaultBodyLimit::max(limit))
.with_state(state);
let addr = SocketAddr::from(([0, 0, 0, 0], 3002));
println!("Rust Verifier listening on {}", addr);
let listener = tokio::net::TcpListener::bind(addr).await.unwrap();
axum::serve(listener, app).await.unwrap();
}
```
How can I resolve this? If you propose a fix, please make it concise.
Actionable comments posted: 8
🧹 Nitpick comments (2)
gateway/main.go (1)

1190-1194: Remove dead commented-out code. Lines 1190-1194 contain leftover commented-out scaffolding (`r := gin.New()`, `r.use(MetricsMiddleware())`). These serve no purpose and should be cleaned up.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/main.go` around lines 1190 - 1194, remove the dead commented-out scaffolding lines containing "r := gin.New()" and "r.use(MetricsMiddleware())" so the file no longer contains unused commented code; locate the commented block referencing gin.New() and MetricsMiddleware() and delete those comment lines to keep gateway/main.go clean.

verifier/src/main.rs (1)

327-352: Unnecessary `res_headers.clone()` in both match arms — headers are discarded. The match on line 328 destructures into `(status, headers, Json(response))`, but the `headers` binding from each arm is never used — line 351 returns the original `res_headers`. The `.clone()` calls on lines 331 and 340 exist only to avoid a move, but since the cloned values are thrown away, you can simplify by not capturing headers from the match at all.

Also, multiple `cargo fmt` violations are flagged by CI on these lines. Please run `cargo fmt`. As per coding guidelines, format with `cargo fmt -- --check` and lint with `cargo clippy -- -D warnings`.

♻️ Simplify the match to avoid the unnecessary clone:

```diff
     let start = std::time::Instant::now();
-    let (status,headers,Json(response)) = match sig.recover_typed_data(&typed_data) {
+    let (status, response) = match sig.recover_typed_data(&typed_data) {
         Ok(addr) => (
             StatusCode::OK,
-            res_headers.clone(),
-            Json(VerifyResponse {
+            VerifyResponse {
                 is_valid: true,
                 recovered_address: Some(format!("{:?}", addr)),
                 error: None,
-            }),
+            },
         ),
         Err(e) => (
             StatusCode::OK,
-            res_headers.clone(),
-            Json(VerifyResponse {
+            VerifyResponse {
                 is_valid: false,
                 recovered_address: None,
                 error: Some(e.to_string()),
-            }),
+            },
         ),
     };
-
     let duration = start.elapsed().as_secs_f64();
     metrics::record_verification(response.is_valid, duration, response.error.as_deref());
-    (status, res_headers, Json(response),)
-
+    (status, res_headers, Json(response))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@verifier/src/main.rs` around lines 327 - 352, the match that calls sig.recover_typed_data(&typed_data) currently returns a three-tuple with cloned res_headers which is never used; update the match to only produce (status, Json(response)) and return the original res_headers at the end to avoid the needless res_headers.clone() calls and moves; adjust the binding in the match so headers are not captured, then run cargo fmt and cargo clippy -- -D warnings to fix formatting and lint violations.
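The header-passing refactor described in that comment can be demonstrated on simplified stand-in types (hypothetical — `VerifyOut` and `respond` are not the real axum/ethers types from the PR): the match yields only `(status, response)`, and the headers value is moved into the return tuple once at the end, so neither arm needs a clone.

```rust
// Hypothetical stand-in for the handler's JSON response body.
#[derive(Debug)]
pub struct VerifyOut {
    pub is_valid: bool,
    pub error: Option<String>,
}

// `headers` is never touched by the match arms; it is moved straight
// into the return tuple, so no clone is required anywhere.
pub fn respond(result: Result<&str, &str>, headers: String) -> (u16, String, VerifyOut) {
    let (status, response) = match result {
        Ok(_addr) => (200, VerifyOut { is_valid: true, error: None }),
        Err(e) => (200, VerifyOut { is_valid: false, error: Some(e.to_string()) }),
    };
    (status, headers, response)
}

fn main() {
    let (status, headers, out) = respond(Ok("0xabc"), String::from("x-correlation-id: 1"));
    assert_eq!((status, out.is_valid), (200, true));
    assert_eq!(headers, "x-correlation-id: 1");

    let (_, _, err) = respond(Err("bad signature"), String::new());
    assert_eq!(err.error.as_deref(), Some("bad signature"));
    println!("ok");
}
```

Because `headers` flows through by move rather than by clone-per-arm, the borrow checker is satisfied without any copies, which is exactly what the suggested diff achieves.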
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gateway/cache.go`:
- Around line 181-185: In the cache MISS branch where routePath is determined
via c.FullPath() and you currently call
cacheHits.WithLabelValues(routePath).Inc(), replace that call with
cacheMisses.WithLabelValues(routePath).Inc() so misses are recorded correctly;
ensure the variable routePath and the label usage remain unchanged and only swap
the metric being incremented.
In `@gateway/go.mod`:
- Line 14: The go.mod addition of github.com/prometheus/client_golang v1.23.2
introduced missing go.sum entries causing CI failures; run `go mod tidy` to
regenerate and update go.sum so the Prometheus dependency checksums are
recorded, then run `go vet ./...` to lint the codebase as per guidelines before
committing the updated go.sum and go.mod changes.
In `@gateway/main.go`:
- Around line 208-223: Remove the duplicate CorrelationIDMiddleware
registration: there are two calls to r.Use(CorrelationIDMiddleware()) (one
before the METRICS_ENABLED block and one after); delete the earlier call (the
one before MetricsMiddleware/METRICS_PATH handling) so CorrelationIDMiddleware
is only registered once via r.Use(CorrelationIDMiddleware()), leaving
MetricsMiddleware and the metrics route intact.
In `@gateway/metrics.go`:
- Around line 16-23: The histogram metric declared as requestsDuration uses Name
"gateway_requests_duration_seconds" which deviates from the spec; rename the
metric to "gateway_request_duration_seconds" to match Issue `#79`. Locate the
promauto.NewHistogramVec call that creates requestsDuration and update the
HistogramOpts.Name field to the singular form, then run any unit/integration
tests or linting that validate metric names so dashboards/alerts align with the
spec.
- Around line 38-51: The metrics rateLimitHits and verificationTotal are
declared but never used; add increments where rate limiting and signature
verification happen: in your rate-limit middleware or cache rejection path call
rateLimitHits.WithLabelValues(path).Inc() using the request path label when
rejecting a request, and in the signature verification code paths (e.g.,
VerifySignature or the handler that validates signatures) call
verificationTotal.WithLabelValues("success").Inc() on valid signatures and
verificationTotal.WithLabelValues("failure").Inc() on failures; ensure you pass
the correct label values ("path" and "result") and keep the increments adjacent
to the existing rejection/verification logic so metrics reflect actual events.
In `@verifier/Cargo.toml`:
- Around line 13-14: The project pins metrics = "0.21" and
metrics-exporter-prometheus = "0.12" but the code uses the legacy macro API
(counter!("name", value) and histogram!("name", value)) which breaks on metrics
>=0.22; to fix, either (A) migrate all uses of the macros (e.g., counter! and
histogram! invocations) to the new handle-based API by replacing
counter!("name", x) with counter!("name").increment(x) and histogram!("name", d)
with histogram!("name").record(d) throughout the codebase (ensure you import any
new types or features required by the updated metrics crate), or (B) explicitly
document and keep the pinned versions in Cargo.toml to indicate intentional
compatibility with the legacy macro API.
In `@verifier/src/main.rs`:
- Around line 64-78: The Prometheus setup and metrics route are dead because
axum::serve(listener, app).await blocks; move the
PrometheusBuilder::new().install_recorder() call (and capture the returned
recorder) to occur before you construct the Router, then include the metrics
route in the same Router you pass to axum::serve; e.g., install the recorder,
create metrics_route (using recorder.clone()), build app =
Router::new().route("/health", get(health)).route("/verify",
post(verify_signature)).route("/metrics", get(metrics_route)), and only then
call tokio::net::TcpListener::bind(...).await and axum::serve(listener,
app).await.unwrap().
In `@verifier/src/metrics.rs`:
- Around line 1-16: Run rustfmt (cargo fmt) to fix formatting violations in the
verifier module; format the function record_verification and surrounding imports
so the file compiles with CI formatting checks (ensure use metrics::{counter,
histogram}; and the record_verification signature/body conform to rustfmt
style), then re-run cargo fmt -- --check and cargo clippy -- -D warnings to
verify; focus changes around the record_verification function and the
counter!/histogram! invocations.
---
Nitpick comments:
In `@gateway/main.go`:
- Around line 1190-1194: Remove the dead commented-out scaffolding lines
containing "r := gin.New()" and "r.use(MetricsMiddleware())" so the file no
longer contains unused commented code; locate the commented block referencing
gin.New() and MetricsMiddleware() and delete those comment lines to keep the
gateway/main.go clean.
In `@verifier/src/main.rs`:
- Around line 327-352: The match that calls sig.recover_typed_data(&typed_data)
currently returns a three-tuple with cloned res_headers which is never used;
update the match to only produce (status, Json(response)) (or set
status/response inside the match) and return the original res_headers at the end
to avoid the needless res_headers.clone() calls and moves; adjust the binding in
the match (referencing sig.recover_typed_data, VerifyResponse, and response) so
headers are not captured, then run cargo fmt and cargo clippy -- -D warnings to
fix formatting and lint violations.
```
github.com/gin-gonic/gin v1.11.0
github.com/google/uuid v1.6.0
github.com/joho/godotenv v1.5.1
github.com/prometheus/client_golang v1.23.2
```
Run go mod tidy to fix missing go.sum entries.
Pipeline failures show that go.sum is missing entries for the newly added Prometheus packages. Run go mod tidy to synchronize. As per coding guidelines, lint with go vet ./... after fixing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@gateway/go.mod` at line 14, The go.mod addition of
github.com/prometheus/client_golang v1.23.2 introduced missing go.sum entries
causing CI failures; run `go mod tidy` to regenerate and update go.sum so the
Prometheus dependency checksums are recorded, then run `go vet ./...` to lint
the codebase as per guidelines before committing the updated go.sum and go.mod
changes.
```go
rateLimitHits = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "gateway_rate_limit_hits_total",
        Help: "Rate limit rejections",
    },
    []string{"path"},
)
verificationTotal = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "gateway_verification_total",
        Help: "Signature verification results",
    },
    []string{"result"},
)
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check if rateLimitHits and verificationTotal are used anywhere in the gateway codebase
echo "=== rateLimitHits usage ==="
rg -n 'rateLimitHits' --type=go
echo ""
echo "=== verificationTotal usage ==="
rg -n 'verificationTotal' --type=go
```

Repository: AnkanMisra/MicroAI-Paygate
Length of output: 260
Instrument rateLimitHits and verificationTotal metrics throughout the gateway code.
These metrics are declared in gateway/metrics.go but are never incremented anywhere in the codebase. They will always report zero, which could mislead operators relying on these metrics for monitoring. Add instrumentation calls to increment these counters in the appropriate middleware, cache, and handler code paths where rate limit rejections and signature verifications occur.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@gateway/metrics.go` around lines 38 - 51, The metrics rateLimitHits and
verificationTotal are declared but never used; add increments where rate
limiting and signature verification happen: in your rate-limit middleware or
cache rejection path call rateLimitHits.WithLabelValues(path).Inc() using the
request path label when rejecting a request, and in the signature verification
code paths (e.g., VerifySignature or the handler that validates signatures) call
verificationTotal.WithLabelValues("success").Inc() on valid signatures and
verificationTotal.WithLabelValues("failure").Inc() on failures; ensure you pass
the correct label values ("path" and "result") and keep the increments adjacent
to the existing rejection/verification logic so metrics reflect actual events.
```rust
let listener = tokio::net::TcpListener::bind(addr).await.unwrap();
axum::serve(listener, app).await.unwrap();

let recorder = PrometheusBuilder::new()
    .install_recorder()
    .expect("failed to install recorder");

let metrics_route = {
    let handle = recorder.clone();
    move || async move { handle.render() }
};
let app = Router::new()
    .route("/health", get(health))
    .route("/verify", post(verify_signature))
    .route("/metrics", get(metrics_route));
```
Critical: All Prometheus setup code is unreachable (dead code after blocking .await).

axum::serve(listener, app).await.unwrap() on line 65 blocks indefinitely — it is the server event loop. Lines 67-78 will never execute. This means:

- The Prometheus recorder is never installed, so all counter!/histogram! calls in metrics.rs silently no-op.
- The /metrics route is never registered, so scraping will 404.

The recorder must be installed before building the Router, and the /metrics route must be part of the original app.
🐛 Proposed fix — move Prometheus setup before serve

```diff
 #[tokio::main]
 async fn main() {
     let limit = get_max_body_size();
     let state = AppState {
         max_body_size: limit,
     };
+
+    let recorder = PrometheusBuilder::new()
+        .install_recorder()
+        .expect("failed to install recorder");
+
+    let metrics_handle = recorder.clone();
+    let metrics_route = move || async move { metrics_handle.render() };
+
     let app = Router::new()
         .route("/health", get(health))
         .route("/verify", post(verify_signature))
+        .route("/metrics", get(metrics_route))
         .layer(DefaultBodyLimit::max(limit))
         .with_state(state);
     let addr = SocketAddr::from(([0, 0, 0, 0], 3002));
     println!("Rust Verifier listening on {}", addr);
     let listener = tokio::net::TcpListener::bind(addr).await.unwrap();
     axum::serve(listener, app).await.unwrap();
-
-    let recorder = PrometheusBuilder::new()
-        .install_recorder()
-        .expect("failed to install recorder");
-
-    let metrics_route = {
-        let handle = recorder.clone();
-        move || async move { handle.render() }
-    };
-    let app = Router::new()
-        .route("/health", get(health))
-        .route("/verify", post(verify_signature))
-        .route("/metrics", get(metrics_route));
 }
```

🧰 Tools
🪛 GitHub Actions: Rust Lint
[error] 70-70: cargo fmt --check detected formatting changes: difference in closure body (move || async move {handle.render()}). Please run 'cargo fmt' to format.
[error] 78-78: cargo fmt --check detected formatting changes near route definitions. Please run 'cargo fmt' to format.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@verifier/src/main.rs` around lines 64 - 78, The Prometheus setup and metrics
route are dead because axum::serve(listener, app).await blocks; move the
PrometheusBuilder::new().install_recorder() call (and capture the returned
recorder) to occur before you construct the Router, then include the metrics
route in the same Router you pass to axum::serve; e.g., install the recorder,
create metrics_route (using recorder.clone()), build app =
Router::new().route("/health", get(health)).route("/verify",
post(verify_signature)).route("/metrics", get(metrics_route)), and only then
call tokio::net::TcpListener::bind(...).await and axum::serve(listener,
app).await.unwrap().
@dharapandya85 fix the go issue and rust lint problems
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
verifier/src/main.rs (1)
327-351: ⚠️ Potential issue | 🟠 Major

Unused `headers` variable causes a pipeline failure at line 328. Both match arms clone `res_headers` into `headers`, but line 351 returns `res_headers` rather than `headers`. Clippy reports "unused variable: `headers`." Return `headers` directly instead to silence the warning and use the value that was destructured.

🐛 Proposed fix

```diff
-    (status, res_headers, Json(response))
+    (status, headers, Json(response))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@verifier/src/main.rs` around lines 327 - 351, The match destructures into a local variable named headers but the function still returns res_headers, leaving headers unused and triggering Clippy; update the final return to use the destructured headers variable instead of res_headers so the tuple (status, headers, Json(response)) that you built from sig.recover_typed_data(...) is returned and the unused-variable warning is resolved (no other logic changes needed; keep metrics::record_verification(...) and VerifyResponse construction as-is).
🧹 Nitpick comments (1)

gateway/cache.go (1)

116-119: Duplicate `routePath` extraction — minor refactor opportunity. The same 4-line pattern appears twice:

```go
routePath := c.FullPath()
if routePath == "" {
    routePath = "unknown"
}
```

♻️ Optional helper extraction:

```go
func routePathOrUnknown(c *gin.Context) string {
    if p := c.FullPath(); p != "" {
        return p
    }
    return "unknown"
}
```

Then replace both occurrences with a single call: `cacheHits.WithLabelValues(routePathOrUnknown(c)).Inc()` and `cacheMisses.WithLabelValues(routePathOrUnknown(c)).Inc()`.

Also applies to: 183-186
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/cache.go` around lines 116 - 119, Extract the repeated routePath extraction into a small helper (e.g., routePathFromContext or getRoutePath) that accepts the gin Context (or the same type as c) and returns c.FullPath() or "unknown" when empty, then replace both occurrences where routePath is computed with a single call to that helper; update uses in the surrounding functions/methods (the places containing the original routePath := c.FullPath() check) to call the new helper to eliminate duplication.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@verifier/src/main.rs`:
- Line 12: The metrics_route block currently contains only a let binding so it
evaluates to () and the Prometheus handle is dropped; change the block so the
closure submitted to get(metrics_route) captures and returns the Prometheus
rendering closure (or simply return the closure as the block's trailing
expression) so that the PrometheusHandle produced by install_recorder() is kept
alive and used; ensure PrometheusHandle (from install_recorder()) is referenced
(e.g., stored in a variable captured by the closure) so the
unused-import/unused-variable errors go away and get(metrics_route) registers
the real Prometheus renderer rather than unit.
---
Outside diff comments:
In `@verifier/src/main.rs`:
- Around line 327-351: The match destructures into a local variable named
headers but the function still returns res_headers, leaving headers unused and
triggering Clippy; update the final return to use the destructured headers
variable instead of res_headers so the tuple (status, headers, Json(response))
that you built from sig.recover_typed_data(...) is returned and the
unused-variable warning is resolved (no other logic changes needed; keep
metrics::record_verification(...) and VerifyResponse construction as-is).
---
Duplicate comments:
In `@gateway/metrics.go`:
- Around line 38-43: The rateLimitHits Prometheus counter (rateLimitHits) is
never incremented and will always report zero; locate the rate-limiting
rejection path (e.g., the rate limiter middleware or the handler that returns
429, likely in gateway/middleware.go or the code that enforces the limiter) and
add a call to increment the metric there using the request path label (e.g.,
rateLimitHits.WithLabelValues(requestPath).Inc()); ensure you use the actual
request path (r.URL.Path or equivalent) when calling WithLabelValues so the
metric is correctly labeled and incremented on each rate-limit rejection.
In `@verifier/src/main.rs`:
- Around line 74-79: The second Router::new() block is dead/unreachable because
axum::serve(listener, app).await.unwrap() never returns; remove the duplicate
Router creation (the let app = Router::new() ... that appears after axum::serve)
and instead ensure the Router (created via Router::new().route("/health",
get(health)).route("/verify", post(verify_signature)).route("/metrics",
get(metrics_route))) is constructed once and passed into axum::serve(listener,
app).await.unwrap(); update references so the app variable used in axum::serve
is the single Router instance.
---
Nitpick comments:
In `@gateway/cache.go`:
- Around line 116-119: Extract the repeated routePath extraction into a small
helper (e.g., routePathFromContext or getRoutePath) that accepts the gin Context
(or the same type as c) and returns c.FullPath() or "unknown" when empty, then
replace both occurrences where routePath is computed with a single call to that
helper; update uses in the surrounding functions/methods (the places containing
the original routePath := c.FullPath() check) to call the new helper to
eliminate duplication.
```rust
use ethers::types::transaction::eip712::TypedData;
use ethers::types::Signature;
mod metrics;
use metrics_exporter_prometheus::{PrometheusBuilder, PrometheusHandle};
```
metrics_route evaluates to `()` — `/metrics` endpoint is broken; pipeline failures at lines 12, 60, 61.

The wrapping block at lines 60–62 assigns `()` to `metrics_route` rather than the closure, because the block contains only a `let` statement (no trailing expression). The closure is assigned to the local `handle` and immediately dropped. `get(metrics_route)` at line 66 therefore registers a unit handler, not the Prometheus renderer. Three pipeline errors originate from this:

- Line 12: `PrometheusHandle` is imported but never explicitly named (the `install_recorder()` return type is inferred).
- Line 60: "let unit value — the let binding evaluates to `()`."
- Line 61: "unused variable: `handle`."
🐛 Proposed fix

```diff
-use metrics_exporter_prometheus::{PrometheusBuilder, PrometheusHandle};
+use metrics_exporter_prometheus::PrometheusBuilder;
```

```diff
-    let metrics_handle = recorder.clone();
-    let metrics_route = {
-        let handle = move || async move { metrics_handle.render() };
-    };
+    let metrics_route = move || async move { recorder.render() };
```

Also applies to: 59-66
🧰 Tools
🪛 GitHub Actions: Rust Lint
[error] 12-12: unused import: PrometheusHandle. The import is unused and will cause a clippy error due to -D warnings.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@verifier/src/main.rs` at line 12, The metrics_route block currently contains
only a let binding so it evaluates to () and the Prometheus handle is dropped;
change the block so the closure submitted to get(metrics_route) captures and
returns the Prometheus rendering closure (or simply return the closure as the
block's trailing expression) so that the PrometheusHandle produced by
install_recorder() is kept alive and used; ensure PrometheusHandle (from
install_recorder()) is referenced (e.g., stored in a variable captured by the
closure) so the unused-import/unused-variable errors go away and
get(metrics_route) registers the real Prometheus renderer rather than unit.
Actionable comments posted: 2
🧹 Nitpick comments (2)
verifier/src/main.rs (2)
208-263: Early-exit paths (JSON rejection, timestamp, parse errors) bypass `record_verification` — `verifier_requests_total` undercounts real traffic.

All four early-return branches (lines 210–236 for JSON/size rejection, lines 242–263 for timestamp errors, and the `typed_data`/`sig` parse branches around lines 291–319) return without calling `metrics::record_verification`. The `verifier_requests_total` counter therefore only reflects requests that reach the crypto path, not actual service load. This diverges from the issue requirement that `verifier_requests_total` carries a `status` label covering all outcomes.

Consider extracting the common recording logic or adding lightweight calls at each early exit so the counter accurately represents all inbound requests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@verifier/src/main.rs` around lines 208 - 263, Early-return branches (JSON rejection in the payload match, timestamp validation block, and the typed_data/sig parse failures) are returning before calling the metrics recorder, causing verifier_requests_total undercounting; fix by invoking the same metrics::record_verification (or a small helper that wraps it) with the appropriate status label just before each early return (e.g., inside the Err(JsonRejection::BytesRejection(_)) arm, the generic Err(e) arm, the timestamp error branch that builds msg, and the typed_data/sig parse failure branches) so every exit path increments verifier_requests_total with the correct status; extract into a small helper function if helpful to avoid duplication.
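The fix the prompt asks for is "record on every exit path". As a language-agnostic illustration (the verifier itself is Rust, where a small helper or RAII-style guard plays the same role; names here are hypothetical and a plain map stands in for `verifier_requests_total`), Go's `defer` makes the pattern hard to miss:

```go
package main

import (
	"errors"
	"fmt"
)

// Stub for a status-labeled request counter; stands in for the
// real verifier_requests_total metric (hypothetical shape).
var requestsTotal = map[string]int{}

// verify sketches the suggestion: set the status label before each
// early return, and record it exactly once in a deferred call, so
// no exit path can skip the counter.
func verify(payload string) error {
	status := "error"
	defer func() { requestsTotal[status]++ }()

	if payload == "" {
		status = "invalid_json"
		return errors.New("empty payload")
	}
	if payload == "stale" {
		status = "stale_timestamp"
		return errors.New("timestamp outside window")
	}
	status = "success"
	return nil
}

func main() {
	for _, p := range []string{"", "stale", "ok"} {
		_ = verify(p)
	}
	fmt.Println(requestsTotal["invalid_json"], requestsTotal["stale_timestamp"], requestsTotal["success"]) // prints "1 1 1"
}
```

Every return path, early or not, ends up counted under its own status label.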
321-345: `res_headers.clone()` in both match arms is immediately discarded — simplify the destructuring.

The two `res_headers.clone()` calls (lines 325 and 334) exist only to satisfy the tuple type in the match arms, but the second element is thrown away by `let (status, _, Json(response))`. The clones are wasted allocations on every verified request.

♻️ Proposed simplification
```diff
     let start = std::time::Instant::now();
-    let (status, _, Json(response)) = match sig.recover_typed_data(&typed_data) {
+    let (status, Json(response)) = match sig.recover_typed_data(&typed_data) {
         Ok(addr) => (
             StatusCode::OK,
-            res_headers.clone(),
             Json(VerifyResponse {
                 is_valid: true,
                 recovered_address: Some(format!("{:?}", addr)),
                 error: None,
             }),
         ),
         Err(e) => (
             StatusCode::OK,
-            res_headers.clone(),
             Json(VerifyResponse {
                 is_valid: false,
                 recovered_address: None,
                 error: Some(e.to_string()),
             }),
         ),
     };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@verifier/src/main.rs` around lines 321 - 345, The match on sig.recover_typed_data currently returns tuples that include res_headers.clone() twice even though the second element is discarded by let (status, _, Json(response)); change the match arms to return only (StatusCode, Json(VerifyResponse)) (i.e. remove res_headers.clone() from both arms), then after metrics::record_verification(...) perform a single res_headers.clone() when constructing the final return tuple (status, res_headers.clone(), Json(response)); update types accordingly so the match yields (status, Json(response)) and the final line returns the full (status, res_headers.clone(), Json(response)) to eliminate the wasted duplicate clones.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@verifier/src/main.rs`:
- Around line 72-73: Remove the trailing blank line inside the main() function
after the axum::serve(listener, app).await.unwrap(); statement so the closing
brace immediately follows that line; update the function containing
axum::serve(...) (main) to have no extra blank line before its final `}` to
satisfy rustfmt (`cargo fmt -- --check`).
- Around line 55-57: The Prometheus recorder installed with
PrometheusBuilder::install_recorder() needs a background upkeep loop to call
PrometheusHandle::run_upkeep() periodically to avoid unbounded histogram growth;
change how you use install_recorder() so you keep the returned PrometheusHandle
(instead of dropping it) and spawn a background task (tokio::spawn or
std::thread::spawn depending on your runtime) that loops with a sleep interval
(e.g., 10s) and calls handle.run_upkeep(). Ensure the spawned task is detached
and the handle is moved into the task so it remains alive for the lifetime of
the process.
---
Duplicate comments:
In `@verifier/src/main.rs`:
- Around line 55-66: The code correctly installs the Prometheus recorder before
building the Router and defines metrics_route as a move || async move closure
that calls recorder.render(); no changes required—ensure
PrometheusBuilder::new().install_recorder() remains before Router construction,
that metrics_route (the closure referencing recorder.render()) is used in
Router::route("/metrics", get(metrics_route)), and that any unused
PrometheusHandle import has been removed.
---
Nitpick comments:
In `@verifier/src/main.rs`:
- Around line 208-263: Early-return branches (JSON rejection in the payload
match, timestamp validation block, and the typed_data/sig parse failures) are
returning before calling the metrics recorder, causing verifier_requests_total
undercounting; fix by invoking the same metrics::record_verification (or a small
helper that wraps it) with the appropriate status label just before each early
return (e.g., inside the Err(JsonRejection::BytesRejection(_)) arm, the generic
Err(e) arm, the timestamp error branch that builds msg, and the typed_data/sig
parse failure branches) so every exit path increments verifier_requests_total
with the correct status; extract into a small helper function if helpful to
avoid duplication.
- Around line 321-345: The match on sig.recover_typed_data currently returns
tuples that include res_headers.clone() twice even though the second element is
discarded by let (status, _, Json(response)); change the match arms to return
only (StatusCode, Json(VerifyResponse)) (i.e. remove res_headers.clone() from
both arms), then after metrics::record_verification(...) perform a single
res_headers.clone() when constructing the final return tuple (status,
res_headers.clone(), Json(response)); update types accordingly so the match
yields (status, Json(response)) and the final line returns the full (status,
res_headers.clone(), Json(response)) to eliminate the wasted duplicate clones.
```rust
let recorder = PrometheusBuilder::new()
    .install_recorder()
    .expect("failed to install recorder");
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's examine the code at lines 55-57 in verifier/src/main.rs
sed -n '50,70p' verifier/src/main.rs
```

Repository: AnkanMisra/MicroAI-Paygate
Length of output: 722
🏁 Script executed:
```shell
# Check what version of metrics-exporter-prometheus is used
cat verifier/Cargo.toml | grep -A 5 -B 5 "metrics-exporter-prometheus"
```

Repository: AnkanMisra/MicroAI-Paygate
Length of output: 384
🌐 Web query:
metrics-exporter-prometheus PrometheusBuilder install_recorder upkeep task
💡 Result:
In metrics-exporter-prometheus, PrometheusBuilder::install_recorder() only builds + installs the recorder (and returns a PrometheusHandle you can render into a Prometheus scrape payload yourself), but it does not spawn the background “upkeep task” for you—you must run upkeep periodically. [1][2]
What the upkeep task does: periodic maintenance like draining histogram data so values stay up-to-date and to avoid unbounded memory growth. [1][2]
Key implications:
- `builder.install()` / `builder.build()` ⇒ exporter is set up and an upkeep task is spawned automatically. [2]
- `builder.install_recorder()` / `builder.build_recorder()` ⇒ no upkeep task, so you must call `PrometheusHandle::run_upkeep()` on an interval. [2]
- You can configure the upkeep interval with `PrometheusBuilder::upkeep_timeout(Duration)`. [1]
Minimal pattern (pseudo-structure):
```rust
let handle = PrometheusBuilder::new()
    .install_recorder()
    .unwrap();

// your scheduler/timer:
loop {
    handle.run_upkeep(); // run periodically
    sleep(upkeep_interval).await;
}
```

Sources: [1] PrometheusBuilder docs (docs.rs), [2] crate docs (lib.rs "Upkeep and maintenance").
🏁 Script executed:
```shell
# Search for other uses of PrometheusBuilder to see if there's a pattern
rg "PrometheusBuilder" verifier/
```

Repository: AnkanMisra/MicroAI-Paygate
Length of output: 207
🏁 Script executed:
```shell
# Check if there are imports or other relevant code in main.rs
head -30 verifier/src/main.rs
```

Repository: AnkanMisra/MicroAI-Paygate
Length of output: 887
🌐 Web query:
metrics-exporter-prometheus PrometheusHandle Clone trait 0.18
💡 Result:
In metrics-exporter-prometheus v0.18.0, PrometheusHandle does implement Clone (impl Clone for PrometheusHandle). You can just do:
```rust
use metrics_exporter_prometheus::PrometheusBuilder;

let handle = PrometheusBuilder::new()
    .install_recorder()
    .expect("install recorder");
let handle2 = handle.clone();
```

Docs: `PrometheusHandle` shows Trait Implementations: `Clone`, `Debug`, and the `impl Clone for PrometheusHandle` block is present in the 0.18.0 rustdoc. [1]
If you’re seeing “the trait bound PrometheusHandle: Clone is not satisfied”, double-check you’re cloning the metrics_exporter_prometheus::PrometheusHandle type (not a different PrometheusHandle from another crate/module brought into scope by an import). [1]
Sources
`metrics-exporter-prometheus` 0.18.0 rustdoc for `PrometheusHandle` (shows `impl Clone`): https://docs.rs/metrics-exporter-prometheus/0.18.0/metrics_exporter_prometheus/struct.PrometheusHandle.html [1]
install_recorder() does not spawn the upkeep task — histogram buckets will grow unbounded.
When using PrometheusBuilder::install_recorder(), the background upkeep task that drains histogram data structures is not spawned automatically. Without periodic calls to PrometheusHandle::run_upkeep(), memory will grow indefinitely over time.
🔧 Proposed fix — spawn a background upkeep task
```diff
 let recorder = PrometheusBuilder::new()
     .install_recorder()
     .expect("failed to install recorder");
+// Upkeep is not automatic when using install_recorder(); drive it manually.
+let upkeep_handle = recorder.clone();
+tokio::spawn(async move {
+    loop {
+        tokio::time::sleep(std::time::Duration::from_secs(5)).await;
+        upkeep_handle.run_upkeep();
+    }
+});
+
 let metrics_route = move || async move { recorder.render() };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@verifier/src/main.rs` around lines 55 - 57, The Prometheus recorder installed
with PrometheusBuilder::install_recorder() needs a background upkeep loop to
call PrometheusHandle::run_upkeep() periodically to avoid unbounded histogram
growth; change how you use install_recorder() so you keep the returned
PrometheusHandle (instead of dropping it) and spawn a background task
(tokio::spawn or std::thread::spawn depending on your runtime) that loops with a
sleep interval (e.g., 10s) and calls handle.run_upkeep(). Ensure the spawned
task is detached and the handle is moved into the task so it remains alive for
the lifetime of the process.
@greptile review the pr and explain about the low confidence score along with snippets, be brief
Looking at this PR, the low confidence score (1/5) is justified by several critical bugs:

1. Verifier Metrics Are Dead Code

In `verifier/src/main.rs`:

```rust
axum::serve(listener, app)
    .await
    .context("Failed to start Axum server")?;

// ⚠️ This code NEVER runs - unreachable dead code
let builder = PrometheusBuilder::new();
let handle = builder.install_recorder().expect("failed to install recorder");
let metrics_app = Router::new()
    .route("/metrics", get(|| async move { handle.render() }));
```

Impact: The `/metrics` endpoint is never served and the recorder is never installed, so `metrics::record_verification()` calls silently no-op.

2. Wrong Metrics API Syntax

In `verifier/src/metrics.rs`:

```rust
counter!("verification_total", 1, "outcome" => outcome.as_str());
counter!("verification_failures", 1, "error" => error_type);
```

v0.21 requires:

```rust
counter!("verification_total", "outcome" => outcome.as_str()).increment(1);
counter!("verification_failures", "error" => error_type).increment(1);
```

Impact: This code won't compile with metrics v0.21.

3. Wrong Counter Incremented

In `gateway/cache.go`:

```go
// Cache miss
log.Printf("Cache miss for key: %s", cacheKey)
metrics.RecordCacheHit(false)
cacheHits.Inc() // ❌ Should be cacheMisses.Inc()
```

Impact: Both cache hit/miss metrics will be incorrect.

4. Duplicate Middleware

In `gateway/main.go`:

```go
router.Use(middleware.CorrelationIDMiddleware())
// ... other middleware ...
router.Use(middleware.CorrelationIDMiddleware()) // ❌ Registered twice
```

Impact: Every request runs through CorrelationIDMiddleware twice unnecessarily.

The gateway metrics structure itself is sound, but these bugs make the PR unsafe to merge without fixes.
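The cache-counter bug (point 3) reduces to incrementing the wrong counter on the miss path. A dependency-free sketch of the correct accounting, with plain ints standing in for the gateway's Prometheus counters (names hypothetical):

```go
package main

import "fmt"

// Stub counters standing in for the gateway's real
// cacheHits/cacheMisses Prometheus counters.
var (
	cacheHits   int
	cacheMisses int
)

// lookup increments exactly one counter per call: hits on a cache
// hit, misses on a miss. The PR's bug was bumping the hit counter
// on both paths, skewing both metrics.
func lookup(cache map[string]string, key string) (string, bool) {
	if v, ok := cache[key]; ok {
		cacheHits++
		return v, true
	}
	cacheMisses++ // not cacheHits++
	return "", false
}

func main() {
	cache := map[string]string{"a": "1"}
	lookup(cache, "a")
	lookup(cache, "b")
	fmt.Println(cacheHits, cacheMisses) // prints "1 1"
}
```

With real Prometheus counters the same invariant holds: the sum of hits and misses equals total lookups, which is what makes a hit-ratio panel trustworthy.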
Fixes: #79
Added Prometheus `/metrics` endpoint at Gateway and Verifier
Checked testing for Gateway and Verifier metrics
Summary by CodeRabbit
New Features
Bug Fixes / Behavior
Greptile Summary
This PR adds Prometheus metrics instrumentation to both the Go gateway and the Rust verifier services. The gateway implementation defines counters for requests, cache hits/misses, rate limits, and verification results, plus a histogram for request duration and a gauge for active requests. The verifier adds per-verification metrics tracking. However, the PR has several critical bugs that prevent it from working as intended:
- The verifier's recorder installation and `/metrics` route are placed after the blocking `axum::serve()` call, making them unreachable dead code. The recorder is never installed, so `metrics::record_verification()` calls silently no-op, and the `/metrics` endpoint is never served.
- `metrics.rs` uses the wrong macro API: the `metrics` crate v0.21 changed its macro syntax. The old `counter!("name", 1)` form from v0.17–0.20 was removed; v0.21 requires `counter!("name").increment(1)`. This code will fail to compile.
- The cache-miss path increments the `cacheHits` counter instead of `cacheMisses`, making both metrics inaccurate.
- `CorrelationIDMiddleware()` is registered twice on the router, causing every request to run through it twice.

Confidence Score: 1/5
The critical issues span `verifier/src/main.rs` (dead code after serve), `verifier/src/metrics.rs` (wrong API), `gateway/cache.go` (wrong counter), and `gateway/main.go` (duplicate middleware).

Important Files Changed
Flowchart
```mermaid
flowchart TD
    A[Incoming HTTP Request] --> B[CorrelationIDMiddleware]
    B --> C{METRICS_ENABLED != false?}
    C -->|Yes| D[MetricsMiddleware]
    C -->|No| E[CORS / Rate Limit / Timeout]
    D --> E
    E --> F{Route}
    F -->|/metrics| G[promhttp.Handler - Prometheus Scrape]
    F -->|/api/ai/summarize| H{Cache Enabled?}
    H -->|Yes| I[CacheMiddleware]
    H -->|No| J[handleSummarize]
    I -->|Cache HIT| K[cacheHits.Inc]
    I -->|Cache MISS| L["⚠️ BUG: cacheHits.Inc instead of cacheMisses.Inc"]
    K --> M[Verify Payment & Serve Cached]
    L --> J
    J --> N[Verify Payment → Verifier Service]
    N --> O[Verifier /verify]
    O --> P["metrics::record_verification()"]
    P --> Q["⚠️ BUG: No recorder installed - silently no-ops"]
    F -->|/health /readyz| R[Health Check Handlers]
```

Last reviewed commit: 46836ba
Context used:
dashboard - AGENTS.md (source)