
feat(metrics): add Prometheus instrumentation and expose /metrics end…#124

Open
dharapandya85 wants to merge 4 commits into AnkanMisra:main from dharapandya85:feat/prometheus-metrics

Conversation

@dharapandya85
Contributor

@dharapandya85 dharapandya85 commented Feb 17, 2026

Fixes: #79

Added a Prometheus /metrics endpoint to the Gateway and the Verifier.

Tested the Gateway and Verifier metrics.

Summary by CodeRabbit

  • New Features

    • Exposes Prometheus metrics and a /metrics endpoint to monitor requests, latencies, cache hits/misses, rate limits, active requests, and verification outcomes.
  • Bug Fixes / Behavior

    • Records route-specific cache hit/miss and verification metrics; verification failures now increment failure metrics and can change responses (including deadline, validation, or authorization errors).

Greptile Summary

This PR adds Prometheus metrics instrumentation to both the Go gateway and the Rust verifier services. The gateway implementation defines counters for requests, cache hits/misses, rate limits, and verification results, plus a histogram for request duration and a gauge for active requests. The verifier adds per-verification metrics tracking. However, the PR has several critical bugs that prevent it from working as intended:

  • Verifier metrics are completely non-functional: The Prometheus recorder setup and /metrics route are placed after the blocking axum::serve() call, making them unreachable dead code. The recorder is never installed, so metrics::record_verification() calls silently no-op, and the /metrics endpoint is never served.
  • Verifier metrics.rs uses the wrong macro API: The metrics crate v0.21 changed its macro syntax. The old counter!("name", 1) form from v0.17–0.20 was removed; v0.21 requires counter!("name").increment(1). This code will fail to compile.
  • Gateway cache miss counter is wrong: The cache MISS path increments the cacheHits counter instead of cacheMisses, making both metrics inaccurate.
  • Duplicate middleware registration: CorrelationIDMiddleware() is registered twice on the router, causing every request to run through it twice.

Confidence Score: 1/5

  • This PR has multiple critical bugs — the verifier metrics will not compile and the gateway has logic errors in metric counters and middleware ordering.
  • Score of 1 reflects: (1) verifier metrics.rs uses incorrect API for metrics 0.21 and will not compile, (2) all verifier Prometheus setup is unreachable dead code after a blocking serve call, (3) gateway cache miss incorrectly increments cache hit counter, (4) duplicate CorrelationIDMiddleware registration. The gateway MetricsMiddleware and metrics definitions are structurally sound, but the bugs above make the PR unsafe to merge without fixes.
  • Pay close attention to verifier/src/main.rs (dead code after serve), verifier/src/metrics.rs (wrong API), gateway/cache.go (wrong counter), and gateway/main.go (duplicate middleware).

Important Files Changed

Filename: Overview

gateway/main.go: Duplicate CorrelationIDMiddleware registration causes double execution; MetricsMiddleware added before existing middleware ordering; leftover commented-out code at end of file.
gateway/cache.go: Cache miss path incorrectly increments cacheHits counter instead of cacheMisses — metrics data will be wrong for both hit and miss counts.
gateway/metrics.go: New file defining Prometheus metrics; rateLimitHits and verificationTotal are declared but never used anywhere in the codebase.
gateway/middleware.go: MetricsMiddleware implementation is correct — tracks active requests, request counts, and duration histograms with proper label usage.
gateway/go.mod: Adds prometheus/client_golang and transitive dependencies; versions appear appropriate.
verifier/src/main.rs: Critical: all Prometheus setup code is unreachable (placed after the blocking axum::serve call). The /metrics endpoint will never be served and the recorder is never installed.
verifier/src/metrics.rs: Uses metrics crate 0.21 macros with the old v0.17-0.20 syntax (counter!("name", 1)), which was removed in 0.21 — this will not compile.
verifier/Cargo.toml: Adds metrics 0.21 and metrics-exporter-prometheus 0.12 — versions are compatible with each other.

Flowchart

```mermaid
flowchart TD
    A[Incoming HTTP Request] --> B[CorrelationIDMiddleware]
    B --> C{METRICS_ENABLED != false?}
    C -->|Yes| D[MetricsMiddleware]
    C -->|No| E[CORS / Rate Limit / Timeout]
    D --> E
    E --> F{Route}
    F -->|/metrics| G[promhttp.Handler - Prometheus Scrape]
    F -->|/api/ai/summarize| H{Cache Enabled?}
    H -->|Yes| I[CacheMiddleware]
    H -->|No| J[handleSummarize]
    I -->|Cache HIT| K[cacheHits.Inc]
    I -->|Cache MISS| L["⚠️ BUG: cacheHits.Inc instead of cacheMisses.Inc"]
    K --> M[Verify Payment & Serve Cached]
    L --> J
    J --> N[Verify Payment → Verifier Service]
    N --> O[Verifier /verify]
    O --> P["metrics::record_verification()"]
    P --> Q["⚠️ BUG: No recorder installed - silently no-ops"]
    F -->|/health /readyz| R[Health Check Handlers]
```

Last reviewed commit: 46836ba


Context used:

  • Context from dashboard - AGENTS.md (source)

@coderabbitai
Contributor

coderabbitai Bot commented Feb 17, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough

Adds Prometheus-based metrics to Gateway and Verifier: metric declarations, a Gin middleware and /metrics endpoint in the gateway, cache hit/miss and verification instrumentation in gateway/cache.go, and Prometheus recorder + verification metrics in the verifier.

Changes

Cohort / File(s): Summary

- Gateway Metrics + deps (gateway/metrics.go, gateway/go.mod): Add Prometheus metric declarations (counters, histogram, gauge) and add github.com/prometheus/client_golang (plus indirect deps) to go.mod.
- Gateway Middleware & route (gateway/middleware.go, gateway/main.go): Add MetricsMiddleware() to track active requests, duration, and request counters; register a conditional /metrics route using promhttp.Handler().
- Gateway cache & verification instrumentation (gateway/cache.go): Instrument cache HIT/MISS with route-scoped cacheHits/cacheMisses; on HIT invoke verification metrics (verificationTotal) and increment the appropriate labels on success/error; adjust control flow to return suitable HTTP statuses for verification errors.
- Verifier metrics deps (verifier/Cargo.toml): Add the Rust crates metrics and metrics-exporter-prometheus.
- Verifier instrumentation & route (verifier/src/metrics.rs, verifier/src/main.rs): New record_verification(...) records counters and durations; initialize the Prometheus recorder and expose /metrics; verify_signature records duration, calls the metrics recorder, and preserves headers.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Gateway as Gateway (Go)
    participant Verifier as Verifier (Rust)
    participant Prom as Prometheus

    Client->>Gateway: HTTP request
    activate Gateway
    Gateway->>Gateway: MetricsMiddleware: Inc activeRequests, start timer
    alt Cache HIT
        Gateway->>Gateway: cacheHits(path)++
        Gateway->>Verifier: forward verification request
        activate Verifier
        Verifier->>Verifier: verify_signature (measure duration)
        Verifier->>Prom: record_verification(duration, valid/error)
        Verifier-->>Gateway: verification result
        deactivate Verifier
        Gateway->>Prom: verificationTotal(success|error)++
    else Cache MISS
        Gateway->>Prom: cacheMisses(path)++
    end
    Gateway->>Gateway: stop timer, compute status
    Gateway->>Prom: requestsTotal(method,path,status)++, requestsDuration.observe(...)
    Gateway->>Gateway: Dec activeRequests
    Gateway-->>Client: HTTP response
    deactivate Gateway

    Note over Prom: Prometheus scrapes /metrics from Gateway and Verifier
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

Hard, SWoC26

Suggested reviewers

  • AnkanMisra

Poem

🐰
I hopped where counters quietly grow,
I timed each tick and watched the flow.
Hits, misses, verifications bright—
I tally carrots through the night. 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

- Linked Issues check: ⚠️ Warning. The PR addresses issue #79 objectives but contains critical implementation issues: verifier metrics setup is unreachable (placed after the blocking axum::serve call), the metrics module uses outdated macro syntax incompatible with metrics 0.21, the cache miss path increments the wrong metric, and CorrelationIDMiddleware is registered twice. Resolution: fix the verifier metrics initialization order (move it before axum::serve), update the metrics.rs macro syntax to metrics 0.21+, correct cache.go to increment cacheMisses rather than cacheHits, remove the duplicate CorrelationIDMiddleware registration, and verify all declared metrics are properly instrumented.
- Docstring Coverage: ⚠️ Warning. Docstring coverage is 16.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

- Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
- Title check: ✅ Passed. The title accurately describes the main objective: adding Prometheus instrumentation and exposing metrics endpoints for observability.
- Out of Scope Changes check: ✅ Passed. All changes are directly related to implementing Prometheus metrics as specified in issue #79; no unrelated code modifications detected.



@greptile-apps greptile-apps Bot left a comment


8 files reviewed, 6 comments


Comment thread gateway/main.go
Comment on lines 206 to +220
```go
r := gin.Default()

r.Use(CorrelationIDMiddleware())

if os.Getenv("METRICS_ENABLED") != "false" {
	r.Use(MetricsMiddleware())

	path := os.Getenv("METRICS_PATH")
	if path == "" {
		path = "/metrics"
	}
	r.GET(path, gin.WrapH(promhttp.Handler()))
}
```



Duplicate CorrelationIDMiddleware registration

CorrelationIDMiddleware() is registered twice — once at line 208 (newly added) and again at line 223 (the original). Every request will run through the middleware twice, which means a new UUID is potentially generated and then overwritten on the second pass. The duplicate at line 208 should be removed, as line 223 is the intended registration point (as indicated by the existing comment "VIBE FIX: Register the Correlation ID Middleware immediately").

Suggested change

Before:

```go
r := gin.Default()

r.Use(CorrelationIDMiddleware())

if os.Getenv("METRICS_ENABLED") != "false" {
	r.Use(MetricsMiddleware())

	path := os.Getenv("METRICS_PATH")
	if path == "" {
		path = "/metrics"
	}
	r.GET(path, gin.WrapH(promhttp.Handler()))
}
```

After:

```go
r := gin.Default()

if os.Getenv("METRICS_ENABLED") != "false" {
	r.Use(MetricsMiddleware())

	path := os.Getenv("METRICS_PATH")
	if path == "" {
		path = "/metrics"
	}
	r.GET(path, gin.WrapH(promhttp.Handler()))
}
```

Comment thread gateway/cache.go Outdated
Comment on lines +181 to +185
```go
routePath := c.FullPath()
if routePath == "" {
	routePath = "unknown"
}
cacheHits.WithLabelValues(routePath).Inc()
```


Cache miss incorrectly increments cacheHits

On a cache MISS, this code increments cacheHits instead of cacheMisses. The metric cacheMisses is declared in metrics.go but is never used anywhere in the codebase.

Suggested change

Before:

```go
routePath := c.FullPath()
if routePath == "" {
	routePath = "unknown"
}
cacheHits.WithLabelValues(routePath).Inc()
```

After:

```go
routePath := c.FullPath()
if routePath == "" {
	routePath = "unknown"
}
cacheMisses.WithLabelValues(routePath).Inc()
```

Comment thread verifier/src/metrics.rs Outdated
Comment on lines +1 to +17
```rust
use metrics::{counter, histogram};

pub fn record_verification(valid: bool, duration: f64, reason: Option<&str>) {
    counter!("verifier_requests_total", 1);
    histogram!("verifier_request_duration_seconds", duration);

    if valid {
        counter!("verifier_signature_valid_total", 1);
    } else {
        let label = reason.unwrap_or("unknown").to_string();
        counter!(
            "verifier_signature_invalid_total",
            1,
            "reason" => label
        );
    }
}
```


Incorrect metrics 0.21 macro API — will not compile

The metrics crate v0.21 changed its macro API. The old syntax counter!("name", 1) and histogram!("name", value) from v0.17–0.20 is no longer valid. In v0.21, the macros return handle objects, and you call .increment() or .record() on them.

This code will fail to compile. The correct v0.21 API is:

Suggested change

Before:

```rust
use metrics::{counter, histogram};

pub fn record_verification(valid: bool, duration: f64, reason: Option<&str>) {
    counter!("verifier_requests_total", 1);
    histogram!("verifier_request_duration_seconds", duration);

    if valid {
        counter!("verifier_signature_valid_total", 1);
    } else {
        let label = reason.unwrap_or("unknown").to_string();
        counter!(
            "verifier_signature_invalid_total",
            1,
            "reason" => label
        );
    }
}
```

After:

```rust
use metrics::{counter, histogram};

pub fn record_verification(valid: bool, duration: f64, reason: Option<&str>) {
    counter!("verifier_requests_total").increment(1);
    histogram!("verifier_request_duration_seconds").record(duration);

    if valid {
        counter!("verifier_signature_valid_total").increment(1);
    } else {
        let label = reason.unwrap_or("unknown").to_string();
        counter!(
            "verifier_signature_invalid_total",
            "reason" => label
        ).increment(1);
    }
}
```

Comment thread gateway/main.go Outdated
Comment on lines +1191 to +1194
```go
// r := gin.New()

// r.use(MetricsMiddleware())
```



Leftover commented-out code

These commented-out lines appear to be debug/development remnants and should be removed before merging.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


Comment thread gateway/metrics.go
Comment on lines +38 to +51
```go
rateLimitHits = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "gateway_rate_limit_hits_total",
		Help: "Rate limit rejections",
	},
	[]string{"path"},
)
verificationTotal = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "gateway_verification_total",
		Help: "Signature verification results",
	},
	[]string{"result"},
)
```


Declared metrics are never used

rateLimitHits and verificationTotal are declared but never referenced anywhere in the changed or existing code. If these are intended for future use, they should be added when the corresponding instrumentation is implemented. Otherwise, they add unused metric registrations to the Prometheus output.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


@greptile-apps

greptile-apps Bot commented Feb 17, 2026

Additional Comments (1)

verifier/src/main.rs
Dead code: Prometheus setup is unreachable

axum::serve(listener, app).await.unwrap() on line 65 blocks indefinitely while serving requests. All code after it (lines 67-78) — including the Prometheus recorder installation, the /metrics route definition, and the new Router — will never execute. This means:

  1. The metrics recorder is never installed, so calls to metrics::record_verification() will silently no-op.
  2. The /metrics endpoint is never served.
  3. A second Router is constructed (shadowing app) but never bound to a listener.

The Prometheus setup and the /metrics route need to be moved before axum::serve, and the routes should be added to the original app router that is actually served.

```rust
    let recorder = PrometheusBuilder::new()
        .install_recorder()
        .expect("failed to install recorder");

    let metrics_route = {
        let handle = recorder.clone();
        move || async move { handle.render() }
    };

    let app = Router::new()
        .route("/health", get(health))
        .route("/verify", post(verify_signature))
        .route("/metrics", get(metrics_route))
        .layer(DefaultBodyLimit::max(limit))
        .with_state(state);

    let addr = SocketAddr::from(([0, 0, 0, 0], 3002));
    println!("Rust Verifier listening on {}", addr);

    let listener = tokio::net::TcpListener::bind(addr).await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (2)
gateway/main.go (1)

1190-1194: Remove dead commented-out code.

Lines 1190-1194 contain leftover commented-out scaffolding (r := gin.New(), r.use(MetricsMiddleware())). These serve no purpose and should be cleaned up.

verifier/src/main.rs (1)

327-352: Unnecessary res_headers.clone() in both match arms — headers are discarded.

The match on line 328 destructures into (status, headers, Json(response)), but the headers binding from each arm is never used — line 351 returns the original res_headers. The .clone() calls on lines 331 and 340 exist only to avoid a move, but since the cloned values are thrown away, you can simplify by not capturing headers from the match at all.

Also, multiple cargo fmt violations are flagged by CI on these lines. Please run cargo fmt. As per coding guidelines, format with cargo fmt -- --check and lint with cargo clippy -- -D warnings.

♻️ Simplify the match to avoid unnecessary clone

```diff
     let start = std::time::Instant::now();
-    let (status,headers,Json(response)) = match sig.recover_typed_data(&typed_data) {
+    let (status, response) = match sig.recover_typed_data(&typed_data) {
         Ok(addr) => (
             StatusCode::OK,
-            res_headers.clone(),
-            Json(VerifyResponse {
+            VerifyResponse {
                 is_valid: true,
                 recovered_address: Some(format!("{:?}", addr)),
                 error: None,
-            }),
+            },
         ),
         Err(e) => (
             StatusCode::OK,
-            res_headers.clone(),
-            Json(VerifyResponse {
+            VerifyResponse {
                 is_valid: false,
                 recovered_address: None,
                 error: Some(e.to_string()),
-            }),
+            },
         ),
     };
-    
+
     let duration = start.elapsed().as_secs_f64();
     metrics::record_verification(response.is_valid, duration, response.error.as_deref());
-    (status, res_headers, Json(response),)
-    
+    (status, res_headers, Json(response))
 }
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gateway/cache.go`:
- Around line 181-185: In the cache MISS branch where routePath is determined
via c.FullPath() and you currently call
cacheHits.WithLabelValues(routePath).Inc(), replace that call with
cacheMisses.WithLabelValues(routePath).Inc() so misses are recorded correctly;
ensure the variable routePath and the label usage remain unchanged and only swap
the metric being incremented.

In `@gateway/go.mod`:
- Line 14: The go.mod addition of github.com/prometheus/client_golang v1.23.2
introduced missing go.sum entries causing CI failures; run `go mod tidy` to
regenerate and update go.sum so the Prometheus dependency checksums are
recorded, then run `go vet ./...` to lint the codebase as per guidelines before
committing the updated go.sum and go.mod changes.

In `@gateway/main.go`:
- Around line 208-223: Remove the duplicate CorrelationIDMiddleware
registration: there are two calls to r.Use(CorrelationIDMiddleware()) (one
before the METRICS_ENABLED block and one after); delete the earlier call (the
one before MetricsMiddleware/METRICS_PATH handling) so CorrelationIDMiddleware
is only registered once via r.Use(CorrelationIDMiddleware()), leaving
MetricsMiddleware and the metrics route intact.

In `@gateway/metrics.go`:
- Around line 16-23: The histogram metric declared as requestsDuration uses Name
"gateway_requests_duration_seconds" which deviates from the spec; rename the
metric to "gateway_request_duration_seconds" to match Issue `#79`. Locate the
promauto.NewHistogramVec call that creates requestsDuration and update the
HistogramOpts.Name field to the singular form, then run any unit/integration
tests or linting that validate metric names so dashboards/alerts align with the
spec.
- Around line 38-51: The metrics rateLimitHits and verificationTotal are
declared but never used; add increments where rate limiting and signature
verification happen: in your rate-limit middleware or cache rejection path call
rateLimitHits.WithLabelValues(path).Inc() using the request path label when
rejecting a request, and in the signature verification code paths (e.g.,
VerifySignature or the handler that validates signatures) call
verificationTotal.WithLabelValues("success").Inc() on valid signatures and
verificationTotal.WithLabelValues("failure").Inc() on failures; ensure you pass
the correct label values ("path" and "result") and keep the increments adjacent
to the existing rejection/verification logic so metrics reflect actual events.

In `@verifier/Cargo.toml`:
- Around line 13-14: The project pins metrics = "0.21" and
metrics-exporter-prometheus = "0.12" but the code uses the legacy macro API
(counter!("name", value) and histogram!("name", value)) which breaks on metrics
>=0.22; to fix, either (A) migrate all uses of the macros (e.g., counter! and
histogram! invocations) to the new handle-based API by replacing
counter!("name", x) with counter!("name").increment(x) and histogram!("name", d)
with histogram!("name").record(d) throughout the codebase (ensure you import any
new types or features required by the updated metrics crate), or (B) explicitly
document and keep the pinned versions in Cargo.toml to indicate intentional
compatibility with the legacy macro API.

In `@verifier/src/main.rs`:
- Around line 64-78: The Prometheus setup and metrics route are dead because
axum::serve(listener, app).await blocks; move the
PrometheusBuilder::new().install_recorder() call (and capture the returned
recorder) to occur before you construct the Router, then include the metrics
route in the same Router you pass to axum::serve; e.g., install the recorder,
create metrics_route (using recorder.clone()), build app =
Router::new().route("/health", get(health)).route("/verify",
post(verify_signature)).route("/metrics", get(metrics_route)), and only then
call tokio::net::TcpListener::bind(...).await and axum::serve(listener,
app).await.unwrap().

In `@verifier/src/metrics.rs`:
- Around line 1-16: Run rustfmt (cargo fmt) to fix formatting violations in the
verifier module; format the function record_verification and surrounding imports
so the file compiles with CI formatting checks (ensure use metrics::{counter,
histogram}; and the record_verification signature/body conform to rustfmt
style), then re-run cargo fmt -- --check and cargo clippy -- -D warnings to
verify; focus changes around the record_verification function and the
counter!/histogram! invocations.

---

Nitpick comments:
In `@gateway/main.go`:
- Around line 1190-1194: Remove the dead commented-out scaffolding lines
containing "r := gin.New()" and "r.use(MetricsMiddleware())" so the file no
longer contains unused commented code; locate the commented block referencing
gin.New() and MetricsMiddleware() and delete those comment lines to keep the
gateway/main.go clean.

In `@verifier/src/main.rs`:
- Around line 327-352: The match that calls sig.recover_typed_data(&typed_data)
currently returns a three-tuple with cloned res_headers which is never used;
update the match to only produce (status, Json(response)) (or set
status/response inside the match) and return the original res_headers at the end
to avoid the needless res_headers.clone() calls and moves; adjust the binding in
the match (referencing sig.recover_typed_data, VerifyResponse, and response) so
headers are not captured, then run cargo fmt and cargo clippy -- -D warnings to
fix formatting and lint violations.

Comment thread gateway/cache.go Outdated
Comment thread gateway/go.mod
github.com/gin-gonic/gin v1.11.0
github.com/google/uuid v1.6.0
github.com/joho/godotenv v1.5.1
github.com/prometheus/client_golang v1.23.2
Contributor

⚠️ Potential issue | 🟡 Minor

Run go mod tidy to fix missing go.sum entries.

Pipeline failures show that go.sum is missing entries for the newly added Prometheus packages. Run go mod tidy to synchronize. As per coding guidelines, lint with go vet ./... after fixing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gateway/go.mod` at line 14, The go.mod addition of
github.com/prometheus/client_golang v1.23.2 introduced missing go.sum entries
causing CI failures; run `go mod tidy` to regenerate and update go.sum so the
Prometheus dependency checksums are recorded, then run `go vet ./...` to lint
the codebase as per guidelines before committing the updated go.sum and go.mod
changes.

Comment thread gateway/main.go Outdated
Comment thread gateway/metrics.go
Comment thread gateway/metrics.go
Comment on lines +38 to +51
rateLimitHits = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "gateway_rate_limit_hits_total",
		Help: "Rate limit rejections",
	},
	[]string{"path"},
)
verificationTotal = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "gateway_verification_total",
		Help: "Signature verification results",
	},
	[]string{"result"},
)
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if rateLimitHits and verificationTotal are used anywhere in the gateway codebase
echo "=== rateLimitHits usage ==="
rg -n 'rateLimitHits' --type=go
echo ""
echo "=== verificationTotal usage ==="
rg -n 'verificationTotal' --type=go

Repository: AnkanMisra/MicroAI-Paygate

Length of output: 260


Instrument rateLimitHits and verificationTotal metrics throughout the gateway code.

These metrics are declared in gateway/metrics.go but are never incremented anywhere in the codebase. They will always report zero, which could mislead operators relying on these metrics for monitoring. Add instrumentation calls to increment these counters in the appropriate middleware, cache, and handler code paths where rate limit rejections and signature verifications occur.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gateway/metrics.go` around lines 38 - 51, The metrics rateLimitHits and
verificationTotal are declared but never used; add increments where rate
limiting and signature verification happen: in your rate-limit middleware or
cache rejection path call rateLimitHits.WithLabelValues(path).Inc() using the
request path label when rejecting a request, and in the signature verification
code paths (e.g., VerifySignature or the handler that validates signatures) call
verificationTotal.WithLabelValues("success").Inc() on valid signatures and
verificationTotal.WithLabelValues("failure").Inc() on failures; ensure you pass
the correct label values ("path" and "result") and keep the increments adjacent
to the existing rejection/verification logic so metrics reflect actual events.
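The placement the prompt describes can be sketched outside gin. Everything below is illustrative only: `allowRequest`, `recordRateLimit`, and the map-based counter are stand-ins invented here so the sketch runs without client_golang; the real code would call `rateLimitHits.WithLabelValues(path).Inc()` inside the existing rate-limit middleware's rejection path.

```go
package main

import "fmt"

// recordRateLimit stands in for the real instrumentation call,
// rateLimitHits.WithLabelValues(path).Inc(); a plain map replaces
// the Prometheus CounterVec so the sketch runs standalone.
func recordRateLimit(hits map[string]int, path string) {
	hits[path]++
}

// allowRequest is a hypothetical rate-limit check: the increment
// belongs on the rejection path, next to the code returning 429,
// so the metric counts exactly the rejected requests.
func allowRequest(hits map[string]int, path string, remaining int) bool {
	if remaining <= 0 {
		recordRateLimit(hits, path)
		return false
	}
	return true
}

func main() {
	hits := map[string]int{}
	allowRequest(hits, "/verify", 0) // rejected: counted
	allowRequest(hits, "/verify", 0) // rejected: counted
	allowRequest(hits, "/health", 5) // allowed: not counted
	fmt.Println(hits["/verify"], hits["/health"])
}
```

The same placement rule applies to verificationTotal: increment it with the `"result"` label ("success" or "failure") adjacent to the branch that decides the verification outcome.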

Comment thread verifier/Cargo.toml Outdated
Comment thread verifier/src/main.rs Outdated
Comment on lines +64 to +78
    let listener = tokio::net::TcpListener::bind(addr).await.unwrap();
    axum::serve(listener, app).await.unwrap();

    let recorder = PrometheusBuilder::new()
        .install_recorder()
        .expect("failed to install recorder");

    let metrics_route = {
        let handle = recorder.clone();
        move || async move {handle.render() }
    };
    let app = Router::new()
        .route("/health", get(health))
        .route("/verify", post(verify_signature))
        .route("/metrics", get(metrics_route));
Contributor

⚠️ Potential issue | 🔴 Critical

Critical: All Prometheus setup code is unreachable (dead code after blocking .await).

axum::serve(listener, app).await.unwrap() on line 65 blocks indefinitely — it is the server event loop. Lines 67-78 will never execute. This means:

  1. The Prometheus recorder is never installed, so all counter!/histogram! calls in metrics.rs silently no-op.
  2. The /metrics route is never registered, so scraping will 404.

The recorder must be installed before building the Router, and the /metrics route must be part of the original app.

🐛 Proposed fix — move Prometheus setup before serve
 #[tokio::main]
 async fn main() {
     let limit = get_max_body_size();
     let state = AppState {
         max_body_size: limit,
     };
+
+    let recorder = PrometheusBuilder::new()
+        .install_recorder()
+        .expect("failed to install recorder");
+
+    let metrics_handle = recorder.clone();
+    let metrics_route = move || async move { metrics_handle.render() };
+
     let app = Router::new()
         .route("/health", get(health))
         .route("/verify", post(verify_signature))
+        .route("/metrics", get(metrics_route))
         .layer(DefaultBodyLimit::max(limit))
         .with_state(state);
 
     let addr = SocketAddr::from(([0, 0, 0, 0], 3002));
     println!("Rust Verifier listening on {}", addr);
 
     let listener = tokio::net::TcpListener::bind(addr).await.unwrap();
     axum::serve(listener, app).await.unwrap();
-
-    let recorder = PrometheusBuilder::new()
-        .install_recorder()
-        .expect("failed to install recorder");
-
-    let metrics_route = {
-        let handle = recorder.clone();
-        move || async move {handle.render() }
-    };
-    let app = Router::new()
-        .route("/health", get(health))
-        .route("/verify", post(verify_signature))
-        .route("/metrics", get(metrics_route));
 }
🧰 Tools
🪛 GitHub Actions: Rust Lint

[error] 70-70: cargo fmt --check detected formatting changes: difference in closure body (move || async move {handle.render()}). Please run 'cargo fmt' to format.


[error] 78-78: cargo fmt --check detected formatting changes near route definitions. Please run 'cargo fmt' to format.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verifier/src/main.rs` around lines 64 - 78, The Prometheus setup and metrics
route are dead because axum::serve(listener, app).await blocks; move the
PrometheusBuilder::new().install_recorder() call (and capture the returned
recorder) to occur before you construct the Router, then include the metrics
route in the same Router you pass to axum::serve; e.g., install the recorder,
create metrics_route (using recorder.clone()), build app =
Router::new().route("/health", get(health)).route("/verify",
post(verify_signature)).route("/metrics", get(metrics_route)), and only then
call tokio::net::TcpListener::bind(...).await and axum::serve(listener,
app).await.unwrap().

Comment thread verifier/src/metrics.rs
@AnkanMisra
Owner

@dharapandya85 fix the go issue and rust lint problems
all mentioned here

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
verifier/src/main.rs (1)

327-351: ⚠️ Potential issue | 🟠 Major

Unused headers variable causes a pipeline failure at line 328.

Both match arms clone res_headers into headers, but line 351 returns res_headers rather than headers. Clippy reports "unused variable: headers." Return headers directly instead to silence the warning and use the value that was destructured.

🐛 Proposed fix
-    (status, res_headers, Json(response))
+    (status, headers, Json(response))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verifier/src/main.rs` around lines 327 - 351, The match destructures into a
local variable named headers but the function still returns res_headers, leaving
headers unused and triggering Clippy; update the final return to use the
destructured headers variable instead of res_headers so the tuple (status,
headers, Json(response)) that you built from sig.recover_typed_data(...) is
returned and the unused-variable warning is resolved (no other logic changes
needed; keep metrics::record_verification(...) and VerifyResponse construction
as-is).
🧹 Nitpick comments (1)
gateway/cache.go (1)

116-119: Duplicate routePath extraction — minor refactor opportunity.

The same 4-line pattern appears twice:

routePath := c.FullPath()
if routePath == "" {
    routePath = "unknown"
}
♻️ Optional helper extraction
+func routePathOrUnknown(c *gin.Context) string {
+	if p := c.FullPath(); p != "" {
+		return p
+	}
+	return "unknown"
+}

Then replace both occurrences with a single call:

-		routePath := c.FullPath()
-		if routePath == "" {
-			routePath = "unknown"
-		}
-		cacheHits.WithLabelValues(routePath).Inc()
+		cacheHits.WithLabelValues(routePathOrUnknown(c)).Inc()
-		routePath := c.FullPath()
-		if routePath == "" {
-			routePath = "unknown"
-		}
-		cacheMisses.WithLabelValues(routePath).Inc()
+		cacheMisses.WithLabelValues(routePathOrUnknown(c)).Inc()

Also applies to: 183-186

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gateway/cache.go` around lines 116 - 119, Extract the repeated routePath
extraction into a small helper (e.g., routePathFromContext or getRoutePath) that
accepts the gin Context (or the same type as c) and returns c.FullPath() or
"unknown" when empty, then replace both occurrences where routePath is computed
with a single call to that helper; update uses in the surrounding
functions/methods (the places containing the original routePath := c.FullPath()
check) to call the new helper to eliminate duplication.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@verifier/src/main.rs`:
- Line 12: The metrics_route block currently contains only a let binding so it
evaluates to () and the Prometheus handle is dropped; change the block so the
closure submitted to get(metrics_route) captures and returns the Prometheus
rendering closure (or simply return the closure as the block's trailing
expression) so that the PrometheusHandle produced by install_recorder() is kept
alive and used; ensure PrometheusHandle (from install_recorder()) is referenced
(e.g., stored in a variable captured by the closure) so the
unused-import/unused-variable errors go away and get(metrics_route) registers
the real Prometheus renderer rather than unit.

---

Outside diff comments:
In `@verifier/src/main.rs`:
- Around line 327-351: The match destructures into a local variable named
headers but the function still returns res_headers, leaving headers unused and
triggering Clippy; update the final return to use the destructured headers
variable instead of res_headers so the tuple (status, headers, Json(response))
that you built from sig.recover_typed_data(...) is returned and the
unused-variable warning is resolved (no other logic changes needed; keep
metrics::record_verification(...) and VerifyResponse construction as-is).

---

Duplicate comments:
In `@gateway/metrics.go`:
- Around line 38-43: The rateLimitHits Prometheus counter (rateLimitHits) is
never incremented and will always report zero; locate the rate-limiting
rejection path (e.g., the rate limiter middleware or the handler that returns
429, likely in gateway/middleware.go or the code that enforces the limiter) and
add a call to increment the metric there using the request path label (e.g.,
rateLimitHits.WithLabelValues(requestPath).Inc()); ensure you use the actual
request path (r.URL.Path or equivalent) when calling WithLabelValues so the
metric is correctly labeled and incremented on each rate-limit rejection.

In `@verifier/src/main.rs`:
- Around line 74-79: The second Router::new() block is dead/unreachable because
axum::serve(listener, app).await.unwrap() never returns; remove the duplicate
Router creation (the let app = Router::new() ... that appears after axum::serve)
and instead ensure the Router (created via Router::new().route("/health",
get(health)).route("/verify", post(verify_signature)).route("/metrics",
get(metrics_route))) is constructed once and passed into axum::serve(listener,
app).await.unwrap(); update references so the app variable used in axum::serve
is the single Router instance.

---

Nitpick comments:
In `@gateway/cache.go`:
- Around line 116-119: Extract the repeated routePath extraction into a small
helper (e.g., routePathFromContext or getRoutePath) that accepts the gin Context
(or the same type as c) and returns c.FullPath() or "unknown" when empty, then
replace both occurrences where routePath is computed with a single call to that
helper; update uses in the surrounding functions/methods (the places containing
the original routePath := c.FullPath() check) to call the new helper to
eliminate duplication.

Comment thread verifier/src/main.rs Outdated
use ethers::types::transaction::eip712::TypedData;
use ethers::types::Signature;
mod metrics;
use metrics_exporter_prometheus::{PrometheusBuilder, PrometheusHandle};
Contributor

⚠️ Potential issue | 🔴 Critical

metrics_route evaluates to (), so the /metrics endpoint is broken; pipeline failures at lines 12, 60, 61.

The wrapping block at lines 60–62 assigns () to metrics_route rather than the closure, because the block contains only a let statement (no trailing expression). The closure is assigned to the local handle and immediately dropped. get(metrics_route) at line 66 therefore registers a unit handler, not the Prometheus renderer. Three pipeline errors originate from this:

  • Line 12: PrometheusHandle is imported but never explicitly named (install_recorder() return type is inferred).
  • Line 60: "let unit value — the let binding evaluates to ()."
  • Line 61: "unused variable: handle."
🐛 Proposed fix
-use metrics_exporter_prometheus::{PrometheusBuilder, PrometheusHandle};
+use metrics_exporter_prometheus::PrometheusBuilder;
-    let metrics_handle = recorder.clone();
-    let metrics_route = {
-        let handle = move || async move { metrics_handle.render() };
-    };
+    let metrics_route = move || async move { recorder.render() };

Also applies to: 59-66

🧰 Tools
🪛 GitHub Actions: Rust Lint

[error] 12-12: unused import: PrometheusHandle. The import is unused and will cause a clippy error due to -D warnings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verifier/src/main.rs` at line 12, The metrics_route block currently contains
only a let binding so it evaluates to () and the Prometheus handle is dropped;
change the block so the closure submitted to get(metrics_route) captures and
returns the Prometheus rendering closure (or simply return the closure as the
block's trailing expression) so that the PrometheusHandle produced by
install_recorder() is kept alive and used; ensure PrometheusHandle (from
install_recorder()) is referenced (e.g., stored in a variable captured by the
closure) so the unused-import/unused-variable errors go away and
get(metrics_route) registers the real Prometheus renderer rather than unit.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
verifier/src/main.rs (2)

208-263: Early-exit paths (JSON rejection, timestamp, parse errors) bypass record_verification, so verifier_requests_total undercounts real traffic.

All four early-return branches (lines 210–236 for JSON/size rejection, lines 242–263 for timestamp errors, and the typed_data/sig parse branches around lines 291–319) return without calling metrics::record_verification. The verifier_requests_total counter therefore only reflects requests that reach the crypto path, not actual service load. This diverges from the issue requirement that verifier_requests_total carries a status label covering all outcomes.

Consider extracting the common recording logic or adding lightweight calls at each early exit so the counter accurately represents all inbound requests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verifier/src/main.rs` around lines 208 - 263, Early-return branches (JSON
rejection in the payload match, timestamp validation block, and the
typed_data/sig parse failures) are returning before calling the metrics
recorder, causing verifier_requests_total undercounting; fix by invoking the
same metrics::record_verification (or a small helper that wraps it) with the
appropriate status label just before each early return (e.g., inside the
Err(JsonRejection::BytesRejection(_)) arm, the generic Err(e) arm, the timestamp
error branch that builds msg, and the typed_data/sig parse failure branches) so
every exit path increments verifier_requests_total with the correct status;
extract into a small helper function if helpful to avoid duplication.

321-345: res_headers.clone() in both match arms is immediately discarded — simplify the destructuring.

The two res_headers.clone() calls (lines 325 and 334) exist only to satisfy the tuple type in the match arms, but the second element is thrown away by let (status, _, Json(response)). The clones are wasted allocations on every verified request.

♻️ Proposed simplification
     let start = std::time::Instant::now();
-    let (status, _, Json(response)) = match sig.recover_typed_data(&typed_data) {
+    let (status, Json(response)) = match sig.recover_typed_data(&typed_data) {
         Ok(addr) => (
             StatusCode::OK,
-            res_headers.clone(),
             Json(VerifyResponse {
                 is_valid: true,
                 recovered_address: Some(format!("{:?}", addr)),
                 error: None,
             }),
         ),
         Err(e) => (
             StatusCode::OK,
-            res_headers.clone(),
             Json(VerifyResponse {
                 is_valid: false,
                 recovered_address: None,
                 error: Some(e.to_string()),
             }),
         ),
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verifier/src/main.rs` around lines 321 - 345, The match on
sig.recover_typed_data currently returns tuples that include res_headers.clone()
twice even though the second element is discarded by let (status, _,
Json(response)); change the match arms to return only (StatusCode,
Json(VerifyResponse)) (i.e. remove res_headers.clone() from both arms), then
after metrics::record_verification(...) perform a single res_headers.clone()
when constructing the final return tuple (status, res_headers.clone(),
Json(response)); update types accordingly so the match yields (status,
Json(response)) and the final line returns the full (status,
res_headers.clone(), Json(response)) to eliminate the wasted duplicate clones.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@verifier/src/main.rs`:
- Around line 72-73: Remove the trailing blank line inside the main() function
after the axum::serve(listener, app).await.unwrap(); statement so the closing
brace immediately follows that line; update the function containing
axum::serve(...) (main) to have no extra blank line before its final `}` to
satisfy rustfmt (`cargo fmt -- --check`).
- Around line 55-57: The Prometheus recorder installed with
PrometheusBuilder::install_recorder() needs a background upkeep loop to call
PrometheusHandle::run_upkeep() periodically to avoid unbounded histogram growth;
change how you use install_recorder() so you keep the returned PrometheusHandle
(instead of dropping it) and spawn a background task (tokio::spawn or
std::thread::spawn depending on your runtime) that loops with a sleep interval
(e.g., 10s) and calls handle.run_upkeep(). Ensure the spawned task is detached
and the handle is moved into the task so it remains alive for the lifetime of
the process.

---

Duplicate comments:
In `@verifier/src/main.rs`:
- Around line 55-66: The code correctly installs the Prometheus recorder before
building the Router and defines metrics_route as a move || async move closure
that calls recorder.render(); no changes required—ensure
PrometheusBuilder::new().install_recorder() remains before Router construction,
that metrics_route (the closure referencing recorder.render()) is used in
Router::route("/metrics", get(metrics_route)), and that any unused
PrometheusHandle import has been removed.

---

Nitpick comments:
In `@verifier/src/main.rs`:
- Around line 208-263: Early-return branches (JSON rejection in the payload
match, timestamp validation block, and the typed_data/sig parse failures) are
returning before calling the metrics recorder, causing verifier_requests_total
undercounting; fix by invoking the same metrics::record_verification (or a small
helper that wraps it) with the appropriate status label just before each early
return (e.g., inside the Err(JsonRejection::BytesRejection(_)) arm, the generic
Err(e) arm, the timestamp error branch that builds msg, and the typed_data/sig
parse failure branches) so every exit path increments verifier_requests_total
with the correct status; extract into a small helper function if helpful to
avoid duplication.
- Around line 321-345: The match on sig.recover_typed_data currently returns
tuples that include res_headers.clone() twice even though the second element is
discarded by let (status, _, Json(response)); change the match arms to return
only (StatusCode, Json(VerifyResponse)) (i.e. remove res_headers.clone() from
both arms), then after metrics::record_verification(...) perform a single
res_headers.clone() when constructing the final return tuple (status,
res_headers.clone(), Json(response)); update types accordingly so the match
yields (status, Json(response)) and the final line returns the full (status,
res_headers.clone(), Json(response)) to eliminate the wasted duplicate clones.

Comment thread verifier/src/main.rs
Comment on lines +55 to +57
let recorder = PrometheusBuilder::new()
    .install_recorder()
    .expect("failed to install recorder");
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the code at lines 55-57 in verifier/src/main.rs
sed -n '50,70p' verifier/src/main.rs

Repository: AnkanMisra/MicroAI-Paygate

Length of output: 722


🏁 Script executed:

# Check what version of metrics-exporter-prometheus is used
cat verifier/Cargo.toml | grep -A 5 -B 5 "metrics-exporter-prometheus"

Repository: AnkanMisra/MicroAI-Paygate

Length of output: 384


🌐 Web query:

metrics-exporter-prometheus PrometheusBuilder install_recorder upkeep task

💡 Result:

In metrics-exporter-prometheus, PrometheusBuilder::install_recorder() only builds + installs the recorder (and returns a PrometheusHandle you can render into a Prometheus scrape payload yourself), but it does not spawn the background “upkeep task” for you—you must run upkeep periodically. [1][2]

What the upkeep task does: periodic maintenance like draining histogram data so values stay up-to-date and to avoid unbounded memory growth. [1][2]

Key implications:

  • builder.install() / builder.build() ⇒ exporter is set up and an upkeep task is spawned automatically. [2]
  • builder.install_recorder() / builder.build_recorder() ⇒ no upkeep task, so you must call PrometheusHandle::run_upkeep() on an interval. [2]
  • You can configure the upkeep interval with PrometheusBuilder::upkeep_timeout(Duration). [1]

Minimal pattern (pseudo-structure):

let handle = PrometheusBuilder::new()
  .install_recorder()
  .unwrap();

// your scheduler/timer:
loop {
  handle.run_upkeep(); // run periodically
  sleep(upkeep_interval).await;
}

Sources: [1] PrometheusBuilder docs (docs.rs), [2] crate docs (lib.rs “Upkeep and maintenance”).


🏁 Script executed:

# Search for other uses of PrometheusBuilder to see if there's a pattern
rg "PrometheusBuilder" verifier/

Repository: AnkanMisra/MicroAI-Paygate

Length of output: 207


🏁 Script executed:

# Check if there are imports or other relevant code in main.rs
head -30 verifier/src/main.rs

Repository: AnkanMisra/MicroAI-Paygate

Length of output: 887


🌐 Web query:

metrics-exporter-prometheus PrometheusHandle Clone trait 0.18

💡 Result:

In metrics-exporter-prometheus v0.18.0, PrometheusHandle does implement Clone (impl Clone for PrometheusHandle). You can just do:

use metrics_exporter_prometheus::PrometheusBuilder;

let handle = PrometheusBuilder::new()
    .install_recorder()
    .expect("install recorder");

let handle2 = handle.clone();

Docs: PrometheusHandle shows Trait Implementations: Clone, Debug, and the impl Clone for PrometheusHandle block is present in the 0.18.0 rustdoc. [1]

If you’re seeing “the trait bound PrometheusHandle: Clone is not satisfied”, double-check you’re cloning the metrics_exporter_prometheus::PrometheusHandle type (not a different PrometheusHandle from another crate/module brought into scope by an import). [1]

Sources

  1. metrics-exporter-prometheus 0.18.0 rustdoc for PrometheusHandle (shows impl Clone) https://docs.rs/metrics-exporter-prometheus/0.18.0/metrics_exporter_prometheus/struct.PrometheusHandle.html [1]

install_recorder() does not spawn the upkeep task — histogram buckets will grow unbounded.

When using PrometheusBuilder::install_recorder(), the background upkeep task that drains histogram data structures is not spawned automatically. Without periodic calls to PrometheusHandle::run_upkeep(), memory will grow indefinitely over time.

🔧 Proposed fix — spawn a background upkeep task
     let recorder = PrometheusBuilder::new()
         .install_recorder()
         .expect("failed to install recorder");

+    // Upkeep is not automatic when using install_recorder(); drive it manually.
+    let upkeep_handle = recorder.clone();
+    tokio::spawn(async move {
+        loop {
+            tokio::time::sleep(std::time::Duration::from_secs(5)).await;
+            upkeep_handle.run_upkeep();
+        }
+    });
+
     let metrics_route = move || async move { recorder.render() };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verifier/src/main.rs` around lines 55 - 57, The Prometheus recorder installed
with PrometheusBuilder::install_recorder() needs a background upkeep loop to
call PrometheusHandle::run_upkeep() periodically to avoid unbounded histogram
growth; change how you use install_recorder() so you keep the returned
PrometheusHandle (instead of dropping it) and spawn a background task
(tokio::spawn or std::thread::spawn depending on your runtime) that loops with a
sleep interval (e.g., 10s) and calls handle.run_upkeep(). Ensure the spawned
task is detached and the handle is moved into the task so it remains alive for
the lifetime of the process.

Comment thread verifier/src/main.rs Outdated
@AnkanMisra
Owner

@greptile review the pr and explain about the low confidence score along with snippets, be brief

@greptile-apps

greptile-apps Bot commented Feb 19, 2026

Looking at this PR, the low confidence score (1/5) is justified by several critical bugs:

1. Verifier Metrics Are Dead Code

In verifier/src/main.rs, the Prometheus setup happens after the blocking serve() call:

axum::serve(listener, app)
    .await
    .context("Failed to start Axum server")?;

// ⚠️ This code NEVER runs - unreachable dead code
let builder = PrometheusBuilder::new();
let handle = builder.install_recorder().expect("failed to install recorder");

let metrics_app = Router::new()
    .route("/metrics", get(|| async move { handle.render() }));

Impact: The /metrics endpoint is never served, and metrics::record_verification() calls do nothing.

2. Wrong Metrics API Syntax

In verifier/src/metrics.rs, the code uses outdated v0.17-0.20 syntax with v0.21:

counter!("verification_total", 1, "outcome" => outcome.as_str());
counter!("verification_failures", 1, "error" => error_type);

v0.21 requires:

counter!("verification_total", "outcome" => outcome.as_str()).increment(1);
counter!("verification_failures", "error" => error_type).increment(1);

Impact: This code won't compile with metrics v0.21.

3. Wrong Counter Incremented

In gateway/cache.go, cache misses increment the wrong counter:

// Cache miss
log.Printf("Cache miss for key: %s", cacheKey)
metrics.RecordCacheHit(false)
cacheHits.Inc()  // ❌ Should be cacheMisses.Inc()

Impact: Both cache hit/miss metrics will be incorrect.

4. Duplicate Middleware

In gateway/main.go:

router.Use(middleware.CorrelationIDMiddleware())
// ... other middleware ...
router.Use(middleware.CorrelationIDMiddleware())  // ❌ Registered twice

Impact: Every request runs through CorrelationIDMiddleware twice unnecessarily.


The gateway metrics structure itself is sound, but these bugs make the PR unsafe to merge without fixes.
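For the counter mix-up in point 3, a minimal stand-in sketch of the corrected branch: the map cache, cacheStats struct, and lookup function are invented for illustration (the real code uses the Prometheus counters from gateway/metrics.go, noted in the comments).

```go
package main

import "fmt"

// cacheStats stands in for the cacheHits / cacheMisses Prometheus
// counters declared in gateway/metrics.go.
type cacheStats struct {
	hits, misses int
}

// lookup shows the corrected instrumentation: the miss branch must
// increment the miss counter, not the hit counter.
func lookup(cache map[string]string, key string, stats *cacheStats) (string, bool) {
	if v, ok := cache[key]; ok {
		stats.hits++ // real code: cacheHits.WithLabelValues(routePath).Inc()
		return v, true
	}
	stats.misses++ // real code: cacheMisses.WithLabelValues(routePath).Inc()
	return "", false
}

func main() {
	cache := map[string]string{"a": "1"}
	var stats cacheStats
	lookup(cache, "a", &stats) // hit
	lookup(cache, "b", &stats) // miss
	fmt.Println(stats.hits, stats.misses)
}
```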


Successfully merging this pull request may close these issues.

Add Prometheus Metrics Endpoint for Observability
