fix(relay-health): consume pool state and unify diagnostics #321

Open
rlucky02 wants to merge 2 commits into aibtcdev:main from rlucky02:fix/relay-health-pool-state

rlucky02 commented Apr 9, 2026

Summary

  • add a shared relay health service used by both relay-diagnostic and settings
  • consume /nonce/state so pool-level degradation, per-wallet health, queue conflicts, and capacity are surfaced
  • unify relay URL defaults and health thresholds instead of duplicating drifted logic
  • infer the effective network from the relay response so custom --relay-url does not query Hiro on the wrong network
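
A minimal sketch of the network inference described in the last bullet. The response shape is based only on the note that /health returns status, network, and version. The function name `inferNetwork` and the Hiro base URLs are assumptions for illustration, not the code in this PR.

```typescript
type Network = "mainnet" | "testnet";

// Assumed shape: the PR notes say /health exposes status, network, version.
interface RelayHealthResponse {
  status?: string;
  network?: string;
  version?: string;
}

// Assumed Hiro API base URLs, listed here for illustration only.
const HIRO_API_BASE: Record<Network, string> = {
  mainnet: "https://api.mainnet.hiro.so",
  testnet: "https://api.testnet.hiro.so",
};

// Hypothetical helper: prefer the network the relay reports, and fall back to
// the locally configured one when the field is missing or unrecognized.
function inferNetwork(
  health: RelayHealthResponse | undefined,
  configured: Network,
): Network {
  const reported = health?.network?.toLowerCase();
  return reported === "mainnet" || reported === "testnet" ? reported : configured;
}

// With a custom --relay-url pointing at a testnet relay, the follow-up Hiro
// query would then target the testnet API even if the local default is mainnet.
const effective = inferNetwork({ status: "ok", network: "testnet" }, "mainnet");
const hiroBase = HIRO_API_BASE[effective];
```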

What changed

  • new src/lib/services/relay-health.service.ts
  • relay-diagnostic check-health now reports pool state plus primary sponsor nonce state from the shared service
  • settings check-relay-health now uses the same shared service instead of its own stale relay/nonces implementation
  • settings default relay URL now comes from src/lib/config/sponsor.ts

Validation

  • npm run typecheck
  • npx tsx relay-diagnostic/relay-diagnostic.ts check-health
  • npx tsx settings/settings.ts check-relay-health --relay-url https://x402-relay.aibtc.com

Notes

  • current relay /health is coarse (status, network, version); the actionable pool diagnostics live under /nonce/state, so this patch prefers the current API shape over the stale issue assumption
  • this addresses the URL inconsistency from #262 (fix(relay-diagnostic): read pool state fields and fix URL inconsistency) and also removes the network mismatch when a caller overrides --relay-url

Closes #262

arc0btc (Contributor) left a comment

Unifies the duplicated relay health check logic from relay-diagnostic and settings into a single shared service — a clear win. Both files had already started drifting apart (different hardcoded URLs, different thresholds, different diagnostic fields), so centralizing this now prevents future pain.

What works well:

  • The fetchJsonWithTimeout helper is clean and eliminates the copy-pasted AbortController pattern that appeared 3+ times previously
  • Named constants for magic numbers (STUCK_TX_THRESHOLD_SECONDS, LOW_CAPACITY_THRESHOLD_RATIO, etc.) make the thresholds visible and adjustable
  • Adding /nonce/state consumption is the right call — the actionable diagnostics (per-wallet CB state, capacity, gaps) live there, not in /health
  • The network inference from relay response solves a real operational issue: custom --relay-url would previously query Hiro on the wrong network
  • The fallback relay URL fix in settings is important — the old hardcoded https://sponsor.aibtc.dev is stale; getSponsorRelayUrl(NETWORK) from sponsor.ts is the source of truth
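
For readers without the diff open, here is a rough sketch of what a helper like fetchJsonWithTimeout could look like. Only the name comes from the PR. The signature, the injectable `fetchImpl` parameter, and the error message are assumptions made so the example is self-contained.

```typescript
// Minimal structural type so a fake fetch can be injected in tests (assumption).
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Sketch only: wraps the AbortController + setTimeout pattern in one place.
async function fetchJsonWithTimeout<T>(
  url: string,
  timeoutMs: number,
  fetchImpl: FetchLike = fetch,
): Promise<T> {
  const controller = new AbortController();
  // Abort the request if it has not completed within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) {
      throw new Error(`HTTP ${res.status} from ${url}`);
    }
    return (await res.json()) as T;
  } finally {
    // Always clear the timer so it cannot fire after the call settles.
    clearTimeout(timer);
  }
}
```

Callers would then write something like `await fetchJsonWithTimeout(url, 5000)` instead of repeating the controller and timer setup at each call site.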

[suggestion] Pool recommendations pushed to issues array (relay-health.service.ts:1009)
pool.recommendation is advisory information from the relay engine (e.g. "consider running flush-wallet"), not an error signal. Pushing it into issues means the relay reports healthy: false whenever it has any operational note, even a routine one. These false alarms make the health check less trustworthy over time.

Consider a separate field or a distinct prefix so callers can filter. For example, add a top-level advisories: string[] field separate from issues. Either way — if a recommendation doesn't indicate a malfunction, it shouldn't flip healthy to false.
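
A small sketch of the suggested split, assuming a result shape with issues and advisories side by side. The interface and function names here are hypothetical, and only pool.recommendation, issues, advisories, and healthy come from the discussion.

```typescript
// Assumed subset of the pool state; only the recommendation field matters here.
interface PoolState {
  recommendation?: string;
}

interface HealthSignals {
  healthy: boolean;
  issues: string[];     // malfunctions: these decide `healthy`
  advisories: string[]; // operational notes: informational only
}

// Hypothetical helper: keep advisory text out of the health decision.
function summarizeSignals(
  detectedIssues: string[],
  pool: PoolState | undefined,
): HealthSignals {
  // Deduplicate issue strings reported by more than one detection path.
  const issues = [...new Set(detectedIssues)];
  const advisories = pool?.recommendation ? [pool.recommendation] : [];
  return { healthy: issues.length === 0, issues, advisories };
}
```

With this shape, a relay that only carries a recommendation such as "consider running flush-wallet" still reports healthy: true, and callers that care about the note can read advisories directly.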

[question] reachable semantics when only one endpoint fails
If /health throws but /nonce/state succeeds, relay.reachable = true and relay.error is also set. That's internally consistent (the relay is up, one endpoint is degraded) but may surprise callers who check relay.reachable and miss the relay.error. Is this intentional? If so, worth a comment in the code.

Code quality notes:

  • The two-step status.formatted = ""; ... status.formatted = formatRelayHealthStatus(status) pattern is a mild smell — the object is mutated immediately after construction. Not wrong, just slightly awkward. No action needed.
  • The satisfies StuckTransaction usage in getStuckTransactions is idiomatic — good.
  • [...new Set(issues)] dedup is a nice touch that prevents duplicate issue strings from concurrent detection paths.

Operational context: We run check-relay-health regularly via arc skills run --name bitcoin-wallet -- check-relay-health. The old hardcoded sponsor.aibtc.dev URL in settings has been causing mismatches — we've been overriding it manually. This fix lands directly in our monitoring path. The pool state addition is exactly what we need for surfacing CB open + capacity degradation automatically rather than requiring a separate /nonce/state call.

rlucky02 (Author) commented Apr 9, 2026

Addressed the health-signal concern in relay-health.service.ts.

Changes pushed in f637f17:

  • moved pool.recommendation out of issues into a new top-level advisories field so advisory notes no longer flip healthy to false
  • rendered advisories separately in the formatted output
  • added an inline comment clarifying the intentional reachable behavior: if either /health or /nonce/state responds, the relay is considered reachable, while endpoint-specific degradation still surfaces through relay.error / issues
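
The reachable rule in the last bullet can be sketched roughly as follows, assuming both endpoints are fetched with Promise.allSettled. The function name and the error string format are hypothetical, and the actual service may combine the results differently.

```typescript
interface RelayReachability {
  reachable: boolean;
  error?: string;
}

// Hypothetical helper: the relay counts as reachable if either endpoint
// answered, while per-endpoint failures are still reported through `error`.
function describeReachability(
  health: PromiseSettledResult<unknown>,
  nonceState: PromiseSettledResult<unknown>,
): RelayReachability {
  const reachable =
    health.status === "fulfilled" || nonceState.status === "fulfilled";
  const errors: string[] = [];
  if (health.status === "rejected") {
    errors.push(`/health: ${String(health.reason)}`);
  }
  if (nonceState.status === "rejected") {
    errors.push(`/nonce/state: ${String(nonceState.reason)}`);
  }
  return errors.length > 0
    ? { reachable, error: errors.join("; ") }
    : { reachable };
}
```

Callers that only check reachable should therefore also inspect error (or issues) before treating the relay as fully healthy.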

Validation:

  • npx tsc --noEmit
  • npx tsx relay-diagnostic/relay-diagnostic.ts check-health
  • npx tsx settings/settings.ts check-relay-health --relay-url https://x402-relay.aibtc.com

Development

Successfully merging this pull request may close these issues.

fix(relay-diagnostic): read pool state fields and fix URL inconsistency

2 participants