fix(relay-health): consume pool state and unify diagnostics#321
fix(relay-health): consume pool state and unify diagnostics#321rlucky02 wants to merge 2 commits intoaibtcdev:mainfrom
Conversation
arc0btc
left a comment
There was a problem hiding this comment.
Unifies the duplicated relay health check logic from relay-diagnostic and settings into a single shared service — a clear win. Both files had already started drifting apart (different hardcoded URLs, different thresholds, different diagnostic fields), so centralizing this now prevents future pain.
What works well:
- The
fetchJsonWithTimeouthelper is clean and eliminates the copy-pasted AbortController pattern that appeared 3+ times previously - Named constants for magic numbers (
STUCK_TX_THRESHOLD_SECONDS,LOW_CAPACITY_THRESHOLD_RATIO, etc.) make the thresholds visible and adjustable - Adding
/nonce/stateconsumption is the right call — the actionable diagnostics (per-wallet CB state, capacity, gaps) live there, not in/health - The network inference from relay response solves a real operational issue: custom
--relay-urlwould previously query Hiro on the wrong network - The fallback relay URL fix in
settingsis important — the old hardcodedhttps://sponsor.aibtc.devis stale;getSponsorRelayUrl(NETWORK)fromsponsor.tsis the source of truth
[suggestion] Pool recommendations pushed to issues array (relay-health.service.ts:1009)
pool.recommendation is advisory info from the relay engine (e.g. "consider running flush-wallet"), not an error signal. Pushing it into issues means the relay will always appear healthy: false whenever the relay has any operational note — even routine ones. This creates false negatives that make the health check less trustworthy over time.
Consider a separate field or a distinct prefix so callers can filter. For example, add a top-level advisories: string[] field separate from issues. Either way — if a recommendation doesn't indicate a malfunction, it shouldn't flip healthy to false.
[question] reachable semantics when only one endpoint fails
If /health throws but /nonce/state succeeds, relay.reachable = true and relay.error is also set. That's internally consistent (the relay is up, one endpoint is degraded) but may surprise callers who check relay.reachable and miss the relay.error. Is this intentional? If so, worth a comment in the code.
Code quality notes:
- The two-step
status.formatted = ""; ... status.formatted = formatRelayHealthStatus(status)pattern is a mild smell — the object is mutated immediately after construction. Not wrong, just slightly awkward. No action needed. - The
satisfies StuckTransactionusage ingetStuckTransactionsis idiomatic — good. [...new Set(issues)]dedup is a nice touch that prevents duplicate issue strings from concurrent detection paths.
Operational context: We run check-relay-health regularly via arc skills run --name bitcoin-wallet -- check-relay-health. The old hardcoded sponsor.aibtc.dev URL in settings has been causing mismatches — we've been overriding it manually. This fix lands directly in our monitoring path. The pool state addition is exactly what we need for surfacing CB open + capacity degradation automatically rather than requiring a separate /nonce/state call.
|
Addressed the health-signal concern in Changes pushed in
Validation:
|
Summary
relay-diagnosticandsettings/nonce/stateso pool-level degradation, per-wallet health, queue conflicts, and capacity are surfaced--relay-urldoes not query Hiro on the wrong networkWhat changed
src/lib/services/relay-health.service.tsrelay-diagnostic check-healthnow reports pool state plus primary sponsor nonce state from the shared servicesettings check-relay-healthnow uses the same shared service instead of its own stale relay/nonces implementationsettingsdefault relay URL now comes fromsrc/lib/config/sponsor.tsValidation
npm run typechecknpx tsx relay-diagnostic/relay-diagnostic.ts check-healthnpx tsx settings/settings.ts check-relay-health --relay-url https://x402-relay.aibtc.comNotes
/healthis coarse (status,network,version); the actionable pool diagnostics live under/nonce/state, so this patch prefers the current API shape over the stale issue assumption--relay-urlCloses #262