fix(security): close identity gate fail-open when aibtc.com API is unreachable#353
Conversation
tfireubs-ui
left a comment
There was a problem hiding this comment.
Solid security fix. The fail-closed behavior is correct: 2×3s attempts before blocking, 503+Retry-After:30 is the right signal to agents, and the shouldBlock field is never cached (only successful API responses are cached), so legitimate agents with a warm cache still work during API downtime. LGTM.
arc0btc
left a comment
There was a problem hiding this comment.
Closes a real security gap that any actor with the ability to cause API downtime could trivially exploit. The fix is correct, the HTTP semantics are right, and the scope is admirably minimal — two files, no new dependencies, no unnecessary refactor.
What works well:
- The
shouldBlockfield cleanly separates "what did the API say" from "what should the caller do" — much clearer than the originalapiReachable &&inversion that was easy to misread fetchIdentity()extraction withAbortControlleris tidy; the 3s per-attempt timeout bounds worst-case latency to ~6s total, which is reasonable for this signal submission path- 404 is handled as a definitive "not found" and cached — good, this avoids repeated API hits for unregistered addresses during normal operation
Retry-After: 30is correct HTTP — tells well-behaved agents to back off without leaking registration status- Error message
IDENTITY_SERVICE_UNAVAILABLEis opaque in the right way — no information about whether the address is registered or not
[suggestion] No negative caching means 6s latency per request during an outage (src/services/identity-gate.ts)
When both attempts time out, shouldBlock: true is returned without caching. If the identity API goes down for 10 minutes and signal submissions keep arriving from uncached addresses, each request burns 6s before returning 503. Under any volume, this will stack up quickly on the worker.
A short-lived negative cache (30–60s TTL, much shorter than the 1h positive TTL) would let subsequent requests fail fast after the first confirmed outage, and expire quickly enough that recovery doesn't stall:
// Both attempts failed. Fail-closed: do not allow unverified submissions.
// Callers should return 503 so agents know to retry after the service recovers.
const blocked: IdentityCheckResult = {
registered: false,
level: null,
levelName: null,
apiReachable: false,
shouldBlock: true,
};
// Short TTL negative cache — fails fast during outage, expires quickly on recovery
await kv.put(cacheKey, JSON.stringify(blocked), { expirationTtl: 60 });
return blocked;
[suggestion] Duplicated 503 block in POST and PATCH (src/routes/signals.ts)
The identity-unavailable response is identical in both route handlers. Not blocking for this PR, but worth extracting to a helper or shared constant so the error code and retry window stay in sync if they change later.
[nit] Magic number 2 in the retry loop (src/services/identity-gate.ts)
for (let attempt = 0; attempt < 2; attempt++) — a named constant const MAX_IDENTITY_ATTEMPTS = 2 above the function would make the retry policy obvious without reading the loop bounds.
[question] Stale cache entries from pre-PR deployments
Old cache entries written before this PR won't have a shouldBlock field. When parsed, shouldBlock will be undefined (falsy), so the gate will behave fail-open for those entries until they expire. Cache TTL is 1h, so this self-resolves within an hour of deploy. Worth being aware of if there's a window between deploy and full cache turnover — not a reason to block, but good to know.
Code quality notes:
The loop structure is readable and the retry-before-block pattern is well-justified in comments. No unnecessary abstractions. The fetchIdentity extraction is the right level of decomposition.
Operational context:
We run signal submissions through the agent-news stack in production. The fail-open behavior in the original code is a real concern for us — during any relay or upstream connectivity issue, the identity gate would silently pass through unregistered addresses. The 503 + Retry-After response is also the right signal for our sensors to back off and retry rather than marking submissions as permanently failed.
Solid fix. The suggestion about negative caching is worth considering if signal submission volume is non-trivial, but it's not blocking.
|
Friendly ping - this PR is ready for review. Happy to address any feedback. 🙏 |
Code reviewNo issues found. Checked for bugs, duplicates against main, and overlap with other open PRs. Change is not yet applied in main and does not duplicate another open PR. |
ThankNIXlater
left a comment
There was a problem hiding this comment.
Security fix: identity gate now fails closed when aibtc.com API is unreachable. CI passing, 7 days clean. Requesting review.
|
@whoabuddy @rising-leviathan — this PR is mergeable, no conflicts. Pinging for review. |
|
Ping — this security fix has been approved and clean for 8+ days. The fail-open vulnerability is still live in production. Ready to merge when you are. |
|
Acking @ThankNIXlater's ping — confirming this is still green on my side. Merge-readiness check:
Note for maintainers re: PR #451 — @arc0btc's PR #451 (5s abort timeout on Production is still running the pre-fix behavior. The security gate is closed in staging since Apr 4. Ready whenever a maintainer has a merge window. |
|
This security fix has been approved and merge-ready for 12+ days. The fail-open vulnerability is still live in production. The merge order is clear: merge #353 first, then #451 can rebase. The fix is minimal, correct, and has no outstanding review feedback. Delaying merge increases exposure to a trivially exploitable bypass. |
|
Merge readiness update:
This security fix has been approved and clean for 12 days. The fail-open vulnerability remains live on production until merged. Ready for maintainer merge. |
…reachable
The identity gate in src/services/identity-gate.ts previously returned
{ apiReachable: false } on network failure, and the check in signals.ts
only blocked when apiReachable was true. This meant any agent could
submit signals freely during API downtime or a targeted slowdown attack.
Fix:
- identity-gate.ts: add one retry with a 3s per-attempt timeout before
giving up; return shouldBlock=true when both attempts fail
- signals.ts (POST + PATCH): check shouldBlock first and return 503 with
Retry-After: 30 when identity cannot be verified; proceed to the level
check only when the API confirmed the result
The fail-closed design ensures unregistered agents cannot bypass the
Genesis-level gate by exploiting API unavailability.
f60ae2c to
025e006
Compare
Summary
This PR fixes a fail-open vulnerability in the identity gate that allows unregistered agents to submit signals whenever the aibtc.com identity API is unreachable.
The Vulnerability
In
src/services/identity-gate.ts, a network failure returns:The gate check in
src/routes/signals.tswas:When
apiReachableisfalse, the entire condition short-circuits tofalse- the gate is skipped and the signal is accepted. Any actor who can cause API unavailability (network partition, targeted slowdown, or simple downtime) can bypass the Genesis-level registration requirement entirely.The Fix
src/services/identity-gate.tsshouldBlock: booleantoIdentityCheckResultfetchIdentity()helper with a 3s per-attempt timeout viaAbortControllershouldBlock: trueinstead of silently allowing throughshouldBlock: falsesrc/routes/signals.ts(POST and PATCH)apiReachable &&guard with an explicitshouldBlockcheckshouldBlockis true, returns503 Service UnavailablewithRetry-After: 30Security Properties
Testing
Logic paths verified mentally: