Skip to content

docs(solutions): signed webhook ingress hardening patterns#699

Merged
marcusrbrown merged 2 commits into
mainfrom
docs/webhook-security-learnings
May 30, 2026
Merged

docs(solutions): signed webhook ingress hardening patterns#699
marcusrbrown merged 2 commits into
mainfrom
docs/webhook-security-learnings

Conversation

@marcusrbrown
Copy link
Copy Markdown
Collaborator

Captures the reusable security patterns behind the announce webhook (#697) as a best-practices solution doc, so the next signed-webhook surface starts from the hardened shape instead of rediscovering it.

Patterns documented:

  • Raw-bytes HMAC signing (sign exact received bytes + timestamp prefix, never re-serialized JSON)
  • Signed-timestamp replay window with exact-string header/body cross-check
  • Identical 401 for every auth failure (no oracle for which check failed)
  • Atomic reserve-before-await / commit-after-success / release-on-failure replay cache
  • Streaming body-size cap (reject during read, not after buffering)
  • Socket-keyed rate limiting with a bounded key map (not the spoofable X-Forwarded-For)
  • Untrusted payload text confined to the embed body with mentions disabled
  • Bounded outbound calls so a hung downstream can't pin a reserved key

Cross-linked with the existing slash-command orchestration doc as the two halves of the gateway's inbound trust boundary.

Documents the reusable security patterns from the announce webhook: raw-bytes
HMAC signing, signed-timestamp replay window, no-oracle 401, atomic
reserve/commit/release replay handling, streaming body-size cap, socket-keyed
rate limiting, untrusted-payload mention hardening, and bounded outbound
calls.
Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: PASS

Documentation-only PR. It captures the security patterns behind the POST /v1/announce webhook (#697) as a best-practices solution doc. I verified every code-bearing claim against the actual implementation in packages/gateway/src/http/ and packages/gateway/src/discord/presence.ts — the doc is accurate, and the snippets faithfully reflect shipped, already-reviewed code. No source, schema, public API, or test behavior changes in this PR, so the correctness/security/breaking-change/coverage surface is the prose itself.

Spot-checks that matched the implementation:

  • Raw-bytes HMAC over timestampHeader + "." + rawBody with timingSafeEqualhmac.ts:31,51.
  • Identical 401 via shared UNAUTHORIZED_BODY for hmac/timestamp/replay — announce-handler.ts:81,135,143,151.
  • Atomic synchronous reserve() before the await; commit after success; release on every post-reserve early return — replay-cache.ts:104-114, announce-handler.ts:149,160,173,182,195,200.
  • Streaming bodyLimit ahead of the handler + content-length fast path + post-buffer byteLength guard — server.ts:77-98, announce-handler.ts:107.
  • Socket-keyed rate limit (getConnInfo, not XFF) with bounded MAX_KEYS fail-closed — server.ts:108-109, rate-limit.ts:20,68.
  • allowedMentions: {parse: []} on every send + non-empty rendered_text fallback — presence.ts:89-94, and Promise.race timeout with no-op .catchpresence.ts:104-118.

Blocking issues

None.

Non-blocking concerns

  • §6 omits the absent-socket-address fallback. The doc shows connInfo.remote.address ?? undefined, but the handler maps an absent key to a shared constant bucket DEFAULT_SOURCE_KEY = '__unknown__' (announce-handler.ts:48,104). That means all keyless requests share one rate-limit bucket — a deliberate and reasonable fail-closed choice, but a reader copying only the doc's snippet won't reproduce it. Worth one sentence so the pattern is transferable.
  • §4 / replay-cache.ts reserved-entry eviction caveat is not mentioned. Reserved entries are never time-evicted (replay-cache.ts:69-92); the safety net is process restart, since §8's outbound timeout guarantees release() runs. The doc's §8 implicitly covers this, but the doc presents the reserve/commit/release invariant as self-sufficient. A one-line note that "no outbound timeout ⇒ a reserved key leaks until restart" would tie §4 and §8 together explicitly and reflect the actual single-replica/in-memory constraint documented in replay-cache.ts:16-19.
  • Single-replica constraint not surfaced in the doc. replay-cache.ts:16-19 and rate-limit.ts are in-memory and per-process; multi-replica deployments need a shared store or single replica. Since this is a reusable pattern doc, calling out that the replay/rate-limit guarantees are per-process would prevent a future surface from inheriting the patterns and silently losing them behind a load balancer.

Missing tests

None. No behavior changes; the underlying modules already carry tests (hmac.test.ts, replay-cache.test.ts, rate-limit.test.ts, announce-handler.test.ts, server.test.ts, presence/templates tests).

Risk assessment (LOW/MED/HIGH): LOW

Docs-only, additive (+245/-0, one new file), no executable paths touched. Zero regression or security blast radius. The non-blocking notes are about completeness/transferability of the prose, not correctness.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 26672824020
Cache hit
Session ses_1892481baffeA9eshaf9VSupz5

@fro-bot fro-bot added the agent: working Agent is currently working on this label May 30, 2026
Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: PASS

Re-review after commit ed30162 (merge of main into the branch). The PR's net diff is still exactly one added file — docs/solutions/best-practices/signed-webhook-ingress-hardening-2026-05-29.md (+245/-0) — and the merge introduced no changes to the gateway HTTP implementation the doc describes (packages/gateway/src/http/ is untouched on main since my prior pass). My earlier verification that every code-bearing claim matches the shipped code therefore still holds; nothing reviewable changed. Approval stands.

Blocking issues

None.

Non-blocking concerns

Carried forward from the prior review (all optional doc-completeness notes, none affecting correctness):

  • §6 omits the absent-socket-address fallback. The snippet shows connInfo.remote.address ?? undefined, but the handler maps an absent key to a shared bucket DEFAULT_SOURCE_KEY = '__unknown__' (announce-handler.ts:48,104). All keyless requests share one rate-limit bucket — a deliberate fail-closed choice a reader copying only the snippet won't reproduce. One sentence would make the pattern transferable.
  • §4↔§8 reserved-entry leak link is implicit. Reserved entries are never time-evicted (replay-cache.ts:69-92); the safety net is process restart, guaranteed only because §8's outbound timeout forces release(). A one-line "no outbound timeout ⇒ reserved key leaks until restart" would tie the two sections together explicitly.
  • Single-replica constraint not surfaced. The replay cache and rate limiter are in-memory/per-process (replay-cache.ts:16-19); a multi-replica deployment silently loses both guarantees. Worth stating in a reusable pattern doc.

Missing tests

None. No behavior changes; the underlying modules already carry tests (hmac.test.ts, replay-cache.test.ts, rate-limit.test.ts, announce-handler.test.ts, server.test.ts, plus presence/templates).

Risk assessment (LOW/MED/HIGH): LOW

Docs-only, additive, one new file; the only new commit is a no-op main merge that touches no executable paths. Zero regression or security blast radius.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 26685069545
Cache hit
Session ses_1892481baffeA9eshaf9VSupz5

@fro-bot fro-bot removed the agent: working Agent is currently working on this label May 30, 2026
@marcusrbrown marcusrbrown merged commit 52d1ced into main May 30, 2026
10 checks passed
@marcusrbrown marcusrbrown deleted the docs/webhook-security-learnings branch May 30, 2026 13:33
@fro-bot fro-bot mentioned this pull request May 30, 2026
46 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants