feat(gateway): signed announce webhook for control-plane presence messages#697
Conversation
Adds hono@4.12.23 + @hono/node-server@1.19.14 (matching apps/workspace-agent's pinned versions) so the gateway can host the announce webhook HTTP server. Includes the implementation plan.
…e posting
Building blocks for the POST /v1/announce control-plane presence webhook:
- Config: GATEWAY_WEBHOOK_SECRET, GATEWAY_PRESENCE_CHANNEL_ID, GATEWAY_HTTP_PORT
- HMAC-SHA256 verification over timestamp + '.' + rawBody (Stripe-style), with
constant-time compare, length guards, and a replay-window check
- Effect Schema payload validation (two v1 event types; unknown rejected) with
content-free error reasons
- Presence channel posting helper that resolves a channel by ID and sends an
embed with allowedMentions:{parse:[]}
Maps each announce event_type to a Discord embed with an accent color and in-character text. Honors a non-null rendered_text override verbatim (v1 emits null). Includes reserved color stubs for the not-yet-emitted fast-follower event types.
Adds the POST /v1/announce Hono server and the request pipeline: 8 KB size cap, per-source rate limiting, HMAC verification, replay-window check, an in-memory replay cache, exact-string timestamp cross-check, schema decode, embed render, and presence post. Authentication failures return an identical 401 so callers cannot tell which check failed; the replay signature is recorded only after a successful Discord post so retries after a post failure still succeed. Reject logging records the reason only, never the body.
Starts the announce HTTP server during gateway boot and closes it in the shutdown drain alongside the Discord client; a server-close failure is logged without masking client teardown. New announce requests are refused with 503 while the gateway is draining for shutdown.
…ate-limit spoofing - Replace the replay check/record pair with an atomic reserve/commit/release so two concurrent requests carrying the same signature can no longer both post; the reservation is released on every failure path so a legitimate retry after a failed post still succeeds. - Key rate limiting on the TCP socket remote address instead of the caller-supplied X-Forwarded-For header, and bound the limiter to a maximum number of tracked keys to prevent unbounded pre-auth memory growth. - Truncate embed descriptions to Discord's 4096-character limit so an over-long payload cannot wedge the control plane into endless retries. - Drop an unreachable parse-error classification branch, share the body-size constant, and make the payload schema value module-private.
Updates the gateway agent notes for the new src/http/ announce webhook: the fail-closed request pipeline, the generic-401 no-oracle auth behavior, the presence posting helper, the shutdown drain that closes the HTTP server, and the GATEWAY_ webhook config vars. Marks Effect Schema as now in use.
fro-bot
left a comment
There was a problem hiding this comment.
Verdict: CONDITIONAL
Solid, security-conscious implementation. The fail-closed pipeline ordering, identical-401 auth oracle suppression, atomic reserve/commit replay handling, socket-keyed rate limiting, and mandatory allowedMentions:{parse:[]} are all done correctly and well-tested (423 passing). Two issues should be addressed before merge; both are narrow but real.
Blocking issues
- Unbounded body buffering before the 8 KB size check (
server.ts:88-90). The handler'sANNOUNCE_MAX_BODY_BYTESguard runs onrawBody.byteLength— afterawait c.req.arrayBuffer()has already buffered the entire request into memory. The content-length pre-check (server.ts:79-86) is the only thing gating allocation, and it is trivially bypassed by omitting or understating theContent-Lengthheader (chunked transfer encoding, or a lying client). An unauthenticated caller can therefore force the gateway to allocate arbitrarily large buffers before any rejection — the rate limiter (60/min/IP) bounds frequency but not per-request size. This contradicts the PR description's claim that "Request bodies are capped at 8 KB before any HMAC work": the cap is before HMAC, but not before full-body buffering. Add a hard streaming body limit (e.g. Hono'sbodyLimitmiddleware, or enforcemaxRequestBodySizeon the node server) so oversized requests are rejected during read, not after.
Non-blocking concerns
-
Empty-string
rendered_textyields an empty embed (announce-schema.ts:34,47+templates.ts:77-78). The schema acceptsrendered_text: ""(Schema.NullOr(Schema.String)), andrenderEmbeduses any non-null value verbatim. An empty description produces an embed with no renderable content; Discord's API rejects such embeds, so the post fails and the handler returns 500 (and releases the reservation, so retries also fail). Exploitability is low because the body is HMAC-signed by a trusted control plane, but it is a correctness gap with no test coverage. Treat empty/whitespace-onlyrendered_textas "fall back to the templated description." -
Reserved replay-cache entries are never time-evicted (
replay-cache.ts:69-92).evictExpireddeliberately skipsRESERVEDentries, relying onrelease()/commit()always running. The handler does callrelease()on every post-reserve early return, but ifpostPresenceEmbednever settles (e.g. a hung Discord fetch with no timeout —presence.ts:56has no timeout onchannels.fetch), the reservation leaks for the process lifetime and permanently blocks that exact signature's retries. Consider a reservation timestamp + safety TTL, or a timeout around the Discord post. -
unknown_event_typeclassification is fragile but harmless (announce-schema.ts:82-89). Requiring every issue path to start withevent_typeis brittle against Union error shapes, but since the handler maps bothunknown_event_typeandbad_requestto the same 400 body (announce-handler.ts:180-183), the only consequence is a log label. Fine as-is. -
evictExpiredis O(n) per request in both rate-limit and replay caches. Bounded byMAX_KEYS=10_000, so acceptable at v1 scale; flagging for awareness if traffic grows.
Missing tests
- No test for
rendered_text: ""(empty string) throughrenderEmbedor end-to-end through the handler (the empty-embed → Discord-reject path). - No test for an oversized body sent without a truthful
Content-Length(the streaming-buffer bypass).server.test.tscovers the 413 content-length precheck but not the header-omitted case. - No test for a leaked/stuck reservation (hung Discord post) blocking subsequent retries.
Risk assessment (MED)
- Likelihood of regression: LOW — new isolated
http/module, no changes to existing command/runtime paths beyond additive lifecycle wiring inprogram.ts/shutdown.ts, which are covered. - Security exposure: MED — the body-buffering gap is an unauthenticated memory-pressure DoS vector reachable before any auth check; rate limiting mitigates but does not close it. All other auth/replay/mention controls are sound.
- Blast radius: LOW — single new endpoint, single-replica constraint documented, no public API breakage. The Discord token is already held by the gateway, so the new ingress does not widen the credential surface.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/agent |
| Run ID | 26671488999 |
| Cache | hit |
| Session | ses_1895e896affe6dP28TjXboKU2r |
…the Discord post - Enforce the 8 KB announce body cap with streaming bodyLimit middleware so an oversized request is rejected while the body is read, not after the whole payload has been buffered into memory. The earlier content-length precheck was bypassable with chunked transfer encoding, leaving an unauthenticated memory-pressure path; the streaming limit closes it. - Bound the Discord fetch-and-post with a timeout so a hung call can no longer leave a replay reservation pinned for the process lifetime, which would have permanently blocked retries for that signature. - Fall back to the templated description when rendered_text is empty or whitespace-only, avoiding an empty embed that Discord would reject.
fro-bot
left a comment
There was a problem hiding this comment.
Verdict: PASS
All blocking and non-blocking issues from my prior review (commit 50f4456) are resolved with correct fixes and dedicated test coverage. The implementation is sound, security-conscious, and well-tested.
Blocking issues
None — the previously-blocking memory-DoS gap is fixed.
- Body buffering before size check → fixed.
bodyLimitmiddleware (server.ts:77-82) now enforcesANNOUNCE_MAX_BODY_BYTESduring the streaming read, so a caller cannot bypass the cap via chunked transfer encoding or an omitted/understatedContent-Length. The content-length fast-path and the handler'sbyteLengthguard remain as defense-in-depth. Hono caches the body, so the laterc.req.arrayBuffer()returns the already-bounded bytes. Covered byserver.test.ts:412(oversized body with no truthful Content-Length → 413, Discord never called).
Non-blocking concerns
All previously-raised concerns are now addressed:
- Empty
rendered_text→ fixed.renderEmbednow falls through to the per-event template whenrendered_textis null OR trims to empty (templates.ts:80), so Discord never receives an empty embed description. Covered bytemplates.test.ts:147(empty string) and:162(whitespace-only), with a:177regression guard that non-empty text is still used verbatim. - Leaked replay reservation on a hung Discord call → fixed.
postPresenceEmbednow races the fetch+send against a 10 s timeout (presence.ts:104-118), returningsend-failedso the handler always reachesreplayCache.release(). The dangling-rejection is swallowed with a no-op.catch, andclearTimeoutruns on the win path. Covered bypresence.test.ts:200(fetch never resolves → "discord post timed out"). unknown_event_typeclassification fragility — remains harmless (both branches map to the same 400 body); no change needed.evictExpiredO(n) per request — bounded byMAX_KEYS=10_000; acceptable at v1 scale.
One minor forward-looking note (non-blocking, no action required): the timeout in postPresenceEmbed bounds the reservation leak but the timed-out Discord op may still complete in the background after release() — a rare double-post window if the control plane retries the same signature within the window. Acceptable for v1 given the trusted, HMAC-signed caller and single-replica constraint.
Missing tests
None — the three gaps I flagged previously now have explicit coverage (413 streaming-bypass, empty/whitespace rendered_text, hung-Discord timeout).
Risk assessment (LOW)
- Likelihood of regression: LOW — isolated new
http/module; existing command/runtime paths untouched beyond covered additive lifecycle wiring. - Security exposure: LOW — the pre-auth memory-DoS vector is closed at the streaming boundary; auth, replay, rate-limit, and mention controls are all sound and tested.
- Blast radius: LOW — single new endpoint, single-replica constraint documented, no public API breakage, no widening of the credential surface.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/agent |
| Run ID | 26671760842 |
| Cache | hit |
| Session | ses_1895e896affe6dP28TjXboKU2r |
Closes #671.
Adds
POST /v1/announceto the gateway — the first HTTP ingress — so the control plane can post presence messages in Discord as the Fro Bot user (surveys completing, collaboration invitations accepted) rather than through a webhook bot. The gateway already holds the Discord token and a live client, so it turns a signed request into a Discord embed.Authentication
HMAC-SHA256 over
timestamp + "." + rawBody(Stripe-style) with a shared secret, constant-time compared. The signature binds the timestamp, and theX-Gateway-Timestampheader must equal the bodyfired_atby exact-string match. Requests outside a ±5 minute window are rejected. Signature verification, timestamp expiry, and replay all return an identical 401 so a caller cannot tell which check failed.Replay and abuse protection
An in-memory seen-signature cache reserves a signature before posting and commits it only after a successful post — so concurrent duplicates can't both post, and a failed post still allows a legitimate retry. Rate limiting is keyed on the TCP socket address (not the spoofable
X-Forwarded-For) and the tracked-key set is bounded. Request bodies are capped at 8 KB before any HMAC work.Posting
The payload is validated with a versioned schema (two v1 event types; unknown types are rejected). The message is rendered into a Discord embed with an event-type accent color and posted to a fixed channel from
GATEWAY_PRESENCE_CHANNEL_ID(never from the payload), withallowedMentions: {parse: []}so payload text cannot trigger pings. Descriptions are truncated to Discord's limit. A non-nullrendered_textis honored verbatim for forward compatibility.Lifecycle
The HTTP server starts during gateway boot and closes in the shutdown drain alongside the Discord client; a server-close failure is logged without masking client teardown. New requests are refused with 503 while draining.
Configuration (new)
GATEWAY_WEBHOOK_SECRET(required),GATEWAY_PRESENCE_CHANNEL_ID(required),GATEWAY_HTTP_PORT(optional, default 3000).Gateway test suite: 423 passing.