feat(gateway): signed announce webhook for control-plane presence messages by marcusrbrown · Pull Request #697 · fro-bot/agent

marcusrbrown · 2026-05-30T02:04:07Z

Closes #671.

Adds POST /v1/announce to the gateway — the first HTTP ingress — so the control plane can post presence messages in Discord as the Fro Bot user (surveys completing, collaboration invitations accepted) rather than through a webhook bot. The gateway already holds the Discord token and a live client, so it turns a signed request into a Discord embed.

Authentication

HMAC-SHA256 over timestamp + "." + rawBody (Stripe-style) with a shared secret, constant-time compared. The signature binds the timestamp, and the X-Gateway-Timestamp header must equal the body fired_at by exact-string match. Requests outside a ±5 minute window are rejected. Signature verification, timestamp expiry, and replay all return an identical 401 so a caller cannot tell which check failed.
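
A minimal sketch of the signature check described above, assuming a hex-encoded signature and an ISO 8601 timestamp (the PR text specifies neither). The names signAnnounce, verifyAnnounce, and REPLAY_WINDOW_MS are hypothetical, not the gateway's actual identifiers, and the header-to-fired_at string comparison is left out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical constant: the ±5 minute window from the description.
const REPLAY_WINDOW_MS = 5 * 60 * 1000;

// HMAC-SHA256 over timestamp + "." + rawBody. Hex output is an assumption.
function signAnnounce(secret: string, timestamp: string, rawBody: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest("hex");
}

// Returns a single boolean so the caller can answer every failure with the
// same 401 and never reveal which check failed.
function verifyAnnounce(
  secret: string,
  timestamp: string,
  rawBody: string,
  signature: string,
  nowMs: number,
): boolean {
  const expected = Buffer.from(signAnnounce(secret, timestamp, rawBody), "utf8");
  const provided = Buffer.from(signature, "utf8");
  // timingSafeEqual throws on unequal lengths, so guard the length first.
  if (provided.length !== expected.length) return false;
  if (!timingSafeEqual(provided, expected)) return false;

  // Assumes the timestamp parses with Date.parse (e.g. ISO 8601).
  const firedAtMs = Date.parse(timestamp);
  if (Number.isNaN(firedAtMs)) return false;
  return Math.abs(nowMs - firedAtMs) <= REPLAY_WINDOW_MS;
}
```

Because the timestamp is part of the signed string, a caller cannot reuse an old signature with a fresh timestamp header.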

Replay and abuse protection

An in-memory seen-signature cache reserves a signature before posting and commits it only after a successful post — so concurrent duplicates can't both post, and a failed post still allows a legitimate retry. Rate limiting is keyed on the TCP socket address (not the spoofable X-Forwarded-For) and the tracked-key set is bounded. Request bodies are capped at 8 KB before any HMAC work.
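
The reserve-then-commit flow can be sketched as follows. This is an illustration under stated assumptions, not the gateway's replay-cache.ts: the class name, method signatures, and TTL handling are hypothetical. It relies on reserve being synchronous, so in single-threaded JavaScript no second request can slip in between the lookup and the write.

```typescript
// Hypothetical entry shape: reserved entries carry no expiry.
type ReplayEntry = { state: "reserved" } | { state: "committed"; expiresAtMs: number };

class ReplayCache {
  private readonly entries = new Map<string, ReplayEntry>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // True if the caller now owns the signature; false if it is in flight
  // or was already posted within the TTL.
  reserve(signature: string, nowMs: number): boolean {
    const existing = this.entries.get(signature);
    if (existing !== undefined) {
      if (existing.state === "reserved") return false;
      if (existing.expiresAtMs > nowMs) return false;
    }
    this.entries.set(signature, { state: "reserved" });
    return true;
  }

  // Call only after the Discord post succeeded.
  commit(signature: string, nowMs: number): void {
    if (this.entries.get(signature)?.state === "reserved") {
      this.entries.set(signature, { state: "committed", expiresAtMs: nowMs + this.ttlMs });
    }
  }

  // Call on every failure path so a legitimate retry can reserve again.
  release(signature: string): void {
    if (this.entries.get(signature)?.state === "reserved") {
      this.entries.delete(signature);
    }
  }
}
```

A handler would call reserve before posting, commit on success, and release in every early return after the reservation.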

Posting

The payload is validated with a versioned schema (two v1 event types; unknown types are rejected). The message is rendered into a Discord embed with an event-type accent color and posted to a fixed channel from GATEWAY_PRESENCE_CHANNEL_ID (never from the payload), with allowedMentions: {parse: []} so payload text cannot trigger pings. Descriptions are truncated to Discord's limit. A non-null rendered_text is honored verbatim for forward compatibility.
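
As a rough illustration of the posting rules above, the sketch below builds a message payload as a plain object. The event names, colors, and function names are hypothetical (the PR does not list them); the 4096-character description limit is the figure cited later in this PR. Length is counted in UTF-16 code units as a conservative assumption.

```typescript
const DISCORD_EMBED_DESCRIPTION_LIMIT = 4096;

// Hypothetical event types and accent colors, for illustration only.
const ACCENT_COLORS: Record<string, number> = {
  survey_completed: 0x57f287,
  invitation_accepted: 0x5865f2,
};

// Truncate to the limit, ending with an ellipsis and never splitting a
// surrogate pair in half.
function truncateDescription(text: string, limit: number = DISCORD_EMBED_DESCRIPTION_LIMIT): string {
  if (text.length <= limit) return text;
  let cut = limit - 1; // leave room for the ellipsis
  const last = text.charCodeAt(cut - 1);
  if (last >= 0xd800 && last <= 0xdbff) cut -= 1; // dangling high surrogate
  return text.slice(0, cut) + "…";
}

function buildPresenceMessage(eventType: string, description: string) {
  const color = ACCENT_COLORS[eventType];
  if (color === undefined) throw new Error("unknown event type");
  return {
    embeds: [{ description: truncateDescription(description), color }],
    // An empty parse list means no user, role, or everyone mention in the
    // payload text can ping anyone.
    allowedMentions: { parse: [] as string[] },
  };
}
```

The channel ID is deliberately absent from this function: it comes from configuration, never from the payload.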

Lifecycle

The HTTP server starts during gateway boot and closes in the shutdown drain alongside the Discord client; a server-close failure is logged without masking client teardown. New requests are refused with 503 while draining.

Configuration (new)

GATEWAY_WEBHOOK_SECRET (required), GATEWAY_PRESENCE_CHANNEL_ID (required), GATEWAY_HTTP_PORT (optional, default 3000).
The deploy wiring in the infra repo and the control-plane signing side are separate, follow-on work.
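
A minimal sketch of how these three variables might be read, assuming a plain environment record. The AnnounceConfig shape and loadAnnounceConfig are hypothetical names; the gateway's real config loader is not shown in this PR text.

```typescript
// Hypothetical config shape for the announce webhook.
interface AnnounceConfig {
  webhookSecret: string;
  presenceChannelId: string;
  httpPort: number;
}

function loadAnnounceConfig(env: Record<string, string | undefined>): AnnounceConfig {
  // Required variables fail fast at boot rather than at first request.
  const required = (name: string): string => {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      throw new Error(`${name} is required`);
    }
    return value;
  };

  const rawPort = env["GATEWAY_HTTP_PORT"];
  // Optional, defaulting to 3000 as stated in the description.
  const httpPort = rawPort === undefined || rawPort === "" ? 3000 : Number(rawPort);
  if (!Number.isInteger(httpPort) || httpPort < 1 || httpPort > 65535) {
    throw new Error("GATEWAY_HTTP_PORT must be an integer between 1 and 65535");
  }

  return {
    webhookSecret: required("GATEWAY_WEBHOOK_SECRET"),
    presenceChannelId: required("GATEWAY_PRESENCE_CHANNEL_ID"),
    httpPort,
  };
}
```

In a Node process this would be called once at boot as loadAnnounceConfig(process.env).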

Gateway test suite: 423 passing.

Adds hono@4.12.23 + @hono/node-server@1.19.14 (matching apps/workspace-agent's pinned versions) so the gateway can host the announce webhook HTTP server. Includes the implementation plan.

…e posting

Building blocks for the POST /v1/announce control-plane presence webhook:
- Config: GATEWAY_WEBHOOK_SECRET, GATEWAY_PRESENCE_CHANNEL_ID, GATEWAY_HTTP_PORT
- HMAC-SHA256 verification over timestamp + '.' + rawBody (Stripe-style), with constant-time compare, length guards, and a replay-window check
- Effect Schema payload validation (two v1 event types; unknown rejected) with content-free error reasons
- Presence channel posting helper that resolves a channel by ID and sends an embed with allowedMentions:{parse:[]}

Maps each announce event_type to a Discord embed with an accent color and in-character text. Honors a non-null rendered_text override verbatim (v1 emits null). Includes reserved color stubs for the not-yet-emitted fast-follower event types.

Adds the POST /v1/announce Hono server and the request pipeline: 8 KB size cap, per-source rate limiting, HMAC verification, replay-window check, an in-memory replay cache, exact-string timestamp cross-check, schema decode, embed render, and presence post. Authentication failures return an identical 401 so callers cannot tell which check failed; the replay signature is recorded only after a successful Discord post so retries after a post failure still succeed. Reject logging records the reason only, never the body.

Starts the announce HTTP server during gateway boot and closes it in the shutdown drain alongside the Discord client; a server-close failure is logged without masking client teardown. New announce requests are refused with 503 while the gateway is draining for shutdown.

…ate-limit spoofing

- Replace the replay check/record pair with an atomic reserve/commit/release so two concurrent requests carrying the same signature can no longer both post; the reservation is released on every failure path so a legitimate retry after a failed post still succeeds.
- Key rate limiting on the TCP socket remote address instead of the caller-supplied X-Forwarded-For header, and bound the limiter to a maximum number of tracked keys to prevent unbounded pre-auth memory growth.
- Truncate embed descriptions to Discord's 4096-character limit so an over-long payload cannot wedge the control plane into endless retries.
- Drop an unreachable parse-error classification branch, share the body-size constant, and make the payload schema value module-private.
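
The socket-keyed, bounded limiter from this commit might look roughly like the sketch below. It is a fixed-window illustration with hypothetical names; in particular, refusing new keys when the table is full is an assumption, since the commit message does not say what the real limiter does at capacity.

```typescript
interface RateWindow {
  count: number;
  resetAtMs: number;
}

class SocketRateLimiter {
  private readonly windows = new Map<string, RateWindow>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly maxKeys: number;

  constructor(limit: number, windowMs: number, maxKeys: number) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.maxKeys = maxKeys;
  }

  // remoteAddress must come from the TCP socket, not from X-Forwarded-For.
  allow(remoteAddress: string, nowMs: number): boolean {
    const current = this.windows.get(remoteAddress);
    if (current !== undefined && current.resetAtMs > nowMs) {
      if (current.count >= this.limit) return false;
      current.count += 1;
      return true;
    }
    if (current === undefined && this.windows.size >= this.maxKeys) {
      this.evictExpired(nowMs);
      // Assumption: fail closed when the table is still full, so memory
      // stays bounded even under a flood of distinct addresses.
      if (this.windows.size >= this.maxKeys) return false;
    }
    this.windows.set(remoteAddress, { count: 1, resetAtMs: nowMs + this.windowMs });
    return true;
  }

  // O(n) sweep; deleting during Map.forEach is safe.
  private evictExpired(nowMs: number): void {
    this.windows.forEach((window, key) => {
      if (window.resetAtMs <= nowMs) this.windows.delete(key);
    });
  }
}
```

Sweeping only when the table is full keeps the common path at a single map lookup.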

Updates the gateway agent notes for the new src/http/ announce webhook: the fail-closed request pipeline, the generic-401 no-oracle auth behavior, the presence posting helper, the shutdown drain that closes the HTTP server, and the GATEWAY_ webhook config vars. Marks Effect Schema as now in use.

fro-bot

Verdict: CONDITIONAL

Solid, security-conscious implementation. The fail-closed pipeline ordering, identical-401 auth oracle suppression, atomic reserve/commit replay handling, socket-keyed rate limiting, and mandatory allowedMentions:{parse:[]} are all done correctly and well-tested (423 passing). Two issues should be addressed before merge; both are narrow but real.

Blocking issues

Unbounded body buffering before the 8 KB size check (server.ts:88-90). The handler's ANNOUNCE_MAX_BODY_BYTES guard runs on rawBody.byteLength — after await c.req.arrayBuffer() has already buffered the entire request into memory. The content-length pre-check (server.ts:79-86) is the only thing gating allocation, and it is trivially bypassed by omitting or understating the Content-Length header (chunked transfer encoding, or a lying client). An unauthenticated caller can therefore force the gateway to allocate arbitrarily large buffers before any rejection — the rate limiter (60/min/IP) bounds frequency but not per-request size. This contradicts the PR description's claim that "Request bodies are capped at 8 KB before any HMAC work": the cap is before HMAC, but not before full-body buffering. Add a hard streaming body limit (e.g. Hono's bodyLimit middleware, or enforce maxRequestBodySize on the node server) so oversized requests are rejected during read, not after.
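
To make the difference concrete, here is a framework-free sketch of enforcing the cap during the read rather than after it. It is not Hono's bodyLimit implementation; collectBounded and BodyTooLargeError are hypothetical names, the chunk source is modelled as a synchronous iterable for brevity, and 8 KB is assumed to mean 8 * 1024 bytes.

```typescript
// Assumed value for the constant named in the review.
const ANNOUNCE_MAX_BODY_BYTES = 8 * 1024;

class BodyTooLargeError extends Error {}

// Collect chunks, aborting as soon as the running total exceeds the cap.
// The remaining chunks are never pulled, so an oversized body cannot force
// more than one chunk beyond the cap to be read.
function collectBounded(
  chunks: Iterable<Uint8Array>,
  maxBytes: number = ANNOUNCE_MAX_BODY_BYTES,
): Uint8Array {
  const kept: Uint8Array[] = [];
  let total = 0;
  for (const chunk of chunks) {
    total += chunk.byteLength;
    if (total > maxBytes) {
      throw new BodyTooLargeError(`body exceeds ${maxBytes} bytes`);
    }
    kept.push(chunk);
  }
  // Only bodies within the cap are ever concatenated.
  const body = new Uint8Array(total);
  let offset = 0;
  for (const chunk of kept) {
    body.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return body;
}
```

A Content-Length precheck can still sit in front of this as a fast path, but the running-total check is the one a lying or chunked client cannot bypass.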

Non-blocking concerns

Empty-string rendered_text yields an empty embed (announce-schema.ts:34,47 + templates.ts:77-78). The schema accepts rendered_text: "" (Schema.NullOr(Schema.String)), and renderEmbed uses any non-null value verbatim. An empty description produces an embed with no renderable content; Discord's API rejects such embeds, so the post fails and the handler returns 500. Because the reservation is released on failure, retries are admitted but fail the same way. Exploitability is low because the body is HMAC-signed by a trusted control plane, but it is a correctness gap with no test coverage. Treat empty/whitespace-only rendered_text as "fall back to the templated description."
Reserved replay-cache entries are never time-evicted (replay-cache.ts:69-92). evictExpired deliberately skips RESERVED entries, relying on release()/commit() always running. The handler does call release() on every post-reserve early return, but if postPresenceEmbed never settles (e.g. a hung Discord fetch with no timeout — presence.ts:56 has no timeout on channels.fetch), the reservation leaks for the process lifetime and permanently blocks that exact signature's retries. Consider a reservation timestamp + safety TTL, or a timeout around the Discord post.
unknown_event_type classification is fragile but harmless (announce-schema.ts:82-89). Requiring every issue path to start with event_type is brittle against Union error shapes, but since the handler maps both unknown_event_type and bad_request to the same 400 body (announce-handler.ts:180-183), the only consequence is a log label. Fine as-is.
evictExpired is O(n) per request in both rate-limit and replay caches. Bounded by MAX_KEYS=10_000, so acceptable at v1 scale; flagging for awareness if traffic grows.
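
The rendered_text fallback suggested above is small enough to sketch directly. chooseDescription is a hypothetical helper name, not a function from templates.ts.

```typescript
// Use rendered_text verbatim only when it has visible content; otherwise
// fall back to the per-event templated description.
function chooseDescription(renderedText: string | null, templated: string): string {
  if (renderedText !== null && renderedText.trim() !== "") {
    return renderedText; // returned untrimmed, so "verbatim" still holds
  }
  return templated;
}
```

The trim is used only for the emptiness test, so a non-empty override keeps its original leading and trailing whitespace.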

Missing tests

No test for rendered_text: "" (empty string) through renderEmbed or end-to-end through the handler (the empty-embed → Discord-reject path).
No test for an oversized body sent without a truthful Content-Length (the streaming-buffer bypass). server.test.ts covers the 413 content-length precheck but not the header-omitted case.
No test for a leaked/stuck reservation (hung Discord post) blocking subsequent retries.

Risk assessment (MED)

Likelihood of regression: LOW — new isolated http/ module, no changes to existing command/runtime paths beyond additive lifecycle wiring in program.ts/shutdown.ts, which are covered.
Security exposure: MED — the body-buffering gap is an unauthenticated memory-pressure DoS vector reachable before any auth check; rate limiting mitigates but does not close it. All other auth/replay/mention controls are sound.
Blast radius: LOW — single new endpoint, single-replica constraint documented, no public API breakage. The Discord token is already held by the gateway, so the new ingress does not widen the credential surface.

Run Summary

Field	Value
Event	pull_request
Repository	fro-bot/agent
Run ID	26671488999
Cache	hit
Session	ses_1895e896affe6dP28TjXboKU2r

…the Discord post

- Enforce the 8 KB announce body cap with streaming bodyLimit middleware so an oversized request is rejected while the body is read, not after the whole payload has been buffered into memory. The earlier content-length precheck was bypassable with chunked transfer encoding, leaving an unauthenticated memory-pressure path; the streaming limit closes it.
- Bound the Discord fetch-and-post with a timeout so a hung call can no longer leave a replay reservation pinned for the process lifetime, which would have permanently blocked retries for that signature.
- Fall back to the templated description when rendered_text is empty or whitespace-only, avoiding an empty embed that Discord would reject.

fro-bot

Verdict: PASS

All blocking and non-blocking issues from my prior review (commit 50f4456) are resolved with correct fixes and dedicated test coverage. The implementation is sound, security-conscious, and well-tested.

Blocking issues

None — the previously-blocking memory-DoS gap is fixed.

Body buffering before size check → fixed. bodyLimit middleware (server.ts:77-82) now enforces ANNOUNCE_MAX_BODY_BYTES during the streaming read, so a caller cannot bypass the cap via chunked transfer encoding or an omitted/understated Content-Length. The content-length fast-path and the handler's byteLength guard remain as defense-in-depth. Hono caches the body, so the later c.req.arrayBuffer() returns the already-bounded bytes. Covered by server.test.ts:412 (oversized body with no truthful Content-Length → 413, Discord never called).

Non-blocking concerns

All previously-raised concerns are now addressed:

Empty rendered_text → fixed. renderEmbed now falls through to the per-event template when rendered_text is null OR trims to empty (templates.ts:80), so Discord never receives an empty embed description. Covered by templates.test.ts:147 (empty string) and :162 (whitespace-only), with a :177 regression guard that non-empty text is still used verbatim.
Leaked replay reservation on a hung Discord call → fixed. postPresenceEmbed now races the fetch+send against a 10 s timeout (presence.ts:104-118), returning send-failed so the handler always reaches replayCache.release(). The dangling-rejection is swallowed with a no-op .catch, and clearTimeout runs on the win path. Covered by presence.test.ts:200 (fetch never resolves → "discord post timed out").
unknown_event_type classification fragility — remains harmless (both branches map to the same 400 body); no change needed.
evictExpired O(n) per request — bounded by MAX_KEYS=10_000; acceptable at v1 scale.
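
The timeout race described for postPresenceEmbed can be approximated as below. This is a sketch with hypothetical names (postWithTimeout, PostResult), and it differs from the reviewed code in one respect: instead of a no-op .catch, it maps a rejection to a failure result so the losing promise can never reject unhandled.

```typescript
type PostResult = { ok: true } | { ok: false; reason: string };

function postWithTimeout(post: () => Promise<void>, timeoutMs: number): Promise<PostResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timedOut = new Promise<PostResult>((resolve) => {
    timer = setTimeout(() => resolve({ ok: false, reason: "discord post timed out" }), timeoutMs);
  });

  // Both outcomes become values, so this promise never rejects, even if it
  // settles long after the timeout has already won the race.
  const attempt = Promise.resolve()
    .then(() => post())
    .then(
      (): PostResult => ({ ok: true }),
      (error: unknown): PostResult => ({
        ok: false,
        reason: error instanceof Error ? error.message : "send failed",
      }),
    );

  // Clear the timer whichever side wins so it cannot keep the process alive.
  return Promise.race([attempt, timedOut]).finally(() => clearTimeout(timer));
}
```

The race only bounds how long the caller waits; it does not cancel the underlying operation, which is why the double-post window noted below can still occur.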

One minor forward-looking note (non-blocking, no action required): the timeout in postPresenceEmbed bounds the reservation leak but the timed-out Discord op may still complete in the background after release() — a rare double-post window if the control plane retries the same signature within the window. Acceptable for v1 given the trusted, HMAC-signed caller and single-replica constraint.

Missing tests

None — the three gaps I flagged previously now have explicit coverage (413 streaming-bypass, empty/whitespace rendered_text, hung-Discord timeout).

Risk assessment (LOW)

Likelihood of regression: LOW — isolated new http/ module; existing command/runtime paths untouched beyond covered additive lifecycle wiring.
Security exposure: LOW — the pre-auth memory-DoS vector is closed at the streaming boundary; auth, replay, rate-limit, and mention controls are all sound and tested.
Blast radius: LOW — single new endpoint, single-replica constraint documented, no public API breakage, no widening of the credential surface.

Run Summary

Field	Value
Event	pull_request
Repository	fro-bot/agent
Run ID	26671760842
Cache	hit
Session	ses_1895e896affe6dP28TjXboKU2r

marcusrbrown added 7 commits May 29, 2026 17:39

chore(gateway): add hono dependency for announce webhook

ec3083d

marcusrbrown requested a review from fro-bot as a code owner May 30, 2026 02:04

fro-bot added the agent: working label May 30, 2026

fro-bot requested changes May 30, 2026

fro-bot removed the agent: working label May 30, 2026

fro-bot added the agent: working label May 30, 2026

fro-bot approved these changes May 30, 2026

fro-bot removed the agent: working label May 30, 2026

marcusrbrown merged commit 88cddce into main May 30, 2026
10 checks passed

marcusrbrown deleted the feat/gateway-announce-webhook branch May 30, 2026 02:58

fro-bot Bot mentioned this pull request May 30, 2026

chore(release): pending release v0.47.0 #698

Open

marcusrbrown mentioned this pull request May 30, 2026

docs(solutions): signed webhook ingress hardening patterns #699

Open

marcusrbrown merged 8 commits into main from feat/gateway-announce-webhook
