fix: add keepalive ping to POST SSE response streams by jvg123 · Pull Request #898 · cloudflare/agents

jvg123 · 2026-02-12T22:55:35Z

Summary

The GET SSE handler in WorkerTransport sets up a 30-second keepalive interval that writes event: ping to prevent proxies and infrastructure from closing idle connections. The POST SSE handler was missing this keepalive, causing POST response streams to be silently dropped during long-running operations like MCP tools/call.

This is a one-line logical fix (adding the same setInterval pattern already used in the GET handler) plus proper cleanup in the stream mapping.

Problem

When a client sends a POST request that returns an SSE stream (e.g., tools/call), the response stream has no keepalive pings. During long-running tool executions (even just a few seconds), proxies, load balancers, and infrastructure can close the idle SSE connection, causing the client to never receive the response.

GET handler (line 350) — has keepalive:

const keepAlive = setInterval(() => {
  try {
    writer.write(encoder.encode("event: ping\ndata: \n\n"));
  } catch {
    clearInterval(keepAlive);
  }
}, 30000);

POST handler (line 636) — was missing keepalive entirely.

Impact

This affects any MCP server using WorkerTransport with Streamable HTTP transport. Clients report errors like:

Claude: "No result received from client-side tool execution"
ChatGPT: "TypeError: fetch failed"
MCP SDK: "MCP error -32001: Request Timeout"

Cloudflare Workers analytics show these as responseStreamDisconnected with short wall times (1-10s), confirming the SSE stream dies during tool execution rather than timing out.

Fix

Added the same keepalive setInterval to the POST handler, with clearInterval in the cleanup function. The diff is minimal — it mirrors the existing GET handler pattern exactly.

Test plan

Added post-keepalive.test.ts with 3 tests:
- GET SSE stream sets up keepalive interval (existing behavior, passes before and after)
- POST SSE stream sets up keepalive interval (failed before fix, passes after)
- POST keepalive cleanup clears the interval on stream close
All 467 existing tests still pass (npx vitest run --project workers)
npm run format passes (oxfmt)

🤖 Generated with Claude Code

The GET SSE handler sets up a 30-second keepalive interval that writes "event: ping" to prevent proxies from closing idle connections. The POST SSE handler was missing this keepalive, causing responses to be silently dropped during long-running tool calls. This adds the same keepalive pattern to the POST handler and includes tests verifying both GET and POST streams have keepalive, and that the interval is properly cleaned up. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

changeset-bot · 2026-02-12T22:55:55Z

🦋 Changeset detected

Latest commit: bbafdfc

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
agents	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

whoiskatrin · 2026-02-13T06:42:34Z

/bonk review this PR

ask-bonk

Review Summary

This is a clean, well-motivated bug fix. The problem is real and clearly articulated — the GET SSE handler has a 30-second keepalive ping to prevent proxies/load balancers from closing idle connections, but the POST SSE handler was missing it entirely. This causes POST response streams (e.g., tools/call) to be silently dropped during long-running operations.

Code change

The fix at packages/agents/src/mcp/worker-transport.ts:636-651 is correct:

The setInterval placement is right — it's set up immediately after creating the writer/encoder and before the stream mapping, exactly mirroring the GET handler pattern (line 350).
Cleanup is correct — clearInterval(keepAlive) is called in the cleanup() closure, which is invoked via:
- closeSSEStream() (line 815) — explicit stream close
- send() when all responses are ready (line 929) — normal completion
- close() (line 793) — transport teardown
The catch block in the interval handles the case where writer.write() throws after the stream is already closed, preventing the interval from running forever. This matches the GET handler exactly.

Tests

The three test cases are well-structured:

GET keepalive exists (regression guard for existing behavior)
POST keepalive exists (the actual fix verification)
POST keepalive cleanup on closeSSEStream (verifies no leaked intervals)

The tests use vi.spyOn(globalThis, "setInterval") to verify the interval is created with the correct 30000ms delay, which is a reasonable approach for this kind of infrastructure behavior.

Changeset

Correctly included as a patch bump for agents.

One minor observation (non-blocking)

The clearIntervalSpy assertion in the cleanup test (expect(clearIntervalSpy).toHaveBeenCalled()) doesn't verify it was called with the specific interval ID from the keepalive. Since clearInterval may also be called from other internal cleanup paths, this assertion could theoretically pass for the wrong reason. A stricter assertion would capture the interval ID from the setInterval spy and assert clearIntervalSpy was called with that specific ID. This is minor — the test is still valuable as-is.

LGTM. This is a minimal, targeted fix that mirrors an existing pattern and addresses a real production issue affecting MCP clients.

ask-bonk · 2026-02-13T06:44:06Z

Review submitted. Here's a summary of my findings:

PR #898 — Approved

This is a correct, minimal bug fix that adds a 30-second keepalive ping (event: ping) to POST SSE response streams in WorkerTransport, mirroring the existing pattern already used in the GET handler. The fix addresses a real production issue where proxies and load balancers silently drop idle POST SSE connections during long-running MCP tool calls.

The code change is 9 lines, correctly placed, with proper cleanup in all three paths (closeSSEStream, send completion, and close). Tests are well-structured and the changeset is included as a patch bump.

I noted one minor non-blocking observation: the cleanup test could be stricter by asserting clearInterval was called with the specific interval ID rather than just asserting it was called at all.

github run

pkg-pr-new · 2026-02-13T11:47:15Z

Open in StackBlitz

npm i https://pkg.pr.new/cloudflare/agents@898

commit: bbafdfc

mattzcarey

LGTM not sure how I missed that

mattzcarey · 2026-02-13T11:52:28Z

This could be the cause of #858

mattzcarey · 2026-02-13T11:53:16Z

@bonk can you make sure check script passes. Make the smallest change possible and commit it.

whoiskatrin · 2026-02-13T11:54:42Z

/bonk fix the npm run check issue

ask-bonk

Review of PR #898: fix: add keepalive ping to POST SSE response streams

Summary

The core fix is correct and well-motivated. The POST SSE handler was missing the keepalive ping that the GET handler already has, causing proxy/LB connection drops during long-running tool calls. The 9-line change to worker-transport.ts correctly mirrors the existing GET handler pattern.

CI Failure

The check job is failing due to a TypeScript error in the new test file:

packages/agents/src/tests/mcp/transports/post-keepalive.test.ts(77,5): error TS2322:
Type 'MockInstance<{ (handler: TimerHandler, timeout?: number | undefined, ...arguments: any[]): number; ...' 
is not assignable to type 'MockInstance<(this: unknown, ...args: unknown[]) => unknown>'.

The issue is at line 18/77 in the test file. The variable is declared as:

let setIntervalSpy: ReturnType<typeof vi.spyOn>;

But vi.spyOn(globalThis, "setInterval") returns a more specific MockInstance type (matching setInterval's overloaded signatures in @cloudflare/workers-types), which doesn't assignably match the generic ReturnType<typeof vi.spyOn>.

Fix: Use the inferred type directly instead of the generic ReturnType:

// Option A: let TypeScript infer from the assignment (move init into declaration)
let setIntervalSpy = vi.spyOn(globalThis, "setInterval");

// Option B: explicitly type it to allow any mock instance
// biome-ignore lint: complex overload type
let setIntervalSpy: ReturnType<typeof vi.spyOn<typeof globalThis, "setInterval">>;

The simplest fix is to initialize the spy at declaration time and restore it in afterEach, or use a type assertion at the assignment site:

setIntervalSpy = vi.spyOn(globalThis, "setInterval") as typeof setIntervalSpy;

Code change review (worker-transport.ts)

The production code change at lines 636-651 is correct:

The setInterval placement is right — immediately after creating the writer/encoder, before the stream mapping, matching the GET handler pattern at line 350.
Cleanup is properly wired — clearInterval(keepAlive) is called in the cleanup() closure, which is invoked from all three termination paths: closeSSEStream() (line 815), send() when all responses are ready (line 929), and close() (line 793).
The catch block in the interval callback correctly handles writer errors after stream close, preventing runaway intervals.

Minor test observation (non-blocking)

In the cleanup test (line 194), expect(clearIntervalSpy).toHaveBeenCalled() doesn't verify it was called with the specific interval ID from the keepalive. A slightly stricter assertion would capture the return value from the setInterval spy and verify clearInterval was called with that ID. This is minor and the test is still valuable as-is.

Verdict

The fix itself is good — please fix the TypeScript error in the test to get CI green.

ask-bonk · 2026-02-13T11:56:29Z

packages/agents/src/tests/mcp/transports/post-keepalive.test.ts

+ * from closing idle connections during long-running operations.
+ */
+describe("WorkerTransport POST stream keepalive", () => {
+  let setIntervalSpy: ReturnType<typeof vi.spyOn>;


This type annotation causes the CI typecheck failure (TS2322). vi.spyOn(globalThis, "setInterval") returns a MockInstance with setInterval's overloaded signatures from @cloudflare/workers-types, which is not assignable to the generic ReturnType<typeof vi.spyOn>.

The simplest fix — initialize at declaration and re-spy in beforeEach:

Suggested change

let setIntervalSpy: ReturnType<typeof vi.spyOn>;

let setIntervalSpy = vi.spyOn(globalThis, "setInterval");

Then update beforeEach to restore and re-create:

beforeEach(() => { setIntervalSpy.mockRestore(); setIntervalSpy = vi.spyOn(globalThis, "setInterval"); });

This lets TypeScript infer the correct type from the vi.spyOn call and avoids the generic ReturnType mismatch.

ask-bonk · 2026-02-13T11:56:37Z

Review posted successfully. Here's a summary of what I did:

PR #898 Review — Posted a full review + inline suggestion on the CI-failing line.

Key findings:

Production code change is correct — The 9-line addition to worker-transport.ts:636-651 correctly adds a 30s keepalive ping to POST SSE streams, matching the existing GET handler pattern. Cleanup is properly wired through all three termination paths (closeSSEStream, send completion, close).
CI is failing due to a TypeScript error in the test file at line 18. The variable setIntervalSpy is typed as ReturnType<typeof vi.spyOn>, but vi.spyOn(globalThis, "setInterval") returns a more specific MockInstance type from @cloudflare/workers-types that isn't assignable to the generic type. I posted an inline suggestion with a fix: initialize the spy at declaration time so TypeScript can infer the correct type.
Minor non-blocking observation — The cleanup test could assert that clearInterval was called with the specific interval ID rather than just asserting it was called at all.

github run

whoiskatrin · 2026-02-13T11:57:21Z

/bonk fix the build

ask-bonk

Review Summary

Good fix addressing a real production issue. The POST SSE handler was missing the keepalive ping that the GET handler already has at line 350, causing proxy/LB connection drops during long-running MCP tool calls. The 9-line production code change is correct and minimal.

Production code (`worker-transport.ts`) — Looks good

The setInterval at line 636 is placed correctly — after creating the writer/encoder, before the stream mapping registration. The cleanup path is properly wired: clearInterval(keepAlive) is called in the cleanup() closure, which is invoked from all three termination paths:

closeSSEStream() (line 815) — explicit stream close
send() when all responses are ready (line 929) — normal completion
close() (line 793) — transport teardown

The catch block in the interval callback correctly handles writer errors after stream close, preventing runaway intervals. This exactly mirrors the existing GET handler pattern.

CI failure — TypeScript error in test file

The check job fails with TS2322 at line 77 of post-keepalive.test.ts. The variable setIntervalSpy is declared as ReturnType<typeof vi.spyOn> (line 18), but vi.spyOn(globalThis, "setInterval") returns a MockInstance with setInterval's overloaded signatures from @cloudflare/workers-types/experimental, which isn't assignable to the generic return type.

Note that the same ReturnType<typeof vi.spyOn> pattern works in client-connection.test.ts for console.error because console.error has a simpler type signature. The setInterval overloads from @cloudflare/workers-types make the types incompatible.

The simplest fix is to initialize the spy at declaration time so TypeScript infers the correct type, then re-spy in beforeEach. See inline suggestion below.

Test improvement suggestion (non-blocking)

The cleanup test at line 194 (expect(clearIntervalSpy).toHaveBeenCalled()) doesn't verify it was called with the specific interval ID from the keepalive. A stricter assertion would capture the return value from the setInterval spy and verify clearInterval was called with that ID. This is minor — the test is still valuable as-is.

Changeset

Correctly included as a patch bump for agents.

packages/agents/src/tests/mcp/transports/post-keepalive.test.ts

ask-bonk · 2026-02-13T11:59:50Z

Review posted on PR #898. Here's a summary of what I found:

Production code change (worker-transport.ts:636-651) — Correct and minimal. Adds a 30s keepalive ping to POST SSE streams, mirroring the existing GET handler pattern at line 350. Cleanup is properly wired through all three termination paths (closeSSEStream, send, close).

CI is failing — The check job fails with a TS2322 TypeScript error at line 77 of post-keepalive.test.ts. The setIntervalSpy variable is typed as ReturnType<typeof vi.spyOn>, but vi.spyOn(globalThis, "setInterval") returns a more specific MockInstance type due to setInterval's overloaded signatures in @cloudflare/workers-types/experimental. I posted an inline suggestion: initialize the spy at declaration time (let setIntervalSpy = vi.spyOn(...)) so TypeScript can infer the correct type, then restore and re-create in beforeEach.

github run

Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>

ask-bonk bot approved these changes Feb 13, 2026

View reviewed changes

whoiskatrin requested a review from mattzcarey February 13, 2026 11:45

mattzcarey approved these changes Feb 13, 2026

View reviewed changes

ask-bonk bot reviewed Feb 13, 2026

View reviewed changes

packages/agents/src/tests/mcp/transports/post-keepalive.test.ts Outdated Show resolved Hide resolved

Update packages/agents/src/tests/mcp/transports/post-keepalive.test.ts

bbafdfc

Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>

whoiskatrin merged commit cd2d34f into cloudflare:main Feb 13, 2026
4 checks passed

github-actions bot mentioned this pull request Feb 13, 2026

Version Packages #902

Open

	let setIntervalSpy: ReturnType<typeof vi.spyOn>;
	let setIntervalSpy = vi.spyOn(globalThis, "setInterval");

Conversation

jvg123 commented Feb 12, 2026

Summary

Problem

Impact

Fix

Test plan

Uh oh!

changeset-bot bot commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

whoiskatrin commented Feb 13, 2026

Uh oh!

ask-bonk bot left a comment

Choose a reason for hiding this comment

Review Summary

Code change

Tests

Changeset

One minor observation (non-blocking)

Uh oh!

ask-bonk bot commented Feb 13, 2026

Uh oh!

pkg-pr-new bot commented Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mattzcarey left a comment

Choose a reason for hiding this comment

Uh oh!

mattzcarey commented Feb 13, 2026

Uh oh!

mattzcarey commented Feb 13, 2026

Uh oh!

whoiskatrin commented Feb 13, 2026

Uh oh!

ask-bonk bot left a comment

Choose a reason for hiding this comment

Review of PR #898: fix: add keepalive ping to POST SSE response streams

Summary

CI Failure

Code change review (worker-transport.ts)

Minor test observation (non-blocking)

Verdict

Uh oh!

ask-bonk bot Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

ask-bonk bot commented Feb 13, 2026

Uh oh!

whoiskatrin commented Feb 13, 2026

Uh oh!

ask-bonk bot left a comment

Choose a reason for hiding this comment

Review Summary

Production code (worker-transport.ts) — Looks good

CI failure — TypeScript error in test file

Test improvement suggestion (non-blocking)

Changeset

Uh oh!

Uh oh!

ask-bonk bot commented Feb 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

changeset-bot bot commented Feb 12, 2026 •

edited

Loading

pkg-pr-new bot commented Feb 13, 2026 •

edited

Loading

Production code (`worker-transport.ts`) — Looks good