Summary
StreamableHTTPClientTransport gives up on the GET-SSE response stream after only 2 reconnect retries (the hard-coded DEFAULT_STREAMABLE_HTTP_RECONNECTION_OPTIONS.maxRetries: 2), then leaves the transport in a broken state where POST requests succeed at the server but their JSON-RPC responses can never be delivered to the client because the SSE channel is dead. Every subsequent tool call hits the client's request timeoutMs and surfaces as "The operation timed out.", even though the server processed the request fine.
This is a silent-success failure mode — the user sees timeouts and the server sees successes. Restarting either side (or sending a SIGHUP / systemctl restart) is the only recovery, because the dead transport never auto-reconnects.
Witnessed in production on 2026-05-14 against an MCP server fronted by CloudFlare Tunnel + nginx. CF Tunnel's default ~100s SSE idle timeout dropped the response stream while a tool call was in flight, the SDK's two SSE reopen attempts both failed (with empty error context; see Bug 2 below), and from that point on every subsequent POST silently broke.
Source
packages/client/src/client/streamableHttp.ts, lines 21-26 of latest main:
const DEFAULT_STREAMABLE_HTTP_RECONNECTION_OPTIONS: StreamableHTTPReconnectionOptions = {
    initialReconnectionDelay: 1000,
    maxReconnectionDelay: 30_000,
    reconnectionDelayGrowFactor: 1.5,
    maxRetries: 2 // ← only 2 retries before permanent give-up
};
After maxRetries is exhausted, _scheduleReconnection stops scheduling but the transport's POST path keeps working — sending requests into a void.
Smoking-gun log (Claude Code 2.1.140 + this SDK)
03:57:37 HTTP connection dropped after 191s uptime
03:58:06 Connection error: Streamable HTTP error: Failed to open SSE stream: <none>
03:58:06 Connection error: Failed to reconnect SSE stream: Streamable HTTP error: Failed to open SSE stream: <none>
03:58:06 Terminal connection error 1/3
04:00:12 Connection error: Streamable HTTP error: Failed to open SSE stream: <none>
04:00:12 Connection error: Failed to reconnect SSE stream: ...
04:00:12 Connection error: Maximum reconnection attempts (2) exceeded.
04:00:12 SSE GET-stream reconnection exhausted; leaving transport up (POST still works)
[every subsequent tool call hits client timeoutMs because POSTs land server-side
fine but responses can't be delivered through the dead SSE channel]
(The "leaving transport up (POST still works)" line is from Claude Code's wrapper, but the underlying give-up decision and the missing reconnect-on-next-POST behaviour are in this SDK.)
Bug 1 — maxRetries: 2 is too low for production
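A single intermediary blip (CF Tunnel idle, nginx upstream timeout, mobile NAT eviction, Wi-Fi handoff) can easily take longer to clear than 2 attempts spaced a few seconds apart. A more reasonable default would be maxRetries: 10 with the existing 1.5× backoff capped at 30s. Assuming the delay before attempt n is min(1000 ms × 1.5^n, 30 s), that is roughly 105 seconds of scheduled backoff. With each attempt also taking time to fail (between about 30 seconds and 2 minutes in the log above), the total retry window comes to several minutes, which covers most short outages.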
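The backoff arithmetic can be checked with a short sketch. The field names mirror the defaults quoted in the Source section; the delay formula (initial delay times grow factor to the power of the 0-based attempt number, capped at the maximum) and the helper names reconnectionDelay and totalBackoffMs are assumptions made for illustration, not confirmed SDK internals.

```typescript
// Shape mirrors the default options quoted above; all times are in milliseconds.
interface ReconnectionOptions {
  initialReconnectionDelay: number;
  maxReconnectionDelay: number;
  reconnectionDelayGrowFactor: number;
  maxRetries: number;
}

// Hypothetical helper: delay before retry `attempt` (0-based), capped at the maximum.
// ASSUMPTION: the SDK schedules retries with this formula.
function reconnectionDelay(opts: ReconnectionOptions, attempt: number): number {
  const raw =
    opts.initialReconnectionDelay * Math.pow(opts.reconnectionDelayGrowFactor, attempt);
  return Math.min(raw, opts.maxReconnectionDelay);
}

// Hypothetical helper: sum of every scheduled delay before the transport gives up.
function totalBackoffMs(opts: ReconnectionOptions): number {
  let total = 0;
  for (let attempt = 0; attempt < opts.maxRetries; attempt++) {
    total += reconnectionDelay(opts, attempt);
  }
  return total;
}

const current: ReconnectionOptions = {
  initialReconnectionDelay: 1000,
  maxReconnectionDelay: 30_000,
  reconnectionDelayGrowFactor: 1.5,
  maxRetries: 2,
};
const proposed: ReconnectionOptions = { ...current, maxRetries: 10 };

console.log(totalBackoffMs(current)); // 2500 (1000 + 1500)
console.log(Math.round(totalBackoffMs(proposed) / 1000)); // 105 (seconds)
```

These totals count only the waiting time between attempts. Time spent inside each failing attempt comes on top of them.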
Bug 2 — Empty error context
"Failed to open SSE stream: <none>" shows the literal string <none> (or the underlying empty response.statusText) being captured instead of the real error. Anyone debugging this can't tell whether it was a network failure, a 5xx, an auth issue, or a closed socket. Stringify with err?.message || err?.statusText || err?.name || "unknown". Note that || is needed here rather than ??, because ?? only skips null and undefined and would pass an empty statusText straight through.
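A minimal sketch of such a stringifier follows. The name describeError and the idea of falling back to the numeric HTTP status are assumptions for illustration and not part of the SDK. The point it shows is that empty strings are skipped, so an empty statusText can no longer mask the rest of the context.

```typescript
// Hypothetical helper (not an SDK function): returns the first non-empty
// description found on an error-like value, or "unknown" if there is none.
function describeError(err: unknown): string {
  if (typeof err === "string" && err.trim() !== "") return err;
  if (typeof err === "object" && err !== null) {
    const e = err as Record<string, unknown>;
    // A Response-like object may carry a numeric status even when statusText is empty.
    const status = typeof e.status === "number" ? `HTTP ${e.status}` : "";
    for (const candidate of [e.message, e.statusText, status, e.name]) {
      if (typeof candidate === "string" && candidate.trim() !== "") return candidate;
    }
  }
  return "unknown";
}

console.log(describeError(new Error("socket hang up"))); // socket hang up
console.log(describeError({ status: 502, statusText: "" })); // HTTP 502
console.log(describeError(undefined)); // unknown
```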
Bug 3 — Silent-success after exhaustion (the worst part)
After the SSE GET stream is permanently dead, the transport should do one of the following:
- (A) Reset itself: mark _sessionId = undefined and tear down state, so the next POST attempt re-establishes a fresh transport from scratch. The caller sees one failed call (with a clear error: "transport reset"), then everything works.
- (B) Fail fast on subsequent POSTs: surface a clear TransportClosed error so callers can decide to reconnect or surface it to the user. This is better than the current "POST returns 202 but the response never arrives" pattern.
- (C) Keep retrying SSE indefinitely with the existing exponential backoff (1s, 1.5s, 2.25s, ... capped at 30s). Eventually the intermediary settles and a reopen succeeds.
The current behaviour (POSTs work, responses silently lost) is the worst of all worlds.
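Option (B) could be sketched roughly as below. Everything here is hypothetical: TransportClosedError, ResponseStreamGuard, and their methods are illustrative names, not SDK APIs, and the sketch assumes the transport gives up once the number of failed reopen attempts reaches maxRetries, as the log above suggests.

```typescript
// Hypothetical error type for the fail-fast behaviour described in option (B).
class TransportClosedError extends Error {
  constructor(reason: string) {
    super(`Transport closed: ${reason}`);
    this.name = "TransportClosedError";
  }
}

type StreamState = "open" | "reconnecting" | "exhausted";

// Hypothetical guard a transport could consult before every POST.
class ResponseStreamGuard {
  private state: StreamState = "open";
  private failures = 0;
  private readonly maxRetries: number;

  constructor(maxRetries: number) {
    this.maxRetries = maxRetries;
  }

  // Record a failed SSE reopen; the stream is exhausted once retries run out.
  recordReconnectFailure(): void {
    this.failures += 1;
    this.state = this.failures >= this.maxRetries ? "exhausted" : "reconnecting";
  }

  // Record a successful reopen; responses can be delivered again.
  recordReconnectSuccess(): void {
    this.failures = 0;
    this.state = "open";
  }

  // Throw instead of sending a POST whose response could never arrive.
  assertCanSend(): void {
    if (this.state === "exhausted") {
      throw new TransportClosedError(
        `SSE response stream could not be reopened after ${this.maxRetries} attempts`
      );
    }
  }
}

const guard = new ResponseStreamGuard(2);
guard.recordReconnectFailure();
guard.assertCanSend(); // still allowed: one retry left
guard.recordReconnectFailure();
try {
  guard.assertCanSend();
} catch (err) {
  console.log((err as Error).name); // TransportClosedError
}
```

With a guard like this the caller gets an immediate, named error on the first POST after exhaustion and can rebuild the transport, rather than waiting out timeoutMs on every call.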
Reproducer
- Start any MCP server with the StreamableHTTP transport behind a proxy that has an SSE idle timeout shorter than the interval between your tool calls (CloudFlare Tunnel ~100s default; nginx with proxy_read_timeout shorter than your spacing; corporate proxies typically 60-120s).
- Connect a TS-SDK-based client (Claude Code, Inspector, custom).
- Stay idle longer than the proxy's SSE timeout.
- Try to call a tool. The POST will land at the server (you'll see it in the nginx access log returning 200/202) and the server processes it, but the client times out at timeoutMs (default 60s).
- Observe "Maximum reconnection attempts (2) exceeded" in the client transport log; this is the trigger.
- Every subsequent tool call has the same outcome until restart.
Server-side workaround (already deployed in our environment)
Add an SSE keepalive heartbeat from the server side: write : keepalive\n\n (an SSE comment per W3C EventSource §9.2.6) every ~25s on the GET response stream so the intermediary never sees idle. Comments are ignored by SSE clients per the spec, so this can never corrupt a real notification message.
We shipped this in our mcp-core wrapper (PR for context: github.com/CloudIngenium/Knowledge-Hub/pull/698). 25s is well under all common intermediary idle timeouts (CF Tunnel ~100s, nginx default 60s, corporate proxies typically 60-120s) and costs ~10 bytes per interval. Could be done by StreamableHTTPServerTransport too, but the client-side bug remains — any other intermediary cause (TCP RST, client suspend/resume, network reachability flap) still triggers the same unrecoverable wedge.
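For reference, a heartbeat of this kind can be sketched in a few lines. This is not the code from our wrapper. The names SseSink, writeKeepalive, and startSseKeepalive are hypothetical, and the sketch assumes the GET response object exposes a write(chunk) method (Node's http.ServerResponse should fit that shape).

```typescript
// Minimal writable surface assumed for the SSE response stream.
interface SseSink {
  write(chunk: string): unknown;
}

// An SSE comment line followed by a blank line; clients ignore comment lines.
const KEEPALIVE_FRAME = ": keepalive\n\n";

// Hypothetical helper: emit one keepalive comment on the stream.
function writeKeepalive(sink: SseSink): void {
  sink.write(KEEPALIVE_FRAME);
}

// Hypothetical helper: emit a keepalive every `intervalMs` until stopped.
// Returns a function that cancels the heartbeat.
function startSseKeepalive(sink: SseSink, intervalMs = 25_000): () => void {
  const timer = setInterval(() => {
    try {
      writeKeepalive(sink);
    } catch {
      // The stream is gone; stop writing to it.
      clearInterval(timer);
    }
  }, intervalMs);
  return () => clearInterval(timer);
}

// Usage with an in-memory sink standing in for the real response stream.
const chunks: string[] = [];
const sink: SseSink = { write: (chunk) => chunks.push(chunk) };
const stop = startSseKeepalive(sink);
writeKeepalive(sink); // one immediate frame, for demonstration
stop(); // in a server, call this when the response closes
console.log(JSON.stringify(chunks[0])); // ": keepalive\n\n"
```

In a real server the stop function would be wired to the response's close event so the timer does not outlive the connection.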
Environment
- @modelcontextprotocol/sdk v1.26.0 (also confirmed present at v1.29.0 / current main).
- Claude Code 2.1.140 (claude-vscode wrapper around this SDK).
- Node 24.3.0, Linux x86_64.
- MCP server: behind nginx + CloudFlare Tunnel + Azure AD JWT auth.