fix: cancel in-flight handlers when transport closes in server.run() #2306
Conversation
When the transport closes (stdin EOF, client disconnect) while a request handler is still running, `server.run()`'s task group joins on the handler instead of cancelling it. The handler eventually finishes, tries to send its response through a write stream that `_receive_loop` already closed, and `server.run()` crashes with `ClosedResourceError` wrapped in a triple-nested `ExceptionGroup`.

The fix cancels the task group when the `incoming_messages` loop ends. Handlers receive `CancelledError` and can clean up in `finally` blocks.

The existing `CancelledError` catch in `_handle_request` (added for `CancelledNotification` handling in #1153) now distinguishes the two cancellation sources: `responder.cancel()` already sent an error response and we skip the duplicate; transport-close cancellation is re-raised so the task group swallows it.

Github-Issue: #526
No bugs found, but this touches core server infrastructure (the main server.run() loop and the CancelledError handling in _handle_request), with subtle cancel-scope semantics distinguishing two cancellation sources — worth a human confirming the anyio scope behavior is correct.
Extended reasoning...
Overview
This PR fixes a real bug where server.run() would crash with ClosedResourceError when the transport closes while a handler is still running. It makes three changes: (1) adds tg.cancel_scope.cancel() in a finally block in run() so in-flight handlers are cancelled instead of joined, (2) distinguishes CancelledNotification cancellation from TG/transport-close cancellation in _handle_request via message.cancelled (which checks the responder's own cancel scope, not the TG's outer scope), and (3) removes a # pragma: no branch that is no longer valid since _completed can now be False when a handler is cancelled before calling respond().
Security risks
No security risks identified. The change is purely about lifecycle/cleanup behavior.
Level of scrutiny
This modifies the core server.run() message processing loop and the cancellation handling in _handle_request — both critical code paths for every MCP server. The PR author notes this also affects streamable HTTP when terminate() fires mid-request. The cancel-scope distinction (responder's _cancel_scope.cancel_called vs the TG's outer cancel scope) is correct based on anyio semantics, but a human should verify this understanding since getting it wrong could cause handlers to swallow cancellation they should propagate, or vice versa.
Other factors
The PR includes a well-designed test that drives server.run() with raw memory streams to avoid InMemoryTransport masking the bug. The test verifies both that the handler gets cancelled and that server.run() returns cleanly. No CODEOWNERS file exists. The PR description is thorough and references related issues across multiple SDK implementations. No bugs were found by automated analysis.
Two additional races in the same transport-close window as the previous commit, both triggered when handlers are blocked on server-to-client requests (sampling, roots, elicitation) at the moment the transport closes:

1. `_receive_loop`'s `finally` iterates `_response_streams.items()` with await checkpoints inside the loop. The woken handler's `send_request` `finally` pops from that dict before the iterator's next `__next__()`, raising `RuntimeError: dictionary changed size during iteration`. Fix: snapshot with `list()` before iterating.
2. The woken handler's `send_request` raises `MCPError` (CONNECTION_CLOSED), which `_handle_request` catches and converts to an error response. It then falls through to `message.respond()` against a write stream that `_receive_loop` already closed. Fix: catch `ClosedResourceError` and drop the response.

Both reproduce deterministically with two handlers blocked on `list_roots()` when `to_server` is closed. A single test covers both: it fails 20/20 with either fix reverted and passes 50/50 with both.
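Race 1 reduces to a plain-Python pitfall, no anyio needed to reproduce the dict behavior (names are illustrative, not the SDK's):

```python
def close_all_unsafe(streams):
    # Iterating the live dict: any pop during the loop breaks iteration.
    for request_id, stream in streams.items():
        # In the real code, a woken handler pops its own entry at an
        # await checkpoint here; we simulate that concurrent pop inline.
        streams.pop("req-2", None)

def close_all_safe(streams):
    # Snapshot first, as the fix does, so concurrent pops are harmless.
    for request_id, stream in list(streams.items()):
        streams.pop("req-2", None)

unsafe_failed = False
try:
    close_all_unsafe({"req-1": object(), "req-2": object()})
except RuntimeError:
    unsafe_failed = True  # "dictionary changed size during iteration"

close_all_safe({"req-1": object(), "req-2": object()})  # completes cleanly
```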
src/mcp/server/lowlevel/server.py
```python
except anyio.ClosedResourceError:
    # Transport closed between handler unblocking and respond. Happens
    # when _receive_loop's finally wakes a handler blocked on
    # send_request: the handler runs to respond() before run()'s TG
    # cancel fires, but after _receive_loop closed _write_stream.
    logger.debug("Response for %s dropped - transport closed", message.request_id)
```
🟡 The except anyio.ClosedResourceError at line 495 should also catch anyio.BrokenResourceError. When terminate() in the streamable HTTP transport closes _write_stream_reader (receive end, line 787) before _receive_loop exits its async with self._write_stream block (which closes the send end), a handler that reaches respond() in this window gets BrokenResourceError rather than ClosedResourceError. Fix: except (anyio.BrokenResourceError, anyio.ClosedResourceError):.
Extended reasoning...
What the bug is
The new defensive except anyio.ClosedResourceError around message.respond(response) at line 495 only catches ClosedResourceError, but MemoryObjectSendStream.send() can also raise BrokenResourceError when the receive end of the stream has been closed while the send end remains open. These are semantically different exceptions in anyio: ClosedResourceError means you closed this end while BrokenResourceError means the other end was closed.
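The two exception types are easy to confirm against anyio's memory object streams (a standalone check, independent of the SDK):

```python
import anyio

outcomes = {}

async def main():
    # Closing our own (send) end: send() raises ClosedResourceError.
    send1, recv1 = anyio.create_memory_object_stream(1)
    await send1.aclose()
    try:
        await send1.send("x")
    except anyio.ClosedResourceError:
        outcomes["own_end_closed"] = "ClosedResourceError"

    # Closing the peer (receive) end: send() raises BrokenResourceError.
    send2, recv2 = anyio.create_memory_object_stream(1)
    await recv2.aclose()
    try:
        await send2.send("x")
    except anyio.BrokenResourceError:
        outcomes["peer_end_closed"] = "BrokenResourceError"

anyio.run(main)
```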
The specific code path
In the streamable HTTP transport, terminate() (line 763) closes streams in this order: (1) _read_stream_writer (line 783), causing _receive_loop to see EOF; (2) _write_stream_reader (line 787), closing the receive end of the write stream; (3) _write_stream (line 789), closing the send end. Meanwhile, _receive_loop's finally block (session.py line 418-431) sends CONNECTION_CLOSED errors to pending response streams, waking handlers blocked on send_request(). The woken handler catches the MCPError and falls through to respond() which calls _write_stream.send().
Step-by-step proof
1. A handler is blocked in `send_request()` (e.g., `ctx.session.list_roots()`).
2. `terminate()` fires from a DELETE request, closes `_read_stream_writer`.
3. `_receive_loop` sees EOF, enters its finally block, sends CONNECTION_CLOSED to the response stream, waking the handler.
4. `terminate()` continues and closes `_write_stream_reader` (receive end) at line 787.
5. The handler wakes, catches `MCPError` from the CONNECTION_CLOSED response, falls through to `respond()` at line 495.
6. `respond()` calls `_send_response()` then `_write_stream.send()`. Inside `send()`, anyio calls `await checkpoint()` then `send_nowait()`.
7. `send_nowait()` checks `not self._state.open_receive_channels` and finds it True (receive end closed at step 4), raising `BrokenResourceError`.
8. The `except anyio.ClosedResourceError` at line 495 does not catch `BrokenResourceError`. The exception propagates uncaught, crashing `server.run()` with an `ExceptionGroup`.
Addressing the refutation
The refutation argues no transport closes the receive end independently during the handler window. This is incorrect for the terminate() path: terminate() is called from an HTTP DELETE handler running in a separate task (the ASGI scope), so it runs concurrently with _receive_loop and the handlers. Steps 4 and 5-6 above interleave at any checkpoint. The refutation correctly notes that the cleanup at line 1050 runs after server.run() returns, but terminate() at line 787 is a different code path that runs from the DELETE handler during server.run(). The refutation also notes the streamable_http.py:1018 pattern is on a different stream (request_streams), which is true, but the exception semantics are the same: the codebase acknowledges that send() on a memory stream can raise either exception type.
Impact and fix
The practical likelihood is low because the race window is narrow: the handler must reach _write_stream.send()'s send_nowait() after terminate() closes the receive end but before _receive_loop exits its async with (which closes the send end). However, the consequence (server crash with ExceptionGroup) is exactly what this PR aims to prevent, so the defense should be complete. The fix is trivially adding BrokenResourceError to the except clause: except (anyio.BrokenResourceError, anyio.ClosedResourceError):.
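A minimal sketch of the complete defense, assuming anyio memory streams (`safe_respond` is a hypothetical helper, not the SDK's code):

```python
import anyio

async def safe_respond(write_stream, response):
    """Send a response, dropping it if the transport is gone.

    Catches both closure modes: ClosedResourceError (our send end was
    closed first, the stdio path) and BrokenResourceError (the peer
    closed the receive end first, the terminate() path).
    """
    try:
        await write_stream.send(response)
        return True
    except (anyio.BrokenResourceError, anyio.ClosedResourceError):
        return False

delivered = {}

async def main():
    send, recv = anyio.create_memory_object_stream(1)
    await recv.aclose()  # peer end closed, as terminate() does mid-window
    delivered["result"] = await safe_respond(send, {"id": 1})

anyio.run(main)
```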
Python 3.14's compiler attributes the async trampoline's CLEANUP_THROW instructions (for the try-body's await) to the next physical line of code, which was the else body. coverage.py traced a phantom line event there, tripping strict-no-cover even though the else never runs. Moving the try/respond after the if/else avoids the misattribution and also deduplicates the two respond() calls.
streamable_http's terminate() closes _write_stream_reader (the receive end) before _write_stream (the send end). A handler reaching respond() between those two closes gets BrokenResourceError (peer end closed) rather than ClosedResourceError (our end closed). The stdio path only ever hits ClosedResourceError because _receive_loop's async-with closes the send end.
When the transport closes mid-request, `server.run()` now cancels in-flight handlers instead of waiting for them.

Motivation and Context

Discovered while investigating #2231. That issue's stated symptom (server survives parent death with no tool running) does not reproduce — stdin EOF propagates cleanly through `anyio.wrap_file` + `TextIOWrapper` on all tested Python versions (3.10–3.14). But the investigation surfaced a real adjacent bug:

When stdin closes while a tool handler is running, `server.run()`'s task group joins on the handler rather than cancelling it. The handler eventually finishes, tries to send its response through a `_write_stream` that `_receive_loop` already closed, and `server.run()` crashes with `ClosedResourceError` wrapped in a triple-nested `ExceptionGroup`.

In practice: a stdio server with a 3-second tool gets SIGTERM'd at 2s by the client's graceful-shutdown escalation (`PROCESS_TERMINATION_TIMEOUT`), so `finally` blocks never run. With this fix, the handler gets `CancelledError` immediately and cleanup runs.

This is the same bug class as #2257 (merged last week), which fixed one specific write-after-close in `_handle_message`. This closes the general case.

The existing `CancelledError` catch in `_handle_request` (added in #1153 for `CancelledNotification`) now distinguishes the two cancellation sources via `message.cancelled` — which checks the responder's own cancel scope, set only by `responder.cancel()`. TG cancellation is an outer scope and doesn't set it.

May help some #526 cases, but the long-lived orphans reported there are more consistent with the client never closing stdin (a client-side bug).

Go SDK has the same wait-then-crash behavior (`conn.go:167`); C# fixed it in modelcontextprotocol/csharp-sdk#226.

How Has This Been Tested?

`tests/server/test_cancel_handling.py` drives `server.run()` with raw memory streams (can't use `InMemoryTransport` — it wraps `server.run()` in its own `finally: tg.cancel_scope.cancel()`, which masks the bug). Fails with `TimeoutError` on main, passes with the fix. `test_server_remains_functional_after_cancel` covers the `CancelledNotification` path unchanged. Also tested manually (`kill -9` parent with tool in progress).

Breaking Changes

None. Handlers already needed to be cancel-safe for `CancelledNotification`.

Types of changes

Checklist

Additional context

`session.py:108` pragma removed: `RequestResponder.__exit__` now sees `_completed == False` when a handler is cancelled before reaching `respond()`.

Note for reviewers: this also affects streamable HTTP when `terminate()` fires mid-request (stateless mode after `handle_request()` returns early, or session DELETE). In-flight tools are now cancelled rather than drained. Since they were previously crashing on the closed write stream anyway, this is strictly better.