fix: MCP call_tool threading.Lock held across await deadlocks event loop #1554

Open · PaoloC68 wants to merge 2 commits into agent0ai:ready from PaoloC68:fix/mcp-call-tool-threading-lock-deadlock

Conversation

@PaoloC68 (Contributor)

Fixes #1553

Problem

Three threading.Lock instances in helpers/mcp_handler.py are held across await calls. A threading.Lock is an OS-level blocking mutex — holding it across an await suspends the coroutine while keeping the lock acquired. Any other coroutine that tries to acquire the same lock blocks the entire event loop thread, making Agent Zero completely unresponsive to new chats while a slow MCP tool (e.g. Perplexity deep research) is running.
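A minimal, self-contained sketch of the failure mode (illustrative names, not Agent Zero code; running it hangs the loop forever, which is the point):

```python
import asyncio
import threading

lock = threading.Lock()  # OS-level mutex: blocks the whole thread, not just one coroutine

async def slow_tool():
    with lock:                   # lock acquired...
        await asyncio.sleep(10)  # ...and still held while this coroutine is suspended

async def new_chat():
    with lock:                   # acquire() blocks the only event loop thread
        return "response"

async def main():
    # new_chat freezes the loop thread inside lock.acquire(), so slow_tool's
    # sleep can never complete and the lock is never released: a deadlock.
    await asyncio.gather(slow_tool(), new_chat())

asyncio.run(main())
```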

A secondary bug: MCPClientBase.call_tool always used the global mcp_client_tool_timeout, ignoring per-server tool_timeout overrides.

Changes

helpers/mcp_handler.py — 4 fixes, 1 file (a condensed sketch of the resulting pattern follows the list):

  1. MCPServerRemote.call_tool: add asyncio.Lock instance (__async_lock), use async with self.__async_lock instead of with self.__lock. Sync methods (get_error, get_tools, has_tool, update) keep threading.Lock — correct, they're called from sync context.

  2. MCPServerLocal.call_tool: same fix as above.

  3. MCPConfig.call_tool: resolve the target server under threading.Lock (sync, instant list lookup), release the lock, then await server.call_tool() outside it. This is the outermost and most impactful site — it's the dispatcher that all MCP tool calls pass through.

  4. MCPClientBase.call_tool: tool_timeout = self.server.tool_timeout or set["mcp_client_tool_timeout"] — per-server override now takes precedence over global default.
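A condensed sketch of the pattern the four fixes produce, simplified from the descriptions above; attribute names and the stand-ins (`_execute`, `settings`) are illustrative, not the actual Agent Zero code:

```python
import asyncio
import threading

settings = {"mcp_client_tool_timeout": 120}   # stand-in for the global settings dict ("set" in the real code)

class MCPServerRemote:                        # MCPServerLocal follows the same pattern
    def __init__(self):
        self.__lock = threading.Lock()        # kept for sync accessors (get_error, get_tools, ...)
        self.__async_lock = asyncio.Lock()    # new: guards only the async call_tool path
        self.tools = []

    def get_tools(self):                      # sync context: threading.Lock is correct here
        with self.__lock:
            return list(self.tools)

    async def call_tool(self, tool_name, input_data):
        async with self.__async_lock:         # suspends this coroutine, never the loop thread
            return await self._execute(tool_name, input_data)

    async def _execute(self, tool_name, input_data):   # stand-in for the real client call
        await asyncio.sleep(0)
        return f"{tool_name} done"

class MCPConfig:
    def __init__(self):
        self.__lock = threading.Lock()
        self.servers = []

    async def call_tool(self, tool_name, input_data):
        with self.__lock:                     # sync, instant: only resolve the target server
            server = next(s for s in self.servers if s.has_tool(tool_name))
        return await server.call_tool(tool_name, input_data)  # awaited outside the lock

class MCPClientBase:
    async def call_tool(self, tool_name, input_data):
        # fix 4: per-server override takes precedence over the global default
        tool_timeout = self.server.tool_timeout or settings["mcp_client_tool_timeout"]
        ...
```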

Tests

Verified on a live Agent Zero instance (Proxmox LXC, Docker):

  • Triggered Perplexity deep research (1200s timeout configured)
  • Opened new chat while research was running → responded immediately (previously: no response until timeout)
  • Research completed successfully after ~3 minutes
  • No regressions observed on sqlite, sequential-thinking, deep-wiki MCP servers

Existing test suite: pytest — no new failures introduced.
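An automated regression test could capture the responsiveness property directly; a sketch, assuming pytest-asyncio and using stand-in coroutines rather than real MCP servers:

```python
import asyncio
import pytest

@pytest.mark.asyncio
async def test_loop_stays_responsive_while_tool_holds_lock():
    lock = asyncio.Lock()                  # the fixed pattern: asyncio.Lock, not threading.Lock

    async def slow_tool():                 # stands in for a long-running MCP call
        async with lock:
            await asyncio.sleep(1.0)

    async def new_chat():                  # stands in for an incoming chat request
        return "ok"

    task = asyncio.create_task(slow_tool())
    await asyncio.sleep(0)                 # let slow_tool acquire the lock first
    # The "new chat" must be answered long before the slow tool finishes.
    assert await asyncio.wait_for(new_chat(), timeout=0.1) == "ok"
    await task
```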

Notes

  • No new dependencies
  • No breaking changes
  • threading.Lock is intentionally kept for all sync accessors — only the async call_tool path is changed

PaoloC68 added 2 commits May 8, 2026 23:45
Two bugs in helpers/mcp_handler.py:

1. threading.Lock held across await in MCPServerRemote/MCPServerLocal.call_tool
   caused the entire event loop thread to block whenever a slow MCP tool
   (e.g. Perplexity deep research) was running, making A0 unresponsive to
   all new chats until the global timeout expired.
   Fix: use asyncio.Lock for call_tool; keep threading.Lock for sync methods.

2. MCPClientBase.call_tool always used the global mcp_client_tool_timeout,
   ignoring per-server tool_timeout overrides.
   Fix: prefer self.server.tool_timeout, fall back to global setting.

The real deadlock: MCPConfig.call_tool held threading.Lock while
awaiting server.call_tool(), blocking the entire event loop thread
for all other coroutines (new chats, any other MCP calls).

Fix: resolve the target server under the lock (sync, instant),
release the lock, then await outside it.
@PaoloC68 force-pushed the fix/mcp-call-tool-threading-lock-deadlock branch from 1154592 to beab5fc on May 8, 2026 21:46
@PaoloC68 changed the base branch from development to ready on May 8, 2026 21:50