MCP connection failure crashes Web UI worker instead of graceful degradation #1766

@Citrus086

Description

Problem

When an MCP server fails to connect (e.g., port conflict), the Web UI session worker crashes entirely instead of continuing without MCP tools. This causes messages to get stuck in "thinking" state indefinitely, and the frontend becomes unresponsive.

Reproduction Steps

  1. Configure an MCP server that uses a fixed port (e.g., chrome-local-bridge on port 10086)
  2. Open a first kimi-cli TUI window — the MCP server starts successfully
  3. In the same session, execute /web to switch to Web UI
  4. Send a message in Web UI
  5. The session worker starts and attempts to launch the same MCP server, but its fixed port is already held by the first TUI window's instance
  6. MCPRuntimeError is thrown, the worker process crashes
  7. Message stays stuck in "thinking" forever

Expected Behavior

MCP connection failure should be a graceful degradation:

  • Log a warning about the failed MCP server
  • Continue running without that MCP server's tools
  • The conversation should proceed normally

Actual Behavior

  • wait_for_background_mcp_loading() throws MCPRuntimeError
  • Exception propagates uncaught through _agent_loop()
  • Entire worker process exits
  • WebSocket _read_loop dies without emitting error/idle status
  • Frontend message remains stuck in "thinking"

Root Cause

In kimi_cli/soul/kimisoul.py, _agent_loop() (around line 680):

try:
    await self.wait_for_background_mcp_loading()
finally:
    if loading:
        wire_send(StatusUpdate(mcp_status=self._mcp_status_snapshot()))
        wire_send(MCPLoadingEnd())

The try/finally only ensures MCPLoadingEnd is sent, but does not catch MCPRuntimeError. The exception bubbles up and crashes the agent loop.
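This propagation can be reproduced in isolation: a finally block runs its cleanup but still re-raises the exception, so without an except clause the caller crashes anyway. A minimal sketch (with a stand-in MCPRuntimeError, not kimi-cli's actual class):

```python
# Stand-in for kimi-cli's MCP connection error.
class MCPRuntimeError(Exception):
    pass

cleanup_ran = False

def agent_loop_step():
    """Mimics _agent_loop(): finally runs, but the exception escapes."""
    global cleanup_ran
    try:
        raise MCPRuntimeError("port 10086 already in use")  # simulated failure
    finally:
        cleanup_ran = True  # MCPLoadingEnd would be sent here

try:
    agent_loop_step()
    crashed = False
except MCPRuntimeError:
    crashed = True  # the finally block did not stop the crash
```

After running this, cleanup_ran is True yet crashed is also True, mirroring how MCPLoadingEnd gets sent while the worker still dies.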

Additionally, in kimi_cli/web/runner/process.py, _read_loop() catches the unexpected exception but:

  1. Does not clear _in_flight_prompt_ids
  2. Does not emit "error" or "idle" status to WebSockets
  3. Frontend has no way to know the worker died

Suggested Fix

  1. In _agent_loop(): Add except MCPRuntimeError to gracefully handle MCP failures:
try:
    await self.wait_for_background_mcp_loading()
except MCPRuntimeError as e:
    logger.warning("MCP loading failed, continuing without MCP tools: {}", e)
finally:
    if loading:
        wire_send(StatusUpdate(mcp_status=self._mcp_status_snapshot()))
        wire_send(MCPLoadingEnd())
  2. In _read_loop(): On unexpected exceptions, also clear in-flight prompts and emit error status before exiting.
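The second fix could look roughly like the sketch below. All names here (WorkerState, _broadcast_status, read_loop_cleanup) are hypothetical stand-ins for the real runner internals, not kimi-cli's actual API; the point is only the ordering: clear in-flight prompt IDs, then push an "error" and "idle" status so the frontend leaves the "thinking" state.

```python
import logging

logger = logging.getLogger(__name__)

class WorkerState:
    """Hypothetical stand-in for the Web UI worker's runner state."""

    def __init__(self) -> None:
        self._in_flight_prompt_ids: set[str] = set()
        self.broadcasts: list[dict] = []  # stand-in for WebSocket sends

    def _broadcast_status(self, status: str, detail: str = "") -> None:
        self.broadcasts.append({"status": status, "detail": detail})

    def read_loop_cleanup(self, exc: BaseException) -> None:
        """Run when _read_loop hits an unexpected exception."""
        logger.warning("worker read loop failed: %s", exc)
        self._in_flight_prompt_ids.clear()         # fix 1: no stuck prompts
        self._broadcast_status("error", str(exc))  # fix 2: tell the frontend
        self._broadcast_status("idle")             # let the UI recover

state = WorkerState()
state._in_flight_prompt_ids.add("prompt-1")
state.read_loop_cleanup(RuntimeError("worker process exited"))
```

With this ordering, even if the worker still exits, the frontend receives an explicit error followed by idle instead of waiting forever on a prompt ID that will never complete.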

Related

  • The /web switch also has a subprocess cleanup issue: when preserve_background_tasks=True, MCP child processes from the TUI are not terminated, causing port conflicts for the Web UI worker.

Environment

  • kimi-cli version: latest (via uv tool upgrade kimi-cli --no-cache)
  • OS: macOS
  • Python: 3.13
