[Bug] AsyncLogger writes to stdout, breaking MCP stdio transport #1968

@sherman-yang

Description

crawl4ai version

0.8.6 (installed via pip install crawl4ai==0.8.6 as a dependency of mcp-crawl4ai 0.3.1).

Expected Behavior

When crawl4ai runs inside any program that uses stdout as a structured data
channel (e.g. an MCP server using the stdio transport), log output should go
to stderr, leaving stdout clean for the host protocol.

Current Behavior

crawl4ai/async_logger.py:139 constructs self.console = Console() with no
file= argument, so rich.console.Console defaults to sys.stdout.
Every log line, including the progress markers from url_status() such as
[FETCH] ↓ ..., https://... | ✓ | ⏱: 1.52s, [SCRAPE] ◆, and [COMPLETE] ●,
is written to stdout.

When crawl4ai is wrapped by an MCP stdio server (e.g.
wyattowalsh/mcp-crawl4ai),
the MCP transport spec requires stdout to contain only newline-delimited
JSON-RPC messages. crawl4ai's log lines corrupt the JSON-RPC stream and the
client's JSONRPCMessage.model_validate_json raises Pydantic validation
errors for every leaked line:

ERROR mcp.client.stdio: Failed to parse JSONRPC message from server
pydantic_core._pydantic_core.ValidationError: 1 validation error for JSONRPCMessage
Invalid JSON: expected value at line 1 column 2 [type=json_invalid,
   input_value='[FETCH]... ↓ ', input_type=str]
...
input_value='https://www.example.com...path/to/page', input_type=str
input_value='age-infrastructure/ | ✓ | ⏱: 1.52s ', input_type=str
input_value='[SCRAPE].. ◆ ', input_type=str
input_value='[COMPLETE] ● ', input_type=str

Functionally the tool call still succeeds (the real JSON-RPC response is
also written to stdout and parses fine), but every scrape produces 6–10
spurious ERROR lines in the host's log and risks confusing some MCP
clients into closing the connection.

Note: even when downstream code passes BrowserConfig(verbose=False),
url_status() still emits these lines — verbose=False only gates a
subset of log calls.
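To make the failure mode concrete, here is a dependency-free sketch (plain json, no crawl4ai or mcp installed) of what a stdio JSON-RPC client sees: every stdout line must parse as JSON, so a single leaked progress marker produces one parse error while the real response still parses. All strings below are illustrative.

```python
import json

# Simulated stdout of a stdio MCP server whose library leaked a log line.
stdout_lines = [
    "[FETCH] https://www.example.com | OK",       # leaked log line
    '{"jsonrpc": "2.0", "id": 1, "result": {}}',  # real JSON-RPC response
]

parsed, errors = [], []
for line in stdout_lines:
    try:
        parsed.append(json.loads(line))
    except json.JSONDecodeError as exc:
        errors.append((line, exc.msg))

# The tool call "succeeds" (the response parses) but each leaked line
# yields one spurious parse error, matching the behaviour described above.
```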

Root cause

crawl4ai/async_logger.py:

from rich.console import Console
...
class AsyncLogger(AsyncLoggerBase):
    ...
    def __init__(self, ...):
        ...
        self.console = Console()        # ← defaults to sys.stdout

Proposed fix

Default the logger console to sys.stderr:

import sys
from rich.console import Console
...
self.console = Console(file=sys.stderr)

This follows the standard convention for library logging and matches what
logging.StreamHandler defaults to. Programs that genuinely want logs
on stdout can override it by passing a custom Console via the existing
constructor.
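As a quick check of that convention, a stdlib-only snippet (no rich required) shows a logging.StreamHandler created with no stream argument writing to stderr while stdout stays empty:

```python
import contextlib
import io
import logging

out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    # StreamHandler() with no stream argument picks up sys.stderr,
    # which is the redirected StringIO inside this block.
    logger = logging.getLogger("convention-demo")
    logger.addHandler(logging.StreamHandler())
    logger.warning("hello from the logger")

assert out.getvalue() == ""                       # stdout untouched
assert "hello from the logger" in err.getvalue()  # log landed on stderr
```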

Alternatively, expose a stream / console parameter on AsyncLogger
so downstream wrappers (mcp-crawl4ai, FastMCP integrations) can force
stderr without monkey-patching.
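A minimal sketch of what such a parameter could look like. The class and parameter names here are hypothetical, and a plain stream replaces rich.Console so the sketch stays stdlib-only:

```python
import sys


class AsyncLoggerSketch:
    """Hypothetical stand-in for AsyncLogger with an explicit stream knob."""

    def __init__(self, stream=None):
        # Default to stderr (the proposed fix); callers that genuinely
        # want logs on stdout can pass sys.stdout explicitly.
        self.stream = stream if stream is not None else sys.stderr

    def url_status(self, url, success, timing):
        marker = "✓" if success else "✗"
        print(f"[FETCH] {url} | {marker} | ⏱: {timing:.2f}s", file=self.stream)
```

A downstream wrapper such as mcp-crawl4ai could then pass stream=sys.stderr (or nothing at all) and never touch stdout.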

Reproduction

  1. pip install crawl4ai==0.8.6 mcp-crawl4ai==0.3.1 mcp
  2. Run any MCP client (e.g. Claude Desktop, mcp-inspector) against
    crawl4ai_mcp.server over stdio.
  3. Call the scrape tool on any URL.
  4. Observe Failed to parse JSONRPC message from server errors in the
    client log for each scrape, with input_value matching crawl4ai's
    progress markers.
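For a self-contained version of steps 1–4 that needs neither crawl4ai nor an MCP client, a child process standing in for the stdio server shows the same corruption (all strings are illustrative):

```python
import json
import subprocess
import sys

# Child process plays the stdio server: one leaked progress line,
# then one well-formed JSON-RPC response, both written to stdout.
child = subprocess.run(
    [sys.executable, "-c",
     "print('[FETCH] https://example.com | done'); "
     "print('{\"jsonrpc\": \"2.0\", \"id\": 1, \"result\": {}}')"],
    capture_output=True, text=True, check=True,
)

bad_lines = []
for line in child.stdout.splitlines():
    try:
        json.loads(line)
    except json.JSONDecodeError:
        bad_lines.append(line)

# Exactly the leaked log line fails to parse, as in step 4.
```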

Environment

  • crawl4ai 0.8.6
  • mcp-crawl4ai 0.3.1
  • mcp (python-sdk) 1.x
  • Python 3.14 / 3.11
  • macOS (also reproduces on Linux; the stdio transport behavior is platform-independent)
