Skip to content

fix: tolerate non-utf8 worker output#2350

Open
he-yufeng wants to merge 1 commit into
MoonshotAI:mainfrom
he-yufeng:fix/web-runner-nonutf8-stderr
Open

fix: tolerate non-utf8 worker output#2350
he-yufeng wants to merge 1 commit into
MoonshotAI:mainfrom
he-yufeng:fix/web-runner-nonutf8-stderr

Conversation

@he-yufeng
Copy link
Copy Markdown

@he-yufeng he-yufeng commented May 23, 2026

Summary

Fixes #2313.

The web session runner decoded worker stdout and crash stderr with strict UTF-8. On Windows, child processes can still emit locale-encoded bytes such as cp1252 smart punctuation. A single invalid byte then hides the real worker failure behind a UnicodeDecodeError.

This change decodes worker output with replacement characters at the process boundary, matching the existing uploaded text-file behavior. Wire JSON is still parsed after decoding, but the runner no longer crashes while trying to report a worker error.

To verify

  • uv run pytest tests\web\test_session_error_recovery.py -q
  • uv run ruff check src\kimi_cli\web\runner\process.py tests\web\test_session_error_recovery.py
  • python -m py_compile src\kimi_cli\web\runner\process.py tests\web\test_session_error_recovery.py
  • git diff --check

Open in Devin Review

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 2 additional findings in Devin Review.

Open in Devin Review

Comment on lines +339 to 342
await self._broadcast(_decode_worker_output(line).rstrip("\n"))

# Handle out message
try:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 json.loads(line) on raw bytes raises uncaught UnicodeDecodeError for non-UTF-8 stdout lines

On line 339, _decode_worker_output(line) safely handles non-UTF-8 bytes for the broadcast. However, on line 343, json.loads(line) still receives the original raw bytes. When those bytes contain invalid UTF-8 (e.g., \x97), json.loads raises UnicodeDecodeError (confirmed: it is NOT a subclass of json.JSONDecodeError). This exception escapes the inner except json.JSONDecodeError: handler at line 362 and propagates to the outer except Exception as e: at line 367, terminating the entire read loop and putting the session into error state. The intent of the inner handler is to gracefully skip unparseable lines, but this case slips through. The fix is to pass the already-decoded string to json.loads so that invalid-UTF-8 lines produce a json.JSONDecodeError (due to the replacement character making invalid JSON), which IS caught by the inner handler, allowing the loop to continue.

(Refers to lines 339-343)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

'utf-8' codec can't decode byte 0x97 in position 462: invalid start byte

1 participant