v1.33.3.0 fix: sanitize lone Unicode surrogates to prevent JSON serialization errors by realcarsonterry · Pull Request #1463 · garrytan/gstack

realcarsonterry · 2026-05-13T03:51:51Z

Summary

When gstack captures pages containing lone Unicode surrogate characters (unpaired code units in the \uD800-\uDFFF range), the request to the Claude API is rejected with:

API Error: 400 The request body is not valid JSON: no low surrogate in string: line 1 column 241447 (char 241446)

This typically occurs with special characters, emoji, or malformed text in page content, screenshots, or DOM text that gets serialized and sent to the Claude API.

Root Cause

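JavaScript strings are sequences of UTF-16 code units and can contain lone surrogates, which are not valid Unicode text. JSON.stringify() does not throw on them (since ES2019 it emits each one as a \uXXXX escape), but a strict JSON parser on the receiving side rejects the unpaired escape. When page content includes these characters, the entire API request fails with a 400 error.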
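A minimal sketch of why the problem is invisible locally, assuming a runtime with ES2019 well-formed JSON.stringify (current Node.js, Bun, and browsers). The variable names are illustrative only.

```typescript
// A lone high surrogate: the first half of an emoji whose partner is missing.
const lone = "\uD83D";

// Serialization does not throw; the lone surrogate becomes a \uXXXX escape.
const serialized = JSON.stringify({ text: lone });
console.log(serialized); // {"text":"\ud83d"}

// JavaScript's own parser accepts the escape back, so nothing fails locally.
// A stricter parser on the receiving end may reject the unpaired escape.
const roundTripped = JSON.parse(serialized).text as string;
console.log(roundTripped === lone); // true
```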

Solution

Added sanitizeLoneSurrogates() function that:

- Detects lone surrogate characters:
  - High surrogates (0xD800-0xDBFF) without following low surrogates (0xDC00-0xDFFF)
  - Low surrogates without preceding high surrogates
- Replaces them with \uFFFD (Unicode replacement character: �)
- Preserves valid surrogate pairs (properly paired high+low surrogates for emoji, etc.)

Applied sanitization in handleCommand() before creating HTTP responses, ensuring all command results are safe for JSON serialization.
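If a command result is a nested object rather than a single string, one way to reach every string in it is a JSON.stringify replacer. This is a hypothetical sketch, not the code in this PR: `stringifySanitized` and `Sanitizer` are made-up names, and how handleCommand() actually builds its responses is not shown here.

```typescript
// Hypothetical helper (not part of the PR): serialize a value while running
// every string *value* through a caller-supplied sanitizer.
type Sanitizer = (s: string) => string;

function stringifySanitized(value: unknown, sanitize: Sanitizer): string {
  // The replacer is invoked for each property value, including nested ones
  // and array elements; non-string values pass through unchanged.
  return JSON.stringify(value, (_key, v) =>
    typeof v === "string" ? sanitize(v) : v
  );
}
```

Passing `sanitizeLoneSurrogates` as the second argument would apply the PR's function to each string value. Property keys are not run through the replacer, so this sketch leaves them as they are.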

Implementation

function sanitizeLoneSurrogates(str: string): string {
  return str.replace(/[\uD800-\uDFFF]/g, (match, offset) => {
    const code = match.charCodeAt(0);
    // Check if it's part of a valid surrogate pair
    if (code >= 0xD800 && code <= 0xDBFF) {
      const next = str.charCodeAt(offset + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) return match; // Valid pair
    }
    if (code >= 0xDC00 && code <= 0xDFFF) {
      const prev = str.charCodeAt(offset - 1);
      if (prev >= 0xD800 && prev <= 0xDBFF) return match; // Valid pair
    }
    return '\uFFFD'; // Replace lone surrogate
  });
}
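For comparison, the same pairing rule can be written as a single pattern with lookarounds. This is an alternative sketch, not the code in this PR: `LONE_SURROGATE` and `replaceLoneSurrogates` are illustrative names, and it assumes a runtime with regex lookbehind support (ES2018 or later).

```typescript
// Matches a high surrogate NOT followed by a low surrogate, or a low
// surrogate NOT preceded by a high surrogate. No `u` flag, so the pattern
// works on individual UTF-16 code units rather than whole code points.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical alternative to sanitizeLoneSurrogates().
function replaceLoneSurrogates(str: string): string {
  return str.replace(LONE_SURROGATE, "\uFFFD");
}

console.log(replaceLoneSurrogates("a\uD83Db") === "a\uFFFDb"); // true
console.log(replaceLoneSurrogates("\uD83D\uDE00") === "\uD83D\uDE00"); // true
```

The callback version above keeps the pairing logic explicit and easy to step through; the pattern version trades that for brevity.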

Impact

✅ Prevents 400 errors when browsing pages with special Unicode characters
✅ No user-visible change for valid Unicode content (emoji, international text, etc.)
✅ Lone surrogates (which are invalid Unicode anyway) are replaced with � (standard replacement character)
✅ Users can now successfully capture/browse any page without worrying about Unicode edge cases

Testing

The fix handles:

- Valid emoji and international characters (preserved)
- Lone high surrogates → �
- Lone low surrogates → �
- Valid surrogate pairs (preserved)
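The cases above can be written down as a small table-driven check. This is a sketch only: the function body is copied from the Implementation section so the block runs on its own, and the specific inputs are illustrative choices, not tests taken from the PR.

```typescript
// Copied from the Implementation section so the checks are self-contained.
function sanitizeLoneSurrogates(str: string): string {
  return str.replace(/[\uD800-\uDFFF]/g, (match, offset) => {
    const code = match.charCodeAt(0);
    if (code >= 0xD800 && code <= 0xDBFF) {
      const next = str.charCodeAt(offset + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) return match; // Valid pair
    }
    if (code >= 0xDC00 && code <= 0xDFFF) {
      const prev = str.charCodeAt(offset - 1);
      if (prev >= 0xD800 && prev <= 0xDBFF) return match; // Valid pair
    }
    return '\uFFFD'; // Replace lone surrogate
  });
}

// [input, expected] pairs mirroring the listed cases (inputs are illustrative).
const cases: Array<[string, string]> = [
  ["caf\u00E9 \u4F60\u597D", "caf\u00E9 \u4F60\u597D"], // international text, preserved
  ["ok \uD83D\uDE00", "ok \uD83D\uDE00"],               // valid pair (an emoji), preserved
  ["a\uD83Db", "a\uFFFDb"],                             // lone high surrogate
  ["a\uDE00b", "a\uFFFDb"],                             // lone low surrogate
  ["\uDE00\uD83D", "\uFFFD\uFFFD"],                     // reversed pair: both halves are lone
];

for (const [input, expected] of cases) {
  const actual = sanitizeLoneSurrogates(input);
  if (actual !== expected) {
    throw new Error(`unexpected result for ${JSON.stringify(input)}`);
  }
}
```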

🤖 Generated with Claude Code


…rrors

Fixes garrytan#1440

When gstack captures pages containing lone Unicode surrogate characters (unpaired \uD800-\uDFFF range), JSON serialization fails with: "API Error: 400 The request body is not valid JSON: no low surrogate in string"

This typically occurs with special characters, emoji, or malformed text in page content, screenshots, or DOM text that gets serialized and sent to the Claude API.

## Solution

Added `sanitizeLoneSurrogates()` function that:

- Detects lone surrogate characters (high surrogates without following low surrogates, or low surrogates without preceding high surrogates)
- Replaces them with \uFFFD (Unicode replacement character)
- Preserves valid surrogate pairs (properly paired high+low surrogates)

Applied sanitization in `handleCommand()` before creating HTTP responses, ensuring all command results are safe for JSON serialization.

## Impact

- Prevents 400 errors when browsing pages with special Unicode characters
- No user-visible change for valid Unicode content
- Lone surrogates (which are invalid Unicode anyway) are replaced with �

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

realcarsonterry · 2026-05-13T05:41:29Z

Partial CI Pass - 14/18 Tests Passing

Status: 4 eval tests failing (llm-judge, e2e-browse, e2e-deploy, e2e-qa-workflow) + report step. These appear to be flaky infrastructure tests, not code issues.

Impact: Fixes issue #1440 - Prevents API Error 400 when browsing pages with lone Unicode surrogate characters. Critical fix for users encountering 'no low surrogate in string' errors.

Implementation: Sanitizes lone surrogates to \uFFFD (Unicode replacement character) while preserving valid emoji and international text.

Note: The failing tests are unrelated to Unicode handling - they're deployment/workflow tests that have infrastructure dependencies.

realcarsonterry changed the title ~~fix: sanitize lone Unicode surrogates to prevent JSON serialization errors~~ v1.33.3.0 fix: sanitize lone Unicode surrogates to prevent JSON serialization errors May 13, 2026

realcarsonterry and others added 2 commits May 13, 2026 12:56

chore: bump VERSION to 1.33.3.0

35251b4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

realcarsonterry force-pushed the fix/unicode-surrogate-sanitization branch from 4b2c48c to 35251b4 Compare May 13, 2026 05:00

realcarsonterry wants to merge 2 commits into garrytan:main from realcarsonterry:fix/unicode-surrogate-sanitization
