fix(eval): swallow microsandbox 'egress frame too large' so the eval doesn't die mid-run #4

Closed
nickwinder wants to merge 1 commit into main from nick/agentic-usability/egress-frame-fix

@nickwinder
Contributor

Why

microsandbox's egressIntercept() runs its event-pump loop as a fire-and-forget async IIFE with try/finally but no catch. When the native binding throws "egress frame too large" (the protocol parser hit a frame larger than its internal buffer), the rejection surfaces as an unhandledRejection and crashes the parent Node.js process, taking out the whole eval. We've reproduced this consistently mid-suite: early test cases finish, then a later test's executor sandbox triggers the bug and the eval dies before any judging happens.
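For readers unfamiliar with the failure mode, here is a minimal sketch of the pattern described above. It is not microsandbox's actual source: `nextEgressEvent`, `startPumpWithoutCatch`, and `startPumpWithCatch` are hypothetical names, and the thrown error only imitates the shape reported in this PR.

```typescript
// Hypothetical stand-in for the native binding call (not microsandbox's real API).
// It always rejects with an error shaped like the one reported in this PR.
async function nextEgressEvent(): Promise<never> {
  throw Object.assign(new Error("egress frame too large"), { code: "GenericFailure" });
}

// The pattern described above: the IIFE's promise is discarded, so when the
// loop throws there is no catch and no caller awaiting the result.
function startPumpWithoutCatch(cleanup: () => void): void {
  void (async () => {
    try {
      for (;;) await nextEgressEvent();
    } finally {
      cleanup(); // runs, but the rejection still escapes as unhandledRejection
    }
  })();
}

// For contrast only: the same loop with the rejection observed locally.
function startPumpWithCatch(cleanup: () => void, onError: (e: unknown) => void): void {
  void (async () => {
    try {
      for (;;) await nextEgressEvent();
    } catch (e) {
      onError(e); // the error is handled here and never reaches the process
    } finally {
      cleanup();
    }
  })();
}
```

Calling `startPumpWithoutCatch` in a Node.js process with no `unhandledRejection` listener terminates the process by default in current Node.js versions, which matches the crash described here.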

Summary

  • Install a process-level unhandledRejection handler at the CLI entry point that only catches errors with code === 'GenericFailure' AND message matching /egress frame too large/. Re-throw everything else so genuine unhandled rejections still crash loudly.
  • Egress logging for the affected test case stops. That logging is observability, not correctness: the eval can still execute, snapshot, and judge.
  • After this fix the same suite that previously died mid-run completes and all test cases produce judge results.

Single-file change to src/index.ts (+18 lines, no test changes: the bug originates in the native binding and is hard to reproduce deterministically in unit tests).
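A rough sketch of what such a filter could look like follows. This is not the actual diff from this PR: the helper names `isEgressFrameTooLarge` and `installEgressFrameGuard` are hypothetical, and only the `code === 'GenericFailure'` check and the `/egress frame too large/` pattern come from the description above.

```typescript
// Hypothetical helper: true only for the specific error described in this PR.
// Checks the shape rather than `instanceof Error`, on the assumption that the
// native binding's error exposes `code` and `message` properties.
function isEgressFrameTooLarge(reason: unknown): boolean {
  if (typeof reason !== "object" || reason === null) return false;
  const { code, message } = reason as { code?: unknown; message?: unknown };
  return (
    code === "GenericFailure" &&
    typeof message === "string" &&
    /egress frame too large/.test(message)
  );
}

// Hypothetical installer: would be called once at the CLI entry point.
function installEgressFrameGuard(): void {
  process.on("unhandledRejection", (reason: unknown) => {
    if (isEgressFrameTooLarge(reason)) {
      // Swallow only this error; egress logging for the test case is lost.
      console.warn("[eval] ignoring microsandbox egress error: egress frame too large");
      return;
    }
    // Everything else is rethrown from the listener, which surfaces as an
    // uncaught exception and still crashes the process loudly.
    throw reason;
  });
}
```

Requiring both the code and the message keeps the filter narrow: an error with the same code but a different message, or the same message from a different source, is not swallowed.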

…doesn't die mid-run

microsandbox's egressIntercept() runs its event-pump loop as a fire-and-forget
async IIFE with try/finally — no catch. When the native binding throws
"egress frame too large" (the protocol parser hit a frame above its internal
buffer), the rejection escapes as unhandledRejection and crashes the parent
node process, taking out the whole eval. We've reproduced this consistently
on TC-005 with the Nutrient Web SDK suite — TC-001 finishes, then the next
test's executor sandbox triggers the bug and the entire eval dies before any
judging happens.

Install a process-level unhandledRejection handler at the CLI entry point
that ONLY catches errors with code === 'GenericFailure' AND message matching
/egress frame too large/. Re-throw everything else so genuine unhandled
rejections still crash loudly. Egress logging for that one test case stops
(it's observability, not correctness — the eval can still execute, snapshot,
and judge).

After this fix the same TC-001+TC-005 run completes and both produce judge
results.
@nickwinder nickwinder self-assigned this May 14, 2026
@nickwinder
Contributor Author

Not required after fixing up the Claude OAuth sessions.

@nickwinder nickwinder closed this May 15, 2026
@nickwinder nickwinder deleted the nick/agentic-usability/egress-frame-fix branch May 15, 2026 00:17