feat(browser-passthrough): VNC-based browser streaming in session pane#82
Conversation
…estart false-positives Zombies present at service startup are added to the reported set without being recorded in the fork-pressure ring buffer. Only zombies that appear after the first scan (i.e. growth over time) count toward the alert threshold. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…2026-05-14) Conflict resolutions: - benchmarks/: took upstream (canonical baseline) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a "Browser" tab to the session detail view that streams a per-session
virtual X11 display (Xvfb + x11vnc) to the browser via noVNC WebSocket proxy.
AI agents that open Chrome/Chromium automatically appear in this tab.
Core implementation:
- session/vnc/: VNCProcessManager with two-phase startup (StartDisplay before
tmux, StartServer after) so DISPLAY=:<N> is injected via tmux ExtraEnv
- Xvfb display allocator using O_CREATE|O_EXCL lock-file protocol (100-200)
- xdotool window tracker with adaptive 500ms/2s polling; x11vnc bound to
detected window for browser-only capture
- Go WebSocket proxy at /api/sessions/{id}/vnc with CloseRead() shutdown
- noVNC (@novnc/novnc) embedded in React with keep-alive mount pattern
- Graceful degradation: no-op manager + disabled tab on non-Linux or missing deps
- BrowserPassthroughConfig with *bool Enabled (opt-out semantics)
- InstanceFinder interface for O(1) live-session lookup (avoids SQLite on WS connect)
Also fixes ManagedProcess.Stop() deadlock regression: restores select{default}
guard so Stop() after Wait() doesn't block on an already-drained waitErr channel.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds a VNC-based "Browser" tab to the session detail pane so users can view and interact with the agent's virtual X11 display from the web app. Backend allocates an Xvfb display per session and proxies x11vnc traffic over a WebSocket; the frontend embeds noVNC inside a keep-alive tab.
Changes:
- New
BrowserTab/NoVNCViewercomponents (and styles) plus a noVNC module declaration; tab is hidden/disabled when VNC isn't available. SessionDetailViewwires a newbrowsertab into the tab strip with disabled-state styling, whileSessionDetailTabtypes are extended.- Test stubs added (
SessionDetail.embedded,BrowserTab.test) to cover the new component.
Reviewed changes
Copilot reviewed 65 out of 72 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| web-app/src/types/novnc.d.ts | Minimal ambient RFB type for @novnc/novnc. |
| web-app/src/lib/pane/paneTypes.ts | Adds "browser" to SessionDetailTab. |
| web-app/src/components/sessions/SessionDetail.tsx | Mirrors "browser" in re-exported tab type. |
| web-app/src/components/sessions/SessionDetailView.tsx | Adds Browser tab + disabled-state styling and keep-alive tab panel. |
| web-app/src/components/sessions/NoVNCViewer.tsx | Manages RFB lifecycle (connect/quality/focus/blur). |
| web-app/src/components/sessions/BrowserTab.tsx | Status-driven wrapper with quality controls and hasBeenReadyRef keep-alive. |
| web-app/src/components/sessions/BrowserTab.css.ts | Vanilla-extract styles for the tab. |
| web-app/src/components/sessions/tests/BrowserTab.test.tsx | Unit tests for status handling and WS URL building. |
| web-app/src/components/sessions/tests/SessionDetail.embedded.test.tsx | Adjusts tab-panel assertion and mocks BrowserTab/NoVNCViewer. |
Files not reviewed (2)
- gen/proto/go/session/v1/types.pb.go: Language not supported
- web-app/package-lock.json: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // VNCState is not yet in the generated proto types — access via cast until Epic 3 lands. | ||
| const vncState = (session as unknown as { vncState?: VNCState }).vncState; | ||
| const vncStatus = vncState?.status ?? VNCStatus.UNSPECIFIED; | ||
| const isBrowserAvailable = vncStatus !== VNCStatus.UNAVAILABLE && vncStatus !== VNCStatus.UNSPECIFIED; |
| aria-disabled={tab.disabled} | ||
| className={`${styles.tab} ${activeTab === tab.id ? styles.active : ""}`} | ||
| onClick={() => handleTabChange(tab.id)} | ||
| onClick={() => { if (!tab.disabled) handleTabChange(tab.id); }} | ||
| title={tab.disabled && tab.id === "browser" ? "Browser passthrough requires Linux with Xvfb, x11vnc, and xdotool" : undefined} | ||
| style={tab.disabled ? { opacity: 0.4, cursor: 'not-allowed' } : undefined} |
| function buildWsUrl(baseUrl: string, sessionId: string): string { | ||
| // Normalize: strip trailing slashes, then strip /api suffix if present. | ||
| // Handles cases where baseUrl ends with '/' or lacks the /api suffix, | ||
| // which would otherwise produce a doubled path like wss://host/api/api/sessions/... | ||
| const origin = baseUrl.replace(/\/+$/, '').replace(/\/api$/, ''); | ||
| const wsScheme = origin.startsWith('https') ? 'wss' : 'ws'; | ||
| const host = origin.replace(/^https?:\/\//, ''); | ||
| return `${wsScheme}://${host}/api/sessions/${sessionId}/vnc`; | ||
| } |
| // VNCState mirrors proto VNCState (session/v1/types.proto). | ||
| // TODO: replace with the proto-generated type from @/gen/session/v1/types_pb.ts | ||
| // once SessionDetailView passes the generated type through vncState prop. | ||
| export interface VNCState { | ||
| status?: number; // 0=UNSPECIFIED, 1=STARTING, 2=READY, 3=NO_BROWSER, 4=UNAVAILABLE | ||
| displayNumber?: number; | ||
| browserWindowDetected?: boolean; | ||
| } | ||
|
|
||
| export const VNCStatus = { | ||
| UNSPECIFIED: 0, | ||
| STARTING: 1, | ||
| READY: 2, | ||
| NO_BROWSER: 3, | ||
| UNAVAILABLE: 4, | ||
| } as const; |
Adds Chrome DevTools Protocol streaming alongside the existing Xvfb
virtual display, making the Browser tab functional on macOS and any
Linux system where x11vnc/xdotool are not installed.
How it works:
- CDPStreamManager allocates a free TCP port per session and creates
chrome wrapper scripts (google-chrome, chromium, etc.) in a temp dir
- The wrapper dir + CDP_PORT are injected into the session's ExtraEnv
so any Chrome invocation by the agent picks up --remote-debugging-port
- After tmux starts, CDPStreamManager polls http://localhost:<port>/json/version
until Chrome appears, then subscribes to Page.screencastFrame events
- Go relay at /api/sessions/{id}/cdp-stream forwards JPEG frames at 15fps
over WebSocket and routes client mouse/keyboard events back to Chrome
via Input.dispatch{Mouse,Key}Event
Frontend:
- CDPViewer.tsx replaces NoVNCViewer: pure canvas renderer using
createImageBitmap for JPEG frame display, no third-party library
- Sends CDP Input.* JSON messages for mouse and keyboard events
- Mouse coordinates scaled via getBoundingClientRect for accuracy
- Auto-reconnects with 2s backoff
macOS: Chrome runs on the real display, CDP streaming works without Xvfb
Linux: Xvfb still provides virtual display; CDP replaces x11vnc/xdotool
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ng Xvfb If DISPLAY is already set in the environment when a session starts (e.g. running inside a desktop X session, inside xvfb-run, or on a CI host that pre-allocates a display), skip Xvfb allocation entirely and reuse the existing display. The raw DISPLAY value is stored in passthroughDisplay and injected into the tmux session via VNCDisplayEnv() -> DisplayEnv(). Also adds DisplayEnv() string to the VNCProcessManager interface so callers get the correct DISPLAY= assignment regardless of whether the display was freshly allocated or passthrough, fixing the :0 edge case where DisplayNumber returns 0 and the old n <= 0 guard incorrectly suppressed the injection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… 1+2) Wave 1 — code quality and correctness: - Add writeMu to CDPStreamManager to serialize WebSocket writes - Fix input receiver goroutine shutdown via SetReadDeadline on context cancel - Fix doStartServer guard to allow passthrough display mode (displayN=0) - Add restartMu to eliminate TOCTOU race in x11vnc crash recovery - Add goroutineWg to track crash recovery goroutine and wait on Stop - Replace busy-sleep in crash recovery with context-aware ticker - Remove dead nil guard in initVNCManager - Extract chromeRetryDelay constant; remove unused port param from readLoop - Add sync.Once caching to CheckDependencies to avoid repeated LookPath - Remove CDPState.port from proto (security); consumers use CDPManager.Port() - Add VNC_STATUS_PASSTHROUGH enum value for existing-display case - Remove dead quality state/buttons from BrowserTab - Replace hand-rolled VNCState with proto-generated types throughout - Fix useCallback dependency array issue in CDPViewer renderFrame - Wrap RFB constructor in try/catch Wave 2 — tests and configuration: - Add session/vnc/display_alloc_test.go (15 tests: parseLockPID, isPIDAlive, Allocate/Release round-trip, full-range error, CleanupStaleDisplays) - Add session/cdp/manager_test.go (11 tests: Allocate, state, Stop safety, LatestFrame, callback, CheckDependencies caching, noop interface) - Add server/services/cdp_stream_handler_test.go and vnc_proxy_handler_test.go (3 tests each: 400/404/503 paths) - Add web-app CDPViewer.test.tsx (21 tests: WS lifecycle, getModifiers bitmask, reconnect cleanup, wsUrl boundary) - Add BrowserTab sticky-mount tests (2 tests: viewer stays mounted after VNC state regresses from READY) - Add tests/e2e/browser-passthrough.spec.ts with feature registry entry - Add BrowserPassthroughCDPConfig sub-struct with screencast tuning fields (quality, maxWidth, maxHeight, maxFPS) wired through to CDPConfig - Add ReconcileOrphans to CDPStreamManager for wrapper script cleanup on startup Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
UX (interaction and accessibility):
- Add onWheel handler forwarding scroll events as CDP Input.dispatchMouseEvent
mouseWheel — scroll was silently failing, violating FR-5
- Add onTouchStart/Move/End handlers translating touch to CDP mouse events,
making the feature functional on touch devices
- Wire onConnected/onDisconnected callbacks from CDPViewer to BrowserTab;
add connectionState ('connected'|'reconnecting'|'disconnected') with a
visible "Reconnecting…" banner and manual Reconnect button
- Add role="status" aria-live="polite" live region below the canvas for
screen reader visibility of connection state transitions
- Add BrowserTab.css.ts with vanilla-extract styles for reconnect banner
Engineering (registry compliance):
- Add // +api: browser:cdp-stream and // +api: browser:vnc-proxy markers
to handler files so make registry-generate can auto-track them
- Update CDPStream.json and VNCProxy.json: tested=true, testIds populated,
markerFound=true; regenerate all registry files via make registry-generate
- Write ADR-016: CDP screencasting over VNC — documents the pivot from
x11vnc+noVNC to Chrome DevTools Protocol, supersedes ADR-013 and ADR-015
- Write ADR-017: CDP port TOCTOU trade-off — documents OS-assigned port
allocation strategy and why the listen→close→Chrome-bind window is
acceptable on localhost
Product (requirements accuracy):
- Rewrite FR-2 through FR-5, FR-8, FR-9, FR-10 in requirements.md to
describe the CDP screencasting approach instead of VNC/noVNC
- Update NFR-1 through NFR-4 to match CDP reality
- Add Assumption Register section documenting Chrome PATH-wrapper assumption
- Update success criteria to reference --remote-debugging-port injection
- Add header note referencing ADR-016 revision
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stages the full set of file changes from origin/main that were not included in the initial conflict-resolution commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Go Benchmarks (Tier 1) |
E2E RPC Latency |
…r main merge Replace sd.programs[""].readyRegexes references with direct struct fields (sd.readyRegexes, etc.) following the refactor that removed the per-program map in favour of flat compiled-regex fields on StatusDetector. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Frontend Terminal Throughput |
UX Analysis
|
🎬 E2E Feature Demos2 shard(s) recorded feature flows for this PR. recordings shard 1 Demo preview opens directly in browser (single-file HTML). Raw WebM recordings in ZIP. Expires after 30 days. |
✅ Registry ValidationTest Coverage: 86/114 features have
|
…ab-disabled style to CSS class - buildWsUrl now handles protocol-relative (//host/api), relative (/api), and already-ws:// inputs without producing broken WebSocket URLs - Tab disabled styling moved from inline style prop to vanilla-extract tabDisabled class in SessionDetail.css.ts - VNCState/VNCStatus local duplicates already removed (uses proto imports) - Added buildWsUrl unit tests covering all new edge cases (38 tests pass) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
✅ Registry ValidationTest Coverage: 86/114 features have
|
Feature is off unless config.json explicitly sets browser_passthrough.enabled=true. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
✅ Registry ValidationTest Coverage: 86/114 features have
|
Wire browser-passthrough into the feature flag system so it can be toggled
from Settings → Features without editing config.json directly.
- Add "browser-passthrough" to knownFeatureFlags in session_service.go
- initVNCManager/initCDPManager now accept full *config.Config and check
GetFeatureFlag("browser-passthrough") in addition to the struct field
- Feature flag OR explicit BrowserPassthroughConfig.Enabled enables the feature
- No FeatureController needed: flag is read at session creation from fresh config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
✅ Registry ValidationTest Coverage: 86/114 features have
|
Summary
Architecture
Two-phase VNC startup (prevents DISPLAY race):
StartDisplay()allocates Xvfb beforetmux new-session—DISPLAY=:<N>injected viaExtraEnvStartServer()starts x11vnc + xdotool window tracker after tmux is liveKey components:
session/vnc/—VNCProcessManagerwith display allocator (O_CREATE|O_EXCL lock-file, display range 100-200), xdotool window tracker (adaptive 500ms/2s poll), crash-recovery goroutineserver/services/vnc_proxy_handler.go— Go WebSocket↔TCP relay at/api/sessions/{id}/vncusingCloseRead()for clean shutdown without pollingweb-app/.../BrowserTab.tsx— noVNC@novnc/novncembedded withnext/dynamic ssr:false; keep-alive mount pattern viahasBeenReadyRefInstanceFinderinterface for O(1) live-session lookup (no SQLite on every WebSocket upgrade)ADRs added: 013 (Xvfb+x11vnc stack), 014 (proxy-only auth), 015 (noVNC npm embedding)
Test plan
go build ./...passesgo test ./executor/...— 104 tests pass (including restored Stop-after-Wait deadlock regression)npx jest BrowserTab|SessionDetail.embedded— 21 tests passmake quick-check— exit 0Notes
Also fixes a
ManagedProcess.Stop()deadlock introduced by an earlier code review suggestion: restores theselect { default }guard soStop()afterWait()doesn't block forever on an already-drainedwaitErrchannel.🤖 Generated with Claude Code