feat: enable Computer Use with macOS + Windows + Linux support #98
Phase 1: Replace @ant/computer-use-mcp stub (12 files, 6517 lines).

Phase 2: Remove 8 macOS-only guards in src/:
- main.tsx: remove getPlatform()==='macos' check
- swiftLoader.ts: remove darwin-only throw
- executor.ts: extend platform guard, clipboard dispatch, paste key
- drainRunLoop.ts: skip CFRunLoop pump on non-darwin
- escHotkey.ts: non-darwin returns false (Ctrl+C fallback)
- hostAdapter.ts: non-darwin permissions granted
- common.ts: dynamic platform + screenshotFiltering
- gates.ts: enabled:true, subscription check removed

Phase 3: Add Linux backends (xdotool/scrot/xrandr/wmctrl):
- computer-use-input/backends/linux.ts (173 lines)
- computer-use-swift/backends/linux.ts (278 lines)

Verified on Windows x64: mouse, screenshot, displays, foreground app.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Caution
Review failed
The pull request is closed.
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
📝 Walkthrough
Reworks Computer Use from macOS-only to a platform-dispatch architecture: adds Windows and Linux backends for input and swift, introduces MCP executor/tools/permissions/pixel-compare/OCR/UIA modules, enables CHICAGO_MCP in defaults, and removes macOS-only runtime guards across client utilities.
Changes
Sequence Diagram
sequenceDiagram
participant Client
participant Dispatcher as Platform Dispatcher
participant Backend as Platform Backend
participant OSTools as OS Tooling
participant OS as Operating System
Client->>Dispatcher: call input/display/screenshot API
Dispatcher->>Dispatcher: detect process.platform & load backend
Dispatcher->>Backend: invoke normalized API (moveMouse/screenshot/ocr/...)
Backend->>OSTools: spawn platform command (osascript/xdotool/PowerShell)
OSTools->>OS: native API calls (CoreGraphics/Win32/X11)
OS-->>OSTools: bytes/json/stdout
OSTools-->>Backend: parse & normalize result
Backend-->>Dispatcher: return platform-agnostic result
Dispatcher-->>Client: deliver response
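The dispatch step in the diagram can be sketched roughly as below. This is a minimal illustration rather than the PR's code: `InputBackend`, `BACKENDS`, `isPlatform`, and `selectBackend` are hypothetical names, the tool names come from the diagram, and the backends only build the command line they would spawn (the macOS and Windows arguments are placeholders).

```typescript
// Hypothetical sketch of a platform dispatcher; names are illustrative,
// not taken from the PR.
type Platform = "darwin" | "win32" | "linux";

interface InputBackend {
  // Returns the command line the backend would spawn to move the mouse.
  moveMouse(x: number, y: number): string[];
}

// One backend per platform. The darwin/win32 script bodies are placeholders
// (assumptions); `xdotool mousemove X Y` is the real xdotool syntax.
const BACKENDS: Record<Platform, InputBackend> = {
  darwin: { moveMouse: (x, y) => ["osascript", "-e", `-- move to ${x},${y}`] },
  win32: {
    moveMouse: (x, y) => ["powershell", "-NoProfile", "-Command", `# move to ${x},${y}`],
  },
  linux: { moveMouse: (x, y) => ["xdotool", "mousemove", String(x), String(y)] },
};

// Narrow an arbitrary platform string to the supported union.
function isPlatform(value: string): value is Platform {
  return value === "darwin" || value === "win32" || value === "linux";
}

// Resolve a backend from a platform string; null means "unsupported".
function selectBackend(platform: string): InputBackend | null {
  return isPlatform(platform) ? BACKENDS[platform] : null;
}
```

A caller would typically pass the runtime platform string (for example Node's `process.platform`) and treat a `null` result as "Computer Use unavailable".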
Estimated code review effort
🎯 5 (Critical) | ⏱️ ~120 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 5
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/utils/computerUse/gates.ts (1)
12-21: ⚠️ Potential issue | 🟠 Major
Fail closed when the Chicago config is absent.
`readConfig()` overlays a partial payload on top of `DEFAULTS`, so switching this default to `true` means any missing or stale `enabled` field now enables Computer Use for every eligible user. That turns config misses into an implicit rollout.
Suggested change
 const DEFAULTS: ChicagoConfig = {
-  enabled: true,
+  enabled: false,
   pixelValidation: false,
   clipboardPasteMultiline: true,
   mouseAnimation: true,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/gates.ts` around lines 12 - 21, DEFAULTS currently enables Chicago by default which causes readConfig (which overlays partial payloads onto DEFAULTS) to implicitly turn features on when the config is missing or stale; change DEFAULTS.enabled to false so the system fails closed, and confirm the behavior in the overlay code that uses readConfig (refer to DEFAULTS, ChicagoConfig, and readConfig) still merges partial payloads but will not enable Computer Use when the config is absent or incomplete.
build.ts (1)
13-19: ⚠️ Potential issue | 🟠 Major
Don't ship `CHICAGO_MCP` in `DEFAULT_BUILD_FEATURES`.
This makes the flag effectively always-on in release builds, so `FEATURE_CHICAGO_MCP=1` no longer controls rollout. Keep the build default to `AGENT_TRIGGERS_REMOTE` and let Chicago stay opt-in via env/dev tooling.
Suggested change
-const DEFAULT_BUILD_FEATURES = ["AGENT_TRIGGERS_REMOTE", "CHICAGO_MCP"];
+const DEFAULT_BUILD_FEATURES = ["AGENT_TRIGGERS_REMOTE"];
Based on learnings, build mode enables `AGENT_TRIGGERS_REMOTE` only, and specific features can be enabled via `FEATURE_<FLAG_NAME>=1` environment variables.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@build.ts` around lines 13 - 19, DEFAULT_BUILD_FEATURES currently includes "CHICAGO_MCP", which forces that feature on in builds; remove "CHICAGO_MCP" so only "AGENT_TRIGGERS_REMOTE" remains in DEFAULT_BUILD_FEATURES, leaving Chicago opt-in via the FEATURE_CHICAGO_MCP env var flow that uses envFeatures and features; update the DEFAULT_BUILD_FEATURES declaration (and any nearby comment) to reflect this change so rollouts remain controlled by FEATURE_<FLAG_NAME> environment variables.
src/utils/computerUse/drainRunLoop.ts (1)
61-79: ⚠️ Potential issue | 🟠 Major
Keep the 30s timeout on non-macOS calls.
Line 62 skips not just the CFRunLoop pump but also the timeout/unhandled-rejection guard. A hung Windows/Linux backend call will now block forever instead of surfacing `computer-use native call exceeded 30000ms`.
Suggested change
 export async function drainRunLoop<T>(fn: () => Promise<T>): Promise<T> {
-  if (process.platform !== 'darwin') return fn()
-  retain()
+  const needsPump = process.platform === 'darwin'
+  if (needsPump) retain()
   let timer: ReturnType<typeof setTimeout> | undefined
   try {
     // If the timeout wins the race, fn()'s promise is orphaned — a late
@@
     const timeout = withResolvers<never>()
     timer = setTimeout(timeoutReject, TIMEOUT_MS, timeout.reject)
     return await Promise.race([work, timeout.promise])
   } finally {
     clearTimeout(timer)
-    release()
+    if (needsPump) release()
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/drainRunLoop.ts` around lines 61 - 79, The current early return in drainRunLoop skips the timeout/unhandled-rejection guard on non-darwin platforms; change drainRunLoop so the timeout logic (withResolvers, timer via setTimeout(timeoutReject, TIMEOUT_MS, ...), attaching work.catch(() => {}), and Promise.race) always runs, but only call retain() and release() (and any CFRunLoop-specific logic) when process.platform === 'darwin'; update the implementation around the symbols drainRunLoop, retain, release, withResolvers, timeoutReject, TIMEOUT_MS and ensure timer is cleared in finally so non-macOS calls still get the 30s timeout and unhandled-rejection protection.
src/main.tsx (1)
1605-1623: ⚠️ Potential issue | 🔴 Critical
Don't inject Chicago MCP after the enterprise MCP policy gate.
This block appends a `type: 'stdio'` server to `dynamicMcpConfig` after `doesEnterpriseMcpConfigExist()` / `areMcpConfigsAllowedWithEnterpriseMcpConfig()` have already run. That means orgs that intentionally forbid dynamic MCP servers still get Computer Use injected. Please re-run the policy check after this merge, or make the exemption explicit in the policy layer instead of bypassing it here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main.tsx` around lines 1605 - 1623, The Chicago MCP injection currently appends to dynamicMcpConfig and allowedTools after the enterprise MCP policy checks, bypassing org-level bans; modify the flow so you re-run the MCP policy check (use areMcpConfigsAllowedWithEnterpriseMcpConfig() or doesEnterpriseMcpConfigExist() / the existing policy gating functions) immediately after calling setupComputerUseMCP() and only merge mcpConfig / push cuTools when that re-check returns allowed, or alternatively move the Chicago setup to run before the policy decision and let the policy layer explicitly exempt or allow it; reference getChicagoEnabled, setupComputerUseMCP, dynamicMcpConfig, allowedTools, doesEnterpriseMcpConfigExist, and areMcpConfigsAllowedWithEnterpriseMcpConfig when making the change.
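One way to read the "re-run the policy check after the merge" option is sketched below. Everything here is hypothetical: `McpServerConfig`, `PolicyCheck`, and `mergeIfAllowed` are illustrative stand-ins, and the real policy functions named in the comment may take different arguments or return different shapes.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the PR's API.
type McpServerConfig = { type: "stdio" | "sse"; command?: string };
type McpConfigMap = Record<string, McpServerConfig>;

// Stand-in for the enterprise policy gate: returns true when the
// candidate config as a whole is permitted.
type PolicyCheck = (config: McpConfigMap) => boolean;

// Merge injected servers into the dynamic config only if the merged
// result still passes the policy check; otherwise keep the original.
function mergeIfAllowed(
  current: McpConfigMap,
  injected: McpConfigMap,
  isAllowed: PolicyCheck,
): { config: McpConfigMap; merged: boolean } {
  const candidate: McpConfigMap = { ...current, ...injected };
  return isAllowed(candidate)
    ? { config: candidate, merged: true }
    : { config: current, merged: false };
}
```

The point of the shape is that the injected server is evaluated by the same gate as user-supplied servers, so a deny-all policy leaves the config untouched.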
🟠 Major comments (16)
packages/@ant/computer-use-mcp/src/keyBlocklist.ts-124-128 (1)
124-128: ⚠️ Potential issue | 🟠 Major
Handle Linux system shortcuts in this API.
This package is now three-platform, but `isSystemKeyCombo()` still only accepts `"darwin" | "win32"`. That leaves the Linux backend without a safe way to enforce the `systemKeyCombos` grant, so common window-manager shortcuts can slip through.
Suggested change
+const BLOCKED_LINUX = new Set([
+  "alt+f4",
+  "alt+tab",
+  "meta+l",
+  "meta+d",
+]);
+
 export function isSystemKeyCombo(
   seq: string,
-  platform: "darwin" | "win32",
+  platform: "darwin" | "win32" | "linux",
 ): boolean {
-  const blocklist = platform === "darwin" ? BLOCKED_DARWIN : BLOCKED_WIN32;
+  const blocklist =
+    platform === "darwin"
+      ? BLOCKED_DARWIN
+      : platform === "linux"
+        ? BLOCKED_LINUX
+        : BLOCKED_WIN32;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-mcp/src/keyBlocklist.ts` around lines 124 - 128, Update isSystemKeyCombo to accept Linux as a platform and enforce Linux shortcuts: change the platform type union to include "linux" (or use the broader platform type), add or reference a BLOCKED_LINUX set, and branch similarly to the existing BLOCKED_DARWIN/BLOCKED_WIN32 logic so the function uses BLOCKED_LINUX when platform === "linux"; ensure callers are updated where necessary to pass "linux" so the Linux backend correctly enforces systemKeyCombos.
src/utils/computerUse/executor.ts-300-302 (1)
300-302: ⚠️ Potential issue | 🟠 Major
Windows/Linux still inherit the macOS host-ID path.
Turning on `win32`/`linux` here sends those platforms through the existing `getTerminalBundleId()` / `CLI_HOST_BUNDLE_ID` flow unchanged. That flow only produces macOS bundle IDs or the darwin sentinel, but the new cross-platform input contract already uses non-bundle identifiers on Windows/Linux. `prepareDisplay`, `previewHideSet`, and screenshot exclusion therefore can't reliably recognize the host terminal on the new platforms.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/executor.ts` around lines 300 - 302, createCliExecutor currently lets win32/linux flow through the macOS bundle-ID path (getTerminalBundleId()/CLI_HOST_BUNDLE_ID), causing host identification mismatches; update createCliExecutor to branch by platform so that for 'darwin' it uses getTerminalBundleId()/CLI_HOST_BUNDLE_ID as before, but for 'win32' and 'linux' it assigns the new non-bundle host identifier used by the cross-platform input contract (and does not call getTerminalBundleId), and ensure callers that rely on host identity (prepareDisplay, previewHideSet, screenshot exclusion logic) check both macOS bundle IDs and the new Windows/Linux host identifier so the terminal is recognized correctly on all platforms.
packages/@ant/computer-use-mcp/src/sentinelApps.ts-13-36 (1)
13-36: ⚠️ Potential issue | 🟠 Major
Extend sentinel IDs beyond macOS.
This catalog is still bundle-id-only. With the new Windows/Linux backends, shell/file-system/system-settings apps on those platforms will never match these sets, so the approval UI loses its escalation warning exactly where the PR is adding support. Please add Windows/Linux sentinel IDs or normalize app identities before categorization.
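The "normalize app identities before categorization" option might look something like the sketch below. It is only an illustration: `SENTINELS`, `normalizeAppId`, and `getSentinelCategory` are hypothetical names, and the catalog entries are a few well-known examples chosen for the demo, not a reviewed list.

```typescript
// Hypothetical sketch; the catalog entries are examples, not a vetted list.
type SentinelCategory = "shell" | "filesystem" | "system_settings";
type Platform = "darwin" | "win32" | "linux";

const SENTINELS: Record<SentinelCategory, ReadonlySet<string>> = {
  shell: new Set(["com.apple.terminal", "cmd.exe", "powershell.exe", "gnome-terminal"]),
  filesystem: new Set(["com.apple.finder", "explorer.exe", "nautilus"]),
  system_settings: new Set([
    "com.apple.systempreferences",
    "systemsettings.exe",
    "gnome-control-center",
  ]),
};

// Map a platform-specific identity to a canonical lowercase key:
// macOS keeps the bundle ID; Windows/Linux use the executable basename.
function normalizeAppId(platform: Platform, id: string): string {
  if (platform === "darwin") return id.toLowerCase();
  const base = id.split(/[\\/]/).pop() ?? id;
  return base.toLowerCase();
}

// Return the sentinel category for an app, or null if it is not a sentinel.
function getSentinelCategory(platform: Platform, id: string): SentinelCategory | null {
  const key = normalizeAppId(platform, id);
  for (const category of Object.keys(SENTINELS) as SentinelCategory[]) {
    if (SENTINELS[category].has(key)) return category;
  }
  return null;
}
```

Matching on the executable basename is a deliberate simplification; a real implementation would likely need to guard against look-alike binaries in unexpected paths.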
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-mcp/src/sentinelApps.ts` around lines 13 - 36, The current sentinel sets (SHELL_ACCESS_BUNDLE_IDS, FILESYSTEM_ACCESS_BUNDLE_IDS, SYSTEM_SETTINGS_BUNDLE_IDS) and exported SENTINEL_BUNDLE_IDS only contain macOS bundle IDs so non-macOS backends won't match; update sentinel logic by either (A) expanding these sets to include Windows and Linux identifiers (executable names, package IDs, WSL/Flatpak/Snap IDs) for common shells/file-managers/system-settings, and/or (B) add a normalization layer when categorizing apps that maps platform-specific identity fields (e.g., process executable name, binary path, package name) into a canonical key before membership checks against SENTINEL_BUNDLE_IDS; modify the categorization code that uses SentinelCategory ("shell" | "filesystem" | "system_settings") to consult the normalized identity or the extended sets so approvals on Windows/Linux produce the same escalation warnings.
scripts/dev.ts-18-18 (1)
18-18: ⚠️ Potential issue | 🟠 Major
Keep `CHICAGO_MCP` opt-in in dev.
This widens the fixed dev baseline and makes Computer Use come up on every `bun run dev` with no opt-out path. Please keep `DEFAULT_FEATURES` to the documented four defaults and require `FEATURE_CHICAGO_MCP=1` for this flag.
♻️ Proposed fix
-const DEFAULT_FEATURES = ["BUDDY", "TRANSCRIPT_CLASSIFIER", "BRIDGE_MODE", "AGENT_TRIGGERS_REMOTE", "CHICAGO_MCP"];
+const DEFAULT_FEATURES = [
+  "BUDDY",
+  "TRANSCRIPT_CLASSIFIER",
+  "BRIDGE_MODE",
+  "AGENT_TRIGGERS_REMOTE",
+];
As per coding guidelines, `scripts/dev.ts` should enable only `BUDDY`, `TRANSCRIPT_CLASSIFIER`, `BRIDGE_MODE`, and `AGENT_TRIGGERS_REMOTE` by default, and other features should be enabled via `FEATURE_<FLAG_NAME>=1`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/dev.ts` at line 18, Remove "CHICAGO_MCP" from the DEFAULT_FEATURES array so DEFAULT_FEATURES only contains "BUDDY", "TRANSCRIPT_CLASSIFIER", "BRIDGE_MODE", and "AGENT_TRIGGERS_REMOTE"; then add a runtime check for process.env.FEATURE_CHICAGO_MCP === "1" and, if true, push "CHICAGO_MCP" onto the features list (e.g. in the same initialization logic that uses DEFAULT_FEATURES) so the flag is opt-in via FEATURE_CHICAGO_MCP=1.
src/utils/computerUse/executor.ts-71-79 (1)
71-79: ⚠️ Potential issue | 🟠 Major
Use stdin/stdout for Windows clipboard operations.
The Windows write path at line 100 embeds clipboard text directly in the PowerShell `-Command` argument, exposing it to process listing and audit logs. The read path at line 72 is safe. Both Linux (xclip) and macOS (pbcopy) already use stdin/stdout correctly. Adopt the same pattern for Windows to keep the payload out of argv and ensure consistent behavior across platforms.
Proposed changes
 if (process.platform === 'win32') {
   const { stdout, code } = await execFileNoThrow('powershell', ['-NoProfile', '-Command', 'Get-Clipboard'], {
     useCwd: false,
   })
For the write path, replace the template literal approach with stdin:
 if (process.platform === 'win32') {
-  const { code } = await execFileNoThrow('powershell', ['-NoProfile', '-Command', `Set-Clipboard -Value '${text.replace(/'/g, "''")}'`], {
+  const { code } = await execFileNoThrow('powershell', ['-NoProfile', '-Command', 'Set-Clipboard -Value ([Console]::In.ReadToEnd())'], {
+    input: text,
     useCwd: false,
   })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/executor.ts` around lines 71 - 79, The Windows clipboard write path currently embeds the payload into the PowerShell '-Command' argv (exposing it in process lists); instead, modify the write branch to call execFileNoThrow('powershell', ['-NoProfile', '-Command', 'Set-Clipboard -Value ([Console]::In.ReadToEnd())'], { useCwd: false, input: text }) (or equivalent) so the clipboard text is sent via stdin rather than an argv template; keep the same exit code check and error handling used in the read path (which calls execFileNoThrow with 'Get-Clipboard') and reuse the same options (useCwd: false) to maintain consistent behavior.
packages/@ant/computer-use-mcp/src/tools.ts-40-103 (1)
40-103: ⚠️ Potential issue | 🟠 Major
Batch actions never tell the model which coordinate mode to use.
Because `BATCH_ACTION_ITEM_SCHEMA` is hard-coded outside `buildComputerUseTools()`, `computer_batch`, `teach_step.actions`, and `teach_batch.steps[].actions` only expose generic `(x, y)` tuples. In `normalized_0_100` mode, batched clicks and drags are ambiguous even though the non-batch tools are parameterized by `coordinateMode`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-mcp/src/tools.ts` around lines 40 - 103, BATCH_ACTION_ITEM_SCHEMA is defined globally and omits any coordinateMode, so batch endpoints (computer_batch, teach_step.actions, teach_batch.steps[].actions) only accept raw (x,y) tuples and become ambiguous for normalized_0_100 coordinates; move or extend the schema inside buildComputerUseTools() (or augment it there) to include a coordinateMode field (e.g., "coordinateMode" enum with values like "pixels" and "normalized_0_100") and update BATCH_ACTION_ITEM_SCHEMA (or the schema returned by buildComputerUseTools) to accept coordinateMode and interpret coordinate/start_coordinate accordingly so batched clicks/drags are unambiguous. Ensure the tools that produce/consume these actions (referenced by buildComputerUseTools, computer_batch, teach_step.actions, teach_batch.steps[].actions) validate and propagate coordinateMode with each action.
packages/@ant/computer-use-input/src/backends/linux.ts-14-27 (1)
14-27: ⚠️ Potential issue | 🟠 Major
Don't treat failed helper commands as success.
These wrappers ignore exit status and stderr. A failed child process currently looks like an empty, successful result, so callers can no-op or return bogus data while the tool call still appears to have worked.
🛠️ Suggested fix
 function run(cmd: string[]): string {
   const result = Bun.spawnSync({
     cmd,
     stdout: 'pipe',
     stderr: 'pipe',
   })
+  if (result.exitCode !== 0) {
+    throw new Error(
+      `${cmd.join(' ')} exited ${result.exitCode}: ${new TextDecoder().decode(result.stderr).trim()}`,
+    )
+  }
   return new TextDecoder().decode(result.stdout).trim()
 }
Mirror the same exit-code check in `runAsync()`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-input/src/backends/linux.ts` around lines 14 - 27, The current helpers run and runAsync ignore exit status and stderr, treating failed child processes as empty successes; update both functions (run and runAsync) to check the process exit code (and/or error properties) and stderr, and throw an Error (including the command, exit code and stderr/stdout) when the child exits non‑zero or reports an error so callers no longer receive silent bogus results; for run, inspect the Bun.spawnSync return (status/code/stderr) and throw on non-zero, and for runAsync, await proc.exited, inspect proc.exitCode or the resolved exit status and stderr, then throw with contextual details if the command failed.
packages/@ant/computer-use-input/src/backends/win32.ts-199-201 (1)
199-201: ⚠️ Potential issue | 🟠 Major
Escape `SendKeys` metacharacters before typing arbitrary text.
`SendWait()` treats characters like `+`, `^`, `%`, `~`, `(`, `)`, `{`, and `}` as control tokens. Raw text containing them will not be typed literally. Enclose each in braces: `{+}`, `{^}`, `{%}`, `{~}`, `{(}`, `{)}`, and likewise `{{}` and `{}}` for literal braces.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-input/src/backends/win32.ts` around lines 199 - 201, The typeText function currently only escapes single quotes for the PowerShell string; update it to also escape SendKeys metacharacters so they are typed literally: in typeText, before calling ps(...SendWait('${escaped}') ), transform the input text so every one of these characters + ^ % ~ ( ) is replaced with a braced literal ({+}, {^}, {%}, {~}, {(}, {)}), and ensure literal braces are braced the same way (replace { with {{} and } with {}}) before you escape single quotes for PowerShell; then use that transformed value in the existing escaped variable passed to ps to preserve the PowerShell quoting.
packages/@ant/computer-use-input/src/index.ts-50-59 (1)
50-59:⚠️ Potential issue | 🟠 Major
`isSupported` should not be based on module load alone.
This flips true as soon as the platform file can be required, even when runtime prerequisites are missing. On Linux, for example, the feature will advertise support before we've verified the xdotool/X11 path actually works. Similarly, macOS relies on osascript and Windows on powershell, all of which may be unavailable at runtime, but the backends perform no upfront verification. Failures only occur when functions are invoked.
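A runtime check along these lines could back `isSupported`. This is a sketch under assumptions: `REQUIRED_TOOLS`, `CommandExists`, and `probeSupport` are hypothetical names, the tool list simply mirrors the ones named in this comment, and the command lookup is injected so the logic can be exercised without touching the real system.

```typescript
// Hypothetical sketch of a runtime capability probe; not the PR's API.
type Platform = "darwin" | "win32" | "linux";

// Tools named in the review comment; what a real backend needs is an assumption.
const REQUIRED_TOOLS: Record<Platform, readonly string[]> = {
  darwin: ["osascript"],
  win32: ["powershell"],
  linux: ["xdotool"],
};

// Injected lookup, e.g. a PATH search in production or a stub in tests.
type CommandExists = (name: string) => boolean;

function isPlatform(value: string): value is Platform {
  return value === "darwin" || value === "win32" || value === "linux";
}

// Supported only if the platform is known AND every required tool resolves.
function probeSupport(platform: string, commandExists: CommandExists): boolean {
  if (!isPlatform(platform)) return false;
  return REQUIRED_TOOLS[platform].every((tool) => commandExists(tool));
}
```

With this shape, a Linux host without xdotool reports unsupported up front instead of failing on the first tool call.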
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-input/src/index.ts` around lines 50 - 59, isSupported currently flips true when the platform module loads (backend !== null) even if runtime prerequisites are missing; change the contract so the platform backends perform an explicit runtime verification and expose it, and update the top-level isSupported to reflect that verification. Concretely: have each platform backend export an isSupported boolean or async function that actually checks required runtime tools (e.g., xdotool/osascript/powershell), then replace the current isSupported = backend !== null with a call to backend.isSupported (or await backend.isSupported() if async) so exports like moveMouse, key, keys, mouseLocation, mouseButton, mouseScroll, typeText, getFrontmostAppInfo still default to unsupported but isSupported accurately reflects runtime capability.
packages/@ant/computer-use-input/src/backends/darwin.ts-50-58 (1)
50-58:⚠️ Potential issue | 🟠 Major
`key()` doesn't support held keys on macOS: `release` is a no-op.
AppleScript's `System Events` `key code` and `keystroke` commands synthesize a complete key press (down + up) and do not provide separate key-down and key-up events for regular character keys. The `release` action has no effect, so any caller expecting to hold a key will only trigger a tap on `press`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-input/src/backends/darwin.ts` around lines 50 - 58, The current key function ignores the 'release' action and only synthesizes a full key press via AppleScript, so held keys cannot be represented; update the key(InputBackend['key']) implementation to stop returning early for 'release' and instead emit explicit key-down and key-up events using macOS Quartz/CGEvent APIs (create a helper like sendKeyEvent(keyCode: number, down: boolean) and for non-key-code characters implement sendUnicodeKeyEvent/unicode string via CGEventKeyboardSetUnicodeString), call sendKeyEvent(KEY_MAP[lower], true) on 'press' and sendKeyEvent(..., false) on 'release', and for characters that require Unicode use the unicode helper for both down/up; ensure the helper is awaited and preserves existing KEY_MAP lookup logic.
packages/@ant/computer-use-swift/src/backends/darwin.ts-141-143 (1)
141-143: ⚠️ Potential issue | 🟠 Major
Don't hardcode every macOS window to display 1.
Returning `[1]` for every bundle ID breaks multi-monitor display resolution, and it doesn't even line up with this backend's `listAll()` IDs, which are CG display IDs rather than a fixed `1`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-swift/src/backends/darwin.ts` around lines 141 - 143, The findWindowDisplays implementation incorrectly returns a hardcoded [1] for every bundleId; update the findWindowDisplays(bundleIds) function to retrieve actual CG display IDs for each app's windows (e.g., via the same system APIs used in this backend's listAll() or CGWindowList/CGDisplay APIs), map each window to its CG display ID, de-duplicate the IDs, and return { bundleId, displayIds: [...] } with those real display IDs so results line up with listAll() and support multi-monitor setups.
packages/@ant/computer-use-swift/src/backends/linux.ts-244-277 (1)
244-277: ⚠️ Potential issue | 🟠 Major
Don't reuse a global screenshot path.
Both capture paths write to `/tmp/cu-screenshot.png`. If `scrot` fails, the next read can return stale bytes from a previous capture, and overlapping captures can clobber each other. Use a per-call temp file and clean it up in `finally`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-swift/src/backends/linux.ts` around lines 244 - 277, The code currently reuses the global SCREENSHOT_PATH which can return stale data or cause clobbering; change both captureExcluding and captureRegion to create a unique per-call temp file (instead of using SCREENSHOT_PATH) when invoking runAsync (e.g., include a UUID/timestamp in the filename), read that temp file into base64 as before, and ensure you delete the temp file in a finally block so it’s always cleaned up; update any references to SCREENSHOT_PATH inside captureExcluding and captureRegion to use the new per-call temp path and keep the existing error handling/return shapes.
packages/@ant/computer-use-swift/src/backends/linux.ts-96-98 (1)
96-98: ⚠️ Potential issue | 🟠 Major
Implement actual window-to-display mapping on Linux.
Returning `displayIds: [0]` for every app makes multi-monitor targeting incorrect as soon as the window is not on display 0.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-swift/src/backends/linux.ts` around lines 96 - 98, The current stub findWindowDisplays in function findWindowDisplays returns displayIds: [0] for every bundle and must be replaced with real Linux logic: for each bundleId, enumerate windows (using X11/XCB or Wayland APIs), match windows to the application (e.g., WM_CLASS/NET_WM_PID or process name), get each window's geometry and map it to the monitor using XRandR/Monitor output or Wayland output geometry, build WindowDisplayInfo objects with the actual display IDs (or monitor indices) and return them; ensure the async signature is preserved, handle missing windows by returning an empty displayIds array, and keep error handling and type compatibility with WindowDisplayInfo.
packages/@ant/computer-use-swift/src/backends/win32.ts-91-93 (1)
91-93: ⚠️ Potential issue | 🟠 Major
Implement real window-to-display resolution here.
Returning `displayIds: [0]` for every app makes any window on a secondary monitor look like it's on the primary display, so auto-targeting and display-aware screenshots will pick the wrong monitor.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-swift/src/backends/win32.ts` around lines 91 - 93, The current stub in findWindowDisplays always returns displayIds: [0], which misreports windows on secondary monitors; replace it with real Win32-based resolution: for each bundleId use EnumWindows to enumerate top-level windows, match windows to processes via GetWindowThreadProcessId, check the process executable/command to correlate with bundleId (or compare process PID mapping you already maintain), then for each matched window call MonitorFromWindow (or MonitorFromRect/MonitorFromPoint) and GetMonitorInfo to obtain a monitor identifier/index; collect unique monitor IDs per bundleId and return them as displayIds. Ensure findWindowDisplays handles multiple windows per app (de-duplicate display IDs), ignores invisible/minimized windows, and returns an empty array if no windows are found.
packages/@ant/computer-use-swift/src/backends/darwin.ts-145-153 (1)
145-153: ⚠️ Potential issue | 🟠 Major
Use the supplied point in `appUnderPoint()`.
This ignores `x` and `y` and always reports `frontmostApplication`. A click on a background window or another monitor will be attributed to the wrong app, weakening the per-app permission gate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-swift/src/backends/darwin.ts` around lines 145 - 153, The appUnderPoint(_x, _y) implementation ignores the supplied coordinates and always returns the frontmost app; update the jxa snippet in appUnderPoint to hit-test the screen point and resolve the app owning the window under that point: call CoreGraphics window APIs (e.g., CGWindowListCreateWindowAtPoint / CGWindowListCreateDescriptionFromArray or CGWindowListCopyWindowInfo filtered by point) to obtain the window's owner PID, then use NSRunningApplication.runningApplicationWithProcessIdentifier or NSWorkspace to get the bundleIdentifier/localizedName for that PID and JSON.stringify that result instead of using NSWorkspace.sharedWorkspace.frontmostApplication. Ensure the code uses the _x/_y CGPoint (pt) you already construct.
packages/@ant/computer-use-mcp/src/executor.ts-45-48 (1)
45-48: ⚠️ Potential issue | 🟠 Major
Add Linux to the public platform union.
This PR adds Linux support, but `ComputerExecutorCapabilities.platform` still only allows `"darwin" | "win32"`. That makes a correct Linux executor impossible to type.
💡 Minimal fix
 export interface ComputerExecutorCapabilities {
   screenshotFiltering: 'native' | 'none'
-  platform: 'darwin' | 'win32'
+  platform: 'darwin' | 'win32' | 'linux'
   hostBundleId: string
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/@ant/computer-use-mcp/src/executor.ts` around lines 45 - 48, The ComputerExecutorCapabilities interface's platform union is missing Linux; update the platform type in ComputerExecutorCapabilities to include 'linux' (i.e., 'darwin' | 'win32' | 'linux') so Linux executors can be correctly typed, and search for any usages of ComputerExecutorCapabilities or platform checks to ensure they handle the new 'linux' variant where necessary (e.g., switch/case or conditional logic in executor.ts).
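For the "ensure platform checks handle the new variant" part, an exhaustive switch is one common way to let the compiler flag missed cases once `'linux'` joins the union. The sketch below is illustrative only: `pasteShortcut` and `assertNever` are hypothetical helpers, and the Ctrl+V mapping for Windows and Linux is an assumption about typical GUI apps.

```typescript
// Hypothetical sketch; function names are illustrative, not from the PR.
type Platform = "darwin" | "win32" | "linux";

// Compile-time exhaustiveness guard: reaching this means a union member
// was added without a matching case.
function assertNever(value: never): never {
  throw new Error(`Unhandled platform: ${String(value)}`);
}

// Paste chord per platform: Cmd+V on macOS, Ctrl+V elsewhere (assumed).
function pasteShortcut(platform: Platform): string {
  switch (platform) {
    case "darwin":
      return "cmd+v";
    case "win32":
    case "linux":
      return "ctrl+v";
    default:
      return assertNever(platform);
  }
}
```

If the `"linux"` case were removed, the `default` branch would stop type-checking, which is the signal this pattern is meant to give.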
🟡 Minor comments (2)
docs/features/computer-use.md-178-195 (1)
178-195: ⚠️ Potential issue | 🟡 Minor
Add a language to this fenced block.
`markdownlint` flags unlabeled fences here; `text` is enough if this is meant to stay a plain execution plan.
Suggested change
-```
+```text
 Phase 2 (unlock macOS + Windows)
 ├── 2.1-2.3 Remove 3 hard-coded throw/skip sites
 ├── 2.4-2.5 Clipboard + paste-shortcut platform dispatch
@@
 Phase 4 (integration verification + PR)
-```
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/features/computer-use.md` around lines 178 - 195, The fenced code block in the docs (the triple-backtick block showing Phase 2/3/4) is unlabeled and triggers markdownlint; update the opening fence to include a language tag (e.g., change the opening "```" to "```text") so the block is treated as plain text and linting passes, leaving the block contents and closing "```" unchanged.
DEV-LOG.md-25-33 (1)
25-33: ⚠️ Potential issue | 🟡 Minor
Add a language to this fenced block.
Markdownlint is already flagging this block with MD040. Adding `text` here keeps docs lint clean.
📝 Proposed fix
-```
+```text
 packages/@ant/computer-use-{input,swift}/src/
 ├── index.ts ← dispatcher
 ├── types.ts ← shared interfaces
 └── backends/
     ├── darwin.ts ← macOS AppleScript (extracted as-is, logic unchanged)
     ├── win32.ts ← Windows PowerShell
     └── linux.ts ← Linux xdotool/scrot/xrandr/wmctrl
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@DEV-LOG.md` around lines 25 - 33, The fenced code block containing the file tree starting with "packages/@ant/computer-use-{input,swift}/src/" should include a language tag to satisfy markdownlint MD040; change the opening fence from ``` to ```text so it reads ```text and leave the block contents unchanged (the tree lines including index.ts, types.ts and backends/*).
🧹 Nitpick comments (1)
src/utils/computerUse/gates.ts (1)
39-55: Remove the dead entitlement helper.
`hasRequiredSubscription()` is now a constant `true`, so `getChicagoEnabled()` still reads like it enforces entitlements when it doesn't. Inlining the condition will make future rollout changes easier to reason about.
Suggested change
-function hasRequiredSubscription(): boolean {
-  return true
-}
-
 export function getChicagoEnabled(): boolean {
@@
-  return hasRequiredSubscription() && readConfig().enabled
+  return readConfig().enabled
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/computerUse/gates.ts` around lines 39 - 55, Remove the dead helper hasRequiredSubscription() and inline its result into getChicagoEnabled(): delete or stop calling hasRequiredSubscription() and replace the final return to directly use the true/entitlement expression (i.e., remove the hasRequiredSubscription() call and rely on readConfig().enabled plus the existing ant/monorepo check); update references to hasRequiredSubscription() in this file (if any) and keep isEnvTruthy(process.env.ALLOW_ANT_COMPUTER_USE_MCP) and readConfig().enabled as the gating conditions in getChicagoEnabled().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/@ant/computer-use-mcp/src/mcpServer.ts`:
- Around line 272-287: The CallTool handler currently calls dispatch regardless
of adapter state, so disabled adapters still execute tools; update the handler
registered via server.setRequestHandler(CallToolRequestSchema, ...) to first
check adapter.isDisabled() (or the same gating used for ListTools) and
short-circuit with an appropriate failure/empty response (e.g., a blocked error
or empty result) when disabled; ensure this check is applied before calling
bindSessionContext/dispatch so bindSessionContext and dispatch are never invoked
for disabled adapters.
In `@packages/@ant/computer-use-mcp/src/pixelCompare.ts`:
- Around line 102-106: comparePixelAtLocation currently returns false when the
rect or cropped patches are missing (e.g., from crop failures), which makes
validateClickTarget treat internal errors as real mismatches; change
comparePixelAtLocation to return a distinct "skip" sentinel (null) instead of
false when rect/crop/patches are unavailable (see the checks around
crop(lastScreenshot.base64, rect) and crop(freshScreenshot.base64, rect)), and
update the other identical block later (lines ~147-164) to do the same so
callers like validateClickTarget can detect null and skip validation rather than
aborting the click.
In `@packages/@ant/computer-use-swift/src/backends/darwin.ts`:
- Around line 160-181: The listInstalled() implementation is fabricating
bundleIds and only scans top-level Applications; update it to enumerate real app
bundles (include /Applications, /System/Applications and subfolders) and for
each app use a system query (e.g., mdls or reading Contents/Info.plist) via the
existing osascript call or a child-process call to extract CFBundleIdentifier,
then return that identifier in the returned objects; ensure the mapping in
listInstalled() returns bundleId, displayName and path using the real
CFBundleIdentifier so it matches open() and listRunning().
In `@packages/@ant/computer-use-swift/src/backends/linux.ts`:
- Around line 124-157: listInstalled() currently sets bundleId to the .desktop
filename, which is inconsistent with listRunning() and appUnderPoint() that use
the executable path or PID fallback; change listInstalled() (in the InstalledApp
creation) to derive bundleId from the Exec entry (the same value you set into
path: exec.split(/\s+/)[0]) so bundleId matches running-app IDs, and only fall
back to the .desktop basename if Exec is missing or empty; keep displayName from
Name and preserve the existing NoDisplay filter.
In `@packages/@ant/computer-use-swift/src/backends/win32.ts`:
- Around line 123-146: listInstalled() currently sets bundleId to PSChildName
which doesn't match how listRunning(), appUnderPoint(), and open() identify apps
(by executable path); update listInstalled() to use the install executable path
as the bundleId (i.e., set bundleId to the parsed InstallLocation/executable
path instead of PSChildName), ensure the mapped object uses path as the
canonical identifier (falling back to name if path is missing), and keep
displayName populated from DisplayName so grants and open() will match
running/frontmost apps across listRunning(), appUnderPoint(), and open().
Outside diff comments:
In `@build.ts`:
- Around line 13-19: DEFAULT_BUILD_FEATURES currently includes "CHICAGO_MCP",
which forces that feature on in builds; remove "CHICAGO_MCP" so only
"AGENT_TRIGGERS_REMOTE" remains in DEFAULT_BUILD_FEATURES, leaving Chicago
opt-in via the FEATURE_CHICAGO_MCP env var flow that uses envFeatures and
features; update the DEFAULT_BUILD_FEATURES declaration (and any nearby comment)
to reflect this change so rollouts remain controlled by FEATURE_<FLAG_NAME>
environment variables.
In `@src/main.tsx`:
- Around line 1605-1623: The Chicago MCP injection currently appends to
dynamicMcpConfig and allowedTools after the enterprise MCP policy checks,
bypassing org-level bans; modify the flow so you re-run the MCP policy check
(use areMcpConfigsAllowedWithEnterpriseMcpConfig() or
doesEnterpriseMcpConfigExist() / the existing policy gating functions)
immediately after calling setupComputerUseMCP() and only merge mcpConfig / push
cuTools when that re-check returns allowed, or alternatively move the Chicago
setup to run before the policy decision and let the policy layer explicitly
exempt or allow it; reference getChicagoEnabled, setupComputerUseMCP,
dynamicMcpConfig, allowedTools, doesEnterpriseMcpConfigExist, and
areMcpConfigsAllowedWithEnterpriseMcpConfig when making the change.
In `@src/utils/computerUse/drainRunLoop.ts`:
- Around line 61-79: The current early return in drainRunLoop skips the
timeout/unhandled-rejection guard on non-darwin platforms; change drainRunLoop
so the timeout logic (withResolvers, timer via setTimeout(timeoutReject,
TIMEOUT_MS, ...), attaching work.catch(() => {}), and Promise.race) always runs,
but only call retain() and release() (and any CFRunLoop-specific logic) when
process.platform === 'darwin'; update the implementation around the symbols
drainRunLoop, retain, release, withResolvers, timeoutReject, TIMEOUT_MS and
ensure timer is cleared in finally so non-macOS calls still get the 30s timeout
and unhandled-rejection protection.
In `@src/utils/computerUse/gates.ts`:
- Around line 12-21: DEFAULTS currently enables Chicago by default which causes
readConfig (which overlays partial payloads onto DEFAULTS) to implicitly turn
features on when the config is missing or stale; change DEFAULTS.enabled to
false so the system fails closed, and confirm the behavior in the overlay code
that uses readConfig (refer to DEFAULTS, ChicagoConfig, and readConfig) still
merges partial payloads but will not enable Computer Use when the config is
absent or incomplete.
Major comments:
In `@packages/@ant/computer-use-input/src/backends/darwin.ts`:
- Around line 50-58: The current key function ignores the 'release' action and
only synthesizes a full key press via AppleScript, so held keys cannot be
represented; update the key(InputBackend['key']) implementation to stop
returning early for 'release' and instead emit explicit key-down and key-up
events using macOS Quartz/CGEvent APIs (create a helper like
sendKeyEvent(keyCode: number, down: boolean) and for non-key-code characters
implement sendUnicodeKeyEvent/unicode string via
CGEventKeyboardSetUnicodeString), call sendKeyEvent(KEY_MAP[lower], true) on
'press' and sendKeyEvent(..., false) on 'release', and for characters that
require Unicode use the unicode helper for both down/up; ensure the helper is
awaited and preserves existing KEY_MAP lookup logic.
In `@packages/@ant/computer-use-input/src/backends/linux.ts`:
- Around line 14-27: The current helpers run and runAsync ignore exit status and
stderr, treating failed child processes as empty successes; update both
functions (run and runAsync) to check the process exit code (and/or error
properties) and stderr, and throw an Error (including the command, exit code and
stderr/stdout) when the child exits non‑zero or reports an error so callers no
longer receive silent bogus results; for run, inspect the Bun.spawnSync return
(status/code/stderr) and throw on non-zero, and for runAsync, await proc.exited,
inspect proc.exitCode or the resolved exit status and stderr, then throw with
contextual details if the command failed.
In `@packages/@ant/computer-use-input/src/backends/win32.ts`:
- Around line 199-201: The typeText function currently only escapes single
quotes for the PowerShell string; update it to also escape SendKeys
metacharacters so they are typed literally: in typeText, before calling
ps(...SendWait('${escaped}') ), transform the input text so every one of these
characters + ^ % ~ ( ) is replaced with a braced literal ({+}, {^}, {%}, {~},
{(}, {)}), and ensure literal braces are braced as well (replace { with {{} and }
with {}}, in the same single pass so braces added by the escaping are not
re-escaped) before you escape single quotes for PowerShell; then use
that transformed value in the existing escaped variable passed to ps to preserve
the PowerShell quoting.
In `@packages/@ant/computer-use-input/src/index.ts`:
- Around line 50-59: isSupported currently flips true when the platform module
loads (backend !== null) even if runtime prerequisites are missing; change the
contract so the platform backends perform an explicit runtime verification and
expose it, and update the top-level isSupported to reflect that verification.
Concretely: have each platform backend export an isSupported boolean or async
function that actually checks required runtime tools (e.g.,
xdotool/osascript/powershell), then replace the current isSupported = backend
!== null with a call to backend.isSupported (or await backend.isSupported() if
async) so exports like moveMouse, key, keys, mouseLocation, mouseButton,
mouseScroll, typeText, getFrontmostAppInfo still default to unsupported but
isSupported accurately reflects runtime capability.
In `@packages/@ant/computer-use-mcp/src/executor.ts`:
- Around line 45-48: The ComputerExecutorCapabilities interface's platform union
is missing Linux; update the platform type in ComputerExecutorCapabilities to
include 'linux' (i.e., 'darwin' | 'win32' | 'linux') so Linux executors can be
correctly typed, and search for any usages of ComputerExecutorCapabilities or
platform checks to ensure they handle the new 'linux' variant where necessary
(e.g., switch/case or conditional logic in executor.ts).
In `@packages/@ant/computer-use-mcp/src/keyBlocklist.ts`:
- Around line 124-128: Update isSystemKeyCombo to accept Linux as a platform and
enforce Linux shortcuts: change the platform type union to include "linux" (or
use the broader platform type), add or reference a BLOCKED_LINUX set, and branch
similarly to the existing BLOCKED_DARWIN/BLOCKED_WIN32 logic so the function
uses BLOCKED_LINUX when platform === "linux"; ensure callers are updated where
necessary to pass "linux" so the Linux backend correctly enforces
systemKeyCombos.
In `@packages/@ant/computer-use-mcp/src/sentinelApps.ts`:
- Around line 13-36: The current sentinel sets (SHELL_ACCESS_BUNDLE_IDS,
FILESYSTEM_ACCESS_BUNDLE_IDS, SYSTEM_SETTINGS_BUNDLE_IDS) and exported
SENTINEL_BUNDLE_IDS only contain macOS bundle IDs so non-macOS backends won't
match; update sentinel logic by either (A) expanding these sets to include
Windows and Linux identifiers (executable names, package IDs, WSL/Flatpak/Snap
IDs) for common shells/file-managers/system-settings, and/or (B) add a
normalization layer when categorizing apps that maps platform-specific identity
fields (e.g., process executable name, binary path, package name) into a
canonical key before membership checks against SENTINEL_BUNDLE_IDS; modify the
categorization code that uses SentinelCategory ("shell" | "filesystem" |
"system_settings") to consult the normalized identity or the extended sets so
approvals on Windows/Linux produce the same escalation warnings.
In `@packages/@ant/computer-use-mcp/src/tools.ts`:
- Around line 40-103: BATCH_ACTION_ITEM_SCHEMA is defined globally and omits any
coordinateMode, so batch endpoints (computer_batch, teach_step.actions,
teach_batch.steps[].actions) only accept raw (x,y) tuples and become ambiguous
for normalized_0_100 coordinates; move or extend the schema inside
buildComputerUseTools() (or augment it there) to include a coordinateMode field
(e.g., "coordinateMode" enum with values like "pixels" and "normalized_0_100")
and update BATCH_ACTION_ITEM_SCHEMA (or the schema returned by
buildComputerUseTools) to accept coordinateMode and interpret
coordinate/start_coordinate accordingly so batched clicks/drags are unambiguous.
Ensure the tools that produce/consume these actions (referenced by
buildComputerUseTools, computer_batch, teach_step.actions,
teach_batch.steps[].actions) validate and propagate coordinateMode with each
action.
In `@packages/@ant/computer-use-swift/src/backends/darwin.ts`:
- Around line 141-143: The findWindowDisplays implementation incorrectly returns
a hardcoded [1] for every bundleId; update the findWindowDisplays(bundleIds)
function to retrieve actual CG display IDs for each app's windows (e.g., via the
same system APIs used in this backend's listAll() or CGWindowList/CGDisplay
APIs), map each window to its CG display ID, de-duplicate the IDs, and return {
bundleId, displayIds: [...] } with those real display IDs so results line up
with listAll() and support multi-monitor setups.
- Around line 145-153: The appUnderPoint(_x, _y) implementation ignores the
supplied coordinates and always returns the frontmost app; update the jxa
snippet in appUnderPoint to hit-test the screen point and resolve the app owning
the window under that point: call CoreGraphics window APIs (e.g.,
CGWindowListCreateWindowAtPoint / CGWindowListCreateDescriptionFromArray or
CGWindowListCopyWindowInfo filtered by point) to obtain the window's owner PID,
then use NSRunningApplication.runningApplicationWithProcessIdentifier or
NSWorkspace to get the bundleIdentifier/localizedName for that PID and
JSON.stringify that result instead of using
NSWorkspace.sharedWorkspace.frontmostApplication. Ensure the code uses the _x/_y
CGPoint (pt) you already construct.
In `@packages/@ant/computer-use-swift/src/backends/linux.ts`:
- Around line 244-277: The code currently reuses the global SCREENSHOT_PATH
which can return stale data or cause clobbering; change both captureExcluding
and captureRegion to create a unique per-call temp file (instead of using
SCREENSHOT_PATH) when invoking runAsync (e.g., include a UUID/timestamp in the
filename), read that temp file into base64 as before, and ensure you delete the
temp file in a finally block so it’s always cleaned up; update any references to
SCREENSHOT_PATH inside captureExcluding and captureRegion to use the new
per-call temp path and keep the existing error handling/return shapes.
- Around line 96-98: The current stub findWindowDisplays in function
findWindowDisplays returns displayIds: [0] for every bundle and must be replaced
with real Linux logic: for each bundleId, enumerate windows (using X11/XCB or
Wayland APIs), match windows to the application (e.g., WM_CLASS/NET_WM_PID or
process name), get each window's geometry and map it to the monitor using
XRandR/Monitor output or Wayland output geometry, build WindowDisplayInfo
objects with the actual display IDs (or monitor indices) and return them; ensure
the async signature is preserved, handle missing windows by returning an empty
displayIds array, and keep error handling and type compatibility with
WindowDisplayInfo.
In `@packages/@ant/computer-use-swift/src/backends/win32.ts`:
- Around line 91-93: The current stub in findWindowDisplays always returns
displayIds: [0], which misreports windows on secondary monitors; replace it with
real Win32-based resolution: for each bundleId use EnumWindows to enumerate
top-level windows, match windows to processes via GetWindowThreadProcessId,
check the process executable/command to correlate with bundleId (or compare
process PID mapping you already maintain), then for each matched window call
MonitorFromWindow (or MonitorFromRect/MonitorFromPoint) and GetMonitorInfo to
obtain a monitor identifier/index; collect unique monitor IDs per bundleId and
return them as displayIds. Ensure findWindowDisplays handles multiple windows
per app (de-duplicate display IDs), ignores invisible/minimized windows, and
returns an empty array if no windows are found.
In `@scripts/dev.ts`:
- Line 18: Remove "CHICAGO_MCP" from the DEFAULT_FEATURES array so
DEFAULT_FEATURES only contains "BUDDY", "TRANSCRIPT_CLASSIFIER", "BRIDGE_MODE",
and "AGENT_TRIGGERS_REMOTE"; then add a runtime check for
process.env.FEATURE_CHICAGO_MCP === "1" and, if true, push "CHICAGO_MCP" onto
the features list (e.g. in the same initialization logic that uses
DEFAULT_FEATURES) so the flag is opt-in via FEATURE_CHICAGO_MCP=1.
In `@src/utils/computerUse/executor.ts`:
- Around line 300-302: createCliExecutor currently lets win32/linux flow through
the macOS bundle-ID path (getTerminalBundleId()/CLI_HOST_BUNDLE_ID), causing
host identification mismatches; update createCliExecutor to branch by platform
so that for 'darwin' it uses getTerminalBundleId()/CLI_HOST_BUNDLE_ID as before,
but for 'win32' and 'linux' it assigns the new non-bundle host identifier used
by the cross-platform input contract (and does not call getTerminalBundleId),
and ensure callers that rely on host identity (prepareDisplay, previewHideSet,
screenshot exclusion logic) check both macOS bundle IDs and the new
Windows/Linux host identifier so the terminal is recognized correctly on all
platforms.
- Around line 71-79: The Windows clipboard write path currently embeds the
payload into the PowerShell '-Command' argv (exposing it in process lists);
instead, modify the write branch to call execFileNoThrow('powershell',
['-NoProfile', '-Command', 'Set-Clipboard -Value ([Console]::In.ReadToEnd())'],
{ useCwd: false, input: text }) (or equivalent) so the clipboard text is sent
via stdin rather than an argv template; keep the same exit code check and error
handling used in the read path (which calls execFileNoThrow with
'Get-Clipboard') and reuse the same options (useCwd: false) to maintain
consistent behavior.
Minor comments:
In `@DEV-LOG.md`:
- Around line 25-33: The fenced code block containing the file tree starting
with "packages/@ant/computer-use-{input,swift}/src/" should include a language
tag to satisfy markdownlint MD040; change the opening fence from ``` to ```text
so it reads ```text and leave the block contents unchanged (the tree lines
including index.ts, types.ts and backends/*).
In `@docs/features/computer-use.md`:
- Around line 178-195: The fenced code block in the docs (the triple-backtick
block showing Phase 2/3/4) is unlabeled and triggers markdownlint; update the
opening fence to include a language tag (e.g., change the opening "```" to "```text") so the block is treated as plain text and linting passes, leaving the
block contents and closing "```" unchanged.
Nitpick comments:
In `@src/utils/computerUse/gates.ts`:
- Around line 39-55: Remove the dead helper hasRequiredSubscription() and inline
its result into getChicagoEnabled(): delete or stop calling
hasRequiredSubscription() and replace the final return to directly use the
true/entitlement expression (i.e., remove the hasRequiredSubscription() call and
rely on readConfig().enabled plus the existing ant/monorepo check); update
references to hasRequiredSubscription() in this file (if any) and keep
isEnvTruthy(process.env.ALLOW_ANT_COMPUTER_USE_MCP) and readConfig().enabled as
the gating conditions in getChicagoEnabled().</details> <details> <summary>🪄 Autofix (Beta)</summary> Fix all unresolved CodeRabbit comments on this PR: - [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended) - [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes </details> --- <details> <summary>ℹ️ Review info</summary> <details> <summary>⚙️ Run configuration</summary> **Configuration used**: defaults **Review profile**: CHILL **Plan**: Pro **Run ID**: `dd7c9e64-4c3f-46ae-b741-78e2cb2e651a` </details> <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 465e9f01c69ee7b2326aabe0b18ce7d8f1f2148e and e3264a16919675b13d3c118cd7e0e639d0436e39. </details> <details> <summary>📒 Files selected for processing (34)</summary> * `DEV-LOG.md` * `build.ts` * `docs/features/computer-use.md` * `packages/@ant/computer-use-input/src/backends/darwin.ts` * `packages/@ant/computer-use-input/src/backends/linux.ts` * `packages/@ant/computer-use-input/src/backends/win32.ts` * `packages/@ant/computer-use-input/src/index.ts` * `packages/@ant/computer-use-input/src/types.ts` * `packages/@ant/computer-use-mcp/src/deniedApps.ts` * `packages/@ant/computer-use-mcp/src/executor.ts` * `packages/@ant/computer-use-mcp/src/imageResize.ts` * `packages/@ant/computer-use-mcp/src/index.ts` * `packages/@ant/computer-use-mcp/src/keyBlocklist.ts` * `packages/@ant/computer-use-mcp/src/mcpServer.ts` * `packages/@ant/computer-use-mcp/src/pixelCompare.ts` * `packages/@ant/computer-use-mcp/src/sentinelApps.ts` * `packages/@ant/computer-use-mcp/src/subGates.ts` * `packages/@ant/computer-use-mcp/src/toolCalls.ts` * `packages/@ant/computer-use-mcp/src/tools.ts` * `packages/@ant/computer-use-mcp/src/types.ts` * `packages/@ant/computer-use-swift/src/backends/darwin.ts` * `packages/@ant/computer-use-swift/src/backends/linux.ts` * 
`packages/@ant/computer-use-swift/src/backends/win32.ts` * `packages/@ant/computer-use-swift/src/index.ts` * `packages/@ant/computer-use-swift/src/types.ts` * `scripts/dev.ts` * `src/main.tsx` * `src/utils/computerUse/common.ts` * `src/utils/computerUse/drainRunLoop.ts` * `src/utils/computerUse/escHotkey.ts` * `src/utils/computerUse/executor.ts` * `src/utils/computerUse/gates.ts` * `src/utils/computerUse/hostAdapter.ts` * `src/utils/computerUse/swiftLoader.ts` </details> <details> <summary>💤 Files with no reviewable changes (1)</summary> * src/utils/computerUse/swiftLoader.ts </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
```ts
server.setRequestHandler(ListToolsRequestSchema, async () =>
  adapter.isDisabled() ? { tools: [] } : { tools },
);

if (context) {
  const dispatch = bindSessionContext(adapter, coordinateMode, context);
  server.setRequestHandler(
    CallToolRequestSchema,
    async (request): Promise<CallToolResult> => {
      const { screenshot: _s, telemetry: _t, ...result } = await dispatch(
        request.params.name,
        request.params.arguments ?? {},
      );
      return result;
    },
  );
```
Disabled servers still accept direct CallTool requests.
ListTools hides the surface when adapter.isDisabled() is true, but this handler still dispatches the call. Any MCP client that already knows the tool names can bypass the disable/subscription gate.
🛡️ Suggested fix
```diff
 server.setRequestHandler(
   CallToolRequestSchema,
   async (request): Promise<CallToolResult> => {
+    if (adapter.isDisabled()) {
+      return {
+        content: [{ type: "text", text: "Computer Use is currently disabled." }],
+        isError: true,
+      }
+    }
     const { screenshot: _s, telemetry: _t, ...result } = await dispatch(
       request.params.name,
       request.params.arguments ?? {},
     );
     return result;
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
server.setRequestHandler(ListToolsRequestSchema, async () =>
  adapter.isDisabled() ? { tools: [] } : { tools },
);
if (context) {
  const dispatch = bindSessionContext(adapter, coordinateMode, context);
  server.setRequestHandler(
    CallToolRequestSchema,
    async (request): Promise<CallToolResult> => {
      if (adapter.isDisabled()) {
        return {
          content: [{ type: "text", text: "Computer Use is currently disabled." }],
          isError: true,
        }
      }
      const { screenshot: _s, telemetry: _t, ...result } = await dispatch(
        request.params.name,
        request.params.arguments ?? {},
      );
      return result;
    },
  );
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/@ant/computer-use-mcp/src/mcpServer.ts` around lines 272 - 287, The
CallTool handler currently calls dispatch regardless of adapter state, so
disabled adapters still execute tools; update the handler registered via
server.setRequestHandler(CallToolRequestSchema, ...) to first check
adapter.isDisabled() (or the same gating used for ListTools) and short-circuit
with an appropriate failure/empty response (e.g., a blocked error or empty
result) when disabled; ensure this check is applied before calling
bindSessionContext/dispatch so bindSessionContext and dispatch are never invoked
for disabled adapters.
```ts
if (!rect) return false;

const patch1 = crop(lastScreenshot.base64, rect);
const patch2 = crop(freshScreenshot.base64, rect);
if (!patch1 || !patch2) return false;
```
Internal crop failures currently block clicks instead of skipping validation.
comparePixelAtLocation() returns false when the rect or cropped patch is unavailable, and validateClickTarget() treats that as a real mismatch. That violates the documented “skip on internal error” contract and can abort valid clicks whenever crop/decode fails.
Also applies to: 147-164
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/@ant/computer-use-mcp/src/pixelCompare.ts` around lines 102 - 106,
comparePixelAtLocation currently returns false when the rect or cropped patches
are missing (e.g., from crop failures), which makes validateClickTarget treat
internal errors as real mismatches; change comparePixelAtLocation to return a
distinct "skip" sentinel (null) instead of false when rect/crop/patches are
unavailable (see the checks around crop(lastScreenshot.base64, rect) and
crop(freshScreenshot.base64, rect)), and update the other identical block later
(lines ~147-164) to do the same so callers like validateClickTarget can detect
null and skip validation rather than aborting the click.
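The tri-state return described above could look roughly like the following. This is a minimal sketch, not the code in `pixelCompare.ts`: `comparePatches`, `shouldProceedWithClick`, the `Rect` shape, and the byte-wise comparison are all hypothetical stand-ins for illustration.

```typescript
// Hypothetical names throughout; the real pixelCompare.ts is only excerpted above.
type Rect = { x: number; y: number; width: number; height: number };

// true = patches match, false = definite mismatch, null = could not compare.
type CompareOutcome = boolean | null;

function comparePatches(
  rect: Rect | null,
  patch1: Uint8Array | null,
  patch2: Uint8Array | null,
): CompareOutcome {
  // Internal failure (no rect, or crop/decode failed): report "skip", not "mismatch".
  if (!rect || !patch1 || !patch2) return null;
  if (patch1.length !== patch2.length) return false;
  return patch1.every((byte, i) => byte === patch2[i]);
}

function shouldProceedWithClick(outcome: CompareOutcome): boolean {
  // Only a definite mismatch blocks the click; null means validation is skipped.
  return outcome !== false;
}
```

With this shape, a caller such as `validateClickTarget` can distinguish "the screen changed" from "the comparison itself failed" without a separate error channel.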
```ts
async listInstalled() {
  try {
    const result = await osascript(`
      tell application "System Events"
        set appList to ""
        repeat with appFile in (every file of folder "Applications" of startup disk whose name ends with ".app")
          set appPath to POSIX path of (appFile as alias)
          set appName to name of appFile
          set appList to appList & appPath & "|" & appName & "\\n"
        end repeat
        return appList
      end tell
    `)
    return result.split('\n').filter(Boolean).map(line => {
      const [path, name] = line.split('|', 2)
      const displayName = (name ?? '').replace(/\.app$/, '')
      return {
        bundleId: `com.app.${displayName.toLowerCase().replace(/\s+/g, '-')}`,
        displayName,
        path: path ?? '',
      }
    })
```
Return real bundle identifiers from listInstalled().
This fabricates com.app.<name> values instead of reading each app's actual bundle identifier, while open() and listRunning() operate on the real identifier. Grants created from this list won't match running apps, and scanning only the top-level /Applications folder also misses many built-in apps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/@ant/computer-use-swift/src/backends/darwin.ts` around lines 160 -
181, The listInstalled() implementation is fabricating bundleIds and only scans
top-level Applications; update it to enumerate real app bundles (include
/Applications, /System/Applications and subfolders) and for each app use a
system query (e.g., mdls or reading Contents/Info.plist) via the existing
osascript call or a child-process call to extract CFBundleIdentifier, then
return that identifier in the returned objects; ensure the mapping in
listInstalled() returns bundleId, displayName and path using the real
CFBundleIdentifier so it matches open() and listRunning().
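One way to read the real identifier, sketched under assumptions: the helper below extracts `CFBundleIdentifier` from the text of an XML-format `Info.plist`. The name `extractBundleId` is hypothetical, and the sketch assumes the caller has already obtained the plist as XML text; binary plists would need converting first and are not handled here.

```typescript
// Hypothetical helper: pulls CFBundleIdentifier out of an XML-format Info.plist.
// Assumes the input is XML text; a binary plist will simply yield null.
function extractBundleId(plistXml: string): string | null {
  const match = plistXml.match(
    /<key>\s*CFBundleIdentifier\s*<\/key>\s*<string>([^<]*)<\/string>/,
  );
  const id = match?.[1]?.trim();
  return id ? id : null;
}

// Usage with an illustrative plist fragment (not taken from a real app).
const samplePlist = `
<dict>
  <key>CFBundleName</key>
  <string>Example</string>
  <key>CFBundleIdentifier</key>
  <string>com.example.app</string>
</dict>`;
const sampleBundleId = extractBundleId(samplePlist);
```

Returning `null` rather than a fabricated `com.app.<name>` value lets `listInstalled()` drop entries it cannot identify instead of emitting IDs that will never match a running app.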
```ts
async listInstalled(): Promise<InstalledApp[]> {
  try {
    // Read .desktop files from standard locations
    const dirs = ['/usr/share/applications', '/usr/local/share/applications', `${process.env.HOME}/.local/share/applications`]
    const apps: InstalledApp[] = []

    for (const dir of dirs) {
      let files: string
      try {
        files = run(['find', dir, '-name', '*.desktop', '-maxdepth', '1'])
      } catch { continue }

      for (const filepath of files.split('\n').filter(Boolean)) {
        try {
          const content = run(['cat', filepath])
          const nameMatch = content.match(/^Name=(.+)$/m)
          const execMatch = content.match(/^Exec=(.+)$/m)
          const noDisplay = content.match(/^NoDisplay=true$/m)
          if (noDisplay) continue

          const name = nameMatch?.[1] ?? ''
          const exec = execMatch?.[1] ?? ''
          if (!name) continue

          apps.push({
            bundleId: filepath.split('/').pop()?.replace('.desktop', '') ?? '',
            displayName: name,
            path: exec.split(/\s+/)[0] ?? '',
          })
        } catch { /* skip unreadable files */ }
      }
    }

    return apps.slice(0, 200)
```
Keep Linux app IDs consistent across discovery APIs.
listInstalled() emits the .desktop basename as bundleId, but listRunning() and appUnderPoint() emit an executable path or PID fallback. Grants created from the installed-app list won't match the running app, so frontmost-app gating and reopen flows will misfire.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/@ant/computer-use-swift/src/backends/linux.ts` around lines 124 -
157, listInstalled() currently sets bundleId to the .desktop filename, which is
inconsistent with listRunning() and appUnderPoint() that use the executable path
or PID fallback; change listInstalled() (in the InstalledApp creation) to derive
bundleId from the Exec entry (the same value you set into path:
exec.split(/\s+/)[0]) so bundleId matches running-app IDs, and only fall back to
the .desktop basename if Exec is missing or empty; keep displayName from Name
and preserve the existing NoDisplay filter.
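The suggested derivation can be sketched as a small pure function. `deriveLinuxAppId` is a hypothetical name, and the sketch assumes the first whitespace-separated token of `Exec` is what `listRunning()` reports; when `Exec` holds a bare command name rather than an absolute path, the two may still differ.

```typescript
// Hypothetical helper mirroring the suggestion: prefer the Exec executable,
// fall back to the .desktop basename only when Exec is missing or empty.
function deriveLinuxAppId(desktopFilePath: string, desktopContent: string): string {
  const exec = desktopContent.match(/^Exec=(.+)$/m)?.[1]?.trim() ?? "";
  // Drop arguments and field codes such as %u; keep only the executable token.
  const executable = exec.split(/\s+/)[0] ?? "";
  if (executable) return executable;
  const basename = desktopFilePath.split("/").pop() ?? "";
  return basename.replace(/\.desktop$/, "");
}

// Usage with illustrative .desktop contents.
const withExec = deriveLinuxAppId(
  "/usr/share/applications/firefox.desktop",
  "[Desktop Entry]\nName=Firefox\nExec=/usr/bin/firefox %u\n",
);
const withoutExec = deriveLinuxAppId(
  "/usr/share/applications/firefox.desktop",
  "[Desktop Entry]\nName=Firefox\n",
);
```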
```ts
async listInstalled() {
  try {
    const raw = await psAsync(`
      $apps = @()
      $paths = @(
        'HKLM:\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\*',
        'HKLM:\\SOFTWARE\\WOW6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\*',
        'HKCU:\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\*'
      )
      foreach ($p in $paths) {
        Get-ItemProperty $p -ErrorAction SilentlyContinue | Where-Object { $_.DisplayName } | ForEach-Object {
          $apps += "$($_.DisplayName)|$($_.InstallLocation)|$($_.PSChildName)"
        }
      }
      $apps | Select-Object -Unique | Select-Object -First 200
    `)
    return raw.split('\n').filter(Boolean).map(line => {
      const [name, path, id] = line.split('|', 3)
      return {
        bundleId: id ?? name ?? '',
        displayName: name ?? '',
        path: path ?? '',
      }
    })
```
Use the same app identifier in listInstalled() as the runtime APIs.
This stores PSChildName as bundleId, while listRunning() and appUnderPoint() identify the same app by executable path. Grants created from the installed-app list will never match the running/frontmost app, and open() will be handed an identifier it can't launch.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/@ant/computer-use-swift/src/backends/win32.ts` around lines 123 -
146, listInstalled() currently sets bundleId to PSChildName which doesn't match
how listRunning(), appUnderPoint(), and open() identify apps (by executable
path); update listInstalled() to use the install executable path as the bundleId
(i.e., set bundleId to the parsed InstallLocation/executable path instead of
PSChildName), ensure the mapped object uses path as the canonical identifier
(falling back to name if path is missing), and keep displayName populated from
DisplayName so grants and open() will match running/frontmost apps across
listRunning(), appUnderPoint(), and open().
New Windows-native capabilities:
- windowCapture.ts: PrintWindow API for per-window screenshot (works on occluded/background windows)
- windowEnum.ts: EnumWindows for precise window enumeration with HWND
- uiAutomation.ts: IUIAutomation for UI tree reading, element clicking, text input, and coordinate-based element identification
- ocr.ts: Windows.Media.Ocr for screen text recognition (en-US + zh-CN)
Updated win32.ts backend to use EnumWindows for listRunning() and added captureWindowTarget() for window-specific screenshots.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two root causes fixed:
1. swiftLoader.ts: require('@ant/computer-use-swift') returns a module
with { ComputerUseAPI } class, not an instance. macOS native .node
exports a plain object. Fixed by detecting class export and calling
new ComputerUseAPI().
2. executor.ts resolvePrepareCapture: toolCalls.ts expects result to have
{ hidden: string[], displayId: number } fields. Our ComputerUseAPI
returns { base64, width, height } only. Fixed by backfilling missing
fields with defaults.
Verified: request_access → screenshot → left_click all work on Windows.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
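The two fixes in this commit message can be illustrated with a short sketch. It is not the code from `swiftLoader.ts` or `executor.ts`: `resolveApi`, `backfillCapture`, and the `CaptureResult` shape are hypothetical, and the default values (`[]` and `0`) are assumptions, since the commit only says the missing fields are backfilled with defaults.

```typescript
// Hypothetical shape; the commit message names these fields but not their full types.
type CaptureResult = {
  base64: string;
  width: number;
  height: number;
  hidden?: string[];
  displayId?: number;
};

// Fix 1 (sketch): accept either a module exporting a ComputerUseAPI class
// or a plain object that already is the API instance.
function resolveApi<T>(mod: unknown): T {
  if (typeof mod === "object" && mod !== null && "ComputerUseAPI" in mod) {
    const Ctor = (mod as { ComputerUseAPI: unknown }).ComputerUseAPI;
    if (typeof Ctor === "function") return new (Ctor as new () => T)();
  }
  return mod as T;
}

// Fix 2 (sketch): fill in the fields toolCalls.ts expects when the backend omits them.
// The defaults here are assumed values, not confirmed by the PR.
function backfillCapture(result: CaptureResult): Required<CaptureResult> {
  return {
    ...result,
    hidden: result.hidden ?? [],
    displayId: result.displayId ?? 0,
  };
}
```

Detecting the class export at load time keeps the macOS path (a plain object from the native `.node` module) unchanged while letting the cross-platform package ship a class.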
Summary
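Computer Use support on three platforms. The reference project supports macOS only; this PR extends it to macOS + Windows + Linux.
Phase 1: Replace the @ant/computer-use-mcp stub (12 files, 6517 lines)
Phase 2: Remove 8 hard-coded macOS guards in src/ (clipboard dispatch, paste shortcut, CFRunLoop, ESC hotkey, permission checks, platform identifier)
Phase 3: Add Linux backends (xdotool/scrot/xrandr/wmctrl)
Architecture
Windows verification (x64)
isSupported: true
bun run build: 463 files
Test plan
bun run dev starts without errors
🤖 Generated with Claude Code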
Summary by CodeRabbit
New Features
Documentation
Refactor