
feat: enable Computer Use with Windows support#97

Closed
amDosion wants to merge 1 commit into claude-code-best:main from amDosion:feat/computer-use-windows

Conversation

@amDosion
Contributor

@amDosion amDosion commented Apr 3, 2026

Summary

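  • Replace the @ant/computer-use-mcp stub with a full implementation (12 files, 6,517 lines)
  • Refactor @ant/computer-use-input and @ant/computer-use-swift into a dispatcher + backends/ architecture
  • Add a Windows PowerShell backend (the reference project supports macOS only)
  • Add CHICAGO_MCP to the default build feature flags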

Architecture

packages/@ant/computer-use-{input,swift}/src/
├── index.ts              ← dispatcher (selects a backend by platform)
├── types.ts              ← shared interfaces
└── backends/
    ├── darwin.ts          ← macOS AppleScript (existing logic, extracted unchanged)
    └── win32.ts           ← Windows PowerShell (new)

Key design

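  • The dispatcher fallback does not hard-code screen dimensions; when no backend is available it throws
  • Windows open() launches the exe path directly (Start-Process)
  • Adding Linux later only requires a new backends/linux.ts; no other files change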
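
The dispatcher idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Backend` interface, `BackendFactory`, `defaultBackends`, and `selectBackend` are hypothetical names, and the stand-in factories replace whatever backends/darwin.ts and backends/win32.ts actually export.

```typescript
// Hypothetical minimal backend shape; the PR's real interfaces live in types.ts.
interface Backend {
  readonly platform: string;
  moveMouse(x: number, y: number): Promise<void>;
}

type BackendFactory = () => Backend;

// Stand-in factories (assumption). In the PR these would be provided by
// backends/darwin.ts and backends/win32.ts.
const defaultBackends = new Map<string, BackendFactory>([
  ["darwin", () => ({ platform: "darwin", moveMouse: async () => {} })],
  ["win32", () => ({ platform: "win32", moveMouse: async () => {} })],
]);

// Pick a backend for the given platform string (for example process.platform).
function selectBackend(
  platform: string,
  backends: Map<string, BackendFactory> = defaultBackends,
): Backend {
  const factory = backends.get(platform);
  if (factory === undefined) {
    // No silent fallback with made-up screen sizes: fail loudly instead.
    throw new Error(`No computer-use backend for platform "${platform}"`);
  }
  return factory();
}
```

Under this sketch, supporting Linux would amount to registering one more entry (say, `"linux"`) in the map, which mirrors the "only add backends/linux.ts" goal stated above.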

Test plan

  • Windows: bun run dev 启动无报错
  • Windows: 鼠标移动、截图、前台窗口信息正常
  • macOS: backends/darwin.ts 逻辑未改,现有功能不受影响

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Restored and expanded "Computer Use" functionality with full Windows support (previously macOS-only).
    • Added comprehensive app permission system with tiered access levels (read, click, full control).
    • Introduced display management and multi-display support for screenshots and automation.
    • Added pixel-level validation for click targets to prevent outdated screen interactions.
    • Integrated keyboard shortcut blocking to prevent unintended system commands.
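
As a rough illustration of the tiered access levels mentioned above (read, click, full control), a check might look like the sketch below. The tier names come from the summary; the `Action` names, the action-to-tier mapping, and `isActionAllowed` are assumptions made for the example and are not taken from the PR.

```typescript
// Tiers named in the summary: read < click < full control.
type AccessTier = "read" | "click" | "full";

// Hypothetical action names, chosen only for illustration.
type Action = "screenshot" | "left_click" | "type";

const TIER_RANK: Record<AccessTier, number> = { read: 0, click: 1, full: 2 };

// Assumed mapping from action to the minimum tier it needs.
const REQUIRED_TIER: Record<Action, AccessTier> = {
  screenshot: "read",
  left_click: "click",
  type: "full",
};

// An app with no grant at all is denied; otherwise compare tier ranks.
function isActionAllowed(granted: AccessTier | undefined, action: Action): boolean {
  if (granted === undefined) return false;
  return TIER_RANK[granted] >= TIER_RANK[REQUIRED_TIER[action]];
}
```

Ranking the tiers numerically keeps the check to a single comparison, at the cost of assuming the tiers are strictly ordered.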

Phase 1: Replace @ant/computer-use-mcp stub with full implementation
(12 files, 6517 lines from reference project).

Phase 2-3: Refactor @ant/computer-use-input and @ant/computer-use-swift
from single-file to dispatcher + backends/ architecture:
- backends/darwin.ts — existing macOS AppleScript (unchanged logic)
- backends/win32.ts — new Windows PowerShell (SetCursorPos, SendInput,
  CopyFromScreen, GetForegroundWindow)

Add CHICAGO_MCP to default build features.

Verified on Windows x64: mouse control, dual-monitor detection,
full-screen screenshot, foreground app info, running process list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
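
For readers unfamiliar with driving Win32 calls from PowerShell, the sketch below shows one way a backend could build a script that calls `SetCursorPos` from user32.dll via `Add-Type`. It is an assumption about the general approach, not the contents of backends/win32.ts: `buildSetCursorPosScript` and the `Win32.NativeCursor` type name are hypothetical, and how the script is handed to PowerShell (for example via stdin or `-EncodedCommand`) is left out.

```typescript
// Build a PowerShell script that moves the cursor with user32!SetCursorPos.
// Function name and the Win32.NativeCursor type name are illustrative only.
function buildSetCursorPosScript(x: number, y: number): string {
  // Reject non-integers so nothing unexpected is interpolated into the script.
  // Negative values are allowed: secondary monitors can have negative coordinates.
  if (!Number.isInteger(x) || !Number.isInteger(y)) {
    throw new RangeError("Cursor coordinates must be integers");
  }
  return [
    // Add-Type compiles a small P/Invoke wrapper class at runtime.
    "Add-Type -Namespace Win32 -Name NativeCursor -MemberDefinition " +
      "'[DllImport(\"user32.dll\")] public static extern bool SetCursorPos(int X, int Y);'",
    `[Win32.NativeCursor]::SetCursorPos(${x}, ${y}) | Out-Null`,
  ].join("\n");
}
```

Keeping script construction separate from process spawning makes the string easy to unit-test on any platform.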
@coderabbitai

coderabbitai bot commented Apr 3, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough


This pull request restores and expands "Computer Use" cross-platform functionality by refactoring three packages into backend-dispatched architectures supporting both macOS and Windows, replacing an MCP stub with a full server implementation featuring permission management, app categorization, key blocking, and screenshot validation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Configuration<br>DEV-LOG.md, docs/features/computer-use.md, build.ts, scripts/dev.ts | Adds feature documentation detailing three-phase restoration of Computer Use with macOS/Windows support, enables CHICAGO_MCP feature flag in build and dev configurations. |
| Computer Use Input Refactoring<br>packages/@ant/computer-use-input/src/types.ts, backends/darwin.ts, backends/win32.ts, index.ts | Converts input package from macOS-only implementation to platform-dispatched architecture with shared InputBackend interface, extracting existing AppleScript logic to darwin.ts and implementing new Windows PowerShell backend for mouse/keyboard control. |
| Computer Use Swift Refactoring<br>packages/@ant/computer-use-swift/src/types.ts, backends/darwin.ts, backends/win32.ts, index.ts | Converts screenshot/display package from macOS-only screencapture wrapper to platform-dispatched architecture with shared SwiftBackend/DisplayAPI/AppsAPI/ScreenshotAPI interfaces, adding Windows PowerShell backend for display enumeration and screen capture. |
| Computer Use MCP Full Implementation<br>packages/@ant/computer-use-mcp/src/types.ts, executor.ts, deniedApps.ts, sentinelApps.ts, keyBlocklist.ts, subGates.ts, imageResize.ts, mcpServer.ts, tools.ts, pixelCompare.ts, index.ts | Replaces stub MCP implementation with comprehensive server featuring permission request/response workflows, app categorization/denial logic, system key combo blocking, image resizing with token constraints, pixel-level click validation, MCP tool schema generation, and session-aware dispatch orchestration. |
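
The "system key combo blocking" mentioned for keyBlocklist.ts can be pictured with a sketch like the following. The blocked combos listed here, and the names `normalizeCombo`, `BLOCKED_COMBOS`, and `isBlockedCombo`, are assumptions for illustration; the actual list and matching rules in the PR are not shown on this page.

```typescript
// Normalize a combo such as "Alt + F4" to a canonical form ("alt+f4") so that
// case, spacing, and key order do not matter. Note: a literal "+" key is not
// handled by this simple split.
function normalizeCombo(combo: string): string {
  return combo
    .toLowerCase()
    .split("+")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .sort()
    .join("+");
}

// Hypothetical blocklist entries; the real list is not given in the PR text.
const BLOCKED_COMBOS = new Set<string>(
  ["cmd+q", "alt+f4", "ctrl+alt+delete"].map(normalizeCombo),
);

function isBlockedCombo(combo: string): boolean {
  return BLOCKED_COMBOS.has(normalizeCombo(combo));
}
```

Normalizing both the blocklist and the incoming combo through the same function avoids mismatches such as "Delete+Ctrl+Alt" versus "ctrl+alt+delete".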

Sequence Diagram(s)

sequenceDiagram
    participant User as User/Host
    participant Session as ComputerUseSessionContext
    participant Dispatcher as MCP Dispatcher<br/>(bindSessionContext)
    participant Tool as Tool Handler<br/>(handleToolCall)
    participant Platform as Platform Backend<br/>(darwin/win32)
    participant UI as Screenshot/Display

    User->>Session: Initialize session context
    Session->>Dispatcher: Create call dispatcher with context
    
    User->>Dispatcher: Call tool (e.g., click @ x,y)
    Dispatcher->>Dispatcher: Check/acquire execution lock
    Dispatcher->>Session: Fetch allowed apps & grant flags
    Dispatcher->>Tool: Dispatch to handleToolCall
    
    Tool->>Tool: Validate click target pixels<br/>(comparePixelAtLocation)
    Tool->>Platform: Execute input action
    Platform->>Platform: Platform-specific impl<br/>(AppleScript/PowerShell)
    Platform-->>Tool: Action result
    
    Tool->>Tool: Capture screenshot<br/>(if needed)
    Tool->>UI: Call screenshot backend
    UI->>Platform: Capture screen (darwin/win32)
    Platform-->>UI: Return screenshot base64
    UI-->>Tool: Screenshot result
    
    Tool->>Tool: Persist screenshot in closure
    Tool->>Session: Forward screenshot dimensions
    Tool-->>Dispatcher: Return tool result<br/>(stripped of telemetry)
    
    Dispatcher->>Dispatcher: Abort any in-flight dialog
    Dispatcher-->>User: MCP response
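
The pixel validation step in the diagram (`comparePixelAtLocation`) is named but not specified here. One plausible reading, sketched below under stated assumptions, compares a small patch around the click target between the screenshot the model last saw and a fresh capture, and refuses the click if the patch changed. `RawImage`, `patchUnchanged`, the RGBA layout, and the `radius` default are all hypothetical.

```typescript
// Hypothetical decoded image: tightly packed RGBA, 4 bytes per pixel.
interface RawImage {
  width: number;
  height: number;
  data: Uint8Array;
}

// Return true only if every pixel within `radius` of (x, y) is identical in
// both images. Mismatched dimensions or out-of-bounds targets count as changed.
function patchUnchanged(
  before: RawImage,
  after: RawImage,
  x: number,
  y: number,
  radius: number = 2,
): boolean {
  if (before.width !== after.width || before.height !== after.height) return false;
  if (x < 0 || y < 0 || x >= before.width || y >= before.height) return false;

  // Clamp the patch to the image bounds.
  const x0 = Math.max(0, x - radius);
  const x1 = Math.min(before.width - 1, x + radius);
  const y0 = Math.max(0, y - radius);
  const y1 = Math.min(before.height - 1, y + radius);

  for (let py = y0; py <= y1; py++) {
    for (let px = x0; px <= x1; px++) {
      const i = (py * before.width + px) * 4;
      for (let c = 0; c < 4; c++) {
        if (before.data[i + c] !== after.data[i + c]) return false;
      }
    }
  }
  return true;
}
```

An exact byte comparison is the strictest choice; a real implementation might tolerate small color differences, which this sketch does not attempt.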

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 Hops through Windows and macOS trails,
Input and screenshots now never fail,
With backends that know each platform's way,
Computer Use returns—hooray, hooray!
Permission gates guard, pixels compare,
The MCP server awaits with utmost care.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the primary change: enabling Computer Use functionality with Windows support, which aligns with the major refactoring of computer-use packages and addition of Windows backends. |


@amDosion
Contributor Author

amDosion commented Apr 3, 2026

The feature is not yet complete: the platform restrictions and GrowthBook defaults need to be fixed, and a /computer-use command needs to be added.

@amDosion amDosion closed this Apr 3, 2026