English | 中文
Browser automation built for AI agents.
Give your LLM a real Chrome browser — with your sessions, cookies, and extensions — through simple CLI commands.
pip install browser-ctl
bctl go https://github.com
bctl snapshot # List interactive elements → e0, e1, e2, …
bctl click e3 # Click by ref — no CSS selector needed
bctl type e5 "browser-ctl" # Type into element by ref
bctl press Enter
bctl screenshot results.pngTools like browser-use, Playwright MCP, and Puppeteer are powerful, but they share a set of pain points when used with AI agents:
| Pain point | Typical tools | browser-ctl |
|---|---|---|
| Heavy browser binaries — must download and manage a bundled Chromium (~400 MB) | Playwright, Puppeteer | Uses your existing Chrome — zero browser downloads |
| No access to real sessions — launches a fresh, empty browser with no cookies, logins, or extensions | browser-use, Playwright MCP | Controls your real Chrome — all sessions, cookies, and extensions intact |
| Anti-bot detection — headless browsers are flagged and blocked by many websites | Puppeteer, Playwright | Uses your real browser profile — indistinguishable from normal browsing |
| Complex SDK integration — requires importing libraries and writing async code | browser-use, Stagehand | Pure CLI with JSON output — any LLM can call bctl click "button" |
| Heavy dependencies — Playwright alone pulls ~50 MB of packages + browser binary | Playwright, Puppeteer | CLI is stdlib-only; server needs only aiohttp |
| Token-inefficient for LLMs — verbose API calls waste context window tokens | SDK-based tools | Concise commands: bctl text h1 vs pages of boilerplate |
| Broken clicks on SPAs — programmatic clicks get blocked by popup blockers | Puppeteer, Playwright | Intercepts window.open() and navigates via chrome.tabs — SPA-compatible |
browser-ctl is purpose-built for AI agent workflows:
- Snapshot-first workflow —
bctl snapshotlists interactive elements ase0,e1, … then operate by ref (bctl click e3) — no CSS selector guessing - Tool-calling ready — every command is a single shell call returning structured JSON, perfect for function-calling / tool-use patterns
- Built-in AI skill — ships with
SKILL.mdthat teaches AI agents (Cursor, OpenCode, etc.) the full command set and best practices - Real browser = real access — your LLM can operate on authenticated pages (Gmail, Jira, internal tools) without credential management
- Deterministic output — JSON responses with element refs or CSS selectors, no vision model needed for most tasks
- Minimal token cost —
bctl snapshot+bctl click e5vs multi-step screenshot → vision → parse loops
# Install the AI skill for Cursor IDE in one command
bctl setup cursorAI Agent / Terminal ──HTTP──▶ Bridge Server ◀──WebSocket── Chrome Extension
(bctl CLI) (:19876) (your browser)
- CLI (
bctl) sends commands via HTTP to a local bridge server - Bridge server relays them over WebSocket to the Chrome extension
- Extension executes commands using Chrome APIs & content scripts in your real browser
- Results flow back the same path as JSON
The bridge server auto-starts on first command — no manual setup needed.
Step 1 — Install the Python package:
pip install browser-ctlStep 2 — Load the Chrome extension:
bctl setupThen in Chrome: chrome://extensions → Enable Developer mode → Load unpacked → select ~/.browser-ctl/extension/
Step 3 — Verify:
bctl ensure-ready
# {"success": true, "data": {"server": true, "extension": true}}| Command | Description |
|---|---|
bctl navigate <url> |
Navigate to URL (aliases: nav, go; auto-prepends https://) |
bctl back |
Go back in history |
bctl forward |
Go forward (alias: fwd) |
bctl reload |
Reload current page |
All <sel> arguments accept CSS selectors or element refs from snapshot (e.g. e5).
| Command | Description |
|---|---|
bctl click <sel> [-i N] [-t text] |
Click element; -t filters by visible text (substring) |
bctl dblclick <sel> [-i N] [-t text] |
Double-click element |
bctl hover <sel> [-i N] [-t text] |
Hover over element; -t filters by visible text |
bctl focus <sel> [-i N] [-t text] |
Focus element |
bctl type <sel> <text> |
Type text into input/textarea (React-compatible, replaces value) |
bctl input-text <sel> <text> |
Char-by-char typing for rich text editors [--clear] [--delay ms] |
bctl press <key> |
Press key — Enter submits forms, Escape closes dialogs |
bctl check <sel> [-i N] [-t text] |
Check a checkbox or radio button |
bctl uncheck <sel> [-i N] [-t text] |
Uncheck a checkbox |
bctl scroll <dir|sel> [px] |
Scroll: up / down / top / bottom or element into view |
bctl select-option <sel> <val> |
Select dropdown option (alias: sopt) [--text] |
bctl drag <src> [target] |
Drag to element or offset [--dx N --dy N] |
| Command | Description |
|---|---|
bctl snapshot [--all] |
List interactive elements with refs e0, e1, … (alias: snap) |
bctl text [sel] |
Get text content (default: body) |
bctl html [sel] |
Get innerHTML |
bctl attr <sel> [name] [-i N] |
Get attribute(s) of element |
bctl select <sel> [-l N] |
List matching elements (alias: sel) |
bctl count <sel> |
Count matching elements |
bctl status |
Current page URL and title |
bctl is-visible <sel> [-i N] |
Check if element is visible (returns bounding rect) |
bctl get-value <sel> [-i N] |
Get value of form element (input / select / textarea) |
| Command | Description |
|---|---|
bctl eval <code> |
Execute JS in page context (auto-bypasses CSP) |
| Command | Description |
|---|---|
bctl tabs |
List all tabs |
bctl tab <id> |
Switch to tab by ID |
bctl new-tab [url] |
Open new tab |
bctl close-tab [id] |
Close tab (default: active) |
| Command | Description |
|---|---|
bctl screenshot [path] |
Capture screenshot (alias: ss) |
bctl download <target> [-o path] [-i N] |
Download file/image (alias: dl; -o supports absolute paths) |
bctl upload <sel> <files...> |
Upload file(s) to <input type="file"> |
| Command | Description |
|---|---|
bctl wait <sel|seconds> [timeout] |
Wait for element or sleep |
bctl dialog [accept|dismiss] [--text val] |
Handle next alert / confirm / prompt |
| Command | Description |
|---|---|
bctl pipe |
Read commands from stdin, one per line (JSONL output). Consecutive DOM ops are auto-batched into a single browser call |
bctl batch '<cmd1>' '<cmd2>' ... |
Execute multiple commands in one call with smart batching |
| Command | Description |
|---|---|
bctl ensure-ready |
Ensure server + extension are ready (auto-starts server, auto-launches Chrome if needed) |
bctl ping |
Check server & extension status |
bctl capabilities |
Show actions supported by the connected extension |
bctl self-test |
Run generic end-to-end smoke tests for core skill actions |
bctl serve |
Start server in foreground |
bctl stop |
Stop server |
bctl setup |
Install extension to ~/.browser-ctl/extension/ + open Chrome extensions page |
bctl setup cursor |
Install AI skill (SKILL.md) into Cursor IDE |
bctl setup opencode |
Install AI skill into OpenCode |
bctl setup <path> |
Install AI skill to a custom directory |
Snapshot workflow (recommended for AI agents)
bctl go "https://example.com"
bctl snapshot # List all interactive elements as e0, e1, …
bctl click e3 # Click by ref — no CSS selector needed
bctl type e5 "hello world" # Type into element by ref
bctl get-value e5 # Read form value
bctl is-visible e3 # Check visibilitySearch and extract
bctl go "https://news.ycombinator.com"
bctl select "a.titlelink" -l 5 # Top 5 links with text, href, etc.Click by visible text (SPA-friendly)
bctl click "button" -t "Sign in" # Click button containing "Sign in"
bctl click "a" -t "Settings" # Click link containing "Settings"
bctl click "div[role=button]" -t "Save" # Works with any element + text filterFill a form
bctl type "input[name=email]" "user@example.com"
bctl type "input[name=password]" "hunter2"
bctl select-option "select#country" "US"
bctl upload "input[type=file]" ./resume.pdf
bctl click "button[type=submit]"Scroll and screenshot
bctl go "https://en.wikipedia.org/wiki/Web_browser"
bctl scroll down 1000
bctl ss page.pngHandle dialogs
bctl dialog accept # Set up handler BEFORE triggering
bctl click "#delete-button" # This triggers a confirm() dialogDrag and drop
bctl drag ".task-card" ".done-column"
bctl drag ".range-slider" --dx 50 --dy 0Batch / Pipe (fast multi-step)
# Pipe mode: multiple commands in one call, auto-batched
bctl pipe <<'EOF'
click "button" -t "Select tag"
wait 1
type "input[placeholder='Search']" "v1.0.0"
wait 1
click "button" -t "Create new tag"
EOF
# Batch mode: same thing as arguments
bctl batch \
'click "button" -t "Sign in"' \
'wait 1' \
'type "#email" "user@example.com"' \
'type "#password" "secret"' \
'click "button[type=submit]"'Shell scripting
# Extract all image URLs from a page
bctl go "https://example.com"
bctl eval "JSON.stringify(Array.from(document.images).map(i=>i.src))"
# Wait for SPA content to load
bctl go "https://app.example.com/dashboard"
bctl wait ".dashboard-loaded" 15
bctl text ".metric-value"All commands return JSON to stdout:
Non-zero exit code on errors — works naturally with set -e and && chains.
┌─────────────────────────────────────────────────────┐
│ AI Agent / Terminal │
│ $ bctl click "button.submit" │
│ │ │
│ ▼ HTTP POST localhost:19876/command │
│ ┌──────────────────────┐ │
│ │ Bridge Server │ (Python, aiohttp) │
│ │ :19876 │ │
│ └──────────┬───────────┘ │
│ │ WebSocket │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Chrome Extension │ (Manifest V3) │
│ │ Service Worker │ │
│ └──────────┬───────────┘ │
│ │ chrome.scripting / chrome.debugger │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Your Real Browser │ (sessions, cookies, etc) │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────┘
| Component | Details |
|---|---|
| CLI | Stdlib only, raw-socket HTTP (zero heavy imports, ~5ms cold start) |
| Bridge Server | Async relay (aiohttp), auto-daemonizes |
| Extension | MV3 service worker, auto-reconnects via chrome.alarms |
| Click | Three-phase: pointer events → MAIN-world click → window.open() interception for SPA compatibility |
| Eval | Dual strategy: MAIN-world injection (fast) + CDP fallback (CSP-safe) |
- Python >= 3.11
- Chrome / Chromium with the extension loaded
- macOS, Linux, or Windows
All communication is local (127.0.0.1). No analytics, no telemetry, no external servers. See PRIVACY.md.