browser-ctl

Browser automation built for AI agents.
Give your LLM a real Chrome browser — with your sessions, cookies, and extensions — through simple CLI commands.


pip install browser-ctl

bctl go https://github.com
bctl snapshot                        # List interactive elements → e0, e1, e2, …
bctl click e3                        # Click by ref — no CSS selector needed
bctl type e5 "browser-ctl"          # Type into element by ref
bctl press Enter
bctl screenshot results.png

The Problem with Existing Browser Automation

Tools like browser-use, Playwright MCP, and Puppeteer are powerful, but they share a set of pain points when used with AI agents:

| Pain point | Typical tools | browser-ctl |
|---|---|---|
| Heavy browser binaries: must download and manage a bundled Chromium (~400 MB) | Playwright, Puppeteer | Uses your existing Chrome, with zero browser downloads |
| No access to real sessions: launches a fresh, empty browser with no cookies, logins, or extensions | browser-use, Playwright MCP | Controls your real Chrome, with all sessions, cookies, and extensions intact |
| Anti-bot detection: headless browsers are flagged and blocked by many websites | Puppeteer, Playwright | Uses your real browser profile, indistinguishable from normal browsing |
| Complex SDK integration: requires importing libraries and writing async code | browser-use, Stagehand | Pure CLI with JSON output; any LLM can call bctl click "button" |
| Heavy dependencies: Playwright alone pulls ~50 MB of packages + browser binary | Playwright, Puppeteer | CLI is stdlib-only; server needs only aiohttp |
| Token-inefficient for LLMs: verbose API calls waste context window tokens | SDK-based tools | Concise commands: bctl text h1 vs pages of boilerplate |
| Broken clicks on SPAs: programmatic clicks get blocked by popup blockers | Puppeteer, Playwright | Intercepts window.open() and navigates via chrome.tabs (SPA-compatible) |

Designed for LLM Agents

browser-ctl is purpose-built for AI agent workflows:

  • Snapshot-first workflow: bctl snapshot lists interactive elements as e0, e1, …; you then operate by ref (bctl click e3), with no CSS selector guessing
  • Tool-calling ready: every command is a single shell call returning structured JSON, a good fit for function-calling / tool-use patterns
  • Built-in AI skill: ships with a SKILL.md that teaches AI agents (Cursor, OpenCode, etc.) the full command set and best practices
  • Real browser = real access: your LLM can operate on authenticated pages (Gmail, Jira, internal tools) without credential management
  • Deterministic output: JSON responses with element refs or CSS selectors, so no vision model is needed for most tasks
  • Minimal token cost: bctl snapshot + bctl click e5 replaces multi-step screenshot → vision → parse loops

# Install the AI skill for Cursor IDE in one command
bctl setup cursor
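
Because each command is one shell call that prints JSON, a tool-calling wrapper can stay very small. The sketch below is one possible shape and is not part of browser-ctl: `run_cli` and `bctl` are hypothetical helper names, and the code assumes the `bctl` executable is on PATH and prints a single JSON document to stdout.

```python
import json
import subprocess


def run_cli(argv: list, timeout: float = 30.0) -> dict:
    """Run a command line and decode its stdout as one JSON object.

    Hypothetical helper, not part of browser-ctl. The exit status is not
    checked here because error responses are also reported in the JSON.
    Raises json.JSONDecodeError if stdout is not valid JSON.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return json.loads(proc.stdout)


def bctl(*args: str) -> dict:
    """Call the bctl CLI (assumed to be on PATH) with the given arguments."""
    return run_cli(["bctl", *args])
```

With this in place, `bctl("click", "e3")` would correspond to the shell call `bctl click e3`, and an agent framework could expose `bctl` as a single tool.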

How It Works

AI Agent / Terminal  ──HTTP──▶  Bridge Server  ◀──WebSocket──  Chrome Extension
     (bctl CLI)                  (:19876)                      (your browser)
  1. CLI (bctl) sends commands via HTTP to a local bridge server
  2. Bridge server relays them over WebSocket to the Chrome extension
  3. Extension executes commands using Chrome APIs & content scripts in your real browser
  4. Results flow back the same path as JSON

The bridge server auto-starts on first command — no manual setup needed.
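
For scripts that want to know whether the bridge is already up without invoking the CLI, a plain TCP probe is one option. This is a rough sketch, assuming the server listens on 127.0.0.1 at the port shown in the diagram (19876); `bridge_is_listening` is a hypothetical name. An open port only shows that something is listening, not that the extension is connected, so `bctl ping` remains the documented check.

```python
import socket


def bridge_is_listening(host: str = "127.0.0.1", port: int = 19876,
                        timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```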


Installation

Step 1 — Install the Python package:

pip install browser-ctl

Step 2 — Load the Chrome extension:

bctl setup

Then in Chrome: open chrome://extensions → enable Developer mode → click Load unpacked → select ~/.browser-ctl/extension/

Step 3 — Verify:

bctl ensure-ready
# {"success": true, "data": {"server": true, "extension": true}}
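
An automated setup script could check this reply before sending further commands. The sketch below relies only on the field names visible in the sample output above (`success`, `data.server`, `data.extension`); `is_ready` is a hypothetical helper, and any other response shape is treated as "not ready".

```python
import json


def is_ready(stdout: str) -> bool:
    """True only if the call succeeded and both server and extension are up."""
    reply = json.loads(stdout)
    data = reply.get("data") or {}
    return bool(reply.get("success") and data.get("server") and data.get("extension"))
```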

Command Reference

Navigation

| Command | Description |
|---|---|
| bctl navigate <url> | Navigate to URL (aliases: nav, go; auto-prepends https://) |
| bctl back | Go back in history |
| bctl forward | Go forward (alias: fwd) |
| bctl reload | Reload current page |

Interaction

All <sel> arguments accept CSS selectors or element refs from snapshot (e.g. e5).

| Command | Description |
|---|---|
| bctl click <sel> [-i N] [-t text] | Click element; -t filters by visible text (substring) |
| bctl dblclick <sel> [-i N] [-t text] | Double-click element |
| bctl hover <sel> [-i N] [-t text] | Hover over element; -t filters by visible text |
| bctl focus <sel> [-i N] [-t text] | Focus element |
| bctl type <sel> <text> | Type text into input/textarea (React-compatible, replaces value) |
| bctl input-text <sel> <text> | Char-by-char typing for rich text editors. Options: [--clear] [--delay ms] |
| bctl press <key> | Press key: Enter submits forms, Escape closes dialogs |
| bctl check <sel> [-i N] [-t text] | Check a checkbox or radio button |
| bctl uncheck <sel> [-i N] [-t text] | Uncheck a checkbox |
| bctl scroll <dir\|sel> [px] | Scroll: up / down / top / bottom or element into view |
| bctl select-option <sel> <val> | Select dropdown option (alias: sopt). Options: [--text] |
| bctl drag <src> [target] | Drag to element or offset. Options: [--dx N --dy N] |
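
When an agent assembles these commands programmatically, building an argument list avoids shell quoting problems with selectors and visible text. The following is a minimal sketch for the `click` row of the table; `click_argv` is a hypothetical helper, and reading `-i N` as "index among the matching elements" is an assumption, since the table does not spell it out.

```python
import shlex
from typing import Optional


def click_argv(selector: str, text: Optional[str] = None,
               index: Optional[int] = None) -> list:
    """Build the argument vector for `bctl click <sel> [-i N] [-t text]`."""
    argv = ["bctl", "click", selector]
    if index is not None:
        argv += ["-i", str(index)]
    if text is not None:
        argv += ["-t", text]
    return argv


# A list can be passed straight to subprocess.run without a shell;
# shlex.join renders the equivalent shell line, e.g. for logging.
print(shlex.join(click_argv("button", text="Sign in")))
# prints: bctl click button -t 'Sign in'
```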

DOM Query

| Command | Description |
|---|---|
| bctl snapshot [--all] | List interactive elements with refs e0, e1, … (alias: snap) |
| bctl text [sel] | Get text content (default: body) |
| bctl html [sel] | Get innerHTML |
| bctl attr <sel> [name] [-i N] | Get attribute(s) of element |
| bctl select <sel> [-l N] | List matching elements (alias: sel) |
| bctl count <sel> | Count matching elements |
| bctl status | Current page URL and title |
| bctl is-visible <sel> [-i N] | Check if element is visible (returns bounding rect) |
| bctl get-value <sel> [-i N] | Get value of form element (input / select / textarea) |

JavaScript

| Command | Description |
|---|---|
| bctl eval <code> | Execute JS in page context (auto-bypasses CSP) |

Tabs

| Command | Description |
|---|---|
| bctl tabs | List all tabs |
| bctl tab <id> | Switch to tab by ID |
| bctl new-tab [url] | Open new tab |
| bctl close-tab [id] | Close tab (default: active) |

Screenshot & Files

| Command | Description |
|---|---|
| bctl screenshot [path] | Capture screenshot (alias: ss) |
| bctl download <target> [-o path] [-i N] | Download file/image (alias: dl; -o supports absolute paths) |
| bctl upload <sel> <files...> | Upload file(s) to <input type="file"> |

Wait & Dialog

| Command | Description |
|---|---|
| bctl wait <sel\|seconds> [timeout] | Wait for element or sleep |
| bctl dialog [accept\|dismiss] [--text val] | Handle next alert / confirm / prompt |

Batch / Pipe

| Command | Description |
|---|---|
| bctl pipe | Read commands from stdin, one per line (JSONL output). Consecutive DOM ops are auto-batched into a single browser call |
| bctl batch '<cmd1>' '<cmd2>' ... | Execute multiple commands in one call with smart batching |
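
A program driving pipe mode needs two small pieces: rendering commands as one line each, and decoding the JSONL that comes back. The sketch below assumes that `bctl pipe` splits each input line using shell-style quoting (the quoted arguments in this README's examples suggest this, but it is not stated); `pipe_script` and `parse_jsonl` are hypothetical helper names.

```python
import json
import shlex


def pipe_script(commands: list) -> str:
    """Render one command per line, each given as a list of arguments."""
    return "".join(shlex.join(cmd) + "\n" for cmd in commands)


def parse_jsonl(stdout: str) -> list:
    """Decode JSONL output: one JSON object per non-empty line."""
    return [json.loads(line) for line in stdout.splitlines() if line.strip()]
```

The rendered script could then be fed to the CLI with `subprocess.run(["bctl", "pipe"], input=script, capture_output=True, text=True)` and the captured stdout passed to `parse_jsonl`.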

Server

| Command | Description |
|---|---|
| bctl ensure-ready | Ensure server + extension are ready (auto-starts server, auto-launches Chrome if needed) |
| bctl ping | Check server & extension status |
| bctl capabilities | Show actions supported by the connected extension |
| bctl self-test | Run generic end-to-end smoke tests for core skill actions |
| bctl serve | Start server in foreground |
| bctl stop | Stop server |
| bctl setup | Install extension to ~/.browser-ctl/extension/ and open the Chrome extensions page |
| bctl setup cursor | Install AI skill (SKILL.md) into Cursor IDE |
| bctl setup opencode | Install AI skill into OpenCode |
| bctl setup <path> | Install AI skill to a custom directory |

Examples

Snapshot workflow (recommended for AI agents)

bctl go "https://example.com"
bctl snapshot                          # List all interactive elements as e0, e1, …
bctl click e3                          # Click by ref, no CSS selector needed
bctl type e5 "hello world"            # Type into element by ref
bctl get-value e5                      # Read form value
bctl is-visible e3                     # Check visibility

Search and extract

bctl go "https://news.ycombinator.com"
bctl select "a.titlelink" -l 5       # Top 5 links with text, href, etc.

Click by visible text (SPA-friendly)

bctl click "button" -t "Sign in"        # Click button containing "Sign in"
bctl click "a" -t "Settings"            # Click link containing "Settings"
bctl click "div[role=button]" -t "Save" # Works with any element + text filter

Fill a form

bctl type "input[name=email]" "user@example.com"
bctl type "input[name=password]" "hunter2"
bctl select-option "select#country" "US"
bctl upload "input[type=file]" ./resume.pdf
bctl click "button[type=submit]"

Scroll and screenshot

bctl go "https://en.wikipedia.org/wiki/Web_browser"
bctl scroll down 1000
bctl ss page.png

Handle dialogs

bctl dialog accept              # Set up handler BEFORE triggering
bctl click "#delete-button"     # This triggers a confirm() dialog

Drag and drop

bctl drag ".task-card" ".done-column"
bctl drag ".range-slider" --dx 50 --dy 0

Batch / Pipe (fast multi-step)

# Pipe mode: multiple commands in one call, auto-batched
bctl pipe <<'EOF'
click "button" -t "Select tag"
wait 1
type "input[placeholder='Search']" "v1.0.0"
wait 1
click "button" -t "Create new tag"
EOF

# Batch mode: same thing as arguments
bctl batch \
  'click "button" -t "Sign in"' \
  'wait 1' \
  'type "#email" "user@example.com"' \
  'type "#password" "secret"' \
  'click "button[type=submit]"'

Shell scripting

# Extract all image URLs from a page
bctl go "https://example.com"
bctl eval "JSON.stringify(Array.from(document.images).map(i=>i.src))"

# Wait for SPA content to load
bctl go "https://app.example.com/dashboard"
bctl wait ".dashboard-loaded" 15
bctl text ".metric-value"

Output Format

All commands return JSON to stdout:

// Success
{"success": true, "data": {"url": "https://example.com", "title": "Example"}}

// Error
{"success": false, "error": "Element not found: .missing"}

Commands exit with a non-zero status on error, so they work naturally with set -e and && chains.
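
On the consuming side, the two response shapes map naturally onto "return the data" and "raise an exception". The sketch below uses only the `success`, `data`, and `error` fields shown in the examples above; `BctlError` and `unwrap` are hypothetical names, not part of browser-ctl.

```python
import json


class BctlError(RuntimeError):
    """Raised when a response carries "success": false."""


def unwrap(stdout: str):
    """Return the "data" payload of a successful response, else raise."""
    reply = json.loads(stdout)
    if not reply.get("success"):
        raise BctlError(reply.get("error", "unknown error"))
    return reply.get("data")
```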


Architecture

┌─────────────────────────────────────────────────────┐
│  AI Agent / Terminal                                │
│  $ bctl click "button.submit"                       │
│       │                                             │
│       ▼  HTTP POST localhost:19876/command           │
│  ┌──────────────────────┐                           │
│  │   Bridge Server      │  (Python, aiohttp)        │
│  │   :19876             │                           │
│  └──────────┬───────────┘                           │
│             │  WebSocket                            │
│             ▼                                       │
│  ┌──────────────────────┐                           │
│  │  Chrome Extension    │  (Manifest V3)            │
│  │  Service Worker      │                           │
│  └──────────┬───────────┘                           │
│             │  chrome.scripting / chrome.debugger    │
│             ▼                                       │
│  ┌──────────────────────┐                           │
│  │  Your Real Browser   │  (sessions, cookies, etc) │
│  └──────────────────────┘                           │
└─────────────────────────────────────────────────────┘

| Component | Details |
|---|---|
| CLI | Stdlib only, raw-socket HTTP (zero heavy imports, ~5ms cold start) |
| Bridge Server | Async relay (aiohttp), auto-daemonizes |
| Extension | MV3 service worker, auto-reconnects via chrome.alarms |
| Click | Three-phase: pointer events → MAIN-world click → window.open() interception for SPA compatibility |
| Eval | Dual strategy: MAIN-world injection (fast) + CDP fallback (CSP-safe) |
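
To make the first hop of the diagram concrete, here is a rough sketch of what an HTTP POST to the bridge might look like from Python. Only the port and the `/command` path come from the diagram; the JSON body, its field names, and the `Content-Type` header are placeholders, because the actual request schema is not documented in this README. The request is built but not sent.

```python
import json
import urllib.request

# Port and path are taken from the architecture diagram.
BRIDGE_URL = "http://127.0.0.1:19876/command"


def build_command_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying a JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BRIDGE_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )


# Hypothetical payload: these field names are placeholders, not the real schema.
req = build_command_request({"action": "click", "params": {"selector": "button.submit"}})
```

In practice the `bctl` CLI is the supported entry point; a direct HTTP client like this would only make sense after checking the real payload format in the source.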

Requirements

  • Python >= 3.11
  • Chrome / Chromium with the extension loaded
  • macOS, Linux, or Windows

Privacy

All communication is local (127.0.0.1). No analytics, no telemetry, no external servers. See PRIVACY.md.

License

MIT
