Playwright-based data connectors for DataConnect. Each connector exports a user's data from a web platform using browser automation. Credentials never leave the device.
Each connector has a status indicating its maturity level:
- Stable -- Production-ready. Well-tested across multiple releases, reliable login and data extraction, complete schemas.
- Beta -- Functional but less tested. Works in most cases but may have edge cases or require more login handling.
- Experimental -- New or untested. May not work reliably; contributed recently with limited real-world usage.
| Platform | Company | Status | Scopes |
|---|---|---|---|
| ChatGPT | OpenAI | Stable | chatgpt.conversations, chatgpt.memories |
| GitHub | GitHub | Stable | github.profile, github.repositories, github.starred |
| Instagram | Meta | Stable | instagram.profile, instagram.posts, instagram.ads |
| LinkedIn | LinkedIn | Stable | linkedin.profile, linkedin.experience, linkedin.education, linkedin.skills, linkedin.languages, linkedin.connections |
| Oura Ring | Oura | Stable | oura.readiness, oura.sleep, oura.activity |
| Spotify | Spotify | Stable | spotify.profile, spotify.savedTracks, spotify.playlists |
| Amazon | Amazon | Beta | amazon.profile, amazon.orders |
| Instagram Ads | Meta | Beta | instagram.ads |
| Shop | Shopify | Beta | shop.orders |
| Uber | Uber | Beta | uber.trips, uber.receipts |
| YouTube | Google | Beta | youtube.profile, youtube.subscriptions, youtube.playlists, youtube.playlistItems, youtube.likes, youtube.watchLater, youtube.history |
| Claude | Anthropic | Experimental | claude.conversations, claude.projects |
| H-E-B | HEB | Experimental | heb.profile, heb.orders, heb.nutrition |
| Steam | Valve | Experimental | steam.profile, steam.games, steam.friends |
| Whole Foods Market | Whole Foods | Experimental | wholefoods.profile, wholefoods.orders, wholefoods.nutrition |
```bash
node run-connector.cjs ./connectors/github/github-playwright.js            # JSON output (for agents)
node run-connector.cjs ./connectors/github/github-playwright.js --pretty   # colored output (for humans)
node run-connector.cjs ./connectors/github/github-playwright.js --inputs '{"username":"x","password":"y"}'
```

See `skills/vana-connect/` for the agent skill: setup, running, creating new connectors, and data recipes.
| Folder | What's inside | Audience |
|---|---|---|
| `connectors/` | All platform connectors (`<company>/<name>-playwright.js` + `.json`) | Everyone |
| `scripts/` | Developer tooling: scaffold, test, validate, session capture | Human developers |
| `skills/` | AI agent skill for creating/running connectors (`vana-connect/`) | AI agents (Claude, etc.) |
| `schemas/` | JSON Schema definitions, one per scope (`<platform>.<scope>.json`) | Validation |
| `icons/` | SVG/PNG icons for the DataConnect UI | Frontend |
| `types/` | TypeScript type definitions (`connector.d.ts`) | TypeScript consumers |
```
connectors/                          # All platform connectors
├── <company>/
│   ├── <name>-playwright.js         # Connector script (plain JS)
│   └── <name>-playwright.json       # Metadata (login URL, selectors, scopes)
│
scripts/                             # Developer tooling (human-driven)
├── create-connector.sh              # End-to-end connector scaffold + test
├── capture-session.cjs              # Browser session capture (manual login)
├── test-connector.cjs               # Run connector against a real browser
├── validate-connector.cjs           # Structure + output validator
└── reference/                       # Templates and API docs
│
skills/vana-connect/                 # Agent skill (AI-agent-driven)
├── SKILL.md                         # Skill entry point (setup, connect, recipes)
├── CREATE.md                        # Full walkthrough for building connectors
└── scripts/                         # Agent-facing scripts (runner, validator, etc.)
│
schemas/                             # JSON Schema definitions (one per scope)
├── <platform>.<scope>.json
│
registry.json                        # Central registry (checksums, versions, OTA)
run-connector.cjs                    # Symlink → skills/vana-connect/scripts/run-connector.cjs
test-connector.cjs                   # Standalone test runner
create-connector.sh                  # Quick autonomous scaffold script
```
Each connector lives in connectors/<company>/. A connector consists of two files:
- `<name>-playwright.js` -- the connector script (plain JS, runs inside the Playwright runner sidecar)
- `<name>-playwright.json` -- metadata (display name, login URL, selectors, scopes)
Some connectors also include a README with platform-specific setup instructions (e.g., API keys).
The repo has two interfaces for building and running connectors. They serve different audiences but share the same connector format and output:
| | `scripts/` | `skills/vana-connect/` |
|---|---|---|
| Audience | Human developers at a terminal | AI agents (Claude, etc.) |
| Entry point | `scripts/create-connector.sh` | `skills/vana-connect/SKILL.md` |
| Login | Manual browser login via `capture-session.cjs` | CLI-driven (`vana connect`) |
| Testing | `scripts/test-connector.cjs` | `run-connector.cjs` |
| When to use | Local development, debugging, manual QA | Automated connector creation and data export |
Both produce the same connector files (connectors/<company>/<name>-playwright.js + .json) and use the same schemas, registry, and validation scripts.
Connectors run in a sandboxed Playwright browser managed by the DataConnect app. The runner provides a page API object (not raw Playwright). The browser starts headless; connectors call page.showBrowser() when login is needed and page.goHeadless() after.
Phase 1 -- Login (visible browser)
- Navigate to the platform's login page (headless)
- Check if the user is already logged in via persistent session
- If not, show the browser so the user can log in manually
- Extract auth tokens/cookies once logged in
Phase 2 -- Data collection (headless)
- Switch to headless mode (browser disappears)
- Fetch data via API calls, network capture, or DOM scraping
- Report structured progress to the UI
- Return the collected data with an export summary
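The two phases can be sketched as one async function. This is a minimal, hypothetical sketch rather than a real connector: it takes `page` as a parameter (connector scripts actually receive it as a global) so it can be exercised with a stub, and the example.com URLs, the cookie-based login check, the `/api/items` endpoint, and the `example.items` scope are placeholders. It also assumes that `page.evaluate()` resolves a returned Promise and that `promptUser()` accepts an async `checkFn` with an interval in milliseconds.

```javascript
// Hypothetical two-phase connector flow. `page` is the runner's page API;
// URLs, the login check, the endpoint and the scope name are placeholders.
async function collect(page) {
  // Phase 1: login. Start headless and reuse a persistent session if one exists.
  await page.goto('https://example.com/login');
  const isLoggedIn = () =>
    page.evaluate('document.cookie.includes("session=")'); // placeholder check

  if (!(await isLoggedIn())) {
    const { headed } = await page.showBrowser('https://example.com/login');
    if (!headed) {
      await page.setData('error', 'Login needs a visible browser, but headed mode is unavailable');
      return null;
    }
    // Poll every 2000 ms (interval unit assumed) until the user has logged in.
    await page.promptUser('Log in to continue', isLoggedIn, 2000);
  }

  // Phase 2: data collection, back in headless mode.
  await page.goHeadless();
  await page.setProgress({
    phase: { step: 1, total: 1, label: 'Fetching items' },
    message: 'Fetching items...',
    count: 0,
  });
  // Assumes evaluate() awaits the Promise produced by the async IIFE.
  const items = await page.evaluate(
    '(async () => (await fetch("/api/items", { credentials: "include" })).json())()'
  );

  return {
    'example.items': items,
    // The type of `details` is an assumption; a string is used here.
    exportSummary: { count: items.length, label: 'items', details: `${items.length} items` },
    timestamp: new Date().toISOString(),
    version: '2.0.0-playwright',
    platform: 'example',
  };
}
```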
Connectors return a scoped result object where data keys use the format source.category (e.g., linkedin.profile, chatgpt.conversations). The frontend auto-detects scoped keys (any key containing a . that isn't a metadata field) and POSTs each scope separately to the Personal Server at POST /v1/data/{scope}.
```javascript
const result = {
  'platform.scope1': { /* scope data */ },
  'platform.scope2': { /* scope data */ },
  exportSummary: { count, label, details }, // values computed earlier by the connector
  timestamp: new Date().toISOString(),
  version: '2.0.0-playwright',
  platform: 'platform-name',
};
```

Metadata keys (`exportSummary`, `timestamp`, `version`, `platform`) are not treated as scopes.
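The scope-detection rule is simple enough to express directly. The sketch below illustrates the rule as described; it is not the frontend's actual code, and sending each scope's data as a JSON body is an assumption.

```javascript
// Sketch of the rule above: every key containing a "." that is not a
// metadata field becomes one POST /v1/data/{scope} request.
const METADATA_KEYS = new Set(['exportSummary', 'timestamp', 'version', 'platform']);

function scopedRequests(result) {
  return Object.entries(result)
    .filter(([key]) => key.includes('.') && !METADATA_KEYS.has(key))
    .map(([scope, data]) => ({
      method: 'POST',
      path: `/v1/data/${scope}`,
      body: JSON.stringify(data), // JSON payload is an assumption
    }));
}

// Example: two scopes plus metadata produce exactly two requests.
const requests = scopedRequests({
  'linkedin.profile': { name: 'Ada' },
  'linkedin.skills': ['JS'],
  exportSummary: { count: 2, label: 'scopes' },
  timestamp: '2024-01-01T00:00:00.000Z',
  version: '2.0.0-playwright',
  platform: 'linkedin',
});
console.log(requests.map((r) => r.path)); // the two /v1/data/linkedin.* paths
```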
| Pattern | When to use | Example connector |
|---|---|---|
| API fetch via `page.evaluate()` | Platform has REST/JSON APIs | `connectors/openai/chatgpt-playwright.js` |
| Network capture via `page.captureNetwork()` | Platform uses GraphQL/XHR that fires on navigation | `connectors/meta/instagram-playwright.js` |
| DOM scraping via `page.evaluate()` | No API available, data only in rendered HTML | `connectors/linkedin/linkedin-playwright.js` |
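For the network-capture pattern, the important detail is ordering: register the capture before navigating, then poll for the response. A hypothetical helper, assuming `urlPattern` is a string the runner matches against request URLs, that `bodyPattern` is optional, and that the capture methods may be awaited; the shape of the captured response is not specified here.

```javascript
// Hypothetical helper for the network-capture pattern.
async function captureOnNavigation(page, { url, urlPattern, key, timeoutMs = 15000, pollMs = 500 }) {
  // Register the capture *before* navigating so the request is not missed.
  await page.captureNetwork({ urlPattern, key });
  await page.goto(url);

  // Poll until the matching response arrives or the timeout elapses.
  for (let waited = 0; waited < timeoutMs; waited += pollMs) {
    if (await page.hasCapturedResponse(key)) {
      return page.getCapturedResponse(key);
    }
    await page.sleep(pollMs);
  }
  return null; // nothing captured in time
}
```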
See `skills/vana-connect/CREATE.md` for the full walkthrough. Summary:
- Scaffold: `node skills/vana-connect/scripts/scaffold.cjs <platform> [company]` -- generates script, metadata, and stub schema
- Implement: Write login + data collection logic (see `CREATE.md` for auth patterns, extraction strategies, and reference connectors)
- Validate structure: `node scripts/validate-connector.cjs connectors/<company>/<name>-playwright.js`
- Test: `node run-connector.cjs connectors/<company>/<name>-playwright.js --inputs '{"username":"x","password":"y"}'`
- Validate output: `node scripts/validate-connector.cjs connectors/<company>/<name>-playwright.js --check-result ~/.dataconnect/last-result.json`
- Register: `node skills/vana-connect/scripts/register.cjs connectors/<company>/<name>-playwright.js` -- adds entry + checksums to `registry.json`
The page object is available as a global in connector scripts. The runner implementation lives in data-connect/playwright-runner.
| Method | Description |
|---|---|
| `page.evaluate(jsString)` | Run JS in browser context, return result |
| `page.screenshot()` | Take a JPEG screenshot, returns base64 string |
| `page.requestInput({message, schema?})` | Request data from the driver (credentials, 2FA codes, etc.) |
| `page.goto(url, options?)` | Navigate to URL |
| `page.sleep(ms)` | Wait for `ms` milliseconds |
| `page.setData(key, value)` | Send data to host (`'status'`, `'error'`, `'result'`) |
| `page.setProgress({phase, message, count})` | Structured progress for the UI |
| `page.showBrowser(url?)` | Escalate to headed mode; returns `{ headed: true/false }` |
| `page.goHeadless()` | Switch to headless mode (no-op if already headless) |
| `page.promptUser(msg, checkFn, interval)` | Poll `checkFn` until truthy |
| `page.captureNetwork({urlPattern, bodyPattern, key})` | Register a network capture |
| `page.getCapturedResponse(key)` | Get captured response or `null` |
| `page.hasCapturedResponse(key)` | Check if a response was captured |
| `page.clearNetworkCaptures()` | Clear all captures |
| `page.closeBrowser()` | Close browser, keep process for HTTP work |
| `page.httpFetch(url, options?)` | Node.js fetch with auto-injected cookies from the browser session |
showBrowser switches the browser to headed mode for cases that require live human interaction (e.g., interactive CAPTCHAs). It returns { headed: true } on success or { headed: false } if the driver doesn't support headed mode. Connectors should check the return value and handle the fallback:
```javascript
const { headed } = await page.showBrowser(url);
if (!headed) {
  // Headed mode not available: retry, skip, or report an error
}
```

For normal login flows, use `requestInput` to ask the driver for credentials without showing a browser:
```javascript
const { email, password } = await page.requestInput({
  message: 'Log in to ChatGPT',
  schema: {
    type: 'object',
    properties: {
      email: { type: 'string', format: 'email' },
      password: { type: 'string', format: 'password' }
    },
    required: ['email', 'password']
  }
});
```

The runner relays the request to the driver (Tauri app, agent, CLI) and resolves with the response. The `schema` field uses JSON Schema, the same format used by OpenAI, Anthropic, and Google for LLM tool definitions. See the headless-first runner spec for the full protocol design.
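Because `page.evaluate()` takes a JavaScript string, values returned by `requestInput` must be serialized before being spliced into that string; `JSON.stringify` yields a valid string literal even when a password contains quotes or backslashes. A hypothetical sketch: the `#email`, `#password`, and `button[type=submit]` selectors are placeholders, and passing `page` as a parameter is only for testability.

```javascript
// Hypothetical login step; selectors are placeholders, not any platform's real ones.
async function submitCredentials(page) {
  const { email, password } = await page.requestInput({
    message: 'Log in to Example',
    schema: {
      type: 'object',
      properties: {
        email: { type: 'string', format: 'email' },
        password: { type: 'string', format: 'password' },
      },
      required: ['email', 'password'],
    },
  });

  // evaluate() receives a JS *string*, so serialize user input with
  // JSON.stringify instead of concatenating it raw into the script.
  const script = `(() => {
    document.querySelector('#email').value = ${JSON.stringify(email)};
    document.querySelector('#password').value = ${JSON.stringify(password)};
    document.querySelector('button[type=submit]').click();
    return true;
  })()`;
  return page.evaluate(script);
}
```

Assigning `.value` directly may go unnoticed by framework-managed inputs (React, for example), so a real connector may also need to dispatch `input` events.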
```javascript
await page.setProgress({
  phase: { step: 1, total: 3, label: 'Fetching memories' },
  message: 'Downloaded 50 of 200 items...',
  count: 50,
});
```

- `phase.step` / `phase.total` -- drives the step indicator ("Step 1 of 3")
- `phase.label` -- short label for the current phase
- `message` -- human-readable progress text
- `count` -- numeric count for progress tracking
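When a connector has several phases, a small wrapper can keep step numbers and message wording consistent across `setProgress` calls. The helper name and message format below are hypothetical:

```javascript
// Hypothetical wrapper: phases is the ordered list of phase labels.
function makeProgressReporter(page, phases) {
  return async function report(phaseIndex, done, total) {
    const label = phases[phaseIndex];
    const message = total != null
      ? `${label}: ${done} of ${total} items...`
      : `${label}: ${done} items...`; // total unknown (e.g. unpaginated API)
    await page.setProgress({
      phase: { step: phaseIndex + 1, total: phases.length, label },
      message,
      count: done,
    });
  };
}
```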
- DataConnect cloned and able to run (`npm run tauri:dev`)
- Clone this repo alongside DataConnect:
  ```bash
  git clone https://github.com/vana-com/data-connectors.git
  ```
- Point DataConnect to your local connectors during development:
  ```bash
  # From the DataConnect repo
  CONNECTORS_PATH=../data-connectors npm run tauri:dev
  ```
  The `CONNECTORS_PATH` environment variable tells the fetch script to skip downloading and use your local directory instead.
- After editing connector files, sync them to the app's runtime directory:
  ```bash
  # From the DataConnect repo
  node scripts/sync-connectors-dev.js
  ```
  This copies your connector files to `~/.dataconnect/connectors/`, where the running app reads them. The app checks this directory first, so your local edits take effect without rebuilding.
- Edit your connector script
- Run `node scripts/sync-connectors-dev.js` (from the DataConnect repo)
- Click the connector in the app to test
- Check logs in `~/Library/Logs/DataConnect/` (macOS) for debugging
Test connectors without the full DataConnect app. The runner spawns playwright-runner as a child process and outputs JSON protocol messages.
Prerequisites: The DataConnect repo cloned alongside this one (the runner auto-detects ../data-dt-app/playwright-runner), or set PLAYWRIGHT_RUNNER_DIR to point to the playwright-runner directory.
```bash
# Run a connector (headed by default, browser visible)
node run-connector.cjs ./connectors/linkedin/linkedin-playwright.js

# Colored, human-readable output
node run-connector.cjs ./connectors/linkedin/linkedin-playwright.js --pretty

# Pre-supply credentials
node run-connector.cjs ./connectors/linkedin/linkedin-playwright.js --inputs '{"username":"x","password":"y"}'

# Run headless (no visible browser)
node run-connector.cjs ./connectors/linkedin/linkedin-playwright.js --headless

# Override the initial URL
node run-connector.cjs ./connectors/linkedin/linkedin-playwright.js --url https://linkedin.com/feed

# Save result to a custom path (default: ./connector-result.json)
node run-connector.cjs ./connectors/linkedin/linkedin-playwright.js --output ./my-result.json
```

The runner reads the connector's sibling `.json` metadata to resolve the `connectURL`. In headed mode, `goHeadless()` becomes a no-op so the browser stays visible throughout.
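The metadata lookup amounts to swapping the script's `.js` extension for `.json`. A sketch of that resolution (not the actual `run-connector.cjs` source), assuming `connectURL` is a top-level field of the metadata file and that an explicit `--url` takes precedence:

```javascript
const fs = require('fs');
const path = require('path');

// Sketch: derive the start URL from the sibling metadata file,
// letting an explicit --url override win.
function resolveConnectUrl(scriptPath, overrideUrl) {
  if (overrideUrl) return overrideUrl;
  // connectors/<company>/<name>-playwright.js -> <name>-playwright.json
  const metadataPath = path.join(
    path.dirname(scriptPath),
    path.basename(scriptPath, '.js') + '.json'
  );
  const metadata = JSON.parse(fs.readFileSync(metadataPath, 'utf8'));
  if (!metadata.connectURL) {
    throw new Error(`No connectURL in ${metadataPath}`);
  }
  return metadata.connectURL;
}
```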
- Fork this repo
- Create a branch: `git checkout -b feat/<platform>-connector`
- Add your files:
  - `connectors/<company>/<name>-playwright.js` -- connector script
  - `connectors/<company>/<name>-playwright.json` -- metadata
  - `schemas/<platform>.<scope>.json` -- data schema (optional but encouraged)
- Test locally using the instructions above
- Update `registry.json` with your connector entry and checksums
- Open a pull request
- Fork and branch
- Make your changes to the connector script and/or metadata
- Test locally
- Update the version in the metadata JSON
- Regenerate checksums and update `registry.json`
- Open a pull request
- Credentials stay on-device. Never send tokens or passwords to external servers.
- Use `page.setProgress()` to report progress during long exports.
- Include `exportSummary` in the result. The UI uses it to display what was collected.
- Handle errors. Use `page.setData('error', message)` with clear error messages.
- Prefer API fetch over DOM scraping. APIs are more stable than DOM structure.
- Avoid obfuscated CSS class names. Use structural selectors, heading text, and content heuristics.
- Rate-limit API calls. Add `page.sleep()` between requests.
- Test pagination edge cases -- empty results, single page, large datasets.
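Several of these practices (API fetch, rate limiting, pagination edge cases) meet in a single loop. A hypothetical sketch assuming an endpoint that accepts `offset`/`limit` query parameters and returns `{ items: [...] }`, and assuming `page.evaluate()` resolves a returned Promise:

```javascript
// Hypothetical paginated API fetch with rate limiting.
async function fetchAllPages(page, baseUrl, { limit = 50, delayMs = 500, maxPages = 1000 } = {}) {
  const all = [];
  for (let pageIndex = 0; pageIndex < maxPages; pageIndex++) {
    const url = `${baseUrl}?offset=${pageIndex * limit}&limit=${limit}`;
    // JSON.stringify embeds the URL safely in the evaluated script string.
    const body = await page.evaluate(`(async () => {
      const res = await fetch(${JSON.stringify(url)}, { credentials: 'include' });
      if (!res.ok) throw new Error('HTTP ' + res.status);
      return res.json();
    })()`);
    const items = Array.isArray(body && body.items) ? body.items : [];
    all.push(...items);
    // A short (or empty) page means the last page has been reached.
    if (items.length < limit) break;
    await page.sleep(delayMs); // rate-limit between requests
  }
  return all;
}
```

Stopping on a short page covers the empty, single-page, and exact-multiple cases (the last costs one extra, empty request); `maxPages` guards against an endpoint that never returns a short page.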
The registry uses SHA-256 checksums to verify file integrity during OTA updates. Always regenerate checksums when modifying connector files:

```bash
shasum -a 256 connectors/<company>/<name>-playwright.js | awk '{print "sha256:" $1}'
shasum -a 256 connectors/<company>/<name>-playwright.json | awk '{print "sha256:" $1}'
```

DataConnect fetches `registry.json` from this repo on app startup and during `npm postinstall`. For each connector listed:
- Check if local files exist with matching checksums
- If not, download from `baseUrl/<file_path>` (this repo's raw GitHub URL)
- Verify SHA-256 checksums match
- Write to the local `connectors/` directory
This enables OTA connector updates without a full app release.
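The update steps, together with the `sha256:` checksum format produced by the `shasum` commands, can be sketched in Node.js with the built-in `crypto` module. This is a hypothetical per-file sketch: the `{ path, checksum }` entry shape is an assumption rather than the documented `registry.json` schema, and the default `fetchFn` relies on the global `fetch` available in Node 18+.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Same format as the shasum one-liners: "sha256:<hex digest>".
function sha256(buffer) {
  return 'sha256:' + crypto.createHash('sha256').update(buffer).digest('hex');
}

// Hypothetical OTA step for one file; the { path, checksum } shape is assumed.
async function syncFile(file, { baseUrl, localDir, fetchFn = fetch }) {
  const localPath = path.join(localDir, file.path);

  // 1. Skip the download if the local copy already matches.
  if (fs.existsSync(localPath) && sha256(fs.readFileSync(localPath)) === file.checksum) {
    return 'up-to-date';
  }

  // 2. Download from baseUrl/<file_path>.
  const res = await fetchFn(`${baseUrl}/${file.path}`);
  if (!res.ok) throw new Error(`Download failed (${res.status}): ${file.path}`);
  const content = Buffer.from(await res.arrayBuffer());

  // 3. Verify before writing.
  if (sha256(content) !== file.checksum) {
    throw new Error(`Checksum mismatch for ${file.path}`);
  }

  // 4. Write to the local connectors directory.
  fs.mkdirSync(path.dirname(localPath), { recursive: true });
  fs.writeFileSync(localPath, content);
  return 'updated';
}
```

Verifying before writing means a truncated or tampered download never overwrites a working connector.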