From 5d929ae358976feddf0a22ff83db0a623b52dbef Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Pierzcha=C5=82a?= Date: Thu, 2 Apr 2026 17:50:05 +0200 Subject: [PATCH 1/3] docs: restructure agent-device skill guidance --- skills/agent-device/SKILL.md | 82 ++--- .../agent-device/references/accessibility.md | 29 ++ skills/agent-device/references/batch.md | 64 ++++ .../references/bootstrap-install.md | 173 ++------- skills/agent-device/references/debugging.md | 4 +- skills/agent-device/references/exploration.md | 334 ++++-------------- skills/agent-device/references/qa.md | 46 +++ .../references/session-routing.md | 73 ++++ 8 files changed, 352 insertions(+), 453 deletions(-) create mode 100644 skills/agent-device/references/accessibility.md create mode 100644 skills/agent-device/references/batch.md create mode 100644 skills/agent-device/references/qa.md create mode 100644 skills/agent-device/references/session-routing.md diff --git a/skills/agent-device/SKILL.md b/skills/agent-device/SKILL.md index 91a5068e..372db839 100644 --- a/skills/agent-device/SKILL.md +++ b/skills/agent-device/SKILL.md @@ -5,68 +5,54 @@ description: Automates interactions for Apple-platform apps (iOS, tvOS, macOS) a # agent-device -Use this skill as a router with mandatory defaults. Read this file first. For normal device tasks, always load `references/bootstrap-install.md` and `references/exploration.md` before acting. Use bootstrap to confirm or establish deterministic setup. Use exploration for UI inspection, interaction, and verification once the app session is open. +Use this skill as a thin router. Read this file first, then load only the references that match the task. -## Default operating rules +## Routing map -- Start conservative. Prefer read-only inspection before mutating the UI. -- Start deterministic. If the app name, package, device, or session is uncertain, load bootstrap and discover them before interacting. 
-- Use plain `snapshot` when the task is to verify what text or structure is currently visible on screen. -- Use `snapshot -i` only when you need interactive refs such as `@e3` for a requested action or targeted query. On iOS and Android, default snapshot output uses the same visible-first model: off-screen interactive content is exposed as discovery hints, not tappable refs. -- Prefer `diff snapshot` after a nearby mutation when you only need to know what changed. -- Avoid speculative mutations. You may take the smallest reversible UI action needed to unblock inspection or complete the requested task, such as dismissing a popup, closing an alert, or clearing an unintended surface. -- In React Native dev or debug builds, check early for visible warning or error overlays, tooltips, and toasts that can steal focus or intercept taps. If they are not part of the requested behavior, dismiss them and continue. If you saw them, report them in the final summary. -- Do not browse the web or use external sources unless the user explicitly asks. -- Re-snapshot after meaningful UI changes instead of reusing stale refs. -- Treat refs in default snapshot output as actionable-now, not durable identities. If a target is off-screen, use `scrollintoview` or scroll and re-snapshot. -- Prefer `@ref` or selector targeting over raw coordinates. -- Ensure the correct target is pinned and an app session is open before interacting. -- Keep the loop short: `open` -> inspect/act -> verify if needed -> `close`. 
+| Situation | Read next | Skip when | +| --- | --- | --- | +| Normal session setup, app launch, install, target selection | [references/bootstrap-install.md](references/bootstrap-install.md) | Skip only if the correct app session is already open on the correct target | +| Normal UI inspection or interaction | [references/exploration.md](references/exploration.md) | Skip only if the task is pure setup or pure debugging | +| Open-ended bug hunt with report | [../dogfood/SKILL.md](../dogfood/SKILL.md) | Skip for normal task execution | +| QA from acceptance criteria | [references/qa.md](references/qa.md) | Skip for open-ended exploration or bug hunts | +| Accessibility-gap audit | [references/accessibility.md](references/accessibility.md) | Skip unless the task is specifically about AX exposure | +| Known stable flow to run with `batch` | [references/batch.md](references/batch.md) | Skip unless the sequence is already known | +| Failure triage, logs, alerts, permissions, unstable sessions | [references/debugging.md](references/debugging.md) | Skip when the normal flow is working | +| Screenshots, diff, recording, replay maintenance, perf | [references/verification.md](references/verification.md) | Skip until the main interaction flow is working | +| macOS desktop surfaces, menu bar, frontmost app | [references/macos-desktop.md](references/macos-desktop.md) | Skip unless `--platform macos` or desktop surfaces matter | +| Shared host routing, session locking, scoped discovery | [references/session-routing.md](references/session-routing.md) | Skip for single-run local flows | +| Remote daemon or tenant-scoped host control | [references/remote-tenancy.md](references/remote-tenancy.md) | Skip for local runs | -## Default flow +## Normal device tasks -1. Load [references/bootstrap-install.md](references/bootstrap-install.md) and [references/exploration.md](references/exploration.md) before acting on a normal device task. -2. 
Use bootstrap first to confirm or establish the correct target, app install, and open app session. -3. Once the app session is open and stable, use exploration for inspection, interaction, and verification. -4. Start with plain `snapshot` if the goal is to read or verify what is visible. -5. Escalate to `snapshot -i` only if you need refs for interactive exploration or a requested action. -6. Use `get`, `is`, or `find` before mutating the UI when a read-only command can answer the question. -7. End by capturing proof if needed, then `close`. +For normal device tasks: -## QA modes +1. Load [references/bootstrap-install.md](references/bootstrap-install.md) if the correct app session is not already open on the correct target. +2. Load [references/exploration.md](references/exploration.md) before normal UI inspection or interaction. -- Open-ended bug hunt with reporting: use [../dogfood/SKILL.md](../dogfood/SKILL.md). -- Pass/fail QA from acceptance criteria: stay in this skill, start with [references/bootstrap-install.md](references/bootstrap-install.md), then use the QA loop in [references/exploration.md](references/exploration.md). +Use bootstrap to pin the correct target, app, and session. Use exploration once the app session is open and stable, or immediately if that session is already ready. -## Required references +## Golden paths -- For every normal device task, after reading this file, load [references/bootstrap-install.md](references/bootstrap-install.md) first, then [references/exploration.md](references/exploration.md), before acting. -- Use bootstrap to confirm or establish deterministic setup, especially in sandbox or cloud environments. -- Use exploration once the app session is open and stable. -- Load additional references only when their scope is needed. +1. Normal interaction: bootstrap -> exploration -> verification only if you need proof. +2. 
RN warning during interaction: exploration -> dismiss warning -> continue without re-snapshotting -> debugging only if it keeps returning or becomes the task. +3. QA from acceptance criteria: bootstrap -> exploration -> [references/qa.md](references/qa.md). +4. Bug hunt with reporting: switch to [../dogfood/SKILL.md](../dogfood/SKILL.md). +5. Accessibility audit: bootstrap if needed -> exploration -> [references/accessibility.md](references/accessibility.md). +6. Stable scripted flow: bootstrap if needed -> exploration -> [references/batch.md](references/batch.md). -## Decision rules +## QA modes -- Use plain `snapshot` when you need to verify whether text is visible. -- Use `snapshot -i` mainly for interactive exploration and choosing refs. -- Use `diff snapshot` for compact post-action verification; use `snapshot --diff` when that alias is easier to discover from snapshot help. -- Use `get`, `is`, or `find` when they can answer the question without changing UI state. -- Use `fill` to replace text. -- Use `type` to append text. -- Do not write `type @eN "text"`. Use `fill @eN "text"` to target a field directly, or `press @eN` then `type "text"` when the field already has focus and you want append semantics. -- If the on-screen keyboard blocks the next step, prefer `keyboard dismiss` over navigation. On iOS, keep an app session open first; `keyboard status|get` remains Android-only. -- When a task asks to "go back", use plain `back` for predictable app-owned navigation and reserve `back --system` for platform back gestures or button semantics. -- Use `type --delay-ms` or `fill --delay-ms` for debounced search fields that drop characters when typed too quickly. -- If there is no simulator, no app install, or no open app session yet, switch to `bootstrap-install.md` instead of improvising setup steps. 
-- Use the smallest unblock action first when transient UI blocks inspection, but do not navigate, search, or enter new text just to make the UI reveal data unless the user asked for that interaction. -- In React Native dev or debug apps, treat visible warning or error overlays as transient blockers unless the user is explicitly asking you to diagnose them. Dismiss them when safe, then continue the requested flow. -- Do not use external lookups to compensate for missing on-screen data unless the user asked for them. -- If the needed information is not exposed on screen, say that plainly instead of compensating with extra navigation, text entry, or web search. -- Prefer `@ref` or selector targeting over raw coordinates. +- Open-ended bug hunt with reporting: use [../dogfood/SKILL.md](../dogfood/SKILL.md). +- Pass/fail QA from acceptance criteria: stay in this skill, use [references/bootstrap-install.md](references/bootstrap-install.md) if the correct app session is not already open on the correct target, then [references/exploration.md](references/exploration.md), then [references/qa.md](references/qa.md). 
## Additional references - Need logs, network, alerts, permissions, or failure triage: [references/debugging.md](references/debugging.md) - Need screenshots, diff, recording, replay maintenance, or perf data: [references/verification.md](references/verification.md) +- Need acceptance-criteria mapping or pass/fail checks: [references/qa.md](references/qa.md) +- Need accessibility-gap auditing: [references/accessibility.md](references/accessibility.md) +- Need a known stable `batch` flow: [references/batch.md](references/batch.md) - Need desktop surfaces, menu bar behavior, or macOS-specific interaction rules: [references/macos-desktop.md](references/macos-desktop.md) +- Need shared-host routing, session locking, or scoped discovery: [references/session-routing.md](references/session-routing.md) - Need remote HTTP transport, `--remote-config` launches, or tenant leases on a remote macOS host: [references/remote-tenancy.md](references/remote-tenancy.md) diff --git a/skills/agent-device/references/accessibility.md b/skills/agent-device/references/accessibility.md new file mode 100644 index 00000000..abd9b4c4 --- /dev/null +++ b/skills/agent-device/references/accessibility.md @@ -0,0 +1,29 @@ +# Accessibility + +## When to open this file + +Open this file when the task is to find UI that is visible to a user but missing from the accessibility tree. + +## Audit loop + +1. Capture a `screenshot` to see what is visually rendered. +2. Capture a `snapshot` or `snapshot -i` to see what the accessibility tree exposes. +3. Compare the two: + - visible in screenshot and present in snapshot: exposed to accessibility + - visible in screenshot and missing from snapshot: likely accessibility gap +4. If you suspect the node exists in AX but is filtered from interactive output, retry with `snapshot --raw`. 
+
+## Example
+
+```bash
+agent-device screenshot /tmp/accessibility-screen.png
+agent-device snapshot -i
+```
+
+Use `screenshot` as the visual source of truth and `snapshot` as the accessibility source of truth for this audit.
+
+## When to leave this file
+
+- Return to [exploration.md](exploration.md) once the accessibility comparison is complete.
+- Switch to [verification.md](verification.md) if you need screenshots, recordings, or other proof artifacts.
+- Switch to [debugging.md](debugging.md) if the audit turns into a failure investigation rather than an accessibility comparison.
diff --git a/skills/agent-device/references/batch.md b/skills/agent-device/references/batch.md
new file mode 100644
index 00000000..2c2b0700
--- /dev/null
+++ b/skills/agent-device/references/batch.md
@@ -0,0 +1,64 @@
+# Batch
+
+## When to open this file
+
+Open this file only when a short command sequence is already known and belongs to one logical screen flow.
+
+## Core rules
+
+- Use `batch` only after exploration has stabilized the flow.
+- Keep batch size moderate, roughly 5 to 20 steps.
+- Add `wait` or `is exists` guards after mutating steps.
+- Do not use `batch` for highly dynamic flows that need replanning after each step.
+- Nested `batch` and `replay` are rejected.
+- Replan from the first failing step instead of rerunning the whole flow blindly.
+
+## Example command
+
+```bash
+agent-device batch --session sim --platform ios --steps-file /tmp/batch-steps.json --json
+```
+
+## Step payload contract
+
+```json
+[
+  { "command": "open", "positionals": ["Settings"], "flags": { "platform": "ios" } },
+  { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
+  { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
+  { "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
+]
+```
+
+- `positionals` is optional and defaults to `[]`.
+- `flags` is optional and defaults to `{}`.
+- Only `command`, `positionals`, `flags`, and `runtime` are accepted as top-level step keys.
+- Supported error mode is stop-on-first-error.
+
+## Canonical stable-flow recipe
+
+```json
+[
+  { "command": "open", "positionals": ["com.example.app"], "flags": { "platform": "android" } },
+  { "command": "wait", "positionals": ["text", "Home", "3000"], "flags": {} },
+  { "command": "press", "positionals": ["label=\"More actions\" role=button"], "flags": {} },
+  { "command": "wait", "positionals": ["text", "Camera scan", "2000"], "flags": {} },
+  { "command": "press", "positionals": ["label=\"Camera scan\""], "flags": {} },
+  { "command": "wait", "positionals": ["text", "Expense created", "15000"], "flags": {} },
+  { "command": "is", "positionals": ["visible", "label=\"Expense created\""], "flags": {} }
+]
+```
+
+## Common batch error categories
+
+- `INVALID_ARGS`: fix the payload shape and retry.
+- `SESSION_NOT_FOUND`: open or select the correct session, then retry.
+- `UNSUPPORTED_OPERATION`: switch to a supported command or surface.
+- `AMBIGUOUS_MATCH`: refine the selector or locator, then retry the failed step.
+- `COMMAND_FAILED`: add sync guards and retry from the failing step.
+
+## When to leave this file
+
+- Return to [exploration.md](exploration.md) if the flow is no longer stable enough for `batch`.
+- Switch to [verification.md](verification.md) if the batch flow is working and you only need proof or replay maintenance.
+- Switch to [debugging.md](debugging.md) if failures need log, alert, or permission triage.
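Reviewer sketch (not part of the patch): the step payload contract that `batch.md` states could be checked locally before a steps file is handed to `batch`. The Python below is a minimal sketch under stated assumptions. `normalize_steps` is a hypothetical helper, not an agent-device API. It also assumes `command` is a required non-empty string and that nested `batch` and `replay` steps can be recognized by their `command` value. Neither assumption is confirmed by the patch.

```python
import json

# Top-level step keys that batch.md lists as accepted.
ALLOWED_KEYS = {"command", "positionals", "flags", "runtime"}
# batch.md says nested `batch` and `replay` are rejected; matching on the
# command name is an assumption made for this sketch.
REJECTED_COMMANDS = {"batch", "replay"}


def normalize_steps(raw_json):
    """Parse a steps payload and return the steps with defaults applied.

    Hypothetical helper. Raises ValueError on the first problem found.
    """
    steps = json.loads(raw_json)  # invalid JSON raises a ValueError subclass
    if not isinstance(steps, list):
        raise ValueError("payload must be a JSON array of steps")

    normalized = []
    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {index}: must be a JSON object")
        unknown = set(step) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"step {index}: unknown keys {sorted(unknown)}")
        command = step.get("command")
        # Assumption: `command` is required and must be a non-empty string.
        if not isinstance(command, str) or not command:
            raise ValueError(f"step {index}: missing command")
        if command in REJECTED_COMMANDS:
            raise ValueError(f"step {index}: nested {command} is rejected")
        entry = {
            "command": command,
            "positionals": step.get("positionals", []),  # documented default
            "flags": step.get("flags", {}),  # documented default
        }
        if "runtime" in step:
            entry["runtime"] = step["runtime"]
        normalized.append(entry)
    return normalized
```

Running a check like this on `/tmp/batch-steps.json` before `agent-device batch --steps-file` may catch payload-shape mistakes earlier. The patch associates those mistakes with `INVALID_ARGS`. The CLI stays the source of truth for what it accepts.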
diff --git a/skills/agent-device/references/bootstrap-install.md b/skills/agent-device/references/bootstrap-install.md index 2e876fce..39a20cc4 100644 --- a/skills/agent-device/references/bootstrap-install.md +++ b/skills/agent-device/references/bootstrap-install.md @@ -2,42 +2,42 @@ ## When to open this file -Open this file when you still need to choose the right target, start the right session, install or relaunch the app, or pin automation to one device before interacting. This is the deterministic setup layer for sandbox, cloud, or other environments where install paths, device state, or app readiness may be uncertain. +Open this file when you still need to choose the right target, start the right session, install or relaunch the app, or pin the run to one device before interacting. -## Open-first path +## Default setup path -- `devices` -- `apps` -- `ensure-simulator` -- `open` -- `session list` +Use this order when you are not sure about the target or installed app identifier: -Use this exact order when you are not sure about the installed app identifier. On Android dev builds in particular, `apps` is cheaper than guessing package suffixes and retrying failed `open` calls. +1. `devices` +2. `apps` +3. `ensure-simulator` +4. `open` +5. `session list` -## Install path - -- `install` or `reinstall` +On Android dev builds in particular, `apps` is cheaper than guessing package suffixes and retrying failed `open` calls. ## Most common mistake to avoid -Do not start acting before you have pinned the correct target and opened an `app` session. In mixed-device environments, always pass `--device`, `--udid`, or `--serial`. - -## Deterministic setup rule - -If there is no simulator, no app install, no open app session, or any uncertainty about where the app should come from, stay in this file and use deterministic setup commands or bootstrap scripts first. Do not improvise install paths or app-launch flows while exploring. 
- -After setup is confirmed or completed, move to `exploration.md` before doing UI inspection or interaction. +Do not start acting before you have pinned the correct target and opened an `app` session. In mixed-device environments, always pass `--device`, `--udid`, or `--serial` while choosing the target. ## Open-first rule - If the user asks to test an app and does not provide an install artifact or explicit install instruction, try `open ` first. -- If `open ` fails, run `agent-device apps` and retry with a discovered app name before considering install steps. +- If `open ` fails or you are not sure which app name is available on the target, run `agent-device apps` and retry with a discovered app name instead of guessing. - Do not install or reinstall on the first attempt unless the user explicitly asks for installation or provides a concrete artifact path or URL. - When installation is required from a known location, prefer a checked-in shell script or other deterministic bootstrap command over ad hoc path guessing. -- If `open ` fails, or you are not sure which app name is available on the target, run `agent-device apps` first and choose from the discovered app list instead of guessing. -- Use `apps --platform ` together with `--device`, `--udid`, or `--serial` when target selection matters. -- Once you have the correct app name, retry `open` with that exact discovered value. +## Install guidance + +- Use `install ` when the app may already be installed and you do not need a fresh-state reset. +- Use `reinstall ` when you explicitly need uninstall plus install as one deterministic step. +- Keep install and open as separate phases. Do not turn them into one default command flow. +- Supported binary formats: + - Android: `.apk` and `.aab` + - iOS: `.app` and `.ipa` +- For iOS `.ipa` files, `` is used as the bundle id or bundle name hint when the archive contains multiple app bundles. 
+- After install or reinstall, later use `open ` with the exact discovered or known package or bundle identifier, not the artifact path. +- Do not use `open --relaunch` on Android. ## Common starting points @@ -60,56 +60,37 @@ agent-device install com.example.app ./build/app.apk --platform android --serial agent-device install com.example.app ./build/MyApp.app --platform ios --device "iPhone 17 Pro" ``` -## Install guidance - -- Use `install ` when the app may already be installed and you do not need a fresh-state reset. -- Use `reinstall ` when you explicitly need uninstall plus install as one deterministic step. -- Keep install and open as separate phases. Do not turn them into one default command flow. -- Supported binary formats: - - Android: `.apk` and `.aab` - - iOS: `.app` and `.ipa` -- For iOS `.ipa` files, `` is used as the bundle id or bundle name hint when the archive contains multiple app bundles. -- After install or reinstall, later use `open ` with the exact discovered or known package/bundle identifier, not the artifact path. - ## Choose the right starting point - iOS local QA: prefer simulators unless the task explicitly requires physical hardware. - iOS in mixed simulator and device environments: run `ensure-simulator` first, then keep using `--device` or `--udid`. -- TV targets: use `--target tv` together with `--platform` when the task is for tvOS or Android TV rather than phone or tablet surfaces. +- TV targets: use `--target tv` together with `--platform` when the task is for tvOS or Android TV. - Android binary flow: use `install` or `reinstall` for `.apk` or `.aab`, then open by installed package name. - macOS desktop app flow: use `open --platform macos`. Only load [macos-desktop.md](macos-desktop.md) if a desktop surface or macOS-specific behavior matters. 
-TV example: - -```bash -agent-device open MyTvApp --platform ios --target tv -agent-device open com.example.androidtv --platform android --target tv -``` - -## Session rules - -- Use `--session ` when you need a named session: - -```bash -agent-device --session auth open Settings --platform ios -agent-device --session auth snapshot -i -``` +## Session basics +- Use `--session ` when you need a named session. - Use `open ` before interactions. - Use `close` when done. Add `--shutdown` when you want simulators or emulators torn down with the session. +- Use `close --save-script=` when you want to preserve a replay script from the session. - Use semantic session names when you need multiple concurrent runs. -- Use `--save-script=` on `close` when you want to keep a replay script. - For dev loops where state can linger, prefer `open --relaunch`. - In iOS sessions, use `open ` for the app itself. Use `open ` for deep links, and `open ` when you need to launch the app and deep link in one step. - On iOS, `appstate` is session-scoped and requires the matching active session on the target device. -## After a session is established +Example: + +```bash +agent-device --session auth open Settings --platform ios +agent-device --session auth snapshot -i +``` -Once you have opened the correct session on the correct target, default to the conservative rule: keep the session binding on follow-up commands, and stop repeating device-routing flags unless you are intentionally retargeting. +## After a session is established - Prefer `--session ` on follow-up commands, or use sandboxed `AGENT_DEVICE_SESSION`. -- Do not keep repeating `--platform`, `--target`, `--device`, `--udid`, `--serial`, or similar target-selection flags on normal follow-up commands. -- Only omit follow-up session flags when the environment explicitly guarantees isolation. +- Do not keep repeating `--platform`, `--target`, `--device`, `--udid`, or `--serial` on normal follow-up commands. 
+- Use target-selection flags again only when you are choosing the target before opening a session, or when you intentionally mean to retarget. Good shared-host pattern: @@ -120,90 +101,8 @@ agent-device --session auth press @e3 agent-device --session auth close ``` -Bad shared-host pattern: - -```bash -agent-device --session auth open Settings --platform ios --device "iPhone 17 Pro" -agent-device --session auth snapshot -i --platform ios --device "iPhone 17 Pro" -``` - -Use target-selection flags again only when you are choosing the target before opening a session, or when you explicitly mean to retarget. - -## Session-bound automation - -Use this when an orchestrator must keep plain CLI calls on one session and device. - -```bash -export AGENT_DEVICE_SESSION=qa-ios -export AGENT_DEVICE_PLATFORM=ios -export AGENT_DEVICE_SESSION_LOCK=strip - -agent-device open MyApp --relaunch -``` - -- `AGENT_DEVICE_SESSION` plus `AGENT_DEVICE_PLATFORM` provides the default binding. -- `--session-lock reject|strip` controls whether conflicting per-call routing flags fail or are ignored. -- Conflicts include explicit retargeting flags such as `--platform`, `--target`, `--device`, `--udid`, `--serial`, `--ios-simulator-device-set`, and `--android-device-allowlist`. -- Lock policy applies to nested `batch` steps too. -- Compatibility aliases remain supported: `--session-locked`, `--session-lock-conflicts`, `AGENT_DEVICE_SESSION_LOCKED`, and `AGENT_DEVICE_SESSION_LOCK_CONFLICTS`. - -Android emulator variant: - -```bash -export AGENT_DEVICE_SESSION=qa-android -export AGENT_DEVICE_PLATFORM=android - -agent-device --session-lock reject open com.example.myapp --relaunch -``` - -## Scoped discovery - -Use scoped discovery when one run must not see host-global device lists. 
- -```bash -agent-device devices --platform ios --ios-simulator-device-set /tmp/tenant-a/simulators -agent-device devices --platform android --android-device-allowlist emulator-5554,device-1234 -``` - -- Scope is applied before `--device`, `--udid`, and `--serial`. -- Out-of-scope selectors fail with `DEVICE_NOT_FOUND`. -- With iOS simulator-set scope enabled, iOS physical devices are not enumerated. -- If the scoped iOS simulator set is empty, the error should point at the set path and suggest creating a simulator in that set. -- Environment equivalents: - - `AGENT_DEVICE_IOS_SIMULATOR_DEVICE_SET` - - `AGENT_DEVICE_ANDROID_DEVICE_ALLOWLIST` - -## Session inspection and replay - -```bash -agent-device session list -agent-device replay ./session.ad --session auth -agent-device replay -u ./session.ad --session auth -``` - -- iOS session entries include `device_udid` and `ios_simulator_device_set`. Use them to confirm routing in concurrent runs. -- Prefer selector-based actions and assertions in saved replay scripts. -- Tenant isolation namespaces sessions as `:` during tenant-scoped runs. - ## When to leave this file - Once the correct target and session are pinned, move to [exploration.md](exploration.md). - If opening, startup, permissions, or logs become the blocker, switch to [debugging.md](debugging.md). - -## Install examples - -```bash -agent-device reinstall MyApp /path/to/app-debug.apk --platform android --serial emulator-5554 -``` - -```bash -agent-device install com.example.app ./build/MyApp.ipa --platform ios --device "iPhone 17 Pro" -``` - -Do not use `open --relaunch` on Android. - -## Security and trust notes - -- Treat signing, provisioning, and daemon auth values as host secrets. Do not paste them into shared logs or commit them to source control. -- Prefer Xcode Automatic Signing over manual overrides when a physical iOS device is involved. -- Keep persistent host-specific defaults in environment variables rather than checked-in project config. 
+- If you need advanced session locking, scoped discovery, or concurrent-run routing, switch to [session-routing.md](session-routing.md). diff --git a/skills/agent-device/references/debugging.md b/skills/agent-device/references/debugging.md index 885fdf2b..d28879ee 100644 --- a/skills/agent-device/references/debugging.md +++ b/skills/agent-device/references/debugging.md @@ -17,7 +17,7 @@ Open this file when the task turns into failure triage, logs, network inspection Do not leave logging on for normal flows or dump full log files into context. Keep debug windows short and inspect logs with `grep` or `tail`. -In React Native dev or debug builds, do not dismiss visible warning or error overlays without remembering to report them later. If you close one to keep the flow moving, keep at least a screenshot or a short marked log window so the summary can name it. +For the normal React Native warning flow, follow [exploration.md](exploration.md). If you are already in a debug window and dismiss a visible warning or error overlay to keep the flow moving, keep at least a screenshot or a short marked log window so the summary can name it. ## Canonical loop @@ -72,7 +72,7 @@ grep -n -E "agent-device.*mark|before tap" tail -50 ``` -If the app showed a visible warning or error overlay during the flow: +If the app showed a visible warning or error overlay during a debug run: - Prefer a narrow grep window around your `logs mark` lines instead of loading the whole file. - Mention the surfaced warning or error in the final summary even if it did not block completion. 
diff --git a/skills/agent-device/references/exploration.md b/skills/agent-device/references/exploration.md index 2ff799f7..4dc638ae 100644 --- a/skills/agent-device/references/exploration.md +++ b/skills/agent-device/references/exploration.md @@ -2,135 +2,75 @@ ## When to open this file -Open this file when the app or screen is already running and you need to discover the UI, choose targets, read state, wait for conditions, or perform normal interactions. +Open this file when the app session is already running and you need to inspect the UI, choose a target, interact with the current screen, or verify a nearby state change. + +## Default loop + +1. Inspect the current screen with `snapshot` or `snapshot -i`. +2. Act with the smallest needed command. +3. Verify with `get`, `is`, `wait`, `diff snapshot`, or `screenshot` as needed. +4. Re-snapshot only after meaningful UI changes, except for transient React Native warnings that you dismiss and continue past. + +## Decision shortcut + +- Need visible text or structure: `snapshot` +- Need to tap, type, select, or choose a ref: `snapshot -i` +- Need exact text from a known target: `get text` +- Need an assertion: `is` +- Need search-driven targeting: `find` +- Need sync after a mutation: `wait` +- Need compact structural verification after a nearby change: `diff snapshot` +- Need proof image: `screenshot` +- Need to dismiss the keyboard: `keyboard dismiss` +- Need Android keyboard visibility or input-type state: `keyboard status` or `keyboard get` +- Need logs, alerts, or failure triage: switch to [debugging.md](debugging.md) +- Need proof artifacts, replay maintenance, or performance checks: switch to [verification.md](verification.md) +- Need QA from acceptance criteria: switch to [qa.md](qa.md) +- Need accessibility-gap auditing: switch to [accessibility.md](accessibility.md) +- Need `batch` for a known stable flow: switch to [batch.md](batch.md) ## Read-only first - If the question is what text, labels, or structure 
is visible on screen, start with plain `snapshot`. -- Escalate to `snapshot -i` only when you need refs such as `@e3` for interactive exploration or a requested action. -- If you intend to `press`, `fill`, or otherwise interact, start with `snapshot -i` and fall back to plain `snapshot` only if interactive refs are unavailable. +- Escalate to `snapshot -i` only when you need refs such as `@e3` for an interaction or targeted query. - Prefer `get`, `is`, or `find` before mutating the UI when a read-only command can answer the question. -- You may take the smallest reversible UI action needed to unblock inspection, such as dismissing a popup, closing an alert, or backing out of an unintended surface. +- On Android, use `keyboard status` or `keyboard get` when keyboard visibility or input type matters and you do not need to change UI state. +- Use the smallest reversible UI action needed to unblock inspection, such as dismissing a popup, closing an alert, or backing out of an unintended surface. - Do not type or fill text just to make hidden information easier to access unless the user asked for that interaction. - Do not use external sources to infer missing UI state unless the user explicitly asked. - If the answer is not visible or exposed in the UI, report that gap instead of compensating with search, navigation, or text entry. 
-## Decision shortcut - -- User asks what is visible on screen: `snapshot` -- User asks for exact text from a known target: `get text` -- User asks you to tap, type, or choose an element: `snapshot -i`, then act -- React Native dev or debug build shows warning/error UI: capture enough evidence to identify it, dismiss it if it is not the requested behavior, then continue the flow and report it in the summary -- The on-screen keyboard is blocking the next step: `keyboard dismiss`; on iOS do this only while an app session is active, and use `keyboard status|get` only on Android -- UI does not expose the answer: say so plainly; do not browse or force the app into a new state unless asked - -## Read-only commands - -- `snapshot` -- `get` -- `is` -- `find` -- `keyboard status|get` on Android when keyboard visibility or input type matters - -## Interaction commands - -- `snapshot -i` -- `press` -- `fill` -- `type` -- `scrollintoview` -- `wait` -- `keyboard dismiss` when the keyboard obscures the next target - -## Common mistakes to avoid - -**Stale refs.** Do not treat `@ref` values as durable after navigation or dynamic updates. Re-snapshot after the UI changes, and switch to selectors when the flow must stay stable. - -**Android AX tree lag.** After submits, route changes, or composer transitions, the accessibility tree can lag behind the visible UI. If `snapshot -i` and `screenshot` disagree: - -1. Trust the screenshot as visual truth. -2. Take one fresh `snapshot -i`. Android retries briefly after navigation-sensitive actions. -3. If the tree still disagrees with the screenshot, wait briefly, then take one more fresh snapshot. Do not loop snapshots immediately. - -**React Native dev overlays.** In dev or debug builds, warning or error overlays can block taps, change focus, or hide the real UI. Check for them near app open and after major transitions. - -- Not blocking the task: dismiss and continue. 
-- Blocking or recurring: switch to [debugging.md](debugging.md) and collect evidence. -- Seen at any point: mention in the final summary even if dismissed. - -## Common example loops - -These are examples, not required exact sequences. Adapt them to the app, state, and task at hand. - -### Interactive exploration loop - -```bash -agent-device open Settings --platform ios -agent-device snapshot -i -agent-device press @e3 -agent-device wait visible 'label="Privacy & Security"' 3000 -agent-device get text 'label="Privacy & Security"' -agent-device close -``` - -### Screen verification loop - -```bash -agent-device open MyApp --platform ios -# perform the necessary actions to reach the state you need to verify -agent-device snapshot -# verify whether the expected element or text is present -agent-device close -``` - ## Snapshot choices -- Use plain `snapshot` when you only need to verify whether visible text or structure is on screen. -- Use `snapshot -i` when you need refs such as `@e3` for interactive exploration or for an intended interaction. -- On iOS and Android, default snapshot output is visible-first. Off-screen interactive content is surfaced as discovery hints (including inline scroll/list hidden-content hints when known), not shown as directly tappable refs. -- Treat large text-surface lines in `snapshot -i` as discovery output. If a node shows preview or truncation metadata, use `get text @ref` only after you have already decided that `snapshot -i` is needed for that surface. -- Use `snapshot -i -s "Camera"` or `snapshot -i -s @e3` when you want a smaller, scoped result. -- If `snapshot -i -s ""` returns 0 nodes, the scope did not match the current screen. Widen the query or re-check the screen state instead of assuming the command silently fell back to the full tree. +- Use plain `snapshot` when you only need visible text or structure. +- Use `snapshot -i` when you need refs such as `@e3` for interaction or targeted inspection. 
+- On iOS and Android, default snapshot output is visible-first. Off-screen interactive content is surfaced as discovery hints, not as directly tappable refs. +- Use `snapshot -i -s "Camera"` or `snapshot -i -s @e3` when you want a smaller scoped result. +- If `snapshot -i -s ""` returns 0 nodes, widen the query or re-check the current screen instead of assuming it fell back to the full tree. - If `snapshot -i` returns 0 nodes but the screen is visibly populated, treat `screenshot` as visual truth, wait briefly, then re-run `snapshot -i` once before escalating. - If `snapshot -i -d ` says the interactive output is empty at that depth, retry without `-d` instead of taking more shallow snapshots. -Example: - -```bash -agent-device snapshot -i -``` - -Sample output: - -```text -Page: com.apple.Preferences -App: com.apple.Preferences - -@e1 [ioscontentgroup] - @e2 [button] "Camera" - @e3 [button] "Privacy & Security" -[off-screen below] 2 interactive items: "Location Services", "Battery" -``` - -## Refs vs selectors +## Refs and selectors - Use refs for discovery, debugging, and short local loops. +- Use selectors for deterministic scripts, assertions, and replay-friendly actions. +- Prefer selector or `@ref` targeting over raw coordinates. - Use `scrollintoview @ref` when the target is already known from the current snapshot and you want the command to re-snapshot after each swipe until the element reaches the viewport safe band. - If `scrollintoview @ref` succeeds, prefer the returned `currentRef` for the next action. - Visible-first off-screen summaries are intentionally compact. If you need the full off-screen tree instead of a short summary, retry with `snapshot --raw`. - Cap long searches with `--max-scrolls ` when the list may be unbounded or the target may not exist. -- Use selectors for deterministic scripts, assertions, and replay-friendly actions. -- Prefer selector or `@ref` targeting over raw coordinates. 
- For tap interactions, `press` is canonical and `click` is an equivalent alias. -Examples: +## Text entry rules -```bash -agent-device press @e2 -agent-device fill @e5 "test" -agent-device press 'id="camera_row" || label="Camera" role=button' -agent-device is visible 'id="camera_settings_anchor"' -``` +- Use `fill` to replace text in an editable field. +- Use `type` to append text to the current insertion point. +- Use `fill @ref "text"` when you need to target a field directly by ref. +- Use `press @ref`, then `type "text"` when the field is already focused and you need append semantics. +- Do not write `type @ref "text"`; `type` only accepts text and will not target that ref for you. +- If the keyboard blocks the next control after text entry, prefer `keyboard dismiss` instead of backing out of the screen. +- On iOS, `keyboard dismiss` depends on the active app session, so do not rely on it after closing or without `open`. +- Do not use `fill` or `type` just to make the app reveal information that is not currently visible unless the user asked for that interaction. ## Interaction fallbacks @@ -140,24 +80,30 @@ When `press @ref` fails: 2. Re-snapshot if the UI may have changed. 3. Retry `press @ref` or a selector-based `press`. 4. If `screenshot --overlay-refs --json` returned a reliable `overlayRefs[].center`, use `agent-device press `. -5. Use an external vision-based tap tool only after semantic and coordinate targeting fail. +5. Open [coordinate-system.md](coordinate-system.md) if you are forced onto raw coordinates. - Prefer `@ref` over coordinates. - Do not guess coordinates from the image when structured `center` is available. - `agent-device` does not provide a built-in vision-tap flag. -## Text entry rules +## Common mistakes to avoid -- Use `fill` to replace text in an editable field. -- Use `type` to append text to the current insertion point. -- Use `fill @ref "text"` when you need to target a field directly by ref. 
-- Use `press @ref`, then `type "text"` when the field is already focused and you need append semantics. -- Do not write `type @ref "text"`; `type` only accepts text and will not target that ref for you. -- If the keyboard blocks the next control after text entry, prefer `keyboard dismiss` instead of backing out of the screen. -- On iOS, `keyboard dismiss` depends on the active app session to keep the target app foregrounded, so do not rely on selector-only dismiss calls after closing or without `open`. -- Do not use `fill` or `type` just to make the app reveal information that is not currently visible unless the user asked for that interaction. +**Stale refs.** Do not treat `@ref` values as durable after navigation or dynamic updates. Re-snapshot after the UI changes, and switch to selectors when the flow must stay stable. + +**Android AX tree lag.** After submits, route changes, or composer transitions, the accessibility tree can lag behind the visible UI. If `snapshot -i` and `screenshot` disagree: + +1. Trust the screenshot as visual truth. +2. Take one fresh `snapshot -i`. Android retries briefly after navigation-sensitive actions. +3. If the tree still disagrees with the screenshot, wait briefly, then take one more fresh snapshot. Do not loop snapshots immediately. + +**React Native dev overlays.** In dev or debug builds, warning or error overlays can block taps, change focus, or hide the real UI. + +- If the warning or error is not the thing the user asked you to investigate, dismiss it and continue. +- After dismissing a transient React Native warning overlay, continue without re-snapshotting. +- If the overlay keeps returning or becomes the task, switch to [debugging.md](debugging.md). +- Mention visible warnings or errors in the final summary even if you dismissed them. 
-## React Native dev or debug overlays +## React Native warning loop Use this loop for React Native dev clients, Metro-backed builds, and local debug sessions where warnings or errors may appear as tooltips, banners, toasts, or modal overlays. @@ -166,165 +112,18 @@ Use this loop for React Native dev clients, Metro-backed builds, and local debug - preferred: `screenshot` - optional: `logs mark "warning visible"` or `logs mark "error visible"` if you are already in a debug window 3. If the overlay is not the thing the user asked you to investigate, dismiss or close it with the smallest reversible action. -4. Re-check the intended screen before continuing the task. +4. Continue the intended flow without forcing a fresh snapshot. 5. Report any visible warnings or errors in the final summary, even if the flow succeeded after dismissal. -Use this rule of thumb: - -- Warning overlay that does not block the task: dismiss and keep going. -- Error overlay that does not block the task: dismiss, keep going, and report it. -- Error overlay that blocks the task or keeps returning: stop treating it as noise and switch to [debugging.md](debugging.md). - ## Query and sync rules - Use `get` to read text, attrs, or state from a known target. - Use `is` for assertions. - Use `wait` when the UI needs time to settle after a mutation. - Use `find "" click --json` when you need search-driven targeting plus matched-target metadata. -- Use `find "" click --first` or `--last` when ambiguous matches are expected and you want the first or last occurrence without falling back to raw coordinates. -- If you are forced onto raw coordinates, open [coordinate-system.md](coordinate-system.md) first. - -Example: - -```bash -agent-device find "Increment" click --json -``` - -Returned metadata comes from the matched snapshot node and can be used for observability or replay maintenance. 
- -## QA from acceptance criteria - -Use this loop when the task starts from acceptance criteria and you need to turn them into concrete checks. - -Preferred mapping: - -- visibility claim for what is on-screen now: `is visible` or plain `snapshot` -- presence claim regardless of viewport visibility: `is exists` -- exact text, label, or value claim: `get text` -- post-action state change: act, then `wait`, then `is` or `get` -- nearby structural UI change: `diff snapshot` -- proof artifact for the final result: `screenshot` or `record` - -Notes: - -- `wait text` is useful for synchronizing on text presence, but it is not the same as `is visible`. -- After a nearby navigation or submit on Android, prefer `screenshot`, then `wait 500` or `wait 1000`, then one fresh `snapshot -i` if the accessibility tree seems stale. - -Anti-hallucination rules: - +- Use `find "" click --first` or `--last` when ambiguous matches are expected. - Do not invent app names, device ids, session names, refs, selectors, or package names. - Discover them first with `devices`, `open`, `snapshot -i`, `find`, or `session list`. -- If refs drift after navigation, re-snapshot or switch to selectors instead of guessing. - -Avoid this escalation path for visible-text questions: - -- Do not jump from `snapshot -i` to `get text @ref`, then to web search, then to typing into a search box just to force the app to reveal the answer. -- Start with `snapshot`. If the text is not visible or exposed, report that directly. -- After Android submit or navigation-heavy actions when the UI looks wrong: `screenshot` first, then `snapshot -i`. 
- -Canonical QA loop: - -```bash -agent-device open MyApp --platform ios -agent-device snapshot -i -agent-device press @e3 -agent-device wait visible 'label="Success"' 3000 -agent-device is visible 'label="Success"' -agent-device screenshot /tmp/qa-proof.png -agent-device close -``` - -## Accessibility audit - -Use this pattern when you need to find UI that is visible to a user but missing from the accessibility tree. - -Audit loop: - -1. Capture a `screenshot` to see what is visually rendered. -2. Capture a `snapshot` or `snapshot -i` to see what the accessibility tree exposes. -3. Compare the two: - - visible in screenshot and present in snapshot: exposed to accessibility - - visible in screenshot and missing from snapshot: likely accessibility gap -4. If you suspect the node exists in AX but is filtered from interactive output, retry with `snapshot --raw`. - -Example: - -```bash -agent-device screenshot /tmp/accessibility-screen.png -agent-device snapshot -i -``` - -Use `screenshot` as the visual source of truth and `snapshot` as the accessibility source of truth for this audit. - -## Batch only when the sequence is already known - -Use `batch` when a short command sequence is already planned and belongs to one logical screen flow. - -```bash -agent-device batch --session sim --platform ios --steps-file /tmp/batch-steps.json --json -``` - -- Keep batch size moderate, roughly 5 to 20 steps. -- Add `wait` or `is exists` guards after mutating steps. -- Do not use `batch` for highly dynamic flows that need replanning after each step. 
- -Example: known chat-send flow - -```json -[ - { "command": "open", "positionals": ["ChatApp"], "flags": { "platform": "android" } }, - { "command": "click", "positionals": ["label=\"Travel chat\""], "flags": {} }, - { "command": "wait", "positionals": ["label=\"Message\"", "3000"], "flags": {} }, - { "command": "fill", "positionals": ["label=\"Message\"", "Filed the expense"], "flags": {} }, - { "command": "press", "positionals": ["label=\"Send\""], "flags": {} } -] -``` - -Step payload contract: - -```json -[ - { "command": "open", "positionals": ["Settings"], "flags": { "platform": "ios" } }, - { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} }, - { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} }, - { "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} } -] -``` - -- `positionals` is optional and defaults to `[]`. -- `flags` is optional and defaults to `{}`. -- Only `command`, `positionals`, `flags`, and `runtime` are accepted as top-level step keys. -- Nested `batch` and `replay` are rejected. -- Supported error mode is stop-on-first-error. - -Response handling: - -- Success returns fields such as `total`, `executed`, `totalDurationMs`, and `results[]`. -- Human-mode `batch` runs also print a short per-step success summary. -- Failed runs include `details.step`, `details.command`, `details.executed`, and `details.partialResults`. -- Replan from the first failing step instead of rerunning the whole flow blindly. 
- -Canonical batch recipe: open app -> open action menu -> choose option -> verify - -```json -[ - { "command": "open", "positionals": ["com.example.app"], "flags": { "platform": "android" } }, - { "command": "wait", "positionals": ["text", "Home", "3000"], "flags": {} }, - { "command": "press", "positionals": ["label=\"More actions\" role=button"], "flags": {} }, - { "command": "wait", "positionals": ["text", "Camera scan", "2000"], "flags": {} }, - { "command": "press", "positionals": ["label=\"Camera scan\""], "flags": {} }, - { "command": "wait", "positionals": ["text", "Expense created", "15000"], "flags": {} }, - { "command": "is", "positionals": ["visible", "label=\"Expense created\""], "flags": {} } -] -``` - -Common batch error categories: - -- `INVALID_ARGS`: fix the payload shape and retry. -- `SESSION_NOT_FOUND`: open or select the correct session, then retry. -- `UNSUPPORTED_OPERATION`: switch to a supported command or surface. -- `AMBIGUOUS_MATCH`: refine the selector or locator, then retry the failed step. -- `COMMAND_FAILED`: add sync guards and retry from the failing step. ## Stop conditions @@ -332,3 +131,6 @@ Common batch error categories: - If a desktop surface or context menu is involved on macOS, load [macos-desktop.md](macos-desktop.md). - If logs, network, alerts, or setup failures become the blocker, switch to [debugging.md](debugging.md). - If the flow is stable and you need proof or replay maintenance, switch to [verification.md](verification.md). +- If the task becomes QA from acceptance criteria, switch to [qa.md](qa.md). +- If the task becomes an accessibility audit, switch to [accessibility.md](accessibility.md). +- If the flow is known and you want `batch`, switch to [batch.md](batch.md). 
diff --git a/skills/agent-device/references/qa.md b/skills/agent-device/references/qa.md
new file mode 100644
index 00000000..015b2dc3
--- /dev/null
+++ b/skills/agent-device/references/qa.md
@@ -0,0 +1,46 @@
+# QA
+
+## When to open this file
+
+Open this file when the task starts from acceptance criteria and you need to turn those criteria into concrete checks.
+
+## Preferred mapping
+
+- visibility claim for what is on-screen now: `is visible` or plain `snapshot`
+- presence claim regardless of viewport visibility: `is exists`
+- exact text, label, or value claim: `get text`
+- post-action state change: act, then `wait`, then `is` or `get`
+- nearby structural UI change: `diff snapshot`
+- proof artifact for the final result: `screenshot` or `record`
+
+## Notes
+
+- `wait text` is useful for synchronizing on text presence, but it is not the same as `is visible`.
+- After a nearby navigation or submit on Android, prefer `screenshot`, then `wait 500` or `wait 1000`, then one fresh `snapshot -i` if the accessibility tree seems stale.
+- Do not invent app names, device ids, session names, refs, selectors, or package names.
+- Discover them first with `devices`, `open`, `snapshot -i`, `find`, or `session list`.
+- If refs drift after navigation, re-snapshot or switch to selectors instead of guessing.
+
+## Avoid this escalation path for visible-text questions
+
+- Do not jump from `snapshot -i` to `get text @ref`, then to web search, then to typing into a search box just to force the app to reveal the answer.
+- Start with `snapshot`. If the text is not visible or exposed, report that directly.
+- After Android submit or navigation-heavy actions when the UI looks wrong: `screenshot` first, then `snapshot -i`.
+
+## Canonical QA loop
+
+```bash
+agent-device open MyApp --platform ios
+agent-device snapshot -i
+agent-device press @e3
+agent-device wait visible 'label="Success"' 3000
+agent-device is visible 'label="Success"'
+agent-device screenshot /tmp/qa-proof.png
+agent-device close
+```
+
+## When to leave this file
+
+- Return to [exploration.md](exploration.md) once the acceptance criteria are translated into concrete checks.
+- Switch to [verification.md](verification.md) if the flow is stable and you only need proof artifacts.
+- Switch to [debugging.md](debugging.md) if failures, logs, alerts, or setup problems become the blocker.
diff --git a/skills/agent-device/references/session-routing.md b/skills/agent-device/references/session-routing.md
new file mode 100644
index 00000000..b213b46e
--- /dev/null
+++ b/skills/agent-device/references/session-routing.md
@@ -0,0 +1,73 @@
+# Session Routing
+
+## When to open this file
+
+Open this file when one run must stay pinned to one session or device across many commands, when multiple concurrent runs share a host, or when you need scoped device discovery.
+
+## Session-bound automation
+
+Use this when an orchestrator must keep plain CLI calls on one session and device.
+
+```bash
+export AGENT_DEVICE_SESSION=qa-ios
+export AGENT_DEVICE_PLATFORM=ios
+export AGENT_DEVICE_SESSION_LOCK=strip
+
+agent-device open MyApp --relaunch
+```
+
+- `AGENT_DEVICE_SESSION` plus `AGENT_DEVICE_PLATFORM` provides the default binding.
+- `--session-lock reject|strip` controls whether conflicting per-call routing flags fail or are ignored.
+- Conflicts include explicit retargeting flags such as `--platform`, `--target`, `--device`, `--udid`, `--serial`, `--ios-simulator-device-set`, and `--android-device-allowlist`.
+- Lock policy applies to nested `batch` steps too.
+- Compatibility aliases remain supported: `--session-locked`, `--session-lock-conflicts`, `AGENT_DEVICE_SESSION_LOCKED`, and `AGENT_DEVICE_SESSION_LOCK_CONFLICTS`.
+
+Android emulator variant:
+
+```bash
+export AGENT_DEVICE_SESSION=qa-android
+export AGENT_DEVICE_PLATFORM=android
+
+agent-device --session-lock reject open com.example.myapp --relaunch
+```
+
+## Scoped discovery
+
+Use scoped discovery when one run must not see host-global device lists.
+
+```bash
+agent-device devices --platform ios --ios-simulator-device-set /tmp/tenant-a/simulators
+agent-device devices --platform android --android-device-allowlist emulator-5554,device-1234
+```
+
+- Scope is applied before `--device`, `--udid`, and `--serial`.
+- Out-of-scope selectors fail with `DEVICE_NOT_FOUND`.
+- With iOS simulator-set scope enabled, iOS physical devices are not enumerated.
+- If the scoped iOS simulator set is empty, the error should point at the set path and suggest creating a simulator in that set.
+- Environment equivalents:
+  - `AGENT_DEVICE_IOS_SIMULATOR_DEVICE_SET`
+  - `AGENT_DEVICE_ANDROID_DEVICE_ALLOWLIST`
+
+## Session inspection and replay routing
+
+```bash
+agent-device session list
+agent-device replay ./session.ad --session auth
+agent-device replay -u ./session.ad --session auth
+```
+
+- iOS session entries include `device_udid` and `ios_simulator_device_set`. Use them to confirm routing in concurrent runs.
+- Prefer selector-based actions and assertions in saved replay scripts.
+- Tenant isolation namespaces sessions as `:` during tenant-scoped runs.
+
+## Security and trust notes
+
+- Treat signing, provisioning, and daemon auth values as host secrets. Do not paste them into shared logs or commit them to source control.
+- Prefer Xcode Automatic Signing over manual overrides when a physical iOS device is involved.
+- Keep persistent host-specific defaults in environment variables rather than checked-in project config.
+
+## When to leave this file
+
+- Return to [bootstrap-install.md](bootstrap-install.md) once routing is pinned and you are ready to open the correct session.
+- Return to [exploration.md](exploration.md) once the correct routed session is already open and stable.
+- Switch to [remote-tenancy.md](remote-tenancy.md) if the run becomes a remote daemon or tenant-scoped host-control task.
From e89025f55fa4ba0f0f9976b30698d7ce0c3d2d94 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Pierzcha=C5=82a?=
Date: Thu, 2 Apr 2026 17:58:45 +0200
Subject: [PATCH 2/3] docs: refine agent-device skill routing

---
 skills/agent-device/SKILL.md | 2 ++
 skills/agent-device/references/batch.md | 7 +++++++
 skills/agent-device/references/bootstrap-install.md | 7 +++++++
 skills/agent-device/references/coordinate-system.md | 5 +++++
 skills/agent-device/references/exploration.md | 5 -----
 skills/agent-device/references/verification.md | 6 ++++++
 6 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/skills/agent-device/SKILL.md b/skills/agent-device/SKILL.md
index 372db839..ceab6855 100644
--- a/skills/agent-device/SKILL.md
+++ b/skills/agent-device/SKILL.md
@@ -41,6 +41,8 @@ Use bootstrap to pin the correct target, app, and session. Use exploration once
 5. Accessibility audit: bootstrap if needed -> exploration -> [references/accessibility.md](references/accessibility.md).
 6. Stable scripted flow: bootstrap if needed -> exploration -> [references/batch.md](references/batch.md).
 
+Treat transient React Native warnings as part of the normal interaction path. Switch to debugging only when the warning keeps returning or becomes the thing you need to investigate.
+
 ## QA modes
 
 - Open-ended bug hunt with reporting: use [../dogfood/SKILL.md](../dogfood/SKILL.md).
diff --git a/skills/agent-device/references/batch.md b/skills/agent-device/references/batch.md index 2c2b0700..9f5a0459 100644 --- a/skills/agent-device/references/batch.md +++ b/skills/agent-device/references/batch.md @@ -57,6 +57,13 @@ agent-device batch --session sim --platform ios --steps-file /tmp/batch-steps.js - `AMBIGUOUS_MATCH`: refine the selector or locator, then retry the failed step. - `COMMAND_FAILED`: add sync guards and retry from the failing step. +## Response handling + +- Success returns fields such as `total`, `executed`, `totalDurationMs`, and `results[]`. +- Human-mode `batch` runs also print a short per-step success summary. +- Failed runs include `details.step`, `details.command`, `details.executed`, and `details.partialResults`. +- Replan from the first failing step instead of rerunning the whole flow blindly. + ## When to leave this file - Return to [exploration.md](exploration.md) if the flow is no longer stable enough for `batch`. diff --git a/skills/agent-device/references/bootstrap-install.md b/skills/agent-device/references/bootstrap-install.md index 39a20cc4..f1aa869d 100644 --- a/skills/agent-device/references/bootstrap-install.md +++ b/skills/agent-device/references/bootstrap-install.md @@ -101,6 +101,13 @@ agent-device --session auth press @e3 agent-device --session auth close ``` +Bad shared-host pattern: + +```bash +agent-device --session auth open Settings --platform ios --device "iPhone 17 Pro" +agent-device --session auth snapshot -i --platform ios --device "iPhone 17 Pro" +``` + ## When to leave this file - Once the correct target and session are pinned, move to [exploration.md](exploration.md). 
diff --git a/skills/agent-device/references/coordinate-system.md b/skills/agent-device/references/coordinate-system.md
index 03b8f2ef..0e316adc 100644
--- a/skills/agent-device/references/coordinate-system.md
+++ b/skills/agent-device/references/coordinate-system.md
@@ -26,3 +26,8 @@ agent-device click 120 240
 - iOS uses device points.
 - Android uses pixels.
 - Use screenshots to reason about coordinates before acting.
+
+## When to leave this file
+
+- Return to [exploration.md](exploration.md) once you can switch back to selector or `@ref` targeting.
+- Return to [macos-desktop.md](macos-desktop.md) if the coordinate problem is specific to macOS surfaces rather than general mobile targeting.
diff --git a/skills/agent-device/references/exploration.md b/skills/agent-device/references/exploration.md
index 4dc638ae..19b4f519 100644
--- a/skills/agent-device/references/exploration.md
+++ b/skills/agent-device/references/exploration.md
@@ -23,11 +23,6 @@ Open this file when the app session is already running and you need to inspect t
 - Need proof image: `screenshot`
 - Need to dismiss the keyboard: `keyboard dismiss`
 - Need Android keyboard visibility or input-type state: `keyboard status` or `keyboard get`
-- Need logs, alerts, or failure triage: switch to [debugging.md](debugging.md)
-- Need proof artifacts, replay maintenance, or performance checks: switch to [verification.md](verification.md)
-- Need QA from acceptance criteria: switch to [qa.md](qa.md)
-- Need accessibility-gap auditing: switch to [accessibility.md](accessibility.md)
-- Need `batch` for a known stable flow: switch to [batch.md](batch.md)
 
 ## Read-only first
 
diff --git a/skills/agent-device/references/verification.md b/skills/agent-device/references/verification.md
index 8a38ad4d..233488e4 100644
--- a/skills/agent-device/references/verification.md
+++ b/skills/agent-device/references/verification.md
@@ -109,3 +109,9 @@ agent-device perf --json
 - Android app sessions also expose `memory` (`dumpsys
meminfo`) and `cpu` (`dumpsys cpuinfo`) snapshots when the session has an app package context.
 - Apple app sessions on macOS and iOS simulators also expose `memory` and `cpu` process snapshots when the session has an app bundle ID.
 - `fps` is still unavailable, and physical iOS devices still leave `memory` and `cpu` unavailable in this release.
+
+## When to leave this file
+
+- Return to [exploration.md](exploration.md) if you still need to reach or stabilize the target UI state.
+- Return to [batch.md](batch.md) if you want to run a known stable multi-step flow rather than capture or maintain artifacts after the fact.
+- Switch to [debugging.md](debugging.md) if the verification task turns into log, alert, permission, or crash triage.
From 6b79dcc2c6dd8e7097ddd47bed92cc86e26a6fc7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Pierzcha=C5=82a?=
Date: Thu, 2 Apr 2026 18:45:18 +0200
Subject: [PATCH 3/3] docs: tighten agent-device guidance and add checklist

---
 AGENTS.md | 1 +
 docs/agent-device-top-50-checklist.md | 114 ++++++++++++++++++
 skills/agent-device/SKILL.md | 4 +-
 .../references/bootstrap-install.md | 50 +++++++-
 skills/agent-device/references/debugging.md | 17 +++
 skills/agent-device/references/exploration.md | 12 ++
 .../references/session-routing.md | 11 ++
 7 files changed, 206 insertions(+), 3 deletions(-)
 create mode 100644 docs/agent-device-top-50-checklist.md

diff --git a/AGENTS.md b/AGENTS.md
index d6a92fa7..d8b24b3e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -167,6 +167,7 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o
 - For behavior/CLI surface changes, evaluate docs/skills updates.
 - Update `README.md` and relevant `website/docs/**` pages for command behavior/flags/aliases/workflows.
 - Update relevant `skills/**/SKILL.md` when usage examples/workflow recommendations change.
+- When iterating on `skills/agent-device/**`, review [docs/agent-device-top-50-checklist.md](docs/agent-device-top-50-checklist.md) and check for regressions in routing, first-step guidance, and fallback coverage across those task families.
 - Keep skill docs task-first:
   - top-level `SKILL.md` should stay a thin router, not a full manual.
   - keep detailed workflows/troubleshooting in a `references/` folder instead of growing the router.
diff --git a/docs/agent-device-top-50-checklist.md b/docs/agent-device-top-50-checklist.md
new file mode 100644
index 00000000..f9f245d6
--- /dev/null
+++ b/docs/agent-device-top-50-checklist.md
@@ -0,0 +1,114 @@
+# agent-device Top 50 Checklist
+
+Use this file when reviewing or refactoring `agent-device` skills.
+
+This is a verification checklist, not a scoring rubric. The goal is to confirm that the skill package still gives both small and large models a clear, productive path through the most common mobile-app build, test, and exploration tasks.
+
+## How to use this checklist
+
+When changing `skills/agent-device/**`:
+
+1. Review the touched guidance against the task families below.
+2. Check for regressions in routing, missing first-step guidance, and lost fallback paths.
+3. Focus especially on whether a smaller model can choose the right next reference and first command without guessing.
+4. In your review or summary, call out any tasks that became clearer, weaker, or ambiguous.
+
+## 1.
Setup and environment
+
+| # | Task | Goal |
+| --- | --- | --- |
+| 1 | Choose the right device target | Pick the correct simulator, emulator, phone, tablet, TV target, or desktop surface |
+| 2 | Boot the correct simulator or emulator | Get the target running before app interaction |
+| 3 | Launch the app on the correct target | Start the intended app on the intended device |
+| 4 | Install a fresh build artifact | Put a new `.apk`, `.aab`, `.app`, or `.ipa` on the target |
+| 5 | Relaunch into a clean runtime state | Reset stale app state without changing more than needed |
+| 6 | Pick the correct app identifier or package | Discover and use the actual installed app name or identifier |
+| 7 | Open the app in a named session | Pin work to a reusable session |
+| 8 | Preserve a replay script when closing | Save a useful `.ad` artifact from a finished session |
+
+## 2. Routing and session control
+
+| # | Task | Goal |
+| --- | --- | --- |
+| 9 | Keep many commands pinned to one session | Avoid retargeting across a longer flow |
+| 10 | Avoid retargeting by accident on a shared host | Keep one run scoped to one chosen device |
+| 11 | Scope discovery to a subset of devices | Limit device selection in mixed environments |
+| 12 | Run multiple concurrent sessions safely | Avoid session collisions across runs |
+| 13 | Decide when session-routing is needed | Know when to stay in bootstrap vs load advanced routing |
+| 14 | Route commands in tenant or remote setups | Handle remote daemon or tenant-scoped host control safely |
+
+## 3.
Navigation and interaction
+
+| # | Task | Goal |
+| --- | --- | --- |
+| 15 | Inspect the current screen | Read visible structure and state |
+| 16 | Find a specific element to act on | Resolve a control, row, field, or button reliably |
+| 17 | Tap, scroll, and navigate normally | Move through the app using the default loop |
+| 18 | Use selectors vs refs correctly | Choose stable targeting for the current task |
+| 19 | Recover when a ref goes stale or off-screen | Re-target without guesswork |
+| 20 | Use raw coordinates only as a fallback | Fall back safely when structured targeting fails |
+| 21 | Open a deep link or route directly | Land on a known route without manual navigation |
+| 22 | Navigate directly to a known screen without waste | Skip unnecessary manual steps when direct routing is available |
+
+## 4. Text entry and keyboard
+
+| # | Task | Goal |
+| --- | --- | --- |
+| 23 | Fill a form field correctly | Replace field text reliably |
+| 24 | Append text to an already-focused field | Type at the insertion point without retargeting |
+| 25 | Handle blocked UI due to keyboard | Continue the flow when the keyboard covers controls |
+| 26 | Inspect Android keyboard state or input type | Read keyboard visibility or field type without mutating UI |
+| 27 | Avoid typing just to reveal hidden info | Keep inspection read-only unless text entry is actually required |
+
+## 5. Read-only QA and assertions
+
+| # | Task | Goal |
+| --- | --- | --- |
+| 28 | Verify visible UI content | Confirm text, labels, and structure on screen |
+| 29 | Turn acceptance criteria into checks | Convert expected behavior into concrete pass/fail verification |
+| 30 | Verify nearby structural UI changes | Confirm what changed after one local mutation |
+| 31 | Capture proof artifacts after verification | Save evidence once the expected state is reached |
+| 32 | Compare before/after state around a mutation | Validate a nearby change without re-exploring everything |
+
+## 6.
Exploratory testing + +| # | Task | Goal | +| --- | --- | --- | +| 33 | Run an open-ended bug hunt | Explore broadly for issues rather than proving one requirement | +| 34 | Systematically move through major app areas | Cover the main surfaces without getting stuck | +| 35 | Capture repro evidence for bugs | Save enough proof to make the issue actionable | +| 36 | Produce a structured issue report | Turn findings into a reproducible handoff | + +## 7. Debugging and failure triage + +| # | Task | Goal | +| --- | --- | --- | +| 37 | Inspect app logs for a broken flow | Narrow down what happened during a repro | +| 38 | Inspect network activity tied to the session | Check request and response behavior in context | +| 39 | Handle permission prompts and alerts | Resolve blocking system or app prompts correctly | +| 40 | Distinguish AX tree lag from real breakage | Avoid misdiagnosing stale snapshots as product failures | +| 41 | Triage a crash or fatal termination | Branch quickly to the right crash evidence source | +| 42 | Narrow a flaky repro to a short debug window | Reduce noise and capture only the relevant failure interval | + +## 8. React Native and dev-build specific + +| # | Task | Goal | +| --- | --- | --- | +| 43 | Dismiss transient React Native warnings and continue | Keep the flow moving when the warning is not the task | +| 44 | Escalate recurring React Native warnings to debugging | Treat repeated overlays as part of app state, not disposable chrome | +| 45 | Preserve evidence when React Native overlays appear | Capture enough proof to name the warning or error later | + +## 9. 
Accessibility + +| # | Task | Goal | +| --- | --- | --- | +| 46 | Compare visual UI with the accessibility tree | Detect mismatches between rendered UI and AX exposure | +| 47 | Detect elements visible to users but missing from AX | Find missing accessibility exposure gaps | +| 48 | Retry with `snapshot --raw` when AX exposure is unclear | Separate collector filtering from truly missing content | + +## 10. Automation and maintenance + +| # | Task | Goal | +| --- | --- | --- | +| 49 | Run a known stable multi-step flow with `batch` | Reduce round trips once the flow is already known | +| 50 | Maintain replay scripts when selectors drift | Keep `.ad` flows usable over time | diff --git a/skills/agent-device/SKILL.md b/skills/agent-device/SKILL.md index ceab6855..d500d188 100644 --- a/skills/agent-device/SKILL.md +++ b/skills/agent-device/SKILL.md @@ -34,9 +34,9 @@ Use bootstrap to pin the correct target, app, and session. Use exploration once ## Golden paths -1. Normal interaction: bootstrap -> exploration -> verification only if you need proof. +1. Normal interaction: bootstrap if needed -> exploration -> verification only if you need proof. 2. RN warning during interaction: exploration -> dismiss warning -> continue without re-snapshotting -> debugging only if it keeps returning or becomes the task. -3. QA from acceptance criteria: bootstrap -> exploration -> [references/qa.md](references/qa.md). +3. QA from acceptance criteria: bootstrap if needed -> exploration -> [references/qa.md](references/qa.md). 4. Bug hunt with reporting: switch to [../dogfood/SKILL.md](../dogfood/SKILL.md). 5. Accessibility audit: bootstrap if needed -> exploration -> [references/accessibility.md](references/accessibility.md). 6. Stable scripted flow: bootstrap if needed -> exploration -> [references/batch.md](references/batch.md). 
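Golden path 1 above can be read as a short command sequence. What follows is a minimal dry-run sketch, not taken from the patch. It assumes the `devices`, `apps`, `open`, `snapshot`, and `close` commands named in this skill accept the arguments shown, which may not match the real CLI exactly. The `golden_path` helper, the `AGENT_DEVICE` override, the session name `demo`, and the app name `MyApp` are hypothetical and exist only for illustration. By default the sketch prints each command instead of running it.

```shell
# Dry-run sketch of golden path 1: bootstrap if needed -> exploration -> close.
# Hypothetical names: golden_path, AGENT_DEVICE, session "demo", app "MyApp".

# Print each command by default; set AGENT_DEVICE=agent-device to run for real.
AGENT_DEVICE="${AGENT_DEVICE:-echo agent-device}"

golden_path() {
  # $1 = session name, $2 = app name or identifier
  $AGENT_DEVICE devices                                   # bootstrap: list candidate targets
  $AGENT_DEVICE apps                                      # bootstrap: discover the installed app identifier
  $AGENT_DEVICE --session "$1" open "$2" --platform ios   # pin the work to one named session
  $AGENT_DEVICE --session "$1" snapshot                   # exploration: read-only inspection first
  $AGENT_DEVICE --session "$1" close                      # end the loop
}

golden_path demo MyApp
```

Keeping the session name as a parameter mirrors the guidance to pin one run to one session. The `devices` and `apps` steps can be dropped when the target and app identifier are already known.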
diff --git a/skills/agent-device/references/bootstrap-install.md b/skills/agent-device/references/bootstrap-install.md
index f1aa869d..184c5119 100644
--- a/skills/agent-device/references/bootstrap-install.md
+++ b/skills/agent-device/references/bootstrap-install.md
@@ -10,16 +10,30 @@ Use this order when you are not sure about the target or installed app identifie

 1. `devices`
 2. `apps`
-3. `ensure-simulator`
+3. If you chose an iOS simulator target and it may not exist or be booted yet: `ensure-simulator`
 4. `open`
 5. `session list`

 On Android dev builds in particular, `apps` is cheaper than guessing package suffixes and retrying failed `open` calls.
+Do not run `ensure-simulator` for Android, physical-device, TV, or macOS starts.

 ## Most common mistake to avoid

 Do not start acting before you have pinned the correct target and opened an `app` session. In mixed-device environments, always pass `--device`, `--udid`, or `--serial` while choosing the target.

+## Target selection quick guide
+
+| Situation | Default choice |
+| --- | --- |
+| Local iOS QA or feature work | iOS simulator |
+| Physical-device-only behavior | iOS or Android physical device |
+| Android dev build and package uncertainty | `apps`, then `open` by discovered package |
+| TV app flow | `--target tv` with the correct platform |
+| Desktop app flow | `open --platform macos` |
+| Shared host or multiple concurrent runs | open [session-routing.md](session-routing.md) before proceeding |
+
+If the target is still uncertain after `devices` and `apps`, do not continue into exploration yet.
+
 ## Open-first rule

 - If the user asks to test an app and does not provide an install artifact or explicit install instruction, try `open ` first.
@@ -39,6 +53,22 @@ Do not start acting before you have pinned the correct target and opened an `app
 - After install or reinstall, later use `open ` with the exact discovered or known package or bundle identifier, not the artifact path.
 - Do not use `open --relaunch` on Android.

+## Clean relaunch path
+
+Use a clean relaunch when stale runtime state is more likely than a real product failure.
+
+- Prefer `open --relaunch` for normal dev loops.
+- Prefer `reinstall ` only when you need uninstall-plus-install, not as a first reaction to every failure.
+- If the app was already open in the wrong state, close or relaunch it before exploring deeper.
+- After relaunch, confirm the session with `session list` or a fresh `snapshot` before continuing.
+
+Example:
+
+```bash
+agent-device --session auth open MyApp --platform ios --relaunch
+agent-device --session auth snapshot
+```
+
 ## Common starting points

 These are examples, not required exact sequences. Use the smallest setup flow that matches the task.
@@ -68,6 +98,24 @@ agent-device install com.example.app ./build/MyApp.app --platform ios --device "
 - Android binary flow: use `install` or `reinstall` for `.apk` or `.aab`, then open by installed package name.
 - macOS desktop app flow: use `open --platform macos`. Only load [macos-desktop.md](macos-desktop.md) if a desktop surface or macOS-specific behavior matters.

+## Deep links and direct routing
+
+Use deep links when the task is to land directly on a known route instead of navigating there manually.
+
+- On iOS, use `open ` when the app should be chosen by the platform from the deep link alone.
+- On iOS, use `open ` when you need to force one app session and deliver the deep link into it.
+- This reference only guarantees the iOS deep-link forms above. If the task needs platform-specific Android routing behavior beyond normal `open`, do not assume the same syntax without checking product support first.
+- If the app is already open in the correct session, prefer the app-scoped form so the route lands in the intended session.
+- After any deep-link launch, verify the landing screen with `snapshot`, `snapshot -i`, `get`, or `is` before assuming the route succeeded.
+- If the deep link is the task target, do not spend tokens manually navigating to the same screen first.
+
+Example:
+
+```bash
+agent-device open MyApp myapp://settings/privacy --platform ios --relaunch
+agent-device snapshot
+```
+
 ## Session basics

 - Use `--session ` when you need a named session.
diff --git a/skills/agent-device/references/debugging.md b/skills/agent-device/references/debugging.md
index d28879ee..9ed80911 100644
--- a/skills/agent-device/references/debugging.md
+++ b/skills/agent-device/references/debugging.md
@@ -130,6 +130,23 @@ grep -n -E "SIGABRT|SIGSEGV|EXC_|fatal|exception|terminated|killed|jetsam|memory
 - Android: if the app log is not enough, use `adb logcat` for `FATAL EXCEPTION`, `Abort message`, or `signal` lines around process death.
 - If no crash signature appears in app logs, stop collecting broad logs and switch to the platform-native crash source.

+## Crash triage branch guide
+
+Use this when the app terminated, vanished, or became unresponsive and you need the next debugging step quickly.
+
+1. Confirm whether the app actually died, or whether the UI just stopped updating.
+2. Check `logs path` and grep for a crash signature first.
+3. If you see `SIGABRT`, `SIGSEGV`, `EXC_*`, `FATAL EXCEPTION`, `Abort message`, or similar, treat it as a real crash and switch to the platform-native crash source immediately.
+4. If the app is still alive but the UI tree is empty or stale, treat it as a UI-state or AX-sync problem first, not a crash.
+5. If the app disappeared after a permission prompt, route through the alert and permissions path before escalating.
+6. If the process restarted after relaunch, narrow the log window to the latest repro instead of reading broad historical logs.
+
+Minimal branch examples:
+
+- App vanished after tap on iOS: `logs path` -> grep crash terms -> inspect `~/Library/Logs/DiagnosticReports`
+- App vanished after tap on Android: `logs path` -> grep crash terms -> use `adb logcat` around process death
+- App still foregrounded but `snapshot` is empty: return to the AX/tree guidance above instead of calling it a crash
+
 ## When to leave this file

 - Return to [exploration.md](exploration.md) once the app is stable again.
diff --git a/skills/agent-device/references/exploration.md b/skills/agent-device/references/exploration.md
index 19b4f519..858cd813 100644
--- a/skills/agent-device/references/exploration.md
+++ b/skills/agent-device/references/exploration.md
@@ -23,6 +23,9 @@ Open this file when the app session is already running and you need to inspect t
 - Need proof image: `screenshot`
 - Need to dismiss the keyboard: `keyboard dismiss`
 - Need Android keyboard visibility or input-type state: `keyboard status` or `keyboard get`
+- Need slower search-as-you-type input: `type --delay-ms `
+- Need bounded manual scrolling: `scroll --pixels `
+- Need explicit back behavior: `back --in-app` or `back --system`

 ## Read-only first

@@ -54,6 +57,7 @@ Open this file when the app session is already running and you need to inspect t
 - If `scrollintoview @ref` succeeds, prefer the returned `currentRef` for the next action.
 - Visible-first off-screen summaries are intentionally compact. If you need the full off-screen tree instead of a short summary, retry with `snapshot --raw`.
 - Cap long searches with `--max-scrolls ` when the list may be unbounded or the target may not exist.
+- Use `scroll --pixels ` when you need one bounded manual scroll distance instead of search-driven `scrollintoview`.
 - For tap interactions, `press` is canonical and `click` is an equivalent alias.

 ## Text entry rules
@@ -63,10 +67,18 @@ Open this file when the app session is already running and you need to inspect t
 - Use `fill @ref "text"` when you need to target a field directly by ref.
 - Use `press @ref`, then `type "text"` when the field is already focused and you need append semantics.
 - Do not write `type @ref "text"`; `type` only accepts text and will not target that ref for you.
+- If search-as-you-type or debounced inputs drop characters, retry with `type --delay-ms ` after focusing the field.
 - If the keyboard blocks the next control after text entry, prefer `keyboard dismiss` instead of backing out of the screen.
 - On iOS, `keyboard dismiss` depends on the active app session, so do not rely on it after closing or without `open`.
+- On Android, `keyboard dismiss` can fail when the current IME only supports dismissal through back navigation. If that happens, tap a safe empty area instead of using `back` unless navigation is intended.
 - Do not use `fill` or `type` just to make the app reveal information that is not currently visible unless the user asked for that interaction.

+## Back navigation
+
+- Prefer `back --in-app` when you mean the app's own back control or navigation stack.
+- Use `back --system` only when you intentionally want platform back behavior.
+- Do not rely on bare `back` when the distinction matters for the task.
+
 ## Interaction fallbacks

 When `press @ref` fails:
diff --git a/skills/agent-device/references/session-routing.md b/skills/agent-device/references/session-routing.md
index b213b46e..66bbf17d 100644
--- a/skills/agent-device/references/session-routing.md
+++ b/skills/agent-device/references/session-routing.md
@@ -4,6 +4,17 @@

 Open this file when one run must stay pinned to one session or device across many commands, when multiple concurrent runs share a host, or when you need scoped device discovery.

+## Quick decision guide
+
+| Situation | What to do |
+| --- | --- |
+| Single local run on one obvious device | Stay in [bootstrap-install.md](bootstrap-install.md) |
+| Many commands must stay on one chosen session | Use session lock here |
+| Shared host with multiple concurrent runs | Use scoped discovery here |
+| Remote daemon or tenant-scoped host control | Switch to [remote-tenancy.md](remote-tenancy.md) |
+
+If you do not need one of those cases, leave this file and keep the default bootstrap path.
+
 ## Session-bound automation

 Use this when an orchestrator must keep plain CLI calls on one session and device.