fix: improve browser tool instructions and element resolution by jamiepine · Pull Request #351 · spacedriveapp/spacebot

jamiepine · 2026-03-07T06:52:12Z

Summary

Workers keep making the same browser tool mistakes in a loop — forgetting act_kind, using open instead of navigate, failing to find elements. Three fixes:

Rewrote browser section in worker prompt with a parameter table, concrete JSON examples, explicit act_kind emphasis, and a "common mistakes" section
Multi-strategy element selector — native HTML elements (<button>, <a>, <input>) have implicit ARIA roles without [role] attributes in the DOM. Now tries: [role][aria-label] → tag[aria-label] → tag[title] → tag → [role] fallback
Improved all error messages to include valid parameter values and JSON examples so the LLM can self-correct instead of retrying the same broken call

Note

This PR improves browser tool usability by providing clearer documentation, better error guidance, and more robust element selection. Changes span prompt files (worker.md.j2, browser_description.md.j2) and implementation (browser.rs), with updated error messages that include JSON examples and parameter validation guidance for faster LLM self-correction.

_{Written by Tembo for commit 5aab401. This will update automatically on new commits.}

- Rewrite browser section in worker prompt with parameter table, concrete JSON examples, and explicit act_kind emphasis - Add multi-strategy element selector (native tags, aria-label, title attr) to handle implicit ARIA roles on native HTML elements - Improve all error messages to include valid values and examples - Update tool description to front-load act_kind requirement

coderabbitai · 2026-03-07T06:52:25Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3334afe9-b644-4461-8de8-d400c9b86898

📥 Commits

Reviewing files that changed from the base of the PR and between 5aab401 and 4e5d924.

📒 Files selected for processing (1)

src/tools/browser.rs

Walkthrough

This PR clarifies the browser tool workflow (launch → navigate → snapshot → act → close), expands action definitions and act requirements, and refactors element resolution in Rust to a JS-driven, multi-pattern selector strategy with new helper functions and improved error handling.

Changes

Cohort / File(s)	Summary
Browser workflow docs `prompts/en/tools/browser_description.md.j2`, `prompts/en/worker.md.j2`	Rewrote workflow to explicit sequence (launch → navigate → snapshot → act → close); added detailed Actions table, mandatory `act_kind` values and parameters, element ref semantics (refs reset on navigation), examples, and common-mistakes guidance.
Element resolution & actions (Rust) `src/tools/browser.rs`	Replaced direct backend element resolution with JS-driven selector strategy: added `build_js_selector`, `run_js_action`, `build_selectors_for_ref`, `role_to_native_tag`; updated Act handlers (click, type, hover, scroll_into_view, focus, press_key) to execute JS selectors/actions, improved error messages, and changed element-scoped screenshot handling to use JS bounding-box computation. Removed legacy `resolve_element_ref`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat: persistent browser sessions with configurable close policy #309 — Modifies element-resolution and selector logic in src/tools/browser.rs, replacing single-selector behavior with multi-pattern helper strategies similar to this PR's refactoring approach.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix: improve browser tool instructions and element resolution' directly summarizes the main changes: documentation improvements and element resolution enhancements for the browser tool.
Description check	✅ Passed	The description clearly explains the pull request objectives: reducing repeated worker mistakes through better documentation, improved element selection strategy, and enhanced error messages.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/browser-worker-instructions

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (2)

src/tools/browser.rs (2)
1419-1435: Consider extending textbox mapping to also try textarea.

The current mapping "textbox" => Some("input") misses <textarea> elements, which also have the implicit textbox role. This could reduce match rates for multi-line text inputs. Since build_selectors_for_ref iterates through selectors, the existing fallbacks should still work, but you could improve coverage by handling both.
💡 Potential enhancement (optional)

You could return a comma-separated selector or handle this in build_selectors_for_ref by pushing both input and textarea variants when role is textbox:
-        "textbox" => Some("input"),
+        "textbox" => Some("input, textarea"),
Or handle it at the build_selectors_for_ref level for finer control. This is a nice-to-have improvement that can be deferred.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/browser.rs` around lines 1419 - 1435, The role_to_native_tag
function currently maps "textbox" only to "input", missing textarea elements;
update role_to_native_tag to include textarea as well (e.g., return a selector
that matches both input and textarea for the "textbox" case) or alternatively
modify build_selectors_for_ref to push both the input and textarea selectors
when it sees role "textbox" so multi-line textareas are covered by the selector
generation; locate function role_to_native_tag and adjust the "textbox" branch
(or update build_selectors_for_ref's handling of textbox) accordingly.
1442-1448: Consider more robust CSS escaping for element names.

The current escaping at line 1444 only handles single quotes. Element names containing backslashes, newlines, or null characters could break CSS selector parsing or cause unexpected matches. While accessible names from the AX tree are typically sanitized, a more defensive approach would be safer.
💡 Optional: More robust escaping
-    let escaped_name = elem_ref.name.as_ref().map(|n| n.replace('\'', "\\'"));
+    let escaped_name = elem_ref.name.as_ref().map(|n| {
+        n.replace('\\', "\\\\")
+            .replace('\'', "\\'")
+            .replace('\n', "\\n")
+    });
Alternatively, consider using CSS.escape() via the cssparser crate if names become more complex. This is low priority since AX tree names are typically well-formed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/browser.rs` around lines 1442 - 1448, The code in
build_selectors_for_ref only replaces single quotes when building escaped_name;
update it to perform robust CSS attribute-value escaping for elem_ref.name (or
call a library escape function) so backslashes, control characters (newlines,
nulls) and quotes are properly escaped before formatting selectors; implement a
small helper (e.g., escape_css_attr) or use a crate-provided escape (cssparser
or similar) and replace the current escaped_name computation and subsequent uses
in build_selectors_for_ref to use the new safe-escaped value.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/tools/browser.rs`:
- Around line 1419-1435: The role_to_native_tag function currently maps
"textbox" only to "input", missing textarea elements; update role_to_native_tag
to include textarea as well (e.g., return a selector that matches both input and
textarea for the "textbox" case) or alternatively modify build_selectors_for_ref
to push both the input and textarea selectors when it sees role "textbox" so
multi-line textareas are covered by the selector generation; locate function
role_to_native_tag and adjust the "textbox" branch (or update
build_selectors_for_ref's handling of textbox) accordingly.
- Around line 1442-1448: The code in build_selectors_for_ref only replaces
single quotes when building escaped_name; update it to perform robust CSS
attribute-value escaping for elem_ref.name (or call a library escape function)
so backslashes, control characters (newlines, nulls) and quotes are properly
escaped before formatting selectors; implement a small helper (e.g.,
escape_css_attr) or use a crate-provided escape (cssparser or similar) and
replace the current escaped_name computation and subsequent uses in
build_selectors_for_ref to use the new safe-escaped value.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0d7828a5-4ee2-4f5b-b9f6-6395fbf9ea99

📥 Commits

Reviewing files that changed from the base of the PR and between 7e78eee and 5aab401.

📒 Files selected for processing (3)

prompts/en/tools/browser_description.md.j2
prompts/en/worker.md.j2
src/tools/browser.rs

tembo · 2026-03-07T06:56:55Z

prompts/en/worker.md.j2

+| `act` | `act_kind`, `element_ref` | Interact with an element. **`act_kind` is mandatory.** |
+| `screenshot` | `full_page` (optional) | Capture the viewport (or full page). |


Minor nit: the quick-reference table makes element_ref look required for all act kinds, but press_key can omit it. Also screenshot supports element_ref (element screenshot) per the tool args, but the table doesn’t mention it.

Suggested change

| `act` | `act_kind`, `element_ref` | Interact with an element. **`act_kind` is mandatory.** |

| `screenshot` | `full_page` (optional) | Capture the viewport (or full page). |

| `act` | `act_kind`, `element_ref` (except `press_key`) | Interact with an element. **`act_kind` is mandatory.** |

| `screenshot` | `element_ref` (optional), `full_page` (optional) | Capture the viewport, full page, or a specific element. |

tembo · 2026-03-07T06:57:08Z

prompts/en/tools/browser_description.md.j2

@@ -1 +1 @@
-Browser automation tool. Launch a headless Chrome browser, navigate pages, interact with elements, take screenshots, and extract page content. Workflow: launch → navigate → snapshot (get element refs) → act (click/type by ref) → screenshot. Element refs like "e1", "e2" are assigned during snapshot and used in act calls.
+Browser automation tool. Workflow: launch → navigate → snapshot → act → close. The `act` action REQUIRES `act_kind` (click, type, press_key, hover, scroll_into_view, focus) and `element_ref` from the last snapshot. Example: {"action": "act", "act_kind": "click", "element_ref": "e3"}. Always snapshot before acting — refs reset on navigation. Use `navigate` to go to URLs (not `open`, which creates new tabs).


Small consistency thing with the worker prompt: press_key doesn’t require an element_ref, so saying act requires element_ref can push workers into always inventing one.

Suggested change

Browser automation tool. Workflow: launch → navigate → snapshot → act → close. The `act` action REQUIRES `act_kind` (click, type, press_key, hover, scroll_into_view, focus) and `element_ref` from the last snapshot. Example: {"action": "act", "act_kind": "click", "element_ref": "e3"}. Always snapshot before acting — refs reset on navigation. Use `navigate` to go to URLs (not `open`, which creates new tabs).

Browser automation tool. Workflow: launch → navigate → snapshot → act → close. The `act` action REQUIRES `act_kind` (click, type, press_key, hover, scroll_into_view, focus). For most act kinds, pass `element_ref` from the last snapshot; for `press_key`, pass `key` and optionally `element_ref`. Example: {"action": "act", "act_kind": "click", "element_ref": "e3"}. Always snapshot before acting — refs reset on navigation. Use `navigate` to go to URLs (not `open`, which creates new tabs).

tembo · 2026-03-07T06:57:26Z

src/tools/browser.rs

+        for selector in &selectors {
+            if let Ok(element) = page.find_element(selector).await {
+                return Ok(element);
+            }
+        }
+
+        Err(BrowserError::new(format!(
+            "failed to find element for ref '{ref_id}' \
+             (role: {}, name: {:?}) — the page may have changed since \
+             the last snapshot. Run snapshot again to get fresh refs, \
+             then retry with the updated ref. \
+             Selectors tried: {}",
+            elem_ref.role,
+            elem_ref.name,
+            selectors.join(", "),
+        )))


Right now we swallow all find_element errors and always return the same “page may have changed” message. If the selector is invalid (or CDP errors), snapshot+retry won’t help and we’ll just loop.

Could be worth capturing the last error and including it in the final message so the worker can course-correct faster.

Suggested change

for selector in &selectors {

if let Ok(element) = page.find_element(selector).await {

return Ok(element);

}

}

Err(BrowserError::new(format!(

"failed to find element for ref '{ref_id}' \

(role: {}, name: {:?}) — the page may have changed since \

the last snapshot. Run snapshot again to get fresh refs, \

then retry with the updated ref. \

Selectors tried: {}",

elem_ref.role,

elem_ref.name,

selectors.join(", "),

)))

let mut last_error: Option<String> = None;

for selector in &selectors {

match page.find_element(selector).await {

Ok(element) => return Ok(element),

Err(error) => last_error = Some(error.to_string()),

}

}

let last_error = last_error.unwrap_or_else(|| "unknown error".to_string());

Err(BrowserError::new(format!(

"failed to find element for ref '{ref_id}' \

(role: {}, name: {:?}) — the page may have changed since \

the last snapshot. Run snapshot again to get fresh refs, \

then retry with the updated ref. \

Selectors tried: {}. Last error: {last_error}",

elem_ref.role,

elem_ref.name,

selectors.join(", "),

)))

tembo · 2026-03-07T06:57:36Z

src/tools/browser.rs

+fn build_selectors_for_ref(elem_ref: &ElementRef) -> Vec<String> {
+    let mut selectors = Vec::with_capacity(4);
+    let escaped_name = elem_ref.name.as_ref().map(|n| n.replace('\'', "\\'"));
+
+    // Strategy 1: [role='X'][aria-label='Y'] — explicit ARIA attributes
+    if let Some(name) = &escaped_name {
+        selectors.push(format!("[role='{}'][aria-label='{name}']", elem_ref.role));
+    }

-    if let Some(name) = &elem_ref.name {
-        // Escape single quotes in the name for CSS selector safety
-        let escaped = name.replace('\'', "\\'");
-        format!("{role_selector}[aria-label='{escaped}']")
-    } else {
-        role_selector
+    // Strategy 2: native tag + aria-label (e.g., button[aria-label='Submit'])
+    if let Some(tag) = role_to_native_tag(&elem_ref.role) {
+        if let Some(name) = &escaped_name {
+            selectors.push(format!("{tag}[aria-label='{name}']"));
+        }
    }
+
+    // Strategy 3: native tag with text content match via XPath-style selector
+    // using the name as button/link text
+    if let Some(tag) = role_to_native_tag(&elem_ref.role) {


Nit: the “XPath-style selector / text content match” comment doesn’t match what the code actually does (title + tag-only fallbacks). Also with_capacity(4) looks a bit low given the max selectors here is 5.

Suggested change

fn build_selectors_for_ref(elem_ref: &ElementRef) -> Vec<String> {

let mut selectors = Vec::with_capacity(4);

let escaped_name = elem_ref.name.as_ref().map(|n| n.replace('\'', "\\'"));

// Strategy 1: [role='X'][aria-label='Y'] — explicit ARIA attributes

if let Some(name) = &escaped_name {

selectors.push(format!("[role='{}'][aria-label='{name}']", elem_ref.role));

}

if let Some(name) = &elem_ref.name {

// Escape single quotes in the name for CSS selector safety

let escaped = name.replace('\'', "\\'");

format!("{role_selector}[aria-label='{escaped}']")

} else {

role_selector

// Strategy 2: native tag + aria-label (e.g., button[aria-label='Submit'])

if let Some(tag) = role_to_native_tag(&elem_ref.role) {

if let Some(name) = &escaped_name {

selectors.push(format!("{tag}[aria-label='{name}']"));

}

}

// Strategy 3: native tag with text content match via XPath-style selector

// using the name as button/link text

if let Some(tag) = role_to_native_tag(&elem_ref.role) {

fn build_selectors_for_ref(elem_ref: &ElementRef) -> Vec<String> {

let mut selectors = Vec::with_capacity(5);

let escaped_name = elem_ref.name.as_ref().map(|n| n.replace('\'', "\\'"));

// Strategy 1: [role='X'][aria-label='Y'] — explicit ARIA attributes

if let Some(name) = &escaped_name {

selectors.push(format!("[role='{}'][aria-label='{name}']", elem_ref.role));

}

// Strategy 2: native tag + aria-label (e.g., button[aria-label='Submit'])

if let Some(tag) = role_to_native_tag(&elem_ref.role) {

if let Some(name) = &escaped_name {

selectors.push(format!("{tag}[aria-label='{name}']"));

}

}

// Strategy 3: native tag + title fallback, then tag-only fallback

if let Some(tag) = role_to_native_tag(&elem_ref.role) {

Element clicks/type/hover/focus now use Runtime.evaluate with DOM methods instead of CDP node IDs that go stale between snapshot and interaction. - Rewrite handle_act to use JS injection for all ActKind variants - Add build_js_selector: tries CSS selectors then text-content fallback - Add run_js_action: executes JS via page.evaluate, parses JSON result - Update handle_screenshot to use JS + clip viewport instead of Element - Remove old resolve_element_ref (replaced by JS injection) - Merge role_to_native_tag entries from both branches

coderabbitai bot reviewed Mar 7, 2026

View reviewed changes

tembo bot reviewed Mar 7, 2026

View reviewed changes

jamiepine closed this Mar 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: improve browser tool instructions and element resolution#351

fix: improve browser tool instructions and element resolution#351
jamiepine wants to merge 2 commits intomainfrom
fix/browser-worker-instructions

jamiepine commented Mar 7, 2026 •

edited by tembo bot

Loading

Uh oh!

coderabbitai bot commented Mar 7, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

tembo bot Mar 7, 2026

Uh oh!

tembo bot Mar 7, 2026

Uh oh!

tembo bot Mar 7, 2026

Uh oh!

tembo bot Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		\| `act` \| `act_kind`, `element_ref` \| Interact with an element. `act_kind` is mandatory. \|
		\| `screenshot` \| `full_page` (optional) \| Capture the viewport (or full page). \|

		@@ -1 +1 @@
		Browser automation tool. Launch a headless Chrome browser, navigate pages, interact with elements, take screenshots, and extract page content. Workflow: launch → navigate → snapshot (get element refs) → act (click/type by ref) → screenshot. Element refs like "e1", "e2" are assigned during snapshot and used in act calls. No newline at end of file
		Browser automation tool. Workflow: launch → navigate → snapshot → act → close. The `act` action REQUIRES `act_kind` (click, type, press_key, hover, scroll_into_view, focus) and `element_ref` from the last snapshot. Example: {"action": "act", "act_kind": "click", "element_ref": "e3"}. Always snapshot before acting — refs reset on navigation. Use `navigate` to go to URLs (not `open`, which creates new tabs). No newline at end of file

Conversation

jamiepine commented Mar 7, 2026 • edited by tembo bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

coderabbitai bot commented Mar 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

tembo bot Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

tembo bot Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

tembo bot Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

tembo bot Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jamiepine commented Mar 7, 2026 •

edited by tembo bot

Loading

coderabbitai bot commented Mar 7, 2026 •

edited

Loading