fix: browser context management — dedup snapshots, consecutive loop guard, overflow resilience #359

Merged
jamiepine merged 4 commits into main from fix/browser-context-management
Mar 8, 2026

jamiepine (Member) commented Mar 8, 2026

Summary

  • Loop guard now tracks consecutive identical calls, not lifetime totals. browser_snapshot (always empty args) was getting permanently blocked after 7 total uses across a session. Now the counter resets when a different tool call runs in between. Observation tools (browser_snapshot, browser_tab_list) get the poll multiplier on thresholds and are excluded from ping-pong detection.

  • Dedup stale tool results before every LLM call. All but the most recent browser_snapshot and browser_tab_list results are replaced with a one-liner ([browser_snapshot output superseded...]). The tool call/result structure stays intact but the multi-KB ARIA tree content is gone. Full transcript preserved in worker run records.

  • Pre-prompt compaction and smaller segments. Context usage checked before every LLM call (not just at segment boundaries). TURNS_PER_SEGMENT reduced from 25 → 15 so compaction checks run more frequently. Overflow retry reduced from 3 → 2 since dedup + pre-prompt compaction should prevent hitting it.

  • Fixed get_element_center error message to tell the model to run browser_snapshot instead of the useless "try scrolling or taking a screenshot."
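
The consecutive-versus-lifetime distinction in the first bullet can be sketched roughly as follows. This is an illustration only, not the code in src/hooks/loop_guard.rs: `ConsecutiveGuard`, `Verdict`, the string call key, and the threshold and multiplier values are all hypothetical; only the two observation tool names come from the description above.

```rust
/// Hypothetical verdict type; the real guard may also carry warning text.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Allow,
    Block,
}

/// Observation tools named in the PR description.
const OBSERVATION_TOOLS: &[&str] = &["browser_snapshot", "browser_tab_list"];

/// Assumed shape: counts only back-to-back identical (tool, args) calls.
pub struct ConsecutiveGuard {
    block_threshold: u32,
    poll_multiplier: u32,
    last_call: Option<String>,
    consecutive: u32,
}

impl ConsecutiveGuard {
    pub fn new(block_threshold: u32, poll_multiplier: u32) -> Self {
        Self { block_threshold, poll_multiplier, last_call: None, consecutive: 0 }
    }

    pub fn check(&mut self, tool: &str, args: &str) -> Verdict {
        // A plain string key stands in for the real call hash.
        let key = format!("{tool}:{args}");
        if self.last_call.as_deref() == Some(key.as_str()) {
            self.consecutive += 1;
        } else {
            // Any different call resets the streak.
            self.last_call = Some(key);
            self.consecutive = 1;
        }
        // Observation tools get a relaxed limit via the poll multiplier.
        let limit = if OBSERVATION_TOOLS.iter().any(|t| *t == tool) {
            self.block_threshold * self.poll_multiplier
        } else {
            self.block_threshold
        };
        if self.consecutive >= limit { Verdict::Block } else { Verdict::Allow }
    }
}
```

With this shape, a snapshot, click, snapshot, click sequence never accumulates a streak, while the same call repeated back to back still trips the limit.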

Test plan

All 460 lib tests pass including 3 new loop guard tests (non_consecutive_identical_calls_allowed, consecutive_identical_calls_still_blocked, observation_tool_consecutive_gets_relaxed_threshold).

Note

Technical changes: Three files modified (253 additions, 7 deletions). Core improvements: new dedup_tool_results() function strips stale browser snapshots from history before LLM calls; loop guard refactored with last_call_hash tracking to reset consecutive counters when non-identical tools run; observation tools categorized separately to get relaxed thresholds and bypass ping-pong detection. Context management now happens pre-prompt (not just post-max-turns), reducing overflow scenarios. Fully tested with new loop guard unit tests validating the consecutive-vs-interleaved behavior and observation tool multiplier.

Written by Tembo for commit 18744a5. This will update automatically on new commits.

…uard, overflow resilience

Loop guard was counting total lifetime calls per (tool, args) hash, so
browser_snapshot (empty args) got permanently blocked after 7 uses even
when other tools ran between each call. Now tracks consecutive identical
calls — the counter resets whenever a different tool runs. Observation
tools (browser_snapshot, browser_tab_list) also get the poll multiplier
and are excluded from ping-pong detection since snapshot→click is normal
browser workflow.

Worker context overflow was killing the worker after 3 retries of
force-compact. Now: dedup stale tool results before every LLM call
(replaces all but the most recent browser_snapshot/browser_tab_list
result with a one-liner), pre-prompt compaction check every segment
boundary, and reduced segment size (25→15 turns) for more frequent
checks. Overflow retry kept at 2 as a safety net since dedup+compaction
should prevent hitting it.

Also fixed the get_element_center error message to tell the model to
run browser_snapshot instead of suggesting screenshots/scrolling.
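
The dedup step described in this commit message could look something like the sketch below. The `Message` and `Item` types are simplified stand-ins (the real history uses the agent framework's message types), the placeholder wording is invented for illustration, and the function body is an assumption about the approach rather than the merged implementation.

```rust
/// Simplified, hypothetical history types for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Text(String),
    ToolResult { tool: String, content: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub items: Vec<Item>,
}

/// Tools whose older results are superseded by the latest one.
const DEDUP_TOOL_RESULTS: &[&str] = &["browser_snapshot", "browser_tab_list"];

/// Replaces all but the most recent result of each listed tool with a
/// one-line placeholder. Returns how many results were rewritten.
pub fn dedup_tool_results(history: &mut [Message]) -> usize {
    let mut replaced = 0;
    for name in DEDUP_TOOL_RESULTS {
        // Track (message_index, item_index) so several results inside one
        // message are handled individually.
        let mut positions: Vec<(usize, usize)> = Vec::new();
        for (mi, msg) in history.iter().enumerate() {
            for (ii, item) in msg.items.iter().enumerate() {
                if let Item::ToolResult { tool, .. } = item {
                    if tool.as_str() == *name {
                        positions.push((mi, ii));
                    }
                }
            }
        }
        positions.pop(); // keep the most recent result untouched
        for (mi, ii) in positions {
            if let Item::ToolResult { content, .. } = &mut history[mi].items[ii] {
                // Placeholder wording is illustrative only.
                let placeholder = format!("[{name} output superseded by a later call]");
                if *content != placeholder {
                    *content = placeholder;
                    replaced += 1;
                }
            }
        }
    }
    replaced
}
```

The tool call/result structure stays in place; only the bulky payload of older results changes, and running the pass again is a no-op.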

coderabbitai bot (Contributor) commented Mar 8, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Introduces in-run tool-result deduplication and transient retry/backoff logic in the agent worker, tightens segment and overflow thresholds, expands retriable LLM error detection, and extends LoopGuard to track per-(tool,args) counts and observation-tool behavior; also tweaks a browser error message.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Agent worker: deduplication & retries (src/agent/worker.rs) | Adds DEDUP_TOOL_RESULTS and dedup_tool_results(history) to remove older ToolResult entries for specific tools; integrates deduplication at multiple LLM call sites and overflow paths; introduces transient retry with exponential backoff (MAX_TRANSIENT_RETRIES, base delay), reduces TURNS_PER_SEGMENT (25→15) and MAX_OVERFLOW_RETRIES (3→2); minor other control-flow adjustments. |
| LoopGuard: polling/ping-pong logic (src/hooks/loop_guard.rs) | Adds OBSERVATION_TOOLS, replaces recent per-call counters with call_counts: HashMap and adds last_call_hash: Option<String> to LoopGuard; introduces involves_observation_tool() helper, treats observation tools as poll calls (affecting thresholds), updates reset behavior, and adds tests for consecutive/non-consecutive and observation-tool scenarios. |
| LLM routing: retriable errors (src/llm/routing.rs) | Expands is_retriable_error to include HTTP 500 and additional server-related phrases for retriable detection. |
| Browser tool: error message (src/tools/browser.rs) | Modifies box-model failure error text to mention the element may have been removed from the DOM and to recommend running browser_snapshot for current interactable elements/indices. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the three main changes: browser context deduplication, consecutive loop guard tracking, and overflow resilience improvements. |
| Description check | ✅ Passed | The description provides detailed context for all major changes including loop guard improvements, deduplication strategy, context management, and error message fixes, all directly related to the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (1)
src/agent/worker.rs (1)

894-900: Consider centralizing the observation tool list.

DEDUP_TOOL_RESULTS contains the same tools as OBSERVATION_TOOLS in src/hooks/loop_guard.rs. While they serve different purposes (deduplication vs. loop guard thresholds), having two independently maintained lists creates a synchronization risk if new browser tools are added.

Consider extracting a shared constant or at least adding a comment cross-referencing the other location to help future maintainers keep them in sync.
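
One way to read this suggestion is a single shared list that both call sites consult. The sketch below is hypothetical throughout: the module name, `OBSERVATION_TOOL_NAMES`, and the two consumer functions are invented to show the shape, and do not reflect how worker.rs or loop_guard.rs are actually organized.

```rust
/// Hypothetical shared module holding the single source of truth.
pub mod observation_tools {
    pub const OBSERVATION_TOOL_NAMES: &[&str] = &["browser_snapshot", "browser_tab_list"];

    pub fn is_observation_tool(name: &str) -> bool {
        OBSERVATION_TOOL_NAMES.iter().any(|t| *t == name)
    }
}

/// Stand-in for the worker-side decision: should older results be deduped?
pub fn should_dedup(tool: &str) -> bool {
    observation_tools::is_observation_tool(tool)
}

/// Stand-in for the loop-guard decision: which block threshold applies?
pub fn block_threshold(tool: &str, base: u32, poll_multiplier: u32) -> u32 {
    if observation_tools::is_observation_tool(tool) {
        base * poll_multiplier
    } else {
        base
    }
}
```

Adding a new observation tool then means touching one constant, and both behaviors pick it up.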

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/worker.rs` around lines 894 - 900, DEDUP_TOOL_RESULTS duplicates
the same tool names as OBSERVATION_TOOLS, risking drift; move the shared list
into a single public constant (e.g., OBSERVATION_TOOL_NAMES) in a common module
(or a crate-level constants module) and replace both DEDUP_TOOL_RESULTS and
OBSERVATION_TOOLS to reference that new constant, or if you prefer a minimal
change add a clear cross-reference comment above DEDUP_TOOL_RESULTS pointing to
OBSERVATION_TOOLS; update uses in worker.rs (DEDUP_TOOL_RESULTS) and
loop_guard.rs (OBSERVATION_TOOLS) to import the shared constant so future tool
additions only need one update.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b7ab3229-5cd7-4dee-a7de-fae142bfa65b

📥 Commits

Reviewing files that changed from the base of the PR and between e81b99b and 18744a5.

📒 Files selected for processing (3)
  • src/agent/worker.rs
  • src/hooks/loop_guard.rs
  • src/tools/browser.rs

…mption, retain pingpong warnings

- Fix dedup doc comment: transcript is not lossless since dedup mutates
  history in-place before persist_transcript runs.
- Track (message_index, item_index) in dedup so multiple ToolResult
  entries in the same User message are handled correctly.
- Use XOR in involves_observation_tool so two observation tools
  alternating (snapshot↔tab_list) is still caught as ping-pong.
- Retain pingpong_ warning buckets when resetting consecutive counts
  so max_warnings_per_call escalation still works for alternating
  patterns.
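
The XOR point in the third bullet can be illustrated with a small sketch. The signatures here are assumptions (the real `involves_observation_tool` may take call hashes or other arguments), and `ping_pong_checked` is a hypothetical helper added only to show the effect.

```rust
const OBSERVATION_TOOLS: &[&str] = &["browser_snapshot", "browser_tab_list"];

fn is_observation(tool: &str) -> bool {
    OBSERVATION_TOOLS.iter().any(|t| *t == tool)
}

/// True when exactly one side of an alternating pair is an observation tool.
/// With OR instead of XOR, snapshot<->tab_list would also return true and
/// escape ping-pong detection.
pub fn involves_observation_tool(a: &str, b: &str) -> bool {
    is_observation(a) ^ is_observation(b)
}

/// Hypothetical helper: is this alternating pair still subject to the
/// ping-pong check?
pub fn ping_pong_checked(a: &str, b: &str) -> bool {
    a != b && !involves_observation_tool(a, b)
}
```

Under these assumptions, snapshot↔click is exempt as normal browser workflow, while snapshot↔tab_list and click↔type are both still checked.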

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/agent/worker.rs (1)

894-900: Avoid duplicating the browser observation-tool list.

DEDUP_TOOL_RESULTS now has to stay in lockstep with src/hooks/loop_guard.rs::OBSERVATION_TOOLS. If one list changes without the other, a tool can be deduped without getting relaxed loop thresholds, or vice versa. Please move this classification into one shared constant/module and consume it from both places.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/worker.rs` around lines 894 - 900, DEDUP_TOOL_RESULTS and
OBSERVATION_TOOLS must be unified into a single shared constant so they stay in
sync; create a new shared module (e.g. a constants or observation_tools module)
that exports one constant (name it something descriptive like
OBSERVATION_TOOL_NAMES or OBSERVATION_TOOLS_SHARED with the same type &[&str])
and replace the local DEDUP_TOOL_RESULTS in src/agent/worker.rs and the
OBSERVATION_TOOLS usage in src/hooks/loop_guard.rs to import and use that shared
constant; ensure the constant's type and contents match existing expectations
and update any references to DEDUP_TOOL_RESULTS or OBSERVATION_TOOLS to the new
symbol to avoid duplication or divergence.
src/hooks/loop_guard.rs (1)

722-746: Add one regression test for browser_snapshot ↔ browser_tab_list.

The new tests cover observation↔action and repeated observation calls, but the XOR branch that keeps observation↔observation alternation detectable is still unpinned. A small snapshot → tab_list → snapshot → tab_list ... case would lock in the exact edge case this helper now depends on.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hooks/loop_guard.rs` around lines 722 - 746, Add a regression test named
something like observation_tool_snapshot_tablist_alternation that uses
LoopGuard::new(worker_config()) and repeatedly calls LoopGuard::check
alternating between "browser_snapshot" and "browser_tab_list" (e.g., snapshot,
tab_list, snapshot, ...), asserting that the alternating sequence is treated as
observation↔observation instead of observation↔action: allow all calls up to the
relaxed warn threshold and then assert a warning (LoopGuardVerdict::Block
containing "Warning") at the expected threshold; reference LoopGuard::check and
LoopGuardVerdict in the test to pin the XOR alternation behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cd87e125-2231-46bb-a2a8-18ec84249409

📥 Commits

Reviewing files that changed from the base of the PR and between 18744a5 and c9dc6be.

📒 Files selected for processing (2)
  • src/agent/worker.rs
  • src/hooks/loop_guard.rs

Comment on lines 18 to +25
  /// How many turns per segment before we check context and potentially compact.
- const TURNS_PER_SEGMENT: usize = 25;
+ ///
+ /// Kept relatively low so compaction checks run frequently. Fast models can
+ /// burn through many tool-call turns quickly, and each turn may add large
+ /// tool results (browser snapshots, shell output). Checking every 15 turns
+ /// instead of 25 reduces the chance of blowing past the context window
+ /// within a single segment.
+ const TURNS_PER_SEGMENT: usize = 15;

⚠️ Potential issue | 🟠 Major

Decouple segment cadence from the worker turn cap.

This constant still feeds default_max_turns() on Line 290, so dropping it to 15 does more than increase compaction frequency: it also reduces each worker prompt to a 15-turn Rig budget and makes MaxTurnsError much more likely. If the intent is “compact every 15 turns,” that needs a separate checkpoint from the worker’s actual max_turns. As per coding guidelines, "Set explicit max_turns on all Rig agents: max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for channels".
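
The decoupling being asked for might be pictured like this. Both constant names are hypothetical, the value 50 is taken from the quoted guideline, and the function only models where checkpoints would fall; it says nothing about how the worker loop is actually structured.

```rust
/// Hypothetical: the worker's overall turn budget, per the quoted guideline.
const WORKER_MAX_TURNS: usize = 50;
/// Hypothetical: how often context usage is checked, independent of the budget.
const COMPACTION_CHECK_TURNS: usize = 15;

/// Returns the turn numbers at which a compaction check would run for a
/// worker that takes `turns_taken` turns (capped at the turn budget).
pub fn compaction_checkpoints(turns_taken: usize) -> Vec<usize> {
    (1..=turns_taken.min(WORKER_MAX_TURNS))
        .filter(|t| t % COMPACTION_CHECK_TURNS == 0)
        .collect()
}
```

In this model, changing the check cadence moves the checkpoints but leaves the 50-turn budget alone.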

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/worker.rs` around lines 18 - 25, TURNS_PER_SEGMENT is currently
reused by default_max_turns(), unintentionally shrinking worker Rig budgets and
causing MaxTurnsError; separate the compaction cadence from worker turn caps by
creating a new constant (e.g., COMPACTION_TURNS or SEGMENT_CHECK_TURNS) and use
that where compaction checks occur, then restore default_max_turns() to return
the intended worker cap (or stop using TURNS_PER_SEGMENT inside
default_max_turns()); also ensure all Rig agents explicitly set max_turns (call
max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for
channels) so the worker prompt budgets are fixed regardless of compaction
cadence.

coderabbitai bot (Contributor) left a comment

♻️ Duplicate comments (2)
src/agent/worker.rs (2)

18-25: ⚠️ Potential issue | 🟠 Major

Don’t couple compaction cadence to the worker turn budget.

This change still drops the worker’s Rig budget to 15 turns because default_max_turns(TURNS_PER_SEGMENT) now uses the new segment constant directly. That makes MaxTurnsError much more likely instead of only increasing maintenance frequency. If you want 15-turn compaction checkpoints, they need to be separate from the worker’s explicit max-turn budget.

As per coding guidelines, "Set explicit max_turns on all Rig agents: max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for channels".

Also applies to: 288-290

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/worker.rs` around lines 18 - 25, The compaction cadence constant
TURNS_PER_SEGMENT was incorrectly reused to set the worker Rig's turn budget via
default_max_turns(TURNS_PER_SEGMENT), causing unintended MaxTurnsError; revert
the worker Rig's explicit max_turns to the required budget (use max_turns(50)
when constructing the worker Rig or call default_max_turns(50)) and keep
TURNS_PER_SEGMENT solely for compaction/checkpoint logic, ensuring compaction
checks still use TURNS_PER_SEGMENT while the Rig creation uses max_turns(50)
(also update the similar occurrence around the code referenced at the other
occurrence near lines 288-290).

325-334: ⚠️ Potential issue | 🟠 Major

This makes persisted worker transcripts lossy.

dedup_tool_results() rewrites the canonical history, and persist_transcript() later serializes that mutated history. After the first maintenance pass, older browser_snapshot / browser_tab_list payloads in worker_runs.transcript are placeholders, not the original results. That still conflicts with the PR goal of keeping full transcripts in worker run records.
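
A non-mutating variant along the lines suggested here could build a deduplicated copy for the LLM call and leave the canonical history alone. The flat `ToolResult` type, the `deduped_view` name, and the placeholder text are all hypothetical simplifications for this sketch.

```rust
/// Simplified, hypothetical record of one tool result.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolResult {
    pub tool: String,
    pub content: String,
}

/// Builds a copy of `history` in which every result of `tool` except the
/// most recent is replaced by a placeholder. The input is not modified, so
/// it can still be persisted in full.
pub fn deduped_view(history: &[ToolResult], tool: &str) -> Vec<ToolResult> {
    // Index of the latest result for this tool, if any.
    let last = history.iter().rposition(|r| r.tool == tool);
    history
        .iter()
        .enumerate()
        .map(|(i, r)| {
            if r.tool == tool && Some(i) != last {
                ToolResult {
                    tool: r.tool.clone(),
                    // Placeholder wording is illustrative only.
                    content: format!("[{tool} output superseded]"),
                }
            } else {
                r.clone()
            }
        })
        .collect()
}
```

The trade-off is an extra clone of the history per call, in exchange for a transcript that still holds the original payloads.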

Also applies to: 467-469, 502-503, 667-675, 907-984

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/worker.rs` around lines 325 - 334, The maintenance step currently
mutates the canonical history via dedup_tool_results() and
maybe_compact_history(), causing persisted worker_runs.transcript entries (e.g.,
browser_snapshot and browser_tab_list payloads) to become placeholders instead
of original results; instead, perform dedup/compaction on a copy used only for
LLM context assembly and leave the canonical history intact for
persist_transcript(). Concretely, change calls to dedup_tool_results(&mut
history) and maybe_compact_history(&mut compacted_history, &mut history) so they
operate on a cloned history (or return a compacted copy) used for model calls,
or adjust dedup_tool_results to return a non-mutating compacted vector, and
ensure persist_transcript() serializes the original, unmodified history. Also
apply the same non-mutating approach at the other occurrences that call
dedup_tool_results/maybe_compact_history so worker_runs.transcript keeps full
original payloads.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b57f1ca-b5db-4bcc-b6c0-824308e84996

📥 Commits

Reviewing files that changed from the base of the PR and between c9dc6be and 6966cea.

📒 Files selected for processing (1)
  • src/agent/worker.rs

Workers now catch retriable errors (upstream 500s, timeouts, rate limits
that survived model-level retries) and back off with exponential delay
before retrying, up to 5 attempts. Previously any error that wasn't a
context overflow or cancellation killed the worker immediately.

Also added missing patterns to is_retriable_error: generic server errors
like 'The server had an error' and '500' status codes that OpenRouter
wraps in various phrasings.
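
The retry behavior described in this commit message can be sketched as below. This is a synchronous simplification under stated assumptions: the error is a plain `String`, the pattern list beyond '500' and 'The server had an error' is guessed, the sleep function is injected so the example does not block, and "up to 5 attempts" is interpreted as up to 5 retries after the first call. The real worker is presumably async and uses its own error types.

```rust
use std::time::Duration;

/// Name taken from the review summary; the value follows the commit message.
const MAX_TRANSIENT_RETRIES: u32 = 5;

/// Illustrative pattern check; only the first two phrases come from the PR.
pub fn is_retriable_error(message: &str) -> bool {
    let lower = message.to_lowercase();
    ["500", "the server had an error", "timeout", "rate limit"]
        .iter()
        .any(|pattern| lower.contains(pattern))
}

/// Delay doubles with each retry: base, 2*base, 4*base, ...
pub fn backoff_delay(base: Duration, attempt: u32) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt))
}

/// Runs `op`, retrying retriable failures with exponential backoff.
/// `sleep` is injected so callers (and tests) control how waiting happens.
pub fn retry_transient<T>(
    base: Duration,
    mut op: impl FnMut() -> Result<T, String>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, String> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempt < MAX_TRANSIENT_RETRIES && is_retriable_error(&e) => {
                sleep(backoff_delay(base, attempt));
                attempt += 1;
            }
            // Non-retriable errors and exhausted retries surface to the caller.
            Err(e) => return Err(e),
        }
    }
}
```

In production code `sleep` would be something like `std::thread::sleep` or an async timer; passing it in keeps the backoff schedule easy to inspect.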

jamiepine merged commit cb101c5 into main on Mar 8, 2026
3 of 4 checks passed