fix: browser context management — dedup snapshots, consecutive loop guard, overflow resilience by jamiepine · Pull Request #359 · spacedriveapp/spacebot

jamiepine · 2026-03-08T02:26:57Z

Summary

Loop guard now tracks consecutive identical calls, not lifetime totals. browser_snapshot (always empty args) was getting permanently blocked after 7 total uses across a session. Now the counter resets when a different tool call runs in between. Observation tools (browser_snapshot, browser_tab_list) get the poll multiplier on thresholds and are excluded from ping-pong detection.
Dedup stale tool results before every LLM call. All but the most recent browser_snapshot and browser_tab_list results are replaced with a one-liner ([browser_snapshot output superseded...]). The tool call/result structure stays intact but the multi-KB ARIA tree content is gone. Full transcript preserved in worker run records.
Pre-prompt compaction and smaller segments. Context usage checked before every LLM call (not just at segment boundaries). TURNS_PER_SEGMENT reduced from 25 → 15 so compaction checks run more frequently. Overflow retry reduced from 3 → 2 since dedup + pre-prompt compaction should prevent hitting it.
Fixed get_element_center error message to tell the model to run browser_snapshot instead of the useless "try scrolling or taking a screenshot."
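The consecutive-call behaviour described in the first item can be sketched roughly as follows. This is a simplified illustration, not the actual `LoopGuard`: the `ConsecutiveGuard` type, the `Verdict` enum, the string-based hash, and the threshold and multiplier values passed in are all assumptions made for the example. Only the tool names and the `last_call_hash` idea come from the PR description.

```rust
/// Observation tools named in the PR description.
const OBSERVATION_TOOLS: &[&str] = &["browser_snapshot", "browser_tab_list"];

#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Block,
}

/// Hypothetical, simplified guard that counts only *consecutive* identical calls.
struct ConsecutiveGuard {
    last_call_hash: Option<String>,
    consecutive: u32,
    block_threshold: u32,
    poll_multiplier: u32,
}

impl ConsecutiveGuard {
    fn new(block_threshold: u32, poll_multiplier: u32) -> Self {
        Self {
            last_call_hash: None,
            consecutive: 0,
            block_threshold,
            poll_multiplier,
        }
    }

    fn check(&mut self, tool: &str, args: &str) -> Verdict {
        // Stand-in for the real (tool, args) hash.
        let hash = format!("{}:{}", tool, args);
        if self.last_call_hash.as_deref() == Some(hash.as_str()) {
            self.consecutive += 1;
        } else {
            // A different call ran in between, so the streak starts over.
            self.consecutive = 1;
            self.last_call_hash = Some(hash);
        }
        // Observation tools get a relaxed threshold via the poll multiplier.
        let limit = if OBSERVATION_TOOLS.contains(&tool) {
            self.block_threshold * self.poll_multiplier
        } else {
            self.block_threshold
        };
        if self.consecutive >= limit {
            Verdict::Block
        } else {
            Verdict::Allow
        }
    }
}
```

With a guard built as `ConsecutiveGuard::new(3, 2)`, alternating `browser_snapshot` with any other call never trips the limit, while three identical non-observation calls in a row do.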

Test plan

All 460 lib tests pass including 3 new loop guard tests (non_consecutive_identical_calls_allowed, consecutive_identical_calls_still_blocked, observation_tool_consecutive_gets_relaxed_threshold).

Note

Technical changes: Three files modified (253 additions, 7 deletions). Core improvements: new dedup_tool_results() function strips stale browser snapshots from history before LLM calls; loop guard refactored with last_call_hash tracking to reset consecutive counters when non-identical tools run; observation tools categorized separately to get relaxed thresholds and bypass ping-pong detection. Context management now happens pre-prompt (not just post-max-turns), reducing overflow scenarios. Fully tested with new loop guard unit tests validating the consecutive-vs-interleaved behavior and observation tool multiplier.

Written by Tembo for commit 18744a5. This will update automatically on new commits.

…uard, overflow resilience

Loop guard was counting total lifetime calls per (tool, args) hash, so browser_snapshot (empty args) got permanently blocked after 7 uses even when other tools ran between each call. Now tracks consecutive identical calls: the counter resets whenever a different tool runs. Observation tools (browser_snapshot, browser_tab_list) also get the poll multiplier and are excluded from ping-pong detection since snapshot→click is normal browser workflow.

Worker context overflow was killing the worker after 3 retries of force-compact. Now: dedup stale tool results before every LLM call (replaces all but the most recent browser_snapshot/browser_tab_list result with a one-liner), pre-prompt compaction check every segment boundary, and reduced segment size (25→15 turns) for more frequent checks. Overflow retry kept at 2 as a safety net since dedup+compaction should prevent hitting it.

Also fixed the get_element_center error message to tell the model to run browser_snapshot instead of suggesting screenshots/scrolling.
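As a rough illustration of the dedup step, the sketch below walks a history from newest to oldest and replaces every result except the most recent one per tool with a short placeholder, leaving the message structure intact. The `Item` enum, the nested `Vec` history shape, and the placeholder wording are assumptions for this example; the real `dedup_tool_results()` operates on the worker's own history types.

```rust
/// Tools whose older results are replaced (names from the PR description).
const DEDUP_TOOL_RESULTS: &[&str] = &["browser_snapshot", "browser_tab_list"];

/// Hypothetical, simplified history item; the real history type differs.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Text(String),
    ToolResult { tool: String, content: String },
}

/// Illustrative placeholder wording, not the exact string used by the PR.
fn placeholder(tool: &str) -> String {
    format!("[{} output superseded]", tool)
}

/// Replaces all but the newest result of each listed tool.
/// Returns how many results were replaced on this pass.
fn dedup_tool_results(history: &mut [Vec<Item>]) -> usize {
    let mut seen: Vec<&'static str> = Vec::new();
    let mut replaced = 0;
    // Walk newest to oldest so the first hit per tool is the one to keep.
    for message in history.iter_mut().rev() {
        for item in message.iter_mut().rev() {
            if let Item::ToolResult { tool, content } = item {
                let known = DEDUP_TOOL_RESULTS
                    .iter()
                    .copied()
                    .find(|name| *name == tool.as_str());
                if let Some(name) = known {
                    if !seen.contains(&name) {
                        seen.push(name);
                    } else if *content != placeholder(name) {
                        *content = placeholder(name);
                        replaced += 1;
                    }
                }
            }
        }
    }
    replaced
}
```

Iterating items within a message in reverse as well means several results inside the same message are handled individually, and a second pass over already-deduped history replaces nothing.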

coderabbitai · 2026-03-08T02:27:13Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Introduces in-run tool-result deduplication and transient retry/backoff logic in the agent worker, tightens segment and overflow thresholds, expands retriable LLM error detection, and extends LoopGuard to track per-(tool,args) counts and observation-tool behavior; also tweaks a browser error message.

Changes

Cohort / File(s)	Summary
Agent worker: deduplication & retries `src/agent/worker.rs`	Adds `DEDUP_TOOL_RESULTS` and `dedup_tool_results(history)` to remove older ToolResult entries for specific tools; integrates deduplication at multiple LLM call sites and overflow paths; introduces transient retry with exponential backoff (`MAX_TRANSIENT_RETRIES`, base delay), reduces `TURNS_PER_SEGMENT` (25→15) and `MAX_OVERFLOW_RETRIES` (3→2); minor other control-flow adjustments.
LoopGuard: polling/ping-pong logic `src/hooks/loop_guard.rs`	Adds `OBSERVATION_TOOLS`, replaces recent per-call counters with `call_counts: HashMap` and adds `last_call_hash: Option<String>` to `LoopGuard`; introduces `involves_observation_tool()` helper, treats observation tools as poll calls (affecting thresholds), updates reset behavior, and adds tests for consecutive/non-consecutive and observation-tool scenarios.
LLM routing: retriable errors `src/llm/routing.rs`	Expands `is_retriable_error` to include HTTP 500 and additional server-related phrases for retriable detection.
Browser tool: error message `src/tools/browser.rs`	Modifies box-model failure error text to mention the element may have been removed from the DOM and to recommend running `browser_snapshot` for current interactable elements/indices.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

feat: rig 0.31 upgrade, tool nudging, and invariant harness #292 — Also modifies src/agent/worker.rs and alters worker prompt/retry control flow; likely overlaps with transient retry and deduplication approach.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the three main changes: browser context deduplication, consecutive loop guard tracking, and overflow resilience improvements.
Description check	✅ Passed	The description provides detailed context for all major changes including loop guard improvements, deduplication strategy, context management, and error message fixes, all directly related to the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

🧹 Nitpick comments (1)

src/agent/worker.rs (1)
894-900: Consider centralizing the observation tool list.

DEDUP_TOOL_RESULTS contains the same tools as OBSERVATION_TOOLS in src/hooks/loop_guard.rs. While they serve different purposes (deduplication vs. loop guard thresholds), having two independently maintained lists creates a synchronization risk if new browser tools are added.

Consider extracting a shared constant or at least adding a comment cross-referencing the other location to help future maintainers keep them in sync.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/worker.rs` around lines 894 - 900, DEDUP_TOOL_RESULTS duplicates
the same tool names as OBSERVATION_TOOLS, risking drift; move the shared list
into a single public constant (e.g., OBSERVATION_TOOL_NAMES) in a common module
(or a crate-level constants module) and replace both DEDUP_TOOL_RESULTS and
OBSERVATION_TOOLS to reference that new constant, or if you prefer a minimal
change add a clear cross-reference comment above DEDUP_TOOL_RESULTS pointing to
OBSERVATION_TOOLS; update uses in worker.rs (DEDUP_TOOL_RESULTS) and
loop_guard.rs (OBSERVATION_TOOLS) to import the shared constant so future tool
additions only need one update.
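One way the suggested centralization could look is sketched below. The constant name `OBSERVATION_TOOL_NAMES` is the reviewer's example; the helper functions `is_observation_tool`, `should_dedup`, and `threshold_for` are hypothetical and exist only to show both call sites reading from a single list.

```rust
/// Single source of truth; in the real crate this would live in a shared module.
pub const OBSERVATION_TOOL_NAMES: &[&str] = &["browser_snapshot", "browser_tab_list"];

/// Shared predicate used by both call sites below.
pub fn is_observation_tool(tool: &str) -> bool {
    OBSERVATION_TOOL_NAMES.contains(&tool)
}

/// Worker side (hypothetical helper): should older results of this tool be deduped?
pub fn should_dedup(tool: &str) -> bool {
    is_observation_tool(tool)
}

/// Loop guard side (hypothetical helper): observation tools get the poll multiplier.
pub fn threshold_for(tool: &str, base: u32, poll_multiplier: u32) -> u32 {
    if is_observation_tool(tool) {
        base * poll_multiplier
    } else {
        base
    }
}
```

Adding a new observation tool would then mean touching one constant rather than two.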


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b7ab3229-5cd7-4dee-a7de-fae142bfa65b

📥 Commits

Reviewing files that changed from the base of the PR and between e81b99b and 18744a5.

📒 Files selected for processing (3)

src/agent/worker.rs
src/hooks/loop_guard.rs
src/tools/browser.rs

src/hooks/loop_guard.rs

…mption, retain pingpong warnings

- Fix dedup doc comment: transcript is not lossless since dedup mutates history in-place before persist_transcript runs.
- Track (message_index, item_index) in dedup so multiple ToolResult entries in the same User message are handled correctly.
- Use XOR in involves_observation_tool so two observation tools alternating (snapshot↔tab_list) is still caught as ping-pong.
- Retain pingpong_ warning buckets when resetting consecutive counts so max_warnings_per_call escalation still works for alternating patterns.
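The XOR rule mentioned in this commit can be illustrated with a small sketch. The two-argument signature is an assumption (the real `involves_observation_tool()` may take different inputs); the point is only that the exemption applies when exactly one side of an alternating pair is an observation tool.

```rust
const OBSERVATION_TOOLS: &[&str] = &["browser_snapshot", "browser_tab_list"];

fn is_observation_tool(tool: &str) -> bool {
    OBSERVATION_TOOLS.contains(&tool)
}

/// Hypothetical signature: true when exactly one of the two alternating
/// tools is an observation tool, i.e. the pair is exempt from ping-pong detection.
fn involves_observation_tool(a: &str, b: &str) -> bool {
    // XOR: observation<->action is exempt, observation<->observation is not.
    is_observation_tool(a) ^ is_observation_tool(b)
}
```

With a plain OR, a snapshot↔tab_list loop would also be exempt and could alternate indefinitely; XOR keeps that pattern detectable.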

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

src/agent/worker.rs (1)
894-900: Avoid duplicating the browser observation-tool list.

DEDUP_TOOL_RESULTS now has to stay in lockstep with src/hooks/loop_guard.rs::OBSERVATION_TOOLS. If one list changes without the other, a tool can be deduped without getting relaxed loop thresholds, or vice versa. Please move this classification into one shared constant/module and consume it from both places.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/worker.rs` around lines 894 - 900, DEDUP_TOOL_RESULTS and
OBSERVATION_TOOLS must be unified into a single shared constant so they stay in
sync; create a new shared module (e.g. a constants or observation_tools module)
that exports one constant (name it something descriptive like
OBSERVATION_TOOL_NAMES or OBSERVATION_TOOLS_SHARED with the same type &[&str])
and replace the local DEDUP_TOOL_RESULTS in src/agent/worker.rs and the
OBSERVATION_TOOLS usage in src/hooks/loop_guard.rs to import and use that shared
constant; ensure the constant's type and contents match existing expectations
and update any references to DEDUP_TOOL_RESULTS or OBSERVATION_TOOLS to the new
symbol to avoid duplication or divergence.
src/hooks/loop_guard.rs (1)
722-746: Add one regression test for browser_snapshot ↔ browser_tab_list.

The new tests cover observation↔action and repeated observation calls, but the XOR branch that keeps observation↔observation alternation detectable is still unpinned. A small snapshot → tab_list → snapshot → tab_list ... case would lock in the exact edge case this helper now depends on.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hooks/loop_guard.rs` around lines 722 - 746, Add a regression test named
something like observation_tool_snapshot_tablist_alternation that uses
LoopGuard::new(worker_config()) and repeatedly calls LoopGuard::check
alternating between "browser_snapshot" and "browser_tab_list" (e.g., snapshot,
tab_list, snapshot, ...), asserting that the alternating sequence is treated as
observation↔observation instead of observation↔action: allow all calls up to the
relaxed warn threshold and then assert a warning (LoopGuardVerdict::Block
containing "Warning") at the expected threshold; reference LoopGuard::check and
LoopGuardVerdict in the test to pin the XOR alternation behavior.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cd87e125-2231-46bb-a2a8-18ec84249409

📥 Commits

Reviewing files that changed from the base of the PR and between 18744a5 and c9dc6be.

📒 Files selected for processing (2)

src/agent/worker.rs
src/hooks/loop_guard.rs

coderabbitai · 2026-03-08T02:56:21Z

src/agent/worker.rs

 /// How many turns per segment before we check context and potentially compact.
-const TURNS_PER_SEGMENT: usize = 25;
+///
+/// Kept relatively low so compaction checks run frequently. Fast models can
+/// burn through many tool-call turns quickly, and each turn may add large
+/// tool results (browser snapshots, shell output). Checking every 15 turns
+/// instead of 25 reduces the chance of blowing past the context window
+/// within a single segment.
+const TURNS_PER_SEGMENT: usize = 15;


⚠️ Potential issue | 🟠 Major

Decouple segment cadence from the worker turn cap.

This constant still feeds default_max_turns() on Line 290, so dropping it to 15 does more than increase compaction frequency: it also reduces each worker prompt to a 15-turn Rig budget and makes MaxTurnsError much more likely. If the intent is “compact every 15 turns,” that needs a separate checkpoint from the worker’s actual max_turns. As per coding guidelines, "Set explicit max_turns on all Rig agents: max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for channels".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/agent/worker.rs` around lines 18 - 25, TURNS_PER_SEGMENT is currently reused by default_max_turns(), unintentionally shrinking worker Rig budgets and causing MaxTurnsError; separate the compaction cadence from worker turn caps by creating a new constant (e.g., COMPACTION_TURNS or SEGMENT_CHECK_TURNS) and use that where compaction checks occur, then restore default_max_turns() to return the intended worker cap (or stop using TURNS_PER_SEGMENT inside default_max_turns()); also ensure all Rig agents explicitly set max_turns (call max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for channels) so the worker prompt budgets are fixed regardless of compaction cadence.

coderabbitai

♻️ Duplicate comments (2)

src/agent/worker.rs (2)
18-25: ⚠️ Potential issue | 🟠 Major

Don’t couple compaction cadence to the worker turn budget.

This change still drops the worker’s Rig budget to 15 turns because default_max_turns(TURNS_PER_SEGMENT) now uses the new segment constant directly. That makes MaxTurnsError much more likely instead of only increasing maintenance frequency. If you want 15-turn compaction checkpoints, they need to be separate from the worker’s explicit max-turn budget.

As per coding guidelines, "Set explicit max_turns on all Rig agents: max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for channels".

Also applies to: 288-290
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/worker.rs` around lines 18 - 25, The compaction cadence constant
TURNS_PER_SEGMENT was incorrectly reused to set the worker Rig's turn budget via
default_max_turns(TURNS_PER_SEGMENT), causing unintended MaxTurnsError; revert
the worker Rig's explicit max_turns to the required budget (use max_turns(50)
when constructing the worker Rig or call default_max_turns(50)) and keep
TURNS_PER_SEGMENT solely for compaction/checkpoint logic, ensuring compaction
checks still use TURNS_PER_SEGMENT while the Rig creation uses max_turns(50)
(also update the similar occurrence around the code referenced at the other
occurrence near lines 288-290).
325-334: ⚠️ Potential issue | 🟠 Major

This makes persisted worker transcripts lossy.

dedup_tool_results() rewrites the canonical history, and persist_transcript() later serializes that mutated history. After the first maintenance pass, older browser_snapshot / browser_tab_list payloads in worker_runs.transcript are placeholders, not the original results. That still conflicts with the PR goal of keeping full transcripts in worker run records.

Also applies to: 467-469, 502-503, 667-675, 907-984
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/worker.rs` around lines 325 - 334, The maintenance step currently
mutates the canonical history via dedup_tool_results() and
maybe_compact_history(), causing persisted worker_runs.transcript entries (e.g.,
browser_snapshot and browser_tab_list payloads) to become placeholders instead
of original results; instead, perform dedup/compaction on a copy used only for
LLM context assembly and leave the canonical history intact for
persist_transcript(). Concretely, change calls to dedup_tool_results(&mut
history) and maybe_compact_history(&mut compacted_history, &mut history) so they
operate on a cloned history (or return a compacted copy) used for model calls,
or adjust dedup_tool_results to return a non-mutating compacted vector, and
ensure persist_transcript() serializes the original, unmodified history. Also
apply the same non-mutating approach at the other occurrences that call
dedup_tool_results/maybe_compact_history so worker_runs.transcript keeps full
original payloads.
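A non-mutating variant along the lines suggested here might look like the following sketch. The `ToolResult` struct, the flat slice history, the `deduped_for_prompt` name, and the placeholder wording are all assumptions for illustration; the idea is just that the model sees a deduplicated copy while the canonical history stays untouched for persistence.

```rust
const DEDUP_TOOLS: &[&str] = &["browser_snapshot", "browser_tab_list"];

/// Hypothetical, flattened stand-in for a tool result in the worker history.
#[derive(Debug, Clone, PartialEq)]
struct ToolResult {
    tool: String,
    content: String,
}

/// Builds a deduplicated copy for the LLM prompt; `history` itself is not modified.
fn deduped_for_prompt(history: &[ToolResult]) -> Vec<ToolResult> {
    let mut view = history.to_vec();
    let mut seen: Vec<&'static str> = Vec::new();
    // Newest to oldest: keep the first result seen per tool, replace the rest.
    for entry in view.iter_mut().rev() {
        let known = DEDUP_TOOLS
            .iter()
            .copied()
            .find(|name| *name == entry.tool.as_str());
        if let Some(name) = known {
            if seen.contains(&name) {
                // Illustrative placeholder wording.
                entry.content = format!("[{} output superseded]", name);
            } else {
                seen.push(name);
            }
        }
    }
    view
}
```

The trade-off is an extra clone of the history before each model call, in exchange for transcripts that keep the original payloads.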


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b57f1ca-b5db-4bcc-b6c0-824308e84996

📥 Commits

Reviewing files that changed from the base of the PR and between c9dc6be and 6966cea.

📒 Files selected for processing (1)

src/agent/worker.rs

Workers now catch retriable errors (upstream 500s, timeouts, rate limits that survived model-level retries) and back off with exponential delay before retrying, up to 5 attempts. Previously any error that wasn't a context overflow or cancellation killed the worker immediately. Also added missing patterns to is_retriable_error: generic server errors like 'The server had an error' and '500' status codes that OpenRouter wraps in various phrasings.
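The retry behaviour described in this commit can be sketched as below. This is a synchronous, simplified illustration: the base delay value, the string-based error type, the `run_with_retries` and `backoff_delay` helpers, and the exact phrase list are assumptions, and "up to 5 attempts" is interpreted here as up to 5 retries. Substring matching on "500" is deliberately crude and would also match unrelated numbers.

```rust
use std::time::Duration;

/// Retry cap taken from the commit message ("up to 5 attempts").
const MAX_TRANSIENT_RETRIES: u32 = 5;
/// Assumed base delay; the real value is not stated in the PR.
const BASE_DELAY: Duration = Duration::from_millis(500);

/// Crude phrase matching in the spirit of `is_retriable_error`.
fn is_retriable_error(message: &str) -> bool {
    let lower = message.to_lowercase();
    ["500", "the server had an error", "timeout", "rate limit"]
        .iter()
        .any(|phrase| lower.contains(phrase))
}

/// Exponential backoff: base, 2x base, 4x base, ...
fn backoff_delay(attempt: u32) -> Duration {
    BASE_DELAY * 2u32.pow(attempt)
}

/// Runs `op`, retrying retriable failures; `sleep` is injected so callers
/// (and tests) decide how to wait.
fn run_with_retries<T>(
    mut op: impl FnMut() -> Result<T, String>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, String> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(error) if is_retriable_error(&error) && attempt < MAX_TRANSIENT_RETRIES => {
                sleep(backoff_delay(attempt));
                attempt += 1;
            }
            // Non-retriable error, or retries exhausted.
            Err(error) => return Err(error),
        }
    }
}
```

Passing `std::thread::sleep` as the second argument gives blocking behaviour; an async worker would await a timer instead.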

tembo bot reviewed Mar 8, 2026

coderabbitai bot reviewed Mar 8, 2026

tembo bot reviewed Mar 8, 2026

coderabbitai bot reviewed Mar 8, 2026

fix: collapse nested if blocks for clippy collapsible_if

6966cea

coderabbitai bot reviewed Mar 8, 2026

jamiepine merged commit cb101c5 into main Mar 8, 2026
3 of 4 checks passed

jamiepine merged 4 commits into main from fix/browser-context-management
