Conversation
…uard, overflow resilience Loop guard was counting total lifetime calls per (tool, args) hash, so browser_snapshot (empty args) got permanently blocked after 7 uses even when other tools ran between each call. Now tracks consecutive identical calls — the counter resets whenever a different tool runs. Observation tools (browser_snapshot, browser_tab_list) also get the poll multiplier and are excluded from ping-pong detection since snapshot→click is normal browser workflow. Worker context overflow was killing the worker after 3 retries of force-compact. Now: dedup stale tool results before every LLM call (replaces all but the most recent browser_snapshot/browser_tab_list result with a one-liner), pre-prompt compaction check every segment boundary, and reduced segment size (25→15 turns) for more frequent checks. Overflow retry kept at 2 as a safety net since dedup+compaction should prevent hitting it. Also fixed the get_element_center error message to tell the model to run browser_snapshot instead of suggesting screenshots/scrolling.
|
Caution Review failedPull request was closed or merged during review WalkthroughIntroduces in-run tool-result deduplication and transient retry/backoff logic in the agent worker, tightens segment and overflow thresholds, expands retriable LLM error detection, and extends LoopGuard to track per-(tool,args) counts and observation-tool behavior; also tweaks a browser error message. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/agent/worker.rs (1)
894-900: Consider centralizing the observation tool list.
DEDUP_TOOL_RESULTScontains the same tools asOBSERVATION_TOOLSinsrc/hooks/loop_guard.rs. While they serve different purposes (deduplication vs. loop guard thresholds), having two independently maintained lists creates a synchronization risk if new browser tools are added.Consider extracting a shared constant or at least adding a comment cross-referencing the other location to help future maintainers keep them in sync.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/worker.rs` around lines 894 - 900, DEDUP_TOOL_RESULTS duplicates the same tool names as OBSERVATION_TOOLS, risking drift; move the shared list into a single public constant (e.g., OBSERVATION_TOOL_NAMES) in a common module (or a crate-level constants module) and replace both DEDUP_TOOL_RESULTS and OBSERVATION_TOOLS to reference that new constant, or if you prefer a minimal change add a clear cross-reference comment above DEDUP_TOOL_RESULTS pointing to OBSERVATION_TOOLS; update uses in worker.rs (DEDUP_TOOL_RESULTS) and loop_guard.rs (OBSERVATION_TOOLS) to import the shared constant so future tool additions only need one update.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/agent/worker.rs`:
- Around line 894-900: DEDUP_TOOL_RESULTS duplicates the same tool names as
OBSERVATION_TOOLS, risking drift; move the shared list into a single public
constant (e.g., OBSERVATION_TOOL_NAMES) in a common module (or a crate-level
constants module) and replace both DEDUP_TOOL_RESULTS and OBSERVATION_TOOLS to
reference that new constant, or if you prefer a minimal change add a clear
cross-reference comment above DEDUP_TOOL_RESULTS pointing to OBSERVATION_TOOLS;
update uses in worker.rs (DEDUP_TOOL_RESULTS) and loop_guard.rs
(OBSERVATION_TOOLS) to import the shared constant so future tool additions only
need one update.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b7ab3229-5cd7-4dee-a7de-fae142bfa65b
📒 Files selected for processing (3)
src/agent/worker.rssrc/hooks/loop_guard.rssrc/tools/browser.rs
…mption, retain pingpong warnings - Fix dedup doc comment: transcript is not lossless since dedup mutates history in-place before persist_transcript runs. - Track (message_index, item_index) in dedup so multiple ToolResult entries in the same User message are handled correctly. - Use XOR in involves_observation_tool so two observation tools alternating (snapshot↔tab_list) is still caught as ping-pong. - Retain pingpong_ warning buckets when resetting consecutive counts so max_warnings_per_call escalation still works for alternating patterns.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/agent/worker.rs (1)
894-900: Avoid duplicating the browser observation-tool list.
DEDUP_TOOL_RESULTSnow has to stay in lockstep withsrc/hooks/loop_guard.rs::OBSERVATION_TOOLS. If one list changes without the other, a tool can be deduped without getting relaxed loop thresholds, or vice versa. Please move this classification into one shared constant/module and consume it from both places.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/worker.rs` around lines 894 - 900, DEDUP_TOOL_RESULTS and OBSERVATION_TOOLS must be unified into a single shared constant so they stay in sync; create a new shared module (e.g. a constants or observation_tools module) that exports one constant (name it something descriptive like OBSERVATION_TOOL_NAMES or OBSERVATION_TOOLS_SHARED with the same type &[&str]) and replace the local DEDUP_TOOL_RESULTS in src/agent/worker.rs and the OBSERVATION_TOOLS usage in src/hooks/loop_guard.rs to import and use that shared constant; ensure the constant's type and contents match existing expectations and update any references to DEDUP_TOOL_RESULTS or OBSERVATION_TOOLS to the new symbol to avoid duplication or divergence.src/hooks/loop_guard.rs (1)
722-746: Add one regression test forbrowser_snapshot↔browser_tab_list.The new tests cover observation↔action and repeated observation calls, but the XOR branch that keeps observation↔observation alternation detectable is still unpinned. A small
snapshot → tab_list → snapshot → tab_list ...case would lock in the exact edge case this helper now depends on.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hooks/loop_guard.rs` around lines 722 - 746, Add a regression test named something like observation_tool_snapshot_tablist_alternation that uses LoopGuard::new(worker_config()) and repeatedly calls LoopGuard::check alternating between "browser_snapshot" and "browser_tab_list" (e.g., snapshot, tab_list, snapshot, ...), asserting that the alternating sequence is treated as observation↔observation instead of observation↔action: allow all calls up to the relaxed warn threshold and then assert a warning (LoopGuardVerdict::Block containing "Warning") at the expected threshold; reference LoopGuard::check and LoopGuardVerdict in the test to pin the XOR alternation behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/agent/worker.rs`:
- Around line 18-25: TURNS_PER_SEGMENT is currently reused by
default_max_turns(), unintentionally shrinking worker Rig budgets and causing
MaxTurnsError; separate the compaction cadence from worker turn caps by creating
a new constant (e.g., COMPACTION_TURNS or SEGMENT_CHECK_TURNS) and use that
where compaction checks occur, then restore default_max_turns() to return the
intended worker cap (or stop using TURNS_PER_SEGMENT inside
default_max_turns()); also ensure all Rig agents explicitly set max_turns (call
max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for
channels) so the worker prompt budgets are fixed regardless of compaction
cadence.
---
Nitpick comments:
In `@src/agent/worker.rs`:
- Around line 894-900: DEDUP_TOOL_RESULTS and OBSERVATION_TOOLS must be unified
into a single shared constant so they stay in sync; create a new shared module
(e.g. a constants or observation_tools module) that exports one constant (name
it something descriptive like OBSERVATION_TOOL_NAMES or OBSERVATION_TOOLS_SHARED
with the same type &[&str]) and replace the local DEDUP_TOOL_RESULTS in
src/agent/worker.rs and the OBSERVATION_TOOLS usage in src/hooks/loop_guard.rs
to import and use that shared constant; ensure the constant's type and contents
match existing expectations and update any references to DEDUP_TOOL_RESULTS or
OBSERVATION_TOOLS to the new symbol to avoid duplication or divergence.
In `@src/hooks/loop_guard.rs`:
- Around line 722-746: Add a regression test named something like
observation_tool_snapshot_tablist_alternation that uses
LoopGuard::new(worker_config()) and repeatedly calls LoopGuard::check
alternating between "browser_snapshot" and "browser_tab_list" (e.g., snapshot,
tab_list, snapshot, ...), asserting that the alternating sequence is treated as
observation↔observation instead of observation↔action: allow all calls up to the
relaxed warn threshold and then assert a warning (LoopGuardVerdict::Block
containing "Warning") at the expected threshold; reference LoopGuard::check and
LoopGuardVerdict in the test to pin the XOR alternation behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cd87e125-2231-46bb-a2a8-18ec84249409
📒 Files selected for processing (2)
src/agent/worker.rssrc/hooks/loop_guard.rs
| /// How many turns per segment before we check context and potentially compact. | ||
| const TURNS_PER_SEGMENT: usize = 25; | ||
| /// | ||
| /// Kept relatively low so compaction checks run frequently. Fast models can | ||
| /// burn through many tool-call turns quickly, and each turn may add large | ||
| /// tool results (browser snapshots, shell output). Checking every 15 turns | ||
| /// instead of 25 reduces the chance of blowing past the context window | ||
| /// within a single segment. | ||
| const TURNS_PER_SEGMENT: usize = 15; |
There was a problem hiding this comment.
Decouple segment cadence from the worker turn cap.
This constant still feeds default_max_turns() on Line 290, so dropping it to 15 does more than increase compaction frequency: it also reduces each worker prompt to a 15-turn Rig budget and makes MaxTurnsError much more likely. If the intent is “compact every 15 turns,” that needs a separate checkpoint from the worker’s actual max_turns. As per coding guidelines, "Set explicit max_turns on all Rig agents: max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for channels".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/agent/worker.rs` around lines 18 - 25, TURNS_PER_SEGMENT is currently
reused by default_max_turns(), unintentionally shrinking worker Rig budgets and
causing MaxTurnsError; separate the compaction cadence from worker turn caps by
creating a new constant (e.g., COMPACTION_TURNS or SEGMENT_CHECK_TURNS) and use
that where compaction checks occur, then restore default_max_turns() to return
the intended worker cap (or stop using TURNS_PER_SEGMENT inside
default_max_turns()); also ensure all Rig agents explicitly set max_turns (call
max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for
channels) so the worker prompt budgets are fixed regardless of compaction
cadence.
There was a problem hiding this comment.
♻️ Duplicate comments (2)
src/agent/worker.rs (2)
18-25:⚠️ Potential issue | 🟠 MajorDon’t couple compaction cadence to the worker turn budget.
This change still drops the worker’s Rig budget to 15 turns because
default_max_turns(TURNS_PER_SEGMENT)now uses the new segment constant directly. That makesMaxTurnsErrormuch more likely instead of only increasing maintenance frequency. If you want 15-turn compaction checkpoints, they need to be separate from the worker’s explicit max-turn budget.As per coding guidelines, "Set explicit
max_turnson all Rig agents:max_turns(50)for workers,max_turns(10)for branches,max_turns(5)for channels".Also applies to: 288-290
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/worker.rs` around lines 18 - 25, The compaction cadence constant TURNS_PER_SEGMENT was incorrectly reused to set the worker Rig's turn budget via default_max_turns(TURNS_PER_SEGMENT), causing unintended MaxTurnsError; revert the worker Rig's explicit max_turns to the required budget (use max_turns(50) when constructing the worker Rig or call default_max_turns(50)) and keep TURNS_PER_SEGMENT solely for compaction/checkpoint logic, ensuring compaction checks still use TURNS_PER_SEGMENT while the Rig creation uses max_turns(50) (also update the similar occurrence around the code referenced at the other occurrence near lines 288-290).
325-334:⚠️ Potential issue | 🟠 MajorThis makes persisted worker transcripts lossy.
dedup_tool_results()rewrites the canonicalhistory, andpersist_transcript()later serializes that mutated history. After the first maintenance pass, olderbrowser_snapshot/browser_tab_listpayloads inworker_runs.transcriptare placeholders, not the original results. That still conflicts with the PR goal of keeping full transcripts in worker run records.Also applies to: 467-469, 502-503, 667-675, 907-984
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/worker.rs` around lines 325 - 334, The maintenance step currently mutates the canonical history via dedup_tool_results() and maybe_compact_history(), causing persisted worker_runs.transcript entries (e.g., browser_snapshot and browser_tab_list payloads) to become placeholders instead of original results; instead, perform dedup/compaction on a copy used only for LLM context assembly and leave the canonical history intact for persist_transcript(). Concretely, change calls to dedup_tool_results(&mut history) and maybe_compact_history(&mut compacted_history, &mut history) so they operate on a cloned history (or return a compacted copy) used for model calls, or adjust dedup_tool_results to return a non-mutating compacted vector, and ensure persist_transcript() serializes the original, unmodified history. Also apply the same non-mutating approach at the other occurrences that call dedup_tool_results/maybe_compact_history so worker_runs.transcript keeps full original payloads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/agent/worker.rs`:
- Around line 18-25: The compaction cadence constant TURNS_PER_SEGMENT was
incorrectly reused to set the worker Rig's turn budget via
default_max_turns(TURNS_PER_SEGMENT), causing unintended MaxTurnsError; revert
the worker Rig's explicit max_turns to the required budget (use max_turns(50)
when constructing the worker Rig or call default_max_turns(50)) and keep
TURNS_PER_SEGMENT solely for compaction/checkpoint logic, ensuring compaction
checks still use TURNS_PER_SEGMENT while the Rig creation uses max_turns(50)
(also update the similar occurrence around the code referenced at the other
occurrence near lines 288-290).
- Around line 325-334: The maintenance step currently mutates the canonical
history via dedup_tool_results() and maybe_compact_history(), causing persisted
worker_runs.transcript entries (e.g., browser_snapshot and browser_tab_list
payloads) to become placeholders instead of original results; instead, perform
dedup/compaction on a copy used only for LLM context assembly and leave the
canonical history intact for persist_transcript(). Concretely, change calls to
dedup_tool_results(&mut history) and maybe_compact_history(&mut
compacted_history, &mut history) so they operate on a cloned history (or return
a compacted copy) used for model calls, or adjust dedup_tool_results to return a
non-mutating compacted vector, and ensure persist_transcript() serializes the
original, unmodified history. Also apply the same non-mutating approach at the
other occurrences that call dedup_tool_results/maybe_compact_history so
worker_runs.transcript keeps full original payloads.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0b57f1ca-b5db-4bcc-b6c0-824308e84996
📒 Files selected for processing (1)
src/agent/worker.rs
Workers now catch retriable errors (upstream 500s, timeouts, rate limits that survived model-level retries) and back off with exponential delay before retrying, up to 5 attempts. Previously any error that wasn't a context overflow or cancellation killed the worker immediately. Also added missing patterns to is_retriable_error: generic server errors like 'The server had an error' and '500' status codes that OpenRouter wraps in various phrasings.
Summary
Loop guard now tracks consecutive identical calls, not lifetime totals.
browser_snapshot(always empty args) was getting permanently blocked after 7 total uses across a session. Now the counter resets when a different tool call runs in between. Observation tools (browser_snapshot,browser_tab_list) get the poll multiplier on thresholds and are excluded from ping-pong detection.Dedup stale tool results before every LLM call. All but the most recent
browser_snapshotandbrowser_tab_listresults are replaced with a one-liner ([browser_snapshot output superseded...]). The tool call/result structure stays intact but the multi-KB ARIA tree content is gone. Full transcript preserved in worker run records.Pre-prompt compaction and smaller segments. Context usage checked before every LLM call (not just at segment boundaries).
TURNS_PER_SEGMENTreduced from 25 → 15 so compaction checks run more frequently. Overflow retry reduced from 3 → 2 since dedup + pre-prompt compaction should prevent hitting it.Fixed
get_element_centererror message to tell the model to runbrowser_snapshotinstead of the useless "try scrolling or taking a screenshot."Test plan
All 460 lib tests pass including 3 new loop guard tests (
non_consecutive_identical_calls_allowed,consecutive_identical_calls_still_blocked,observation_tool_consecutive_gets_relaxed_threshold).Note
Technical changes: Three files modified (253 additions, 7 deletions). Core improvements: new
dedup_tool_results()function strips stale browser snapshots from history before LLM calls; loop guard refactored withlast_call_hashtracking to reset consecutive counters when non-identical tools run; observation tools categorized separately to get relaxed thresholds and bypass ping-pong detection. Context management now happens pre-prompt (not just post-max-turns), reducing overflow scenarios. Fully tested with new loop guard unit tests validating the consecutive-vs-interleaved behavior and observation tool multiplier.Written by Tembo for commit 18744a5. This will update automatically on new commits.