fix: prevent tool-use marker leakage in toolshim output #8310
eugenio wants to merge 1 commit into block:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8acacdcaad
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
    if let Some(value) = parse_json_value_tolerant(json_obj) {
        let maybe_name = value.get("name").and_then(Value::as_str);
        let maybe_args = value.get("arguments").and_then(Value::as_object);
        if let (Some(raw_name), Some(arguments)) = (maybe_name, maybe_args) {
Require explicit tool syntax before promoting inline JSON
This parser promotes any JSON object with "name" and object "arguments" into an executable tool call, even when it is just example/output text. In toolshim mode, a normal assistant response like an instructional snippet ({"name":"shell","arguments":{...}}) will now be treated as a real tool invocation, which can trigger unintended tool execution (or approval prompts) without explicit tool markers.
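One possible tightening, sketched below with a hypothetical helper (the function name and marker list are illustrative, not from the PR), is to promote inline JSON only when explicit toolshim syntax accompanies it:

```rust
/// Hypothetical guard: treat an inline JSON object as a tool call only
/// when the surrounding text carries explicit toolshim syntax, so that
/// instructional snippets mentioning {"name": ..., "arguments": ...}
/// are not promoted to executable calls.
fn has_explicit_tool_syntax(text: &str) -> bool {
    text.to_lowercase().contains("using tool:")
        || text.contains("<|tool_call_begin|>")
        || text.contains("<|tool_calls_section_begin|>")
}
```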
    if lower.contains("using tool:")
        || (text.text.contains("\"name\"") && text.text.contains("\"arguments\""))
    {
        text.text.clear();
Narrow residual-marker detection to avoid deleting valid text
The residual sanitizer clears text whenever it finds both "name" and "arguments", which matches many legitimate JSON explanations. In toolshim turns where parsing returns no tool calls, normal assistant content discussing those fields is erased entirely, producing blank or truncated responses unrelated to tool markers.
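A narrower check, sketched below with illustrative names, keys on literal marker tokens instead of the generic "name"/"arguments" substrings:

```rust
/// Illustrative sketch: clear residual text only when a literal marker
/// token survives, so prose that merely discusses "name"/"arguments"
/// fields is left intact.
fn sanitize_residual(text: &mut String) {
    const MARKERS: [&str; 4] = [
        "<|tool_call_begin|>",
        "<|tool_call_end|>",
        "<|tool_calls_section_begin|>",
        "<|tool_calls_section_end|>",
    ];
    if MARKERS.iter().any(|m| text.contains(*m)) {
        text.clear();
    }
}
```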
Force-pushed from 8acacdc to 3265e19
💡 Codex Review
Reviewed commit: 3265e1967b
    if let Some(msg) = accumulated_message {
        let processed = toolshim_postprocess(msg, &toolshim_tools).await?;
        yield (Some(processed), final_usage);
Preserve usage-only chunks in toolshim buffering
This branch only emits a stream item when accumulated_message is present, which drops turns that contain usage but no text message. Multiple provider formatters in this repo emit final usage as (None, Some(usage)) (for example OpenAI/Google paths), so in toolshim mode those usage-only completions are now lost and Agent::update_session_metrics never runs for that turn. Please emit final_usage even when no message was accumulated.
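The shape of the fix can be sketched with a pure helper (names and types are illustrative stand-ins for the stream item types): emit whenever either component is present.

```rust
/// Illustrative: what the buffered toolshim branch should yield at end
/// of stream. Emits whenever a message *or* usage exists, so usage-only
/// completions of the form (None, Some(usage)) are not silently dropped.
fn final_emission<M, U>(msg: Option<M>, usage: Option<U>) -> Option<(Option<M>, Option<U>)> {
    if msg.is_some() || usage.is_some() {
        Some((msg, usage))
    } else {
        None
    }
}
```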
    warn!(
        "Toolshim augmentation failed, skipping tool augmentation: {}",
        e
    );
    Ok(response)
Surface toolshim interpreter errors to callers
Catching all augmentation errors and returning the raw response silently disables toolshim behavior when backend selection/parsing fails (e.g., invalid GOOSE_TOOLSHIM_BACKEND or missing local model config introduced in this commit). In those cases no structured tool requests are produced, and the agent proceeds as if no tool call was needed, which is hard to diagnose and can stall tool-driven workflows. This should propagate an explicit error (or a structured failure) instead of silently continuing.
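One structured alternative, sketched here with simplified String-based types in place of the real response and error types, is to return an explicit outcome so the caller can distinguish success from a disabled shim:

```rust
/// Simplified sketch: surface augmentation failure instead of silently
/// returning the unaugmented response.
#[derive(Debug, PartialEq)]
enum ToolshimOutcome {
    Augmented(String),
    Failed { original: String, error: String },
}

fn augment_outcome(response: String, result: Result<String, String>) -> ToolshimOutcome {
    match result {
        Ok(augmented) => ToolshimOutcome::Augmented(augmented),
        // Keep the original response available, but make the failure
        // visible to the caller rather than logging and continuing.
        Err(e) => ToolshimOutcome::Failed { original: response, error: e },
    }
}
```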
Force-pushed from 3265e19 to ebf8535
💡 Codex Review
Reviewed commit: ebf85359cd
    while let Some(result) = stream.next().await {
        let (msg_opt, usage_opt) = result?;

        if let Some(usage) = usage_opt.as_ref() {
            crate::providers::base::set_current_model(&usage.model);
Preserve cancellation points while buffering toolshim stream
In toolshim mode this loop now consumes the entire provider stream before emitting any chunk, which removes the per-chunk cancellation checkpoints that the agent loop relies on. Agent::run only checks the cancel token between yielded stream items, so a user cancel during a long toolshim response will not be honored until generation fully completes. This is a regression from the previous chunked behavior and makes stop/cancel effectively unresponsive for toolshim sessions.
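The concern can be illustrated with a synchronous stand-in, using an AtomicBool in place of the agent's cancel token: checking cancellation between buffered chunks restores the checkpoint the chunked path had.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Buffer all chunks, but check a cancel flag between chunks so a stop
/// request takes effect before generation fully completes.
fn buffer_with_cancel(chunks: &[&str], cancel: &AtomicBool) -> Option<String> {
    let mut acc = String::new();
    for chunk in chunks {
        if cancel.load(Ordering::Relaxed) {
            return None; // honor cancellation mid-stream
        }
        acc.push_str(chunk);
    }
    Some(acc)
}
```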
Force-pushed from ab47b72 to 8f09be1
💡 Codex Review
Reviewed commit: 8f09be1ac4
    ] {
        stripped = stripped.replace(marker, " ");
    }

    stripped.trim().to_string()
Remove bare tool_call blocks, not just section-wrapped blocks
parse_tokenized_tool_calls accepts <|tool_call_begin|>...<|tool_call_end|> even without a surrounding <|tool_calls_section_begin|> block, but strip_tokenized_tool_markup only removes full spans when section markers exist and otherwise just deletes marker tokens. In that case the raw payload (functions.* and JSON arguments) is still shown to the user even though a tool request was extracted, which defeats the marker-leakage fix for malformed/partial outputs.
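Removing the full span between bare markers, not just the marker tokens, could look like the following std-only sketch (the function name is illustrative):

```rust
const CALL_BEGIN: &str = "<|tool_call_begin|>";
const CALL_END: &str = "<|tool_call_end|>";

/// Remove every begin..end span, payload included, even when no
/// surrounding <|tool_calls_section_begin|> block exists. An
/// unterminated span drops the remaining tail rather than leaking it.
fn strip_bare_tool_call_spans(text: &str) -> String {
    let mut out = String::new();
    let mut rest = text;
    while let Some(start) = rest.find(CALL_BEGIN) {
        out.push_str(&rest[..start]);
        let after = &rest[start + CALL_BEGIN.len()..];
        match after.find(CALL_END) {
            Some(end) => rest = &after[end + CALL_END.len()..],
            None => rest = "", // malformed/partial output: drop the payload
        }
    }
    out.push_str(rest);
    out.trim().to_string()
}
```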
Force-pushed from 8f09be1 to ea46170
💡 Codex Review
Reviewed commit: ea461702ae
    for part in message.content {
        if let MessageContent::Text(text) = part {
            content.push_str(&text.text);
Handle ToolRequest chunks in local toolshim interpreter
In GOOSE_TOOLSHIM_BACKEND=local mode, this collector only appends MessageContent::Text and ignores MessageContent::ToolRequest. I checked the local inference native path (crates/goose/src/providers/local_inference/inference_native_tools.rs) and it emits tool calls as ToolRequest messages when the model outputs {"tool_calls": ...}; in that case content stays empty here, process_interpreter_response receives an empty string, and no tool calls are returned. The result is that local toolshim silently drops valid tool intents and the agent proceeds without executing the requested tools.
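The fix amounts to matching both variants in the collector. A self-contained sketch with simplified stand-in types (the real MessageContent carries richer structures than plain strings):

```rust
/// Simplified stand-ins for the message content variants involved.
enum MessageContent {
    Text(String),
    ToolRequest(String), // serialized tool call from the native path
}

/// Collect plain text *and* already-structured tool requests, so the
/// local backend sees the tool intent either way instead of receiving
/// an empty string when the model emitted ToolRequest content.
fn collect_content(parts: Vec<MessageContent>) -> String {
    let mut content = String::new();
    for part in parts {
        match part {
            MessageContent::Text(t) => content.push_str(&t),
            MessageContent::ToolRequest(call) => content.push_str(&call),
        }
    }
    content
}
```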
Force-pushed from ea46170 to ef3de12
💡 Codex Review
Reviewed commit: ef3de12f41
    for marker in [
        TOOL_CALL_BEGIN,
        TOOL_CALL_ARGUMENT_BEGIN,
        TOOL_CALL_END,
        TOOL_CALLS_SECTION_BEGIN,
        TOOL_CALLS_SECTION_END,
Strip argument-end tool marker in backend sanitization
The tokenized cleanup path removes TOOL_CALL_ARGUMENT_BEGIN but never handles the corresponding <|tool_call_argument_end|> token. When models emit that end marker, parse_tokenized_tool_calls can fail to parse raw_args (the marker remains attached to the JSON), and the residual sanitizer can still leak marker text to users because this token is never removed. Add the argument-end marker to the backend marker set used for detection/stripping.
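With the argument-end token included, the stripping loop could look like this sketch (marker values written inline; the PR uses named constants):

```rust
/// The marker set with the missing <|tool_call_argument_end|> included.
const BACKEND_MARKERS: [&str; 6] = [
    "<|tool_calls_section_begin|>",
    "<|tool_call_begin|>",
    "<|tool_call_argument_begin|>",
    "<|tool_call_argument_end|>", // the token the review says is never stripped
    "<|tool_call_end|>",
    "<|tool_calls_section_end|>",
];

fn strip_markers(text: &str) -> String {
    let mut stripped = text.to_string();
    for marker in BACKEND_MARKERS {
        stripped = stripped.replace(marker, " ");
    }
    stripped.trim().to_string()
}
```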
Force-pushed from ef3de12 to 5d2c38a
💡 Codex Review
Reviewed commit: 5d2c38ae86
ui/desktop/src/types/message.ts (outdated)
    }

    // Strip out XML tool call markers that shouldn't be displayed to the user
    textContent = stripToolCallMarkers(textContent);
Restrict marker stripping to assistant messages
getTextAndImageContent now always runs stripToolCallMarkers on every message, but this helper is also used by user-message rendering/editing (UserMessage calls it). That means any user-authored text containing literal tool-marker tokens (for example when discussing/debugging these markers) is silently removed, and the .trim() in the scrubber also rewrites original whitespace. This is user-visible data loss; marker sanitization should be scoped to assistant output (or gated by explicit leakage metadata) rather than applied universally.
Toolshim models emit raw markers (<|tool_calls_section_begin|>, "Using tool:", etc.) as plain text. Previously these leaked into final assistant output in two scenarios:

1. Streaming: markers arrived token-by-token across multiple chunks, so per-chunk sanitization never saw the full marker string.
2. Parse failure: when marker parsing returned zero tool calls (malformed markers), no sanitization ran at all.

Changes:
- Accumulate the full response before toolshim post-processing instead of processing per-chunk, so markers spanning multiple stream chunks are detected and stripped.
- Add a catch-all sanitize_residual_markers() applied on every exit path of augment_message_with_tool_calls, covering parse-failure cases.
- Sanitize markers in the toolshim_postprocess error path (an interpreter failure would previously return the raw unsanitized message).
- Add a has_tool_markers() detector for programmatic checking.
- Add frontend stripToolCallMarkers() in message.ts as defense in depth.
- Add direct tokenized and inline JSON parsers that bypass the interpreter for faster, more reliable tool call extraction.
- Add a local llama.cpp toolshim backend (GOOSE_TOOLSHIM_BACKEND=local).
- Add truncated JSON repair in safely_parse_json() for malformed args.
- Increase the main thread stack size to 8 MB to prevent stack overflows.
- Add 4 regression tests for marker leakage (15 toolshim tests total).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Eugenio La Cava <eugeniolcv@gmail.com>
Force-pushed from 5d2c38a to 1878efd
💡 Codex Review
Reviewed commit: 1878efd8c6
    serde_json::from_str::<Value>(input).ok().or_else(|| {
        let escaped = escape_invalid_backslashes_in_json_strings(input);
        serde_json::from_str::<Value>(&escaped).ok()
Normalize Windows-style backslashes before first JSON parse
parse_json_value_tolerant returns immediately on the initial serde_json::from_str success path, so JSON like {"path":"C:\temp\foo"} is accepted with escape decoding (\t, \f, etc.) and silently converted to control characters instead of literal backslashes. Because the backslash-repair branch never runs in that case, toolshim direct parsing can corrupt common Windows path/command arguments before execution, leading to wrong filesystem targets or failed tool calls whenever arguments include sequences such as \t, \n, or \r.
This is a known trade-off. The first serde_json::from_str path succeeds because {"path":"C:\temp\foo"} is technically valid JSON: \t and \f are legal escape sequences. Running the backslash repair unconditionally would risk corrupting valid JSON that intentionally uses \t/\n/\r.
In practice, models that know these markers (Mistral, etc.) emit properly escaped \\ for Windows paths, as verified by the existing parses_tokenized_tool_call_with_windows_path_arguments test. Hitting this edge case would require the model to emit raw unescaped backslashes, which is a pre-existing limitation of any JSON-based tool call format, not a regression introduced by this PR.
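For reference, the repair branch can be sketched std-only. This is a simplification: the real escape_invalid_backslashes_in_json_strings presumably rewrites only inside JSON string literals, which this sketch ignores. It also demonstrates the trade-off above, since valid escapes like \t are deliberately left untouched.

```rust
/// Simplified sketch of backslash repair: double any backslash that
/// does not begin a valid JSON escape, so raw Windows paths survive a
/// later JSON parse. Valid escapes (\t, \n, \u, ...) pass through.
fn escape_invalid_backslashes(input: &str) -> String {
    const VALID: [char; 9] = ['"', '\\', '/', 'b', 'f', 'n', 'r', 't', 'u'];
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' && !chars.peek().is_some_and(|n| VALID.contains(n)) {
            out.push_str("\\\\"); // repair: make it a literal backslash
        } else {
            out.push(c);
        }
    }
    out
}
```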
Summary
- Raw toolshim markers (<|tool_calls_section_begin|>, <|tool_call_begin|>, etc.) are detected and stripped before reaching the UI.
- sanitize_residual_markers() runs on every exit path of augment_message_with_tool_calls, so malformed markers are stripped even when parsing yields zero tool calls.
- A GOOSE_TOOLSHIM_BACKEND=local option is added alongside the existing Ollama backend.
- stripToolCallMarkers() in message.ts serves as a last-resort UI-level sanitizer.
- safely_parse_json() now recovers from truncated/malformed JSON tool arguments.

Test plan
🤖 Generated with Claude Code