
fix: prevent tool-use marker leakage in toolshim output#8310

Open
eugenio wants to merge 1 commit into block:main from eugenio:fix/toolshim-marker-sanitization

Conversation


@eugenio commented on Apr 4, 2026

Summary

  • Fix streaming marker leakage: Accumulate full toolshim response before post-processing so markers spanning multiple stream chunks (<|tool_calls_section_begin|>, <|tool_call_begin|>, etc.) are detected and stripped before reaching the UI.
  • Fix parse-failure marker leakage: Add catch-all sanitize_residual_markers() on every exit path of augment_message_with_tool_calls, so malformed markers are stripped even when parsing yields zero tool calls.
  • Add direct tokenized + inline JSON parsers: Bypass the interpreter for faster, more reliable tool call extraction from model output.
  • Add local llama.cpp toolshim backend: New GOOSE_TOOLSHIM_BACKEND=local option alongside the existing Ollama backend.
  • Add frontend defense-in-depth: stripToolCallMarkers() in message.ts as a last-resort UI-level sanitizer.
  • Add truncated JSON repair: safely_parse_json() now recovers from truncated/malformed JSON tool arguments.
  • Increase main thread stack size: 8MB stack to prevent overflows on Windows.
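
The two sanitization layers above (marker stripping plus a catch-all on every exit path) can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: the marker strings come from this PR's description, while the replace-and-collapse body is an assumption.

```rust
// Illustrative sketch of a catch-all marker sanitizer. The marker tokens
// are those named in this PR; the function body is an assumption.
fn sanitize_residual_markers(text: &str) -> String {
    const MARKERS: [&str; 6] = [
        "<|tool_calls_section_begin|>",
        "<|tool_calls_section_end|>",
        "<|tool_call_begin|>",
        "<|tool_call_end|>",
        "<|tool_call_argument_begin|>",
        "<|tool_call_argument_end|>",
    ];
    let mut out = text.to_string();
    for marker in MARKERS {
        out = out.replace(marker, " ");
    }
    // Collapse the whitespace holes left where markers were removed.
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    let dirty = "Done.<|tool_calls_section_begin|> <|tool_calls_section_end|>";
    assert_eq!(sanitize_residual_markers(dirty), "Done.");
    println!("ok");
}
```

Because this runs on every exit path, it catches the parse-failure case where zero tool calls were extracted but marker text remains.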

Test plan

  • 15 toolshim unit tests pass (4 new regression tests for marker leakage)
  • Smoke test: plain reply flow — clean output, no markers
  • Smoke test: tool-use flow — tool call rendered cleanly, no raw markers in output
  • CI build and test suite

🤖 Generated with Claude Code


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8acacdcaad

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +364 to +367
if let Some(value) = parse_json_value_tolerant(json_obj) {
let maybe_name = value.get("name").and_then(Value::as_str);
let maybe_args = value.get("arguments").and_then(Value::as_object);
if let (Some(raw_name), Some(arguments)) = (maybe_name, maybe_args) {

P1 Badge Require explicit tool syntax before promoting inline JSON

This parser promotes any JSON object with "name" and object "arguments" into an executable tool call, even when it is just example/output text. In toolshim mode, a normal assistant response like an instructional snippet ({"name":"shell","arguments":{...}}) will now be treated as a real tool invocation, which can trigger unintended tool execution (or approval prompts) without explicit tool markers.

Useful? React with 👍 / 👎.
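The gating the reviewer suggests could look like this sketch, with hypothetical helper names: inline JSON is promoted to a tool call only when an explicit marker is present, so instructional snippets in prose are left alone.

```rust
// Hypothetical helpers sketching the reviewer's suggestion; names and
// bodies are illustrative, not the repo's actual code.
fn has_explicit_tool_syntax(text: &str) -> bool {
    text.contains("<|tool_call_begin|>") || text.contains("<|tool_calls_section_begin|>")
}

fn should_promote_inline_json(text: &str) -> bool {
    let looks_like_tool_json = text.contains("\"name\"") && text.contains("\"arguments\"");
    looks_like_tool_json && has_explicit_tool_syntax(text)
}

fn main() {
    // Instructional snippet without markers: must NOT become a tool call.
    let prose = r#"Example: {"name":"shell","arguments":{"command":"ls"}}"#;
    assert!(!should_promote_inline_json(prose));
    // Marked-up model output: eligible for promotion.
    let marked = r#"<|tool_call_begin|>{"name":"shell","arguments":{}}<|tool_call_end|>"#;
    assert!(should_promote_inline_json(marked));
    println!("ok");
}
```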

Comment on lines +489 to +492
if lower.contains("using tool:")
|| (text.text.contains("\"name\"") && text.text.contains("\"arguments\""))
{
text.text.clear();

P2 Badge Narrow residual-marker detection to avoid deleting valid text

The residual sanitizer clears text whenever it finds both "name" and "arguments", which matches many legitimate JSON explanations. In toolshim turns where parsing returns no tool calls, normal assistant content discussing those fields is erased entirely, producing blank or truncated responses unrelated to tool markers.

Useful? React with 👍 / 👎.
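The narrower check could be sketched like this: clear text only when an actual marker token is present, not on generic "name"/"arguments" heuristics. The helper name mirrors the has_tool_markers() detector named in the commit message, but the body is illustrative.

```rust
// Marker list taken from this PR's description; the detection body is a
// sketch, not the repo's actual implementation.
const MARKERS: [&str; 4] = [
    "<|tool_calls_section_begin|>",
    "<|tool_calls_section_end|>",
    "<|tool_call_begin|>",
    "<|tool_call_end|>",
];

fn has_tool_markers(text: &str) -> bool {
    MARKERS.iter().any(|m| text.contains(m))
}

fn main() {
    // Legitimate prose discussing the JSON fields must survive.
    let prose = r#"The payload needs "name" and "arguments" keys."#;
    assert!(!has_tool_markers(prose));
    // Actual residual markup is still caught.
    assert!(has_tool_markers("<|tool_call_begin|>functions.shell"));
    println!("ok");
}
```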

@eugenio force-pushed the fix/toolshim-marker-sanitization branch from 8acacdc to 3265e19 on April 4, 2026 at 07:31

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3265e1967b


Comment on lines +360 to +362
if let Some(msg) = accumulated_message {
let processed = toolshim_postprocess(msg, &toolshim_tools).await?;
yield (Some(processed), final_usage);

P2 Badge Preserve usage-only chunks in toolshim buffering

This branch only emits a stream item when accumulated_message is present, which drops turns that contain usage but no text message. Multiple provider formatters in this repo emit final usage as (None, Some(usage)) (for example OpenAI/Google paths), so in toolshim mode those usage-only completions are now lost and Agent::update_session_metrics never runs for that turn. Please emit final_usage even when no message was accumulated.

Useful? React with 👍 / 👎.
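The suggested fix, sketched with simplified stand-in types for the repo's stream items: the final item is emitted whenever either a message or usage exists, so usage-only turns are preserved.

```rust
// Stand-in for the provider usage struct; types are simplified for the sketch.
#[derive(Debug, PartialEq)]
struct Usage {
    total_tokens: u32,
}

// Sketch of the final-emit decision: forward final_usage even when no
// message was accumulated, instead of dropping the whole item.
fn finalize(
    accumulated_message: Option<String>,
    final_usage: Option<Usage>,
) -> Option<(Option<String>, Option<Usage>)> {
    match (accumulated_message, final_usage) {
        (None, None) => None,               // nothing to emit at all
        (msg, usage) => Some((msg, usage)), // usage-only turns survive
    }
}

fn main() {
    // A provider emitting (None, Some(usage)) must not lose its metrics.
    let out = finalize(None, Some(Usage { total_tokens: 42 }));
    assert_eq!(out, Some((None, Some(Usage { total_tokens: 42 }))));
    println!("ok");
}
```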

Comment on lines +130 to +134
warn!(
"Toolshim augmentation failed, skipping tool augmentation: {}",
e
);
Ok(response)

P2 Badge Surface toolshim interpreter errors to callers

Catching all augmentation errors and returning the raw response silently disables toolshim behavior when backend selection/parsing fails (e.g., invalid GOOSE_TOOLSHIM_BACKEND or missing local model config introduced in this commit). In those cases no structured tool requests are produced, and the agent proceeds as if no tool call was needed, which is hard to diagnose and can stall tool-driven workflows. This should propagate an explicit error (or a structured failure) instead of silently continuing.

Useful? React with 👍 / 👎.
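A sketch of propagating the failure instead of swallowing it; the error type here is a hypothetical stand-in for whatever error type the repo uses.

```rust
// Hypothetical error type standing in for the repo's own.
#[derive(Debug)]
enum ToolshimError {
    Augmentation(String),
}

// Sketch: surface augmentation failures to the caller rather than
// returning the raw response and silently disabling toolshim behavior.
fn postprocess(raw: String, augment_ok: bool) -> Result<String, ToolshimError> {
    if augment_ok {
        Ok(raw)
    } else {
        Err(ToolshimError::Augmentation(
            "toolshim augmentation failed; check GOOSE_TOOLSHIM_BACKEND".into(),
        ))
    }
}

fn main() {
    assert!(postprocess("hi".into(), true).is_ok());
    assert!(postprocess("hi".into(), false).is_err());
    println!("ok");
}
```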

@eugenio force-pushed the fix/toolshim-marker-sanitization branch from 3265e19 to ebf8535 on April 4, 2026 at 07:43

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ebf85359cd


Comment on lines +326 to +330
while let Some(result) = stream.next().await {
let (msg_opt, usage_opt) = result?;

if let Some(usage) = usage_opt.as_ref() {
crate::providers::base::set_current_model(&usage.model);

P2 Badge Preserve cancellation points while buffering toolshim stream

In toolshim mode this loop now consumes the entire provider stream before emitting any chunk, which removes the per-chunk cancellation checkpoints that the agent loop relies on. Agent::run only checks the cancel token between yielded stream items, so a user cancel during a long toolshim response will not be honored until generation fully completes. This is a regression from the previous chunked behavior and makes stop/cancel effectively unresponsive for toolshim sessions.

Useful? React with 👍 / 👎.
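The per-chunk checkpoint the review asks to preserve can be sketched like this, with an AtomicBool standing in for the agent's cancel token: the flag is checked on every chunk while buffering, not only after the stream is drained.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Sketch: accumulate the toolshim response while honoring cancellation
// between chunks. The AtomicBool is a stand-in for the real cancel token.
fn accumulate_with_cancel(chunks: &[&str], cancel: &AtomicBool) -> Option<String> {
    let mut buf = String::new();
    for chunk in chunks {
        if cancel.load(Ordering::Relaxed) {
            return None; // honor stop/cancel mid-stream
        }
        buf.push_str(chunk);
    }
    Some(buf)
}

fn main() {
    let cancel = AtomicBool::new(false);
    // Markers split across chunks are still reassembled into one buffer.
    assert_eq!(
        accumulate_with_cancel(&["<|tool", "_call_begin|>"], &cancel),
        Some("<|tool_call_begin|>".to_string())
    );
    cancel.store(true, Ordering::Relaxed);
    assert_eq!(accumulate_with_cancel(&["a"], &cancel), None);
    println!("ok");
}
```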


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8f09be1ac4


Comment on lines +407 to +411
] {
stripped = stripped.replace(marker, " ");
}

stripped.trim().to_string()

P2 Badge Remove bare tool_call blocks, not just section-wrapped blocks

parse_tokenized_tool_calls accepts <|tool_call_begin|>...<|tool_call_end|> even without a surrounding <|tool_calls_section_begin|> block, but strip_tokenized_tool_markup only removes full spans when section markers exist and otherwise just deletes marker tokens. In that case the raw payload (functions.* and JSON arguments) is still shown to the user even though a tool request was extracted, which defeats the marker-leakage fix for malformed/partial outputs.

Useful? React with 👍 / 👎.
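Span removal without a section wrapper can be sketched as follows. This is illustrative, not the repo's strip_tokenized_tool_markup: it deletes everything between a begin/end pair so the raw payload never reaches the user, and drops the tail on an unterminated span.

```rust
// Sketch: remove whole begin..end spans, section-wrapped or not.
fn strip_spans(text: &str, begin: &str, end: &str) -> String {
    let mut out = String::new();
    let mut rest = text;
    while let Some(start) = rest.find(begin) {
        out.push_str(&rest[..start]);
        let after_begin = &rest[start + begin.len()..];
        match after_begin.find(end) {
            Some(stop) => rest = &after_begin[stop + end.len()..],
            None => return out, // unterminated span: drop the tail too
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let text = "ok <|tool_call_begin|>functions.shell{\"cmd\":\"ls\"}<|tool_call_end|> done";
    assert_eq!(
        strip_spans(text, "<|tool_call_begin|>", "<|tool_call_end|>"),
        "ok  done"
    );
    println!("ok");
}
```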

@eugenio force-pushed the fix/toolshim-marker-sanitization branch from 8f09be1 to ea46170 on April 4, 2026 at 09:22

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea461702ae


Comment on lines +590 to +592
for part in message.content {
if let MessageContent::Text(text) = part {
content.push_str(&text.text);

P1 Badge Handle ToolRequest chunks in local toolshim interpreter

In GOOSE_TOOLSHIM_BACKEND=local mode, this collector only appends MessageContent::Text and ignores MessageContent::ToolRequest. I checked the local inference native path (crates/goose/src/providers/local_inference/inference_native_tools.rs) and it emits tool calls as ToolRequest messages when the model outputs {"tool_calls": ...}; in that case content stays empty here, process_interpreter_response receives an empty string, and no tool calls are returned. The result is that local toolshim silently drops valid tool intents and the agent proceeds without executing the requested tools.

Useful? React with 👍 / 👎.
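A sketch of the collector handling both variants, using a simplified stand-in for the repo's MessageContent enum: ToolRequest parts are forwarded instead of being silently ignored.

```rust
// Simplified stand-in for the repo's MessageContent; the ToolRequest
// payload is reduced to a serialized string for this sketch.
enum MessageContent {
    Text(String),
    ToolRequest(String),
}

// Sketch: collect text AND tool requests, so local-backend tool intents
// emitted as ToolRequest chunks are not dropped.
fn collect(parts: Vec<MessageContent>) -> (String, Vec<String>) {
    let mut text = String::new();
    let mut tool_requests = Vec::new();
    for part in parts {
        match part {
            MessageContent::Text(t) => text.push_str(&t),
            // Previously ignored; now forwarded instead of yielding "".
            MessageContent::ToolRequest(req) => tool_requests.push(req),
        }
    }
    (text, tool_requests)
}

fn main() {
    let (text, reqs) = collect(vec![MessageContent::ToolRequest(
        r#"{"tool_calls":[{"name":"shell"}]}"#.into(),
    )]);
    assert!(text.is_empty());
    assert_eq!(reqs.len(), 1);
    println!("ok");
}
```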

@eugenio force-pushed the fix/toolshim-marker-sanitization branch from ea46170 to ef3de12 on April 4, 2026 at 09:41

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef3de12f41


Comment on lines +416 to +421
for marker in [
TOOL_CALL_BEGIN,
TOOL_CALL_ARGUMENT_BEGIN,
TOOL_CALL_END,
TOOL_CALLS_SECTION_BEGIN,
TOOL_CALLS_SECTION_END,

P2 Badge Strip argument-end tool marker in backend sanitization

The tokenized cleanup path removes TOOL_CALL_ARGUMENT_BEGIN but never handles the corresponding <|tool_call_argument_end|> token. When models emit that end marker, parse_tokenized_tool_calls can fail to parse raw_args (the marker remains attached to the JSON), and the residual sanitizer can still leak marker text to users because this token is never removed. Add the argument-end marker to the backend marker set used for detection/stripping.

Useful? React with 👍 / 👎.
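The fix can be as small as detaching the token before parsing raw_args; a sketch with a hypothetical helper name:

```rust
// Sketch: strip a trailing <|tool_call_argument_end|> so the JSON
// payload parses and the token cannot leak. Helper name is hypothetical.
fn trim_argument_end(raw_args: &str) -> &str {
    raw_args
        .trim_end()
        .trim_end_matches("<|tool_call_argument_end|>")
        .trim_end()
}

fn main() {
    let raw = r#"{"command":"ls"}<|tool_call_argument_end|>"#;
    assert_eq!(trim_argument_end(raw), r#"{"command":"ls"}"#);
    println!("ok");
}
```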

@eugenio force-pushed the fix/toolshim-marker-sanitization branch from ef3de12 to 5d2c38a on April 4, 2026 at 09:53

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d2c38ae86


}

// Strip out XML tool call markers that shouldn't be displayed to the user
textContent = stripToolCallMarkers(textContent);

P2 Badge Restrict marker stripping to assistant messages

getTextAndImageContent now always runs stripToolCallMarkers on every message, but this helper is also used by user-message rendering/editing (UserMessage calls it). That means any user-authored text containing literal tool-marker tokens (for example when discussing/debugging these markers) is silently removed, and the .trim() in the scrubber also rewrites original whitespace. This is user-visible data loss; marker sanitization should be scoped to assistant output (or gated by explicit leakage metadata) rather than applied universally.

Useful? React with 👍 / 👎.

Toolshim models emit raw markers (<|tool_calls_section_begin|>, "Using
tool:", etc.) as plain text. Previously these leaked into final assistant
output in two scenarios:

1. Streaming: markers arrived token-by-token across multiple chunks, so
   per-chunk sanitization never saw the full marker string.
2. Parse failure: when marker parsing returned zero tool calls (malformed
   markers), no sanitization ran at all.

Changes:
- Accumulate full response before toolshim post-processing instead of
  processing per-chunk, so markers spanning multiple stream chunks are
  detected and stripped.
- Add catch-all sanitize_residual_markers() applied on every exit path
  of augment_message_with_tool_calls, covering parse-failure cases.
- Sanitize markers in toolshim_postprocess error path (interpreter
  failure would previously return raw unsanitized message).
- Add has_tool_markers() detector for programmatic checking.
- Add frontend stripToolCallMarkers() in message.ts as defense-in-depth.
- Add direct tokenized and inline JSON parsers bypassing the interpreter
  for faster, more reliable tool call extraction.
- Add local llama.cpp toolshim backend (GOOSE_TOOLSHIM_BACKEND=local).
- Add truncated JSON repair in safely_parse_json() for malformed args.
- Increase main thread stack size to 8MB to prevent stack overflows.
- Add 4 regression tests for marker leakage (15 toolshim tests total).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Eugenio La Cava <eugeniolcv@gmail.com>
@eugenio force-pushed the fix/toolshim-marker-sanitization branch from 5d2c38a to 1878efd on April 4, 2026 at 10:03

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1878efd8c6


Comment on lines +275 to +277
serde_json::from_str::<Value>(input).ok().or_else(|| {
let escaped = escape_invalid_backslashes_in_json_strings(input);
serde_json::from_str::<Value>(&escaped).ok()

P1 Badge Normalize Windows-style backslashes before first JSON parse

parse_json_value_tolerant returns immediately on the initial serde_json::from_str success path, so JSON like {"path":"C:\temp\foo"} is accepted with escape decoding (\t, \f, etc.) and silently converted to control characters instead of literal backslashes. Because the backslash-repair branch never runs in that case, toolshim direct parsing can corrupt common Windows path/command arguments before execution, leading to wrong filesystem targets or failed tool calls whenever arguments include sequences such as \t, \n, or \r.

Useful? React with 👍 / 👎.

Author


This is a known trade-off. The first serde_json::from_str path succeeds because {"path":"C:\temp\foo"} is technically valid JSON — \t and \f are legal escape sequences. Running the backslash repair unconditionally would risk corrupting valid JSON that intentionally uses \t/\n/\r.

In practice, models that know these markers (Mistral, etc.) emit a properly escaped `\\` for Windows paths, as verified by the existing parses_tokenized_tool_call_with_windows_path_arguments test. Hitting this edge case would require the model to emit raw unescaped backslashes, which is a pre-existing limitation of any JSON-based tool call format, not a regression from this PR.
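The trade-off is easy to demonstrate with a std-only sketch of a backslash-repair pass. The function name echoes escape_invalid_backslashes_in_json_strings but the body is illustrative: a repair can only fix backslashes that start an *invalid* JSON escape, so `\p` in `C:\project` is repairable, while `\t` in `C:\temp` is a legal escape (tab) and no lexical pass can tell it apart from an intentional tab.

```rust
// Illustrative repair pass: double only backslashes that do NOT start a
// valid JSON escape. Name echoes the PR's helper; body is an assumption.
fn escape_invalid_backslashes(input: &str) -> String {
    const VALID: [char; 9] = ['"', '\\', '/', 'b', 'f', 'n', 'r', 't', 'u'];
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.peek() {
                Some(&n) if VALID.contains(&n) => {
                    out.push('\\');
                    out.push(n);
                    chars.next(); // consume the valid escape character
                }
                _ => out.push_str("\\\\"), // invalid or trailing: double it
            }
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    // Invalid escape gets repaired...
    assert_eq!(escape_invalid_backslashes(r"C:\project"), r"C:\\project");
    // ...but \t is untouched, because it is already valid JSON.
    assert_eq!(escape_invalid_backslashes(r"C:\temp"), r"C:\temp");
    println!("ok");
}
```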
