Skip to content

[BUG] _fix_broken_tool_use does not repair orphaned toolUse in the last message, causing session corruption on process termination #2025

@nmiyat

Description

@nmiyat

Checks

  • I have updated to the lastest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched ./issues and there are no duplicates of my issue

Strands Version

1.34.0

Python Version

3.13

Operating System

macOS 26.3.1

Installation Method

git clone

Steps to Reproduce

  1. Set up two agents: a supervisor (B) and a worker (C), running on separate runtimes
  2. Configure B with a RepositorySessionManager (e.g., S3-backed)
  3. Have C execute a tool that takes longer than B's runtime timeout (e.g., 6 minutes with a 5-minute timeout)
  4. B's process is terminated while waiting for C's response
  5. Send a new message to B using the same session ID

Expected Behavior

The session is restored with the orphaned toolUse repaired (a toolResult with status: "error" is inserted), and the agent
processes the new user message normally.

Actual Behavior

The model provider returns a ValidationException:

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the
ConverseStream operation: The model returned the following errors: messages.N: tool_use ids were
found without tool_result blocks immediately after: tooluse_XXXX. Each tool_use
block must have a corresponding tool_result block in the next message.

Additional Context

When an agent process is terminated while waiting for a tool response (e.g., runtime timeout), the session stored in S3 ends with
an assistant message containing toolUse but no corresponding toolResult. On the next invocation, the session is restored and a new
user message is appended, resulting in a ValidationException from the model provider.

RepositorySessionManager._fix_broken_tool_use already handles orphaned toolUse messages in the middle of the conversation, but
explicitly skips the last message:

python

# Check all but the latest message in the messages array
# The latest message being orphaned is handled in the agent class
if index + 1 < len(messages):

The "agent class" handling refers to _has_tool_use_in_latest_message in event_loop_cycle, which skips model invocation and re-
executes the tool when the last message is a toolUse. However, this only works when the same agent instance continues execution
within the same process. It does not work when a new process restores the session and a new user message is appended before
entering the event loop.

Possible Solution

Remove the if index + 1 < len(messages) guard in _fix_broken_tool_use so that the last message is also repaired. When the last
message is an orphaned toolUse, append a toolResult with status: "error" using the existing generate_missing_tool_result_content
utility.

This is safe because _fix_broken_tool_use is only called during session restoration (initialize), not during the event loop. At
restoration time, the original tool execution context is already lost, so re-executing the tool is not viable — reporting the
error to the model and letting it decide how to proceed is the correct behavior.

Related Issues

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions