-
Notifications
You must be signed in to change notification settings - Fork 756
[BUG] _fix_broken_tool_use does not repair orphaned toolUse in the last message, causing session corruption on process termination #2025
Description
Checks
- I have updated to the lastest minor and patch version of Strands
- I have checked the documentation and this is not expected behavior
- I have searched ./issues and there are no duplicates of my issue
Strands Version
1.34.0
Python Version
3.13
Operating System
macOS 26.3.1
Installation Method
git clone
Steps to Reproduce
- Set up two agents: a supervisor (B) and a worker (C), running on separate runtimes
- Configure B with a RepositorySessionManager (e.g., S3-backed)
- Have C execute a tool that takes longer than B's runtime timeout (e.g., 6 minutes with a 5-minute timeout)
- B's process is terminated while waiting for C's response
- Send a new message to B using the same session ID
Expected Behavior
The session is restored with the orphaned toolUse repaired (a toolResult with status: "error" is inserted), and the agent
processes the new user message normally.
Actual Behavior
The model provider returns a ValidationException:
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the
ConverseStream operation: The model returned the following errors: messages.N: tool_use ids were
found without tool_result blocks immediately after: tooluse_XXXX. Each tool_use
block must have a corresponding tool_result block in the next message.
Additional Context
When an agent process is terminated while waiting for a tool response (e.g., runtime timeout), the session stored in S3 ends with
an assistant message containing toolUse but no corresponding toolResult. On the next invocation, the session is restored and a new
user message is appended, resulting in a ValidationException from the model provider.
RepositorySessionManager._fix_broken_tool_use already handles orphaned toolUse messages in the middle of the conversation, but
explicitly skips the last message:
python
# Check all but the latest message in the messages array
# The latest message being orphaned is handled in the agent class
if index + 1 < len(messages):
The "agent class" handling refers to _has_tool_use_in_latest_message in event_loop_cycle, which skips model invocation and re-
executes the tool when the last message is a toolUse. However, this only works when the same agent instance continues execution
within the same process. It does not work when a new process restores the session and a new user message is appended before
entering the event loop.
Possible Solution
Remove the if index + 1 < len(messages) guard in _fix_broken_tool_use so that the last message is also repaired. When the last
message is an orphaned toolUse, append a toolResult with status: "error" using the existing generate_missing_tool_result_content
utility.
This is safe because _fix_broken_tool_use is only called during session restoration (initialize), not during the event loop. At
restoration time, the original tool execution context is already lost, so re-executing the tool is not viable — reporting the
error to the model and letting it decide how to proceed is the correct behavior.
Related Issues
No response