What Works
Right now, when using foreground agents, their tool uses (but not their thinking blocks, if there are any) are streamed back through the parent session's stream-json responses. This lets you group and track tool uses, responses, etc. via the stream and expose them to the system that uses the SDK.
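For reference, here is roughly how a consumer can use that today. This is a minimal Python sketch. It assumes each stream-json line is one JSON object whose `parent_tool_use_id` is null for the main session and is set to the spawning tool use's id for foreground-agent activity. The `toolu_A` id below is made up, and `group_by_parent` is a hypothetical helper:

```python
import json
from collections import defaultdict


def group_by_parent(lines):
    """Hypothetical helper: group stream-json messages by the agent that produced them.

    Assumes each line is one JSON object with a "parent_tool_use_id" field:
    null/absent for the main session, or the id of the tool use that spawned
    the sub-agent for that agent's activity.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        msg = json.loads(line)
        # Key the parent session's own messages under "main".
        key = msg.get("parent_tool_use_id") or "main"
        groups[key].append(msg)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        '{"type": "assistant", "parent_tool_use_id": null}',
        '{"type": "assistant", "parent_tool_use_id": "toolu_A"}',
        '{"type": "user", "parent_tool_use_id": "toolu_A"}',
    ]
    for agent, msgs in group_by_parent(sample).items():
        print(agent, [m["type"] for m in msgs])
```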
The Problem
If you use background agents instead of foreground agents, not only do their tool uses and thinking blocks fail to flow through the SDK's main session stream, but neither do SendMessage responses or the agent's final results. This means a massive lack of visibility into those background agents:
- No knowledge of whether it's stuck or done
- No knowledge of what tools it's using (or whether it's off track)
- No feedback when it completes or on what it has completed
- No visibility into messages sent to/from those agents (or even to the main session), losing the inter-agent comms benefit
These visibility losses make background agents "fire and hope" shots. This is exacerbated if the primary session compacts while the other agents are working, as it tends to stop receiving messages (or stops polling its inbox?), requiring the user to re-prompt it to check on the agents and their progress.
The Recommendation
- When spawning background agents, attach all tool uses (and possibly thinking blocks, if enabled) to stream-json messages that flow back through the main session's stream with parent_tool_use_id, as is already done for foreground agents. While this adds complexity for integrations that need to handle asynchronous tool streaming in their flows, the visibility gained would be well worth it.
- When any agent uses SendMessage, both the send AND the receive should be treated as tool uses and streamed, to show where in the message stream the main session / team lead received the message. The same applies to status checks.
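To illustrate the visibility this would buy, here is a rough Python sketch of a consumer-side tracker. It assumes that background-agent events carry `parent_tool_use_id` as proposed, that tool uses appear as `tool_use` content blocks, and that completion shows up as a `result` message tagged with the same id. The completion shape is an assumption, not documented SDK behavior, and `AgentTracker`, `AgentStatus`, and the `stale_after` threshold are all hypothetical:

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AgentStatus:
    last_seen: float = 0.0                                # time of the most recent event
    tools_used: List[str] = field(default_factory=list)   # tool names, in order
    messages: List[dict] = field(default_factory=list)    # SendMessage inputs
    done: bool = False


class AgentTracker:
    """Hypothetical per-agent status tracker fed by stream-json events.

    Assumes (per the proposal) that background-agent events carry
    parent_tool_use_id and that completion arrives as a "result" message
    with the same id. Both are assumptions, not documented SDK behavior.
    """

    def __init__(self, stale_after: float = 120.0) -> None:
        self.stale_after = stale_after
        self.agents: Dict[str, AgentStatus] = {}

    def observe(self, msg: dict, now: Optional[float] = None) -> None:
        agent_id = msg.get("parent_tool_use_id")
        if agent_id is None:
            return  # main-session events are not tracked here
        now = time.time() if now is None else now
        status = self.agents.setdefault(agent_id, AgentStatus())
        status.last_seen = now
        if msg.get("type") == "result":
            status.done = True
            return
        content = msg.get("message", {}).get("content", [])
        if not isinstance(content, list):
            return  # plain-string content has no tool_use blocks
        for block in content:
            if not isinstance(block, dict) or block.get("type") != "tool_use":
                continue
            status.tools_used.append(block.get("name"))
            if block.get("name") == "SendMessage":
                status.messages.append(block.get("input"))

    def stale(self, now: Optional[float] = None) -> List[str]:
        """Agents that have not finished and have been silent too long."""
        now = time.time() if now is None else now
        return [
            agent_id
            for agent_id, s in self.agents.items()
            if not s.done and now - s.last_seen > self.stale_after
        ]
```

With events like these on the stream, the "stuck or done" question becomes a timestamp check instead of re-prompting the lead session.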
Caveats
With the current token-chunk streaming mechanism, there's no identifier tying each chunk to the content it belongs to, so interleaved interim chunks from different agents would likely be complicated to handle. I would recommend keeping chunk-based streaming (if enabled) for the main session only, and emitting only full-message JSONs for background agent activity. This would be a fine trade-off from my PoV.
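A sketch of how an integration might route events under that trade-off, in Python. It assumes partial chunks arrive as `stream_event` messages (worth verifying against the SDK version in use) and that background agents would only ever emit complete messages; `route` is a hypothetical helper:

```python
import json
from typing import Optional, Tuple


def route(line: str) -> Optional[Tuple[str, dict]]:
    """Hypothetical router for one stream-json line under the proposed trade-off.

    Returns ("chunk", msg) for main-session token chunks, ("message", msg)
    for complete messages from any agent, or None for dropped lines.
    Assumes partial chunks use type "stream_event" (an assumption to check).
    """
    msg = json.loads(line)
    is_chunk = msg.get("type") == "stream_event"
    from_main = msg.get("parent_tool_use_id") is None
    if is_chunk:
        # Under the proposal, background agents never emit chunks;
        # any that do appear are dropped defensively.
        return ("chunk", msg) if from_main else None
    # Full messages are attributable via parent_tool_use_id, so keep them all.
    return ("message", msg)
```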