Summary
The loop-streaming orchestrator emits the same response text through two redundant pipelines, causing downstream consumers that read both channels to accumulate duplicated content.
Reproduction
When extended thinking with interleaved content is enabled (`interleaved-thinking-2025-05-14`), the orchestrator:
- Phase 1 (events): iterates content blocks and emits `content_block:start` + `content_block:end` with the full block text
- Phase 2 (yield): re-yields the same `response.text` token by token as `content_block:delta` events
Frontend consumers that handle both `content_block:end` (writing the full text to `message.content`) and delta events (appending tokens to `message.content`) end up with `message.content` longer than the actual response. In the observed case, 2,009 chars of actual content inflated to 2,121 chars in `message.content`.
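A minimal sketch of the failure mode, assuming simplified event shapes (the real event payloads are richer): a consumer that honors both channels accumulates the text twice.

```typescript
// Simplified event union; field names are assumptions for illustration.
type StreamEvent =
  | { type: "content_block:start" }
  | { type: "content_block:end"; text: string }
  | { type: "content_block:delta"; text: string };

const FULL_TEXT = "Here are the dashboard KPI recommendations.";

// What the orchestrator currently emits for one block:
const events: StreamEvent[] = [
  { type: "content_block:start" },
  // Phase 1: full block text delivered on :end
  { type: "content_block:end", text: FULL_TEXT },
  // Phase 2: the same text re-yielded token by token as deltas
  ...FULL_TEXT.split(" ").map((word, i) => ({
    type: "content_block:delta" as const,
    text: (i === 0 ? "" : " ") + word,
  })),
];

// A consumer that handles both channels, as the frontend does:
function consume(stream: StreamEvent[]): string {
  let content = "";
  for (const ev of stream) {
    if (ev.type === "content_block:end") content = ev.text; // write full text
    else if (ev.type === "content_block:delta") content += ev.text; // append token
  }
  return content;
}

// content ends up longer than the actual response.
const content = consume(events);
```

Here the duplication is total (the content doubles); in the observed incident only part of the overlap survived into `message.content`, but the mechanism is the same.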
Impact
The Kepler desktop frontend has a `liveTextTail` guard in `ChatMessage.tsx` that detects the mismatch between `contentParts` text and `message.content` length, then creates a phantom tail fragment. The user sees only the last ~112 characters of a full response (starting mid-word), with the rest hidden behind tool call timeline entries.
Observed: User asked for dashboard KPI recommendations. Full response was 2,009 chars with tables and analysis. User saw only: "d I'll start there. Also happy to hear if you have other specific KPIs or filters in mind that I haven't listed."
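A hypothetical reconstruction of the tail computation (the real logic lives in `ChatMessage.tsx`; names and exact behavior are assumptions): when `message.content` is longer than the text covered by `contentParts`, the excess becomes the phantom tail.

```typescript
// Assumed shape of the guard: any overhang past the parts text is a "tail".
function liveTextTail(partsText: string, messageContent: string): string | null {
  return messageContent.length > partsText.length
    ? messageContent.slice(partsText.length) // phantom fragment, can start mid-word
    : null;
}

// With 2,009 chars of real content inflated to 2,121 chars, the tail is the
// last 112 characters of the response.
const actual = "x".repeat(2009);
const inflated = actual + actual.slice(-112); // 2,121 chars, as observed
const tail = liveTextTail(actual, inflated);
```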
Expected Behavior
Each content block should be delivered through one channel, not both:
- Streaming mode: yield tokens (with `content_block:delta` for observers), then emit `content_block:end` as a finalization signal carrying metadata, not redundant content
- Non-streaming fallback: emit `content_block:start` + `content_block:end` with the full content
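The single-channel contract above can be sketched as follows (event shapes and the `charCount` metadata field are assumptions, not the module's actual API):

```typescript
// Streaming mode: text travels only via deltas; :end carries metadata only.
type BlockEvent =
  | { type: "content_block:start"; index: number }
  | { type: "content_block:delta"; index: number; text: string }
  | { type: "content_block:end"; index: number; charCount: number };

function* streamBlock(index: number, tokens: string[]): Generator<BlockEvent> {
  yield { type: "content_block:start", index };
  let chars = 0;
  for (const t of tokens) {
    chars += t.length;
    yield { type: "content_block:delta", index, text: t };
  }
  // Finalization signal: no redundant text payload.
  yield { type: "content_block:end", index, charCount: chars };
}

// A consumer that appends deltas recovers the exact text, exactly once.
let out = "";
for (const ev of streamBlock(0, ["Hello", ", ", "world"])) {
  if (ev.type === "content_block:delta") out += ev.text;
}
```

With this contract, a consumer that also reads `content_block:end` gets only metadata, so there is nothing to deduplicate.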
Architectural Note
Per the amplifier-core design philosophy ("Explicit > implicit", "No hidden state"), consumers should not need to deduplicate content from the orchestrator. The orchestrator owns delivery policy and should own delivery correctness.
Workaround
Kepler desktop is applying a temporary frontend guard: only create `liveTextTail` during active streaming (`message.isStreaming === true`), not for finalized messages.
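A sketch of that guard, with assumed field names (the actual change is in `ChatMessage.tsx`):

```typescript
interface Message {
  content: string;
  isStreaming: boolean; // true only while tokens are actively arriving
}

// Guarded tail: finalized messages never get a phantom fragment.
function guardedLiveTextTail(msg: Message, partsText: string): string | null {
  if (!msg.isStreaming) return null;
  return msg.content.length > partsText.length
    ? msg.content.slice(partsText.length)
    : null;
}

const parts = "full response text";
const finalized = { content: parts + " extra", isStreaming: false };
const streaming = { content: parts + " extra", isStreaming: true };
```

This hides the symptom for finalized messages but does not fix the duplicated delivery itself, which is why it is labeled a workaround.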
Environment
- amplifier-module-loop-streaming (latest via git)
- amplifier-module-provider-anthropic with interleaved thinking enabled
- Kepler desktop sidecar (amplifier-distro-kepler)