feat(diagnostics): detect empty-turn upstream responses + repro harness #228
Open
steventohme wants to merge 1 commit into
The empty-Qwen-turn bug (router faithfully streams message_start → message_delta(end_turn) → message_stop with zero content blocks, CC sees a closed turn with nothing to render and goes silent) needs visibility before we can root-cause it. Per-model context-window data confirmed the model has 1M context, so it's not a window violation; it is likely a provider-side or model-quality issue.
Three pieces:
1. internal/translate/stream.go: AnthropicSSETranslator now counts content blocks emitted (excluding the routing marker) and exposes EmptyTurnEmitted() so callers can flag the pathological turn. New WithRawUpstreamCapture(*bytes.Buffer) tees every upstream byte into the caller's buffer; RawUpstreamBytes() returns a copy for logging.
2. internal/proxy/service.go: after every Anthropic-format translation path (OpenAI-compat and Gemini chains), logEmptyTurnIfDetected checks the translator and emits a WARN with request_id, decision model+provider+reason, message_count, estimated_input_tokens, and the request body size. When WEAVE_ROUTER_DEBUG_EMPTY_RESPONSE=true (off by default, since bodies are large and may contain user data), the raw upstream bytes (capped at 8KB) are included so the failing response is visible from logs alone.
3. scripts/repro-empty-response.sh: replays the most recent large inbound from docker logs against the running router N times, counting empty responses and printing the model + latency per iteration. Pairs with the WARN log to root-cause without tcpdump.
Tests: two new translator unit tests cover (a) zero-content + stop_reason flags as empty, (b) normal text turn does not flag.
Why
Observed in prod (session bb251d00): qwen/qwen3.5-flash-02-23 returned a fully-formed but empty assistant message: message_start → message_delta(end_turn) → message_stop, with zero content blocks in between. The router faithfully relayed it, CC saw a properly closed turn with nothing to render, and the conversation went silent.
Per-model context-window data (PR #227) confirms the model has 1M context, so it's not a window violation. It is likely a provider-side issue (OpenRouter routing to a flaky upstream for that model) or a model-quality issue (Qwen flash returning EOS on certain prompt shapes). Either way, we currently have zero visibility when this happens: the structured logs say "ProxyMessages complete" with status=200, and we have to look at CC's missing UI to know anything went wrong.
This PR adds the visibility we need to root-cause it.
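The empty turn described above can be recognized from the event stream alone. The sketch below is an illustration under assumptions, not part of this PR: isEmptyTurn is a hypothetical helper that scans Anthropic-style SSE text, counts content_block_start events, and reports a turn as empty only if message_stop was seen. It ignores the routing marker block that the router handles separately, and bufio.Scanner's default line limit would need raising for very long data lines.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// isEmptyTurn reports whether an SSE stream closed with message_stop
// without ever emitting a content_block_start event.
// Hypothetical helper for illustration; not the PR's implementation.
func isEmptyTurn(stream string) bool {
	blocks, stopped := 0, false
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "event: ") {
			continue // data lines and blank separators are not needed here
		}
		switch strings.TrimPrefix(line, "event: ") {
		case "content_block_start":
			blocks++
		case "message_stop":
			stopped = true
		}
	}
	return stopped && blocks == 0
}

func main() {
	// Event payloads are abbreviated to {} for the sake of the example.
	empty := "event: message_start\ndata: {}\n\n" +
		"event: message_delta\ndata: {}\n\n" +
		"event: message_stop\ndata: {}\n\n"
	fmt.Println("empty turn:", isEmptyTurn(empty))
}
```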
What's in here
1. Translator-level anomaly detection
AnthropicSSETranslator now tracks contentBlocksEmitted (excluding the routing marker block, which is not real model output) and exposes:
- EmptyTurnEmitted() bool: true when a streaming turn closed with zero content blocks
- WithRawUpstreamCapture(*bytes.Buffer): opt-in tee of every upstream byte
- RawUpstreamBytes() []byte: returns a copy of captured bytes
2. Proxy-side anomaly logging
After every Anthropic-format translation path (OpenAI-compat + Gemini chains), logEmptyTurnIfDetected checks the translator and emits a WARN with:
- request_id, decision_model, decision_provider, decision_reason
- message_count, estimated_input_tokens, request_body_bytes
- When WEAVE_ROUTER_DEBUG_EMPTY_RESPONSE=true is set: upstream_raw_bytes, upstream_raw_preview (capped at 8KB), upstream_raw_truncated
The raw-bytes fields are off by default because raw upstream bodies can be large and contain user content. The structured metadata log is always on; flip the env flag during active investigation.
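As a rough picture of how the two halves above fit together, the following sketch counts content blocks while skipping the routing marker, then builds WARN fields with a capped raw preview. It is a sketch under assumptions: turnStats, preview, and emptyTurnAttrs are hypothetical names, only a subset of the fields is shown, and the use of log/slog is an assumption, since the description does not say which logger the router uses.

```go
package main

import (
	"log/slog"
	"os"
)

const maxPreviewBytes = 8 * 1024 // the 8KB cap named in the description

// turnStats is a hypothetical stand-in for the translator's bookkeeping.
type turnStats struct {
	contentBlocks int  // content blocks emitted, routing marker excluded
	closed        bool // set once the turn has been closed
}

// onContentBlock counts a block unless it is the routing marker.
func (s *turnStats) onContentBlock(isRoutingMarker bool) {
	if !isRoutingMarker {
		s.contentBlocks++
	}
}

// emptyTurn is true only for a closed turn with no real content blocks.
func (s *turnStats) emptyTurn() bool { return s.closed && s.contentBlocks == 0 }

// preview caps raw at limit bytes and reports whether it was truncated.
// The cut is byte-based, so it may split a multi-byte UTF-8 character.
func preview(raw []byte, limit int) (string, bool) {
	if len(raw) <= limit {
		return string(raw), false
	}
	return string(raw[:limit]), true
}

// emptyTurnAttrs builds log fields; raw bytes are attached only when debug is on.
func emptyTurnAttrs(requestID string, raw []byte, debug bool) []any {
	attrs := []any{"request_id", requestID}
	if debug {
		p, truncated := preview(raw, maxPreviewBytes)
		attrs = append(attrs,
			"upstream_raw_bytes", len(raw),
			"upstream_raw_preview", p,
			"upstream_raw_truncated", truncated)
	}
	return attrs
}

func main() {
	stats := &turnStats{}
	stats.onContentBlock(true) // only the routing marker was emitted
	stats.closed = true

	debug := os.Getenv("WEAVE_ROUTER_DEBUG_EMPTY_RESPONSE") == "true"
	if stats.emptyTurn() {
		// "req-123" and the raw payload are placeholder values.
		slog.Warn("empty turn from upstream",
			emptyTurnAttrs("req-123", []byte("data: [DONE]\n\n"), debug)...)
	}
}
```

Keeping the metadata fields unconditional and gating only the raw preview mirrors the split described above: the cheap signal is always logged, and the potentially sensitive payload appears only while the flag is set.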
3. Repro harness
scripts/repro-empty-response.sh extracts the most recent large inbound from docker logs router-server-1, replays it N times against the running router, and reports empty / total. Pairs with the WARN log to root-cause without tcpdump.
Tested end-to-end: replays a 465KB inbound, reports per-iteration n_blocks, stop_reason, model, and latency.
Test plan
- TestAnthropicSSETranslator_EmptyTurnDetection: zero-content + stop_reason=stop → EmptyTurnEmitted() returns true; raw capture round-trips bytes
- TestAnthropicSSETranslator_EmptyTurnNotFlaggedForNormalResponse: normal text turn → returns false
- go test -tags=no_onnx ./... green
How to use
Once merged + redeployed: