feat(diagnostics): detect empty-turn upstream responses + repro harness by steventohme · Pull Request #228 · workweave/router

steventohme · 2026-05-22T01:50:32Z

Why

Observed in prod (session bb251d00): qwen/qwen3.5-flash-02-23 returned a fully-formed but empty assistant message — message_start → message_delta(end_turn) → message_stop, zero content blocks between. The router faithfully relayed it, CC saw a properly-closed turn with nothing to render, conversation went silent.

Per-model context-window data (PR #227) confirms the model has 1M context, so it's not a window violation. Likely a provider-side issue (OpenRouter routing to a flaky upstream for that model) or a model-quality issue (Qwen flash returning EOS on certain prompt shapes). Either way, we currently have zero visibility when this happens — the structured logs say "ProxyMessages complete" with status=200 and we have to look at CC's missing UI to know anything went wrong.

This PR adds the visibility we need to root-cause it.

What's in here

1. Translator-level anomaly detection

AnthropicSSETranslator now tracks contentBlocksEmitted (excluding the routing marker block — that's not real model output) and exposes:

EmptyTurnEmitted() bool — true when a streaming turn closed with zero content blocks
WithRawUpstreamCapture(*bytes.Buffer) — opt-in tee of every upstream byte
RawUpstreamBytes() []byte — returns a copy of captured bytes

2. Proxy-side anomaly logging

After every Anthropic-format translation path (OpenAI-compat + Gemini chains), logEmptyTurnIfDetected checks the translator and emits a WARN with:

request_id, decision_model, decision_provider, decision_reason
message_count, estimated_input_tokens, request_body_bytes
When WEAVE_ROUTER_DEBUG_EMPTY_RESPONSE=true is set: upstream_raw_bytes, upstream_raw_preview (capped at 8KB), upstream_raw_truncated

Off by default — raw upstream bodies can be large and contain user content. The structured metadata log is always on; flip the env flag during active investigation.

3. Repro harness

scripts/repro-empty-response.sh extracts the most recent large inbound from docker logs router-server-1, replays it N times against the running router, and reports empty / total. Pairs with the WARN log to root-cause without tcpdump.

Tested end-to-end: replays a 465KB inbound, reports per-iteration n_blocks, stop_reason, model, and latency.

Test plan

TestAnthropicSSETranslator_EmptyTurnDetection — zero-content + stop_reason=stop → EmptyTurnEmitted() returns true; raw capture round-trips bytes
TestAnthropicSSETranslator_EmptyTurnNotFlaggedForNormalResponse — normal text turn → returns false
Full test suite: go test -tags=no_onnx ./... green
Repro script smoke-tested end-to-end

How to use

Once merged + redeployed:

# Optional: enable raw upstream capture during active investigation
export WEAVE_ROUTER_DEBUG_EMPTY_RESPONSE=true
wv mr

# In another terminal — replay last large inbound 10 times
./scripts/repro-empty-response.sh 10

# Inspect the anomaly logs
docker logs router-server-1 --since 5m 2>&1 \
  | sed -E 's/\x1b\[[0-9;]*m//g' \
  | grep "Empty assistant turn"

The empty-Qwen-turn bug (router faithfully streams message_start → message_delta(end_turn) → message_stop with zero content blocks, CC sees a closed turn with nothing to render and goes silent) needs visibility before we can root-cause it. Per-model context-window data confirmed the model has 1M context, so it's not a window violation — likely a provider-side or model-quality issue. Three pieces: 1. internal/translate/stream.go — AnthropicSSETranslator now counts content blocks emitted (excluding the routing marker) and exposes EmptyTurnEmitted() so callers can flag the pathological turn. New WithRawUpstreamCapture(*bytes.Buffer) tees every upstream byte into the caller's buffer; RawUpstreamBytes() returns a copy for logging. 2. internal/proxy/service.go — after every Anthropic-format translation path (OpenAI-compat and Gemini chains), logEmptyTurnIfDetected checks the translator and emits a WARN with request_id, decision model+provider+reason, message_count, estimated_input_tokens, and the request body size. When WEAVE_ROUTER_DEBUG_EMPTY_RESPONSE=true (off by default — bodies are large and may contain user data), the raw upstream bytes (capped at 8KB) are included so the failing response is visible from logs alone. 3. scripts/repro-empty-response.sh — replays the most recent large inbound from docker logs against the running router N times, counting empty responses and printing the model + latency per iteration. Pairs with the WARN log to root-cause without tcpdump. Tests: two new translator unit tests cover (a) zero-content + stop_reason flags as empty, (b) normal text turn does not flag.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(diagnostics): detect empty-turn upstream responses + repro harness#228

feat(diagnostics): detect empty-turn upstream responses + repro harness#228
steventohme wants to merge 1 commit into
mainfrom
steven/empty-response-diagnostics

steventohme commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

steventohme commented May 22, 2026

Why

What's in here

1. Translator-level anomaly detection

2. Proxy-side anomaly logging

3. Repro harness

Test plan

How to use

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant