feat(diagnostics): detect empty-turn upstream responses + repro harness #228

Open
steventohme wants to merge 1 commit into main from steven/empty-response-diagnostics

@steventohme
Collaborator

Why

Observed in prod (session bb251d00): qwen/qwen3.5-flash-02-23 returned a well-formed but empty assistant message: message_start → message_delta(end_turn) → message_stop, with zero content blocks in between. The router faithfully relayed it, CC saw a properly closed turn with nothing to render, and the conversation went silent.

Per-model context-window data (PR #227) confirms the model has 1M context, so it's not a window violation. Likely a provider-side issue (OpenRouter routing to a flaky upstream for that model) or a model-quality issue (Qwen flash returning EOS on certain prompt shapes). Either way, we currently have zero visibility when this happens: the structured logs say "ProxyMessages complete" with status=200, and the only sign that anything went wrong is the missing output in CC's UI.

This PR adds the visibility we need to root-cause it.

What's in here

1. Translator-level anomaly detection

AnthropicSSETranslator now tracks contentBlocksEmitted (excluding the routing marker block — that's not real model output) and exposes:

  • EmptyTurnEmitted() bool — true when a streaming turn closed with zero content blocks
  • WithRawUpstreamCapture(*bytes.Buffer) — opt-in tee of every upstream byte
  • RawUpstreamBytes() []byte — returns a copy of captured bytes

2. Proxy-side anomaly logging

After every Anthropic-format translation path (OpenAI-compat + Gemini chains), logEmptyTurnIfDetected checks the translator and emits a WARN with:

  • request_id, decision_model, decision_provider, decision_reason
  • message_count, estimated_input_tokens, request_body_bytes
  • When WEAVE_ROUTER_DEBUG_EMPTY_RESPONSE=true is set: upstream_raw_bytes, upstream_raw_preview (capped at 8KB), upstream_raw_truncated

Raw capture is off by default because raw upstream bodies can be large and contain user content. The structured metadata log is always on; flip the env flag during active investigation.
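As a rough sketch of the gating and the 8KB cap, the helper below builds the WARN attributes with log/slog. The rawPreview and emptyTurnAttrs functions are hypothetical names, the field subset is abbreviated, and treating upstream_raw_bytes as a byte count is an assumption; the actual logEmptyTurnIfDetected in internal/proxy/service.go may differ.

```go
package main

import (
	"log/slog"
	"os"
)

// maxPreviewBytes mirrors the 8KB cap mentioned above.
const maxPreviewBytes = 8 * 1024

// rawPreview (hypothetical helper) truncates raw to limit bytes and reports
// whether anything was cut off.
func rawPreview(raw []byte, limit int) (string, bool) {
	if len(raw) <= limit {
		return string(raw), false
	}
	return string(raw[:limit]), true
}

// emptyTurnAttrs (hypothetical helper) assembles the log attributes. Metadata
// is always included; raw upstream fields are added only when debug is set.
func emptyTurnAttrs(requestID, model string, raw []byte, debug bool) []any {
	attrs := []any{"request_id", requestID, "decision_model", model}
	if debug {
		preview, truncated := rawPreview(raw, maxPreviewBytes)
		attrs = append(attrs,
			"upstream_raw_bytes", len(raw), // assumption: logged as a size, not the body
			"upstream_raw_preview", preview,
			"upstream_raw_truncated", truncated,
		)
	}
	return attrs
}

func main() {
	debug := os.Getenv("WEAVE_ROUTER_DEBUG_EMPTY_RESPONSE") == "true"
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	raw := []byte("event: message_stop\ndata: {}\n\n") // illustrative payload only
	logger.Warn("Empty assistant turn", emptyTurnAttrs("req-123", "example-model", raw, debug)...)
}
```

Keeping the preview logic in a pure function makes the truncation behaviour easy to unit-test without capturing log output.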

3. Repro harness

scripts/repro-empty-response.sh extracts the most recent large inbound from docker logs router-server-1, replays it N times against the running router, and reports empty / total. Pairs with the WARN log to root-cause without tcpdump.

Tested end-to-end: replays a 465KB inbound, reports per-iteration n_blocks, stop_reason, model, and latency.

Test plan

  • TestAnthropicSSETranslator_EmptyTurnDetection: zero-content + stop_reason=stop → EmptyTurnEmitted() returns true; raw capture round-trips bytes
  • TestAnthropicSSETranslator_EmptyTurnNotFlaggedForNormalResponse: normal text turn → EmptyTurnEmitted() returns false
  • Full test suite: go test -tags=no_onnx ./... green
  • Repro script smoke-tested end-to-end

How to use

Once merged + redeployed:

# Optional: enable raw upstream capture during active investigation
export WEAVE_ROUTER_DEBUG_EMPTY_RESPONSE=true
wv mr

# In another terminal — replay last large inbound 10 times
./scripts/repro-empty-response.sh 10

# Inspect the anomaly logs
docker logs router-server-1 --since 5m 2>&1 \
  | sed -E 's/\x1b\[[0-9;]*m//g' \
  | grep "Empty assistant turn"

The empty-Qwen-turn bug (router faithfully streams message_start →
message_delta(end_turn) → message_stop with zero content blocks, CC
sees a closed turn with nothing to render and goes silent) needs
visibility before we can root-cause it. Per-model context-window
data confirmed the model has 1M context, so it's not a window
violation — likely a provider-side or model-quality issue.

Three pieces:

1. internal/translate/stream.go — AnthropicSSETranslator now counts
   content blocks emitted (excluding the routing marker) and exposes
   EmptyTurnEmitted() so callers can flag the pathological turn. New
   WithRawUpstreamCapture(*bytes.Buffer) tees every upstream byte into
   the caller's buffer; RawUpstreamBytes() returns a copy for logging.

2. internal/proxy/service.go — after every Anthropic-format translation
   path (OpenAI-compat and Gemini chains), logEmptyTurnIfDetected
   checks the translator and emits a WARN with request_id, decision
   model+provider+reason, message_count, estimated_input_tokens, and
   the request body size. When WEAVE_ROUTER_DEBUG_EMPTY_RESPONSE=true
   (off by default — bodies are large and may contain user data), the
   raw upstream bytes (capped at 8KB) are included so the failing
   response is visible from logs alone.

3. scripts/repro-empty-response.sh — replays the most recent large
   inbound from docker logs against the running router N times,
   counting empty responses and printing the model + latency per
   iteration. Pairs with the WARN log to root-cause without
   tcpdump.

Tests: two new translator unit tests cover (a) zero-content +
stop_reason flags as empty, (b) normal text turn does not flag.