feat(proxy): per-request provider failover for multi-binding models by steventohme · Pull Request #197 · workweave/router

steventohme · 2026-05-18T18:54:07Z

Summary

When a model has an ordered Providers list in catalog (e.g. deepseek/deepseek-v4-pro on Fireworks primary / OpenRouter fallback), the proxy now retries on the next binding when the first attempt fails before any bytes reach the client.

The catalog data shape has supported this since #189 (unified model catalog); this PR adds the runtime dispatch.

What changes

providers — new UpstreamErrorResponse type for buffered error bodies, plus IsRetryableStatus (408/429/5xx) and IsRetryable(err) classifier.
providers/openaicompat — on >=400, buffer the response (capped at MaxBufferedErrorBytes = 64KB) and return *UpstreamErrorResponse without touching the client writer. Success path unchanged.
router/catalog — new AvailableBindings(id, available) []ProviderBinding returning the ordered eligible list.
proxy/fallback.go (new) — dispatchWithFallback walks the binding list, retries on IsRetryable errors when no bytes have flushed, and on exhaustion writes the final buffered upstream error verbatim to the client. firstByteGuard tracks whether anything has reached the client (the retry-safety invariant).
proxy/service.go — ProxyMessages and ProxyOpenAIChatCompletion dispatch through the helper. Per-attempt prep + translator construction lives in a closure so retries get fresh translator state. ProxyGemini left untouched (Gemini bindings are all single-provider).

Safety carve-outs

BYOK / inbound client credentials — failover skipped. Those keys bind the request to a specific upstream and would 401 elsewhere.
Single-binding models (Anthropic / OpenAI / Google) — len(bindings) == 1, loop runs once, identical to today.
Legacy "all registered" mode (deploymentKeyedProviders == nil) — also returns single-attempt to avoid retrying on providers whose keys aren't actually wired.

Retry envelope

Per scoping:

Transport errors from http.Client.Do
Buffered upstream status: 408, 429, 5xx

4xx ≠ 408/429 are the client's fault — never retried; the buffered body flushes to the client immediately so they see the real upstream error.

Observability

router.upstream OTel span gains:

dispatch.primary_provider
dispatch.final_provider
dispatch.fallback_attempts (winning binding index)
dispatch.failover_used (bool, easy alert filter)

Same fields land in the ProxyMessages / ProxyOpenAIChatCompletion completion log lines.

Response headers on successful failover: x-router-fallback-from: <primary> and x-router-fallback-attempt: <n>.

What's load-bearing

openaicompat no longer flushes error bodies through to the writer. This is what makes retry possible — once WriteHeader is called on the client writer, the response is committed and any retry would result in two responses on the wire. The post-loop flushBufferedIfPresent writes the final upstream's envelope when retries are exhausted, so the customer-visible error envelope on exhaustion is unchanged in shape (just sourced from the last attempt instead of the first).
firstByteGuard.written is the hard gate on retry. Even a retryable error type means nothing if bytes already shipped — partial SSE is on the wire and committing to that attempt is the only correct move.
Translator state is fresh per attempt. The dispatch closure constructs the translator inside its body so a retry doesn't reuse a stateful object that may have seen partial bytes.

Test plan

go build -tags=no_onnx ./... clean
go test -tags=no_onnx ./... — all 21 packages green
New tests in internal/proxy/fallback_test.go (12 cases via scripted fakeClient): transport error → retry, retryable buffered error → retry, non-retryable status → no retry, bytes-flushed → no retry, exhaustion → final buffered body flushed, single-binding passthrough, BYOK skip, catalog resolution, IsRetryable classifier table
New tests in internal/providers/openaicompat/client_test.go (3 cases): error body buffered + writer pristine, 64KB cap on body, 4xx still buffered (classifier decides retry, not adapter)
Manual smoke against a staging deploy that has both FIREWORKS_API_KEY and OPENROUTER_API_KEY wired — issue a deepseek/deepseek-v4-pro request, observe dispatch.failover_used=false baseline; then kill the Fireworks edge (or point at a 503-serving stub) and observe x-router-fallback-from: fireworks on the response

Follow-ups not in this PR

Tests for ProxyMessages / ProxyOpenAIChatCompletion end-to-end failover behavior (current tests target the helper + adapter; the service-level wiring is exercised but not asserted directly)
Provider-specific max-attempts cap if/when models grow beyond 2 bindings
firstByteGuard doesn't yet support http.ResponseController flush hints — fine today since the openaicompat success path uses httputil.StreamBody and the failure path doesn't write at all

When a model has an ordered Providers list in catalog (e.g. deepseek/deepseek-v4-pro on Fireworks primary / OpenRouter fallback), the proxy now retries on the next binding when the first attempt fails before any bytes reach the client. - providers: new UpstreamErrorResponse type for buffered error bodies, plus IsRetryableStatus (408/429/5xx) and IsRetryable classifier. - openaicompat: on >=400, buffer the response (capped at 64KB) and return *UpstreamErrorResponse without touching the client writer. Success path unchanged. - catalog: new AvailableBindings(id, available) — ordered list of the model's bindings filtered to wired providers. - proxy/fallback.go: dispatchWithFallback walks the binding list, retries on IsRetryable errors when no bytes have flushed, and on exhaustion writes the final buffered upstream error verbatim to the client. - proxy/service.go: ProxyMessages and ProxyOpenAIChatCompletion dispatch through the helper. Per-attempt prep + translator construction lives in a closure so retries get fresh translator state. Failover is skipped when BYOK or inbound client credentials are present — those keys bind the request to a specific upstream and would 401 elsewhere. OTel attrs on router.upstream span: dispatch.primary_provider, dispatch.final_provider, dispatch.fallback_attempts, dispatch.failover_used. Same fields land in the ProxyMessages/ProxyOpenAIChatCompletion completion log lines. Response headers on successful failover: x-router-fallback-from and x-router-fallback-attempt expose the route the request actually took. Tests: 12 new dispatchWithFallback / classifier cases via a scripted fakeClient (transport error, retryable buffered error, non-retryable status, bytes-flushed lockout, exhaustion flush, single-binding passthrough, BYOK skip, catalog resolution). 3 new openaicompat cases for error buffering, 64KB cap, and 4xx buffering. Full router suite green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 1c64562. Configure here.}

cursor · 2026-05-18T18:59:36Z

+				WithRoutingMarker(routingMarkerFor(routeRes)).
+				WithEstimatedInputTokens(feats.Tokens)
+			err := p.Proxy(actx, d, prep, translator, r)
+			return finalizeAfterProxy(err, translator.Finalize)


Prep body baked for primary provider, reused on fallback

High Severity

PrepareOpenAI is called once with opts.TargetProvider set to the primary provider, and the resulting prep body is captured by the attempt closure and reused on every fallback attempt. Provider-specific fields (provider hint, reasoning object, system reminder, tool-turn temperature override) are gated on targetIsOpenRouter(opts) at prep time. On fallback to a different provider, the body carries the wrong provider-specific fields. For current catalog (Fireworks primary → OpenRouter fallback on deepseek/), the fallback request to OpenRouter is missing its required provider/reasoning hints. If ordering ever reverses, the OpenRouter-specific fields would cause a 400 on Fireworks.

Additional Locations (1)

internal/proxy/service.go#L1572-L1585

^{Triggered by learned rule: Emit-layer provider-specific fields must gate on TargetProvider, not model slug}

^{Reviewed by Cursor Bugbot for commit 1c64562. Configure here.}

cursor Bot reviewed May 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(proxy): per-request provider failover for multi-binding models#197

feat(proxy): per-request provider failover for multi-binding models#197
steventohme wants to merge 1 commit into
mainfrom
feat/provider-failover-v2

steventohme commented May 18, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

steventohme commented May 18, 2026

Summary

What changes

Safety carve-outs

Retry envelope

Observability

What's load-bearing

Test plan

Follow-ups not in this PR

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 18, 2026

Choose a reason for hiding this comment

Prep body baked for primary provider, reused on fallback

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant