feat(api): AIN-245 routing v0 - brain wiring + §16 outcome capture #63
The ainfera-auto path now flows through the importable ainfera_routing
brain (extracted into the sibling routing/ repo) and writes one
routing_outcomes row per decision. Replaces the legacy weighted-λ
auto_route() against aa_intelligence_index; the old function stays in
ainfera_api/routing/auto.py for now (unreachable from the live path,
cleanup tracked separately).
What changed
- routing_brain.dispatch_with_brain orchestrates: load catalog →
resolve_policy from request body + agent.spend_policy (F6 ruling, no
read of tenant_routing_policies) → brain.decide() → write decision
row → dispatch_inference loop on 5xx → complete the row.
- routing_outcomes service writes the §16 store (insert at decision,
complete at termination — succeeded / failed_provider_error /
failed_other / rejected_*).
- inference.py: replaces _resolve_auto_route with the brain path.
Surfaces 422 no_candidate_clears_floor (NT1) and 503
all_routed_models_failed; CapViolation / InsufficientFunds /
AgentNotActive paths preserved verbatim.
- orm.py: ModelORM.q_prior numeric(3,2) + new RoutingOutcomeORM.
- alembic 0026: adds models.q_prior + seeds the 5 §C anchors
(opus-4-7=0.95, gpt-5-5=0.93, gemini-3-1-pro=0.90, grok-4=0.86,
mistral-large-3=0.80) + creates routing_outcomes (RLS on).
- pyproject.toml: ainfera-routing dep pinned to routing@552ac1b.
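The orchestration described above (write a decision row, loop over candidates on 5xx, then complete the row) can be sketched roughly as follows. This is an illustration only, not the PR's code: `ProviderError`, `OutcomeRow`, `dispatch_with_fallback`, and the mapping of an exhausted candidate list to `failed_provider_error` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


class ProviderError(Exception):
    """Hypothetical upstream failure carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned {status}")
        self.status = status


@dataclass
class OutcomeRow:
    """Stand-in for one routing_outcomes row (fields are illustrative)."""
    chosen_model: str
    outcome_status: Optional[str] = None  # stays None until termination
    attempts: List[str] = field(default_factory=list)


def dispatch_with_fallback(
    ranked_models: List[str],
    call_provider: Callable[[str], str],
) -> Tuple[Optional[str], OutcomeRow]:
    """Insert the decision row, try candidates in order, complete the row."""
    if not ranked_models:
        raise ValueError("the reject path is handled before dispatch")
    row = OutcomeRow(chosen_model=ranked_models[0])
    for model in ranked_models:
        row.attempts.append(model)
        try:
            result = call_provider(model)
        except ProviderError as exc:
            if 500 <= exc.status < 600:
                continue  # upstream 5xx: fall through to the next candidate
            row.outcome_status = "failed_other"
            return None, row
        row.outcome_status = "succeeded"
        return result, row
    # Assumption: every candidate failing with 5xx maps to this status.
    row.outcome_status = "failed_provider_error"
    return None, row


def flaky(model: str) -> str:
    """Fake provider: the first model is down, the second answers."""
    if model == "model-a":
        raise ProviderError(502)
    return f"ok:{model}"


result, row = dispatch_with_fallback(["model-a", "model-b"], flaky)
```

In this sketch a single row records every attempt, which matches the "one routing_outcomes row per decision" intent rather than one row per provider call.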
Locked rulings (Notion v0 Build Spec §H, 2026-05-22)
- F4: routing_outcomes is the authoritative §16 store; AIN-218 cols on
inferences kept as denorm convenience for /v1/inferences/{id}
- F5: q_prior is the canonical quality signal; the 5 §C anchors are
seeded, the rest stay NULL until AIN-248 (no fabricated priors)
- F6: per-request policy reads from routing_hint + agent.spend_policy
only; tenant_routing_policies stays weighted-λ and unused
- F7: emergent gating — a model is a candidate only if it clears
price + q_prior + M_allowed; never a hardcoded active flag
- F8: extract-and-refactor; services/routing.dispatch_inference
orchestration + audit-chain ordering + tests preserved
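A minimal sketch of the F5/F7 rulings, assuming a simplified in-memory catalog: a model is enrolled only when price, q_prior, and M_allowed all clear, and selection is quality floor first, then minimum cost. The `Model` type, `pick_model`, the slugs, and the prices are hypothetical; only the five q_prior values come from the migration seed list.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Model:
    slug: str
    brand: Optional[str]        # None models a missing brand_id
    price_usd: Optional[float]  # hypothetical projected cost
    q_prior: Optional[float]    # None until a prior is seeded


def enrolled(m: Model, disallow_brands: FrozenSet[str]) -> bool:
    """F7: candidacy is emergent from three gates, not an active flag."""
    has_price = m.price_usd is not None
    has_prior = m.q_prior is not None
    m_allowed = m.brand is not None and m.brand not in disallow_brands
    return has_price and has_prior and m_allowed


def pick_model(
    models: Iterable[Model],
    min_quality: float,
    disallow_brands: FrozenSet[str] = frozenset(),
) -> Optional[Model]:
    """Quality floor, then min cost; ties break on higher q_prior."""
    pool = [m for m in models if enrolled(m, disallow_brands)]
    pool = [m for m in pool if m.q_prior >= min_quality]
    if not pool:
        return None  # the caller would reject the request
    return min(pool, key=lambda m: (m.price_usd, -m.q_prior, m.slug))


CATALOG = [
    Model("opus", "anthropic", 15.0, 0.95),
    Model("gpt", "openai", 10.0, 0.93),
    Model("gemini", "google", 5.0, 0.90),
    Model("grok", "xai", 6.0, 0.86),
    Model("mistral", "mistral", 2.0, 0.80),
    Model("unseeded", "other", 0.5, None),  # NULL q_prior: never enrolled
]

cheapest = pick_model(CATALOG, min_quality=0.0)
after_veto = pick_model(CATALOG, 0.0, frozenset({"mistral"}))
impossible = pick_model(CATALOG, min_quality=0.96)
```

With these made-up prices, vetoing the mistral brand moves the choice to gemini, and a floor of 0.96 leaves no candidate because the highest seeded prior is 0.95.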
Migration status on prod
Alembic 0026 was already applied to Supabase prod
(dftfpwzqxoebwzepygzl) via apply_migration before this PR opened.
alembic_version is at 20260522_0026; the 5 q_prior anchors are seeded;
routing_outcomes exists with RLS=ON; security advisors show only the
INFO-level rls_enabled_no_policy line for routing_outcomes, identical
to every other public.* table in the project. Re-running
`alembic upgrade head` on a fresh checkout is a no-op for 0026 (all
DDL has IF NOT EXISTS / IS DISTINCT FROM guards).
Tests
- 22 AIN-245 tests pass (6 integration + 16 unit) end-to-end against
a local Postgres mirroring prod state
- 454 baseline api unit tests still green
- mypy --strict clean (routing now ships py.typed)
- Brain unit tests live in the routing/ repo: 19 tests, all green
Caveats baked in (not solved here)
- C1: no latency data in catalog → observed latency captured to
routing_outcomes.observed_latency_ms; tiebreak degrades to q_prior
only; policy.latency_cap_ms is dormant
- C2: public routing/schema/routing-policy.schema.json is still
weighted-λ; republish is a separate founder tap (AIN-243)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AIN-245 [Routing v0] Static router — q_prior + M_allowed veto + §16 capture + replay (no learning)
The buildable cut. No learning. Turns the repo demo into a real static router. Spec: v0 Build Spec 🔒 Locked 2026-05-22 — build without asking
Build checklist
Pre-build calibration (parallel; blocks ship, not code)
Done (curl-200 AND deterministic replay)
    temperature=temperature,
    idempotency_key=idempotency_key,
    caller_task_type=caller_task_type,
)
Idempotency duplicates routing outcomes
High Severity
The dispatch_with_brain function inserts a new routing_outcomes record for idempotent requests even when dispatch_inference correctly returns an existing inference. This creates duplicate routing_outcomes entries for the same successful inference, which can lead to inaccurate analytics and §16 reporting.
Reviewed by Cursor Bugbot for commit 447bbbd.
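One possible shape for a fix, sketched under the assumption that outcome rows can be looked up by idempotency key: reuse the existing row instead of inserting a second one. `OutcomeStore` and its fields are hypothetical stand-ins, not the routing_outcomes service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OutcomeStore:
    """In-memory stand-in for the routing_outcomes table."""
    rows: List[dict] = field(default_factory=list)
    by_key: Dict[str, int] = field(default_factory=dict)

    def insert_decision(
        self, chosen_model: str, idempotency_key: Optional[str] = None
    ) -> int:
        # A replayed idempotent request returns the row written the first time.
        if idempotency_key is not None and idempotency_key in self.by_key:
            return self.by_key[idempotency_key]
        row_id = len(self.rows)
        self.rows.append({"id": row_id, "chosen_model": chosen_model})
        if idempotency_key is not None:
            self.by_key[idempotency_key] = row_id
        return row_id


store = OutcomeStore()
first = store.insert_decision("model-a", idempotency_key="key-1")
replay = store.insert_decision("model-a", idempotency_key="key-1")
```

Requests without a key still get one row each, so only true replays are deduplicated.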
hint = routing_hint or {}
disallow_raw = hint.get("disallow_brands") or []
if not isinstance(disallow_raw, list) or not all(isinstance(b, str) for b in disallow_raw):
    raise ValueError("routing_hint.disallow_brands must be a list of strings")
Invalid disallow_brands returns 500
Medium Severity
A malformed routing_hint.disallow_brands raises a ValueError in routing_brain.py. This error isn't caught by the ainfera-auto handler, causing clients to receive a generic 500 internal server error instead of a structured 4xx validation response.
Reviewed by Cursor Bugbot for commit 447bbbd.
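A framework-free sketch of how the validation failure could be turned into a structured 4xx instead of a bare ValueError. `RoutingHintError`, `to_error_response`, the 422 status, and the `invalid_routing_hint` code are assumptions for illustration; the handler in the PR may shape its errors differently.

```python
from typing import List, Optional, Tuple


class RoutingHintError(ValueError):
    """Hypothetical validation error that carries an HTTP status."""

    status_code = 422

    def __init__(self, field_name: str, message: str) -> None:
        super().__init__(f"{field_name} {message}")
        self.field_name = field_name


def parse_disallow_brands(routing_hint: Optional[dict]) -> List[str]:
    """Same check as the reviewed snippet, raising a typed error instead."""
    hint = routing_hint or {}
    raw = hint.get("disallow_brands") or []
    if not isinstance(raw, list) or not all(isinstance(b, str) for b in raw):
        raise RoutingHintError(
            "routing_hint.disallow_brands", "must be a list of strings"
        )
    return raw


def to_error_response(exc: RoutingHintError) -> Tuple[int, dict]:
    """What a handler could return after catching the typed error."""
    body = {
        "error": "invalid_routing_hint",  # hypothetical error code
        "field": exc.field_name,
        "message": str(exc),
    }
    return exc.status_code, body


try:
    parse_disallow_brands({"disallow_brands": "mistral"})  # string, not list
    response = None
except RoutingHintError as err:
    response = to_error_response(err)
```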
    temperature=temperature,
    idempotency_key=idempotency_key,
    caller_task_type=caller_task_type,
)
Conflicting policy on auto success
Medium Severity
The routing_brain uses the new policy resolution to stamp Section 16 fields in routing_outcomes. However, dispatch_inference (which writes to inferences) still derives these fields from legacy tenant_routing_policies. This causes a mismatch in Section 16 data between routing_outcomes and inferences for the same request.
Reviewed by Cursor Bugbot for commit 447bbbd.
    "policy_version": decision.ruleset_hash,
    "candidate_count": len(decision.candidates),
    "drop_summary": _drop_summary(decision.candidates),
},
422 policy_version wrong field
Medium Severity
The 422 error response for ainfera-auto routing, when no candidates are found, includes a policy_version in its detail body. This field currently shows an 8-character brain digest, but it should reflect the full policy_version string persisted in the routing_outcomes record. This mismatch can lead to confusion when correlating error details with audit logs.
Reviewed by Cursor Bugbot for commit 447bbbd.
    chosen_model_id=chosen_model_id,
    cost_projected_usd=projected_cost,
    seed=seed_token,
)
Outcome policy_version legacy hash
Medium Severity
The routing_outcomes.policy_version field is generated using a legacy hashing method, which can cause it to disagree with the ruleset_hash (from the routing brain's decision) on the same outcome record. This creates an inconsistency in the recorded policy.
Reviewed by Cursor Bugbot for commit 447bbbd.
    idempotency_key=idempotency_key,
    caller_task_type=body.task_type,
    request_id=str(agent_id),
)
Auto path ignores capabilities
High Severity
ainfera-auto no longer applies routing_hint.require_capabilities or message-based capability detection. dispatch_with_brain builds candidates from active catalog rows only, so models without required capabilities (e.g. vision, tools) can be chosen while the API still documents those hints.
Reviewed by Cursor Bugbot for commit 447bbbd.
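A rough sketch of the missing step, assuming each catalog row exposes a set of capability names: drop candidates that lack any required capability before the brain sees them. The dict shape, the `capabilities` key, and `filter_by_capabilities` are hypothetical.

```python
from typing import Iterable, List


def filter_by_capabilities(
    candidates: Iterable[dict], required: Iterable[str]
) -> List[dict]:
    """Keep only candidates whose capabilities cover every requirement."""
    need = set(required)
    return [c for c in candidates if need <= set(c.get("capabilities", ()))]


catalog = [
    {"slug": "model-a", "capabilities": {"vision", "tools"}},
    {"slug": "model-b", "capabilities": {"tools"}},
    {"slug": "model-c"},  # no capability data recorded
]

vision_ok = filter_by_capabilities(catalog, ["vision"])
no_requirements = filter_by_capabilities(catalog, [])
```

An empty requirement list keeps every candidate, so requests that omit `require_capabilities` would be unaffected.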
    temperature=temperature,
    idempotency_key=idempotency_key,
    caller_task_type=caller_task_type,
)
Outcome row committed too early
High Severity
After insert_decision() only flushes the outcome row, dispatch_inference() performs db.commit() before the provider call. That commit persists the routing_outcomes insert with outcome_status still NULL, so incomplete decisions are visible and the intended single-transaction “decision + inference” coupling is broken.
Reviewed by Cursor Bugbot for commit 447bbbd.
The CI integration job runs `alembic upgrade head` (which seeds q_prior in migration 0026) followed by `seed_dev`. seed_dev inserts catalog rows AFTER 0026 has already run, so 0026's seed UPDATE matches zero rows on a fresh CI database: every model ends up with q_prior=NULL, which makes every test land on the no_candidate_enrolled reject path.
Prod is unaffected: q_prior was applied to dftfpwzqxoebwzepygzl via `apply_migration` after seed_dev's equivalent had already populated the catalog, so the same UPDATE matched the existing anchor rows there. This is purely a test-environment ordering quirk.
Fix: autouse pytest_asyncio fixture in test_routing_v0.py that pins q_prior on the 5 §C anchors before every test (idempotent, matches the migration's seed list).
Six AIN-245 integration tests now green end-to-end even when the catalog starts with NULL q_prior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
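The ordering quirk can be reproduced in miniature with an in-memory SQLite table: an UPDATE-only seed run against an empty catalog touches zero rows, and rows inserted afterwards keep a NULL q_prior until the seed is applied again. This is an illustrative sketch, not the project's migration or fixture; the slugs and the `seed_q_prior` helper are made up, and SQLite stands in for Postgres.

```python
import sqlite3

# Hypothetical anchor slugs; the real seed list lives in migration 0026.
SEED = {"anchor-a": 0.95, "anchor-b": 0.80}


def seed_q_prior(conn: sqlite3.Connection) -> int:
    """Migration-style seed: UPDATE only, never INSERT. Returns rows touched."""
    touched = 0
    for slug, q in SEED.items():
        cur = conn.execute(
            "UPDATE models SET q_prior = ? WHERE slug = ?", (q, slug)
        )
        touched += cur.rowcount
    return touched


def count_null_priors(conn: sqlite3.Connection) -> int:
    sql = "SELECT COUNT(*) FROM models WHERE q_prior IS NULL"
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (slug TEXT PRIMARY KEY, q_prior REAL)")

# 1. The migration runs first, while the catalog is still empty.
rows_at_migration = seed_q_prior(conn)

# 2. The seed_dev equivalent inserts the catalog rows afterwards.
conn.executemany("INSERT INTO models (slug) VALUES (?)", [(s,) for s in SEED])
nulls_after_seed_dev = count_null_priors(conn)

# 3. A fixture-style re-seed pins the priors; running it again is harmless.
rows_at_fixture = seed_q_prior(conn)
nulls_after_fixture = count_null_priors(conn)
```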
…ture
Round 2 of the CI repair. After landing the q_prior seed (commit b840469) the CI integration job got further, but the five anchors now hit the m_allowed_veto path instead of being chosen. Same root cause: seed_dev inserts the catalog rows after migration 0025's slug-pattern brand_id backfill has already run, so anchors land with brand_id=NULL. build_candidates treats brand_slug=None as m_allowed=None (gated).
Fix: extend the autouse fixture to also rewire brand_id by slug-pattern on the 5 §C anchors (claude-* → anthropic, gpt-* → openai, gemini-* → google, grok-* → xai, mistral-* → mistral).
Verified locally by wiping both q_prior AND brand_id on those rows (`UPDATE models SET q_prior=NULL, brand_id=NULL WHERE slug IN (...)`) then running the integration suite: 6/6 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
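The slug-pattern mapping listed in the fix can be written as a small prefix lookup. The mapping itself is taken from the commit message; the `brand_for_slug` helper and the example slugs are hypothetical and only show the matching rule.

```python
from typing import Optional

# Prefix → brand, as listed in the commit message.
BRAND_BY_PREFIX = {
    "claude-": "anthropic",
    "gpt-": "openai",
    "gemini-": "google",
    "grok-": "xai",
    "mistral-": "mistral",
}


def brand_for_slug(slug: str) -> Optional[str]:
    """Return the brand for a model slug, or None when no prefix matches."""
    for prefix, brand in BRAND_BY_PREFIX.items():
        if slug.startswith(prefix):
            return brand
    return None  # unmatched slugs stay without a brand and remain gated


example = brand_for_slug("gemini-example")
unknown = brand_for_slug("llama-example")
```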
…n CI
Round 3 of the CI repair. With q_prior + brand_id wired (b3c6634), the brain now correctly chose gemini for NT2 (mistral vetoed) and mistral for NT3 (NULL-q_prior anchors skipped), but the adapter registry crashed with `RuntimeError: no API key configured for provider 'X'` because the integration conftest only set placeholder keys for anthropic + openai + together.
Add placeholders for gemini, mistral, and xai so all five active brands resolve through the adapter registry. The HTTP calls themselves remain respx-mocked; the placeholders only satisfy the at-init key presence check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 9 total unresolved issues (including 7 from previous reviews).
Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.
Reviewed by Cursor Bugbot for commit 5f52f2b.
    db,
    outcome_id=outcome_id,
    outcome_status="failed_other",
    observed_latency_ms=elapsed_ms,
Cap refusal wrong outcome status
Medium Severity
When dispatch_inference raises CapViolationError on the brain path, complete_decision sets outcome_status to failed_other even though migration 0026 defines rejected_budget for budget-style rejections and the brain already recorded a routing decision.
Reviewed by Cursor Bugbot for commit 5f52f2b.
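A small sketch of the status mapping this comment asks for, assuming the terminal status is derived from the exception type. `CapViolationError` and the status strings appear in the PR; `ProviderError`, `outcome_status_for`, and the exact mapping are assumptions.

```python
from typing import Optional


class CapViolationError(Exception):
    """Stand-in for the spend-cap refusal raised by dispatch_inference."""


class ProviderError(Exception):
    """Hypothetical upstream provider failure."""


def outcome_status_for(exc: Optional[BaseException]) -> str:
    """Map how a request terminated onto a routing_outcomes status."""
    if exc is None:
        return "succeeded"
    if isinstance(exc, CapViolationError):
        return "rejected_budget"  # budget refusal, not a generic failure
    if isinstance(exc, ProviderError):
        return "failed_provider_error"
    return "failed_other"


cap_status = outcome_status_for(CapViolationError("daily cap reached"))
```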
    temperature=body.temperature,
    idempotency_key=idempotency_key,
    caller_task_type=body.task_type,
    request_id=str(agent_id),
Seed token not request-unique
Low Severity
The auto router passes request_id=str(agent_id), so without an Idempotency-Key the §E seed becomes req:<agent_id>:<agent_id> for every call from that agent, not a per-request replay token.
Reviewed by Cursor Bugbot for commit 5f52f2b.
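One way to make the seed per-request, sketched under assumptions: generate a fresh request id when the caller supplies none, and key the seed on the Idempotency-Key when one is present so replays stay deterministic. The `req:<agent_id>:<request_id>` shape follows the comment above; the `idem:` prefix and the `seed_token` helper are hypothetical.

```python
import uuid
from typing import Optional


def seed_token(
    agent_id: str,
    idempotency_key: Optional[str] = None,
    request_id: Optional[str] = None,
) -> str:
    """Build a replay seed that is unique per request, not per agent."""
    if idempotency_key:
        # Same key, same seed: a retried request replays the same decision.
        return f"idem:{agent_id}:{idempotency_key}"
    rid = request_id or uuid.uuid4().hex  # never fall back to agent_id
    return f"req:{agent_id}:{rid}"


first_call = seed_token("agent-1")
second_call = seed_token("agent-1")
keyed = seed_token("agent-1", idempotency_key="key-9")
```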


Summary
The ainfera-auto path now flows through the importable ainfera_routing brain (extracted into the sibling repo, pinned to 552ac1b) and writes one routing_outcomes row per decision. Implements quality-floor-then-min-cost with emergent 3-gate enrolment (price + q_prior + M_allowed).
- routing/ repo (Apache 2.0, public). api imports decide() via git source. Spark will reuse the same decision core offline for replay + training (v1+).
- routing_outcomes table: append-only, one row per routing decision (reject paths included).
- auto_route() in ainfera_api/routing/auto.py stays but is unreachable from /v1/inference. Cleanup tracked separately.
Locked rulings (Notion v0 Build Spec §H, 2026-05-22)
- routing_outcomes = authoritative §16 store; AIN-218 cols on inferences kept as denorm convenience
- q_prior = canonical quality signal; 5 §C anchors seeded, rest stay NULL until AIN-248
- routing_hint + agent.spend_policy only; tenant_routing_policies weighted-λ stays unused
- active flag
- dispatch_inference orchestration + audit-chain ordering preserved
Migration status
Alembic 0026 has already been applied to Supabase prod (dftfpwzqxoebwzepygzl) via apply_migration before this PR opened:
- alembic_version = 20260522_0026
- routing_outcomes exists with RLS=ON, 5 indexes
- rls_enabled_no_policy line)
Re-running alembic upgrade head on a fresh checkout is a no-op for 0026: all DDL is guarded with IF NOT EXISTS + IS DISTINCT FROM.
Test plan
- /v1/inference → routing_outcomes row written
- min_quality=0.96) → 422 no_candidate_clears_floor
- disallow_brands=['mistral'] → gemini wins; veto reason captured
- q_prior never chosen
- py.typed)
- routing/ repo (separate suite)
Caveats baked in
- routing_outcomes.observed_latency_ms; tiebreak degrades to q_prior only; policy.latency_cap_ms is dormant.
- routing/schema/routing-policy.schema.json is still weighted-λ; republish is a separate founder tap (AIN-243).
What's next
- q_empirical overrides q_prior as data accrues).
🤖 Generated with Claude Code
Note
High Risk
High risk because it changes the production ainfera-auto routing/dispatch path and introduces new persistent auditing writes (routing_outcomes) that run inside the inference transaction, affecting model selection and error handling.
Overview
ainfera-auto requests are rerouted from the legacy auto_route() logic to a new dispatch_with_brain() flow that calls the external ainfera-routing decide() core, applies policy from routing_hint + agent.spend_policy, and retries fallback candidates on upstream 5xx.
Adds persistent §16 routing decision auditing via a new append-only routing_outcomes table and write helpers (insert_decision + complete_decision) that record the full candidate set, chosen model, projected/actual cost, observed latency, and terminal outcome status (including reject paths), plus a new nullable models.q_prior quality signal seeded for anchor models via alembic 0026. Integration/unit tests are added/updated to validate deterministic decisions, reject behavior (422), veto gating, and outcome row population, and ainfera-routing is added as a pinned git dependency.
Reviewed by Cursor Bugbot for commit 5f52f2b.