feat(api): AIN-245 routing v0 — brain wiring + §16 outcome capture by hizrianraz · Pull Request #63 · ainfera-ai/api

hizrianraz · 2026-05-22T13:43:13Z

Summary

The ainfera-auto path now flows through the importable ainfera_routing brain (extracted into the sibling repo, pinned to 552ac1b) and writes one routing_outcomes row per decision. Implements quality-floor-then-min-cost with emergent 3-gate enrolment (price + q_prior + M_allowed).

Brain extracted to routing/ repo (Apache 2.0, public). api imports decide() via git source. Spark will reuse the same decision core offline for replay + training (v1+).
§16 outcome capture in the new routing_outcomes table — append-only, one row per routing decision (reject paths included).
Legacy auto_route() in ainfera_api/routing/auto.py stays but is unreachable from /v1/inference. Cleanup tracked separately.
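
The "quality-floor-then-min-cost with emergent 3-gate enrolment" rule summarized above can be sketched roughly as follows. This is an illustrative reading of the description, not the actual `ainfera_routing.decide()` code: the `Candidate` shape, the `is_enrolled` helper, the function signature, and the slug tiebreak are all assumptions; only the reject reasons and the q_prior tiebreak come from this PR's text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    # Hypothetical shape; the real brain's candidate type may differ.
    slug: str
    price_usd: Optional[float]  # None: no price in the catalog (gate 1 fails)
    q_prior: Optional[float]    # None: no quality prior seeded (gate 2 fails)
    m_allowed: Optional[bool]   # None or False: gated or vetoed (gate 3 fails)


def is_enrolled(c: Candidate) -> bool:
    # Emergent gating: enrolment falls out of the three gates,
    # there is no hardcoded "active" flag.
    return c.price_usd is not None and c.q_prior is not None and c.m_allowed is True


def decide(candidates: Sequence[Candidate], quality_floor: float) -> Tuple[Optional[Candidate], str]:
    pool = [c for c in candidates if is_enrolled(c)]
    if not pool:
        return None, "no_candidate_enrolled"
    cleared = [c for c in pool if c.q_prior >= quality_floor]
    if not cleared:
        return None, "no_candidate_clears_floor"
    # Cheapest first; ties go to the higher q_prior, then to the slug so the
    # result is deterministic (the slug tiebreak is an assumption).
    best = min(cleared, key=lambda c: (c.price_usd, -c.q_prior, c.slug))
    return best, "chosen"


catalog = [
    Candidate("premium", 10.0, 0.95, True),
    Candidate("budget", 2.0, 0.80, True),
    Candidate("no-prior", 1.0, None, True),
    Candidate("gated", 0.5, 0.90, None),
]
chosen, reason = decide(catalog, quality_floor=0.85)
print(chosen.slug if chosen else None, reason)
```

Note that the cheapest rows in this toy catalog never win: they fail a gate before cost is even considered, which is the point of applying the floor and the gates ahead of the min-cost step.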

Locked rulings (Notion v0 Build Spec §H, 2026-05-22)

| Ruling | What it locked |
| --- | --- |
| F4 | `routing_outcomes` = authoritative §16 store; AIN-218 cols on `inferences` kept as denorm convenience |
| F5 | `q_prior` = canonical quality signal; 5 §C anchors seeded, rest stay NULL until AIN-248 |
| F6 | Per-request policy reads from `routing_hint` + `agent.spend_policy` only; `tenant_routing_policies` weighted-λ stays unused |
| F7 | Emergent gating; never a hardcoded `active` flag |
| F8 | Extract-and-refactor; `dispatch_inference` orchestration + audit-chain ordering preserved |

Migration status

Alembic 0026 has already been applied to Supabase prod (dftfpwzqxoebwzepygzl) via apply_migration before this PR opened:

alembic_version = 20260522_0026
5 q_prior anchors seeded (opus-4-7=0.95, gpt-5-5=0.93, gemini-3-1-pro=0.90, grok-4=0.86, mistral-large-3=0.80)
routing_outcomes exists with RLS=ON, 5 indexes
Security advisors clean (only the family-standard INFO-level rls_enabled_no_policy line)

Re-running alembic upgrade head on a fresh checkout is a no-op for 0026 — all DDL is guarded with IF NOT EXISTS + IS DISTINCT FROM.
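
To illustrate why a guarded migration is a no-op on the second run, here is a small stand-in using Python's built-in `sqlite3`. It is not migration 0026 itself: the table layouts and the `apply_guarded_migration` name are assumptions, the anchor slugs are taken from the labels above, and SQLite's `IS NOT` is used as the null-safe equivalent of PostgreSQL's `IS DISTINCT FROM`.

```python
import sqlite3

# Anchor values copied from the PR description; using them as slugs is an assumption.
ANCHOR_PRIORS = {
    "opus-4-7": 0.95,
    "gpt-5-5": 0.93,
    "gemini-3-1-pro": 0.90,
    "grok-4": 0.86,
    "mistral-large-3": 0.80,
}


def apply_guarded_migration(conn: sqlite3.Connection) -> int:
    """Run the guarded DDL + seed and return how many rows were changed."""
    # Guard 1: creating the table twice is harmless.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS routing_outcomes "
        "(id INTEGER PRIMARY KEY, outcome_status TEXT)"
    )
    changed = 0
    for slug, prior in ANCHOR_PRIORS.items():
        # Guard 2: only touch rows whose value differs (NULL counts as different).
        cur = conn.execute(
            "UPDATE models SET q_prior = ? WHERE slug = ? AND q_prior IS NOT ?",
            (prior, slug, prior),
        )
        changed += cur.rowcount
    conn.commit()
    return changed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (slug TEXT PRIMARY KEY, q_prior REAL)")
conn.executemany("INSERT INTO models (slug) VALUES (?)", [(s,) for s in ANCHOR_PRIORS])

first_run = apply_guarded_migration(conn)
second_run = apply_guarded_migration(conn)
print(first_run, second_run)
```

The same sketch also shows the failure mode that matters for ordering: if the `models` rows do not exist yet when the seed runs, the UPDATE matches nothing and the priors stay NULL.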

Test plan

Caveats baked in

C1 No latency data in catalog → observed latency captured to routing_outcomes.observed_latency_ms; tiebreak degrades to q_prior only; policy.latency_cap_ms is dormant.
C2 Public routing/schema/routing-policy.schema.json is still weighted-λ; republish is a separate founder tap (AIN-243).

What's next

Railway deploy — once this merges, the brain path goes live in prod.
AIN-248 — catalog enrolment feeder (q_prior backfill from AA Intelligence Index v4.0 + the 6 gated brands' M_allowed verdicts).
AIN-246 — v1 LinUCB learning (q_empirical overrides q_prior as data accrues).

🤖 Generated with Claude Code

Note

High Risk
High risk because it changes the production ainfera-auto routing/dispatch path and introduces new persistent auditing writes (routing_outcomes) that run inside the inference transaction, affecting model selection and error handling.

Overview
ainfera-auto requests are rerouted from the legacy auto_route() logic to a new dispatch_with_brain() flow that calls the external ainfera-routing decide() core, applies policy from routing_hint + agent.spend_policy, and retries fallback candidates on upstream 5xx.

Adds persistent §16 routing decision auditing via a new append-only routing_outcomes table and write helpers (insert_decision + complete_decision) that record the full candidate set, chosen model, projected/actual cost, observed latency, and terminal outcome status (including reject paths), plus a new nullable models.q_prior quality signal seeded for anchor models via alembic 0026. Integration/unit tests are added/updated to validate deterministic decisions, reject behavior (422), veto gating, and outcome row population, and ainfera-routing is added as a pinned git dependency.

Reviewed by Cursor Bugbot for commit 5f52f2b. Bugbot is set up for automated code reviews on this repo.
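
The dispatch flow described in the overview (write a decision row, try candidates in order, fall through on upstream 5xx, then complete the row) might look roughly like this. It is a simplified in-memory sketch: `OutcomeLog`, `UpstreamError`, and `dispatch_with_fallback` are hypothetical stand-ins, not the real `insert_decision` / `complete_decision` helpers or `dispatch_with_brain`. Only the status strings come from the PR text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class UpstreamError(Exception):
    """Hypothetical provider error carrying an HTTP status."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status


@dataclass
class OutcomeLog:
    """In-memory stand-in for the append-only routing_outcomes table."""

    rows: List[dict] = field(default_factory=list)

    def insert_decision(self, chosen: str, candidates: List[str]) -> int:
        self.rows.append(
            {"chosen": chosen, "candidates": list(candidates), "outcome_status": None}
        )
        return len(self.rows) - 1

    def complete_decision(self, outcome_id: int, status: str, served_by: Optional[str] = None) -> None:
        self.rows[outcome_id].update(outcome_status=status, served_by=served_by)


def dispatch_with_fallback(ranked: List[str], call_provider: Callable[[str], dict], log: OutcomeLog):
    # Assumes `ranked` is non-empty; reject paths are handled before this point.
    outcome_id = log.insert_decision(ranked[0], ranked)
    for slug in ranked:
        try:
            result = call_provider(slug)
        except UpstreamError as exc:
            if 500 <= exc.status < 600:
                continue  # upstream 5xx: try the next candidate
            log.complete_decision(outcome_id, "failed_other")
            raise
        log.complete_decision(outcome_id, "succeeded", served_by=slug)
        return 200, result
    log.complete_decision(outcome_id, "failed_provider_error")
    return 503, {"error": "all_routed_models_failed"}


def flaky_provider(slug: str) -> dict:
    if slug == "model-a":
        raise UpstreamError(502)
    return {"served_by": slug}


log = OutcomeLog()
status, body = dispatch_with_fallback(["model-a", "model-b"], flaky_provider, log)
print(status, body, log.rows[0]["outcome_status"])
```

One row is written per decision regardless of how many candidates are tried, so the fallback attempts do not multiply audit rows.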

The ainfera-auto path now flows through the importable ainfera_routing brain (extracted into the sibling routing/ repo) and writes one routing_outcomes row per decision. Replaces the legacy weighted-λ auto_route() against aa_intelligence_index; the old function stays in ainfera_api/routing/auto.py for now (unreachable from the live path, cleanup tracked separately).

What changed
- routing_brain.dispatch_with_brain orchestrates: load catalog → resolve_policy from request body + agent.spend_policy (F6 ruling, no read of tenant_routing_policies) → brain.decide() → write decision row → dispatch_inference loop on 5xx → complete the row.
- routing_outcomes service writes the §16 store (insert at decision, complete at termination: succeeded / failed_provider_error / failed_other / rejected_*).
- inference.py: replaces _resolve_auto_route with the brain path. Surfaces 422 no_candidate_clears_floor (NT1) and 503 all_routed_models_failed; CapViolation / InsufficientFunds / AgentNotActive paths preserved verbatim.
- orm.py: ModelORM.q_prior numeric(3,2) + new RoutingOutcomeORM.
- alembic 0026: adds models.q_prior + seeds the 5 §C anchors (opus-4-7=0.95, gpt-5-5=0.93, gemini-3-1-pro=0.90, grok-4=0.86, mistral-large-3=0.80) + creates routing_outcomes (RLS on).
- pyproject.toml: ainfera-routing dep pinned to routing@552ac1b.

Locked rulings (Notion v0 Build Spec §H, 2026-05-22)
- F4: routing_outcomes is the authoritative §16 store; AIN-218 cols on inferences kept as denorm convenience for /v1/inferences/{id}
- F5: q_prior is the canonical quality signal; the 5 §C anchors are seeded, the rest stay NULL until AIN-248 (no fabricated priors)
- F6: per-request policy reads from routing_hint + agent.spend_policy only; tenant_routing_policies stays weighted-λ and unused
- F7: emergent gating: a model is a candidate only if it clears price + q_prior + M_allowed; never a hardcoded active flag
- F8: extract-and-refactor; services/routing.dispatch_inference orchestration + audit-chain ordering + tests preserved

Migration status on prod
Alembic 0026 was already applied to Supabase prod (dftfpwzqxoebwzepygzl) via apply_migration before this PR opened. alembic_version is at 20260522_0026; the 5 q_prior anchors are seeded; routing_outcomes exists with RLS=ON; security advisors show only the INFO-level rls_enabled_no_policy line for routing_outcomes, identical to every other public.* table in the project. Re-running `alembic upgrade head` on a fresh checkout is a no-op for 0026 (all DDL has IF NOT EXISTS / IS DISTINCT FROM guards).

Tests
- 22 AIN-245 tests pass (6 integration + 16 unit) end-to-end against a local Postgres mirroring prod state
- 454 baseline api unit tests still green
- mypy --strict clean (routing now ships py.typed)
- Brain unit tests live in the routing/ repo: 19 tests, all green

Caveats baked in (not solved here)
- C1: no latency data in catalog → observed latency captured to routing_outcomes.observed_latency_ms; tiebreak degrades to q_prior only; policy.latency_cap_ms is dormant
- C2: public routing/schema/routing-policy.schema.json is still weighted-λ; republish is a separate founder tap (AIN-243)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

linear-code · 2026-05-22T13:43:16Z

cursor · 2026-05-22T13:49:51Z

+                temperature=temperature,
+                idempotency_key=idempotency_key,
+                caller_task_type=caller_task_type,
+            )


Idempotency duplicates routing outcomes

High Severity

The dispatch_with_brain function inserts a new routing_outcomes record for idempotent requests even when dispatch_inference correctly returns an existing inference. This creates duplicate routing_outcomes entries for the same successful inference, which can lead to inaccurate analytics and §16 reporting.

Reviewed by Cursor Bugbot for commit 447bbbd.
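
One possible shape for a fix, sketched with in-memory stand-ins: check for an existing inference under the idempotency key before writing the outcome row, so a replayed request returns the stored result without a second row. The `dispatch_once` function and the dict/list stores are hypothetical; the real lookup would go through the database and `dispatch_inference`.

```python
from typing import Callable, Dict, List, Optional


def dispatch_once(
    idempotency_key: Optional[str],
    inferences: Dict[str, dict],
    outcomes: List[dict],
    run: Callable[[], dict],
) -> dict:
    # Replay: an inference already exists for this key, so return it
    # and do NOT append another routing outcome.
    if idempotency_key is not None and idempotency_key in inferences:
        return inferences[idempotency_key]

    # First time through: record the decision, run the inference, complete the row.
    outcomes.append({"idempotency_key": idempotency_key, "outcome_status": None})
    result = run()
    outcomes[-1]["outcome_status"] = "succeeded"
    if idempotency_key is not None:
        inferences[idempotency_key] = result
    return result


calls = []


def fake_inference() -> dict:
    calls.append(1)
    return {"inference_id": f"inf-{len(calls)}"}


inferences: Dict[str, dict] = {}
outcomes: List[dict] = []
first = dispatch_once("key-1", inferences, outcomes, fake_inference)
second = dispatch_once("key-1", inferences, outcomes, fake_inference)
print(first == second, len(outcomes), len(calls))
```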

cursor · 2026-05-22T13:49:51Z

+    hint = routing_hint or {}
+    disallow_raw = hint.get("disallow_brands") or []
+    if not isinstance(disallow_raw, list) or not all(isinstance(b, str) for b in disallow_raw):
+        raise ValueError("routing_hint.disallow_brands must be a list of strings")


Invalid disallow_brands returns 500

Medium Severity

A malformed routing_hint.disallow_brands raises a ValueError in routing_brain.py. This error isn't caught by the ainfera-auto handler, causing clients to receive a generic 500 internal server error instead of a structured 4xx validation response.

Reviewed by Cursor Bugbot for commit 447bbbd.
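
A framework-neutral sketch of how the validation failure could surface as a structured 4xx instead of a 500. The dedicated `RoutingHintError` type, the `handle` wrapper, the choice of 422, and the `invalid_routing_hint` error code are all assumptions for illustration; the validation logic mirrors the diff hunk above.

```python
from typing import List, Optional, Tuple


class RoutingHintError(ValueError):
    """Hypothetical error type so the handler can tell bad input from bugs."""


def parse_disallow_brands(routing_hint: Optional[dict]) -> List[str]:
    hint = routing_hint or {}
    raw = hint.get("disallow_brands") or []
    if not isinstance(raw, list) or not all(isinstance(b, str) for b in raw):
        raise RoutingHintError("routing_hint.disallow_brands must be a list of strings")
    return raw


def handle(routing_hint: Optional[dict]) -> Tuple[int, dict]:
    # Stand-in for the ainfera-auto handler: translate the validation
    # error into a structured client error rather than letting it escape.
    try:
        brands = parse_disallow_brands(routing_hint)
    except RoutingHintError as exc:
        return 422, {"error": "invalid_routing_hint", "detail": str(exc)}
    return 200, {"disallow_brands": brands}


print(handle({"disallow_brands": "mistral"}))
print(handle({"disallow_brands": ["mistral"]}))
```

Catching a narrow subclass rather than bare `ValueError` keeps unrelated `ValueError`s from being misreported as client mistakes.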

cursor · 2026-05-22T13:49:51Z

+                temperature=temperature,
+                idempotency_key=idempotency_key,
+                caller_task_type=caller_task_type,
+            )


Conflicting policy on auto success

Medium Severity

The routing_brain uses the new policy resolution to stamp Section 16 fields in routing_outcomes. However, dispatch_inference (which writes to inferences) still derives these fields from legacy tenant_routing_policies. This causes a mismatch in Section 16 data between routing_outcomes and inferences for the same request.

Reviewed by Cursor Bugbot for commit 447bbbd.

cursor · 2026-05-22T13:49:51Z

+                    "policy_version": decision.ruleset_hash,
+                    "candidate_count": len(decision.candidates),
+                    "drop_summary": _drop_summary(decision.candidates),
+                },


422 policy_version wrong field

Medium Severity

The 422 error response for ainfera-auto routing, when no candidates are found, includes a policy_version in its detail body. This field currently shows an 8-character brain digest, but it should reflect the full policy_version string persisted in the routing_outcomes record. This mismatch can lead to confusion when correlating error details with audit logs.

Reviewed by Cursor Bugbot for commit 447bbbd.

cursor · 2026-05-22T13:49:51Z

+        chosen_model_id=chosen_model_id,
+        cost_projected_usd=projected_cost,
+        seed=seed_token,
+    )


Outcome policy_version legacy hash

Medium Severity

The routing_outcomes.policy_version field is generated using a legacy hashing method, which can cause it to disagree with the ruleset_hash (from the routing brain's decision) on the same outcome record. This creates an inconsistency in the recorded policy.

Reviewed by Cursor Bugbot for commit 447bbbd.

cursor · 2026-05-22T13:49:51Z

+                idempotency_key=idempotency_key,
+                caller_task_type=body.task_type,
+                request_id=str(agent_id),
+            )


Auto path ignores capabilities

High Severity

ainfera-auto no longer applies routing_hint.require_capabilities or message-based capability detection. dispatch_with_brain builds candidates from active catalog rows only, so models without required capabilities (e.g. vision, tools) can be chosen while the API still documents those hints.

Reviewed by Cursor Bugbot for commit 447bbbd.
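
A capability pre-filter of the kind this comment asks for could be as small as a subset check applied before candidates reach the brain. The row shape (`slug` plus a `capabilities` list) and the `filter_by_capabilities` name are assumptions; the PR does not show how capabilities are stored in the catalog.

```python
from typing import Iterable, List


def filter_by_capabilities(models: List[dict], required: Iterable[str]) -> List[dict]:
    """Keep only the models that advertise every required capability."""
    need = frozenset(required)
    return [m for m in models if need <= frozenset(m["capabilities"])]


# Hypothetical catalog rows for illustration only.
catalog = [
    {"slug": "model-a", "capabilities": ["vision", "tools"]},
    {"slug": "model-b", "capabilities": ["tools"]},
    {"slug": "model-c", "capabilities": []},
]

vision_capable = filter_by_capabilities(catalog, ["vision"])
print([m["slug"] for m in vision_capable])
```

With an empty requirement the filter keeps every row, so requests that carry no `require_capabilities` hint would be unaffected.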

cursor · 2026-05-22T13:49:51Z

+                temperature=temperature,
+                idempotency_key=idempotency_key,
+                caller_task_type=caller_task_type,
+            )


Outcome row committed too early

High Severity

After insert_decision() only flushes the outcome row, dispatch_inference() performs db.commit() before the provider call. That commit persists the routing_outcomes insert with outcome_status still NULL, so incomplete decisions are visible and the intended single-transaction “decision + inference” coupling is broken.

Reviewed by Cursor Bugbot for commit 447bbbd.

The CI integration job runs `alembic upgrade head` (which seeds q_prior in migration 0026) followed by `seed_dev`. seed_dev inserts catalog rows AFTER 0026 has already run, so 0026's seed UPDATE matches zero rows on a fresh CI database: every model ends up with q_prior=NULL, which makes every test land on the no_candidate_enrolled reject path.

Prod is unaffected: q_prior was applied to dftfpwzqxoebwzepygzl via `apply_migration` after seed_dev's equivalent had already populated the catalog, so the same UPDATE matched the existing anchor rows there. This is purely a test-environment ordering quirk.

Fix: autouse pytest_asyncio fixture in test_routing_v0.py that pins q_prior on the 5 §C anchors before every test (idempotent, matches the migration's seed list). Six AIN-245 integration tests now green end-to-end even when the catalog starts with NULL q_prior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ture

Round 2 of the CI repair. After landing the q_prior seed (commit b840469) the CI integration job got further but five anchors now hit the m_allowed_veto path instead of being chosen. Same root cause: seed_dev inserts the catalog rows after migration 0025's slug-pattern brand_id backfill has already run, so anchors land with brand_id=NULL. build_candidates treats brand_slug=None as m_allowed=None (gated).

Fix: extend the autouse fixture to also rewire brand_id by slug-pattern on the 5 §C anchors (claude-* → anthropic, gpt-* → openai, gemini-* → google, grok-* → xai, mistral-* → mistral).

Verified locally by wiping both q_prior AND brand_id on those rows (`UPDATE models SET q_prior=NULL, brand_id=NULL WHERE slug IN (...)`) then running the integration suite: 6/6 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
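
The slug-pattern mapping listed in this commit message can be expressed as a small lookup. This is only a sketch of the mapping itself; the `brand_for_slug` name is hypothetical, and the real fixture presumably resolves the brand slug to a `brand_id` row in the database, which is not shown here.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns and brand slugs as listed in the commit message.
BRAND_PATTERNS = [
    ("claude-*", "anthropic"),
    ("gpt-*", "openai"),
    ("gemini-*", "google"),
    ("grok-*", "xai"),
    ("mistral-*", "mistral"),
]


def brand_for_slug(model_slug: str) -> Optional[str]:
    """Return the brand slug for a model slug, or None when nothing matches."""
    for pattern, brand in BRAND_PATTERNS:
        if fnmatchcase(model_slug, pattern):
            return brand
    return None


print(brand_for_slug("grok-4"), brand_for_slug("unknown-model"))
```

A `None` result corresponds to the situation described above: a row with no brand is treated as gated rather than allowed.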

…n CI

Round 3 of the CI repair. With q_prior + brand_id wired (b3c6634), the brain now correctly chose gemini for NT2 (mistral vetoed) and mistral for NT3 (NULL-q_prior anchors skipped), but the adapter registry crashed with `RuntimeError: no API key configured for provider 'X'` because the integration conftest only set placeholder keys for anthropic + openai + together.

Add placeholders for gemini, mistral, and xai so all five active brands resolve through the adapter registry. The HTTP calls themselves remain respx-mocked; the placeholders only satisfy the at-init key presence check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 9 total unresolved issues (including 7 from previous reviews).

Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.

Reviewed by Cursor Bugbot for commit 5f52f2b.

cursor · 2026-05-22T14:04:00Z

+                db,
+                outcome_id=outcome_id,
+                outcome_status="failed_other",
+                observed_latency_ms=elapsed_ms,


Cap refusal wrong outcome status

Medium Severity

When dispatch_inference raises CapViolationError on the brain path, complete_decision sets outcome_status to failed_other even though migration 0026 defines rejected_budget for budget-style rejections and the brain already recorded a routing decision.

Reviewed by Cursor Bugbot for commit 5f52f2b.
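
One way to address this is to derive the terminal status from the exception type rather than defaulting to `failed_other`. A minimal sketch, assuming a local stand-in for `CapViolationError` and a hypothetical `ProviderError`; the status strings are the ones named in this PR.

```python
class CapViolationError(Exception):
    """Stand-in for the API's cap-violation error."""


class ProviderError(Exception):
    """Hypothetical error for upstream provider failures."""


def outcome_status_for(exc: BaseException) -> str:
    # Budget-style refusals get their own status so §16 reporting can
    # separate "we refused" from "something broke".
    if isinstance(exc, CapViolationError):
        return "rejected_budget"
    if isinstance(exc, ProviderError):
        return "failed_provider_error"
    return "failed_other"


print(outcome_status_for(CapViolationError("daily cap exceeded")))
```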

cursor · 2026-05-22T14:04:00Z

+                temperature=body.temperature,
+                idempotency_key=idempotency_key,
+                caller_task_type=body.task_type,
+                request_id=str(agent_id),


Seed token not request-unique

Low Severity

The auto router passes request_id=str(agent_id), so without an Idempotency-Key the §E seed becomes req:<agent_id>:<agent_id> for every call from that agent, not a per-request replay token.

Additional Locations (1)

ainfera_api/services/routing_brain.py#L254-L264

Reviewed by Cursor Bugbot for commit 5f52f2b.
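
A sketch of a seed builder that stays unique per request when no Idempotency-Key is supplied. The `req:<agent_id>:<request_id>` shape follows the format quoted in this comment; the `idem:` prefix for the keyed case and the `build_seed` name are assumptions, since the §E seed spec is not reproduced in this PR.

```python
import uuid
from typing import Optional


def build_seed(agent_id: str, idempotency_key: Optional[str] = None, request_id: Optional[str] = None) -> str:
    if idempotency_key:
        # Keyed requests replay deterministically under the same key
        # (the "idem:" prefix is an assumption).
        return f"idem:{agent_id}:{idempotency_key}"
    # Unkeyed requests get a fresh per-request id instead of reusing agent_id.
    rid = request_id or uuid.uuid4().hex
    return f"req:{agent_id}:{rid}"


seed_a = build_seed("agent-1")
seed_b = build_seed("agent-1")
print(seed_a != seed_b, build_seed("agent-1", idempotency_key="k1"))
```

For replay to work, the generated request id would need to be persisted alongside the outcome row; that part is outside this sketch.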

hizrianraz and others added 3 commits May 22, 2026 20:51

hizrianraz merged commit 808c419 into main May 22, 2026
4 checks passed

hizrianraz deleted the feat/ain-245-routing-v0-brain branch May 22, 2026 13:59

hizrianraz mentioned this pull request May 22, 2026

fix(api): install git in Dockerfile so uv can fetch ainfera-routing #64

Merged

3 tasks

linear-code Bot commented May 22, 2026 • edited

🔒 Locked 2026-05-22 — build without asking

Build checklist

Pre-build calibration (parallel; blocks ship, not code)

Done (curl-200 AND deterministic replay)
