Skip to content

feat(api): AIN-245 routing v0 — brain wiring + §16 outcome capture#63

Merged
hizrianraz merged 4 commits into
mainfrom
feat/ain-245-routing-v0-brain
May 22, 2026
Merged

feat(api): AIN-245 routing v0 — brain wiring + §16 outcome capture#63
hizrianraz merged 4 commits into
mainfrom
feat/ain-245-routing-v0-brain

Conversation

@hizrianraz
Copy link
Copy Markdown
Contributor

@hizrianraz hizrianraz commented May 22, 2026

Summary

The ainfera-auto path now flows through the importable ainfera_routing brain (extracted into the sibling repo, pinned to 552ac1b) and writes one routing_outcomes row per decision. Implements quality-floor-then-min-cost with emergent 3-gate enrolment (price + q_prior + M_allowed).

  • Brain extracted to routing/ repo (Apache 2.0, public). api imports decide() via git source. Spark will reuse the same decision core offline for replay + training (v1+).
  • §16 outcome capture in the new routing_outcomes table — append-only, one row per routing decision (reject paths included).
  • Legacy auto_route() in ainfera_api/routing/auto.py stays but is unreachable from /v1/inference. Cleanup tracked separately.

Locked rulings (Notion v0 Build Spec §H, 2026-05-22)

Ruling What it locked
F4 routing_outcomes = authoritative §16 store; AIN-218 cols on inferences kept as denorm convenience
F5 q_prior = canonical quality signal; 5 §C anchors seeded, rest stay NULL until AIN-248
F6 Per-request policy reads from routing_hint + agent.spend_policy only; tenant_routing_policies weighted-λ stays unused
F7 Emergent gating — never a hardcoded active flag
F8 Extract-and-refactor; dispatch_inference orchestration + audit-chain ordering preserved

Migration status

Alembic 0026 has already been applied to Supabase prod (dftfpwzqxoebwzepygzl) via apply_migration before this PR opened:

  • alembic_version = 20260522_0026
  • 5 q_prior anchors seeded (opus-4-7=0.95, gpt-5-5=0.93, gemini-3-1-pro=0.90, grok-4=0.86, mistral-large-3=0.80)
  • routing_outcomes exists with RLS=ON, 5 indexes
  • Security advisors clean (only the family-standard INFO-level rls_enabled_no_policy line)

Re-running alembic upgrade head on a fresh checkout is a no-op for 0026 — all DDL is guarded with IF NOT EXISTS + IS DISTINCT FROM.

Test plan

  • 22 AIN-245 tests pass end-to-end against local Postgres mirroring prod state:
  • 454 baseline api unit tests still green
  • mypy --strict clean (routing now ships py.typed)
  • ruff check clean across both packages
  • 19 brain unit tests in routing/ repo (separate suite)

Caveats baked in

  • C1 No latency data in catalog → observed latency captured to routing_outcomes.observed_latency_ms; tiebreak degrades to q_prior only; policy.latency_cap_ms is dormant.
  • C2 Public routing/schema/routing-policy.schema.json is still weighted-λ; republish is a separate founder tap (AIN-243).

What's next

  • Railway deploy — once this merges, the brain path goes live in prod.
  • AIN-248 — catalog enrolment feeder (q_prior backfill from AA Intelligence Index v4.0 + the 6 gated brands' M_allowed verdicts).
  • AIN-246 — v1 LinUCB learning (q_empirical overrides q_prior as data accrues).

🤖 Generated with Claude Code


Note

High Risk
High risk because it changes the production ainfera-auto routing/dispatch path and introduces new persistent auditing writes (routing_outcomes) that run inside the inference transaction, affecting model selection and error handling.

Overview
ainfera-auto requests are rerouted from the legacy auto_route() logic to a new dispatch_with_brain() flow that calls the external ainfera-routing decide() core, applies policy from routing_hint + agent.spend_policy, and retries fallback candidates on upstream 5xx.

Adds persistent §16 routing decision auditing via a new append-only routing_outcomes table and write helpers (insert_decision + complete_decision) that record the full candidate set, chosen model, projected/actual cost, observed latency, and terminal outcome status (including reject paths), plus a new nullable models.q_prior quality signal seeded for anchor models via alembic 0026. Integration/unit tests are added/updated to validate deterministic decisions, reject behavior (422), veto gating, and outcome row population, and ainfera-routing is added as a pinned git dependency.

Reviewed by Cursor Bugbot for commit 5f52f2b. Bugbot is set up for automated code reviews on this repo. Configure here.

The ainfera-auto path now flows through the importable ainfera_routing
brain (extracted into the sibling routing/ repo) and writes one
routing_outcomes row per decision. Replaces the legacy weighted-λ
auto_route() against aa_intelligence_index; the old function stays in
ainfera_api/routing/auto.py for now (unreachable from the live path,
cleanup tracked separately).

What changed
- routing_brain.dispatch_with_brain orchestrates: load catalog →
  resolve_policy from request body + agent.spend_policy (F6 ruling, no
  read of tenant_routing_policies) → brain.decide() → write decision
  row → dispatch_inference loop on 5xx → complete the row.
- routing_outcomes service writes the §16 store (insert at decision,
  complete at termination — succeeded / failed_provider_error /
  failed_other / rejected_*).
- inference.py: replaces _resolve_auto_route with the brain path.
  Surfaces 422 no_candidate_clears_floor (NT1) and 503
  all_routed_models_failed; CapViolation / InsufficientFunds /
  AgentNotActive paths preserved verbatim.
- orm.py: ModelORM.q_prior numeric(3,2) + new RoutingOutcomeORM.
- alembic 0026: adds models.q_prior + seeds the 5 §C anchors
  (opus-4-7=0.95, gpt-5-5=0.93, gemini-3-1-pro=0.90, grok-4=0.86,
  mistral-large-3=0.80) + creates routing_outcomes (RLS on).
- pyproject.toml: ainfera-routing dep pinned to routing@552ac1b.

Locked rulings (Notion v0 Build Spec §H, 2026-05-22)
- F4: routing_outcomes is the authoritative §16 store; AIN-218 cols on
  inferences kept as denorm convenience for /v1/inferences/{id}
- F5: q_prior is the canonical quality signal; the 5 §C anchors are
  seeded, the rest stay NULL until AIN-248 (no fabricated priors)
- F6: per-request policy reads from routing_hint + agent.spend_policy
  only; tenant_routing_policies stays weighted-λ and unused
- F7: emergent gating — a model is a candidate only if it clears
  price + q_prior + M_allowed; never a hardcoded active flag
- F8: extract-and-refactor; services/routing.dispatch_inference
  orchestration + audit-chain ordering + tests preserved

Migration status on prod
Alembic 0026 was already applied to Supabase prod
(dftfpwzqxoebwzepygzl) via apply_migration before this PR opened.
alembic_version is at 20260522_0026; the 5 q_prior anchors are seeded;
routing_outcomes exists with RLS=ON; security advisors show only the
INFO-level rls_enabled_no_policy line for routing_outcomes, identical
to every other public.* table in the project. Re-running
`alembic upgrade head` on a fresh checkout is a no-op for 0026 (all
DDL has IF NOT EXISTS / IS DISTINCT FROM guards).

Tests
- 22 AIN-245 tests pass (6 integration + 16 unit) end-to-end against
  a local Postgres mirroring prod state
- 454 baseline api unit tests still green
- mypy --strict clean (routing now ships py.typed)
- Brain unit tests live in the routing/ repo: 19 tests, all green

Caveats baked in (not solved here)
- C1: no latency data in catalog → observed latency captured to
  routing_outcomes.observed_latency_ms; tiebreak degrades to q_prior
  only; policy.latency_cap_ms is dormant
- C2: public routing/schema/routing-policy.schema.json is still
  weighted-λ; republish is a separate founder tap (AIN-243)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 22, 2026

AIN-245 [Routing v0] Static router — q_prior + M_allowed veto + §16 capture + replay (no learning)

The buildable cut. No learning. Turns the repo demo into a real static router. Spec: v0 Build Spec

🔒 Locked 2026-05-22 — build without asking

  • Objective = quality-floor-then-min-cost. Drop candidates below the request/template min_quality bar (on q_prior), then pick the cheapest clearing it. Tie → higher q_prior, then lower latency. No weighted-λ.
  • Architecture: brain in routing, runtime in api. ainfera-ai/routing ships the decision core (q_prior + optimizer) as an importable policy engine + spec/templates/routing-policy.schema.json; ainfera-ai/api /v1/inference imports it, dispatches, writes §16. Spark Brain-Factory trains/replays offline from the same store.
  • §16 store = Supabase routing_outcomes in prod (dftfpwzqxoebwzepygzl), insert-only, exact §16 columns; DDL = first v0 migration.
  • Ontology v1.2 verified — covers every v0 term.

Build checklist

  • routing: q_prior scoring from seed table — 5 frontier models (claude-opus-4-7, gpt-5-5, gemini-3-1-pro, grok-4, mistral-large-3); engine takes N
  • routing: optimizer = M_allowed veto → budget gate → quality-floor → min-cost
  • api: import the routing engine into /v1/inference; dispatch
  • api: §16 capture → routing_outcomes (insert-only); full schema (request_id, candidates, chosen_model, q_prior_used, M_allowed_set, cost_projected, cost_actual, latency, outcome_status, policy_version, traffic_origin, fleet_agent)
  • Exact-match request cache
  • Provider fallback (timeout / 5xx / refusal → next survivor; never bill a failed route)
  • Drain-proof budgets (per-call + per-agent ceilings enforced before dispatch)
  • Deterministic replay (same input + policy_version → identical route)

Pre-build calibration (parallel; blocks ship, not code)

  • Exact $/Mtok for the 5 seed rows
  • Anchor q_prior to current public benchmarks (founder sign-off)

Done (curl-200 AND deterministic replay)

  • curl → 200 routed completion + signed audit entry
  • Determinism test green (same input + policy_version → same route)
  • Every decision in routing_outcomes w/ full §16 schema
  • Negative: over-budget route blocked
  • Negative: M_allowed=false model excluded

Review in Linear

temperature=temperature,
idempotency_key=idempotency_key,
caller_task_type=caller_task_type,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idempotency duplicates routing outcomes

High Severity

The dispatch_with_brain function inserts a new routing_outcomes record for idempotent requests even when dispatch_inference correctly returns an existing inference. This creates duplicate routing_outcomes entries for the same successful inference, which can lead to inaccurate analytics and §16 reporting.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 447bbbd. Configure here.

hint = routing_hint or {}
disallow_raw = hint.get("disallow_brands") or []
if not isinstance(disallow_raw, list) or not all(isinstance(b, str) for b in disallow_raw):
raise ValueError("routing_hint.disallow_brands must be a list of strings")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid disallow_brands returns 500

Medium Severity

A malformed routing_hint.disallow_brands raises a ValueError in routing_brain.py. This error isn't caught by the ainfera-auto handler, causing clients to receive a generic 500 internal server error instead of a structured 4xx validation response.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 447bbbd. Configure here.

temperature=temperature,
idempotency_key=idempotency_key,
caller_task_type=caller_task_type,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflicting policy on auto success

Medium Severity

The routing_brain uses the new policy resolution to stamp Section 16 fields in routing_outcomes. However, dispatch_inference (which writes to inferences) still derives these fields from legacy tenant_routing_policies. This causes a mismatch in Section 16 data between routing_outcomes and inferences for the same request.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 447bbbd. Configure here.

"policy_version": decision.ruleset_hash,
"candidate_count": len(decision.candidates),
"drop_summary": _drop_summary(decision.candidates),
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

422 policy_version wrong field

Medium Severity

The 422 error response for ainfera-auto routing, when no candidates are found, includes a policy_version in its detail body. This field currently shows an 8-character brain digest, but it should reflect the full policy_version string persisted in the routing_outcomes record. This mismatch can lead to confusion when correlating error details with audit logs.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 447bbbd. Configure here.

chosen_model_id=chosen_model_id,
cost_projected_usd=projected_cost,
seed=seed_token,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outcome policy_version legacy hash

Medium Severity

The routing_outcomes.policy_version field is generated using a legacy hashing method, which can cause it to disagree with the ruleset_hash (from the routing brain's decision) on the same outcome record. This creates an inconsistency in the recorded policy.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 447bbbd. Configure here.

idempotency_key=idempotency_key,
caller_task_type=body.task_type,
request_id=str(agent_id),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto path ignores capabilities

High Severity

ainfera-auto no longer applies routing_hint.require_capabilities or message-based capability detection. dispatch_with_brain builds candidates from active catalog rows only, so models without required capabilities (e.g. vision, tools) can be chosen while the API still documents those hints.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 447bbbd. Configure here.

temperature=temperature,
idempotency_key=idempotency_key,
caller_task_type=caller_task_type,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outcome row committed too early

High Severity

After insert_decision() only flushes the outcome row, dispatch_inference() performs db.commit() before the provider call. That commit persists the routing_outcomes insert with outcome_status still NULL, so incomplete decisions are visible and the intended single-transaction “decision + inference” coupling is broken.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 447bbbd. Configure here.

hizrianraz and others added 3 commits May 22, 2026 20:51
The CI integration job runs `alembic upgrade head` (which seeds q_prior
in migration 0026) followed by `seed_dev`. seed_dev inserts catalog
rows AFTER 0026 has already run, so 0026's seed UPDATE matches zero
rows on a fresh CI database — every model ends up with q_prior=NULL,
which makes every test land on the no_candidate_enrolled reject path.

Prod is unaffected: q_prior was applied to dftfpwzqxoebwzepygzl via
`apply_migration` after seed_dev's equivalent had already populated
the catalog, so the same UPDATE matched the existing anchor rows
there. This is purely a test-environment ordering quirk.

Fix: autouse pytest_asyncio fixture in test_routing_v0.py that pins
q_prior on the 5 §C anchors before every test (idempotent, matches
the migration's seed list). Six AIN-245 integration tests now green
end-to-end even when the catalog starts with NULL q_prior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ture

Round 2 of the CI repair. After landing the q_prior seed (commit
b840469) the CI integration job got further but five anchors now hit
the m_allowed_veto path instead of being chosen — same root cause:
seed_dev inserts the catalog rows after migration 0025's slug-pattern
brand_id backfill has already run, so anchors land with brand_id=NULL.
build_candidates treats brand_slug=None as m_allowed=None (gated).

Fix: extend the autouse fixture to also rewire brand_id by slug-pattern
on the 5 §C anchors (claude-* → anthropic, gpt-* → openai,
gemini-* → google, grok-* → xai, mistral-* → mistral).

Verified locally by wiping both q_prior AND brand_id on those rows
(`UPDATE models SET q_prior=NULL, brand_id=NULL WHERE slug IN (...)`)
then running the integration suite — 6/6 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n CI

Round 3 of the CI repair. With q_prior + brand_id wired (b3c6634), the
brain now correctly chose gemini for NT2 (mistral vetoed) and mistral
for NT3 (NULL-q_prior anchors skipped) — but the adapter registry
crashed with `RuntimeError: no API key configured for provider 'X'`
because the integration conftest only set placeholder keys for
anthropic + openai + together.

Add placeholders for gemini, mistral, and xai so all five active
brands resolve through the adapter registry. The HTTP calls themselves
remain respx-mocked; the placeholders only satisfy the at-init key
presence check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hizrianraz hizrianraz merged commit 808c419 into main May 22, 2026
4 checks passed
@hizrianraz hizrianraz deleted the feat/ain-245-routing-v0-brain branch May 22, 2026 13:59
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 9 total unresolved issues (including 7 from previous reviews).

Fix All in Cursor

Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.

Reviewed by Cursor Bugbot for commit 5f52f2b. Configure here.

db,
outcome_id=outcome_id,
outcome_status="failed_other",
observed_latency_ms=elapsed_ms,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cap refusal wrong outcome status

Medium Severity

When dispatch_inference raises CapViolationError on the brain path, complete_decision sets outcome_status to failed_other even though migration 0026 defines rejected_budget for budget-style rejections and the brain already recorded a routing decision.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5f52f2b. Configure here.

temperature=body.temperature,
idempotency_key=idempotency_key,
caller_task_type=body.task_type,
request_id=str(agent_id),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seed token not request-unique

Low Severity

The auto router passes request_id=str(agent_id), so without an Idempotency-Key the §E seed becomes req:&lt;agent_id&gt;:&lt;agent_id&gt; for every call from that agent, not a per-request replay token.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5f52f2b. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant