feat(api): AIN-116 Piece #1 · provider_model_name column + routing fallback #44

Merged: hizrianraz merged 1 commit into main from feat/ain-116-piece-1-provider-model-name-column on May 18, 2026

@hizrianraz
Contributor

Summary

Schema prerequisite for AIN-116 v1.0.1 catalog mapping. Adds a nullable provider_model_name TEXT column on models and updates the routing layer to prefer it over slug when set.

Why

Ainfera slug is the canonical-for-us identifier (e.g., claude-opus-4-7). The provider's API expects its own canonical name (e.g., Anthropic's claude-opus-4-20251110). Currently routing.dispatch_inference passes slug directly to upstream, which works for AAMC 5 but blocks 43 deactivated non-AAMC slugs whose names diverge.

Audit chain integrity

Per the "no stealth substitution" rule, audit events still record model_slug (Ainfera canonical). This change only affects what we send upstream.

Deferred (separate sub-tickets needed)

Pieces #2 and #3, listed under "Deferred to follow-up" in the commit message, are both Discipline #6 corollary territory (prod DB writes). File them as separate sub-tickets for explicit founder auth.

Test plan

🤖 Generated with Claude Code

…llback

Ainfera `slug` is the canonical-for-us identifier (e.g., `claude-opus-4-7`).
The provider's API expects its own canonical name (e.g., Anthropic's
`claude-opus-4-20251110`). Currently routing.dispatch_inference passes
`slug` directly to upstream, which works for AAMC 5 (slug == upstream name)
but blocks the 43 deactivated non-AAMC slugs whose names diverge.

## Schema (migration 0016)

- `models.provider_model_name TEXT NULL`
- Forward-compatible add-column (no backfill)
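
A minimal sketch of what a forward-compatible add-column looks like, using Python's built-in `sqlite3` as a stand-in. The production database engine, the migration tooling, and the other columns of `models` are not stated in this PR, so the table shape and the helper name `add_provider_model_name` are assumptions for illustration only.

```python
import sqlite3


def add_provider_model_name(conn: sqlite3.Connection) -> None:
    """Add the nullable column; existing rows are left untouched (no backfill)."""
    # Columns are nullable by default, so existing rows simply read back NULL.
    conn.execute("ALTER TABLE models ADD COLUMN provider_model_name TEXT")


# Hypothetical, simplified `models` table with one pre-existing row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (id INTEGER PRIMARY KEY, slug TEXT NOT NULL)")
conn.execute("INSERT INTO models (slug) VALUES ('claude-opus-4-7')")

add_provider_model_name(conn)

rows = conn.execute("SELECT slug, provider_model_name FROM models").fetchall()
print(rows)  # [('claude-opus-4-7', None)]
```

Because the new column is NULL for every existing row, code that has not been updated to read it keeps working unchanged.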

## Routing fallback

`upstream_model = model.provider_model_name or model_slug`

- NULL = adapter receives Ainfera slug as-is (current AAMC 5 behavior preserved)
- Populated = adapter receives the provider's canonical name
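
The fallback rule above can be sketched as a small helper. The `Model` dataclass and the function name `resolve_upstream_model` are hypothetical stand-ins, not the actual code in `routing.dispatch_inference`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    # Hypothetical stand-in for a row of the `models` table.
    slug: str
    provider_model_name: Optional[str] = None


def resolve_upstream_model(model: Model) -> str:
    """Return the name sent upstream: the provider's name if set, else the slug."""
    # `or` treats both NULL (None) and an empty string as "not set".
    return model.provider_model_name or model.slug


# NULL column: the adapter receives the Ainfera slug as-is.
print(resolve_upstream_model(Model("claude-opus-4-7")))
# Populated column: the adapter receives the provider's canonical name.
print(resolve_upstream_model(Model("claude-opus-4-7", "claude-opus-4-20251110")))
```

One consequence of using `or` rather than an explicit `is None` check is that an accidentally empty string also falls back to the slug instead of being sent upstream.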

## Audit chain integrity

Per "no stealth substitution" rule, audit events still record `model_slug`
(Ainfera canonical). This change only affects what we send upstream.
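
As an illustration of the separation described above, the sketch below builds the upstream payload and the audit event side by side. The function name `build_dispatch`, the event label `inference.dispatched`, and the dictionary shapes are assumptions, not the real audit schema.

```python
from typing import Dict, Optional, Tuple


def build_dispatch(
    model_slug: str, provider_model_name: Optional[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (upstream_request, audit_event) for one inference call."""
    # Upstream payload: the provider's canonical name when set, else the slug.
    upstream_request = {"model": provider_model_name or model_slug}
    # Audit event: always the Ainfera slug, never the substituted upstream name.
    audit_event = {"event": "inference.dispatched", "model_slug": model_slug}
    return upstream_request, audit_event


request, event = build_dispatch("claude-opus-4-7", "claude-opus-4-20251110")
print(request["model"])     # claude-opus-4-20251110
print(event["model_slug"])  # claude-opus-4-7
```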

## Deferred to follow-up

- Piece #2: fix 4 provider base_urls (groq/fireworks/deepinfra/novita) —
  requires prod DB UPDATE against `providers` table
- Piece #3: backfill provider_model_name for the 43 deactivated rows —
  requires per-provider /v1/models research + sacrificial test key access
  + T9 re-fanout

Both pieces are Discipline #6 corollary territory (prod DB writes); file
as separate sub-tickets for explicit founder auth.

Co-Authored-By: Claude <noreply@anthropic.com>
@linear-code

linear-code Bot commented May 18, 2026

AIN-116 ainfera-auto routing v1.1 + v1.0.1 catalog mapping — provider_model_name, latency, geo, prefs

Update 2026-05-16 19:25 WIB · v1.0.1 prerequisite added

T9 fanout hit the cap-rule (11/54 pass) and 43 new model rows were
deactivated. Three upstream blockers must land in v1.0.1 BEFORE we can
ship the v1.1 routing improvements below:

v1.0.1 (blocker · ~3h)

  1. provider_model_name TEXT NULL column on models · routing
    layer uses it when set, falls back to slug when NULL. Ainfera
    slug is the canonical-for-us identifier; provider_model_name
    is what we pass to the upstream provider's API. Migration:
    forward-compatible add-column.
  2. Fix 4 provider base_urls · current rows have URLs that produce
    /v1/v1/chat/completions when concatenated with the OpenAI-compat
    adapter path. Fix:
    • groq base_url → https://api.groq.com/openai (drop trailing /v1)
    • fireworks base_url → https://api.fireworks.ai/inference (drop trailing /v1)
    • deepinfra + novita need custom chat_completions_path (add column
      to providers OR subclass OpenAICompatAdapter)
  3. Slug-to-provider-name mapping · for each of the 43 deactivated
    rows, research each provider's /v1/models endpoint and populate
    provider_model_name. Re-run fanout. Re-activate passing rows.
    Target ≥ 35 pass before re-enabling.

See .launch-snapshots/T9-FANOUT-INCIDENT.md for the full incident
write-up + the 11 currently-active slugs.
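
The doubled /v1/v1 segment in item 2 can be reproduced with a small sketch. It assumes the OpenAI-compat adapter appends /v1/chat/completions to the stored base_url, and that the old groq row ended in /v1; both are inferred from the description above rather than taken from the adapter code. The helper name build_chat_url is hypothetical.

```python
# Assumed path appended by the OpenAI-compat adapter (inferred, not confirmed).
OPENAI_COMPAT_CHAT_PATH = "/v1/chat/completions"


def build_chat_url(base_url: str, path: str = OPENAI_COMPAT_CHAT_PATH) -> str:
    """Join a provider base_url with the adapter's chat completions path."""
    return base_url.rstrip("/") + path


# Base URL with a trailing /v1: the path segment is duplicated.
print(build_chat_url("https://api.groq.com/openai/v1"))
# https://api.groq.com/openai/v1/v1/chat/completions

# Base URL from the proposed fix: the path is well-formed.
print(build_chat_url("https://api.groq.com/openai"))
# https://api.groq.com/openai/v1/chat/completions
```

The optional path argument hints at how a per-provider chat_completions_path could be passed in for deepinfra and novita, if the column route is chosen over subclassing.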

v1.1 (after v1.0.1 lands)

  1. latency_budget_ms axis — when caller passes
    routing_hint.latency_budget_ms, prefer Groq tier (sub-50ms p50)
    and other fast-capable models. Hard cap on AA Index loss vs the
    unconstrained pick.
  2. Geographic stack hints · routing_hint.stack accepts
    china, eu, sea, us, avoid-china. Maps the same way
    /v1/models?stack= does today.
  3. Smarter cost estimation — current default assumes
    expected_output_ratio = 0.5. Learn the per-model ratio from the
    last N receipts and use that to estimate. Fall back to 0.5 when
    fewer than N samples.
  4. Per-tenant model preferences — new fields on
    tenants.spend_policy:
    • banned_providers: list[str]
    • preferred_providers: list[str]
    • banned_slugs: list[str]
  5. Per-call routing telemetry — emit inference.routed audit
    events with decision_log (top-3 candidates that survived each
    filter step).
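
Item 3 (smarter cost estimation) could look roughly like the sketch below. It assumes expected_output_ratio means output tokens divided by input tokens and that a receipt exposes both counts; neither is defined in this ticket. The name estimate_output_ratio and the tuple shape are hypothetical.

```python
from typing import Sequence, Tuple

DEFAULT_OUTPUT_RATIO = 0.5  # current default named in the ticket


def estimate_output_ratio(receipts: Sequence[Tuple[int, int]], n: int) -> float:
    """Estimate output/input token ratio from the last n receipts of one model.

    Each receipt is an (input_tokens, output_tokens) pair, oldest first.
    Falls back to the default when fewer than n samples are available.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if len(receipts) < n:
        return DEFAULT_OUTPUT_RATIO
    window = receipts[-n:]
    total_in = sum(tokens_in for tokens_in, _ in window)
    total_out = sum(tokens_out for _, tokens_out in window)
    if total_in == 0:
        # Avoid division by zero on degenerate receipts.
        return DEFAULT_OUTPUT_RATIO
    return total_out / total_in


print(estimate_output_ratio([(100, 50)], n=3))                          # 0.5
print(estimate_output_ratio([(100, 20), (100, 30), (200, 50)], n=3))    # 0.25
```

Pooling token totals over the window, rather than averaging per-receipt ratios, keeps very short calls from dominating the estimate; that is a design choice of this sketch, not something the ticket specifies.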

Out of scope (v1.2+)

  • Streaming-aware routing (drop non-streaming-capable models when
    caller requests stream)
  • Multi-region routing (preference for low-RTT provider POPs)
  • Cost-vs-quality Pareto frontier exposed at /v1/models/explain

Acceptance after v1.0.1 + v1.1

  • /v1/models | jq 'length' returns ≥ 40 (post-mapping re-activation)
  • routing_hint.latency_budget_ms=500 routes through Tier 4 Groq if any model qualifies
  • routing_hint.stack=china routes Novita-only
  • Banned providers in tenant policy are never selected (test with synthetic tenant)
  • Decision-log audit event present on every ainfera-auto call
  • Backward compatible: existing model=ainfera-auto calls with no hint still work the same as v1.0
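
The "banned providers are never selected" criterion can be sketched as a filter over routing candidates, exercised with a synthetic tenant. The field names follow the tenants.spend_policy list in v1.1 item 4; the Candidate type, the apply_policy function, and treating preferred_providers as a soft ordering rather than a hard filter are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpendPolicy:
    # Field names taken from the ticket; everything else here is hypothetical.
    banned_providers: List[str] = field(default_factory=list)
    preferred_providers: List[str] = field(default_factory=list)
    banned_slugs: List[str] = field(default_factory=list)


@dataclass
class Candidate:
    slug: str
    provider: str


def apply_policy(candidates: List[Candidate], policy: SpendPolicy) -> List[Candidate]:
    """Drop banned candidates, then move preferred providers to the front."""
    allowed = [
        c for c in candidates
        if c.provider not in policy.banned_providers
        and c.slug not in policy.banned_slugs
    ]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(allowed, key=lambda c: c.provider not in policy.preferred_providers)


# Synthetic tenant: bans one provider and one slug, prefers another provider.
policy = SpendPolicy(
    banned_providers=["provider-a"],
    preferred_providers=["provider-c"],
    banned_slugs=["model-x"],
)
candidates = [
    Candidate("model-1", "provider-a"),
    Candidate("model-2", "provider-b"),
    Candidate("model-x", "provider-b"),
    Candidate("model-3", "provider-c"),
]
print([c.slug for c in apply_policy(candidates, policy)])  # ['model-3', 'model-2']
```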

@hizrianraz hizrianraz merged commit 14999e2 into main May 18, 2026
3 checks passed
@hizrianraz hizrianraz deleted the feat/ain-116-piece-1-provider-model-name-column branch May 18, 2026 15:37