Skip to content

chore(api): SP-4 PR-C · dark-host activation scaffold (AIN-248, founder-gated)#81

Open
hizrianraz wants to merge 1 commit into
mainfrom
chore/dark-host-prep
Open

chore(api): SP-4 PR-C · dark-host activation scaffold (AIN-248, founder-gated)#81
hizrianraz wants to merge 1 commit into
mainfrom
chore/dark-host-prep

Conversation

@hizrianraz
Copy link
Copy Markdown
Contributor

@hizrianraz hizrianraz commented May 24, 2026

SP-Ω-RUN resurrection of #75 (closed by base-deletion). Rebased onto main post-#80 squash.

Original PR: #75
Source: chore/dark-host-prep


Note

Low Risk
Adds new documentation and a standalone founder-run smoke-test script; no production routing, schema, or data-path changes unless the script/runbook is manually executed.

Overview
Adds a founder-gated dark-host activation runbook describing a phased process (smoke test, ontology decision, parametrized alembic migration template, and post-deploy verification) without applying any migrations.

Adds a Model×Host ontology proposal doc outlining alternative schema approaches (multi-row slugs vs model_hosts junction) to support multi-venue hosting.

Introduces scripts/dark_host_smoke.py, a read-only async CLI that uses existing provider adapters to make two test chat() calls against a specified venue (keys via env) and emits a JSON latency/shape report.

Reviewed by Cursor Bugbot for commit b93ec86. Bugbot is set up for automated code reviews on this repo. Configure here.

…er-gated)

Adds three pieces of scaffolding for the dark-host activation pass.
**Activates nothing. Zero schema change. Zero catalog change.** Per
the SP-4 §1 moat guardrails, this PR ships ONLY founder-gated artifacts.

Stacks on SP-2 api#72 (\`feat/ain271-streaming-tooluse\`); independent
of PR-A (#73) and PR-B (#74).

## What's new

### 1. \`scripts/dark_host_smoke.py\` — adapter smoke harness

A CLI that exercises the existing ProviderAdapter against a (provider,
upstream_model, base_url) target and prints a JSON latency/cost/shape
report. Two consecutive \`.chat()\` calls give a coarse cold-vs-warm
variance read.

  - Reads keys from env (Doppler-injectable) — never argv.
  - Covers the 5 open-weight venues (Groq, DeepInfra, Together,
    Fireworks, Novita) + Anthropic for parity check.
  - Returns JSON-serializable error dicts on every failure mode (no
    bare exceptions to stderr) so the founder can pipe the output
    straight into the activation runbook as evidence.
  - **Aulë does NOT run this** — the harness needs live provider
    credits (~\$45 total: DeepInfra \$15 + Together \$15 + Fireworks
    \$10 + Groq \$0 + Novita \$5) + Doppler keys. Founder runs it
    after topping up.

### 2. \`docs/dark-host-activation-runbook.md\` — the 4-phase tap

The exact, ordered steps to light one (logical-model, venue) row:

  Phase 1 — smoke (founder, no DB): run the harness per venue, save
           the JSON reports for §16 audit.
  Phase 2 — Model x Host ontology decision (Disc#12): see proposal
           below; founder picks Path A / B / C.
  Phase 3 — activation migration TEMPLATE (not yet a real alembic
           file — lives as a snippet in the doc to keep the
           \`alembic/versions/\` directory clean until authorized).
           Parametrized on slug, upstream_model, costs, q_prior, brand.
  Phase 4 — verify (post-deploy): catalog row active, brain enrols
           it, audit chain intact. Rollback = \`alembic downgrade -1\`.

The runbook is explicit that activation is **founder-gated** on three
signals: credits + Doppler keys + ontology authorization.

### 3. \`docs/dark-host-ontology-proposal.md\` — Disc#12 schema decision

Lays out 3 schema paths for representing the same logical model on
multiple hosts (verified live: 0 cross-host slugs today; the schema
is operationally one-model-one-host):

  Path A — flat \`models\` table, venue-suffixed slugs
           (\`llama-3.3-70b-groq\`). Lightest migration; zero engine
           change.
  Path B — \`model_hosts\` M:N junction. Cleanest semantics; biggest
           migration; touches \`routing_outcomes\` (§16 schema —
           violates SP-4 §1 immutability unless additive).
  Path C — Path A + nullable \`models.logical_slug\` for cross-venue
           aggregates.

Aulë's recommendation: **Path A** for the SP-4 activation pass.
Migrate to Path B in a follow-up sprint when the multi-host catalog
density justifies the §16-additive migration.

Four Disc#12 questions for the founder are listed at the bottom of
the proposal. Activation runbook stays parked until they're answered.

## §0/P5 finding (documented for the audit chain)

Live read against Supabase \`dftfpwzqxoebwzepygzl\`:
  - 47 inactive models distributed across 10 providers (novita 9 +
    deepinfra 6 + together 6 + gemini 5 + groq 5 + openai 4 +
    anthropic 3 + fireworks 3 + mistral 3 + xai 3).
  - **0 model slugs appear across multiple providers** — confirms
    one-model-one-host today. The Model x Host ontology change IS a
    real schema migration; PR-C ships ONLY the proposal doc.

## Pre-commit

ruff + ruff-format + mypy --strict + pytest unit+smoke = 505 green.
Zero new tests in this PR — the smoke harness is exercised against
live providers (founder-run); the runbook + ontology are docs.

## Out of scope (per SP-4 §1 moat guardrails)

- \`routing_outcomes\` schema — immutable, untouched.
- The routing engine in \`routing/ainfera_routing/decide.py\` — untouched.
- \`models\` schema — untouched.
- Catalog activation — no model becomes \`active=true\` from this PR.
- Online learning (AIN-246) — Backlog/deferred.
- M_allowed / q_prior / q_empirical semantics.

## Founder action to unblock

  1. \$45 credits across the 5 open-weight venues (DeepInfra \$15 +
     Together \$15 + Fireworks \$10 + Groq \$0 + Novita \$5).
  2. Doppler keys mirroring those into the api Doppler env.
  3. Disc#12 authorization of the Model x Host ontology path (the 4
     questions at the bottom of the proposal doc).

Once all three are in place, run the smoke harness per venue, then
materialize the activation migration template into an actual alembic
file and apply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 24, 2026

AIN-248 [Catalog] Full-catalog routing enrolment — q_prior backfill + 6 gated brands (price + q_prior + M_allowed) — feeds AIN-245

Feeds AIN-245 Routing v0. The engine is N-agnostic and routes the full catalog (11+ brands, growing) via emergent gating — a model enrols as a routing candidate iff it clears 3 gates: price + q_prior + M_allowed verdict. AIN-245 ships the engine + the 5 §C frontier anchors enrolled; this ticket grows enrolment to the full catalog.

Rulings — LOCKED 2026-05-22 (Discipline #12, founder "Go")

  • q_prior = new numeric(3,2) column. Do NOT inherit aa_intelligence_index — it is sourced from the retired AAMC engine; carrying it forward re-imports a dead methodology.
  • q_prior seeding rule: the 5 §C frontier anchors = their locked §C values (opus-4-7 0.95 · gpt-5-5 0.93 · gemini-3-1-pro 0.90 · grok-4 0.86 · mistral-large-3 0.80); every other model = Artificial Analysis Intelligence Index v4.0 ÷ 100 (traceable; same source the public leaderboard already cites). No AA entry and not a §C anchor → not enrolled (no fabricated priors — §D3).
  • Emergent gating: no hardcoded active flag decides routing — clearing the 3 gates does. The 6 Chinese-origin brands cannot enter the candidate set until an M_allowed verdict exists. The architecture enforces the legal gate pre-launch.

Scope

Work Detail Owner
q_prior backfill — active non-anchor claude-haiku-4-5, claude-sonnet-4-6, gpt-5, gpt-5-mini ← AA v4.0 ÷ 100 Aule (data)
6 gated brands — price Alibaba (Qwen), DeepSeek, Meta, MiniMax, Moonshot AI, Z.ai (GLM) Aule (data)
6 gated brands — q_prior AA v4.0 ÷ 100 where published; else hold out of v0 set Aule (data)
6 gated brands — M_allowed verdict data-residency / legal per brand Ulmo + founder

Gate / dependency

  • M_allowed verdicts for the 6 gated brands = founder/Ulmo compliance call. Until issued, those brands stay out of the candidate set by design.
  • All writes via Alembic; real values only (§D3 — no fabricated priors).

NOT in this ticket

  • The q_prior column DDL + the 5 §C anchor seeds — those land in AIN-245's first migration (so the engine routes the frontier set immediately).
  • Latency data (none exists in catalog today; tracked separately — see AIN-245 caveat).

Review in Linear

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues.

Reviewed by Cursor Bugbot for commit b93ec86. Configure here.

"GROQ_API_KEY",
"https://api.groq.com/openai",
OpenAICompatAdapter,
),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smoke base URLs mismatch production

High Severity

Default base_url values for deepinfra, groq, and fireworks differ from providers.base_url seeded in 20260516_0007_t9_catalog_providers.py. OpenAICompatAdapter appends /v1/chat/completions, so smoke and adapter_for_provider() hit different hosts. Phase 1 runbook commands omit --base-url, so a passing smoke run may not reflect production dispatch.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b93ec86. Configure here.

active = TRUE
WHERE slug = '{_MODEL_SLUG}'
AND provider_id = (SELECT id FROM providers WHERE slug = '{_VENUE}');
""")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Activation template only updates rows

Medium Severity

The Phase 3 migration template only runs UPDATE models for a new Path A slug such as llama-3.3-70b-deepinfra. If no matching row exists yet, upgrade() succeeds with zero rows changed and the model stays inactive, which is easy to miss.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b93ec86. Configure here.


Before any DB change, the founder authorizes the schema shape from [dark-host-ontology-proposal.md](./dark-host-ontology-proposal.md). Two paths the proposal lays out:

- **Path A (minimal):** keep the existing `models` table; add multiple rows for the same logical model (e.g. three rows for `llama-3.3-70b` differentiated by `provider_id`). Slug becomes non-unique → schema change.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path A slug uniqueness contradiction

Medium Severity

Phase 2 says Path A makes slug non-unique across providers, but ModelORM enforces global UniqueConstraint("slug", name="uq_models_slug"). Path A needs distinct suffixed slugs per venue, not duplicate slugs on different provider_id values.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b93ec86. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant