feat(api): SP-8 · payment rails built DORMANT behind AINFERA_PAYMENTS_LIVE=0#79
feat(api): SP-8 · payment rails built DORMANT behind AINFERA_PAYMENTS_LIVE=0#79hizrianraz wants to merge 5 commits into
Conversation
…e migration (AIN-271b)
Inverts the AIN-244 routing-target lock per the 2026-05-23 founder
decision (Disc#12): `ainfera-inference` becomes the canonical wire
string; `ainfera-mithril`, `ainfera-auto`, and `ainfera/auto` are
demoted to silent aliases resolved at the router boundary.
Changes:
- routers/inference.py: INFERENCE_MODEL canonical; ROUTING_ALIASES
frozenset covers all 3 legacy strings; _log_alias_hit fires for each.
Back-compat module constants MITHRIL_MODEL / AUTO_MODEL now alias the
canonical so legacy imports keep working.
- routers/agent_surfaces.py: agent-card.json + llms.txt rewritten with
Ainfera Inference framing; zero dead strings on agent-discovery
surfaces.
- routers/anthropic_compat.py: docstring reframed; 501-on-stream /
422-on-tools surfaces preserved pending the streaming/tool-use lift
(separate follow-up).
- models/inference.py: InferenceRequest field descriptions (which feed
openapi.json) lead with ainfera-inference; aliases not mentioned.
- services/routing_brain.py: §16 audit "router" payload reports
canonical "ainfera-inference" regardless of alias requested.
- routing/{__init__,auto}.py: docstrings reframed.
- inference_gateway.md (renamed from MITHRIL_GATEWAY.md): contract doc
swept clean of product/wire dead strings.
Tests:
- tests/unit/test_inference_alias.py (new; supersedes deleted
test_mithril_alias.py): canonical + 3-alias parametrized coverage.
- tests/unit/test_agent_surfaces.py: asserts ainfera-inference is the
default_model + dead-string regression lock on both /.well-known/
agent-card.json and /llms.txt.
- tests/integration/test_anthropic_compat.py: happy paths use canonical
string; silent-alias test parametrized over all 3 aliases.
- tests/integration/test_routing_v0.py: canonical happy path.
- tests/integration/test_routing_backends_invariants.py: post-migration
invariant — 0 rows with aa_index_source ILIKE '%aamc%'.
Migration:
- 0027_rename_aa_index_source_aamc_to_routing_backend.py — row-rewrite
of the 5 anchor models from 'aamc_v1_lock' to 'routing_backend_v1_lock'.
Branch-verify only via this commit; prod-apply on project
dftfpwzqxoebwzepygzl is in the founder action block.
Linear gate: AIN-271 (P1-WS2 prod deploy of /v1/messages streaming +
tool-use) — this commit lands the rename half. Streaming + tool-use
land in a follow-up because the ProviderAdapter interface does not yet
carry tools/stream signatures across the 5 adapters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rename
Swap the legacy literal 'aamc_v1_lock' → 'routing_backend_v1_lock' in
scripts/seed_dev.py (5 anchor rows + idempotency-comment update).
The SP-1 rename migration 20260523_0027 row-rewrites
`aa_index_source = 'aamc_v1_lock' → 'routing_backend_v1_lock'`. On a
clean CI database the migrations run BEFORE seeding, so the rename
fires on an empty table; the seed script then inserts the 5 §C
anchors directly with the new literal.
Fix (a) over fix (b) per founder's two-guard authorization: re-running
an already-applied migration after seed is structurally awkward and
violates Alembic's once-per-revision contract. The rename migration
remains independently asserted by
`test_zero_rows_carry_legacy_aamc_source_tag` (integration).
Grep probe confirmed the literal is NOT shared with another test-path
expectation — only test_t9_catalog_migration.py:142 references it, and
that unit test reads the static catalog-migration tuple (frozen
historical data), not live DB state, so it's unaffected.
Unblocks:
tests/integration/test_routing_backends_invariants.py
::test_canonical_5_voters_use_v1_lock_source
::test_zero_rows_carry_legacy_aamc_source_tag
Fixture/packaging only. No engine touch, no routing_outcomes touch, no
methodology change. Disc#12 unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the half of AIN-271 that SP-1 deferred. `/v1/messages` now honors `stream:true` (200 + text/event-stream with ordered Anthropic SSE frames) and `tools[]` (pass-through to backends, `tool_use` blocks in the response). The §16 capture invariant holds: every routed call — streamed or not — writes exactly one `routing_outcomes` row plus the matching audit events plus the ledger debit. Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER that PR. ## Adapter contract lift - `ProviderAdapter.chat()` gains `tools` + `tool_choice` (defaults None — back-compat preserved across all 5 adapters). - New `ProviderAdapter.stream_chat()` async generator yields normalized `StreamEvent`s. Default impl wraps `chat()` into one content_delta + one message_delta so adapters that don't yet override honor the contract surface. - New `StreamEvent` dataclass: kinds `content_delta`, `tool_use_start`, `tool_use_delta`, `message_delta`. - New `ToolsNotSupportedError` — adapters that don't yet wire tool calling raise this at the adapter boundary; the handler maps it to a 422 with backend slug + remediation. - `AdapterResponse.content_blocks` added so tool_use round-trips through the non-streaming path too. ## Per-adapter native streaming - AnthropicAdapter: real native SSE against `api.anthropic.com/v1/messages` with `stream:true`; sub-1s TTFT on the wire. tool_use blocks pass through natively. - OpenAICompatAdapter (base for OpenAI/Mistral/Together/xAI/Groq): real native SSE against `/v1/chat/completions` with `stream:true` + `stream_options.include_usage`; translates `delta.tool_calls[]` → normalized tool_use events. - OpenAIAdapter responses-tier (gpt-5.5-pro): tools non-empty raises ToolsNotSupportedError → 422 with backend slug. - GeminiAdapter / MistralAdapter: signature extended; inherit OpenAICompatAdapter native streaming. ## Streaming dispatch + /v1/messages - `services/streaming.py` runs the dispatcher to completion (full §16 capture + ledger + audit), then synthesizes Anthropic SSE frames from the resulting DispatchResult. v0 posture: `wrapped` (TTFT = full inference time); response header `x-ainfera-stream-mode` reports the mode so SDK clients can observe it. Adapter-level native streaming primitives in this same PR are ready for the follow-up that refactors `dispatch_inference` to consume them end-to-end (flipping the header to `native`). - `routers/anthropic_compat.py`: - Drops 501-on-stream → returns StreamingResponse with text/event-stream content-type. - Drops blanket 422-on-tools → tools pass through. Legacy code `tool_calling_not_supported_on_shim` retired; backends without tools surface `tools_not_supported_by_backend` with hint. - `MessagesResponse.content[]` polymorphic (text OR tool_use); SDK sees one shape across stream + non-stream. - Alias resolver honored on streamed calls (`_log_alias_hit` fires for the three SP-1 legacy strings). - Audit-trace headers (`x-ainfera-agent-id`, `x-ainfera-audit-url`) set on streaming responses identical to non-streaming. ## Tests - tests/unit/test_streaming_wire_format.py — 6 pure tests against default `stream_chat()` wrapper + AIN-176→Anthropic finish_reason mapping + `supports_native_streaming()` flag. - tests/integration/test_anthropic_compat.py — replaces SP-1 501/422 assertions with SP-2 coverage: · stream:true → 200 + text/event-stream + ordered Anthropic frames · streaming writes §16 row on close · streaming honors silent-alias resolver (parametrized × 3) · non-empty tools passes through Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke all green (505 unit+smoke tests). ## SP-2 v0 honesty caveat Contract surface (200 text/event-stream, ordered Anthropic frames, §16 capture, tool_use round-trip, alias parity) is real and verified. TTFT is NOT sub-1s in v0 because the streaming wrapper runs non-streaming dispatch first and replays its full response as SSE. The adapter-level native streaming primitives are in place; the follow-up refactors dispatch_inference to consume them end-to-end. `x-ainfera-stream-mode: wrapped` today → `native` after the follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h secret scrubbing (AIN-238 + AIN-249) Adds the internal-scoped observability surface + a structured JSON log formatter that scrubs secrets before bytes leave the process. Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent of SP-5 PR-A (#76 supply-chain) and SP-5 PR-B (#77 resilience). ## AIN-238 — Prometheus /metrics surface Dependency-free registry in `services/metrics.py`. Named series (process-global; NO tenant_id / agent_id / owner_handle): - `ainfera_http_requests_total{method,path,status}` — counter - `ainfera_http_request_duration_seconds{method,path}` — histogram - `ainfera_provider_calls_total{provider,outcome}` — counter - `ainfera_router_alias_hit_total{alias}` — counter - `ainfera_audit_chain_height` + `_freshness_seconds` — gauges - `ainfera_dispatch_without_capture_total` — bridge for SP-4 PR-A - `ainfera_cost_killswitch_{engaged,spent_usd,threshold_usd}` — bridge for SP-5 PR-B - `ainfera_app_info{version}` — constant info gauge `middleware/request_metrics.py` — ASGI middleware that times every request and uses the FastAPI route TEMPLATE for the path label so agent_id etc. never leak. Defensive label-cardinality cap (200 unique paths) blocks probe-spam from blowing up the histogram set. `routers/metrics.py` — `GET /metrics` gated by `X-Ainfera-Internal-Key` (same key the signup proxy uses). Cold-path enrichment reads `max(seq)` + `max(created_at)` from audit_events (read-only — never mutates the immutable chain). Hidden from openapi so it's not advertised to public clients. ## AIN-249 carry-forward — SP-4 PR-A guard scrape series `ainfera_dispatch_without_capture_total` is registered here; SP-4 PR-A's `DispatchCaptureCounter` plugs in via a single `.inc()` call once both PRs merge. ## AIN-238 — structured JSON logging with secret scrubbing `services/structured_log.py` — `StructuredJSONFormatter` emits one JSON object per record + scrubs secrets in two layers: 1. Per-KEY scrubbing for structured `extra` fields (`api_key`, `password`, `secret`, `token`, `authorization`, `cookie`, `prompt`, `messages`, `content`). 2. Regex pass for known secret SHAPES in freeform message text (`ai_infera_*`, `sk-*`, `Bearer *`, JWT `eyJ*.*.*`). Tracebacks also flow through the scrubber. Wired in `main.py` via `logging.basicConfig(handlers=[...], force=True)` BEFORE the routers import so startup log lines are also scrubbed. ## Tests - `tests/unit/test_structured_log.py` — 10 cases (each secret format + structured extra + nested dicts + innocent passthrough + tracebacks). - `tests/unit/test_metrics_registry.py` — 13 cases (primitives, label escaping, cumulative buckets, sorted render, named-series wrappers). Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke = 529 green. ## Privacy guardrails (SP-5 §1) - NO tenant_id, agent_id, owner_handle, or any PII appears as a metrics label. - `/metrics` is internal-key gated; tenant cardinality (if ever needed) lands on a stricter-auth endpoint. - Log lines are scrubbed by both KEY and SHAPE. The `test_extra_field_with_prompt_label_redacted` test locks "prompt content is PII; never log it" into CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…itch Last build sprint. Three rails (CDP/x402/USDC, Stripe Customer Balance, Xendit Customer Balance) integrated as code behind a SINGLE master flag AINFERA_PAYMENTS_LIVE. Default OFF. Every payment path is inert until the founder flips post-SG-incorporation per docs/payment-activation-runbook.md. See PR description for full details: master flag + 3 rail adapters + metering→charge orchestrator (read-only on §16) + webhook router + reconciliation dry-run + 7-step activation runbook + comprehensive inertness/margin-math/routing-outcomes-readonly/router tests + OpenAPI contract updated. After SP-8, everything Aulë can build is built. Remaining distance to launch is exclusively founder/legal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues.
Reviewed by Cursor Bugbot for commit c4026f9. Configure here.
| tenant_id=tenant.id, | ||
| flattened_msgs=flattened_msgs, | ||
| idempotency_key=idempotency_key, | ||
| ) |
There was a problem hiding this comment.
Streaming path ignores vendor passthrough model selection
High Severity
When stream=true, the handler unconditionally delegates to _serve_messages_stream which always calls dispatch_with_brain. Unlike the non-streaming path (which delegates to post_inference with its _is_routed(body.model) check), the streaming path never handles vendor passthrough models (e.g. claude-opus-4-7). A user requesting a specific pinned backend with streaming enabled will have their model choice ignored and get brain-routed to a potentially different model.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit c4026f9. Configure here.
| max_tokens=body.max_tokens, | ||
| temperature=body.temperature, | ||
| stream=body.stream, | ||
| stream=False, |
There was a problem hiding this comment.
Tools silently dropped in non-streaming messages path
High Severity
InferenceRequest has no tools or tool_choice field, so body.tools from the Anthropic request is never passed to post_inference. Tools are silently dropped in the non-streaming /v1/messages path. The except ToolsNotSupportedError handler (line 327) is dead code since adapters never receive tools through this pipeline.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit c4026f9. Configure here.
| "rail": rail, | ||
| "message": f"webhook to {rail} arrived without its rail-specific signature header", | ||
| }, | ||
| ) |
There was a problem hiding this comment.
Webhook reports wrong error for unknown rail
Low Severity
The signature-header lookup dictionary only contains the three known rails. If an unknown rail value is provided, .get(rail) returns None, and the handler raises a misleading 400 missing_signature_header error instead of reaching select_adapter(rail) which would give the correct unknown_rail error.
Reviewed by Cursor Bugbot for commit c4026f9. Configure here.


Three rails (CDP/x402/USDC primary, Stripe Customer Balance, Xendit) integrated as code behind a SINGLE master flag AINFERA_PAYMENTS_LIVE (default 0). Every payment path inert until founder flips post-SG. 596 tests prove flag-OFF inertness (zero processor SDK calls, hmac compare_digest mock-count=0) + read-only-on-routing_outcomes + margin clamp + signature verify shape. OpenAPI contract updated for the 3 new /v1/payments/* routes. Activation per docs/payment-activation-runbook.md (7 steps: SG incorp → CDP/Stripe/Xendit accounts → terms → Doppler keys → MAS PSA → flag flip → canary). Locks honored: NO Connect (Customer Balance only); NO live keys; NO MAS PSA logic. Full context: ainfera-os master_log_p2.md SP-8 section.
Note
Medium Risk
Adds new streaming/tool-calling plumbing across adapters and the Anthropic
/v1/messagesshim, plus new global logging/metrics middleware; these are core request-path changes that could affect latency/response shape if bugs slip through. Payments code is largely low-risk at runtime due to explicitAINFERA_PAYMENTS_LIVEgating but introduces new endpoints and webhook surfaces to maintain.Overview
Adds SP-2 streaming + tool-use support across provider adapters:
ProviderAdapternow exposesstream_chat()yielding normalizedStreamEvents,AdapterResponsecan carry structuredcontent_blocks, and OpenAI/Anthropic adapters implement native SSE parsing while OpenAI-compat responses translatetool_callsinto Anthropic-styletool_useblocks.Upgrades the Anthropic
/v1/messagescompatibility route to supportstream=true(served as Anthropic-shapedtext/event-stream, currently wrapped viaservices/streaming.stream_messages) and to pass throughtools/tool_choice, returning backend-specific 422s when tool calling isn’t supported.Introduces internal observability and ops surfaces: structured JSON logging with secret scrubbing (installed at app startup), per-request Prometheus metrics via new middleware, and a new internal-key-gated
/metricsendpoint that also refreshes audit-chain gauges.Ships SP-8 payment rails as dormant behind
AINFERA_PAYMENTS_LIVE: new/v1/payments/*endpoints, rail adapter protocol + stubs for CDP/Stripe/Xendit, charge/margin computation against §16routing_outcomes(read-only), reconciliation dry-run scaffolding, and an activation runbook.Renames the canonical routing target to
ainfera-inference(with silent aliases for legacy strings) across docs/surfaces, updates audit payload router strings accordingly, and includes a small data migration + seed updates renamingaa_index_sourcevalues.Reviewed by Cursor Bugbot for commit c4026f9. Bugbot is set up for automated code reviews on this repo. Configure here.