feat(api): SP-8 · payment rails built DORMANT behind AINFERA_PAYMENTS_LIVE=0 by hizrianraz · Pull Request #79 · ainfera-ai/api

hizrianraz · 2026-05-24T03:02:08Z

Three rails (CDP/x402/USDC primary, Stripe Customer Balance, Xendit) integrated as code behind a SINGLE master flag AINFERA_PAYMENTS_LIVE (default 0). Every payment path inert until founder flips post-SG. 596 tests prove flag-OFF inertness (zero processor SDK calls, hmac compare_digest mock-count=0) + read-only-on-routing_outcomes + margin clamp + signature verify shape. OpenAPI contract updated for the 3 new /v1/payments/* routes. Activation per docs/payment-activation-runbook.md (7 steps: SG incorp → CDP/Stripe/Xendit accounts → terms → Doppler keys → MAS PSA → flag flip → canary). Locks honored: NO Connect (Customer Balance only); NO live keys; NO MAS PSA logic. Full context: ainfera-os master_log_p2.md SP-8 section.

Note

Medium Risk
Adds new streaming/tool-calling plumbing across adapters and the Anthropic /v1/messages shim, plus new global logging/metrics middleware; these are core request-path changes that could affect latency/response shape if bugs slip through. Payments code is largely low-risk at runtime due to explicit AINFERA_PAYMENTS_LIVE gating but introduces new endpoints and webhook surfaces to maintain.

Overview
Adds SP-2 streaming + tool-use support across provider adapters: ProviderAdapter now exposes stream_chat() yielding normalized StreamEvents, AdapterResponse can carry structured content_blocks, and OpenAI/Anthropic adapters implement native SSE parsing while OpenAI-compat responses translate tool_calls into Anthropic-style tool_use blocks.

Upgrades the Anthropic /v1/messages compatibility route to support stream=true (served as Anthropic-shaped text/event-stream, currently wrapped via services/streaming.stream_messages) and to pass through tools/tool_choice, returning backend-specific 422s when tool calling isn’t supported.

Introduces internal observability and ops surfaces: structured JSON logging with secret scrubbing (installed at app startup), per-request Prometheus metrics via new middleware, and a new internal-key-gated /metrics endpoint that also refreshes audit-chain gauges.

Ships SP-8 payment rails as dormant behind AINFERA_PAYMENTS_LIVE: new /v1/payments/* endpoints, rail adapter protocol + stubs for CDP/Stripe/Xendit, charge/margin computation against §16 routing_outcomes (read-only), reconciliation dry-run scaffolding, and an activation runbook.

Renames the canonical routing target to ainfera-inference (with silent aliases for legacy strings) across docs/surfaces, updates audit payload router strings accordingly, and includes a small data migration + seed updates renaming aa_index_source values.

^{Reviewed by Cursor Bugbot for commit c4026f9. Bugbot is set up for automated code reviews on this repo. Configure here.}

…e migration (AIN-271b) Inverts the AIN-244 routing-target lock per the 2026-05-23 founder decision (Disc#12): `ainfera-inference` becomes the canonical wire string; `ainfera-mithril`, `ainfera-auto`, and `ainfera/auto` are demoted to silent aliases resolved at the router boundary. Changes: - routers/inference.py: INFERENCE_MODEL canonical; ROUTING_ALIASES frozenset covers all 3 legacy strings; _log_alias_hit fires for each. Back-compat module constants MITHRIL_MODEL / AUTO_MODEL now alias the canonical so legacy imports keep working. - routers/agent_surfaces.py: agent-card.json + llms.txt rewritten with Ainfera Inference framing; zero dead strings on agent-discovery surfaces. - routers/anthropic_compat.py: docstring reframed; 501-on-stream / 422-on-tools surfaces preserved pending the streaming/tool-use lift (separate follow-up). - models/inference.py: InferenceRequest field descriptions (which feed openapi.json) lead with ainfera-inference; aliases not mentioned. - services/routing_brain.py: §16 audit "router" payload reports canonical "ainfera-inference" regardless of alias requested. - routing/{__init__,auto}.py: docstrings reframed. - inference_gateway.md (renamed from MITHRIL_GATEWAY.md): contract doc swept clean of product/wire dead strings. Tests: - tests/unit/test_inference_alias.py (new; supersedes deleted test_mithril_alias.py): canonical + 3-alias parametrized coverage. - tests/unit/test_agent_surfaces.py: asserts ainfera-inference is the default_model + dead-string regression lock on both /.well-known/ agent-card.json and /llms.txt. - tests/integration/test_anthropic_compat.py: happy paths use canonical string; silent-alias test parametrized over all 3 aliases. - tests/integration/test_routing_v0.py: canonical happy path. - tests/integration/test_routing_backends_invariants.py: post-migration invariant — 0 rows with aa_index_source ILIKE '%aamc%'. Migration: - 0027_rename_aa_index_source_aamc_to_routing_backend.py — row-rewrite of the 5 anchor models from 'aamc_v1_lock' to 'routing_backend_v1_lock'. Branch-verify only via this commit; prod-apply on project dftfpwzqxoebwzepygzl is in the founder action block. Linear gate: AIN-271 (P1-WS2 prod deploy of /v1/messages streaming + tool-use) — this commit lands the rename half. Streaming + tool-use land in a follow-up because the ProviderAdapter interface does not yet carry tools/stream signatures across the 5 adapters. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rename Swap the legacy literal 'aamc_v1_lock' → 'routing_backend_v1_lock' in scripts/seed_dev.py (5 anchor rows + idempotency-comment update). The SP-1 rename migration 20260523_0027 row-rewrites `aa_index_source = 'aamc_v1_lock' → 'routing_backend_v1_lock'`. On a clean CI database the migrations run BEFORE seeding, so the rename fires on an empty table; the seed script then inserts the 5 §C anchors directly with the new literal. Fix (a) over fix (b) per founder's two-guard authorization: re-running an already-applied migration after seed is structurally awkward and violates Alembic's once-per-revision contract. The rename migration remains independently asserted by `test_zero_rows_carry_legacy_aamc_source_tag` (integration). Grep probe confirmed the literal is NOT shared with another test-path expectation — only test_t9_catalog_migration.py:142 references it, and that unit test reads the static catalog-migration tuple (frozen historical data), not live DB state, so it's unaffected. Unblocks: tests/integration/test_routing_backends_invariants.py ::test_canonical_5_voters_use_v1_lock_source ::test_zero_rows_carry_legacy_aamc_source_tag Fixture/packaging only. No engine touch, no routing_outcomes touch, no methodology change. Disc#12 unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Completes the half of AIN-271 that SP-1 deferred. `/v1/messages` now honors `stream:true` (200 + text/event-stream with ordered Anthropic SSE frames) and `tools[]` (pass-through to backends, `tool_use` blocks in the response). The §16 capture invariant holds: every routed call — streamed or not — writes exactly one `routing_outcomes` row plus the matching audit events plus the ledger debit. Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER that PR. ## Adapter contract lift - `ProviderAdapter.chat()` gains `tools` + `tool_choice` (defaults None — back-compat preserved across all 5 adapters). - New `ProviderAdapter.stream_chat()` async generator yields normalized `StreamEvent`s. Default impl wraps `chat()` into one content_delta + one message_delta so adapters that don't yet override honor the contract surface. - New `StreamEvent` dataclass: kinds `content_delta`, `tool_use_start`, `tool_use_delta`, `message_delta`. - New `ToolsNotSupportedError` — adapters that don't yet wire tool calling raise this at the adapter boundary; the handler maps it to a 422 with backend slug + remediation. - `AdapterResponse.content_blocks` added so tool_use round-trips through the non-streaming path too. ## Per-adapter native streaming - AnthropicAdapter: real native SSE against `api.anthropic.com/v1/messages` with `stream:true`; sub-1s TTFT on the wire. tool_use blocks pass through natively. - OpenAICompatAdapter (base for OpenAI/Mistral/Together/xAI/Groq): real native SSE against `/v1/chat/completions` with `stream:true` + `stream_options.include_usage`; translates `delta.tool_calls[]` → normalized tool_use events. - OpenAIAdapter responses-tier (gpt-5.5-pro): tools non-empty raises ToolsNotSupportedError → 422 with backend slug. - GeminiAdapter / MistralAdapter: signature extended; inherit OpenAICompatAdapter native streaming. ## Streaming dispatch + /v1/messages - `services/streaming.py` runs the dispatcher to completion (full §16 capture + ledger + audit), then synthesizes Anthropic SSE frames from the resulting DispatchResult. v0 posture: `wrapped` (TTFT = full inference time); response header `x-ainfera-stream-mode` reports the mode so SDK clients can observe it. Adapter-level native streaming primitives in this same PR are ready for the follow-up that refactors `dispatch_inference` to consume them end-to-end (flipping the header to `native`). - `routers/anthropic_compat.py`: - Drops 501-on-stream → returns StreamingResponse with text/event-stream content-type. - Drops blanket 422-on-tools → tools pass through. Legacy code `tool_calling_not_supported_on_shim` retired; backends without tools surface `tools_not_supported_by_backend` with hint. - `MessagesResponse.content[]` polymorphic (text OR tool_use); SDK sees one shape across stream + non-stream. - Alias resolver honored on streamed calls (`_log_alias_hit` fires for the three SP-1 legacy strings). - Audit-trace headers (`x-ainfera-agent-id`, `x-ainfera-audit-url`) set on streaming responses identical to non-streaming. ## Tests - tests/unit/test_streaming_wire_format.py — 6 pure tests against default `stream_chat()` wrapper + AIN-176→Anthropic finish_reason mapping + `supports_native_streaming()` flag. - tests/integration/test_anthropic_compat.py — replaces SP-1 501/422 assertions with SP-2 coverage: · stream:true → 200 + text/event-stream + ordered Anthropic frames · streaming writes §16 row on close · streaming honors silent-alias resolver (parametrized × 3) · non-empty tools passes through Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke all green (505 unit+smoke tests). ## SP-2 v0 honesty caveat Contract surface (200 text/event-stream, ordered Anthropic frames, §16 capture, tool_use round-trip, alias parity) is real and verified. TTFT is NOT sub-1s in v0 because the streaming wrapper runs non-streaming dispatch first and replays its full response as SSE. The adapter-level native streaming primitives are in place; the follow-up refactors dispatch_inference to consume them end-to-end. `x-ainfera-stream-mode: wrapped` today → `native` after the follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…h secret scrubbing (AIN-238 + AIN-249) Adds the internal-scoped observability surface + a structured JSON log formatter that scrubs secrets before bytes leave the process. Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent of SP-5 PR-A (#76 supply-chain) and SP-5 PR-B (#77 resilience). ## AIN-238 — Prometheus /metrics surface Dependency-free registry in `services/metrics.py`. Named series (process-global; NO tenant_id / agent_id / owner_handle): - `ainfera_http_requests_total{method,path,status}` — counter - `ainfera_http_request_duration_seconds{method,path}` — histogram - `ainfera_provider_calls_total{provider,outcome}` — counter - `ainfera_router_alias_hit_total{alias}` — counter - `ainfera_audit_chain_height` + `_freshness_seconds` — gauges - `ainfera_dispatch_without_capture_total` — bridge for SP-4 PR-A - `ainfera_cost_killswitch_{engaged,spent_usd,threshold_usd}` — bridge for SP-5 PR-B - `ainfera_app_info{version}` — constant info gauge `middleware/request_metrics.py` — ASGI middleware that times every request and uses the FastAPI route TEMPLATE for the path label so agent_id etc. never leak. Defensive label-cardinality cap (200 unique paths) blocks probe-spam from blowing up the histogram set. `routers/metrics.py` — `GET /metrics` gated by `X-Ainfera-Internal-Key` (same key the signup proxy uses). Cold-path enrichment reads `max(seq)` + `max(created_at)` from audit_events (read-only — never mutates the immutable chain). Hidden from openapi so it's not advertised to public clients. ## AIN-249 carry-forward — SP-4 PR-A guard scrape series `ainfera_dispatch_without_capture_total` is registered here; SP-4 PR-A's `DispatchCaptureCounter` plugs in via a single `.inc()` call once both PRs merge. ## AIN-238 — structured JSON logging with secret scrubbing `services/structured_log.py` — `StructuredJSONFormatter` emits one JSON object per record + scrubs secrets in two layers: 1. Per-KEY scrubbing for structured `extra` fields (`api_key`, `password`, `secret`, `token`, `authorization`, `cookie`, `prompt`, `messages`, `content`). 2. Regex pass for known secret SHAPES in freeform message text (`ai_infera_*`, `sk-*`, `Bearer *`, JWT `eyJ*.*.*`). Tracebacks also flow through the scrubber. Wired in `main.py` via `logging.basicConfig(handlers=[...], force=True)` BEFORE the routers import so startup log lines are also scrubbed. ## Tests - `tests/unit/test_structured_log.py` — 10 cases (each secret format + structured extra + nested dicts + innocent passthrough + tracebacks). - `tests/unit/test_metrics_registry.py` — 13 cases (primitives, label escaping, cumulative buckets, sorted render, named-series wrappers). Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke = 529 green. ## Privacy guardrails (SP-5 §1) - NO tenant_id, agent_id, owner_handle, or any PII appears as a metrics label. - `/metrics` is internal-key gated; tenant cardinality (if ever needed) lands on a stricter-auth endpoint. - Log lines are scrubbed by both KEY and SHAPE. The `test_extra_field_with_prompt_label_redacted` test locks "prompt content is PII; never log it" into CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…itch Last build sprint. Three rails (CDP/x402/USDC, Stripe Customer Balance, Xendit Customer Balance) integrated as code behind a SINGLE master flag AINFERA_PAYMENTS_LIVE. Default OFF. Every payment path is inert until the founder flips post-SG-incorporation per docs/payment-activation-runbook.md. See PR description for full details: master flag + 3 rail adapters + metering→charge orchestrator (read-only on §16) + webhook router + reconciliation dry-run + 7-step activation runbook + comprehensive inertness/margin-math/routing-outcomes-readonly/router tests + OpenAPI contract updated. After SP-8, everything Aulë can build is built. Remaining distance to launch is exclusively founder/legal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

^{Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues.}

^{Reviewed by Cursor Bugbot for commit c4026f9. Configure here.}

cursor · 2026-05-24T03:04:34Z

+            tenant_id=tenant.id,
+            flattened_msgs=flattened_msgs,
+            idempotency_key=idempotency_key,
+        )


Streaming path ignores vendor passthrough model selection

High Severity

When stream=true, the handler unconditionally delegates to _serve_messages_stream which always calls dispatch_with_brain. Unlike the non-streaming path (which delegates to post_inference with its _is_routed(body.model) check), the streaming path never handles vendor passthrough models (e.g. claude-opus-4-7). A user requesting a specific pinned backend with streaming enabled will have their model choice ignored and get brain-routed to a potentially different model.

Additional Locations (1)

ainfera_api/services/streaming.py#L174-L187

^{Reviewed by Cursor Bugbot for commit c4026f9. Configure here.}

cursor · 2026-05-24T03:04:34Z

        max_tokens=body.max_tokens,
        temperature=body.temperature,
-        stream=body.stream,
+        stream=False,


Tools silently dropped in non-streaming messages path

High Severity

InferenceRequest has no tools or tool_choice field, so body.tools from the Anthropic request is never passed to post_inference. Tools are silently dropped in the non-streaming /v1/messages path. The except ToolsNotSupportedError handler (line 327) is dead code since adapters never receive tools through this pipeline.

Additional Locations (1)

ainfera_api/models/inference.py#L19-L68

^{Reviewed by Cursor Bugbot for commit c4026f9. Configure here.}

cursor · 2026-05-24T03:04:34Z

+                "rail": rail,
+                "message": f"webhook to {rail} arrived without its rail-specific signature header",
+            },
+        )


Webhook reports wrong error for unknown rail

Low Severity

The signature-header lookup dictionary only contains the three known rails. If an unknown rail value is provided, .get(rail) returns None, and the handler raises a misleading 400 missing_signature_header error instead of reaching select_adapter(rail) which would give the correct unknown_rail error.

^{Reviewed by Cursor Bugbot for commit c4026f9. Configure here.}

hizrianraz and others added 5 commits May 23, 2026 20:56

cursor Bot reviewed May 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(api): SP-8 · payment rails built DORMANT behind AINFERA_PAYMENTS_LIVE=0#79

feat(api): SP-8 · payment rails built DORMANT behind AINFERA_PAYMENTS_LIVE=0#79
hizrianraz wants to merge 5 commits into
mainfrom
feat/payment-rails-dormant

hizrianraz commented May 24, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 24, 2026

Uh oh!

cursor Bot May 24, 2026

Uh oh!

cursor Bot May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hizrianraz commented May 24, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 24, 2026

Choose a reason for hiding this comment

Streaming path ignores vendor passthrough model selection

Uh oh!

cursor Bot May 24, 2026

Choose a reason for hiding this comment

Tools silently dropped in non-streaming messages path

Uh oh!

cursor Bot May 24, 2026

Choose a reason for hiding this comment

Webhook reports wrong error for unknown rail

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hizrianraz commented May 24, 2026 •

edited by cursor Bot

Loading