Skip to content

feat(api): SP-8 · payment rails built DORMANT behind AINFERA_PAYMENTS_LIVE=0#79

Open
hizrianraz wants to merge 5 commits into
mainfrom
feat/payment-rails-dormant
Open

feat(api): SP-8 · payment rails built DORMANT behind AINFERA_PAYMENTS_LIVE=0#79
hizrianraz wants to merge 5 commits into
mainfrom
feat/payment-rails-dormant

Conversation

@hizrianraz
Copy link
Copy Markdown
Contributor

@hizrianraz hizrianraz commented May 24, 2026

Three rails (CDP/x402/USDC primary, Stripe Customer Balance, Xendit) integrated as code behind a SINGLE master flag AINFERA_PAYMENTS_LIVE (default 0). Every payment path inert until founder flips post-SG. 596 tests prove flag-OFF inertness (zero processor SDK calls, hmac compare_digest mock-count=0) + read-only-on-routing_outcomes + margin clamp + signature verify shape. OpenAPI contract updated for the 3 new /v1/payments/* routes. Activation per docs/payment-activation-runbook.md (7 steps: SG incorp → CDP/Stripe/Xendit accounts → terms → Doppler keys → MAS PSA → flag flip → canary). Locks honored: NO Connect (Customer Balance only); NO live keys; NO MAS PSA logic. Full context: ainfera-os master_log_p2.md SP-8 section.


Note

Medium Risk
Adds new streaming/tool-calling plumbing across adapters and the Anthropic /v1/messages shim, plus new global logging/metrics middleware; these are core request-path changes that could affect latency/response shape if bugs slip through. Payments code is largely low-risk at runtime due to explicit AINFERA_PAYMENTS_LIVE gating but introduces new endpoints and webhook surfaces to maintain.

Overview
Adds SP-2 streaming + tool-use support across provider adapters: ProviderAdapter now exposes stream_chat() yielding normalized StreamEvents, AdapterResponse can carry structured content_blocks, and OpenAI/Anthropic adapters implement native SSE parsing while OpenAI-compat responses translate tool_calls into Anthropic-style tool_use blocks.

Upgrades the Anthropic /v1/messages compatibility route to support stream=true (served as Anthropic-shaped text/event-stream, currently wrapped via services/streaming.stream_messages) and to pass through tools/tool_choice, returning backend-specific 422s when tool calling isn’t supported.

Introduces internal observability and ops surfaces: structured JSON logging with secret scrubbing (installed at app startup), per-request Prometheus metrics via new middleware, and a new internal-key-gated /metrics endpoint that also refreshes audit-chain gauges.

Ships SP-8 payment rails as dormant behind AINFERA_PAYMENTS_LIVE: new /v1/payments/* endpoints, rail adapter protocol + stubs for CDP/Stripe/Xendit, charge/margin computation against §16 routing_outcomes (read-only), reconciliation dry-run scaffolding, and an activation runbook.

Renames the canonical routing target to ainfera-inference (with silent aliases for legacy strings) across docs/surfaces, updates audit payload router strings accordingly, and includes a small data migration + seed updates renaming aa_index_source values.

Reviewed by Cursor Bugbot for commit c4026f9. Bugbot is set up for automated code reviews on this repo. Configure here.

hizrianraz and others added 5 commits May 23, 2026 20:56
…e migration (AIN-271b)

Inverts the AIN-244 routing-target lock per the 2026-05-23 founder
decision (Disc#12): `ainfera-inference` becomes the canonical wire
string; `ainfera-mithril`, `ainfera-auto`, and `ainfera/auto` are
demoted to silent aliases resolved at the router boundary.

Changes:
- routers/inference.py: INFERENCE_MODEL canonical; ROUTING_ALIASES
  frozenset covers all 3 legacy strings; _log_alias_hit fires for each.
  Back-compat module constants MITHRIL_MODEL / AUTO_MODEL now alias the
  canonical so legacy imports keep working.
- routers/agent_surfaces.py: agent-card.json + llms.txt rewritten with
  Ainfera Inference framing; zero dead strings on agent-discovery
  surfaces.
- routers/anthropic_compat.py: docstring reframed; 501-on-stream /
  422-on-tools surfaces preserved pending the streaming/tool-use lift
  (separate follow-up).
- models/inference.py: InferenceRequest field descriptions (which feed
  openapi.json) lead with ainfera-inference; aliases not mentioned.
- services/routing_brain.py: §16 audit "router" payload reports
  canonical "ainfera-inference" regardless of alias requested.
- routing/{__init__,auto}.py: docstrings reframed.
- inference_gateway.md (renamed from MITHRIL_GATEWAY.md): contract doc
  swept clean of product/wire dead strings.

Tests:
- tests/unit/test_inference_alias.py (new; supersedes deleted
  test_mithril_alias.py): canonical + 3-alias parametrized coverage.
- tests/unit/test_agent_surfaces.py: asserts ainfera-inference is the
  default_model + dead-string regression lock on both /.well-known/
  agent-card.json and /llms.txt.
- tests/integration/test_anthropic_compat.py: happy paths use canonical
  string; silent-alias test parametrized over all 3 aliases.
- tests/integration/test_routing_v0.py: canonical happy path.
- tests/integration/test_routing_backends_invariants.py: post-migration
  invariant — 0 rows with aa_index_source ILIKE '%aamc%'.

Migration:
- 0027_rename_aa_index_source_aamc_to_routing_backend.py — row-rewrite
  of the 5 anchor models from 'aamc_v1_lock' to 'routing_backend_v1_lock'.
  Branch-verify only via this commit; prod-apply on project
  dftfpwzqxoebwzepygzl is in the founder action block.

Linear gate: AIN-271 (P1-WS2 prod deploy of /v1/messages streaming +
tool-use) — this commit lands the rename half. Streaming + tool-use
land in a follow-up because the ProviderAdapter interface does not yet
carry tools/stream signatures across the 5 adapters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rename

Swap the legacy literal 'aamc_v1_lock' → 'routing_backend_v1_lock' in
scripts/seed_dev.py (5 anchor rows + idempotency-comment update).

The SP-1 rename migration 20260523_0027 row-rewrites
`aa_index_source = 'aamc_v1_lock' → 'routing_backend_v1_lock'`. On a
clean CI database the migrations run BEFORE seeding, so the rename
fires on an empty table; the seed script then inserts the 5 §C
anchors directly with the new literal.

Fix (a) over fix (b) per founder's two-guard authorization: re-running
an already-applied migration after seed is structurally awkward and
violates Alembic's once-per-revision contract. The rename migration
remains independently asserted by
`test_zero_rows_carry_legacy_aamc_source_tag` (integration).

Grep probe confirmed the literal is NOT shared with another test-path
expectation — only test_t9_catalog_migration.py:142 references it, and
that unit test reads the static catalog-migration tuple (frozen
historical data), not live DB state, so it's unaffected.

Unblocks:
  tests/integration/test_routing_backends_invariants.py
    ::test_canonical_5_voters_use_v1_lock_source
    ::test_zero_rows_carry_legacy_aamc_source_tag

Fixture/packaging only. No engine touch, no routing_outcomes touch, no
methodology change. Disc#12 unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the half of AIN-271 that SP-1 deferred. `/v1/messages` now
honors `stream:true` (200 + text/event-stream with ordered Anthropic
SSE frames) and `tools[]` (pass-through to backends, `tool_use` blocks
in the response). The §16 capture invariant holds: every routed call —
streamed or not — writes exactly one `routing_outcomes` row plus the
matching audit events plus the ledger debit.

Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER
that PR.

## Adapter contract lift

- `ProviderAdapter.chat()` gains `tools` + `tool_choice` (defaults
  None — back-compat preserved across all 5 adapters).
- New `ProviderAdapter.stream_chat()` async generator yields normalized
  `StreamEvent`s. Default impl wraps `chat()` into one content_delta +
  one message_delta so adapters that don't yet override honor the
  contract surface.
- New `StreamEvent` dataclass: kinds `content_delta`, `tool_use_start`,
  `tool_use_delta`, `message_delta`.
- New `ToolsNotSupportedError` — adapters that don't yet wire tool
  calling raise this at the adapter boundary; the handler maps it to
  a 422 with backend slug + remediation.
- `AdapterResponse.content_blocks` added so tool_use round-trips
  through the non-streaming path too.

## Per-adapter native streaming

- AnthropicAdapter: real native SSE against `api.anthropic.com/v1/messages`
  with `stream:true`; sub-1s TTFT on the wire. tool_use blocks pass
  through natively.
- OpenAICompatAdapter (base for OpenAI/Mistral/Together/xAI/Groq): real
  native SSE against `/v1/chat/completions` with `stream:true` +
  `stream_options.include_usage`; translates `delta.tool_calls[]` →
  normalized tool_use events.
- OpenAIAdapter responses-tier (gpt-5.5-pro): tools non-empty raises
  ToolsNotSupportedError → 422 with backend slug.
- GeminiAdapter / MistralAdapter: signature extended; inherit
  OpenAICompatAdapter native streaming.

## Streaming dispatch + /v1/messages

- `services/streaming.py` runs the dispatcher to completion (full §16
  capture + ledger + audit), then synthesizes Anthropic SSE frames
  from the resulting DispatchResult. v0 posture: `wrapped` (TTFT =
  full inference time); response header `x-ainfera-stream-mode`
  reports the mode so SDK clients can observe it. Adapter-level
  native streaming primitives in this same PR are ready for the
  follow-up that refactors `dispatch_inference` to consume them
  end-to-end (flipping the header to `native`).
- `routers/anthropic_compat.py`:
  - Drops 501-on-stream → returns StreamingResponse with
    text/event-stream content-type.
  - Drops blanket 422-on-tools → tools pass through. Legacy code
    `tool_calling_not_supported_on_shim` retired; backends without
    tools surface `tools_not_supported_by_backend` with hint.
  - `MessagesResponse.content[]` polymorphic (text OR tool_use);
    SDK sees one shape across stream + non-stream.
  - Alias resolver honored on streamed calls (`_log_alias_hit` fires
    for the three SP-1 legacy strings).
- Audit-trace headers (`x-ainfera-agent-id`, `x-ainfera-audit-url`)
  set on streaming responses identical to non-streaming.

## Tests

- tests/unit/test_streaming_wire_format.py — 6 pure tests against
  default `stream_chat()` wrapper + AIN-176→Anthropic finish_reason
  mapping + `supports_native_streaming()` flag.
- tests/integration/test_anthropic_compat.py — replaces SP-1 501/422
  assertions with SP-2 coverage:
    · stream:true → 200 + text/event-stream + ordered Anthropic frames
    · streaming writes §16 row on close
    · streaming honors silent-alias resolver (parametrized × 3)
    · non-empty tools passes through

Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke
all green (505 unit+smoke tests).

## SP-2 v0 honesty caveat

Contract surface (200 text/event-stream, ordered Anthropic frames,
§16 capture, tool_use round-trip, alias parity) is real and verified.
TTFT is NOT sub-1s in v0 because the streaming wrapper runs
non-streaming dispatch first and replays its full response as SSE.
The adapter-level native streaming primitives are in place; the
follow-up refactors dispatch_inference to consume them end-to-end.
`x-ainfera-stream-mode: wrapped` today → `native` after the follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h secret scrubbing (AIN-238 + AIN-249)

Adds the internal-scoped observability surface + a structured JSON
log formatter that scrubs secrets before bytes leave the process.

Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent
of SP-5 PR-A (#76 supply-chain) and SP-5 PR-B (#77 resilience).

## AIN-238 — Prometheus /metrics surface

Dependency-free registry in `services/metrics.py`. Named series
(process-global; NO tenant_id / agent_id / owner_handle):
  - `ainfera_http_requests_total{method,path,status}` — counter
  - `ainfera_http_request_duration_seconds{method,path}` — histogram
  - `ainfera_provider_calls_total{provider,outcome}` — counter
  - `ainfera_router_alias_hit_total{alias}` — counter
  - `ainfera_audit_chain_height` + `_freshness_seconds` — gauges
  - `ainfera_dispatch_without_capture_total` — bridge for SP-4 PR-A
  - `ainfera_cost_killswitch_{engaged,spent_usd,threshold_usd}` —
    bridge for SP-5 PR-B
  - `ainfera_app_info{version}` — constant info gauge

`middleware/request_metrics.py` — ASGI middleware that times every
request and uses the FastAPI route TEMPLATE for the path label so
agent_id etc. never leak. Defensive label-cardinality cap (200
unique paths) blocks probe-spam from blowing up the histogram set.

`routers/metrics.py` — `GET /metrics` gated by
`X-Ainfera-Internal-Key` (same key the signup proxy uses). Cold-path
enrichment reads `max(seq)` + `max(created_at)` from audit_events
(read-only — never mutates the immutable chain). Hidden from openapi
so it's not advertised to public clients.

## AIN-249 carry-forward — SP-4 PR-A guard scrape series

`ainfera_dispatch_without_capture_total` is registered here; SP-4
PR-A's `DispatchCaptureCounter` plugs in via a single `.inc()` call
once both PRs merge.

## AIN-238 — structured JSON logging with secret scrubbing

`services/structured_log.py` — `StructuredJSONFormatter` emits one
JSON object per record + scrubs secrets in two layers:
  1. Per-KEY scrubbing for structured `extra` fields (`api_key`,
     `password`, `secret`, `token`, `authorization`, `cookie`,
     `prompt`, `messages`, `content`).
  2. Regex pass for known secret SHAPES in freeform message text
     (`ai_infera_*`, `sk-*`, `Bearer *`, JWT `eyJ*.*.*`).

Tracebacks also flow through the scrubber. Wired in `main.py` via
`logging.basicConfig(handlers=[...], force=True)` BEFORE the routers
import so startup log lines are also scrubbed.

## Tests
- `tests/unit/test_structured_log.py` — 10 cases (each secret format
  + structured extra + nested dicts + innocent passthrough + tracebacks).
- `tests/unit/test_metrics_registry.py` — 13 cases (primitives, label
  escaping, cumulative buckets, sorted render, named-series wrappers).

Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke
= 529 green.

## Privacy guardrails (SP-5 §1)

- NO tenant_id, agent_id, owner_handle, or any PII appears as a
  metrics label.
- `/metrics` is internal-key gated; tenant cardinality (if ever
  needed) lands on a stricter-auth endpoint.
- Log lines are scrubbed by both KEY and SHAPE. The
  `test_extra_field_with_prompt_label_redacted` test locks "prompt
  content is PII; never log it" into CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…itch

Last build sprint. Three rails (CDP/x402/USDC, Stripe Customer Balance,
Xendit Customer Balance) integrated as code behind a SINGLE master flag
AINFERA_PAYMENTS_LIVE. Default OFF. Every payment path is inert until
the founder flips post-SG-incorporation per docs/payment-activation-runbook.md.

See PR description for full details: master flag + 3 rail adapters +
metering→charge orchestrator (read-only on §16) + webhook router +
reconciliation dry-run + 7-step activation runbook + comprehensive
inertness/margin-math/routing-outcomes-readonly/router tests +
OpenAPI contract updated.

After SP-8, everything Aulë can build is built. Remaining distance
to launch is exclusively founder/legal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues.

Reviewed by Cursor Bugbot for commit c4026f9. Configure here.

tenant_id=tenant.id,
flattened_msgs=flattened_msgs,
idempotency_key=idempotency_key,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streaming path ignores vendor passthrough model selection

High Severity

When stream=true, the handler unconditionally delegates to _serve_messages_stream which always calls dispatch_with_brain. Unlike the non-streaming path (which delegates to post_inference with its _is_routed(body.model) check), the streaming path never handles vendor passthrough models (e.g. claude-opus-4-7). A user requesting a specific pinned backend with streaming enabled will have their model choice ignored and get brain-routed to a potentially different model.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit c4026f9. Configure here.

max_tokens=body.max_tokens,
temperature=body.temperature,
stream=body.stream,
stream=False,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tools silently dropped in non-streaming messages path

High Severity

InferenceRequest has no tools or tool_choice field, so body.tools from the Anthropic request is never passed to post_inference. Tools are silently dropped in the non-streaming /v1/messages path. The except ToolsNotSupportedError handler (line 327) is dead code since adapters never receive tools through this pipeline.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit c4026f9. Configure here.

"rail": rail,
"message": f"webhook to {rail} arrived without its rail-specific signature header",
},
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Webhook reports wrong error for unknown rail

Low Severity

The signature-header lookup dictionary only contains the three known rails. If an unknown rail value is provided, .get(rail) returns None, and the handler raises a misleading 400 missing_signature_header error instead of reaching select_adapter(rail) which would give the correct unknown_rail error.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit c4026f9. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant