Skip to content

feat(billing): revert reserved message slot on failed/aborted chats#51

Merged
ABB65 merged 1 commit into
mainfrom
fix/billing-commit-or-revert-usage
May 15, 2026
Merged

feat(billing): revert reserved message slot on failed/aborted chats#51
ABB65 merged 1 commit into
mainfrom
fix/billing-commit-or-revert-usage

Conversation

@ABB65
Copy link
Copy Markdown
Member

@ABB65 ABB65 commented May 15, 2026

Summary

Both chat handlers reserve a message slot up-front for race-free cap enforcement, but every reservation was previously terminal. A pre-AI failure (DB error, brain-cache failure, Anthropic auth failure, abort before first token) consumed a slot even though no LLM cost was incurred — workspaces lost monthly quota to work that never happened.

Billing semantic (review-agreed)

A message becomes billable the moment Anthropic streams its first real provider event (text or tool_use).

  • tool_result is internal (engine-emitted post tool exec) → not a commit trigger
  • done is synthetic → not a commit trigger
  • error may fire before any tokens → not a commit trigger

Pre-first-event failure → refund slot. Post-first-event failure (mid-stream error, abort, save fail) → keep slot, we paid Anthropic.

Changes

Migration 007_agent_usage_revert_rpcs.sql

Adds decrement_agent_usage and decrement_api_usage SECURITY DEFINER RPCs with GREATEST(message_count - 1, 0) clamp — double-revert race (caller's outer catch + inner finally both invoking best-effort wrappers) cannot drive the counter negative.

Provider surface

  • decrementAgentUsage({ workspaceId, userId, month, source })
  • decrementAPIUsage({ workspaceId, apiKeyId, month })

Handlers

Per review feedback (finally inside processChat would miss pre-AI failures because conversation/history/brain/branches load happens at handler scope), the structure now scopes the revert correctly:

  • Studio chat (SSE): reserved / committed declared above request body; outer try wraps reserve + all pre-AI work + processChat launch + eventStream.send(); outer catch invokes tryRevert('pre-AI') then rethrows so h3 converts to HTTP error. ProcessChat's own finally invokes tryRevert('post-reserve') for failures once the SSE pipeline is live.
  • Conversation API (JSON): single try/finally covers the entire reserve-onward scope — not SSE, no fire-and-forget.
  • recordAIUsage / recordAPIUsage moved into the for-await loop, gated on the same first-event commit so failed/aborted requests never emit a meter event to Polar.
  • tryRevert(reason) logs RPC failures (console.error) rather than silently swallowing them — silent revert failure makes quota drift hard to debug.

Out of scope

Partial token accounting on aborted streams. The engine still only captures inputTokens/outputTokens on message_end, so a stream that errors after first token writes 0 tokens to the usage row. Anthropic billed us for those tokens but the dashboard underreports. Separate observability follow-up — different concern (cost capture vs. quota refund).

Test plan

  • pnpm test — 590 passed (587 + 3 new integration tests)
  • pnpm typecheck clean
  • pnpm lint — 0 errors on changed files

Three new integration tests in chat-route.integration.test.ts:

Test Scenario Assertion
refunds the reserved slot when the provider errors before yielding any event Stream throws on first next() decrementAgentUsage called once, recordAIUsage NOT called
keeps the reservation when the provider yields at least one event before erroring Stream yields text then throws decrementAgentUsage NOT called, recordAIUsage called once
refunds the reserved slot when a pre-AI step fails after reservation loadConversationMessages rejects decrementAgentUsage called once, recordAIUsage NOT called, HTTP 500

Conversation API path uses the symmetric helper (decrementAPIUsage); its integration test coverage stays in the same E2E harness pattern when we add the Conversation API integration suite (separate PR for the harness).

Both the Studio chat handler and the Conversation API handler reserve
a message slot up-front via `incrementAgentUsageIfAllowed` /
`incrementAPIUsageIfAllowed` so concurrent requests can't double-spend
the last cap slot. Previously every reservation was terminal — a
pre-AI DB error, brain-cache failure, Anthropic auth failure, or
client abort before any tokens streamed back still consumed a slot
even though no LLM cost was incurred.

The runtime rule established in review: a message becomes billable
the moment Anthropic streams its first real provider event (`text` or
`tool_use`). `tool_result` is internal (engine-emitted post tool
exec), `done` is synthetic, and `error` can fire before any tokens —
none of those qualify. Any failure before the first billable event
refunds the slot. After the first billable event we keep the
reservation regardless of downstream outcome (mid-stream error,
client abort, DB save failure) — we paid Anthropic for those tokens.

Migration 007 adds `decrement_agent_usage` and `decrement_api_usage`
SECURITY DEFINER RPCs with a `GREATEST(message_count - 1, 0)` clamp
so a double-revert race (caller's outer catch + inner finally both
calling through best-effort wrappers) can't drive the counter
negative.

Handler restructure mirrors the rule:
- `reserved` / `committed` flags scoped above the request body.
- `tryRevert(reason)` helper logs RPC failures rather than swallowing
  them (silent revert failure makes quota drift hard to debug, per
  review feedback).
- Studio chat: outer try wraps reserve + all pre-AI work + processChat
  launch; outer catch hits `tryRevert('pre-AI')` then rethrows.
  ProcessChat's own finally hits `tryRevert('post-reserve')` for any
  failure once the SSE pipeline is live.
- Conversation API: single try/finally — non-SSE request/response, so
  one scope covers both pre-AI and AI paths.
- `recordAIUsage` / `recordAPIUsage` outbox writes moved into the
  for-await loop, gated on the same first-event commit so failed
  requests never emit a meter event to Polar.

Three new integration tests cover the matrix:
- Provider errors before any event → revert called, no meter event.
- Provider yields a text event then errors → no revert, meter event
  written (committed = true).
- Pre-AI step (loadConversationMessages) throws after reservation →
  outer catch fires revert, no meter event.

Token accounting on aborted/partial streams stays out of scope — the
engine still only captures totals on `message_end`, so a stream that
errors after first token writes 0 tokens to the usage row. That's a
separate observability follow-up.
@ABB65 ABB65 merged commit 66884c7 into main May 15, 2026
1 check passed
@ABB65 ABB65 deleted the fix/billing-commit-or-revert-usage branch May 15, 2026 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant