refactor(chat): shared history builder with model/plan/source-aware budgets #52

Merged: ABB65 merged 1 commit into main from refactor/chat-history-builder on May 15, 2026

@ABB65 commented May 15, 2026

Summary

Studio chat and the Conversation API carried two near-identical copies of the sliding-window history builder, each with a hard-coded 8K token ceiling, a magic loadConversationMessages(..., 50) row cap, and divergent tool_calls / toolCalls casing handling. The 8K ceiling was left over from earlier model defaults and is undersized for Claude 4 (200K-window models).

This PR collapses both into server/utils/conversation-history.ts and makes the budget model/plan/source-aware.

Budget table (review-agreed)

// Model — capability + pricing tier
Haiku 4.5            12K
Sonnet 4 / 4.5 / 4.6 32K / 40K / 48K
Opus 4 / 4.1 / 4.7   32K / 32K / 48K
fallback             16K

// Plan — Contentrain's per-message margin posture
free          0      // defensive backstop, gated upstream
starter       0.75x
pro           1x
enterprise    1.25x
community     1x

// Source — who pays for the input tokens
studio / api  1x
byoa          1.5x   // user pays Anthropic directly

maxTokens = base × plan × source. Sonnet/Opus values are conservative on purpose — once prompt caching lands (cache reads cost ~10% of base input) the model table can safely grow. Sources: models overview, pricing.
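
A minimal sketch of the arithmetic above, for reviewers who want to sanity-check a cell of the table. The table values are taken from this description; the model key strings, the reading of "K" as 1,000 tokens, the rounding, and the function name `sketchHistoryBudget` are assumptions for illustration and may not match the real `selectHistoryBudget()`.

```typescript
// Illustrative only: values come from the budget table in this PR.
// Key strings, K = 1000, and Math.floor rounding are assumptions.
type Plan = 'free' | 'starter' | 'pro' | 'enterprise' | 'community'
type Source = 'studio' | 'api' | 'byoa'

const MODEL_BASE_TOKENS: Record<string, number> = {
  'haiku-4.5': 12000,
  'sonnet-4': 32000,
  'sonnet-4.5': 40000,
  'sonnet-4.6': 48000,
  'opus-4': 32000,
  'opus-4.1': 32000,
  'opus-4.7': 48000,
}
const FALLBACK_BASE_TOKENS = 16000

const PLAN_MULTIPLIER: Record<Plan, number> = {
  free: 0, // defensive backstop, gated upstream
  starter: 0.75,
  pro: 1,
  enterprise: 1.25,
  community: 1,
}

const SOURCE_MULTIPLIER: Record<Source, number> = {
  studio: 1,
  api: 1,
  byoa: 1.5, // user pays Anthropic directly
}

// maxTokens = base × plan × source
function sketchHistoryBudget(model: string, plan: Plan, source: Source): number {
  const base = MODEL_BASE_TOKENS[model] ?? FALLBACK_BASE_TOKENS
  return Math.floor(base * PLAN_MULTIPLIER[plan] * SOURCE_MULTIPLIER[source])
}
```

Under these assumptions, the largest cell (Sonnet 4.6 or Opus 4.7, enterprise, byoa) works out to 48K × 1.25 × 1.5 = 90K, and the smallest non-zero cell (Haiku 4.5, starter, studio) to 12K × 0.75 = 9K.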

Changes

  • New server/utils/conversation-history.ts with selectHistoryBudget() + buildPromptMessages(). Pure functions; no DB, no provider.
  • chat.post.ts — model selection moved before history (model drives budget); IIFE + push loop replaced with two helper calls.
  • ee/enterprise/conversation-api.ts — same pattern; duplicate loadConversationMessages wrapper renamed loadConversationHistoryForResponse and now serves only the /history.get route's enriched JSON response shape (the runtime chat path doesn't need usage/createdAt fields).
  • Integration test mocks — added vi.mock('~~/server/utils/conversation-history') to chat-route and overage-soft-cap integration files. One ownership-flow assertion relaxed from hard-coded 50 to expect.any(Number) (rowLimit is derived; budget arithmetic covered by the new unit tests).

Test plan

  • pnpm test — 605 passed (590 + 15 new unit tests in conversation-history.test.ts)
  • pnpm typecheck clean
  • pnpm lint — 0 errors on changed files

The new unit tests cover the matrix: per-model budget, fallback, plan multipliers (starter/enterprise/community/unknown), free→0 defensive backstop, BYOA 1.5x, API no-multiplier, rowLimit scaling, empty history, fits-in-budget, exceeds-budget, zero-budget, snake_case tool_calls, camelCase toolCalls, chronological ordering after cutoff.

Out of scope

  • Tokenization accuracy. Still using length/4 heuristic — accurate Anthropic countTokens integration is a separate PR.
  • Prompt cache breakpoints. Next PR (perf(ai): system-block array + cache_control); once cached, the Sonnet/Opus budget values in the table here can grow with margin headroom.
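
For reference, the length/4 heuristic mentioned in the first item amounts to roughly the following. This is a sketch, not the code in the repo: the name `estimateTokens` and the choice to round up are assumptions.

```typescript
// Rough estimate: about four characters per token.
// The function name and Math.ceil rounding are assumptions for illustration.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}
```

The estimate depends only on string length, which is why it is cheap enough to run per row but cannot be relied on for exact budgeting.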

Net diff

+379 / −64; most of the size is unit test coverage. Net runtime behavior change: at the pro/studio baseline, conversations now keep 1.5x–6x as much history before cutoff (12K–48K versus the old 8K), scaled further by the plan and source multipliers; cost stays bounded by the multiplier table.

…udgets

`chat.post.ts` and `ee/enterprise/conversation-api.ts` carried two
copies of the same sliding-window history builder: a hard-coded 8K
token ceiling, a magic `loadConversationMessages(..., 50)` row cap,
and divergent `tool_calls`/`toolCalls` casing handling. Both also
ignored the fact that Claude 4 model windows are now 200K — 8K was
left over from earlier model defaults and was undersized for any
non-toy conversation.

This PR collapses both into `server/utils/conversation-history.ts`:

- `selectHistoryBudget({ plan, model, source })` returns the per-call
  token ceiling and a derived `rowLimit` for DB pagination. Budget is
  decomposed along three axes:
    Model — capability and pricing tier:
      Haiku 4.5            12K
      Sonnet 4 / 4.5 / 4.6 32K / 40K / 48K
      Opus 4 / 4.1 / 4.7   32K / 32K / 48K
      fallback             16K
    Plan — Contentrain's per-message margin posture:
      free          0  (defensive backstop, should never reach chat)
      starter       0.75x
      pro           1x
      enterprise    1.25x
      community     1x
    Source — who pays for the input tokens:
      studio / api  1x
      byoa          1.5x  (user pays Anthropic directly)
  Computed budget = base × plan × source. Sonnet/Opus values are
  intentionally conservative; once prompt caching lands (cache reads
  cost ~10% of base input) the model table can grow safely.

- `buildPromptMessages({ history, newUserMessage, budget })` walks
  rows newest→oldest under the token cap, then takes the kept slice
  in chronological order and appends the current user message. Handles
  both `tool_calls` (snake_case from DB) and `toolCalls` (legacy EE
  wrapper) for the same content.
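
A minimal sketch of the newest→oldest walk described in the second bullet above. The row and message shapes, the positional parameters, the inline length/4 estimator, and the decision to charge tool calls by their JSON size are all assumptions for illustration; the real `buildPromptMessages` may differ on each point.

```typescript
// Hypothetical shapes; the real types live in the repo.
interface HistoryRow {
  role: 'user' | 'assistant'
  content: string
  tool_calls?: unknown[] // snake_case, as stored in the DB
  toolCalls?: unknown[] // camelCase, from the legacy EE wrapper
}
interface PromptMessage {
  role: 'user' | 'assistant'
  content: string
}

// Assumed estimator: about four characters per token, rounded up.
const approxTokens = (text: string): number => Math.ceil(text.length / 4)

function sketchBuildPromptMessages(
  history: HistoryRow[], // assumed to be in chronological order (oldest first)
  newUserMessage: string,
  maxTokens: number,
): PromptMessage[] {
  const kept: HistoryRow[] = []
  let used = 0
  // Walk newest → oldest and stop at the first row that no longer fits.
  for (const row of [...history].reverse()) {
    const calls = row.tool_calls ?? row.toolCalls // accept either casing
    const cost = approxTokens(row.content) + (calls ? approxTokens(JSON.stringify(calls)) : 0)
    if (used + cost > maxTokens) break
    used += cost
    kept.push(row)
  }
  // Restore chronological order, then append the current user message.
  kept.reverse()
  const messages: PromptMessage[] = kept.map(r => ({ role: r.role, content: r.content }))
  messages.push({ role: 'user', content: newUserMessage })
  return messages
}
```

Breaking at the first row that does not fit keeps the retained window contiguous, so the model never sees a conversation with a gap in the middle. In this sketch a zero budget yields only the new user message.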

Studio chat (`chat.post.ts`) now picks model before history (model
drives budget), then calls the two helpers in place of the IIFE +
push loop. Conversation API mirrors the change and drops its
duplicate `loadConversationMessages` wrapper — that helper is renamed
`loadConversationHistoryForResponse` and kept only for the
`/history.get` route's JSON response shape, which still needs the
enriched `{ usage, createdAt }` projection.

Integration test mocks were missing the `~~/server/utils/conversation-history`
entry; added to both `chat-route` and `overage-soft-cap` integration
files. One assertion in `chat-route` previously hard-coded the magic
`50` row limit — relaxed to `expect.any(Number)` since rowLimit is
now derived (covered by the new unit tests).

Net: −18 lines, +228 (mostly tests). No DB or schema changes; no
runtime behavior change beyond "more history is retained in long
conversations" along the budget table.
@ABB65 ABB65 merged commit e61711e into main May 15, 2026
1 check passed
@ABB65 ABB65 deleted the refactor/chat-history-builder branch May 15, 2026 19:46