fix(deepseek): extract prompt_cache_hit_tokens and reasoning_tokens from usage #1021

Open
Lintume wants to merge 1 commit into prism-php:main from Lintume:fix/deepseek-usage-extraction
Lintume commented May 22, 2026

Description

`DeepSeek\Handlers\Text::addStep()` and `Stream::extractUsage()` build `Usage` from only `prompt_tokens` and `completion_tokens` in the API response, silently dropping two DeepSeek-specific fields:

  • usage.prompt_cache_hit_tokens — cached input portion of the prompt. DeepSeek offers a 98% discount on cache hits (their headline feature) and exposes this as a separate counter.
  • usage.completion_tokens_details.reasoning_tokens — internal thinking tokens emitted by reasoning models (`deepseek-reasoner`, `deepseek-v4-flash` thinking mode).

Impact

  • Apps that compute cost from Prism's `Usage` charge the full `prompt_tokens` at the fresh (uncached) rate, overstating real spend by roughly 3-5x once the prompt cache warms up.
  • There is no signal from which to derive a `cacheHitRatio` for monitoring prompt-prefix stability.
  • Reasoning-mode token consumption is invisible to observability tooling (Langfuse, custom dashboards, etc.).
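
To make the cost impact concrete, here is a minimal sketch of the arithmetic. The rate of 1.0 per million fresh tokens and the 10,000-token request are placeholders chosen for illustration, not DeepSeek's price list or a measured request; the cached rate simply applies the 98% discount quoted above. `inputCost()` and `cacheHitRatio()` are hypothetical helpers, not part of Prism.

```php
<?php

declare(strict_types=1);

// Placeholder rate (assumption): 1.0 currency unit per million fresh input tokens.
const FRESH_RATE_PER_MILLION = 1.0;
// Cached rate derived from the 98% cache-hit discount quoted in the description.
const CACHED_RATE_PER_MILLION = FRESH_RATE_PER_MILLION * 0.02;

// Input-side cost when fresh and cached tokens are billed at different rates.
function inputCost(int $freshTokens, int $cachedTokens): float
{
    return ($freshTokens * FRESH_RATE_PER_MILLION
        + $cachedTokens * CACHED_RATE_PER_MILLION) / 1_000_000;
}

// Share of prompt tokens served from cache; 0.0 for an empty prompt.
function cacheHitRatio(int $freshTokens, int $cachedTokens): float
{
    $total = $freshTokens + $cachedTokens;

    return $total === 0 ? 0.0 : $cachedTokens / $total;
}

// Hypothetical request: 10,000 prompt tokens, 8,000 of them cache hits.
$naive = inputCost(10_000, 0);      // all tokens billed as fresh (old behaviour)
$actual = inputCost(2_000, 8_000);  // split billing (what the fix enables)
$ratio = cacheHitRatio(2_000, 8_000);
$overstatement = $naive / $actual;  // roughly 4.6 under these assumptions
```

With an 80% hit ratio the naive figure comes out about 4.6 times the split-billed one, which falls inside the ~3-5x range cited above; the exact factor depends on the real rates and on how much of the prompt is cached.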

Changes

  • `src/Providers/DeepSeek/Handlers/Text.php`::addStep() — read both fields from `usage`, subtract cache hit from `prompt_tokens` to derive the fresh-prompt count, populate `Usage` with `cacheReadInputTokens` and `thoughtTokens`. Mirrors what the Gemini and OpenAI handlers already do for their analogous fields.
  • `src/Providers/DeepSeek/Handlers/Stream.php`::extractUsage() — same fix in the streaming path.
  • `tests/Providers/DeepSeek/TextTest.php` — multi-step tools test updated to assert the new semantics (fresh-only `promptTokens` aggregated across steps, plus the previously-invisible `cacheReadInputTokens`).
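
The mapping described in the list above can be sketched as a standalone function. This is an illustration over plain arrays, not the code in the diff: `mapDeepSeekUsage()` is a hypothetical name, the clamp to zero is an added safeguard, and passing `completion_tokens` through unchanged is an assumption. The array keys on the input side are the DeepSeek field names from the description; the keys on the output side are the `Usage` property names it mentions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not Prism's actual handler code): maps a decoded
 * DeepSeek `usage` array onto the counters named in this PR.
 *
 * @param  array<string, mixed>  $usage
 * @return array{promptTokens: int, completionTokens: int, cacheReadInputTokens: int, thoughtTokens: int}
 */
function mapDeepSeekUsage(array $usage): array
{
    $promptTokens = (int) ($usage['prompt_tokens'] ?? 0);
    $cacheHit = (int) ($usage['prompt_cache_hit_tokens'] ?? 0);
    // Nested field; `??` tolerates a missing `completion_tokens_details` key.
    $reasoning = (int) ($usage['completion_tokens_details']['reasoning_tokens'] ?? 0);

    return [
        // Fresh-prompt count: total prompt minus cache hits.
        // The clamp is an added safeguard against malformed payloads (assumption).
        'promptTokens' => max(0, $promptTokens - $cacheHit),
        // Passed through unchanged (assumption).
        'completionTokens' => (int) ($usage['completion_tokens'] ?? 0),
        'cacheReadInputTokens' => $cacheHit,
        'thoughtTokens' => $reasoning,
    ];
}

// Made-up payload for illustration only.
$mapped = mapDeepSeekUsage([
    'prompt_tokens' => 1000,
    'completion_tokens' => 200,
    'prompt_cache_hit_tokens' => 800,
    'completion_tokens_details' => ['reasoning_tokens' => 50],
]);

// A payload without the DeepSeek-specific fields falls back to zeros.
$legacy = mapDeepSeekUsage(['prompt_tokens' => 10, 'completion_tokens' => 5]);
```

Under this mapping `promptTokens + cacheReadInputTokens` always equals the API's original `prompt_tokens` for well-formed payloads, so consumers that want the old total can still recover it.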

All 28 DeepSeek tests pass. Pint clean.

Testing

Verified against `deepseek-v4-flash` direct API in production:

  • Cold cache request → `cacheReadInputTokens=0`, `promptTokens` equal to the API's full `prompt_tokens`
  • Warm cache (same prefix within the roughly one-hour TTL) → `cacheReadInputTokens` nonzero, `promptTokens` reduced to the fresh remainder
  • Reasoning model → `thoughtTokens` populated and matches the DeepSeek platform dashboard

References

Does not overlap with #1020, which fixes the `reasoning_content` round-trip. The two fixes are complementary; this one is purely about exposing usage detail that is already in the response.

…rom usage

The DeepSeek Text and Stream handlers hardcode `Usage` to only `prompt_tokens`
and `completion_tokens`, silently dropping two DeepSeek-specific usage fields:

- `usage.prompt_cache_hit_tokens` — cached input portion of the prompt.
  DeepSeek offers a 98% discount on cache hits (their headline feature)
  and reports the hit/miss split as separate counters.
- `usage.completion_tokens_details.reasoning_tokens` — internal thinking
  tokens emitted by reasoning models (deepseek-reasoner, deepseek-v4-flash
  thinking mode).

Without these, cost trackers that subscribe to `cacheReadInputTokens` see
zero and charge the full `prompt_tokens` at the fresh rate, overstating
real spend by roughly 3-5x once the prompt cache warms up. Reasoning-mode
token usage is invisible to observability tooling.

Both handlers now subtract `prompt_cache_hit_tokens` from `prompt_tokens`
to derive the fresh-prompt count, and populate `Usage` with
`cacheReadInputTokens` and `thoughtTokens`. Mirrors what the Gemini and
OpenAI handlers already do for their analogous fields.

The multi-step tools test asserts the new semantics: aggregated
`promptTokens` reflects fresh-only counts and the previously-invisible
`cacheReadInputTokens` is now exposed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>