fix(deepseek): extract prompt_cache_hit_tokens and reasoning_tokens from usage #1021
Open
Lintume wants to merge 1 commit into
…rom usage

The DeepSeek Text and Stream handlers hardcode `Usage` to only `prompt_tokens` and `completion_tokens`, silently dropping two DeepSeek-specific usage fields:

- `usage.prompt_cache_hit_tokens`: cached input portion of the prompt. DeepSeek offers a 98% discount on cache hits (their headline feature) and reports the hit/miss split as separate counters.
- `usage.completion_tokens_details.reasoning_tokens`: internal thinking tokens emitted by reasoning models (deepseek-reasoner, deepseek-v4-flash thinking mode).

Without these, cost trackers that subscribe to `cacheReadInputTokens` see zero and charge the full `prompt_tokens` at fresh rate, overstating real spend ~3-5x once the prompt cache warms up. Reasoning-mode token usage is invisible to observability tooling.

Both handlers now subtract `prompt_cache_hit_tokens` from `prompt_tokens` to derive the fresh-prompt count, and populate `Usage` with `cacheReadInputTokens` and `thoughtTokens`. Mirrors what the Gemini and OpenAI handlers already do for their analogous fields.

The multi-step tools test asserts the new semantics: aggregated promptTokens reflects fresh-only counts and the previously-invisible cacheReadInputTokens is now exposed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
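The mapping described in the commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the PR's PHP code: the `Usage` dataclass and `extract_usage` helper are hypothetical names, and only the response keys (`prompt_tokens`, `completion_tokens`, `prompt_cache_hit_tokens`, `completion_tokens_details.reasoning_tokens`) come from the text above. It assumes missing fields should surface as `None` rather than zero, which may differ from what the handlers actually do.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Usage:
    """Hypothetical stand-in for the library's Usage value object."""
    prompt_tokens: int                              # fresh (non-cached) prompt tokens
    completion_tokens: int
    cache_read_input_tokens: Optional[int] = None   # tokens served from the prompt cache
    thought_tokens: Optional[int] = None            # reasoning tokens, if reported


def extract_usage(response: Mapping[str, Any]) -> Usage:
    """Build a Usage from a DeepSeek-style response body (assumed shape)."""
    usage = response.get("usage") or {}
    prompt_tokens = int(usage.get("prompt_tokens", 0))
    cache_hits = usage.get("prompt_cache_hit_tokens")
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens")

    # Subtract cache hits so prompt_tokens counts only the fresh portion.
    # Clamp at zero in case a response ever reports inconsistent counters.
    fresh = max(prompt_tokens - int(cache_hits or 0), 0)

    return Usage(
        prompt_tokens=fresh,
        completion_tokens=int(usage.get("completion_tokens", 0)),
        cache_read_input_tokens=None if cache_hits is None else int(cache_hits),
        thought_tokens=None if reasoning is None else int(reasoning),
    )
```

With this shape, a response reporting 1000 prompt tokens of which 800 were cache hits yields 200 fresh prompt tokens and 800 cache-read tokens, so the two counters still sum to the original `prompt_tokens`.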
Description
`DeepSeek\Handlers\Text::addStep()` and `Stream::extractUsage()` hardcode `Usage` to only `prompt_tokens` and `completion_tokens` from the API response, silently dropping two DeepSeek-specific fields:

- `usage.prompt_cache_hit_tokens`: cached input portion of the prompt. DeepSeek offers a 98% discount on cache hits (their headline feature) and exposes this as a separate counter.
- `usage.completion_tokens_details.reasoning_tokens`: internal thinking tokens emitted by reasoning models (`deepseek-reasoner`, `deepseek-v4-flash` thinking mode).

Impact
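To make the billing effect concrete, here is a rough arithmetic sketch in Python. The helper names are hypothetical and the per-token rate is normalized to 1.0. Only the 98% cache-hit discount is taken from the description; completion tokens are ignored for simplicity.

```python
def prompt_cost(fresh_tokens: int, cached_tokens: int,
                fresh_rate: float, cache_discount: float = 0.98) -> float:
    """Prompt-side cost when cached tokens are billed at a discounted rate."""
    cached_rate = fresh_rate * (1.0 - cache_discount)
    return fresh_tokens * fresh_rate + cached_tokens * cached_rate


def overstatement_factor(prompt_tokens: int, cache_hit_tokens: int,
                         cache_discount: float = 0.98) -> float:
    """Ratio of the naive cost (all tokens at fresh rate) to the cache-aware cost."""
    if prompt_tokens <= 0:
        raise ValueError("prompt_tokens must be positive")
    # Naive tracker: sees no cache-read tokens, bills everything as fresh.
    naive = prompt_cost(prompt_tokens, 0, 1.0, cache_discount)
    # Cache-aware tracker: bills only the non-cached remainder at the fresh rate.
    actual = prompt_cost(prompt_tokens - cache_hit_tokens, cache_hit_tokens,
                         1.0, cache_discount)
    return naive / actual
```

Under these assumptions, a 10,000-token prompt with 8,000 cache hits costs 2,160 rate units rather than 10,000, a factor of about 4.6; at a 70% hit rate the factor is about 3.2.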
Changes
All 28 DeepSeek tests pass. Pint clean.
Testing
Verified against `deepseek-v4-flash` direct API in production:
References
Non-overlapping with #1020 (which fixes `reasoning_content` round-trip) — both fixes are complementary; this one is purely about exposing usage detail that's already in the response.