
fix: correct cache_hit_rate calculation and fix Vercel stream tool call handling#10994

Open
sestinj wants to merge 2 commits into main from nate/fix-cache-hit-rate-telemetry

Conversation


@sestinj sestinj commented Mar 3, 2026

Summary

  • Fix cache_hit_rate telemetry: The prompt_cache_metrics event was emitted twice per completion, and the cache hit rate denominator used only prompt_tokens (which maps to Anthropic's input_tokens — non-cached only). This caused ratios >> 1 when caching worked well (max observed: 89,892). Fixed by removing the duplicate emission and using the correct total: prompt_tokens + cache_read_tokens + cache_write_tokens.

  • Fix Vercel AI SDK tool call streaming: The Vercel AI SDK streams tool calls as tool-input-start → tool-input-delta → tool-input-end → tool-call. Previously tool-input-start was ignored and tool-call emitted the full call at the end, so streaming consumers never saw the tool call id on intermediate chunks. Now tool-input-start emits the initial chunk with id and function name (matching OpenAI's streaming format), and tool-call is a no-op to avoid duplicating args.

Test plan

  • Unit tests updated and passing for vercelStreamConverter.test.ts (15 tests)
  • Vercel SDK integration tests should now pass in CI (locally blocked by a missing @ai-sdk/xai dependency, an environment-only issue)

Two bugs in prompt_cache_metrics telemetry:

1. Duplicate emission: prompt_cache_metrics was emitted twice per API
   request — once using `actualInputTokens` and again using
   `fullUsage.prompt_tokens`. This doubled all event counts in PostHog
   and produced conflicting values.

2. Wrong denominator: cache_hit_rate was calculated as
   `cacheReadTokens / prompt_tokens`, but the Anthropic adapter maps
   `prompt_tokens` to only non-cached input tokens (`input_tokens`),
   excluding cache reads and writes. When caching works well, this
   produces ratios >> 1 (observed max: 89,892). The correct total is
   `prompt_tokens + cache_read_tokens + cache_write_tokens`.

Fix: remove the first duplicate emission and compute total_prompt_tokens
as the sum of all three token types. cache_hit_rate is now a proper 0-1
ratio.
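The corrected computation can be sketched as follows. This is a minimal illustration of the formula described above, not Continue's actual telemetry code; the interface and function names are hypothetical.

```typescript
// Hypothetical sketch of the corrected cache_hit_rate computation.
// Field names below are illustrative, not Continue's exact identifiers.
interface UsageTokens {
  promptTokens: number; // Anthropic input_tokens: non-cached input only
  cacheReadTokens: number; // tokens served from the prompt cache
  cacheWriteTokens: number; // tokens written to the prompt cache
}

function cacheHitRate(u: UsageTokens): number {
  // Correct denominator: all prompt-side tokens, not just the
  // non-cached portion. This keeps the ratio within [0, 1].
  const totalPromptTokens =
    u.promptTokens + u.cacheReadTokens + u.cacheWriteTokens;
  return totalPromptTokens > 0 ? u.cacheReadTokens / totalPromptTokens : 0;
}

// With the old formula (cacheReadTokens / promptTokens), a well-cached
// request like this would report 9000 / 100 = 90 rather than a 0-1 ratio:
const rate = cacheHitRate({
  promptTokens: 100,
  cacheReadTokens: 9000,
  cacheWriteTokens: 200,
});
```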

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sestinj sestinj requested a review from a team as a code owner March 3, 2026 05:01
@sestinj sestinj requested review from RomneyDa and removed request for a team March 3, 2026 05:01
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Mar 3, 2026

continue bot commented Mar 3, 2026

Docs Review: No documentation updates needed.

This PR contains internal telemetry fixes (correcting cache_hit_rate calculation and removing duplicate event emission) that don't affect user-facing features, configuration options, or developer workflows. The changes are purely internal to Continue's analytics infrastructure.


@cubic-dev-ai cubic-dev-ai bot left a comment


No issues found across 1 file

…nverter

The Vercel AI SDK streams tool calls as tool-input-start → tool-input-delta
→ tool-input-end → tool-call. Previously, tool-input-start was ignored (returned
null) and tool-call emitted the full tool call at the end, which meant streaming
consumers never saw the tool call id on intermediate chunks.

Now tool-input-start emits the initial chunk with id and function name (matching
OpenAI's streaming format), and tool-call returns null to avoid duplicating args
already streamed via tool-input-delta.
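The converter behavior described above can be sketched like this. The part shapes and the OpenAI-style delta type are simplified assumptions for illustration, not the actual types in vercelStreamConverter.

```typescript
// Hypothetical sketch of the stream-part handling described above;
// types are simplified assumptions, not Continue's real definitions.
type VercelStreamPart =
  | { type: "tool-input-start"; id: string; toolName: string }
  | { type: "tool-input-delta"; id: string; delta: string }
  | { type: "tool-input-end"; id: string }
  | { type: "tool-call"; toolCallId: string; toolName: string; input: unknown };

interface ToolCallDelta {
  id?: string;
  function: { name?: string; arguments?: string };
}

function convertPart(part: VercelStreamPart): ToolCallDelta | null {
  switch (part.type) {
    case "tool-input-start":
      // Emit id + function name immediately, mirroring the first
      // tool_call chunk in OpenAI's streaming format.
      return { id: part.id, function: { name: part.toolName, arguments: "" } };
    case "tool-input-delta":
      // Subsequent chunks carry only argument fragments.
      return { function: { arguments: part.delta } };
    case "tool-call":
      // Args were already streamed via tool-input-delta; emitting the
      // full call again would duplicate them, so this is now a no-op.
      return null;
    default:
      return null;
  }
}
```

The key change is that the id surfaces on the first chunk instead of only at the end, so streaming consumers can associate later argument deltas with the right call.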

Generated with [Continue](https://continue.dev)

Co-Authored-By: Continue <noreply@continue.dev>
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Mar 3, 2026
@sestinj sestinj changed the title fix: correct cache_hit_rate calculation and remove duplicate emission fix: correct cache_hit_rate calculation and fix Vercel stream tool call handling Mar 3, 2026