# Prompt Caching Examples (AI SDK v5)

Examples demonstrating prompt caching with Vercel AI SDK v5.

## Documentation

For full prompt caching documentation, including all providers, pricing, and configuration details, see:
- **[Prompt Caching Guide](../../../../docs/prompt-caching.md)**

## Examples in This Directory

- `user-message-cache.ts` - Cache large context in user messages
- `multi-message-cache.ts` - Cache a system prompt across multi-turn conversations (see the sketch after this list)
- `no-cache-control.ts` - Control scenario (validates methodology)
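
The multi-turn pattern looks roughly like this: mark the large, stable block with `cacheControl`, keep it byte-identical at the front of the conversation, and append turns after it. This is a minimal sketch only; the `ask` helper and the instruction text are illustrative, and `multi-message-cache.ts` may structure its conversation differently:

```typescript
import { createOpenRouter } from '@openrouter/ai-sdk-provider';
import { generateText, type ModelMessage } from 'ai';

const openrouter = createOpenRouter({
  extraBody: { stream_options: { include_usage: true } }, // Required for cache metrics
});

// Hypothetical large, stable instructions (2048+ tokens in practice).
const instructions = 'You are an assistant for a large codebase...';

const history: ModelMessage[] = [];

async function ask(question: string): Promise<string> {
  if (history.length === 0) {
    // First turn carries the cached block and the question as separate parts.
    history.push({
      role: 'user',
      content: [
        {
          type: 'text',
          text: instructions,
          providerOptions: {
            openrouter: { cacheControl: { type: 'ephemeral' } },
          },
        },
        { type: 'text', text: question },
      ],
    });
  } else {
    history.push({ role: 'user', content: question });
  }

  const { text } = await generateText({
    model: openrouter('anthropic/claude-3.5-sonnet'),
    messages: history,
  });
  history.push({ role: 'assistant', content: text });
  return text;
}

// Turn 1 creates the cache; turn 2 re-sends the identical prefix and should hit it.
await ask('Summarize the conventions in these instructions.');
await ask('Which rules apply to error handling?');
```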

## Quick Start

```bash
# Run an example
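export OPENROUTER_API_KEY=...  # createOpenRouter falls back to this env var when no apiKey is passed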
bun run typescript/ai-sdk-v5/src/prompt-caching/user-message-cache.ts
```

## AI SDK v5 Usage

```typescript
import { createOpenRouter } from '@openrouter/ai-sdk-provider';
import { generateText } from 'ai';

const openrouter = createOpenRouter({
  extraBody: {
    stream_options: { include_usage: true }, // Required for cache metrics
  },
});

// Use providerOptions.openrouter.cacheControl on content items
const result = await generateText({
  model: openrouter('anthropic/claude-3.5-sonnet'),
  messages: [{
    role: 'user',
    content: [{
      type: 'text',
      text: 'Large context...',
      providerOptions: {
        openrouter: { cacheControl: { type: 'ephemeral' } },
      },
    }],
  }],
});

// Check cache metrics
const cached = result.providerMetadata?.openrouter?.usage?.promptTokensDetails?.cachedTokens ?? 0;
```
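
To confirm caching end to end, make the identical request twice: the first call reports `cachedTokens = 0` (cache miss, cache created) and the second `cachedTokens > 0` (cache hit). The cache only hits on an exact content match, blocks under roughly 2048 tokens may not be cached at all, and ephemeral caches expire after about 5 minutes, so run the calls back to back. A verification sketch reusing `openrouter` and the imports above (`context` is a hypothetical large string):

```typescript
// Hypothetical stable context; must be byte-identical on every call.
const context = 'Large context...';

async function cachedTokensFor(): Promise<number> {
  const result = await generateText({
    model: openrouter('anthropic/claude-3.5-sonnet'),
    messages: [{
      role: 'user',
      content: [{
        type: 'text',
        text: context,
        providerOptions: {
          openrouter: { cacheControl: { type: 'ephemeral' } },
        },
      }],
    }],
  });
  return result.providerMetadata?.openrouter?.usage?.promptTokensDetails?.cachedTokens ?? 0;
}

console.log('first call:', await cachedTokensFor());  // expected: 0 (miss, cache created)
console.log('second call:', await cachedTokensFor()); // expected: > 0 (hit)
```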