Track your local inference savings vs. API costs in real time. See how much you save by running models locally instead of paying API rates.
97% cheaper at home! Local inference costs ~$0.01/M input tokens vs $0.30/M on the API.
- Automatic tracking - Plugin tracks every request silently
- Cache-aware - Tracks prefix caching (fresh vs cached tokens); see the sketch after this list
- Simple commands - `/savings` to see your savings
- Easy install - Copy files, restart, done
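"Cache-aware" hinges on the usage payload the server returns. A minimal sketch of the fresh/cached split, assuming the OpenAI-style usage shape that vLLM emits when `--enable-prompt-tokens-details` is set (the plugin's actual field handling may differ):

```typescript
// Assumed OpenAI-compatible usage payload. vLLM fills in
// prompt_tokens_details.cached_tokens when --enable-prompt-tokens-details is on.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Split input tokens into fresh (billed at the full input rate on an API)
// and cached (billed at the cheaper cache-read rate).
function splitInputTokens(usage: Usage): { fresh: number; cached: number } {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  return { fresh: usage.prompt_tokens - cached, cached };
}
```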
```bash
# Install the plugin
mkdir -p ~/.config/opencode/plugins
cp src/index.ts ~/.config/opencode/plugins/savings-tracker.ts

# Install the commands
mkdir -p ~/.config/opencode/commands
cp .opencode/commands/*.md ~/.config/opencode/commands/

# Exit and reopen opencode, or restart the daemon

# Try it out
opencode run "hello"     # Generate some tokens
opencode run "/savings"  # See your savings!
```

| Command | Description |
|---|---|
| `/savings` | Show savings summary |
| `/savings-reset confirm: true` | Reset all tracking data |
| Token Type | Rate | Notes |
|---|---|---|
| Fresh input | $0.30/M | New tokens processed |
| Cache read | $0.06/M | From prefix caching |
| Output | $1.20/M | Generated tokens |
Your local cost is just electricity: with the default GPU profile below, roughly $0.01/M prompt tokens and $0.21/M output tokens at $0.12/kWh, as sketched next.
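Those figures fall out of the GPU profile in the config below: energy per token is just watts divided by tokens per second. A back-of-the-envelope sketch of that math (mine, not the plugin's exact code):

```typescript
// Energy for 1M tokens = wattage × (1,000,000 / tokensPerSecond) watt-seconds,
// converted to kWh and multiplied by the electricity rate.
function localCostPer1M(wattage: number, tokensPerSecond: number, costPerKwh: number): number {
  const seconds = 1_000_000 / tokensPerSecond;
  const kwh = (wattage * seconds) / 3_600_000; // watt-seconds → kWh
  return kwh * costPerKwh;
}

// Default GPU profile: 275 W at $0.12/kWh.
localCostPer1M(275, 1000, 0.12); // ≈ $0.009/M prompt tokens (1000 tok/s)
localCostPer1M(275, 43, 0.12);   // ≈ $0.21/M output tokens (43 tok/s)
```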
Example `/savings` output:

```
Savings Tracker Summary
=======================
Period: 0.5 days (since 2024-04-24)

Usage:
  Total requests: 25
  Total input tokens: 3,456,789
    - Fresh: 2,123,456
    - Cache read: 1,333,333
  Total output tokens: 45,678

Costs:
  minimax/nvfp4 API: $0.77
    - Cache read: $0.08
    - Fresh input: $0.64
    - Output: $0.05
  Local inference: $0.01
  -------------------------
  Net savings: $0.76 (99% cheaper at home)
```
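As a quick check, each API line item above is just rate × tokens / 1M, using the rates from the table earlier:

```typescript
// Reproduce the example's API cost from its token counts and the baseline rates.
const fresh = 2_123_456;
const cacheRead = 1_333_333;
const output = 45_678;

const apiCost =
  (fresh * 0.30) / 1e6 +     // $0.64 fresh input
  (cacheRead * 0.06) / 1e6 + // $0.08 cache reads
  (output * 1.20) / 1e6;     // $0.05 output

console.log(apiCost.toFixed(2)); // "0.77"
```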
Create `~/.local/share/opencode-savings/config.json` to customize:

```json
{
  "providers": ["llama-*", "minimax-*"],
  "baseline": {
    "provider": "minimax",
    "model": "nvfp4",
    "inputCostPer1M": 0.30,
    "outputCostPer1M": 1.20,
    "cacheReadCostPer1M": 0.06
  },
  "gpus": [
    {
      "wattage": 275,
      "promptTokensPerSecond": 1000,
      "outputTokensPerSecond": 43,
      "costPerKwh": 0.12
    }
  ]
}
```

| Option | Default | Description |
|---|---|---|
| `providers` | `["llama-*", "minimax-*"]` | Provider patterns to track (glob-style; see the sketch below) |
| `inputCostPer1M` | $0.30 | Input token rate |
| `outputCostPer1M` | $1.20 | Output token rate |
| `cacheReadCostPer1M` | $0.06 | Cache read rate |
| `wattage` | 275 | GPU power draw in watts |
| `promptTokensPerSecond` | 1000 | Prompt processing speed |
| `outputTokensPerSecond` | 43 | Generation speed |
| `costPerKwh` | $0.12 | Electricity rate |
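The `providers` patterns are globs where `*` matches any substring. One way such matching could work (a hypothetical sketch; `matchesProvider` is not part of the plugin's documented API):

```typescript
// Convert a glob like "llama-*" into an anchored regex and test the provider ID.
function matchesProvider(patterns: string[], providerID: string): boolean {
  return patterns.some((pattern) => {
    const body = pattern
      .split("*")
      .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
      .join(".*");
    return new RegExp(`^${body}$`).test(providerID);
  });
}

matchesProvider(["llama-*", "minimax-*"], "llama-server"); // true
matchesProvider(["llama-*", "minimax-*"], "openai");       // false
```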
- OpenCode with `@opencode-ai/plugin` and `@opencode-ai/sdk`
- vLLM with prefix caching enabled: `--enable-prefix-caching --enable-prompt-tokens-details`
- Or a llama.cpp server (KV cache enabled by default)
**No cache tokens showing?**
- Make sure vLLM has the `--enable-prompt-tokens-details` flag
- Restart the vLLM server after adding flags

**Plugin not loading?**
- Check for syntax errors: `opencode run "hello" 2>&1 | head -20`
- Verify the plugin path exists: `ls ~/.config/opencode/plugins/`
**Reset tracking data:**
- Run `opencode run "/savings-reset confirm: true"`

See GitHub Issues for:
- Track savings over time with charts
- Show cache hit rate percentage