🪖 Trooper

Your LLM didn't crash — it fell back and kept going.

Quota errors should be invisible.

As LLM APIs get rate-limited and expensive, local fallback isn't optional anymore.

→ Claude fails       → continues on Ollama
→ Simple prompts     → never hit the cloud
→ Every response     → shows tokens saved

Trooper is a circuit breaker + router + context engine for LLMs.

What you see

Every response tells you exactly what happened — no dashboards, no setup:

# Simple question → Ollama handled it, cloud never contacted
X-Trooper-Provider: ollama
X-Trooper-Decision: ollama (simple turn) | cloud skipped
X-Trooper-Session-Saved: 42 tokens

# Complex question → Claude handled it
X-Trooper-Provider: claude
X-Trooper-Summary: claude (direct) ✓

# Claude quota hit → fell back to Ollama, context preserved
X-Trooper-Provider: ollama
X-Trooper-Decision: ollama (fallback: credit_balance)
X-Trooper-Session-Saved: 42 tokens
X-Trooper-Summary: claude → ollama (credit_balance) | context ✓

X-Trooper-Session-Saved accumulates across the session — every turn routed locally instead of to a paid API adds to the count.

What Trooper is

Trooper is a drop-in proxy for LLM apps. When cloud models fail — quota, rate limits, outages — it automatically falls back to your local Ollama instance while preserving full conversation context.

No manual retries. No crashes. No lost sessions. ⏱ Up and running in under 60 seconds.

Who uses Trooper

App developers — your users never see quota errors. Trooper fails over to local Ollama transparently while your app keeps running.

Agent builders — agent loops survive quota limits mid-task. Context is preserved so the agent continues exactly where it left off.

Claude Code / Cursor users — coding sessions survive quota hits. No lost context, no starting over.

Privacy-conscious developers — use x_force_local to keep sensitive requests off the cloud without interrupting the session.

Why not LiteLLM or Bifrost

LiteLLM and Bifrost route between cloud providers.

Trooper is built for a different failure mode: when the cloud stops working.

|  | LiteLLM / Bifrost | Trooper |
| --- | --- | --- |
| Fallback target | Another cloud provider | Your local machine |
| Setup | `pip install`, venv, YAML | One Go binary, env vars |
| Dependencies | Heavy Python stack | Zero (pure stdlib) |
| Works offline | ❌ | ✅ |
| Data on fallback | Goes to another cloud | Stays on your machine |

When LiteLLM falls back, your data goes to another cloud. When Trooper falls back, your data goes to your machine.

Smart routing

Trooper decides when the cloud is overkill.

The classifier is rule-based and deterministic: no LLM call, no network round trip, no cost to classify. Some routing tools call an LLM to make the routing decision. Trooper doesn't.

Simple, stateless requests route directly to your local Ollama — no API call, no cost:

"how many days in a week"  →  Ollama directly 🪖  (cloud never contacted)
"explain why goroutines…"  →  Claude ✅           (needs reasoning)

Routes to Ollama: factual lookups, definitions, formatting, conversation meta, short stateless summaries

Always goes to Claude: reasoning, judgment, multi-step tasks, context-aware summaries, code, messages over 20 words

How Trooper handles context

The hard part of fallback isn't switching models — it's keeping context.

Trooper solves that with a 3-layer compaction system:

ANCHOR  (~10%)  — First 2 turns verbatim, never dropped
SITREP  (~20%)  — Rule-based summary of middle turns
TAIL    (~70%)  — Last N turns verbatim
                  Total <= 6144 tokens (configurable)

The SITREP is extracted automatically — no LLM call needed. From a real session:

[TROOPER_SITREP]{
  "intent": "building a go proxy called trooper that falls back to local",
  "stage": "in_progress",
  "constraints": ["local-first", "proxy-layer"],
  "active_entities": ["Trooper", "Ollama", "Claude"],
  "open_loops": ["streaming pending"],
  "recent_actions": ["deploy monday", "check streaming"],
  "resolved_loops": ["resolve the health check"],
  "confidence": 1.00
}[/TROOPER_SITREP]
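If you want to inspect a SITREP from your own tooling, the block can be read with ordinary JSON decoding. The sketch below assumes the tag format and field names shown in the sample above. The Sitrep type and the parseSitrep function are hypothetical names, not part of Trooper.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// Sitrep mirrors the fields in the sample SITREP above.
type Sitrep struct {
	Intent         string   `json:"intent"`
	Stage          string   `json:"stage"`
	Constraints    []string `json:"constraints"`
	ActiveEntities []string `json:"active_entities"`
	OpenLoops      []string `json:"open_loops"`
	RecentActions  []string `json:"recent_actions"`
	ResolvedLoops  []string `json:"resolved_loops"`
	Confidence     float64  `json:"confidence"`
}

const (
	sitrepOpen  = "[TROOPER_SITREP]"
	sitrepClose = "[/TROOPER_SITREP]"
)

// parseSitrep extracts the JSON between the SITREP tags and decodes it.
func parseSitrep(text string) (Sitrep, error) {
	var s Sitrep
	start := strings.Index(text, sitrepOpen)
	if start < 0 {
		return s, errors.New("no SITREP block found")
	}
	rest := text[start+len(sitrepOpen):]
	end := strings.Index(rest, sitrepClose)
	if end < 0 {
		return s, errors.New("unterminated SITREP block")
	}
	err := json.Unmarshal([]byte(rest[:end]), &s)
	return s, err
}
```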

Compaction triggers automatically when the session exceeds the token budget:

📦  Context compaction triggered — 1532 tokens exceeds 6144 budget
    Anchor turns   : 2 (~180 tokens)
    Middle turns   : 2 → SITREP (~148 tokens)
    Recent turns   : 1 (~36 tokens)
    Tokens used    : 364 / 6144

Honest note: Compaction is lossy by design. The SITREP preserves intent and state — not verbatim history. For precision-critical workflows, keep sessions short or increase CONTEXT_WINDOW.
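The Anchor / SITREP / Tail split can be sketched as follows. This is an illustration under assumptions, not Trooper's implementation: the Turn type, the 4-characters-per-token estimate, and the function names are hypothetical. The two anchor turns and the roughly 70% tail share come from the layout described above.

```go
package main

// Turn is one message in the session; the type is hypothetical.
type Turn struct {
	Role, Content string
}

// estimateTokens assumes roughly 4 characters per token.
func estimateTokens(s string) int { return (len(s) + 3) / 4 }

// splitForCompaction keeps the first two turns as the anchor, then walks
// backwards from the newest turn, adding turns to the tail until about
// 70% of the budget is used. Whatever lies in between is returned as the
// middle, which would then be summarised into a SITREP.
func splitForCompaction(turns []Turn, budget int) (anchor, middle, tail []Turn) {
	const anchorTurns = 2
	if len(turns) <= anchorTurns {
		return turns, nil, nil
	}
	anchor = turns[:anchorTurns]
	tailBudget := budget * 70 / 100
	start := len(turns) // index where the tail begins
	used := 0
	for i := len(turns) - 1; i >= anchorTurns; i-- {
		cost := estimateTokens(turns[i].Content)
		if used+cost > tailBudget {
			break
		}
		used += cost
		start = i
	}
	return anchor, turns[anchorTurns:start], turns[start:]
}
```

With the default budget of 6144 tokens, the same percentages give roughly 614 tokens for the anchor, 1228 for the SITREP, and 4300 for the tail.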

Quickstart

⏱ Runs in under 60 seconds.

Prerequisites

ollama pull qwen2.5:3b

💡 Eliminate cold-start latency: set OLLAMA_KEEP_ALIVE=24h so Ollama keeps the model loaded. Without it, the first fallback after an idle period takes 3–5s for 7B models and up to 20s for 72B. If Ollama runs under systemd, add this line to the service unit:
Environment="OLLAMA_KEEP_ALIVE=24h"

Option 1 — Docker (no Go required)

git clone https://github.com/shouvik12/trooper
cd trooper
cp .env.example .env
# edit .env — set CLAUDE_API_KEY
docker compose up

Option 2 — Run from source (Go 1.22+)

git clone https://github.com/shouvik12/trooper
cd trooper
export CLAUDE_API_KEY=sk-ant-...
go run main.go providers.go classifier.go

Trooper starts on http://127.0.0.1:3000. It binds to localhost by default, so the proxy and the API keys it holds are not reachable from other machines on the network.

Usage

Point your existing client at Trooper — nothing else changes:

Python + Anthropic SDK:

import anthropic
client = anthropic.Anthropic(
    api_key="your-key",
    base_url="http://localhost:3000",  # only change
)

Python + OpenAI SDK:

from openai import OpenAI
client = OpenAI(
    api_key="your-key",
    base_url="http://localhost:3000",  # only change
)

curl:

curl http://localhost:3000/ \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: my-session" \
  -d '{"model": "claude-3-5-haiku-20241022", "messages": [{"role": "user", "content": "Hello!"}]}'

Pass X-Session-ID to track named sessions. Without it, Trooper assigns a new auto-generated session to each request.
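For Go clients, the curl call above translates to a plain net/http request. The sketch below is one possible way to build it. The helper name newTrooperRequest and the request structs are assumptions for illustration. The URL path, headers, and body shape are taken from the curl example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

// newTrooperRequest builds a POST to a Trooper instance at baseURL.
// An empty sessionID leaves the X-Session-ID header unset.
func newTrooperRequest(baseURL, sessionID, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if sessionID != "" {
		req.Header.Set("X-Session-ID", sessionID)
	}
	return req, nil
}
```

Send the result with http.DefaultClient.Do(req) and reuse the same session ID on later turns so they land in the same session.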

Provider chain

Trooper builds the chain from environment variables. Ollama is always last.

CLAUDE_API_KEY=sk-ant-...                          # Chain: Claude → Ollama
CLAUDE_API_KEY=sk-ant-...  GEMINI_API_KEY=AIza...  # Chain: Claude → Gemini → Ollama
CLAUDE_API_KEY=sk-ant-...  OPENAI_API_KEY=sk-...   # Chain: Claude → OpenAI → Ollama
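As a rough sketch of the idea, chain construction from environment variables could look like the code below. The buildChain function is hypothetical, and the ordering when both Gemini and OpenAI keys are set (Gemini before OpenAI) is an assumption, since the examples above only show them one at a time.

```go
package main

// buildChain returns provider names in fallback order. A cloud provider is
// included only when its API key variable is non-empty; Ollama is always last.
// getenv is injected so the function can be tested without touching the
// real environment.
func buildChain(getenv func(string) string) []string {
	cloud := []struct{ name, envKey string }{
		{"claude", "CLAUDE_API_KEY"},
		{"gemini", "GEMINI_API_KEY"}, // assumed to come before OpenAI
		{"openai", "OPENAI_API_KEY"},
	}
	var chain []string
	for _, p := range cloud {
		if getenv(p.envKey) != "" {
			chain = append(chain, p.name)
		}
	}
	return append(chain, "ollama")
}
```

In a real program you would pass os.Getenv as the argument.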

Fallback behaviour

| Status | Trooper action |
| --- | --- |
| `200 OK` | Pass through |
| `429 Rate Limited` | Retry with 2s backoff, then try next |
| `402 Payment Required` | Fall back immediately |
| `400 Credit Balance` | Detect credit error, fall back immediately |
| `401 Unauthorized` | Surface error; bad keys are never masked |
| `529 Overloaded` | Fall back immediately |
| Network error | Fall back immediately (30s timeout per provider) |
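The table maps naturally onto a small decision function. The following is a sketch, not Trooper's code: the Action type and actionFor are hypothetical names, matching the credit error by looking for the phrase "credit balance" in the response body is an assumption, and so is surfacing any status the table does not list.

```go
package main

import "strings"

// Action is what the proxy does with an upstream response.
type Action int

const (
	PassThrough   Action = iota // return the response as-is
	RetryThenNext               // retry once after backoff, then next provider
	FallBack                    // move to the next provider immediately
	SurfaceError                // return the error to the caller
)

// actionFor maps an upstream HTTP status (and body, for 400s) to an action.
func actionFor(status int, body string) Action {
	switch status {
	case 200:
		return PassThrough
	case 429:
		return RetryThenNext
	case 402, 529:
		return FallBack
	case 400:
		// Assumption: a credit error is recognised by a phrase in the body.
		if strings.Contains(strings.ToLower(body), "credit balance") {
			return FallBack
		}
		return SurfaceError
	default:
		// 401 is surfaced per the table; other statuses are surfaced by assumption.
		return SurfaceError
	}
}
```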

Response headers

curl http://localhost:3000/ ... -v 2>&1 | grep X-Trooper

# Simple turn — cloud never contacted
X-Trooper-Provider: ollama
X-Trooper-Decision: ollama (simple turn) | cloud skipped
X-Trooper-Session-Saved: 14 tokens

# Cloud served normally
X-Trooper-Provider: claude
X-Trooper-Fallback-Count: 0
X-Trooper-Summary: claude (direct) ✓

# Quota hit — fell back, context preserved
X-Trooper-Provider: ollama
X-Trooper-Fallback-Count: 1
X-Trooper-Decision: ollama (fallback: credit_balance)
X-Trooper-Session-Saved: 14 tokens
X-Trooper-Summary: claude → ollama (credit_balance) | context ✓

Circuit breaker

If a provider fails 3 times within 60 seconds, Trooper skips it automatically — no wasted round trips. Resets after 60 seconds.

⚡ Skipping claude — circuit open (3 fails in last 60s)
🔄 Trying provider: ollama

Auto recovery

AUTO_RECOVERY=true go run main.go providers.go classifier.go

Health checks use a free GET /models endpoint — no inference requests, no cost. Trooper silently routes back to the primary provider when it recovers.

Per-request local routing

Add x_force_local: true to any request body to route that specific request to Ollama, regardless of complexity or provider availability.

Use for:

Privacy — keep sensitive requests off the cloud
Cost control — force local for expensive operations
Offline mode — bypass cloud entirely mid-session

The session context is preserved. Cloud routing resumes on the next request without the flag.

Example:

# Turn 1 & 2: Claude handles it (cloud)
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: dev-session" \
  -d '{"model": "claude-sonnet-4-5", "max_tokens": 1024,
       "messages": [{"role": "user", "content": "Help me design our auth layer"}]}'

# Turn 3: sensitive detail, developer keeps it local
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: dev-session" \
  -d '{"model": "claude-sonnet-4-5", "max_tokens": 1024,
       "x_force_local": true,
       "messages": [{"role": "user", "content": "Our payment vault uses..."}]}'

Trooper log on Turn 3:

🔒 Developer requested local-only (x_force_local) — skipping cloud
🔒 Local: ollama (force_local) | privacy mode | session saved: 28 tokens

Running tests

go test ./... -v

Covers: turn classifier, code detection, context compaction, token estimation. All tests must pass before any contribution is merged.

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| `CLAUDE_API_KEY` | none | Anthropic API key |
| `CLAUDE_MODEL` | none | Default Claude model |
| `GEMINI_API_KEY` | none | Google Gemini API key |
| `GEMINI_MODEL` | `gemini-2.0-flash` | Default Gemini model |
| `OPENAI_API_KEY` | none | OpenAI API key |
| `OPENAI_MODEL` | `gpt-4o-mini` | Default OpenAI model |
| `OLLAMA_MODEL` | `qwen2.5:3b` | Local fallback model |
| `FALLBACK_URL` | `http://localhost:11434/api/chat` | Ollama endpoint |
| `CONTEXT_WINDOW` | `6144` | Token budget for context compaction |
| `QUOTA_STATUS_CODES` | `429,402,529,400` | HTTP codes that trigger fallback |
| `TROOPER_PORT` | `3000` | Port Trooper listens on |
| `TROOPER_BIND` | `127.0.0.1` | Bind address |
| `AUTO_RECOVERY` | `false` | Enable automatic recovery to primary provider |
| `OLLAMA_KEEP_ALIVE` | `5m` | Set `24h` in systemd to eliminate cold-start latency |

Recommended local models

| Model | Size | Notes |
| --- | --- | --- |
| `qwen2.5:3b` | 1.9GB | Default: fast, lightweight |
| `qwen2.5:7b` | 4.7GB | Better quality, still fast |
| `llama3.1:8b` | 4.9GB | Strong all-rounder |
| `mistral:7b` | 4.1GB | Good reasoning |

Roadmap

V3.1 — Released

✅ Smart routing — simple turns route to Ollama directly, cloud never contacted
✅ X-Trooper-Session-Saved header — cumulative tokens saved per session
✅ X-Trooper-Decision header — routing decision on every response
✅ Deterministic classifier — no LLM call to route, zero added latency

V3.0 — Released

✅ Circuit breaker — skip providers that fail 3x in 60s
✅ Zero-interruption log lines
✅ X-Trooper-Summary header

V2 / V2.2 — Released

✅ Cloud → Ollama fallback with session continuity
✅ Context compaction — Anchor + SITREP + Tail
✅ Streaming, health check, auto recovery, zero dependencies

Recognition

Featured in Agent Brief by agentcommunity.org — curated alongside Anthropic, Shopify MCP, and LangGraph updates (April 2026)
Featured on @github_unpacked — Instagram reel with 76 saves
Featured on PatentLLM — covered alongside Qwen3.6-27B RTX 3090 local inference story (May 2026)
Featured on dev.to — local AI tooling roundup (May 2026)
Cited by kylebrodeur as inspiration for "robust, transparent HTTP rate-limit fallback triggers"

License

MIT
