darshil3011/codedrift

 ██████╗ ██████╗ ██████╗ ███████╗██████╗ ██████╗ ██╗███████╗████████╗
██╔════╝██╔═══██╗██╔══██╗██╔════╝██╔══██╗██╔══██╗██║██╔════╝╚══██╔══╝
██║     ██║   ██║██║  ██║█████╗  ██║  ██║██████╔╝██║█████╗     ██║   
██║     ██║   ██║██║  ██║██╔══╝  ██║  ██║██╔══██╗██║██╔══╝     ██║   
╚██████╗╚██████╔╝██████╔╝███████╗██████╔╝██║  ██║██║██║        ██║   
 ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝╚═════╝ ╚═╝  ╚═╝╚═╝╚═╝        ╚═╝  

Your coding agent spends 90% of its tokens finding code, not writing it.
Reduce token usage by 50x with CodeDrift.



Quick Setup · AST Token Reduction · Session-aware Reads · Cross-session Memory · PII Redaction · Analytics Dashboard


Token usage: without vs with CodeDrift

Numbers are typical for a mid-size Python codebase session; run benchmark.py against your own sessions for exact figures.

Every prompt triggers the same loop — grep, glob, read a file, realize it's wrong, read another, try again. A single question burns 60K tokens and 23 tool calls before the real work starts.

CodeDrift replaces that loop. It parses your codebase with tree-sitter, extracts every function, class, import, and call site, and stores them in a local index with full-text search. When your agent needs code, it queries the index — and gets back the exact definition, every caller, related tests, and git history. Within a session, it tracks what the agent has already seen — re-reads return only the lines that changed, not the entire file again.
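The full-text side of the index can be sketched with SQLite's built-in FTS5 extension, which is what CodeDrift's search is built on. The table and column names below are illustrative, not CodeDrift's actual schema:

```python
import sqlite3

# Minimal sketch of an FTS5 symbol index. Schema is illustrative only.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE symbols USING fts5(name, signature, path, kind)")
db.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?)",
    [
        ("load_config", "def load_config(path: str) -> dict", "app/config.py", "function"),
        ("AuthMiddleware", "class AuthMiddleware", "app/auth.py", "class"),
        ("hash_password", "def hash_password(pw: str) -> str", "app/auth.py", "function"),
    ],
)

# Full-text query: the agent asks for anything auth-related. FTS5's default
# tokenizer splits "app/auth.py" into tokens, so both auth.py symbols match.
rows = db.execute(
    "SELECT name, path FROM symbols WHERE symbols MATCH ? ORDER BY rank",
    ("auth",),
).fetchall()
print(rows)
```

Because the match runs over symbol names, signatures, and paths rather than raw file text, a hit points the agent straight at a definition instead of at every line that happens to contain the word.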

This isn't compression. It's elimination. The agent never reads files it doesn't need, never greps through irrelevant matches, never re-reads what it already saw, and never pays full price for a file it edited — re-reads return only the unified diff against what the agent already has in context.

No LLM involved in indexing — tree-sitter is a deterministic AST parser, so the index is fast, free to build, and requires zero maintenance.


Quick setup

# 1. Install
pip install "git+https://github.com/darshil3011/codedrift[mcp]"

# 2. Index your project
cd /path/to/your/project
codedrift init

# 3. Register MCP server with Claude Code
claude mcp add --scope local codedrift -- codedrift mcp

# 4. Write tool-priority rules to CLAUDE.md
codedrift install-skill

# 5. Start a new Claude Code session — done

Add .codecodedrift/ to your .gitignore.


Keep the index fresh

Auto-update on every git commit (recommended):

codedrift install-hook

Or manually after changes:

codedrift update
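The hook that `codedrift install-hook` sets up amounts to a one-line post-commit script; a hand-rolled equivalent would look like this (the real generated hook may differ):

```shell
#!/bin/sh
# .git/hooks/post-commit — re-index after every commit.
# Illustrative equivalent of `codedrift install-hook`; must be executable (chmod +x).
codedrift update
```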

MCP tools

| Tool | Replaces | Description |
| --- | --- | --- |
| `codedrift_memory` | Starting from scratch | Recall the files and symbols that were useful for a similar past task |
| `codedrift_search` | Grep, Glob | FTS5 search across symbol names, signatures, file paths, call sites |
| `codedrift_resolve` | Read (full file) | Source code + callers + importers + tests + git history for one symbol |
| `codedrift_overview` | Reading multiple files | Module map, entry points, test summary (~300 tokens) |
| `codedrift_read` | Read | Full file on first access; unified diff on re-reads |

AST-based token reduction

CodeDrift uses tree-sitter to parse your codebase into a structured symbol index — functions, classes, imports, and call sites — rather than storing raw file text. When the agent queries for a symbol, it gets back only the relevant definition and its context, not an entire file.

This means the agent never pays for boilerplate it doesn't need. A 2,000-line module with one relevant function costs the same as a 10-line file — only the symbol travels over the wire.
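The extraction idea can be shown without tree-sitter's language plugins by using Python's stdlib `ast` module as a stand-in — walk the tree, record each symbol's name, kind, and line span, then serve only that span on resolve. This is a sketch of the concept, not CodeDrift's implementation:

```python
import ast

source = '''\
import requests

def fetch(url):
    return requests.get(url).text

class Cache:
    def get(self, key):
        return self._data.get(key)
'''

# Record each function/class's name, kind, and line span — the same shape of
# index tree-sitter would produce, here via stdlib ast for illustration.
index = []
for node in ast.walk(ast.parse(source)):
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        index.append({
            "name": node.name,
            "kind": type(node).__name__,
            "lines": (node.lineno, node.end_lineno),
        })

# Resolving a symbol returns only its own lines, never the whole file.
entry = next(e for e in index if e["name"] == "fetch")
snippet = "\n".join(source.splitlines()[entry["lines"][0] - 1 : entry["lines"][1]])
print(snippet)
```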


Session-aware reads — zero re-read waste

codedrift_read tracks every file the agent reads during a session. The first access returns the full file; every subsequent access returns either a one-line "unchanged" notice or a unified diff of only the lines that changed. The design treats the LLM's context window as the cache — since the full file is already there from the first read, re-reads only need to transmit the delta.
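The context-window-as-cache behaviour can be sketched with stdlib `difflib`: the first read returns the file, later reads return an "unchanged" notice or only the delta. The in-memory session cache here is illustrative:

```python
import difflib

_seen: dict[str, str] = {}  # path -> content the agent already has in context

def session_read(path: str, current: str) -> str:
    """First access: full file. Re-read: 'unchanged' notice or a unified diff."""
    previous = _seen.get(path)
    _seen[path] = current
    if previous is None:
        return current
    if previous == current:
        return f"{path}: unchanged"
    return "".join(difflib.unified_diff(
        previous.splitlines(keepends=True),
        current.splitlines(keepends=True),
        fromfile=f"{path} (in context)",
        tofile=f"{path} (on disk)",
    ))

v1 = "def add(a, b):\n    return a + b\n"
v2 = "def add(a, b):\n    return a + b  # overflow-safe\n"
first = session_read("math.py", v1)   # full file
second = session_read("math.py", v1)  # one-line notice
third = session_read("math.py", v2)   # diff only
```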


Cross-session memory

CodeDrift can remember which files and symbols were useful for a given task and surface them again when a similar task comes up in a future session.

After finishing a session, record it:

codedrift memory record          # parses the latest Claude Code session log
codedrift memory record --outcome error  # mark it as a failed attempt

Before starting work on something similar, check for a past match:

codedrift memory recall "add authentication middleware"

If a past session scores above the similarity threshold (default 0.40), it returns the task description, the files that were read, and the symbols that were resolved — giving the agent a warm start instead of re-discovering context from scratch.
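Recall scoring amounts to cosine similarity between the query embedding and each stored session embedding, with the 0.40 cutoff applied to the best match. Toy 3-d vectors stand in for all-MiniLM-L6-v2's real 384-d output:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

THRESHOLD = 0.40  # CodeDrift's default similarity cutoff

# Toy embeddings; real ones come from all-MiniLM-L6-v2.
sessions = {
    "add auth middleware to API": [0.9, 0.1, 0.2],
    "fix flaky CI pipeline":      [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.3]

scored = sorted(((cosine(query, v), task) for task, v in sessions.items()), reverse=True)
best_score, best_task = scored[0]
match = best_task if best_score >= THRESHOLD else None
```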

Use --verbose (or -v) to see all stored sessions ranked by similarity score, regardless of threshold — useful for tuning:

codedrift memory recall "add authentication middleware" --verbose
codedrift memory list            # show all stored sessions
codedrift memory clear           # wipe memory

Memory uses vector embeddings (all-MiniLM-L6-v2) stored locally in the project's SQLite index. It requires the optional memory extra:

pip install "codedrift[memory]"

Install all features at once: pip install "codedrift[all]"


PII redaction

CodeDrift can strip sensitive values from file content before it reaches the LLM. It uses openai/privacy-filter — a 1.5B parameter bidirectional token classifier that runs locally via ONNX. No data leaves your machine.

When enabled, codedrift_read passes each string literal's source line through the model. If PII is detected, only the string value is replaced in-place — the rest of the file is untouched.

Before / after example:

# Before
def send_data():
    token   = "ghp_aBcDeFgHiJkLmNoPqRsTuVwXyZ123456"
    email   = "john.smith@company.com"
    db_pass = "super$ecret99"
    requests.post("https://api.example.com", headers={"Authorization": token})

# After
def send_data():
    token   = "[REDACTED:SECRET]"
    email   = "[REDACTED:EMAIL]"
    db_pass = "[REDACTED:SECRET]"
    requests.post("https://api.example.com", headers={"Authorization": token})

.env files are handled separately — all values are redacted line-by-line without running the model, except keys you explicitly allow through.
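That .env behaviour fits in a few lines — blank every value except an allow-list of known non-secret keys (this sketch mirrors the default `env_passthrough_keys`; CodeDrift's actual parsing may differ):

```python
PASSTHROUGH = {"NODE_ENV", "PORT", "HOST", "DEBUG", "APP_ENV", "LOG_LEVEL"}

def redact_env(text: str) -> str:
    """Blank every .env value except allow-listed non-secret keys."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if "=" in stripped and not stripped.startswith("#"):
            key, _, _value = stripped.partition("=")
            if key.strip() not in PASSTHROUGH:
                line = f"{key.strip()}=[REDACTED:SECRET]"
        out.append(line)
    return "\n".join(out)

env = (
    "NODE_ENV=production\n"
    "DATABASE_URL=postgres://admin:hunter2@db:5432/app\n"
    "# comment\n"
    "PORT=8080"
)
print(redact_env(env))
```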

Setup

pip install "codedrift[redact]"
codedrift redact enable

The model (~917 MB, ONNX q4) is downloaded from Hugging Face on first use and cached locally.

What gets redacted

| Entity type | Examples | Redacted as |
| --- | --- | --- |
| `secret` | API keys, passwords, tokens, private keys | `[REDACTED:SECRET]` |
| `private_email` | john@company.com | `[REDACTED:EMAIL]` |
| `account_number` | Bank account numbers, card numbers | `[REDACTED:ACCOUNT_NUMBER]` |
| `private_person` | Full names | `[REDACTED:PERSON]` |
| `private_phone` | Phone numbers | `[REDACTED:PHONE]` |
| `private_url` | Personal or authenticated URLs | `[REDACTED:URL]` |
| `private_address` | Street addresses | `[REDACTED:ADDRESS]` |
| `private_date` | Dates of birth, personal dates | `[REDACTED:DATE]` |
By default only secret, private_email, and account_number are active. Enable others with codedrift redact watch <entity_type>.

Interpolated strings (f"...", JS template literals) are skipped — they contain variable references, not static values.

Configuration

Config is stored in .codecodedrift/redact.json:

{
  "enabled": true,
  "entity_types": ["secret", "private_email", "account_number"],
  "allow_patterns": ["test@example.com", "localhost"],
  "env_passthrough_keys": ["NODE_ENV", "PORT", "HOST", "DEBUG", "APP_ENV", "LOG_LEVEL"]
}

| Field | Type | Description |
| --- | --- | --- |
| `enabled` | bool | Master switch. `false` by default — no overhead until you opt in. |
| `entity_types` | list of strings | Which entity types to redact. Any subset of the 8 types above. |
| `allow_patterns` | list of regex strings | Values matching any pattern are never redacted, even if the model flags them. Useful for test fixtures and known-safe placeholders. |
| `env_passthrough_keys` | list of strings | `.env` keys whose values are passed through unchanged. Defaults cover common non-secret keys. |
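`allow_patterns` acts as a regex veto after the model flags a value. A sketch of that decision, assuming the semantics described above (the label map and function names are hypothetical):

```python
import re

config = {
    "entity_types": ["secret", "private_email", "account_number"],
    "allow_patterns": [r"test@example\.com", r"localhost"],
}

# Maps entity types to the placeholder labels from the table above.
LABELS = {"secret": "SECRET", "private_email": "EMAIL", "account_number": "ACCOUNT_NUMBER"}

def is_allowed(value: str) -> bool:
    """A flagged value survives if any allow pattern matches it."""
    return any(re.search(p, value) for p in config["allow_patterns"])

def redact(value: str, entity_type: str) -> str:
    if entity_type not in config["entity_types"] or is_allowed(value):
        return value  # inactive type or allow-listed: passed through unchanged
    return f"[REDACTED:{LABELS[entity_type]}]"
```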

CLI

codedrift redact enable                   # turn on redaction for this project
codedrift redact disable                  # turn off
codedrift redact status                   # show current config
codedrift redact allow "test@example.com" # never redact this value
codedrift redact ignore private_person    # stop redacting names
codedrift redact watch private_person     # re-enable name redaction

Analytics dashboard

CodeDrift ships a built-in analytics dashboard that shows how your AI agent is using the tools, how many tokens are being saved, and whether your index is healthy.

Start the dashboard:

# Install dashboard dependencies (FastAPI + uvicorn, pre-built UI included)
pip install "codedrift[dashboard]"

# One command — auto-detects project root, opens browser automatically
codedrift dashboard
# → http://localhost:8421

What's tracked:

| Section | Data shown |
| --- | --- |
| Index Health | File count, symbol count, top languages, last indexed time, freshness warning if index is > 24h old |
| Tool Usage | Call count and tokens saved per tool, as a bar chart |
| Activity Timeline | Daily tool calls over the last 30 days |
| Token Savings | Cumulative savings area chart + per-tool breakdown |
| Symbol Heatmap | Top 20 most-resolved symbols, coloured by kind (function / class / method) |
| Memory Hit Rate | Donut chart of memory recall hits vs misses |
| Response Size | Average output tokens per tool + 30-day trend sparklines |
| Init / Update History | Table of every full and incremental index run |

Every MCP tool call and every codedrift init/update run is persisted to the SQLite index automatically — no extra setup required beyond the install above.
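The bookkeeping amounts to an append-only event table in SQLite that the dashboard aggregates. A minimal sketch — table name and columns are assumptions, not CodeDrift's actual schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # the real events live in the project's index file
db.execute(
    """CREATE TABLE tool_events (
        tool TEXT, tokens_saved INTEGER, called_at TEXT DEFAULT CURRENT_TIMESTAMP
    )"""
)

def record_call(tool: str, tokens_saved: int) -> None:
    """Persist one MCP tool call; the server would do this on every invocation."""
    db.execute("INSERT INTO tool_events (tool, tokens_saved) VALUES (?, ?)", (tool, tokens_saved))

record_call("codedrift_search", 4200)
record_call("codedrift_resolve", 18000)
record_call("codedrift_search", 3900)

# The dashboard's per-tool bar chart is one GROUP BY away.
per_tool = db.execute(
    "SELECT tool, COUNT(*), SUM(tokens_saved) FROM tool_events GROUP BY tool ORDER BY tool"
).fetchall()
```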

For API-only access (CI / headless servers):

codedrift api             # no browser, no UI, just the JSON endpoints

Measure token savings

python benchmark.py                          # analyse most recent session
python benchmark.py --list                   # list all sessions
python benchmark.py --project /path/to/repo  # sessions for a specific project

Reads Claude Code session logs directly — no API key required.
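The core of that analysis is summing tokens per tool across a session log. The JSONL record shape below is a made-up placeholder — the real Claude Code log format differs — but it illustrates the aggregation benchmark.py performs:

```python
import json

# Hypothetical JSONL session log; field names are illustrative only.
log_lines = [
    '{"tool": "Grep", "input_tokens": 120, "output_tokens": 5400}',
    '{"tool": "Read", "input_tokens": 80, "output_tokens": 21000}',
    '{"tool": "codedrift_resolve", "input_tokens": 60, "output_tokens": 900}',
]

totals: dict[str, int] = {}
for line in log_lines:
    event = json.loads(line)
    totals[event["tool"]] = (
        totals.get(event["tool"], 0) + event["input_tokens"] + event["output_tokens"]
    )

grand_total = sum(totals.values())
```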


CLI reference

codedrift init            # full index scan
codedrift update          # incremental re-index (changed files only)
codedrift search <query>  # FTS5 search from terminal
codedrift resolve <sym>   # full symbol context from terminal
codedrift overview        # project structural map
codedrift status          # index stats (files, symbols, languages)
codedrift dashboard       # start full analytics dashboard, opens browser
codedrift api             # start API-only server (no UI, for scripting/CI)
codedrift install-hook    # git post-commit hook for auto-update
codedrift install-skill   # append tool-priority rules to CLAUDE.md
codedrift mcp             # start MCP server (used by claude mcp add)
codedrift memory record   # store last session's context in memory
codedrift memory recall   # find closest past session for a query
codedrift memory list     # show all stored sessions
codedrift memory clear    # wipe session memory

Requirements

  • Python 3.10+
  • git on PATH
  • Claude Code CLI

Supported languages

Python, JavaScript, TypeScript, Go, Rust
