See the detailed implementation plan: plans/2026-02-25-v1-implementation-plan.md
| Phase | Description | Sprints | Status |
|---|---|---|---|
| 0 | Architecture Foundation (ToolContext) | 0.1 | Done |
| 1 | Wire Existing Backends | 1.1–1.4 | Done |
| 2 | Complete File Tools | 2.1 | Done |
| 3 | LSP Client | 3.1–3.5 | Done |
| 4 | Tree-sitter AST Engine | 4.1–4.2 | Done |
| 5 | Polish & v1.0 | 5.1–5.3 | In progress |
See FEATURES.md for the full feature reference. Summary:
- 29 tools across 7 categories (file, workflow, symbol, semantic, memory, config/nav, GitHub)
- LSP client — transport, lifecycle, document symbols, references, definition, hover, rename + text sweep
- Tree-sitter AST — symbol extraction + docstrings for Rust, Python, TypeScript, Go, Java, Kotlin
- Semantic search — embedding pipeline with sqlite-vec `vec0` KNN (auto-migrates from plain BLOB), incremental rebuilds, drift detection (concepts, backends)
- Library search — navigate third-party deps via LSP-inferred discovery, scoped symbol nav + semantic search
- OutputBuffer — `@cmd_*`/`@file_*` handles; large output stored, queried with Unix tools
- run_command — cwd, acknowledge_risk, dangerous-cmd speed bump, smart summaries per command type
- read_file — smart buffering with per-type summarizers; source files require symbol tools or start/end lines
- Dual-audience output — 8 tools emit structured JSON for agents + readable preview for humans
- Progressive discoverability — overflow responses include `by_file` breakdown + narrowing hints; `kind` filter
- edit_file / remove_symbol — find-and-replace and symbol deletion with security gating
- Worktree write guard — advisory `worktree_hint` field prevents silent cross-worktree corruption
- Symbol signatures — LSP `detail` field captured; `signature` synthesized for display
- Project customization — `.codescout/system-prompt.md` injects project-specific agent guidance
- Onboarding — language-specific nav hints, system-prompt draft generation
- RecoverableError — non-fatal tool failures don't abort sibling parallel calls
- Dashboard — `codescout dashboard` web UI with tool stats and project health (concept page)
- Companion Claude Code plugin — `code-explorer-routing` for tool routing guidance (live at mareurs/claude-plugins)
- Usage monitor — per-tool call stats in `usage.db`, surfaced via the dashboard
- Semantic memories — `remember`/`recall`/`forget` actions with sqlite-vec vector search, auto-classification into buckets (code/system/preferences/unstructured), cross-embedding of markdown memories, preferences auto-injection during onboarding
- Git blame via git2; persistent memory store (markdown topics + semantic memories)
- MCP over stdio and HTTP/SSE (rmcp); 1142 tests passing
- Debug logging — `--debug` flag enables structured file logging with rotation (tracing-appender)
- Multi-project workspaces — `workspace.toml` registration, per-project memory/LSP/indexing, cross-project search guidance, workspace-aware onboarding
- Library version tracking — per-library embedding DBs (`.codescout/embeddings/lib/`), lockfile version comparison, staleness hints in `semantic_search`
- LSP idle TTL eviction — per-language configurable timeouts (Kotlin 2h, others 30min), transparent shutdown and restart
- Additional tree-sitter grammars (currently: Rust, Python, TypeScript, Go, Java, Kotlin)
- Additional LSP server configurations
- Configurable LSP idle TTL via `project.toml`
- GitHub tools: `github_issue`, `github_pr` method parity with `github_repo`
Implemented features have been moved to FEATURES.md.
Leverage the MCP elicitation spec (Claude Code 2.1.76, March 2026) for interactive user input: stdin prompts and PostCompact hook integration.
Reference: docs/TODO-mcp-elicitation.md
Implemented:
- E-0: Elicitation plumbing ✅ — Added `elicitation/requestInput` support to `ToolContext`. Helper: `ctx.elicit(message, schema) -> ElicitResult`. Integrated into `ServerHandler` so any tool can call it.
- E-3: Interactive sessions via elicitation ✅ — `run_command(interactive: true)` drives a process with piped stdin/stdout/stderr. Each round: display accumulated output, elicit user input, feed it to stdin. Settle detection: 150 ms silence window. Max 50 rounds guard. Note: practical for slow-interaction CLIs (setup wizards, REPLs); unsuitable for high-frequency TUIs (ncurses, vim) due to MCP round-trip latency (~1–3 s).
- E-5: `PostCompact` hook integration ✅ — Register for Claude Code's `PostCompact` hook. On fire: invalidate stale LSP position caches (symbol positions shift when files change during compaction). Optionally re-inject fresh project status into the next request's server instructions.
- E-6c: Auto-register Cargo dependencies ✅ — During `activate_project`, scan `Cargo.lock` and auto-register the top N dependencies as libraries. Eliminates manual `register_library` calls; `find_symbol(scope="lib:...")` is immediately available (fixes BUG-022).
Removed (by design):
- E-1: Tool disambiguation ❌ — Removed. Elicitation is server→human, not server→AI. Disambiguation (e.g., "which symbol match?") should be handled autonomously by the AI agent based on context and heuristics. The LLM can reason about the most likely match given conversation state.
- E-2: Dangerous command confirmation ❌ — Removed. The two-round-trip `pending_ack`/`acknowledge_risk` pattern works well for autonomous AI agents. Elicitation disrupts the agent's autonomy and should be reserved for interactive human input, not confirmation loops.
- E-4: Mutation confirmation ❌ — Removed. Same reasoning as E-1 and E-2: disambiguation and confirmation should be handled autonomously by the AI agent, not via server-to-human elicitation.
- E-6a/E-6b: PreToolUse hook proposals ❌ — Research showed PreToolUse hooks cannot trigger elicitation (they only return allow/block). Proposed `suggest_alternative` field deferred pending deeper design of agent guidance patterns.
Make codescout usable by any MCP-capable agent — Copilot, Cursor, Cline, custom agents — with routing knowledge included so agents know when to reach for each tool.
Motivation: The server already speaks MCP over stdio. The gap is that agents other than Claude Code lack the curated routing guidance (the server_instructions.md prompt) that tells Claude how to choose between semantic_search, find_symbol, list_symbols, etc. Without this, agents default to over-using a single tool (usually semantic search).
Work streams:
- HTTP/SSE transport (already planned) — lets non-CLI agents connect without spawning a subprocess.
- Agent-neutral routing prompt — refactor `server_instructions.md` into a well-structured decision tree that any agent can consume as a system prompt or tool description prefix. Avoid Claude-specific framing.
- `code-explorer-routing` plugin / extension — a thin adapter per agent platform:
  - Claude Code: existing plugin approach
  - VS Code Copilot: Language Model Tools API (`vscode.lm.registerTool`)
  - Cursor: `.cursorrules` + MCP config
  - Generic: OpenAPI spec + routing hints as tool descriptions
- Tool description quality — every tool's `description()` should embed just enough routing guidance to work even without a system prompt (a one-sentence "prefer this over X when Y" hint).
- Benchmark routing quality — extend the live benchmark to test tool selection accuracy across agent backends, not just result quality.
Background filesystem watcher for near-realtime index updates. Depends on Incremental Index Rebuilding (Layer 2 of that design).
Motivation: Layers 0+1 of incremental indexing cover commit-oriented workflows well, but some users want the index to stay current as they edit — especially in long coding sessions where commits are infrequent.
Implementation sketch:
- Use the `notify` crate (cross-platform: inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows)
- Spawn a background `tokio::spawn` task in the MCP server on startup
- Debounce events with a 2s window to batch rapid saves
- Filter events through `.gitignore` + `ignored_paths` config
- Call `diff_and_reindex` with a per-file candidate list (no git diff needed — the watcher knows exactly which files changed)
- Opt-in via `project.toml`: `[index] watch = true`
Platform considerations:
- Linux: `inotify` has a per-user watch limit (`fs.inotify.max_user_watches`); may need guidance for large repos
- macOS: `FSEvents` is directory-level, efficient for large trees
- Windows: `ReadDirectoryChangesW` works but has buffer-overflow edge cases on burst writes
- The `notify` crate abstracts all of this, but platform-specific tuning docs may be needed
Maintain project glossaries and documentation that stay in sync with the codebase via content-hash change detection.
Motivation: LLM-generated documentation (onboarding summaries, architecture glossaries, API docs) goes stale the moment the underlying code changes. Manual upkeep is unsustainable. By tracking file content hashes, codescout can detect which documented files changed, compute targeted diffs, and trigger glossary/documentation updates — keeping project knowledge accurate without full re-indexing.
Core mechanism:
- Hash tracking — Store a content hash (e.g. SHA-256) for every file that contributes to a glossary or documentation entry. Persist in `.codescout/doc-hashes.db` (SQLite, same pattern as `embeddings.db`).
- Change detection — On a `check_docs` or `sync_docs` tool call (or automatically during `onboarding`), compare stored hashes against current file content. Files with mismatched hashes are flagged as stale.
- Targeted diff — For each stale file, compute a diff (reusing `git diff` infra or direct content comparison). Surface only the meaningful changes (skip whitespace-only and comment-only changes via configurable filters).
- Update trigger — Present the diffs to the LLM with the current glossary entry, prompting a targeted update rather than a full rewrite. Alternatively, for structured glossaries, apply rule-based updates (renamed symbol → rename in glossary).
Glossary features:
- Term extraction — Build a glossary from codebase symbols, domain concepts, and abbreviations (combining AST/LSP data with semantic search)
- Cross-reference — Link glossary terms to source locations (file:line), kept accurate via hash tracking
- Scope — Per-project glossary in `.codescout/glossary.md` or structured `.codescout/glossary.json`
Documentation management features:
- Doc registration — `register_doc(path, sources: [file globs])` links a documentation file to the source files it describes
- Staleness report — `check_docs()` tool returns which docs are stale, what changed, and a suggested update scope
- Auto-update — `sync_docs(path)` re-generates or patches a specific doc using the diffs as context
Storage schema (`doc-hashes.db`):

```sql
CREATE TABLE doc_sources (
    doc_path TEXT NOT NULL,    -- the documentation/glossary file
    source_path TEXT NOT NULL, -- a source file it depends on
    hash TEXT NOT NULL,        -- SHA-256 of source content at last sync
    synced_at TEXT NOT NULL,   -- ISO 8601 timestamp
    PRIMARY KEY (doc_path, source_path)
);
```

Implementation sketch:
- New `src/tools/docs.rs` module with `register_doc`, `check_docs`, `sync_docs`, `build_glossary` tools
- New `src/docs/` module for hash computation, staleness detection, diff generation
- Integration with existing memory store — glossary terms can cross-reference memory topics
- Progressive disclosure: `check_docs` in exploring mode shows only stale counts; focused mode shows full diffs
Example workflow:
- Onboarding creates `glossary.md` with key terms and an `architecture.md` summary
- `register_doc("glossary.md", sources: ["src/**/*.rs"])` tracks all Rust source hashes
- Developer adds a new tool module — hash changes detected on next `check_docs()`
- LLM receives: "3 files changed since last sync" + targeted diffs → updates glossary with the new tool's terms
Allow the agent to interact with long-running processes — REPLs, debuggers, and confirmation prompts — instead of waiting for them to exit.
Motivation: `run_command` currently blocks until the process exits. Commands like `python3 -i`, `pdb`, or `npm install` (with y/n prompts) hang until timeout. There is no way for the agent to send input to a running process.
Design: Three tools built on a SessionStore (analogous to OutputBuffer):
| Tool | Purpose |
|---|---|
| `run_command(interactive: true)` | Spawns with piped I/O, waits for initial output to settle, returns a `@ses_<hex>` session handle |
| `session_send(session_id, input)` | Writes a line to stdin, waits for a settle window of silence, returns the output delta |
| `session_cancel(session_id)` | Kills the process and frees all resources |
Settle detection: After each write, poll the output buffer every 10 ms. When 150 ms passes with no new bytes, the response is considered complete. Configurable via `settle_ms`. No prompt-pattern knowledge needed.
Scope: REPLs, debuggers, confirmation flows. Full-screen TUI apps (vim, less) are explicitly out of scope — no PTY allocation.
Design doc: plans/2026-03-01-interactive-sessions-design.md
Implementation plan: plans/2026-03-01-interactive-sessions-plan.md
Automatically capture and surface contextual knowledge — code gotchas, deployment pitfalls, debugging insights — with a decay mechanism that lets transitory memories fade while persistent truths remain.
Motivation: Agents frequently rediscover the same gotchas ("this test is flaky
on CI", "don't forget to restart Ollama after config changes", "the LSP crashes if
you open >50 files"). Currently these are lost between sessions. The remember
action requires explicit invocation — most insights slip through. Auto-memories
capture them passively, but some gotchas are temporary (a bug gets fixed, a
workaround becomes unnecessary), so blind accumulation would pollute the context
with stale advice.
Auto-capture triggers:
- Agent hits an error and recovers → capture the recovery pattern
- Agent deviates from a preference with confirmation → capture the exception
- Agent discovers a non-obvious build/deploy step → capture as system gotcha
- User says "watch out for..." or "this is tricky" → capture as code gotcha
Decay mechanism — confidence scoring:
Each auto-memory gets a confidence score (0.0–1.0) and a `last_verified` timestamp:

```sql
ALTER TABLE memories ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0;
ALTER TABLE memories ADD COLUMN last_verified TEXT;
ALTER TABLE memories ADD COLUMN auto_captured BOOLEAN DEFAULT 0;
```

Decay rules:
- Time-based decay: Auto-captured memories lose confidence over time (e.g., -0.1 per month since `last_verified`). Manually created memories (`remember`) don't decay.
- Verification prompts: During onboarding, if low-confidence memories exist (< 0.5), the system prompt includes: "These memories may be outdated — verify if they still apply: [list]". Agent confirmation resets confidence to 1.0.
- Contradiction detection: If an auto-memory says "X doesn't work" but the agent successfully does X, flag for review.
- Garbage collection: Memories below 0.1 confidence are auto-archived (moved to a `memories_archive` table, not deleted — recoverable if needed).
Bucket extensions:
- `code_gotcha` — tricky code behaviors, non-obvious API contracts, flaky tests
- `deploy_gotcha` — deployment pitfalls, environment-specific issues
- Both are sub-types of the existing buckets, tagged via a `sub_bucket` column
Integration with preferences:
- Preferences don't decay (they're intentional)
- Gotchas decay (they may be transitory)
- Both are auto-injected during onboarding, but gotchas show their confidence score so agents can judge reliability
Design doc: TBD
Priority: Low | Effort: Small
kotlin-lsp instances use per-process `--system-path=/tmp/codescout-<PID>-kotlin-lsp` dirs for workspace isolation. These are tiny (~4 KB) and PID-scoped, but accumulate across sessions until an OS reboot clears `/tmp`.
Options:
- Clean up in `LspManager::evict_idle` or `LspClient::shutdown` — requires plumbing the system-path from `LspServerConfig` through to the cleanup site
- Periodic sweep: glob `/tmp/codescout-*-kotlin-lsp`, skip dirs whose PID is alive
- Do nothing — `/tmp` is self-cleaning and the dirs are negligible
Context: See docs/issues/2026-03-24-kotlin-lsp-concurrent-instances.md
Priority: Medium | Effort: Small/Medium
`run_command` already detects dangerous commands (`sudo`, `rm -rf`, force-push, etc.) and routes them through the two-round-trip `acknowledge_risk` gate. What it does not do today is record those calls anywhere durable — once the response is returned, the command and its output are only visible in the live `@cmd_*` buffer (LRU, 20 entries, session-scoped) and the optional `--diagnostic` log.
For projects where multiple agents share the same shell-enabled instance, this
is too thin: there is no after-the-fact way to answer "what sudo calls did the
agent run last week, on which project, with what output?".
Goal: persist every dangerous-command invocation (and the user's acknowledge_risk decision) to a per-project audit log that survives restarts and is straightforward to inspect.
Sketch:
- New table in `.codescout/usage.db` (already SQLite; see `src/usage/db.rs`): `dangerous_commands(rowid, ts_unix_ms, project, cwd, command, matched_pattern, acknowledged, exit_code, stdout_excerpt, stderr_excerpt, agent_id)`.
- Write site: the existing dangerous-detection branch in `run_command` — log on both rejection (first round-trip) and execution (second round-trip with `acknowledge_risk: true`), so denials are auditable too.
- Read site: a new tool, e.g. `audit_log(action="list", since="…", project="…")`, or extend `project_status` with a recent-dangerous-commands tail.
- Excerpts capped (e.g. first/last 32 lines) so the table stays small.
- Optional: opt-out via `[security] audit_disabled = true` in `project.toml`, but default ON — auditability beats convenience for this category.
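A possible shape for the audit table, following the column list in the sketch above; the types and nullability are assumptions, not a settled schema:

```sql
CREATE TABLE IF NOT EXISTS dangerous_commands (
    rowid           INTEGER PRIMARY KEY,
    ts_unix_ms      INTEGER NOT NULL,
    project         TEXT NOT NULL,
    cwd             TEXT NOT NULL,
    command         TEXT NOT NULL,
    matched_pattern TEXT NOT NULL,
    acknowledged    INTEGER NOT NULL,  -- 0 = denied at the first round-trip
    exit_code       INTEGER,           -- NULL when the command never executed
    stdout_excerpt  TEXT,
    stderr_excerpt  TEXT,
    agent_id        TEXT
);
```

Keeping `exit_code` nullable lets one table cover both denied and executed commands, so the "what was refused" question is answerable from the same query as "what ran".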
Open questions:
- Where do shell calls inside sub-agents get attributed? Tag with the spawning session id if available.
- Should we hash the command instead of storing it verbatim when secrets are likely (e.g. `--password=`)? Maybe redact known secret-flag patterns before insert.
- Surface a UI in the dashboard (`src/dashboard/`) — list view + per-row drill-down to the buffered output if the `cmd_id` is still alive.
Context: see `is_dangerous_command` and the `acknowledge_risk` flow in `src/tools/run_command.rs`; the companion plugin's `pre-tool-guard.sh` already funnels every Bash call through `run_command`, so the audit hook covers all agent shell activity.
Three Claude Code skills living in .claude/skills/ within this repo. Contributors who open codescout in Claude Code get them automatically — no build step required. See plans/2026-02-26-contributor-skills-design.md for the full design.
| Skill | Purpose | Status |
|---|---|---|
| `project-management` | Navigate sprint status, roadmap, open PRs and issues | Planned |
| `debugging` | Systematic debugging workflow for the Rust codebase | Planned |
| `log-stat-analyzer` | Analyze `usage.db` for call pattern drift and latency regressions | Ready |
Surface current sprint status from the roadmap, map recent commits to sprint items, and guide contributors through opening correctly structured PRs. Uses `run_command` with `git log` and `git diff`, plus the GitHub MCP tools, alongside docs/ROADMAP.md and docs/plans/.
Systematic workflow from symptom to fix to verification — covering build failures, test failures, LSP timeouts, tree-sitter parse errors, and embedding pipeline issues. Guides contributors through hypothesis formation (`semantic_search`, `find_symbol`), targeted investigation (`run_command("git log/blame")`, `search_pattern`), and the `cargo build` / `cargo test` / `cargo clippy` verification loop.
Structured workflow for interpreting Tool Usage Monitor data: per-tool call counts, error rates, p50/p99 latency, overflow rates, and time-bucketed drift detection. Produces actionable summaries (e.g. "semantic_search error rate up 3× in last 24h"). Uses the dashboard (`codescout dashboard`).