Local-first MCP toolkit for fast code search, dependency-aware module discovery, visual code atlas pages, and DeepWiki-style repository documentation.
codebase-mcp turns a local repository into a persistent MCP code intelligence service. It keeps tree-sitter indexed source data, symbols, references, dependencies, graph metadata, lexical indexes, and vector search data under the target repo's .codedb-mcp directory.
Warm MCP calls are designed to be millisecond-level inside a persistent server process. See Benchmark Snapshot and MCP Tool Benchmark Matrix for measured latency, peak memory, and rg comparisons.
| Area | What It Provides |
|---|---|
| Fast MCP tools | Indexed exact/regex search, BM25/symbol search, lazy vector search, outlines, definitions, callers, dependencies, fuzzy file lookup, query pipelines, and 100-call bundles. |
| Module discovery | Dependency-connected file components plus dependency-weighted label propagation, with terms and paths used as explainable labels and evidence. |
| Code Module Atlas | A packaged meet-blog-style 3D viewer with one star per source file, module/file lists, dependency edges, and file focus/details. |
| DeepWiki | Local repository documentation generated from MCP evidence and the active agent's reasoning, with business-module-first pages and cited source files. |
| Local deployment | Explicit .codedb-mcp/codedb-mcp.toml, project-local storage, bundled skills, and no hidden environment-variable behavior. |
The server keeps a tree-sitter indexed, project-local code database under .codedb-mcp and exposes tools for:
- fast exact/regex search, BM25/symbol search, and lazy vector search;
- symbol outlines and definition lookup;
- LSP-like callers anchored to a definition path and line;
- direct and reverse file dependencies, including transitive walks;
- fuzzy file lookup, path globbing, compact query pipelines, and 100-call bundles;
- graph summaries, lazy Louvain communities, module planning, atlas export, and DeepWiki evidence gathering.
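The direct, reverse, and transitive dependency lookups listed above can be pictured as a breadth-first walk over a file graph. The following is a minimal Python sketch of that idea, not the server's Rust implementation; the function names and the file names are hypothetical and chosen only for illustration.

```python
from collections import deque

def reverse_graph(deps):
    """Invert {file: [files it depends on]} into {file: [files that depend on it]}."""
    rev = {}
    for src, targets in deps.items():
        rev.setdefault(src, [])
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev

def transitive(graph, start, max_depth=None):
    """Breadth-first walk returning {file: depth} for everything reachable from start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand past the requested depth
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = depth + 1
                queue.append(nxt)
    del seen[start]
    return seen

# Hypothetical file names, for illustration only.
deps = {
    "Game.cs": ["PoolManager.cs", "Net.cs"],
    "Net.cs": ["Socket.cs"],
    "PoolManager.cs": [],
    "Socket.cs": [],
}
forward = transitive(deps, "Game.cs")                     # what Game.cs pulls in
direct = transitive(deps, "Game.cs", max_depth=1)         # direct dependencies only
dependents = transitive(reverse_graph(deps), "Socket.cs")  # who is affected by Socket.cs
print(forward, direct, dependents)
```

A depth of 1 corresponds to a direct lookup, and walking the inverted graph answers the reverse question of which files depend on a given file.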
The atlas page is generated by the skills/code-module-atlas skill. It calls the local MCP module-atlas export, converts the result into the bundled meet-blog-style 3D viewer dataset, and shows one star node per source file.
Module boundaries are computed from the dependency-connected file graph first. Inside each connected component, the Rust module planner uses dependency-weighted label propagation; paths and distinctive terms are used for names, evidence, and oversized-component splitting, not as the primary grouping rule. The page then provides a module list, a file list for the selected module, file-to-file dependency edges, and file focus/details.
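The two-stage grouping described above (connected components first, then dependency-weighted label propagation inside each component) can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not the Rust module planner: the edge weights, the asynchronous update order, the tie-break rule, and the file names are all hypothetical, and naming and oversized-component splitting are left out.

```python
from collections import defaultdict

def components(nodes, edges):
    """Group files into dependency-connected components (edges treated as undirected)."""
    adj = defaultdict(set)
    for a, b, _w in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, out = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], []
        seen.add(n)
        while stack:
            cur = stack.pop()
            comp.append(cur)
            for nxt in sorted(adj[cur]):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        out.append(sorted(comp))
    return out

def propagate(comp, edges, rounds=10):
    """Weighted label propagation in one component; ties go to the smallest label."""
    members = set(comp)
    weight = defaultdict(dict)
    for a, b, w in edges:
        if a in members and b in members:
            weight[a][b] = weight[a].get(b, 0) + w
            weight[b][a] = weight[b].get(a, 0) + w
    label = {n: n for n in comp}  # every file starts as its own module
    for _ in range(rounds):
        changed = False
        for n in comp:  # fixed order keeps the sketch deterministic
            score = defaultdict(float)
            for nb, w in weight[n].items():
                score[label[nb]] += w
            if not score:
                continue
            best = min(score, key=lambda l: (-score[l], l))
            if best != label[n]:
                label[n] = best
                changed = True
        if not changed:
            break
    return label

def modules(nodes, edges):
    """Return sorted member lists, one per discovered module."""
    groups = defaultdict(list)
    for comp in components(nodes, edges):
        for node, lab in propagate(comp, edges).items():
            groups[lab].append(node)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical files and dependency weights.
nodes = ["net/A.cs", "net/B.cs", "net/C.cs", "pool/D.cs", "pool/E.cs", "pool/F.cs",
         "tools/G.cs", "tools/H.cs"]
edges = [("net/A.cs", "net/B.cs", 5), ("net/B.cs", "net/C.cs", 5), ("net/A.cs", "net/C.cs", 5),
         ("net/C.cs", "pool/D.cs", 1),  # weak bridge between two dense groups
         ("pool/D.cs", "pool/E.cs", 5), ("pool/E.cs", "pool/F.cs", 5), ("pool/D.cs", "pool/F.cs", 5),
         ("tools/G.cs", "tools/H.cs", 1)]
result = modules(nodes, edges)
print(result)
```

In this toy graph the weak `net/C.cs` to `pool/D.cs` edge keeps six files in one connected component, but the heavier internal edges pull them into two modules, which is the behavior the dependency weighting is meant to produce.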
node skills\code-module-atlas\scripts\build-module-atlas.mjs u3dclient
cd skills\code-module-atlas\assets\viewer
npm run dev -- --port 5174 --strictPort
The skills/deepwiki skill builds local DeepWiki-style documentation from MCP evidence and the active agent's reasoning. It starts from dependency-aware module candidates, then writes business-module-first pages with cited files, entry points, flows, dependencies, and risk notes. It does not require a separate model API.
The intended distribution model is setup-guide first: give an agent setup-for-agent.md, let it create .codedb-mcp, use the default HuggingFace cache when it already exists, fall back to a second-drive cache when it does not, and then ask the human whether this specific agent should register the MCP server. The codedb-mcp skill is for using the tools after setup, not for installing them.
Benchmark target: u3dclient.
Benchmarks were rerun on 2026-05-29 on Windows. Warm timings run inside one loaded MCP process after warmup. One-shot timings launch a CLI child process and include startup and cache load. Peak memory is sampled as MB Working Set / Private Bytes.
Current index status with the Unity C# benchmark config:
- Indexed files: 19,035.
- Chunks: 31,949.
- Symbols: 277,213.
- Graph: 19,941 nodes and 166,132 edges.
- Vector search: Model2Vec minishlab/potion-code-16M file embeddings are built lazily on first natural-language search and queried with flat cosine scan.
- Storage: u3dclient\.codedb-mcp.
- Cache v20 sidecars: compact index.bin, spilled bm25.postings, lazy word_index.bin/word_hits.bin, lazy callers.bin, lazy deps.bin, optional legacy embeddings.bin, and binary source fingerprints.
- Peak memory below is sampled Working Set / Private Bytes for child processes. The cold rebuild row was measured with memory sampling enabled, so its wall time is not directly comparable to the faster no-sampling rebuild pass.
Index and cache baseline:
| Scenario | Time | Peak memory | Notes |
|---|---|---|---|
| Cache v20 cold rebuild | 30.258s wall | 255.8 / 249.6 MB | tree-sitter declaration parse, source-on-demand dependencies, spill-to-disk BM25, lazy embeddings, compact cache save |
| Cache-hit index open | 0.873s internal / 1.132s wall | 134.9 / 136.0 MB | process startup, source fingerprint validation, and cache load |
| codedb_index cache-hit tool call | 1.556s wall | 141.5 / 140.4 MB | explicit tool call after cache is already valid |
The table is intentionally three columns so it fits GitHub README pages without horizontal scrolling. Memory values are MB Working Set / Private Bytes.
| Tool / Purpose | MCP benchmark | rg comparison |
|---|---|---|
| codedb_index<br>Build/rebuild local index | cold 30.258s, 255.8 / 249.6 MB<br>cache-hit tool 1.556s, 141.5 / 140.4 MB | none |
| codedb_status<br>Health, counts, scan state | one-shot 0.561s, 14.2 / 7.9 MB | none |
| codedb_tree<br>Indexed tree with language, lines, symbols | warm 11.891ms<br>one-shot 1.018s, 142.0 / 141.0 MB | partial file list only |
| codedb_outline<br>One-file symbol outline | warm 0.074ms<br>one-shot 1.279s, 140.2 / 140.3 MB | none |
| codedb_symbol<br>Symbol definition lookup | warm 2.106ms<br>one-shot 1.034s, 140.7 / 140.0 MB | regex approximates text only |
| codedb_search<br>Hybrid search, regex, batch queries | warm scoped regex 7.120ms<br>one-shot 1.097s, 142.3 / 140.5 MB | scoped rg 0.047s, MCP 6.6x faster warm<br>broad grep is 1.5-1.8x slower |
| codedb_word<br>Exact identifier inverted index | warm first lazy load 94.403ms<br>one-shot 1.033s, 167.3 / 172.6 MB | partial word grep only |
| codedb_callers<br>Definition-anchored references | warm 3.422ms<br>one-shot 1.309s, 168.5 / 173.0 MB | no semantic anchor |
| codedb_hot<br>Recently modified indexed files | warm 7.069ms<br>one-shot 1.454s, 141.4 / 140.5 MB | none |
| codedb_deps<br>Direct/reverse/transitive file deps | warm 0.098ms<br>one-shot 0.528s, 29.5 / 23.0 MB | none |
| codedb_read<br>Indexed file or line-range read | warm 0.757ms<br>one-shot 1.307s, 141.7 / 140.1 MB | partial file print only |
| codedb_edit<br>Read-only compatibility stub | one-shot 0.128s, 4.8 / 1.2 MB | none |
| codedb_changes<br>Files changed since sequence | warm 10.818ms<br>one-shot 0.871s, 144.7 / 145.8 MB | none |
| codedb_snapshot<br>JSON snapshot of files/symbols/deps | one-shot 2.421s, 634.0 / 715.8 MB | none |
| codedb_bundle<br>Up to 100 tools in one request | warm 100 fast ops 57.725ms<br>one-shot 20 searches 1.107s, 143.3 / 141.5 MB | no MCP batching |
| codedb_remote<br>Remote compatibility stub | one-shot 0.136s, 5.4 / 1.3 MB | none |
| codedb_projects<br>Projects loaded in server process | one-shot 0.114s, 3.8 / 1.0 MB | none |
| codedb_find<br>Fuzzy file/path lookup | warm 18.019-20.230ms<br>one-shot 0.406s, 14.1 / 7.8 MB | no fuzzy ranking |
| codedb_query<br>find/search/filter/limit/outline pipeline | warm 6.786-25.139ms<br>one-shot 1.149s, 141.6 / 140.6 MB | no equivalent single tool |
| codedb_glob<br>Glob over indexed paths | warm 4.231ms<br>one-shot 0.956s, 140.7 / 140.1 MB | rg --files -g 0.045s<br>MCP 10.6x faster warm |
| codedb_ls<br>Immediate indexed directory children | warm 4.027ms<br>one-shot 0.940s, 139.3 / 138.8 MB | partial file list only |
| codedb_graph<br>Graph summary/export | one-shot 1.988s, 389.4 / 396.8 MB | none |
| codedb_explain<br>Explain graph node and edges | warm first graph explain 845.369ms<br>one-shot 1.854s, 392.8 / 397.6 MB | none |
| codedb_path<br>Shortest graph path | warm after graph load 13.073ms<br>one-shot 1.790s, 392.6 / 397.2 MB | none |
| codedb_communities<br>Lazy Louvain communities | warm 265.593ms<br>one-shot 1.905s, 390.8 / 400.1 MB | none |
| codedb_module_map<br>DeepWiki module planning | warm 1.679s<br>one-shot 2.236s, 214.4 / 215.3 MB | none |
| codedb_module_atlas<br>Module/file atlas JSON export | Rust export 8.548s, 319.8 / 323.5 MB<br>full skill 10.870s wall, 371.8 / 369.9 MB sampled | none |
| codedb_analyze<br>Graph stats and suggested questions | warm graph analysis 830.637ms<br>one-shot 2.936s, 392.2 / 397.5 MB | none |
| codedb_export<br>Graph JSON/GraphML/Cypher export | warm after graph load 10.313ms<br>one-shot 1.963s, 390.0 / 397.0 MB | none |
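The "x faster warm" figures in the rg comparison column follow directly from the timings in the same rows. As a quick check, this small Python snippet recomputes them from the table values; the helper name is only illustrative.

```python
def speedup(baseline_seconds, warm_ms):
    """How many times faster the warm MCP call is than the rg baseline."""
    return baseline_seconds * 1000.0 / warm_ms

search = speedup(0.047, 7.120)  # scoped rg vs warm scoped regex (codedb_search)
glob = speedup(0.045, 4.231)    # rg --files -g vs warm codedb_glob
one_shot_penalty = 1.097 / 0.047  # one-shot codedb_search vs scoped rg

print(f"{search:.1f}x, {glob:.1f}x, one-shot {one_shot_penalty:.1f}x slower")
# prints "6.6x, 10.6x, one-shot 23.3x slower"
```

The last figure is a reminder that the advantage only applies inside a warm server process: a one-shot CLI call pays startup and cache load, so it is slower than a single rg invocation.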
Java smoke benchmark on gameserver:
| Scenario | Files | Chunks | Symbols | Time | Peak memory |
|---|---|---|---|---|---|
| Cold build after config/model-path change | 6,940 | 55,057 | 245,238 | 10.477s | 656.0 / 664.4 MB |
| Reopen with unchanged files/config | 6,940 | 55,057 | 245,238 | 1.027s | 129.4 / 176.4 MB |
Multi-language smoke coverage includes C#, Java, Rust, Python, Lua, TypeScript, C, and C++ parser paths: 8 files, 8 chunks, 14 symbols, 0.219s.
Rust smoke check on this repository: 29 indexed files, 1,752 chunks, 1,901 symbols; codedb_outline, codedb_search, and codedb_deps all returned Rust results.
- Give the target agent setup-for-agent.md.
- The agent creates <repo-root>\.codedb-mcp and <repo-root>\.codedb-mcp\models.
- On Windows, the agent checks the default HuggingFace hub cache first. If minishlab/potion-code-16M already has a valid snapshot there, config points to that snapshot. If the hub cache exists but the model is missing, the agent downloads to C:\Users\<user>\.cache\huggingface\hub\codedb-mcp\models\potion-code-16M. If the default hub cache does not exist, it uses the second available drive, such as D:\codedb-mcp-cache\models\potion-code-16M.
- The agent writes <repo-root>\.codedb-mcp\codedb-mcp.toml from the demo config, writes the model as an absolute path, and shows the human which languages are configured.
- The human can edit extensions, root_paths, include_paths, exclude_paths, skip_dirs, and the model path before first indexing.
- The agent runs an index check.
- The agent asks whether this specific agent should register MCP. If yes, it uses its own MCP mechanism.
- Restart or reload the agent MCP session and check /mcp.
The MCP command shape is:
<package-root>\skills\codedb-mcp\assets\codebase-mcp.exe --config <repo-root>\.codedb-mcp\codedb-mcp.toml mcp <repo-root>
This project intentionally keeps installation explicit: setup prepares local project files, while the agent/user chooses when and where to register MCP.
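The three model-cache cases in the setup steps can be written down as a small decision function. This is a hedged Python sketch of that rule, not code from setup-for-agent.md: the function name is hypothetical, "valid snapshot" is simplified to "at least one snapshot directory exists", and the `models--minishlab--potion-code-16M/snapshots` layout is the standard HuggingFace hub cache convention rather than something this README states.

```python
import tempfile
from pathlib import Path
from typing import Tuple

# HuggingFace hub cache folder naming for the minishlab/potion-code-16M repo.
HUB_MODEL_DIR = "models--minishlab--potion-code-16M"

def resolve_model_dir(hub_cache: Path, second_drive: Path) -> Tuple[Path, bool]:
    """Return (model_dir, needs_download) following the three setup cases."""
    snapshots = hub_cache / HUB_MODEL_DIR / "snapshots"
    if snapshots.is_dir():
        revisions = sorted(p for p in snapshots.iterdir() if p.is_dir())
        if revisions:
            # Case 1: a snapshot already exists, so the config points at it.
            # Picking the last name in sorted order is an arbitrary choice here.
            return revisions[-1], False
    if hub_cache.is_dir():
        # Case 2: hub cache exists but the model is missing.
        return hub_cache / "codedb-mcp" / "models" / "potion-code-16M", True
    # Case 3: no default hub cache, fall back to the second available drive.
    return second_drive / "codedb-mcp-cache" / "models" / "potion-code-16M", True

with tempfile.TemporaryDirectory() as tmp:
    hub = Path(tmp) / "hub"   # stands in for C:\Users\<user>\.cache\huggingface\hub
    drive = Path(tmp) / "D"   # stands in for the second drive root
    no_hub = resolve_model_dir(hub, drive)
    hub.mkdir()
    hub_without_model = resolve_model_dir(hub, drive)
    snapshot = hub / HUB_MODEL_DIR / "snapshots" / "rev-a"
    snapshot.mkdir(parents=True)
    reused = resolve_model_dir(hub, drive)
    print(no_hub[1], hub_without_model[1], reused[1])
```

Whatever path is chosen would then be written into codedb-mcp.toml as an absolute path, which matches the explicit-config approach described above.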
- Exposes local MCP tools for code search, outlines, symbols, typed callers, dependencies, file discovery, graph analysis, DeepWiki module planning, module atlas export, batching, and exports.
- Indexes configured source languages through one explicit config file: <repo-root>/.codedb-mcp/codedb-mcp.toml.
- Stores generated data inside the target repo under .codedb-mcp. Delete that directory to remove local cache and generated wiki/index data.
- Uses a unified tree-sitter parser layer, not Roslyn/JDT. C#, Java, Rust, Python, Lua, JavaScript, TypeScript/TSX, C, and C++ all emit the same FileEntry/Symbol model. C#/Java typed callers and dependencies remain the strongest path because their namespace/package import rules are implemented on top of that shared AST output.
- Uses Minish ecosystem pieces: model2vec-rs with explicit-path minishlab/potion-code-16M, file-level semantic units, BM25 lexical ranking, exact identifier indexes, and on-demand flat-cosine vectors for natural-language search.
- Builds a graphify-style code graph, computes Louvain communities lazily for codedb_communities, and exposes Rust-native codedb_module_map/codedb_module_atlas outputs from a dependency-connected file graph with label propagation, dependency cohesion, cross-folder evidence, semantic-neighbor probes, key symbols, and c-TF-IDF-like labels.
- Watches configured source extensions in MCP mode and rebuilds after a debounce.
- Explicit project-local config: all behavior comes from .codedb-mcp/codedb-mcp.toml. There are no environment-variable switches for indexing behavior.
- Project-local storage: cache payloads, manifests, Louvain caches, and DeepWiki output live under .codedb-mcp. Deleting that directory removes all generated data for the repo.
- Scanner: walks the repo with explicit extensions, max file size, project .gitignore behavior, scan roots, include paths, exclude globs, and skip dirs. Nested Git worktrees/submodules under the target root are scanned as normal source directories. Unity runtime scans can be limited to Assets, Packages, and Library/PackageCache while excluding **/Editor/**.
- Unified language layer: extension dispatch selects a tree-sitter grammar for C#, Java, Rust, Python, Lua, JavaScript, TypeScript/TSX, C, or C++. The parser emits the same FileEntry/Symbol model for every language and visits declarations without descending into large method bodies.
- Code-aware references: C#/Java namespace/package imports, qualified names, aliases, static using, annotations, and attribute suffixes feed typed callers and dependency edges. Rust and the other non-C#/Java languages currently provide indexed search, outlines, imports/includes/use declarations, Lua require() imports, and graph nodes, but not Roslyn/JDT-level semantic binding.
- Search indexes: builds chunk metadata, symbol-definition chunk hits, dependency references, and spill-to-disk BM25 lexical search during cold indexing. Exact identifier hits and Model2Vec file embeddings are generated lazily when callers or natural-language search actually need them.
- Memory-shaped cache: cache v20 follows the bounded-content-cache lesson from justrach/codedb: full file bodies, chunk preview text, repeated chunk file paths, repeated language/kind strings, BM25 postings, word-index hits, caller results, embeddings, forward/reverse dependencies, graph objects, and Louvain results are no longer all resident by default. Tools read exact lines, postings, word hits, caller sidecars, embeddings, dependencies, or graph data on demand.
- Graph layer: builds a graphify-style code graph lazily. Small repos keep file, namespace/package, symbol, dependency, and reference edges; large repos keep graph construction behind graph/community/module tools while symbol data stays in outline/search/callers indexes. Louvain communities and subcommunities are computed lazily on first request and cached under .codedb-mcp.
- Module atlas layer: codedb_module_map and codedb_module_atlas run in Rust. They first split files by dependency-connected components, then do dependency-weighted label propagation inside each component. Path and token terms are used for naming, evidence, and oversized-component splitting, not as the primary clustering basis. codedb_module_atlas exports Embedding Atlas-ready JSON.
- MCP runtime: implemented with the Rust rmcp SDK over stdio. Tools operate against a warm in-process index, and batch-capable tools plus codedb_bundle reduce MCP round trips.
- Setup guide and skills package: setup-for-agent.md owns installation guidance. skills/codedb-mcp is standalone for tool usage and includes the executable, config template, MCP reference, and tool guidance. skills/deepwiki builds local DeepWiki-style docs from MCP evidence plus the active agent's reasoning. skills/code-module-atlas calls codedb_module_atlas and packages the local meet-blog-style module/file graph webpage.
Default config path:
<repo-root>/.codedb-mcp/codedb-mcp.toml
The repo includes a working example at .codedb-mcp/codedb-mcp.toml and a distributable template at skills/codedb-mcp/assets/codedb-mcp.toml.template.
Important defaults:
[scan]
extensions = ["cs", "java", "rs", "py", "pyw", "lua", "js", "jsx", "mjs", "cjs", "ts", "tsx", "c", "h", "cc", "cpp", "cxx", "hpp", "hh", "hxx"]
max_file_bytes = 50000000
respect_gitignore = true
root_paths = []
include_paths = ["Library/PackageCache"]
exclude_paths = []
[embedding]
model = "C:/Users/<user>/.cache/huggingface/hub/codedb-mcp/models/potion-code-16M"
[storage]
enabled = true
dir = ".codedb-mcp"
There are no environment-variable toggles. Edit the config file explicitly. root_paths can limit scanning to source roots such as Assets, Packages, and Library/PackageCache; include_paths adds extra roots even when a parent is skipped; exclude_paths accepts globs such as **/Editor/** for Unity runtime-only scans. respect_gitignore=true reads project .gitignore files, but nested Git worktrees/submodules inside the target root are still indexed unless excluded by skip_dirs, exclude_paths, or file extension rules. The model path is explicit and absolute; on Windows the setup guide uses the default HuggingFace cache when present, otherwise it falls back to the second available drive.
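To make the interaction between extensions, root_paths, include_paths, exclude_paths, and skip_dirs more concrete, here is a minimal Python sketch of one plausible decision order. It is an assumption-laden illustration rather than the scanner's actual logic: the precedence (exclude globs win, include_paths bypass skip_dirs and root_paths), the glob matching via `fnmatch`, and the example skip_dirs values are all hypothetical, and .gitignore handling is omitted.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

def should_index(path, cfg):
    """Decide whether a repo-relative POSIX path is scanned under this sketch's rules."""
    p = PurePosixPath(path)
    if p.suffix.lstrip(".").lower() not in cfg["extensions"]:
        return False
    # Assumption: exclude_paths globs win over every other rule.
    for pattern in cfg["exclude_paths"]:
        if fnmatchcase(path, pattern):
            return False
        # Let "**/X/**" also match when X is the first path segment.
        if pattern.startswith("**/") and fnmatchcase(path, pattern[3:]):
            return False

    def under(root):
        return path == root or path.startswith(root.rstrip("/") + "/")

    # include_paths add extra roots even when a parent directory is skipped.
    if any(under(r) for r in cfg["include_paths"]):
        return True
    if any(part in cfg["skip_dirs"] for part in p.parts[:-1]):
        return False
    roots = cfg["root_paths"]
    return not roots or any(under(r) for r in roots)

# Hypothetical Unity runtime-only configuration.
cfg = {
    "extensions": {"cs"},
    "root_paths": ["Assets", "Packages"],
    "include_paths": ["Library/PackageCache"],
    "exclude_paths": ["**/Editor/**"],
    "skip_dirs": {"Library", "Temp"},  # example values, not documented defaults
}
checks = {
    path: should_index(path, cfg)
    for path in [
        "Assets/Scripts/Game.cs",
        "Assets/UI/Editor/Tool.cs",
        "Library/PackageCache/com.unity.x/Runtime/A.cs",
        "Library/ScriptAssemblies/B.cs",
        "Tools/Build.cs",
        "Assets/readme.md",
    ]
}
print(checks)
```

Running an index check after editing the real config remains the reliable way to confirm which files are picked up.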
Build:
cargo build --release
Run MCP directly:
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml mcp u3dclient
Quick CLI checks:
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml index u3dclient
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml search "network listener manager" u3dclient -k 5
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml --root u3dclient tool codedb_status "{}"
MCP mode answers the protocol handshake before the initial index finishes, then builds the default project index in the background. Early tool calls may wait for that first build. It also watches indexed extensions by default; when a configured source file changes, the server debounces events, rebuilds the project index in the background, and swaps in the new index after it is ready. Use --no-watch for static benchmark runs.
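The debounce-then-swap behavior described above can be sketched without any threads by treating file events as timestamps. This Python sketch is illustrative only: the 500 ms window is an assumed value (the README does not state the real one), and the function and class names are hypothetical.

```python
def rebuild_times(event_times_ms, debounce_ms):
    """Return when rebuilds start: debounce_ms after the last event of each burst."""
    fires = []
    deadline = None
    for t in sorted(event_times_ms):
        if deadline is not None and t >= deadline:
            fires.append(deadline)    # quiet period elapsed before this event
        deadline = t + debounce_ms    # every event pushes the deadline back
    if deadline is not None:
        fires.append(deadline)
    return fires

class IndexHolder:
    """Readers use `current`; a rebuild prepares a fresh index, then swaps the reference."""
    def __init__(self, index):
        self.current = index

    def rebuild(self, build):
        fresh = build()        # queries keep hitting self.current while this runs
        self.current = fresh   # one reference swap once the new index is ready

# Three saves in quick succession, then two more a few seconds later.
fires = rebuild_times([0, 100, 200, 3000, 3050], debounce_ms=500)
holder = IndexHolder({"generation": 0})
for _ in fires:
    holder.rebuild(lambda: {"generation": holder.current["generation"] + 1})
print(fires, holder.current)  # prints "[700, 3550] {'generation': 2}"
```

Five file events collapse into two rebuilds, which is the point of debouncing: an editor that writes several files at once triggers a single background rebuild rather than one per file.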
codedb_search accepts queries:
{
"max_results": 3,
"queries": [
"PoolManager",
{
"query": "Joystick",
"path_glob": "Assets/Plugins/3rdPlugins/Joystick Pack/**"
},
{
"query": "NetworkListenerManager",
"regex": true,
"compact": true
}
]
}
codedb_callers accepts targets:
{
"max_results": 10,
"targets": [
{
"name": "PoolManager",
"definition_path": "Assets/Scripts/HotFix/3rdExtend/Runtime/PoolManager/PoolManager.cs",
"definition_line": 26
},
{
"name": "Joystick",
"definition_path": "Assets/Plugins/3rdPlugins/Joystick Pack/Scripts/Runtime/Base/Joystick.cs",
"definition_line": 8
}
]
}
codedb_communities uses lazy Louvain clustering:
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml --root u3dclient tool codedb_communities "{`"community_limit`":10}"
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml --root u3dclient tool codedb_communities "{`"community_id`":0,`"children`":true,`"community_limit`":20}"
Overview calls return community IDs, labels, member counts, and cohesion. Add children=true or subcommunities=true with a community_id to split only that community's subgraph; child clusters are cached in .codedb-mcp/louvain-subcommunities.bin.
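When these tools are called over MCP rather than through the CLI `tool` subcommand, the same argument objects travel inside a JSON-RPC 2.0 `tools/call` request, which is the standard MCP method for invoking a tool. The Python sketch below builds such messages for the batch search and community calls shown above. The helper name and the request IDs are hypothetical, and a real client would first complete the MCP initialize handshake; agents normally do all of this through their own MCP client.

```python
import json

def tools_call(request_id, tool, arguments):
    """Wrap tool arguments in an MCP tools/call JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# Argument objects copied from the examples in this README.
search_args = {
    "max_results": 3,
    "queries": [
        "PoolManager",
        {"query": "NetworkListenerManager", "regex": True, "compact": True},
    ],
}
overview_args = {"community_limit": 10}
children_args = {"community_id": 0, "children": True, "community_limit": 20}

messages = [
    tools_call(1, "codedb_search", search_args),
    tools_call(2, "codedb_communities", overview_args),
    tools_call(3, "codedb_communities", children_args),
]
for message in messages:
    # One JSON object per line, as it would be written to a stdio transport.
    print(json.dumps(message))
```

Building the arguments as native objects and serializing them once also sidesteps the shell-specific quote escaping that the CLI examples need.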
codedb_module_map is the preferred DeepWiki planning call. It uses the Rust dependency-connected module graph, then adds dependency cohesion, cross-folder roots, semantic-neighbor probes, entry points, key symbols, and c-TF-IDF-like labels:
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml --root u3dclient tool codedb_module_map "{`"path_prefix`":`"Assets/Scripts`",`"limit`":40,`"min_files`":2,`"semantic_neighbors`":5}"
The skills/ directory is intended to be copied as a standalone package.
- setup-for-agent.md: installation guide for agents. It reuses the default HuggingFace cache when present, falls back to the second Windows drive when absent, and writes project-local config with an absolute model path.
- skills/codedb-mcp: includes assets/codebase-mcp.exe, a config template, MCP registration reference, and tool guidance. It does not own setup.
- skills/deepwiki: creates DeepWiki-style local documentation using local codedb_* tools plus the active agent's reasoning. It emphasizes business module boundaries over folder-only or community-only grouping.
- skills/code-module-atlas: creates a local 3D module/file atlas webpage by calling codedb_module_atlas, then adapting the bundled meet-blog-style viewer. Generated repo-specific JSON stays ignored.
- meet-blog.buyixiao.xyz inspired the Code Module Atlas visual style and viewer experience.
- justrach/codedb inspired the original MCP tool interface direction.
