## CLAUDE.md (8 additions, 7 deletions)
```diff
@@ -1,12 +1,12 @@
 # CodeCortex
 
-Persistent, AI-powered codebase knowledge layer. Pre-digests codebases into structured knowledge and serves it to AI agents via MCP.
+Codebase navigation and risk layer for AI agents. Pre-builds a map of architecture, dependencies, coupling, and risk areas so agents go straight to the right files.
 
 ## Stack
 - TypeScript, ESM (`"type": "module"`)
 - tree-sitter (native N-API) + 27 language grammar packages
 - @modelcontextprotocol/sdk - MCP server (stdio transport)
```
```diff
 All read tools include `_freshness` metadata (status, lastAnalyzed, filesChangedSince, changedFiles, message).
+All read tools return context-safe responses (<10K chars) via truncation utilities in `src/utils/truncate.ts`.
 
 ## Pre-Publish Checklist
 
 Run ALL of these before `npm publish`. Do not skip any step.
```
```diff
@@ -71,7 +72,7 @@ Run ALL of these before `npm publish`. Do not skip any step.
-**Grammar smoke test** (`parser.test.ts`): Loads every language in `LANGUAGE_LOADERS` via `parseSource()`. Catches missing packages, broken native builds, wrong require paths. This is what would have caught the tree-sitter-liquid issue.
```
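The `_freshness` metadata fields listed in the CLAUDE.md hunk above could take a shape like the following sketch. The field names come from the doc; the types and the `freshness` helper are assumptions for illustration, not CodeCortex's actual code:

```typescript
// Hypothetical shape of the `_freshness` metadata attached to read-tool
// responses. Field names follow the doc; types are assumed.
interface Freshness {
  status: "fresh" | "stale";
  lastAnalyzed: string;      // ISO timestamp of the last analysis run
  filesChangedSince: number; // files changed since that analysis
  changedFiles: string[];    // capped list of those paths
  message: string;           // human-readable summary
}

function freshness(lastAnalyzed: Date, changedFiles: string[]): Freshness {
  const status = changedFiles.length === 0 ? "fresh" : "stale";
  return {
    status,
    lastAnalyzed: lastAnalyzed.toISOString(),
    filesChangedSince: changedFiles.length,
    changedFiles: changedFiles.slice(0, 20), // keep responses context-safe
    message:
      status === "fresh"
        ? "Knowledge is up to date."
        : `${changedFiles.length} file(s) changed since last analysis; run codecortex update.`,
  };
}
```

An agent can check `status` before trusting a response and trigger a re-analysis when it reads `stale`.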
## README.md (80 additions, 50 deletions)
```diff
@@ -1,9 +1,12 @@
 # CodeCortex
 
-Persistent codebase knowledge layer for AI agents. Your AI shouldn't re-learn your codebase every session.
+Codebase navigation and risk layer for AI agents. Pre-builds a map of architecture, dependencies, coupling, and risk areas so agents go straight to the right files.
 
-> **⚠️ If you're on v0.4.3 or earlier, update now:** `npm install -g codecortex-ai@latest`
-> v0.4.4 adds freshness flags on all MCP responses and `get_edit_briefing` — a pre-edit risk briefing tool.
```
```diff
@@ -13,18 +16,40 @@
 ## The Problem
 
-Every AI coding session starts from scratch. When context compacts or a new session begins, the AI re-scans the entire codebase. Same files, same tokens, same wasted time. It's like hiring a new developer every session who has to re-learn everything before writing a single line.
+Every AI coding session starts with exploration — grepping, reading the wrong files, re-discovering architecture. On a 6,000-file codebase, an agent makes 37 tool calls and burns 79K tokens just to understand what's where. And it still can't tell you which files are dangerous to edit or which files secretly depend on each other.
 
 **The data backs this up:**
 - AI agents increase defect risk by 30% on unfamiliar code ([CodeScene + Lund University, 2025](https://codescene.com/hubfs/whitepapers/AI-Ready-Code-How-Code-Health-Determines-AI-Performance.pdf))
 - Code churn grew 2.5x in the AI era ([GitClear, 211M lines analyzed](https://www.gitclear.com/coding_on_copilot_data_shows_ais_downward_pressure_on_code_quality))
-- Nobody combines structural + semantic + temporal + decision knowledge in one portable tool
 
 ## The Solution
 
-CodeCortex pre-digests codebases into layered knowledge files and serves them to any AI agent via MCP. Instead of re-understanding your codebase every session, the AI starts with knowledge.
+CodeCortex gives agents a pre-built map: architecture, dependencies, risk areas, hidden coupling. The agent goes straight to the right files and starts working.
 
-**Hybrid extraction:** tree-sitter native N-API for structure (symbols, imports, calls across 27 languages) + host LLM for semantics (what modules do, why they're built that way). Zero extra API keys.
+**CodeCortex finds WHERE to look. Your agent still reads the code.**
+
+Tested on a real 6,400-file codebase (143K symbols, 96 modules):
+
+| | Without CodeCortex | With CodeCortex |
+|--|:--:|:--:|
+| Tool calls | 37 | **15** (2.5x fewer) |
+| Total tokens | 79K | **43K** (~50% fewer) |
+| Answer quality | 23/25 | **23/25** (same) |
+| Hidden dependencies found | No | **Yes** |
+
+### What makes it unique
+
+Three capabilities no other tool provides:
+
+1. **Temporal coupling** — Files that always change together but have zero imports between them. You can read every line and never discover this. Only git co-change analysis reveals it.
+
+2. **Risk scores** — File X has been bug-fixed 7 times, has 6 hidden dependencies, and co-changes with 3 other files. Risk score: 35. You can't learn this from reading code.
+
+3. **Cross-session memory** — Decisions, patterns, and observations persist. The agent doesn't start from zero each session.
+
+**Example from a real codebase:**
+- `schema.help.ts` and `schema.labels.ts` co-changed in 12/14 commits (86%) with **zero imports between them**
+- Without this knowledge, an AI editing one file would produce a bug 86% of the time
```
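The git co-change analysis behind the temporal-coupling claims above can be sketched in a few lines of TypeScript. This is an illustrative sketch, not CodeCortex's implementation; `temporalCoupling` and its threshold are hypothetical, and the commit file lists would normally come from parsing `git log --name-only`:

```typescript
// Count, for every pair of files, how often they change in the same commit.
// A high co-change ratio with zero imports between the files is the
// "hidden dependency" signal described above.
type Commit = string[]; // files touched by one commit

interface Coupling {
  a: string;
  b: string;
  coChanges: number; // commits touching both files
  total: number;     // commits touching the more frequently changed file
  ratio: number;     // coChanges / total
}

function temporalCoupling(commits: Commit[], minRatio = 0.7): Coupling[] {
  const pairCounts = new Map<string, number>();
  const fileCounts = new Map<string, number>();
  for (const files of commits) {
    const unique = [...new Set(files)].sort();
    for (const f of unique) fileCounts.set(f, (fileCounts.get(f) ?? 0) + 1);
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}\u0000${unique[j]}`;
        pairCounts.set(key, (pairCounts.get(key) ?? 0) + 1);
      }
    }
  }
  const out: Coupling[] = [];
  for (const [key, coChanges] of pairCounts) {
    const [a, b] = key.split("\u0000");
    const total = Math.max(fileCounts.get(a)!, fileCounts.get(b)!);
    const ratio = coChanges / total;
    if (ratio >= minRatio) out.push({ a, b, coChanges, total, ratio });
  }
  return out.sort((x, y) => y.ratio - x.ratio);
}
```

Note the key design point: nothing here reads source code at all, which is why this signal survives even when two files share no imports.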
```diff
 | 4. Decisions | Why things are built this way | `decisions/*.md` |
 | 5. Patterns | How code is written here | `patterns.md` |
 | 6. Sessions | What changed between sessions | `sessions/*.md` |
```
93
-
### The Temporal Layer
94
-
95
-
This is the killer differentiator. The temporal layer tells agents *"if you touch file X, you MUST also touch file Y"* even when there's no import between them. This comes from git co-change analysis, not static code analysis.
134
+
## MCP Tools (13)
96
135
97
-
Example from a real codebase:
98
-
-`routes.ts` and `worker.ts` co-changed in 9/12 commits (75%) with **zero imports between them**
99
-
- Without this knowledge, an AI editing one file would produce a bug 75% of the time
136
+
### Navigation — "Where should I look?" (4 tools)
100
137
101
-
## MCP Tools (15)
138
+
| Tool | Description |
139
+
|------|-------------|
140
+
|`get_project_overview`| Architecture, modules, risk map. Call this first. |
141
+
|`search_knowledge`| Find where a function/class/type is DEFINED by name. Ranked results. |
142
+
|`lookup_symbol`| Precise symbol lookup with kind and file path filters. |
143
+
|`get_module_context`| Module files, deps, temporal signals. Zoom into a module. |
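A `lookup_symbol`-style query over a pre-built symbol index is essentially exact-name lookup with optional filters. A minimal sketch under assumed types (`CodeSymbol` and `lookupSymbol` are illustrative names, not the real API):

```typescript
// A tiny in-memory symbol index with kind and path-prefix filtering,
// in the spirit of the `lookup_symbol` tool described above.
interface CodeSymbol {
  name: string;
  kind: "function" | "class" | "type" | "variable";
  file: string;
  line: number;
}

function lookupSymbol(
  index: CodeSymbol[],
  name: string,
  opts: { kind?: CodeSymbol["kind"]; pathPrefix?: string } = {},
): CodeSymbol[] {
  return index.filter(
    (s) =>
      s.name === name &&
      (!opts.kind || s.kind === opts.kind) &&
      (!opts.pathPrefix || s.file.startsWith(opts.pathPrefix)),
  );
}
```

Because the index is built ahead of time, a lookup like this replaces a grep-and-read loop with a single tool call.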
```diff
 | `codecortex status` | Show knowledge freshness, stale modules, symbol counts |
 | `codecortex symbols [query]` | Browse and filter the symbol index |
-| `codecortex search <query>` | Search across all CodeCortex knowledge files |
+| `codecortex search <query>` | Search across symbols, file paths, and docs |
 | `codecortex modules [name]` | List modules or deep-dive into a specific module |
 | `codecortex hotspots` | Show files ranked by risk: churn + coupling + bug history |
 | `codecortex hook install\|uninstall\|status` | Manage git hooks for auto-updating knowledge |
 | `codecortex upgrade` | Check for and install the latest version |
```
````diff
-## Token Efficiency
+## How It Works
 
-CodeCortex uses a three-tier memory model to minimize token usage:
-
-```
-Session start (HOT only): ~4,300 tokens
-Working on a module (+WARM): ~5,000 tokens
-Need coding patterns (+COLD): ~5,900 tokens
-
-vs. raw scan of entire codebase: ~37,800 tokens
-```
-
-85-90% token reduction. 7-10x efficiency gain.
+**Hybrid extraction:** tree-sitter native N-API for structure (symbols, imports, calls across 27 languages) + host LLM for semantics (what modules do, why they're built that way). Zero extra API keys.
+
+**Git hooks** keep knowledge fresh — `codecortex update` runs automatically on every commit, re-extracting changed files and updating temporal analysis.
+
+**Size-adaptive responses** — CodeCortex classifies your project (micro → extra-large) and adjusts response caps accordingly. A 23-file project gets full detail. A 6,400-file project gets intelligent summaries. Every MCP tool response stays under 10K chars.
````
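The size-adaptive capping described above amounts to a tier classifier plus a truncation helper. The tier boundaries and per-tier caps below are assumptions for illustration; the actual utilities live in `src/utils/truncate.ts`:

```typescript
// Classify a project by file count, then cap response text so larger
// projects get tighter summaries and everything stays under 10K chars.
type ProjectSize = "micro" | "small" | "medium" | "large" | "extra-large";

function classifyProject(fileCount: number): ProjectSize {
  if (fileCount < 50) return "micro";
  if (fileCount < 300) return "small";
  if (fileCount < 1500) return "medium";
  if (fileCount < 5000) return "large";
  return "extra-large";
}

// Illustrative per-tier character caps; all at or below the 10K ceiling.
const CAPS: Record<ProjectSize, number> = {
  micro: 10_000,
  small: 9_000,
  medium: 8_000,
  large: 7_000,
  "extra-large": 6_000,
};

function truncateResponse(text: string, size: ProjectSize): string {
  const cap = CAPS[size];
  if (text.length <= cap) return text;
  const marker = "\n…[truncated]";
  return text.slice(0, cap - marker.length) + marker;
}
```

Under this scheme a 23-file project classifies as `micro` and keeps full detail, while a 6,400-file project classifies as `extra-large` and gets the tightest cap.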