
Commit 226b837

Merge pull request #6 from rushikeshmore/v0.5.0-positioning
feat: v0.5.0 — navigation + risk positioning, 15→13 tools
2 parents 796c706 + 244bee0 commit 226b837

30 files changed

Lines changed: 1716 additions & 328 deletions

.github/dependabot.yml

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+version: 2
+updates:
+  - package-ecosystem: "npm"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+      day: "monday"
+    open-pull-requests-limit: 10
+    labels:
+      - "dependencies"
+    commit-message:
+      prefix: "deps"
+
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+      day: "monday"
+    labels:
+      - "ci"
+    commit-message:
+      prefix: "ci"

.github/workflows/ci.yml

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        node-version: [20, 22]
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Use Node.js ${{ matrix.node-version }}
+        uses: actions/setup-node@v4
+        with:
+          node-version: ${{ matrix.node-version }}
+          cache: 'npm'
+
+      - name: Install dependencies
+        run: npm ci --legacy-peer-deps
+
+      - name: Type check
+        run: npx tsc --noEmit
+
+      - name: Run tests
+        run: npm test
+
+      - name: Security audit
+        run: npm audit --audit-level=high --omit=dev
+        continue-on-error: true

.github/workflows/codeql.yml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+name: CodeQL
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+  schedule:
+    - cron: '0 6 * * 1' # Weekly on Monday at 6am UTC
+
+jobs:
+  analyze:
+    runs-on: ubuntu-latest
+    permissions:
+      security-events: write
+      actions: read
+      contents: read
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Initialize CodeQL
+        uses: github/codeql-action/init@v3
+        with:
+          languages: javascript-typescript
+
+      - name: Autobuild
+        uses: github/codeql-action/autobuild@v3
+
+      - name: Perform CodeQL Analysis
+        uses: github/codeql-action/analyze@v3

.github/workflows/scorecard.yml

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+name: OpenSSF Scorecard
+
+on:
+  push:
+    branches: [main]
+  schedule:
+    - cron: '0 6 * * 1' # Weekly on Monday at 6am UTC
+
+permissions: read-all
+
+jobs:
+  analysis:
+    runs-on: ubuntu-latest
+    permissions:
+      security-events: write
+      id-token: write
+      contents: read
+      actions: read
+
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          persist-credentials: false
+
+      - name: Run Scorecard
+        uses: ossf/scorecard-action@v2.4.0
+        with:
+          results_file: results.sarif
+          results_format: sarif
+          publish_results: true
+
+      - name: Upload SARIF
+        uses: github/codeql-action/upload-sarif@v3
+        with:
+          sarif_file: results.sarif

CLAUDE.md

Lines changed: 8 additions & 7 deletions
@@ -1,12 +1,12 @@
 # CodeCortex

-Persistent, AI-powered codebase knowledge layer. Pre-digests codebases into structured knowledge and serves to AI agents via MCP.
+Codebase navigation and risk layer for AI agents. Pre-builds a map of architecture, dependencies, coupling, and risk areas so agents go straight to the right files.

 ## Stack
 - TypeScript, ESM (`"type": "module"`)
 - tree-sitter (native N-API) + 27 language grammar packages
 - @modelcontextprotocol/sdk - MCP server (stdio transport)
-- commander - CLI (init, serve, update, status)
+- commander - CLI (init, serve, update, status, symbols, search, modules, hotspots, hook, upgrade)
 - simple-git - git integration + temporal analysis
 - zod - schema validation for LLM analysis results
 - yaml - cortex.yaml manifest
@@ -49,11 +49,12 @@ Hybrid extraction:
 - `codecortex hook install|uninstall|status` - manage git hooks for auto-update
 - `codecortex upgrade` - check for and install latest version

-## MCP Tools (15)
+## MCP Tools (13)
 Read (10): get_project_overview, get_module_context, get_session_briefing, search_knowledge, get_decision_history, get_dependency_graph, lookup_symbol, get_change_coupling, get_hotspots, get_edit_briefing
-Write (5): analyze_module, save_module_analysis, record_decision, update_patterns, report_feedback
+Write (3): record_decision, update_patterns, record_observation

 All read tools include `_freshness` metadata (status, lastAnalyzed, filesChangedSince, changedFiles, message).
+All read tools return context-safe responses (<10K chars) via truncation utilities in `src/utils/truncate.ts`.

 ## Pre-Publish Checklist
 Run ALL of these before `npm publish`. Do not skip any step.
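
The two "All read tools" lines in this hunk describe a response envelope: a payload kept under 10K characters, plus `_freshness` metadata. A minimal TypeScript sketch of that idea follows. The field names come from the CLAUDE.md line; their types, the `status` values, the `capResponse` helper, and the truncation marker are assumptions for illustration, not the contents of `src/utils/truncate.ts`.

```typescript
// Hypothetical shape for `_freshness`: field names are from CLAUDE.md,
// the types and the status values are assumptions.
interface Freshness {
  status: "fresh" | "stale";
  lastAnalyzed: string; // ISO timestamp (assumed format)
  filesChangedSince: number;
  changedFiles: string[];
  message: string;
}

const MAX_CHARS = 10_000; // the "<10K chars" budget quoted in the diff

// Hypothetical helper: return `text` unchanged when it is already under the
// limit; otherwise cut it and append a marker so the result stays strictly
// shorter than `limit` and the agent can tell it was truncated.
function capResponse(text: string, limit: number = MAX_CHARS): string {
  if (text.length < limit) return text;
  const marker = "\n[truncated]";
  const keep = Math.max(0, limit - marker.length - 1);
  return text.slice(0, keep) + marker;
}

const freshness: Freshness = {
  status: "stale",
  lastAnalyzed: "2025-01-01T00:00:00Z",
  filesChangedSince: 2,
  changedFiles: ["src/a.ts", "src/b.ts"],
  message: "2 files changed since last analysis",
};

const body = capResponse("x".repeat(25_000));
console.log(body.length, freshness.status);
```

Cutting by character count keeps the guarantee simple to test; a real implementation might instead drop whole list entries so the response stays valid JSON.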
@@ -71,7 +72,7 @@ Run ALL of these before `npm publish`. Do not skip any step.
 - **Grammar smoke test** (`parser.test.ts`): Loads every language in `LANGUAGE_LOADERS` via `parseSource()`. Catches missing packages, broken native builds, wrong require paths. This is what would have caught the tree-sitter-liquid issue.
 - **Version-check tests**: Update notification, cache lifecycle, PM detection, upgrade commands.
 - **Hook tests**: Git hook install/uninstall/status integration tests.
-- **MCP tests**: All 15 tools (read + write), simulation tests.
+- **MCP tests**: All 13 tools (read + write), simulation tests.

 ### Known limitations
 - tree-sitter native bindings don't compile on Node 24 yet (upstream issue)
@@ -90,11 +91,11 @@ Run ALL of these before `npm publish`. Do not skip any step.
 src/
   cli/ - commander CLI (init, serve, update, status)
   mcp/ - MCP server + tools
-  core/ - knowledge store (graph, modules, decisions, sessions, patterns, constitution, search)
+  core/ - knowledge store (graph, modules, decisions, sessions, patterns, constitution, search, agent-instructions, freshness)
   extraction/ - tree-sitter native N-API (parser, symbols, imports, calls)
   git/ - git diff, history, temporal analysis
   types/ - TypeScript types + Zod schemas
-  utils/ - file I/O, YAML, markdown helpers
+  utils/ - file I/O, YAML, markdown helpers, truncation
 ```

 ## Temporal Analysis

README.md

Lines changed: 80 additions & 50 deletions
@@ -1,9 +1,12 @@
 # CodeCortex

-Persistent codebase knowledge layer for AI agents. Your AI shouldn't re-learn your codebase every session.
+Codebase navigation and risk layer for AI agents. Pre-builds a map of architecture, dependencies, coupling, and risk areas so agents go straight to the right files.

-> **⚠️ If you're on v0.4.3 or earlier, update now:** `npm install -g codecortex-ai@latest`
-> v0.4.4 adds freshness flags on all MCP responses and `get_edit_briefing` — a pre-edit risk briefing tool.
+[![CI](https://github.com/rushikeshmore/CodeCortex/actions/workflows/ci.yml/badge.svg)](https://github.com/rushikeshmore/CodeCortex/actions/workflows/ci.yml)
+[![npm version](https://img.shields.io/npm/v/codecortex-ai)](https://www.npmjs.com/package/codecortex-ai)
+[![npm downloads](https://img.shields.io/npm/dw/codecortex-ai)](https://www.npmjs.com/package/codecortex-ai)
+[![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/rushikeshmore/CodeCortex/badge)](https://scorecard.dev/viewer/?uri=github.com/rushikeshmore/CodeCortex)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/rushikeshmore/CodeCortex/blob/main/LICENSE)

 [Website](https://codecortex-ai.vercel.app) · [npm](https://www.npmjs.com/package/codecortex-ai) · [GitHub](https://github.com/rushikeshmore/CodeCortex)

@@ -13,18 +16,40 @@ Persistent codebase knowledge layer for AI agents. Your AI shouldn't re-learn yo

 ## The Problem

-Every AI coding session starts from scratch. When context compacts or a new session begins, the AI re-scans the entire codebase. Same files, same tokens, same wasted time. It's like hiring a new developer every session who has to re-learn everything before writing a single line.
+Every AI coding session starts with exploration — grepping, reading wrong files, re-discovering architecture. On a 6,000-file codebase, an agent makes 37 tool calls and burns 79K tokens just to understand what's where. And it still can't tell you which files are dangerous to edit or which files secretly depend on each other.

 **The data backs this up:**
 - AI agents increase defect risk by 30% on unfamiliar code ([CodeScene + Lund University, 2025](https://codescene.com/hubfs/whitepapers/AI-Ready-Code-How-Code-Health-Determines-AI-Performance.pdf))
 - Code churn grew 2.5x in the AI era ([GitClear, 211M lines analyzed](https://www.gitclear.com/coding_on_copilot_data_shows_ais_downward_pressure_on_code_quality))
-- Nobody combines structural + semantic + temporal + decision knowledge in one portable tool

 ## The Solution

-CodeCortex pre-digests codebases into layered knowledge files and serves them to any AI agent via MCP. Instead of re-understanding your codebase every session, the AI starts with knowledge.
+CodeCortex gives agents a pre-built map: architecture, dependencies, risk areas, hidden coupling. The agent goes straight to the right files and starts working.

-**Hybrid extraction:** tree-sitter native N-API for structure (symbols, imports, calls across 27 languages) + host LLM for semantics (what modules do, why they're built that way). Zero extra API keys.
+**CodeCortex finds WHERE to look. Your agent still reads the code.**
+
+Tested on a real 6,400-file codebase (143K symbols, 96 modules):
+
+| | Without CodeCortex | With CodeCortex |
+|--|:--:|:--:|
+| Tool calls | 37 | **15** (2.5x fewer) |
+| Total tokens | 79K | **43K** (~50% fewer) |
+| Answer quality | 23/25 | **23/25** (same) |
+| Hidden dependencies found | No | **Yes** |
+
+### What makes it unique
+
+Three capabilities no other tool provides:
+
+1. **Temporal coupling** — Files that always change together but have zero imports between them. You can read every line and never discover this. Only git co-change analysis reveals it.
+
+2. **Risk scores** — File X has been bug-fixed 7 times, has 6 hidden dependencies, and co-changes with 3 other files. Risk score: 35. You can't learn this from reading code.
+
+3. **Cross-session memory** — Decisions, patterns, observations persist. The agent doesn't start from zero each session.
+
+**Example from a real codebase:**
+- `schema.help.ts` and `schema.labels.ts` co-changed in 12/14 commits (86%) with **zero imports between them**
+- Without this knowledge, an AI editing one file would produce a bug 86% of the time

 ## Quick Start
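
The `schema.help.ts` / `schema.labels.ts` figure in this hunk (12 of 14 commits, 86%) is a co-change ratio. A minimal sketch of how such a ratio could be computed from per-commit file lists follows. The `Commit` type, the `coChangeRatio` function, and the choice of denominator (commits touching either file) are assumptions; the README does not say how CodeCortex defines the denominator.

```typescript
// Each commit is reduced to the list of file paths it touched
// (for example, parsed from `git log --name-only`).
type Commit = string[];

interface Coupling {
  together: number; // commits touching both files
  total: number;    // commits touching at least one of them (assumed denominator)
  ratio: number;    // together / total, or 0 when neither file appears
}

function coChangeRatio(commits: Commit[], a: string, b: string): Coupling {
  let together = 0;
  let total = 0;
  for (const files of commits) {
    const hasA = files.includes(a);
    const hasB = files.includes(b);
    if (hasA || hasB) total++;
    if (hasA && hasB) together++;
  }
  return { together, total, ratio: total === 0 ? 0 : together / total };
}

// Synthetic history shaped like the README example: 12 joint commits, 2 solo ones.
const history: Commit[] = [
  ...Array.from({ length: 12 }, () => ["schema.help.ts", "schema.labels.ts"]),
  ["schema.help.ts"],
  ["schema.labels.ts", "README.md"],
];

const result = coChangeRatio(history, "schema.help.ts", "schema.labels.ts");
console.log(`${result.together}/${result.total} = ${Math.round(result.ratio * 100)}%`);
// prints "12/14 = 86%"
```

Nothing in the calculation looks at imports, which is why this signal can surface pairs that static analysis misses.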

@@ -38,17 +63,32 @@ npm install -g codecortex-ai --legacy-peer-deps
 cd /path/to/your-project
 codecortex init

-# Start MCP server (for AI agent access)
-codecortex serve
-
 # Check knowledge freshness
 codecortex status
 ```

 ### Connect to Claude Code

-Add to your MCP config:
+**CLI (recommended):**
+```bash
+claude mcp add codecortex -- codecortex serve
+```

+**Or add to MCP config manually:**
+```json
+{
+  "mcpServers": {
+    "codecortex": {
+      "command": "codecortex",
+      "args": ["serve"],
+      "cwd": "/path/to/your-project"
+    }
+  }
+}
+```
+
+### Connect to Cursor
+Add to `.cursor/mcp.json`:
 ```json
 {
   "mcpServers": {
@@ -73,7 +113,8 @@ All knowledge lives in `.codecortex/` as flat files in your repo:
 graph.json # dependency graph (imports, calls, modules)
 symbols.json # full symbol index (functions, classes, types...)
 temporal.json # git coupling, hotspots, bug history
-modules/*.md # per-module deep analysis
+AGENT.md # tool usage guide for AI agents
+modules/*.md # per-module structural analysis
 decisions/*.md # architectural decision records
 sessions/*.md # session change logs
 patterns.md # coding patterns and conventions
@@ -85,47 +126,42 @@ All knowledge lives in `.codecortex/` as flat files in your repo:
 |-------|------|------|
 | 1. Structural | Modules, deps, symbols, entry points | `graph.json` + `symbols.json` |
 | 2. Semantic | What each module does, data flow, gotchas | `modules/*.md` |
-| 3. Temporal | Git behavioral fingerprint - coupling, hotspots, bug history | `temporal.json` |
+| 3. Temporal | Git behavioral fingerprint coupling, hotspots, bug history | `temporal.json` |
 | 4. Decisions | Why things are built this way | `decisions/*.md` |
 | 5. Patterns | How code is written here | `patterns.md` |
 | 6. Sessions | What changed between sessions | `sessions/*.md` |

-### The Temporal Layer
-
-This is the killer differentiator. The temporal layer tells agents *"if you touch file X, you MUST also touch file Y"* even when there's no import between them. This comes from git co-change analysis, not static code analysis.
+## MCP Tools (13)

-Example from a real codebase:
-- `routes.ts` and `worker.ts` co-changed in 9/12 commits (75%) with **zero imports between them**
-- Without this knowledge, an AI editing one file would produce a bug 75% of the time
+### Navigation — "Where should I look?" (4 tools)

-## MCP Tools (15)
+| Tool | Description |
+|------|-------------|
+| `get_project_overview` | Architecture, modules, risk map. Call this first. |
+| `search_knowledge` | Find where a function/class/type is DEFINED by name. Ranked results. |
+| `lookup_symbol` | Precise symbol lookup with kind and file path filters. |
+| `get_module_context` | Module files, deps, temporal signals. Zoom into a module. |

-### Read Tools (10)
+### Risk — "What could go wrong?" (4 tools)

 | Tool | Description |
 |------|-------------|
-| `get_project_overview` | Constitution + overview + graph summary |
-| `get_module_context` | Module doc by name, includes temporal signals |
-| `get_session_briefing` | Changes since last session |
-| `search_knowledge` | Keyword search across all knowledge |
-| `get_decision_history` | Decision records filtered by topic |
-| `get_dependency_graph` | Import/export graph, filterable |
-| `lookup_symbol` | Symbol by name/file/kind |
-| `get_change_coupling` | What files must I also edit if I touch X? |
-| `get_hotspots` | Files ranked by risk (churn x coupling) |
-| `get_edit_briefing` | **NEW** — Pre-edit risk briefing: co-change warnings, hidden deps, bug history, importers |
+| `get_edit_briefing` | Pre-edit risk: co-change warnings, hidden deps, bug history. **Always call before editing.** |
+| `get_hotspots` | Files ranked by risk (churn x coupling x bugs). |
+| `get_change_coupling` | Files that must change together. Hidden dependencies flagged. |
+| `get_dependency_graph` | Import/export graph filtered by module or file. |

-All read tools include `_freshness` metadata indicating how up-to-date the knowledge is.
-
-### Write Tools (5)
+### Memory — "Remember this" (5 tools)

 | Tool | Description |
 |------|-------------|
-| `analyze_module` | Returns source files + structured prompt for LLM analysis |
-| `save_module_analysis` | Persists LLM analysis to `modules/*.md` |
-| `record_decision` | Saves architectural decision to `decisions/*.md` |
-| `update_patterns` | Merges coding pattern into `patterns.md` |
-| `report_feedback` | Agent reports incorrect knowledge for next analysis |
+| `get_session_briefing` | What changed since the last session. |
+| `get_decision_history` | Why things were built this way. |
+| `record_decision` | Save an architectural decision. |
+| `update_patterns` | Document coding conventions. |
+| `record_observation` | Record anything you learned about the codebase. |
+
+All read tools include `_freshness` metadata and return context-safe responses (<10K chars) via size-adaptive caps.

 ## CLI Commands
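
The `get_hotspots` row in this hunk names three factors (churn x coupling x bugs) but not the formula behind them. The sketch below shows one way a ranking over those factors could look. The `FileSignals` fields, the `+ 1` smoothing, and the `riskScore` and `rankHotspots` names are all assumptions for illustration; this is not CodeCortex's scoring.

```typescript
// Hypothetical per-file signals; the field names are illustrative.
interface FileSignals {
  path: string;
  churn: number;        // commits touching the file
  coupledFiles: number; // files it co-changes with
  bugFixes: number;     // bug-fix commits touching the file
}

// Assumed scoring: multiply the three factors named in the tool description,
// adding 1 to coupling and bugs so a zero does not erase the churn signal.
function riskScore(f: FileSignals): number {
  return f.churn * (1 + f.coupledFiles) * (1 + f.bugFixes);
}

function rankHotspots(files: FileSignals[]): FileSignals[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...files].sort((x, y) => riskScore(y) - riskScore(x));
}

const ranked = rankHotspots([
  { path: "src/a.ts", churn: 4, coupledFiles: 0, bugFixes: 0 },  // 4 * 1 * 1 = 4
  { path: "src/b.ts", churn: 10, coupledFiles: 3, bugFixes: 2 }, // 10 * 4 * 3 = 120
  { path: "src/c.ts", churn: 6, coupledFiles: 1, bugFixes: 1 },  // 6 * 2 * 2 = 24
]);
console.log(ranked.map((f) => `${f.path}: ${riskScore(f)}`));
```

A multiplicative score pushes files that are high on several signals at once to the top, whereas a sum would let a single large factor dominate.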

@@ -136,25 +172,19 @@ All read tools include `_freshness` metadata indicating how up-to-date the knowl
 | `codecortex update` | Re-extract changed files, update affected modules |
 | `codecortex status` | Show knowledge freshness, stale modules, symbol counts |
 | `codecortex symbols [query]` | Browse and filter the symbol index |
-| `codecortex search <query>` | Search across all CodeCortex knowledge files |
+| `codecortex search <query>` | Search across symbols, file paths, and docs |
 | `codecortex modules [name]` | List modules or deep-dive into a specific module |
 | `codecortex hotspots` | Show files ranked by risk: churn + coupling + bug history |
 | `codecortex hook install\|uninstall\|status` | Manage git hooks for auto-updating knowledge |
 | `codecortex upgrade` | Check for and install the latest version |

-## Token Efficiency
+## How It Works

-CodeCortex uses a three-tier memory model to minimize token usage:
-
-```
-Session start (HOT only): ~4,300 tokens
-Working on a module (+WARM): ~5,000 tokens
-Need coding patterns (+COLD): ~5,900 tokens
+**Hybrid extraction:** tree-sitter native N-API for structure (symbols, imports, calls across 27 languages) + host LLM for semantics (what modules do, why they're built that way). Zero extra API keys.

-vs. raw scan of entire codebase: ~37,800 tokens
-```
+**Git hooks** keep knowledge fresh — `codecortex update` runs automatically on every commit, re-extracting changed files and updating temporal analysis.

-85-90% token reduction. 7-10x efficiency gain.
+**Size-adaptive responses** — CodeCortex classifies your project (micro → extra-large) and adjusts response caps accordingly. A 23-file project gets full detail. A 6,400-file project gets intelligent summaries. Every MCP tool response stays under 10K chars.

 ## Supported Languages (27)
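
The "Size-adaptive responses" paragraph added in this hunk describes classifying a project by size and capping responses to match. A rough sketch of that pattern follows. Only the endpoint tier names ("micro", "extra-large") and the 23-file and 6,400-file examples come from the README; the intermediate tier names, every threshold, the per-tier item caps, and the function names are assumptions.

```typescript
// Tier names other than "micro" and "extra-large" are assumptions,
// as are all of the thresholds below.
type SizeTier = "micro" | "small" | "medium" | "large" | "extra-large";

function classifyProject(fileCount: number): SizeTier {
  if (fileCount <= 50) return "micro";
  if (fileCount <= 500) return "small";
  if (fileCount <= 2_000) return "medium";
  if (fileCount <= 5_000) return "large";
  return "extra-large";
}

// Hypothetical cap: how many list entries a tool response may include per tier.
const MAX_ITEMS: Record<SizeTier, number> = {
  micro: Number.POSITIVE_INFINITY, // full detail
  small: 100,
  medium: 50,
  large: 25,
  "extra-large": 10,
};

// Keep the first `cap` entries and append a note saying how many were left out.
function capList(items: string[], fileCount: number): string[] {
  const cap = MAX_ITEMS[classifyProject(fileCount)];
  if (items.length <= cap) return items;
  return [...items.slice(0, cap), `(+${items.length - cap} more omitted)`];
}

const files = Array.from({ length: 40 }, (_, i) => `src/file${i}.ts`);
console.log(capList(files, 23).length);    // 40: a 23-file project is "micro", nothing is cut
console.log(capList(files, 6_400).length); // 11: 10 entries plus the "omitted" note
```

Capping by entry count rather than raw characters keeps each response well-formed; a character-level limit can then act as a final backstop.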
