Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ These are non-negotiable. Every PR, feature, and design decision must respect th
- **Never stage/commit `.planning/**`\*\* (or any other local workflow artifacts) unless the user explicitly asks in that message.
- **Never use `gsd-tools ... commit` wrappers** in this repo. Use plain `git add <exact files>` and `git commit -m "..."`.
- **Before every commit:** run `git status --short` and confirm staged files match intent; abort if any `.planning/**` is staged.

- **Avoid using `any` Type AT ALL COSTS.
## Evaluation Integrity (NON-NEGOTIABLE)

These rules prevent metric gaming, overfitting, and false quality claims. Violation of these rules means the feature CANNOT ship.
Expand Down
40 changes: 13 additions & 27 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,35 +2,16 @@

## [Unreleased]

## [Unshipped - Phase 09] - High-Signal Search + Decision Card

Cleaned up the edit decision card and sharpened search ranking for exact-name queries.

### Added

- **Definition-first ranking (SEARCH-01)**: For exact-name queries (PascalCase/camelCase), the file that *defines* a symbol now ranks above files that merely use it. Symbol-level dedup ensures multiple methods from the same class don't clog the top slots.
- **Smart snippets with scope headers (SEARCH-02)**: When `includeSnippets: true`, code chunks from symbol-aware analysis include a scope comment header (`// ClassName.methodName`) before the snippet, giving structural context without extra disk reads.
- **Clean decision card (PREF-01-04)**: The preflight response for `intent="edit"|"refactor"|"migrate"` is now a decision card: `ready`, `nextAction` (if not ready), `warnings`, `patterns` (do/avoid capped at 3), `bestExample` (top golden file), `impact` (caller coverage + top files), and `whatWouldHelp`. Internal fields like `evidenceLock`, `riskLevel`, `confidence` are no longer exposed.
- **Impact coverage gating (PREF-02)**: When result files have known callers (from import graph), the card shows caller coverage: "X/Y callers in results". Low coverage (< 40% with > 3 total callers) triggers an epistemic stress alert.
- **whatWouldHelp recommendations (PREF-03)**: When `ready=false`, concrete next steps appear: search more specifically, call `get_team_patterns`, search for uncovered callers, or check memories. Each is actionable in 1-2 sentences.

### Changed

- **Preflight shape**: `{ ready, reason?, ... }` → `{ ready, nextAction?, warnings?, patterns?, bestExample?, impact?, whatWouldHelp? }`. `reason` renamed to `nextAction` for clarity. No breaking changes to `ready` (stays top-level).

### Fixed

- Agents no longer parse unstable internal fields. Preflight output is stable by design.
- Snippets now include scope context, reducing ambiguity for symbol-heavy edits.

## [Unreleased]

### Added

- **Index versioning (Phase 06)**: Index artifacts are versioned via `index-meta.json`. Mixed-version indexes are never served; version mismatches or corruption trigger automatic rebuild.
- **Crash-safe rebuilds (Phase 06)**: Full rebuilds write to `.staging/` and swap atomically only on success. Failed rebuilds don't corrupt the active index.
- **Relationship sidecar (Phase 07)**: New `relationships.json` artifact containing file import graph, reverse imports, and symbol export index. Updated incrementally alongside the main index.
- **References confidence + hints (Phase 08)**: `get_symbol_references` now includes `confidence: "syntactic"` and `isComplete: boolean` to help agents assess result completeness. `search_codebase` results now include a structured `hints` object (capped callers/consumers/tests ranked by frequency) drawn from the relationships sidecar. `get_component_usage` removed from MCP surface (11→10 tools).
- **Definition-first ranking**: Exact-name searches now show the file that *defines* a symbol before files that use it. For example, searching `parseConfig` shows the function definition first, then callers.
- **Scope headers in code snippets**: When requesting snippets (`includeSnippets: true`), each code block now starts with a comment like `// UserService.login()` so agents know where the code lives without extra file reads.
- **Edit decision card**: When searching with `intent="edit"`, `intent="refactor"`, or `intent="migrate"`, results now include a decision card telling you whether there's enough evidence to proceed safely. The card shows: whether you're ready (`ready: true/false`), what to do next if not (`nextAction`), relevant team patterns to follow, a top example file, how many callers appear in results (`impact.coverage`), and what searches would help close gaps (`whatWouldHelp`).
- **Caller coverage tracking**: The decision card shows how many of a symbol's callers are in your search results. Low coverage (less than 40% when there are lots of callers) triggers an alert so you know to search more before editing.
- **Index versioning**: Index artifacts are versioned via `index-meta.json`. Mixed-version indexes are never served; version mismatches or corruption trigger automatic rebuild.
- **Crash-safe rebuilds**: Full rebuilds write to `.staging/` and swap atomically only on success. Failed rebuilds don't corrupt the active index.
- **Relationship sidecar**: New `relationships.json` artifact containing file import graph, reverse imports, and symbol export index. Updated incrementally alongside the main index.
- **References confidence + hints**: `get_symbol_references` now includes `confidence: "syntactic"` and `isComplete: boolean` to help agents assess result completeness. `search_codebase` results now include a structured `hints` object (capped callers/consumers/tests ranked by frequency) drawn from the relationships sidecar. **`get_component_usage` removed from MCP surface (11→10 tools).** If you previously used `get_component_usage`, use `get_symbol_references` for symbol usage evidence (usageCount, top snippets, callers/consumers).
- Tree-sitter-backed symbol extraction is now used by the Generic analyzer when available (with safe fallbacks).
- Expanded language/extension detection to improve indexing coverage (e.g. `.pyi`, `.php`, `.kt`/`.kts`, `.cc`/`.cxx`, `.cs`, `.swift`, `.scala`, `.toml`, `.xml`).
- New tool: `get_symbol_references` for concrete symbol usage evidence (usageCount + top snippets).
Expand All @@ -39,8 +20,13 @@ Cleaned up the edit decision card and sharpened search ranking for exact-name qu
- Second frozen eval fixture plus an in-repo controlled TypeScript codebase for fully-offline eval runs.
- Regression tests covering Tree-sitter Unicode slicing, parser cleanup/reset behavior, and large/generated file skipping.

### Changed

- **Preflight response shape**: Renamed `reason` to `nextAction` for clarity. Removed internal fields (`evidenceLock`, `riskLevel`, `confidence`) so the output is stable and doesn't change shape unexpectedly.

### Fixed

- Null-pointer crash in GenericAnalyzer when chunk content is undefined.
- Tree-sitter symbol extraction now treats node offsets as UTF-8 byte ranges and evicts cached parsers on failures/timeouts.

## [1.6.2] - 2026-02-17
Expand Down
27 changes: 20 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ Here's what codebase-context does:

**Remembers across sessions** - Decisions, failures, workarounds that look wrong but exist for a reason - the battle scars that aren't in the comments. Recorded once, surfaced automatically so the agent doesn't "clean up" something you spent a week getting right. Conventional git commits (`refactor:`, `migrate:`, `fix:`) auto-extract into memory with zero effort. Stale memories decay and get flagged instead of blindly trusted.

**Checks before editing** - A preflight card with risk level, patterns to use and avoid, failure warnings, and a `readyToEdit` evidence check. Catches the "confidently wrong" problem: when code, team memories, and patterns contradict each other, it tells the agent to ask instead of guess. If evidence is thin or contradictory, it says so.
**Checks before editing** - Before editing something, you get a decision card showing whether there's enough evidence to proceed. If a symbol has four callers and only two appear in your search results, the card shows that coverage gap. If coverage is low, `whatWouldHelp` lists the specific searches to run before you touch anything. When code, team memories, and patterns contradict each other, it tells you to look deeper instead of guessing.

One tool call returns all of it. Local-first - your code never leaves your machine.

Expand Down Expand Up @@ -119,12 +119,21 @@ This is where it all comes together. One call returns:
- **Code results** with `file` (path + line range), `summary`, `score`
- **Type** per result: compact `componentType:layer` (e.g., `service:data`) — helps agents orient
- **Pattern signals** per result: `trend` (Rising/Declining — Stable is omitted) and `patternWarning` when using legacy code
- **Relationships** per result: `importedByCount` and `hasTests` (condensed) + **hints** (capped ranked callers, consumers, tests)
- **Relationships** per result: `importedByCount` and `hasTests` (condensed) + **hints** (capped ranked callers, consumers, tests) — so you see suggested next reads and know what you haven't looked at yet
- **Related memories**: up to 3 team decisions, gotchas, and failures matched to the query
- **Search quality**: `ok` or `low_confidence` with confidence score and `hint` when low
- **Preflight**: `ready` (boolean) with decision card when `intent="edit"|"refactor"|"migrate"`. Shows `nextAction` (if not ready), `warnings`, `patterns` (do/avoid), `bestExample`, `impact` (caller coverage), and `whatWouldHelp` (next steps). If search quality is low, `ready` is always `false`.

Snippets are opt-in (`includeSnippets: true`). Default output is lean — if the agent wants code, it calls `read_file`.
Snippets are optional (`includeSnippets: true`). When enabled, snippets that have symbol metadata (e.g. from the Generic analyzer's AST chunking or Angular component chunks) start with a scope header so you know where the code lives (e.g. `// AuthService.getToken()` or `// SpotifyApiService`). Example:

```ts
// AuthService.getToken()
getToken(): string {
return this.token;
}
```

Default output is lean — if the agent wants code, it calls `read_file`.

```json
{
Expand Down Expand Up @@ -189,7 +198,7 @@ Record a decision once. It surfaces automatically in search results and prefligh
| ------------------------------ | ------------------------------------------------------------------------------------------- |
| `search_codebase` | Hybrid search + decision card. Pass `intent="edit"` to get `ready`, `nextAction`, patterns, caller coverage, and `whatWouldHelp`. |
| `get_team_patterns` | Pattern frequencies, golden files, conflict detection |
| `get_symbol_references` | Find concrete references to a symbol (usageCount + top snippets + confidence + completeness) |
| `get_symbol_references` | Find concrete references to a symbol (usageCount + top snippets). `confidence: "syntactic"` = static/source-based only; no runtime or dynamic dispatch. |
| `remember` | Record a convention, decision, gotcha, or failure |
| `get_memory` | Query team memory with confidence decay scoring |
| `get_codebase_metadata` | Project structure, frameworks, dependencies |
Expand All @@ -200,7 +209,7 @@ Record a decision once. It surfaces automatically in search results and prefligh

## Evaluation Harness (`npm run eval`)

Reproducible evaluation with frozen fixtures so ranking/chunking changes are measured honestly and regressions get caught.
Reproducible evaluation with frozen fixtures so ranking/chunking changes are measured honestly and regressions get caught. **For contributors and CI:** run before releases or after changing search/ranking/chunking to guard against regressions.

- Two codebases: `npm run eval -- <codebaseA> <codebaseB>`
- Defaults: fixture A = `tests/fixtures/eval-angular-spotify.json`, fixture B = `tests/fixtures/eval-controlled.json`
Expand All @@ -214,11 +223,13 @@ npm run eval -- tests/fixtures/codebases/eval-controlled tests/fixtures/codebase
```

- Flags: `--help`, `--fixture-a`, `--fixture-b`, `--skip-reindex`, `--no-rerank`, `--no-redact`
- To save a report for later comparison, redirect stdout (e.g. `pnpm run eval -- <path-to-angular-spotify> --skip-reindex > internal-docs/tests/eval-runs/angular-spotify-YYYY-MM-DD.txt`).

## How the Search Works

The retrieval pipeline is designed around one goal: give the agent the right context, not just any file that matches.

- **Definition-first ranking** - for exact-name lookups (e.g. a symbol name), the file that *defines* the symbol ranks above files that only use it.
- **Intent classification** - knows whether "AuthService" is a name lookup or "how does auth work" is conceptual. Adjusts keyword/semantic weights accordingly.
- **Hybrid fusion (RRF)** - combines keyword and semantic search using Reciprocal Rank Fusion instead of brittle score averaging.
- **Query expansion** - conceptual queries automatically expand with domain-relevant terms (auth → login, token, session, guard).
Expand All @@ -229,13 +240,15 @@ The retrieval pipeline is designed around one goal: give the agent the right con
- **Version gating** - index artifacts are versioned; mismatches trigger automatic rebuild so mixed-version data is never served.
- **Auto-heal** - if the index corrupts, search triggers a full re-index automatically.

**Index reliability:** Rebuilds write to a staging directory and swap atomically only on success, so a failed rebuild never corrupts the active index. Version mismatches or corruption trigger an automatic full re-index (no user action required).

## Language Support

Over **30+ languages** are supported for indexing + retrieval: TypeScript/JavaScript, Python (incl `.pyi`), PHP, Ruby, Java, Kotlin (`.kt`/`.kts`), Go, Rust, C/C++ (incl `.cc`/`.cxx`), C#, Swift, Scala, Shell, plus common config/markup formats (JSON/YAML/TOML/XML, etc.).
**10 languages** have full symbol extraction (Tree-sitter): TypeScript, JavaScript, Python, Java, Kotlin, C, C++, C#, Go, Rust. **30+ languages** have indexing and retrieval coverage (keyword + semantic), including PHP, Ruby, Swift, Scala, Shell, and config/markup (JSON/YAML/TOML/XML, etc.).

Enrichment is framework-specific: right now only **Angular** has a dedicated analyzer for rich conventions/context (signals, standalone components, control flow, DI patterns).

For non-Angular projects, the **Generic** analyzer still provides broad coverage, and will use Tree-sitter symbol extraction when a grammar is available (otherwise it falls back to safe parsing).
For non-Angular projects, the **Generic** analyzer uses **AST-aligned chunking** when a Tree-sitter grammar is available: symbol-bounded chunks with **scope-aware prefixes** (e.g. `// ClassName.methodName`) so snippets show where code lives. Without a grammar it falls back to safe line-based chunking.

Structured filters available: `framework`, `language`, `componentType`, `layer` (presentation, business, data, state, core, shared).

Expand Down
8 changes: 4 additions & 4 deletions docs/capabilities.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,15 +4,15 @@ Technical reference for what `codebase-context` ships today. For the user-facing

## Tool Surface

10 MCP tools + 1 optional resource (`codebase://context`).
10 MCP tools + 1 optional resource (`codebase://context`). **Migration:** `get_component_usage` was removed; use `get_symbol_references` for symbol usage evidence.

### Core Tools

| Tool | Input | Output |
| ----------------------- | ----------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `search_codebase` | `query`, optional `intent`, `limit`, `filters`, `includeSnippets` | Ranked results (`file`, `summary`, `score`, `type`, `trend`, `patternWarning`, `relationships`, `hints`) + `searchQuality` + decision card (`ready`, `nextAction`, `patterns`, `bestExample`, `impact`, `whatWouldHelp`) when `intent="edit"`. Hints capped at 3 per category. |
| `get_team_patterns` | optional `category` | Pattern frequencies, trends, golden files, conflicts |
| `get_symbol_references` | `symbol`, optional `limit` | Concrete symbol usage evidence: `usageCount` + top usage snippets + `confidence` ("syntactic") + `isComplete` boolean |
| `get_symbol_references` | `symbol`, optional `limit` | Concrete symbol usage evidence: `usageCount` + top usage snippets + `confidence` + `isComplete`. `confidence: "syntactic"` means static/source-based only (no runtime or dynamic dispatch). Replaces the removed `get_component_usage`. |
| `remember` | `type`, `category`, `memory`, `reason` | Persists to `.codebase-context/memory.json` |
| `get_memory` | optional `category`, `type`, `query`, `limit` | Memories with confidence decay scoring |

Expand Down Expand Up @@ -121,12 +121,12 @@ Returned as `preflight` when search `intent` is `edit`, `refactor`, or `migrate`
## Analyzers

- **Angular**: signals, standalone components, control flow syntax, lifecycle hooks, DI patterns, component metadata
- **Generic**: 30+ languages TypeScript, JavaScript, Python, Java, Kotlin, C/C++, C#, Go, Rust, PHP, Ruby, Swift, Scala, Shell, config/markup formats
- **Generic**: 30+ have indexing/retrieval coverage including PHP, Ruby, Swift, Scala, Shell, config/markup., 10 languages have full symbol extraction (Tree-sitter: TypeScript, JavaScript, Python, Java, Kotlin, C, C++, C#, Go, Rust).

Notes:

- Language detection covers common extensions including `.pyi`, `.kt`/`.kts`, `.cc`/`.cxx`, and config formats like `.toml`/`.xml`.
- When Tree-sitter grammars are present, the Generic analyzer can derive symbol components from Tree-sitter extraction (with fallbacks).
- When Tree-sitter grammars are present, the Generic analyzer uses AST-aligned chunking and scope-aware prefixes for symbol-aware snippets (with fallbacks).

## Evaluation Harness

Expand Down
8 changes: 7 additions & 1 deletion src/analyzers/generic/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -498,6 +498,10 @@ export class GenericAnalyzer implements FrameworkAnalyzer {
const fileName = path.basename(chunk.filePath);
const { language, componentType, content } = chunk;

if (!content) {
return `${language} ${componentType || 'code'} in ${fileName}`;
}

// Try to extract meaningful information
const firstComment = this.extractFirstComment(content);
if (firstComment) {
Expand Down Expand Up @@ -526,7 +530,9 @@ export class GenericAnalyzer implements FrameworkAnalyzer {
return `${language} code in ${fileName}: ${firstLine ? firstLine.trim().slice(0, 60) + '...' : 'code definition'}`;
}

private extractFirstComment(content: string): string {
private extractFirstComment(content: string | null | undefined): string {
if (!content) return '';

// Try JSDoc style
const jsdocMatch = content.match(/\/\*\*\s*\n?\s*\*\s*(.+?)(?:\n|\*\/)/);
if (jsdocMatch) return jsdocMatch[1].trim();
Expand Down
Loading
Loading