3 changes: 3 additions & 0 deletions CHANGELOG.md
Expand Up @@ -4,6 +4,9 @@

### Added

- **Index versioning (Phase 06)**: Index artifacts are versioned via `index-meta.json`. Mixed-version indexes are never served; version mismatches or corruption trigger automatic rebuild.
- **Crash-safe rebuilds (Phase 06)**: Full rebuilds write to `.staging/` and swap atomically only on success. Failed rebuilds don't corrupt the active index.
- **Relationship sidecar (Phase 07)**: New `relationships.json` artifact containing file import graph, reverse imports, and symbol export index. Updated incrementally alongside the main index.
- Tree-sitter-backed symbol extraction is now used by the Generic analyzer when available (with safe fallbacks).
- Expanded language/extension detection to improve indexing coverage (e.g. `.pyi`, `.php`, `.kt`/`.kts`, `.cc`/`.cxx`, `.cs`, `.swift`, `.scala`, `.toml`, `.xml`).
- New tool: `get_symbol_references` for concrete symbol usage evidence (usageCount + top snippets).
3 changes: 3 additions & 0 deletions README.md
Expand Up @@ -207,6 +207,7 @@ The retrieval pipeline is designed around one goal: give the agent the right con
- **Import centrality** - files that are imported more often rank higher.
- **Cross-encoder reranking** - a stage-2 reranker triggers only when top scores are ambiguous. CPU-only, bounded to top-K.
- **Incremental indexing** - only re-indexes files that changed since last run (SHA-256 manifest diffing).
- **Version gating** - index artifacts are versioned; mismatches trigger automatic rebuild so mixed-version data is never served.
- **Auto-heal** - if the index is corrupted, search triggers a full re-index automatically.

## Language Support
Expand Up @@ -239,7 +240,9 @@ Structured filters available: `framework`, `language`, `componentType`, `layer`
```
.codebase-context/
memory.json # Team knowledge (should be persisted in git)
index-meta.json # Index metadata and version (generated)
intelligence.json # Pattern analysis (generated)
relationships.json # File/symbol relationships (generated)
index.json # Keyword index (generated)
index/ # Vector database (generated)
```
3 changes: 3 additions & 0 deletions docs/capabilities.md
Expand Up @@ -84,7 +84,10 @@ Output: `{ ready: boolean, reason?: string }`

- Initial: full scan → chunking (50 lines, 0 overlap) → embedding → vector DB (LanceDB) + keyword index (Fuse.js)
- Incremental: SHA-256 manifest diffing, selective embed/delete, full intelligence regeneration
- Version gating: `index-meta.json` tracks format version; mismatches trigger automatic rebuild
- Crash-safe rebuilds: full rebuilds write to `.staging/` and swap atomically only on success
- Auto-heal: corrupted index triggers automatic full re-index on next search
- Relationships sidecar: `relationships.json` contains the file import graph, reverse imports, and symbol export index
- Storage: `.codebase-context/` directory (memory.json + generated files)
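
The incremental step in the list above depends on comparing the previous manifest with the current one. A minimal sketch of that comparison follows. It assumes, hypothetically, that a manifest maps each relative file path to its SHA-256 hex digest; the real `manifest.json` layout is not shown in this diff, and `Manifest`, `ManifestDiff`, and `diffManifests` are illustrative names.

```typescript
// Assumed manifest shape (not confirmed by the diff): path -> SHA-256 hex digest.
type Manifest = Record<string, string>;

interface ManifestDiff {
  added: string[];   // present now, absent before: embed
  changed: string[]; // digest differs: re-embed
  deleted: string[]; // absent now, present before: delete from the index
}

function hasKey(manifest: Manifest, file: string): boolean {
  return Object.prototype.hasOwnProperty.call(manifest, file);
}

function diffManifests(previous: Manifest, current: Manifest): ManifestDiff {
  const diff: ManifestDiff = { added: [], changed: [], deleted: [] };

  // Sort keys so the result is deterministic regardless of insertion order.
  for (const file of Object.keys(current).sort()) {
    if (!hasKey(previous, file)) {
      diff.added.push(file);
    } else if (previous[file] !== current[file]) {
      diff.changed.push(file);
    }
  }
  for (const file of Object.keys(previous).sort()) {
    if (!hasKey(current, file)) {
      diff.deleted.push(file);
    }
  }
  return diff;
}
```

Only the files in `added` and `changed` would need embedding, and only those in `deleted` would need removal, which is what keeps an incremental run cheaper than a full rebuild.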

## Analyzers
1 change: 1 addition & 0 deletions src/constants/codebase-context.ts
Expand Up @@ -24,3 +24,4 @@ export const KEYWORD_INDEX_FILENAME = 'index.json' as const;
export const INDEXING_STATS_FILENAME = 'indexing-stats.json' as const;
export const VECTOR_DB_DIRNAME = 'index' as const;
export const MANIFEST_FILENAME = 'manifest.json' as const;
export const RELATIONSHIPS_FILENAME = 'relationships.json' as const;
37 changes: 37 additions & 0 deletions src/core/index-meta.ts
Expand Up @@ -9,6 +9,7 @@ import {
INDEX_META_VERSION,
INTELLIGENCE_FILENAME,
KEYWORD_INDEX_FILENAME,
RELATIONSHIPS_FILENAME,
VECTOR_DB_DIRNAME
} from '../constants/codebase-context.js';
import { IndexCorruptedError } from '../errors/index.js';
Expand All @@ -34,6 +35,12 @@ const IntelligenceFileSchema = z
})
.passthrough();

const RelationshipsFileSchema = z
.object({
header: ArtifactHeaderSchema
})
.passthrough();

export const IndexMetaSchema = z.object({
metaVersion: z.number().int().positive(),
formatVersion: z.number().int().nonnegative(),
Expand Down Expand Up @@ -221,4 +228,34 @@ export async function validateIndexArtifacts(rootDir: string, meta: IndexMeta):
throw asIndexCorrupted('Intelligence corrupted (rebuild required)', error);
}
}

// Optional relationships sidecar: validate if present, but do not require.
const relationshipsPath = path.join(contextDir, RELATIONSHIPS_FILENAME);
if (await pathExists(relationshipsPath)) {
try {
const raw = await fs.readFile(relationshipsPath, 'utf-8');
const json = JSON.parse(raw);
const parsed = RelationshipsFileSchema.safeParse(json);
if (!parsed.success) {
throw new IndexCorruptedError(
`Relationships schema mismatch (rebuild required): ${parsed.error.message}`
);
}

const { buildId, formatVersion } = parsed.data.header;
if (formatVersion !== meta.formatVersion) {
throw new IndexCorruptedError(
`Relationships formatVersion mismatch (rebuild required): meta=${meta.formatVersion}, relationships.json=${formatVersion}`
);
}
if (buildId !== meta.buildId) {
throw new IndexCorruptedError(
`Relationships buildId mismatch (rebuild required): meta=${meta.buildId}, relationships.json=${buildId}`
);
}
} catch (error) {
if (error instanceof IndexCorruptedError) throw error;
throw asIndexCorrupted('Relationships sidecar corrupted (rebuild required)', error);
}
}
}
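
The header check added above reduces to comparing two fields against the index meta. A dependency-free sketch of that comparison is given here for illustration. `checkArtifactHeader` and `HeaderCheck` are hypothetical names, and typing `buildId` as a string is an assumption, since `ArtifactHeaderSchema` is not shown in this diff.

```typescript
// Assumed header shape; the real ArtifactHeaderSchema is not visible in the diff.
interface ArtifactHeader {
  buildId: string;
  formatVersion: number;
}

type HeaderCheck = { ok: true } | { ok: false; reason: string };

// Compare an artifact's header with the index meta. formatVersion is checked
// before buildId, in the same order as the validation code in the diff.
function checkArtifactHeader(
  artifactName: string,
  header: ArtifactHeader,
  meta: ArtifactHeader
): HeaderCheck {
  if (header.formatVersion !== meta.formatVersion) {
    return {
      ok: false,
      reason: `${artifactName} formatVersion mismatch: meta=${meta.formatVersion}, artifact=${header.formatVersion}`
    };
  }
  if (header.buildId !== meta.buildId) {
    return {
      ok: false,
      reason: `${artifactName} buildId mismatch: meta=${meta.buildId}, artifact=${header.buildId}`
    };
  }
  return { ok: true };
}
```

Returning a result object instead of throwing is a choice made for this sketch only; the code in the diff throws `IndexCorruptedError` so that the caller can trigger a rebuild.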
54 changes: 53 additions & 1 deletion src/core/indexer.ts
Expand Up @@ -39,6 +39,7 @@ import {
INTELLIGENCE_FILENAME,
KEYWORD_INDEX_FILENAME,
MANIFEST_FILENAME,
RELATIONSHIPS_FILENAME,
VECTOR_DB_DIRNAME
} from '../constants/codebase-context.js';

Expand Down Expand Up @@ -91,13 +92,15 @@ async function atomicSwapStagingToActive(
const activeVectorDir = path.join(contextDir, VECTOR_DB_DIRNAME);
const activeManifestPath = path.join(contextDir, MANIFEST_FILENAME);
const activeStatsPath = path.join(contextDir, INDEXING_STATS_FILENAME);
const activeRelationshipsPath = path.join(contextDir, RELATIONSHIPS_FILENAME);

const stagingMetaPath = path.join(stagingDir, INDEX_META_FILENAME);
const stagingIndexPath = path.join(stagingDir, KEYWORD_INDEX_FILENAME);
const stagingIntelligencePath = path.join(stagingDir, INTELLIGENCE_FILENAME);
const stagingVectorDir = path.join(stagingDir, VECTOR_DB_DIRNAME);
const stagingManifestPath = path.join(stagingDir, MANIFEST_FILENAME);
const stagingStatsPath = path.join(stagingDir, INDEXING_STATS_FILENAME);
const stagingRelationshipsPath = path.join(stagingDir, RELATIONSHIPS_FILENAME);

// Step 1: Create .previous directory and move current active there
await fs.mkdir(previousDir, { recursive: true });
Expand Down Expand Up @@ -134,6 +137,7 @@ async function atomicSwapStagingToActive(
await moveIfExists(activeIntelligencePath, path.join(previousDir, INTELLIGENCE_FILENAME));
await moveIfExists(activeManifestPath, path.join(previousDir, MANIFEST_FILENAME));
await moveIfExists(activeStatsPath, path.join(previousDir, INDEXING_STATS_FILENAME));
await moveIfExists(activeRelationshipsPath, path.join(previousDir, RELATIONSHIPS_FILENAME));
await moveDirIfExists(activeVectorDir, path.join(previousDir, VECTOR_DB_DIRNAME));

// Step 2: Move staging artifacts to active location
Expand All @@ -143,6 +147,7 @@ async function atomicSwapStagingToActive(
await moveIfExists(stagingIntelligencePath, activeIntelligencePath);
await moveIfExists(stagingManifestPath, activeManifestPath);
await moveIfExists(stagingStatsPath, activeStatsPath);
await moveIfExists(stagingRelationshipsPath, activeRelationshipsPath);
await moveDirIfExists(stagingVectorDir, activeVectorDir);

// Step 3: Clean up .previous and staging directories
Expand Down Expand Up @@ -171,6 +176,7 @@ async function atomicSwapStagingToActive(
await moveIfExists(path.join(previousDir, INTELLIGENCE_FILENAME), activeIntelligencePath);
await moveIfExists(path.join(previousDir, MANIFEST_FILENAME), activeManifestPath);
await moveIfExists(path.join(previousDir, INDEXING_STATS_FILENAME), activeStatsPath);
await moveIfExists(path.join(previousDir, RELATIONSHIPS_FILENAME), activeRelationshipsPath);
await moveDirIfExists(path.join(previousDir, VECTOR_DB_DIRNAME), activeVectorDir);
console.error('Rollback successful');
} catch (rollbackError) {
Expand Down Expand Up @@ -796,6 +802,51 @@ export class CodebaseIndexer {
};
await fs.writeFile(intelligencePath, JSON.stringify(intelligence, null, 2));

// Write relationships sidecar (versioned, for fast lookup)
const relationshipsPath = path.join(activeContextDir, RELATIONSHIPS_FILENAME);
const graphData = internalFileGraph.toJSON();

// Build reverse import map (importedBy)
const importedBy: Record<string, string[]> = {};
if (graphData.imports) {
for (const [file, deps] of Object.entries(graphData.imports)) {
for (const dep of deps as string[]) {
if (!importedBy[dep]) importedBy[dep] = [];
importedBy[dep].push(file);
}
}
}

// Build symbol export map (exportedBy)
const exportedBy: Record<string, string[]> = {};
if (graphData.exports) {
for (const [file, exps] of Object.entries(graphData.exports)) {
for (const exp of exps as Array<{ name: string; type: string }>) {
if (exp.name && exp.name !== 'default') {
if (!exportedBy[exp.name]) exportedBy[exp.name] = [];
if (!exportedBy[exp.name].includes(file)) {
exportedBy[exp.name].push(file);
}
}
}
}
}

const relationships = {
header: { buildId, formatVersion: INDEX_FORMAT_VERSION },
generatedAt,
graph: {
imports: graphData.imports || {},
importedBy,
exports: graphData.exports || {}
},
symbols: {
exportedBy
},
stats: graphData.stats || internalFileGraph.getStats()
};
await fs.writeFile(relationshipsPath, JSON.stringify(relationships, null, 2));

// Write manifest (both full and incremental)
// For full rebuild, write to staging; for incremental, write to active
const activeManifestPath = path.join(activeContextDir, MANIFEST_FILENAME);
Expand Down Expand Up @@ -831,7 +882,8 @@ export class CodebaseIndexer {
vectorDb: { path: VECTOR_DB_DIRNAME, provider: 'lancedb' },
intelligence: { path: INTELLIGENCE_FILENAME },
manifest: { path: MANIFEST_FILENAME },
-indexingStats: { path: INDEXING_STATS_FILENAME }
+indexingStats: { path: INDEXING_STATS_FILENAME },
+relationships: { path: RELATIONSHIPS_FILENAME }
}
},
null,
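
The two reverse lookups written to the sidecar (`importedBy` and `exportedBy`) can be sketched in isolation as below. This is an illustrative variant, not the code in the diff: it stores results in a `Map` rather than a plain object, and it also de-duplicates `importedBy` entries, which the diff does not do. `buildReverseMaps` and `addUnique` are hypothetical names.

```typescript
type ImportGraph = Record<string, string[]>;
type ExportGraph = Record<string, Array<{ name: string; type: string }>>;

interface ReverseMaps {
  importedBy: Map<string, string[]>; // dependency -> files that import it
  exportedBy: Map<string, string[]>; // symbol name -> files that export it
}

// Append `value` to the list stored under `key`, skipping duplicates.
function addUnique(map: Map<string, string[]>, key: string, value: string): void {
  const list = map.get(key);
  if (list === undefined) {
    map.set(key, [value]);
  } else if (!list.includes(value)) {
    list.push(value);
  }
}

function buildReverseMaps(importGraph: ImportGraph, exportGraph: ExportGraph): ReverseMaps {
  const importedBy = new Map<string, string[]>();
  const exportedBy = new Map<string, string[]>();

  for (const file of Object.keys(importGraph)) {
    const deps = importGraph[file];
    if (!deps) continue;
    for (const dep of deps) addUnique(importedBy, dep, file);
  }

  for (const file of Object.keys(exportGraph)) {
    const exps = exportGraph[file];
    if (!exps) continue;
    for (const exp of exps) {
      // As in the diff, unnamed and default exports are not indexed by name.
      if (exp.name && exp.name !== 'default') addUnique(exportedBy, exp.name, file);
    }
  }
  return { importedBy, exportedBy };
}
```

A `Map` is used so that a symbol named `constructor` or `toString` cannot collide with a property inherited from `Object.prototype`, which a lookup on a plain `{}` would resolve. Converting the maps to plain objects for JSON output is left out of the sketch.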
43 changes: 35 additions & 8 deletions src/tools/detect-circular-dependencies.ts
@@ -1,7 +1,9 @@
import type { Tool } from '@modelcontextprotocol/sdk/types.js';
import { promises as fs } from 'fs';
import path from 'path';
import type { ToolContext, ToolResponse } from './types.js';
import { InternalFileGraph } from '../utils/usage-tracker.js';
import { RELATIONSHIPS_FILENAME } from '../constants/codebase-context.js';

export const definition: Tool = {
name: 'detect_circular_dependencies',
Expand All @@ -27,11 +29,36 @@ export async function handle(
const { scope } = args as { scope?: string };

try {
-const intelligencePath = ctx.paths.intelligence;
-const content = await fs.readFile(intelligencePath, 'utf-8');
-const intelligence = JSON.parse(content);
+// Try relationships sidecar first (preferred), then intelligence
+let graphDataSource: any = null;
+let graphStats: any = null;

-if (!intelligence.internalFileGraph) {
+const relationshipsPath = path.join(
+path.dirname(ctx.paths.intelligence),
+RELATIONSHIPS_FILENAME
+);
+try {
+const relationshipsContent = await fs.readFile(relationshipsPath, 'utf-8');
+const relationships = JSON.parse(relationshipsContent);
+if (relationships?.graph) {
+graphDataSource = relationships.graph;
+graphStats = relationships.stats;
+}
+} catch {
+// Relationships sidecar not available, try intelligence
+}
+
+if (!graphDataSource) {
+const intelligencePath = ctx.paths.intelligence;
+const content = await fs.readFile(intelligencePath, 'utf-8');
+const intelligence = JSON.parse(content);
+if (intelligence.internalFileGraph) {
+graphDataSource = intelligence.internalFileGraph;
+graphStats = intelligence.internalFileGraph.stats;
+}
+}
+
+if (!graphDataSource) {
return {
content: [
{
Expand All @@ -51,9 +78,9 @@ export async function handle(
}

// Reconstruct the graph from stored data
-const graph = InternalFileGraph.fromJSON(intelligence.internalFileGraph, ctx.rootPath);
+const graph = InternalFileGraph.fromJSON(graphDataSource, ctx.rootPath);
const cycles = graph.findCycles(scope);
-const graphStats = intelligence.internalFileGraph.stats || graph.getStats();
+const stats = graphStats || graph.getStats();

if (cycles.length === 0) {
return {
Expand All @@ -67,7 +94,7 @@ export async function handle(
? `No circular dependencies detected in scope: ${scope}`
: 'No circular dependencies detected in the codebase.',
scope,
-graphStats
+graphStats: stats
},
null,
2
Expand All @@ -92,7 +119,7 @@ export async function handle(
severity: c.length === 2 ? 'high' : c.length <= 3 ? 'medium' : 'low'
})),
count: cycles.length,
-graphStats,
+graphStats: stats,
advice:
'Shorter cycles (length 2-3) are typically more problematic. Consider breaking the cycle by extracting shared dependencies.'
},
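
The source-selection order used by this tool (sidecar first, then the graph embedded in the intelligence file) can be expressed as a small pure function over already-parsed JSON. This is a sketch only: `pickGraphSource` and the interfaces are hypothetical, and file reading and error handling are deliberately left out.

```typescript
// Minimal assumed shapes; the real files carry more fields than are listed here.
interface StoredGraph {
  imports?: Record<string, string[]>;
  stats?: unknown;
}
interface RelationshipsFile {
  graph?: StoredGraph;
  stats?: unknown;
}
interface IntelligenceFile {
  internalFileGraph?: StoredGraph;
}
interface GraphSource {
  origin: 'relationships' | 'intelligence';
  graph: StoredGraph;
  stats: unknown;
}

// Prefer the relationships sidecar; fall back to intelligence.internalFileGraph.
// Pass null for a file that was missing or could not be parsed.
function pickGraphSource(
  relationships: RelationshipsFile | null,
  intelligence: IntelligenceFile | null
): GraphSource | null {
  if (relationships && relationships.graph) {
    return { origin: 'relationships', graph: relationships.graph, stats: relationships.stats };
  }
  if (intelligence && intelligence.internalFileGraph) {
    return {
      origin: 'intelligence',
      graph: intelligence.internalFileGraph,
      stats: intelligence.internalFileGraph.stats
    };
  }
  return null; // neither source has a graph
}
```

Note where the stats come from in each branch: the sidecar keeps `stats` at the top level of the file, while the intelligence file nests it inside `internalFileGraph`, matching the two assignments to `graphStats` in the diff.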
49 changes: 38 additions & 11 deletions src/tools/search-codebase.ts
Expand Up @@ -16,6 +16,7 @@ import { assessSearchQuality } from '../core/search-quality.js';
import { IndexCorruptedError } from '../errors/index.js';
import { readMemoriesFile, withConfidence } from '../memory/store.js';
import { InternalFileGraph } from '../utils/usage-tracker.js';
import { RELATIONSHIPS_FILENAME } from '../constants/codebase-context.js';

export const definition: Tool = {
name: 'search_codebase',
Expand Down Expand Up @@ -229,6 +230,30 @@ export async function handle(
/* graceful degradation — intelligence file may not exist yet */
}

// Load relationships sidecar (preferred over intelligence.internalFileGraph)
let relationships: any = null;
try {
const relationshipsPath = path.join(
path.dirname(ctx.paths.intelligence),
RELATIONSHIPS_FILENAME
);
const relationshipsContent = await fs.readFile(relationshipsPath, 'utf-8');
relationships = JSON.parse(relationshipsContent);
} catch {
/* graceful degradation — relationships sidecar may not exist yet */
}

// Helper to get imports graph from relationships sidecar (preferred) or intelligence
function getImportsGraph(): Record<string, string[]> | null {
if (relationships?.graph?.imports) {
return relationships.graph.imports as Record<string, string[]>;
}
if (intelligence?.internalFileGraph?.imports) {
return intelligence.internalFileGraph.imports as Record<string, string[]>;
}
return null;
}

function computeIndexConfidence(): 'fresh' | 'aging' | 'stale' {
let confidence: 'fresh' | 'aging' | 'stale' = 'stale';
if (intelligence?.generatedAt) {
Expand All @@ -246,8 +271,8 @@ export async function handle(
// Cheap impact breadth estimate from the import graph (used for risk assessment).
function computeImpactCandidates(resultPaths: string[]): string[] {
const impactCandidates: string[] = [];
-if (!intelligence?.internalFileGraph?.imports) return impactCandidates;
-const allImports = intelligence.internalFileGraph.imports as Record<string, string[]>;
+const allImports = getImportsGraph();
+if (!allImports) return impactCandidates;
for (const [file, deps] of Object.entries(allImports)) {
if (
deps.some((dep: string) => resultPaths.some((rp) => dep.endsWith(rp) || rp.endsWith(dep)))
Expand All @@ -260,10 +285,11 @@ export async function handle(
return impactCandidates;
}

-// Build reverse import map from intelligence graph
+// Build reverse import map from relationships sidecar (preferred) or intelligence graph
const reverseImports = new Map<string, string[]>();
-if (intelligence?.internalFileGraph?.imports) {
-for (const [file, deps] of Object.entries<string[]>(intelligence.internalFileGraph.imports)) {
+const importsGraph = getImportsGraph();
+if (importsGraph) {
+for (const [file, deps] of Object.entries<string[]>(importsGraph)) {
for (const dep of deps) {
if (!reverseImports.has(dep)) reverseImports.set(dep, []);
reverseImports.get(dep)!.push(file);
Expand All @@ -285,8 +311,8 @@ export async function handle(

// imports: files this result depends on (forward lookup)
const imports: string[] = [];
-if (intelligence?.internalFileGraph?.imports) {
-for (const [file, deps] of Object.entries<string[]>(intelligence.internalFileGraph.imports)) {
+if (importsGraph) {
+for (const [file, deps] of Object.entries<string[]>(importsGraph)) {
if (file.endsWith(rPath) || rPath.endsWith(file)) {
imports.push(...deps);
}
Expand All @@ -296,8 +322,8 @@ export async function handle(
// testedIn: heuristic — same basename with .spec/.test extension
const testedIn: string[] = [];
const baseName = path.basename(rPath).replace(/\.[^.]+$/, '');
-if (intelligence?.internalFileGraph?.imports) {
-for (const file of Object.keys(intelligence.internalFileGraph.imports)) {
+if (importsGraph) {
+for (const file of Object.keys(importsGraph)) {
const fileBase = path.basename(file);
if (
(fileBase.includes('.spec.') || fileBase.includes('.test.')) &&
Expand Down Expand Up @@ -416,9 +442,10 @@ export async function handle(
// --- Risk level (based on circular deps + impact breadth) ---
let riskLevel: 'low' | 'medium' | 'high' = 'low';
let cycleCount = 0;
-if (intelligence.internalFileGraph) {
+const graphDataSource = relationships?.graph || intelligence?.internalFileGraph;
+if (graphDataSource) {
try {
-const graph = InternalFileGraph.fromJSON(intelligence.internalFileGraph, ctx.rootPath);
+const graph = InternalFileGraph.fromJSON(graphDataSource, ctx.rootPath);
// Use directory prefixes as scope (not full file paths)
// findCycles(scope) filters files by startsWith, so a full path would only match itself
const scopes = new Set(
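
The comment in the last hunk explains why scopes are directory prefixes: a scope is matched with `startsWith`, so a full file path would match only itself. The hunk is cut off before the scopes are computed, so the sketch below is one possible derivation under that assumption, not the code in the PR. `toDirectoryScopes` and `filesInScope` are hypothetical helpers.

```typescript
// Derive unique directory prefixes (with a trailing slash) from result file paths.
// Files without a directory component yield no scope in this sketch.
function toDirectoryScopes(filePaths: string[]): string[] {
  const scopes: string[] = [];
  for (const filePath of filePaths) {
    const normalized = filePath.replace(/\\/g, '/'); // tolerate Windows separators
    const lastSlash = normalized.lastIndexOf('/');
    if (lastSlash === -1) continue;
    const dir = normalized.slice(0, lastSlash + 1);
    if (!scopes.includes(dir)) scopes.push(dir);
  }
  return scopes;
}

// Prefix filter in the style the diff attributes to findCycles(scope).
function filesInScope(files: string[], scope: string): string[] {
  return files.filter((file) => file.startsWith(scope));
}
```

With the files `src/a/x.ts`, `src/a/y.ts`, and `src/b/z.ts`, the scope `src/a/x.ts` selects a single file, while the directory scope `src/a/` selects both files under `src/a/`, which is what cycle detection needs in order to see more than one node.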