
Commit f6bd74e

perf: sub-100ms incremental rebuilds (466ms → 67-80ms) (#644)
* fix: db version warning, barrel export tracing, quieter tsconfig, Set compat

  9.1 — Warn on graph load when the DB was built with a different codegraph version. The check runs once per process in openReadonlyOrFail() and suggests `build --no-incremental`.
  9.2 — Barrel-only files now emit reexport edges during build. Previously the entire file was skipped in buildImportEdges; now only non-reexport imports are skipped, so `codegraph exports` can follow re-export chains.
  9.3 — Demote "Failed to parse tsconfig.json" from warn to debug level so it no longer clutters every build output.
  9.4 — Document the EXTENSIONS/IGNORE_DIRS Array→Set breaking change in CHANGELOG. Add a .toArray() convenience method and export the ArrayCompatSet type for consumers migrating from the pre-3.4 array API.

* docs: soften EXTENSIONS/IGNORE_DIRS changelog wording

* fix: address review feedback — version check, Set mutation, barrel edge duplication (#634)

  - Move the _versionWarned flag outside the mismatch conditional to avoid redundant build_meta queries when versions match.
  - Wrap SUPPORTED_EXTENSIONS in new Set() to avoid mutating the sibling module's export.
  - Delete outgoing edges for barrel-only files before re-adding them to fileSymbols during incremental builds, preventing duplicate reexport edges.

* refactor: replace vendor.d.ts with @types/better-sqlite3

  Delete the 39-LOC manual ambient type declarations for better-sqlite3 and use the community @types/better-sqlite3 package instead. The vendor file was a migration-era shim (allowJs is long gone from tsconfig).

  - Replace all BetterSqlite3.Database → BetterSqlite3Database (types.ts)
  - Replace all BetterSqlite3.Statement → SqliteStatement (types.ts)
  - Simplify constructor casts in connection.ts, branch-compare.ts, snapshot.ts (no longer needed with proper @types)
  - Clean up the watcher.ts double-cast and the info.ts @ts-expect-error
  - Widen the transaction() return type for @types compatibility

* fix: address Greptile review feedback

  - Restore warn level for tsconfig/jsconfig parse errors (P1: was incorrectly downgraded to debug; ENOENT is already guarded by existsSync before the try block)
  - Simplify the openReadonlyOrFail constructor cast to match the openDb pattern (P2)
  - Use Object.assign in withArrayCompat instead of cast-then-mutate (P2)
  - Remove unused BetterSqlite3Database import from branch-compare.ts
  - Remove stale biome-ignore suppression from snapshot.ts

* fix: preserve transaction argument types via inline inference (#640)

* perf: sub-100ms 1-file incremental rebuilds (466ms → 78-90ms)

  Four optimizations for small incremental builds (≤5 changed files):

  1. Scope barrel re-parsing to related barrels only (resolve-imports.ts). Instead of parsing ALL barrel files one-by-one (~93ms), only re-parse barrels imported by or re-exporting from changed files, batch-parsed in one call (~11ms).
  2. Fast-path structure metrics (build-structure.ts). For ≤5 changed files on large codebases (>20 files), use targeted per-file SQL queries (~2ms) instead of loading ALL definitions from the DB and recomputing ALL metrics (~35ms).
  3. Skip unnecessary finalize work (finalize.ts):
     - Skip setBuildMeta writes for ≤5 files (avoids a WAL transaction)
     - Skip drift detection for ≤3 files
     - Skip the auto-registration dynamic import for incremental builds
     - Move the timing measurement before db.close()
  4. Defer db.close() for small incremental builds (connection.ts). The WAL checkpoint in db.close() costs ~250ms on Windows NTFS; defer it to the next event loop tick so buildGraph() returns immediately. Includes flushDeferredClose() for test compatibility and auto-flush on openDb().

* perf: scope node loading and skip filesystem scan for incremental builds

  Two optimizations for small incremental builds (≤5 changed files):

  1. collectFiles: reconstruct the file list from DB file_hashes + journal deltas instead of a full recursive filesystem scan (~7ms savings)
  2. buildEdges: scope the node-loading query to only relevant files (changed files + import targets), with a lazy SQL fallback for global name lookups (~5ms savings on 6K→400 nodes)

  Combined improvement: 78-90ms → 67-80ms for a 1-file incremental build on a 473-file codebase.

* fix: align scopedLoad gate with loadNodes to prevent missing lazy fallback (#644)

  Return the scoped flag directly from loadNodes instead of re-deriving it with a different threshold, ensuring addLazyFallback is always called when the query was actually scoped.

* fix: align drift detection and metadata persistence thresholds, remove tmpdir shadow (#644)

  - Lower the setBuildMeta gate from >5 to >3 to match the drift detection gate, preventing stale baseline counts after a series of 4-5 file builds.
  - Remove the redundant dynamic import of tmpdir (already imported statically).
  - Add a comment clarifying finalizeMs metric placement.
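connection.ts itself is not part of this diff, but the deferred-close pattern the message describes can be sketched as follows. This is a minimal illustration with hypothetical names (`Closable`, `deferClose`); only `flushDeferredClose` is named in the commit message, and the real implementation wraps better-sqlite3's `Database`.

```typescript
// Sketch of the deferred-close pattern: db.close() triggers a WAL
// checkpoint (~250ms on Windows NTFS), so for small incremental builds
// the close is pushed to the next event loop tick instead of blocking
// the caller. Names other than flushDeferredClose are illustrative.
type Closable = { close(): void };

let pendingClose: Closable | null = null;

function deferClose(db: Closable): void {
  // Record the handle and schedule the real close for the next tick,
  // letting buildGraph() return immediately.
  pendingClose = db;
  setImmediate(() => flushDeferredClose());
}

function flushDeferredClose(): void {
  // Synchronously run any pending close. Tests (and openDb, per the
  // commit message) call this so the file handle is released before
  // the DB is reopened or the temp dir is cleaned up.
  if (pendingClose) {
    pendingClose.close();
    pendingClose = null;
  }
}
```

The flush is idempotent: the `setImmediate` callback that fires after an explicit `flushDeferredClose()` finds `pendingClose` already null and does nothing.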
1 parent 21d1b83 commit f6bd74e

3 files changed: 162 additions & 15 deletions


src/domain/graph/builder/stages/build-edges.ts

Lines changed: 72 additions & 7 deletions
```diff
@@ -561,17 +561,82 @@ function buildClassHierarchyEdges(
 
 // ── Main entry point ────────────────────────────────────────────────────
 
+/**
+ * For small incremental builds (≤5 changed files on a large codebase), scope
+ * the node loading query to only files that are relevant: changed files +
+ * their import targets. Falls back to loading ALL nodes for full builds or
+ * larger incremental changes.
+ */
+function loadNodes(ctx: PipelineContext): { rows: QueryNodeRow[]; scoped: boolean } {
+  const { db, fileSymbols, isFullBuild, batchResolved } = ctx;
+  const nodeKindFilter = `kind IN ('function','method','class','interface','struct','type','module','enum','trait','record','constant')`;
+
+  // Gate: only scope for small incremental on large codebases
+  if (!isFullBuild && fileSymbols.size <= 5) {
+    const existingFileCount = (
+      db.prepare("SELECT COUNT(*) as c FROM nodes WHERE kind = 'file'").get() as { c: number }
+    ).c;
+    if (existingFileCount > 20) {
+      // Collect relevant files: changed files + their import targets
+      const relevantFiles = new Set<string>(fileSymbols.keys());
+      if (batchResolved) {
+        for (const resolvedPath of batchResolved.values()) {
+          relevantFiles.add(resolvedPath);
+        }
+      }
+      // Also add barrel-only files
+      for (const barrelPath of ctx.barrelOnlyFiles) {
+        relevantFiles.add(barrelPath);
+      }
+
+      const placeholders = [...relevantFiles].map(() => '?').join(',');
+      const rows = db
+        .prepare(
+          `SELECT id, name, kind, file, line FROM nodes WHERE ${nodeKindFilter} AND file IN (${placeholders})`,
+        )
+        .all(...relevantFiles) as QueryNodeRow[];
+      return { rows, scoped: true };
+    }
+  }
+
+  const rows = db
+    .prepare(`SELECT id, name, kind, file, line FROM nodes WHERE ${nodeKindFilter}`)
+    .all() as QueryNodeRow[];
+  return { rows, scoped: false };
+}
+
+/**
+ * For scoped node loading, patch nodesByName.get with a lazy SQL fallback
+ * so global name-only lookups (resolveByMethodOrGlobal, supplementReceiverEdges)
+ * can still find nodes outside the scoped set.
+ */
+function addLazyFallback(ctx: PipelineContext, scopedLoad: boolean): void {
+  if (!scopedLoad) return;
+  const { db } = ctx;
+  const fallbackStmt = db.prepare(
+    `SELECT id, name, kind, file, line FROM nodes WHERE name = ? AND kind != 'file'`,
+  );
+  const originalGet = ctx.nodesByName.get.bind(ctx.nodesByName);
+  ctx.nodesByName.get = (name: string) => {
+    const result = originalGet(name);
+    if (result !== undefined) return result;
+    const rows = fallbackStmt.all(name) as unknown as NodeRow[];
+    if (rows.length > 0) {
+      ctx.nodesByName.set(name, rows);
+      return rows;
+    }
+    return undefined;
+  };
+}
+
 export async function buildEdges(ctx: PipelineContext): Promise<void> {
   const { db, engineName } = ctx;
 
   const getNodeIdStmt = makeGetNodeIdStmt(db);
 
-  const allNodes = db
-    .prepare(
-      `SELECT id, name, kind, file, line FROM nodes WHERE kind IN ('function','method','class','interface','struct','type','module','enum','trait','record','constant')`,
-    )
-    .all() as QueryNodeRow[];
-  setupNodeLookups(ctx, allNodes);
+  const { rows: allNodesBefore, scoped: scopedLoad } = loadNodes(ctx);
+  setupNodeLookups(ctx, allNodesBefore);
+  addLazyFallback(ctx, scopedLoad);
 
   const t0 = performance.now();
   const buildEdgesTx = db.transaction(() => {
@@ -592,7 +657,7 @@ export async function buildEdges(ctx: PipelineContext): Promise<void> {
 
   const native = engineName === 'native' ? loadNative() : null;
   if (native?.buildCallEdges) {
-    buildCallEdgesNative(ctx, getNodeIdStmt, allEdgeRows, allNodes, native);
+    buildCallEdgesNative(ctx, getNodeIdStmt, allEdgeRows, allNodesBefore, native);
   } else {
     buildCallEdgesJS(ctx, getNodeIdStmt, allEdgeRows);
   }
```
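The core of `addLazyFallback` above is a generic pattern: patch a Map's `get` so a miss falls through to a slower global lookup and memoizes the result. The sketch below isolates that pattern with a plain callback standing in for the prepared SQL statement; the types and function signature here are illustrative, not the codegraph API.

```typescript
// Generic lazy-fallback sketch: on a cache miss, consult a fallback
// lookup (in the real code, a prepared SQL statement over ALL nodes)
// and cache non-empty results so repeated misses stay cheap.
type Row = { id: number; name: string };

function addLazyFallback(
  cache: Map<string, Row[]>,
  globalLookup: (name: string) => Row[],
): void {
  const originalGet = cache.get.bind(cache);
  // Own-property assignment shadows Map.prototype.get for this instance.
  cache.get = (name: string) => {
    const hit = originalGet(name);
    if (hit !== undefined) return hit;
    const rows = globalLookup(name);
    if (rows.length > 0) {
      cache.set(name, rows); // memoize so the fallback runs at most once per name
      return rows;
    }
    return undefined;
  };
}
```

One design note visible in the diff: empty results are deliberately not cached, so a name that appears later (within the same process) can still be found, at the cost of re-querying genuinely missing names.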

src/domain/graph/builder/stages/collect-files.ts

Lines changed: 83 additions & 6 deletions
```diff
@@ -2,14 +2,78 @@
  * Stage: collectFiles
  *
  * Collects all source files to process. Handles both normal and scoped rebuilds.
+ * For incremental builds with a valid journal, reconstructs the file list from
+ * the DB's file_hashes table + journal deltas, skipping the filesystem scan.
  */
 import fs from 'node:fs';
 import path from 'node:path';
-import { info } from '../../../../infrastructure/logger.js';
+import { debug, info } from '../../../../infrastructure/logger.js';
 import { normalizePath } from '../../../../shared/constants.js';
+import { readJournal } from '../../journal.js';
 import type { PipelineContext } from '../context.js';
 import { collectFiles as collectFilesUtil } from '../helpers.js';
 
+/**
+ * Reconstruct allFiles from DB file_hashes + journal deltas.
+ * Returns null when the fast path isn't applicable (first build, no journal, etc).
+ */
+function tryFastCollect(
+  ctx: PipelineContext,
+): { files: string[]; directories: Set<string> } | null {
+  const { db, rootDir } = ctx;
+
+  // 1. Check that file_hashes table exists and has entries
+  let dbFileCount: number;
+  try {
+    dbFileCount = (db.prepare('SELECT COUNT(*) as c FROM file_hashes').get() as { c: number }).c;
+  } catch {
+    return null;
+  }
+  if (dbFileCount === 0) return null;
+
+  // 2. Read the journal — only use fast path when journal has entries,
+  // proving the watcher was active and tracking changes. An empty-but-valid
+  // journal (no watcher) could miss file deletions.
+  const journal = readJournal(rootDir);
+  if (!journal.valid) return null;
+  const hasEntries =
+    (journal.changed && journal.changed.length > 0) ||
+    (journal.removed && journal.removed.length > 0);
+  if (!hasEntries) return null;
+
+  // 3. Load existing file list from file_hashes (relative paths)
+  const dbFiles = (db.prepare('SELECT file FROM file_hashes').all() as Array<{ file: string }>).map(
+    (r) => r.file,
+  );
+
+  // 4. Apply journal deltas: remove deleted files, add new/changed files
+  const fileSet = new Set(dbFiles);
+  if (journal.removed) {
+    for (const removed of journal.removed) {
+      fileSet.delete(removed);
+    }
+  }
+  if (journal.changed) {
+    for (const changed of journal.changed) {
+      fileSet.add(changed);
+    }
+  }
+
+  // 5. Convert to absolute paths and compute directories
+  const files: string[] = [];
+  const directories = new Set<string>();
+  for (const relPath of fileSet) {
+    const absPath = path.join(rootDir, relPath);
+    files.push(absPath);
+    directories.add(path.dirname(absPath));
+  }
+
+  debug(
+    `collectFiles fast path: ${dbFiles.length} from DB, journal: +${journal.changed?.length ?? 0}/-${journal.removed?.length ?? 0} → ${files.length} files`,
+  );
+  return { files, directories };
+}
+
 export async function collectFiles(ctx: PipelineContext): Promise<void> {
   const { rootDir, config, opts } = ctx;
 
@@ -33,10 +97,23 @@ export async function collectFiles(ctx: PipelineContext): Promise<void> {
     ctx.removed = missing;
     ctx.isFullBuild = false;
     info(`Scoped rebuild: ${existing.length} files to rebuild, ${missing.length} to purge`);
-  } else {
-    const collected = collectFilesUtil(rootDir, [], config, new Set<string>());
-    ctx.allFiles = collected.files;
-    ctx.discoveredDirs = collected.directories;
-    info(`Found ${ctx.allFiles.length} files to parse`);
+    return;
   }
+
+  // Incremental fast path: reconstruct file list from DB + journal deltas
+  // instead of full recursive filesystem scan (~8ms savings on 473 files).
+  if (ctx.incremental && !ctx.forceFullRebuild) {
+    const fast = tryFastCollect(ctx);
+    if (fast) {
+      ctx.allFiles = fast.files;
+      ctx.discoveredDirs = fast.directories;
+      info(`Found ${ctx.allFiles.length} files (cached)`);
+      return;
+    }
+  }
+
+  const collected = collectFilesUtil(rootDir, [], config, new Set<string>());
+  ctx.allFiles = collected.files;
+  ctx.discoveredDirs = collected.directories;
+  info(`Found ${ctx.allFiles.length} files to parse`);
 }
```
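The delta step at the heart of `tryFastCollect` (step 4 in the diff) can be isolated as a small pure function, which also makes its deletion-before-addition ordering easy to test. The `Journal` interface here is an assumption reduced to just the two fields the diff actually reads; the real `readJournal` return type has at least a `valid` flag as well.

```typescript
// Sketch of journal delta application: start from the file set the DB
// already knows about, drop files the journal marked removed, then add
// files it marked changed (which covers newly created files too).
interface Journal {
  changed?: string[];
  removed?: string[];
}

function applyJournalDeltas(dbFiles: string[], journal: Journal): Set<string> {
  const fileSet = new Set(dbFiles);
  for (const removed of journal.removed ?? []) {
    fileSet.delete(removed);
  }
  for (const changed of journal.changed ?? []) {
    fileSet.add(changed);
  }
  return fileSet;
}
```

A file listed in both arrays ends up present, matching the diff's ordering (deletes applied before adds).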

src/domain/graph/builder/stages/finalize.ts

Lines changed: 7 additions & 2 deletions
```diff
@@ -67,7 +67,9 @@ export async function finalize(ctx: PipelineContext): Promise<void> {
   // built_at is only used by stale-embeddings check (skipped for incremental),
   // and counts are only used by drift detection (skipped for ≤3 files).
   // This avoids a transaction commit + WAL fsync (~15-30ms).
-  if (isFullBuild || allSymbols.size > 5) {
+  // Threshold aligned with drift detection gate (allSymbols.size > 3) so stored
+  // counts stay fresh whenever drift detection reads them.
+  if (isFullBuild || allSymbols.size > 3) {
     try {
       setBuildMeta(db, {
         engine: ctx.engineName,
@@ -157,6 +159,10 @@ export async function finalize(ctx: PipelineContext): Promise<void> {
     }
   }
 
+  // Intentionally measured before closeDb / writeJournalHeader / auto-registration:
+  // for the deferred-close path the close is async (setImmediate), and for full
+  // builds the metric captures finalize logic only — DB close cost is tracked
+  // separately via timing.closeDbMs when available.
   ctx.timing.finalizeMs = performance.now() - t0;
 
   // For small incremental builds, defer db.close() to the next event loop tick.
@@ -177,7 +183,6 @@ export async function finalize(ctx: PipelineContext): Promise<void> {
   // registered during the initial full build. The dynamic import + file I/O
   // costs ~100ms which dominates incremental finalize time.
   if (!opts.skipRegistry && isFullBuild) {
-    const { tmpdir } = await import('node:os');
     const tmpDir = path.resolve(tmpdir());
     const resolvedRoot = path.resolve(rootDir);
     if (resolvedRoot.startsWith(tmpDir)) {
```
