Skip to content

Commit f9f81f9

Browse files
authored
feat(cfg): bypass JS CFG visitor on native builds, fix Go for-range parity (#595)
* feat(cfg): bypass JS CFG visitor on native builds, fix Go for-range parity Add allCfgNative() fast path in buildCFGData that skips WASM parser init, tree parsing, and JS visitor when all definitions already have native CFG data from the Rust extractor — directly persists to SQLite. Fix Go for-range loop detection in Rust CFG builder: check for range_clause child node so for-range is treated as bounded (emitting branch_true + loop_exit) instead of infinite. Expand parity tests: add Go/Rust/Ruby to matrix, add complex pattern fixtures (try/catch/finally, switch, do-while, nested loops, labeled break) validating block counts, edge counts, and kinds match. * feat(cfg): bypass JS CFG visitor on native builds, fix Go for-range parity Add allCfgNative() fast path in buildCFGData that skips WASM parser init, tree parsing, and JS visitor when all definitions already have native CFG data from the Rust extractor — directly persists to SQLite. Fix Go for-range loop detection in Rust CFG builder: check for range_clause child node so for-range is treated as bounded (emitting branch_true + loop_exit) instead of infinite. Expand parity tests: add Go/Rust/Ruby to matrix, add complex pattern fixtures (try/catch/finally, switch, do-while, nested loops, labeled break) validating block counts, edge counts, and kinds match. * fix(cfg): resolve temporary value borrow error in has_child_of_kind Use a let binding for the tree cursor instead of borrowing a temporary, fixing E0716 compilation error on all platforms. * fix(cfg): remove unnecessary parentheses in allCfgNative check Biome formatter flagged the redundant grouping parens around the optional-chain expression. * fix(test): move hasGoRangeFix to beforeAll, use ctx.skip() for skipped tests - Move hasGoRangeFix computation into beforeAll where nativeResults is already populated, fixing the IIFE evaluation-order bug that always yielded false. - Replace bare return statements with ctx.skip() so Vitest reports skipped tests explicitly instead of as silent passes. - Fix inaccurate test description: complex-dowhile uses early return, not break. * fix(cfg): bind iterator result to extend cursor lifetime The iterator returned by node.children() borrows the cursor, but the cursor is dropped at the end of the statement when the result is returned directly. Binding the result to a local variable ensures the cursor outlives the iterator. * fix(cfg): align allCfgNative with initCfgParsers, clean stale CFG rows on fast path (#595) - Skip _tree files in allCfgNative to match initCfgParsers behavior, preventing false fast-path suppression in mixed-engine pipelines - Always call deleteCfgForNode in fast path before checking cfg content, ensuring stale rows are cleaned when function bodies are removed - Replace bare return guards with ctx.skip() in parity tests so Vitest reports them as skipped rather than silently passed Impact: 2 functions changed, 6 affected * fix(cfg): prevent fast path from wiping CFG in WASM-only builds (#595) When all files have _tree set (WASM-only builds), allCfgNative() vacuously returns true. The fast path then deletes all existing CFG rows without recomputing them, since _tree files have no native CFG. Guard the fast path with !symbols._tree so _tree files fall through to the slow path where getTreeAndLang uses the pre-parsed tree directly. Impact: 1 functions changed, 5 affected * refactor(cfg): simplify has_child_of_kind by returning directly (#595) Impact: 1 functions changed, 1 affected * docs(test): clarify hasGoRangeFix heuristic false-positive risk (#595) * fix(cfg): revert has_child_of_kind simplification (borrow checker) (#595) The intermediate `result` variable is required because `cursor` must outlive the iterator returned by `node.children()`. Returning directly causes a borrow-after-drop error since cursor is dropped before the temporary iterator's destructor runs. Impact: 1 functions changed, 1 affected * refactor(cfg): extract hasNativeCfgForFile shared predicate (#595) The native-CFG check was duplicated in allCfgNative, initCfgParsers, and getTreeAndLang. Extract into a single hasNativeCfgForFile helper so all three callers stay in sync automatically. Impact: 4 functions changed, 4 affected * fix(cfg): address review feedback — vacuous allCfgNative, stale CFG cleanup, doc comment (#595) - allCfgNative now returns false (not vacuous true) when fileSymbols is empty, all-_tree, or contains only non-CFG extensions. Tracks hasCfgFile sentinel to distinguish "all native" from "nothing to process". - Fast path adds !symbols._tree guard so WASM-parsed files fall through to the slow path (prevents CFG wipe on WASM-only builds). - Both fast and slow paths now unconditionally call deleteCfgForNode before the blocks-length guard, cleaning stale cfg_blocks/cfg_edges rows when a function body is removed. - Added doc comment to has_child_of_kind clarifying shallow (one-level) direct-child check semantics. Impact: 2 functions changed, 6 affected * fix(test): guard against silent pass when no defs have CFG blocks (#595) * docs(cfg): clarify parsers=null invariant for _tree files in fast path (#595) Impact: 1 functions changed, 5 affected * fix(test): guard against findFunctionNode returning null for all defs (#595) Both the parity test loop and complex patterns suite could silently pass with zero assertions if findFunctionNode returned null for every entry in defsWithCfg (e.g., due to line-number offset mismatch between native and WASM parsers). Added defsWithNode guard that calls ctx.skip() when no function nodes are found, matching the existing defsWithCfg pattern.
1 parent b1369b5 commit f9f81f9

3 files changed

Lines changed: 319 additions & 21 deletions

File tree

crates/codegraph-core/src/cfg.rs

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -803,6 +803,8 @@ impl<'a> CfgBuilder<'a> {
803803
for_stmt.child_by_field_name(field).is_some()
804804
|| for_stmt.child_by_field_name("right").is_some()
805805
|| for_stmt.child_by_field_name("value").is_some()
806+
// Go: for-range has a range_clause child (no condition/right/value fields)
807+
|| has_child_of_kind(for_stmt, "range_clause")
806808
// Explicit iterator-style node kinds across supported languages:
807809
// JS: for_in_statement, Java: enhanced_for_statement,
808810
// C#/PHP: foreach_statement, Ruby: for
@@ -1134,6 +1136,14 @@ impl<'a> CfgBuilder<'a> {
11341136

11351137
// ─── Helpers ────────────────────────────────────────────────────────────
11361138

1139+
/// Returns true if `node` has a direct child whose node kind equals `kind`.
1140+
/// This is a shallow (one-level) check — it does not recurse into grandchildren.
1141+
fn has_child_of_kind(node: &Node, kind: &str) -> bool {
1142+
let mut cursor = node.walk();
1143+
let result = node.children(&mut cursor).any(|c| c.kind() == kind);
1144+
result
1145+
}
1146+
11371147
fn node_line(node: &Node) -> u32 {
11381148
node.start_position().row as u32 + 1
11391149
}

src/features/cfg.ts

Lines changed: 74 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -84,6 +84,17 @@ interface FileSymbols {
8484
_langId?: string;
8585
}
8686

87+
/**
88+
* Check whether all function/method definitions in a single file already
89+
* have native CFG data (blocks populated by the Rust extractor).
90+
* cfg === null means no body (expected); cfg with empty blocks means not computed.
91+
*/
92+
function hasNativeCfgForFile(symbols: FileSymbols): boolean {
93+
return symbols.definitions
94+
.filter((d) => (d.kind === 'function' || d.kind === 'method') && d.line)
95+
.every((d) => d.cfg === null || (d.cfg?.blocks?.length ?? 0) > 0);
96+
}
97+
8798
async function initCfgParsers(
8899
fileSymbols: Map<string, FileSymbols>,
89100
): Promise<{ parsers: unknown; getParserFn: unknown }> {
@@ -93,10 +104,7 @@ async function initCfgParsers(
93104
if (!symbols._tree) {
94105
const ext = path.extname(relPath).toLowerCase();
95106
if (CFG_EXTENSIONS.has(ext)) {
96-
const hasNativeCfg = symbols.definitions
97-
.filter((d) => (d.kind === 'function' || d.kind === 'method') && d.line)
98-
.every((d) => d.cfg === null || (d.cfg?.blocks?.length ?? 0) > 0);
99-
if (!hasNativeCfg) {
107+
if (!hasNativeCfgForFile(symbols)) {
100108
needsFallback = true;
101109
break;
102110
}
@@ -129,11 +137,7 @@ function getTreeAndLang(
129137
let tree = symbols._tree;
130138
let langId = symbols._langId;
131139

132-
const allNative = symbols.definitions
133-
.filter((d) => (d.kind === 'function' || d.kind === 'method') && d.line)
134-
.every((d) => d.cfg === null || (d.cfg?.blocks?.length ?? 0) > 0);
135-
136-
if (!tree && !allNative) {
140+
if (!tree && !hasNativeCfgForFile(symbols)) {
137141
if (!getParserFn) return null;
138142
langId = extToLang.get(ext);
139143
if (!langId || !CFG_RULES.has(langId)) return null;
@@ -246,14 +250,43 @@ function persistCfg(
246250

247251
// ─── Build-Time: Compute CFG for Changed Files ─────────────────────────
248252

253+
/**
254+
* Check if all function/method definitions across all files already have
255+
* native CFG data (blocks array populated by the Rust extractor).
256+
* When true, the WASM parser and JS CFG visitor can be fully bypassed.
257+
*/
258+
function allCfgNative(fileSymbols: Map<string, FileSymbols>): boolean {
259+
let hasCfgFile = false;
260+
for (const [relPath, symbols] of fileSymbols) {
261+
if (symbols._tree) continue; // already parsed via WASM; will use _tree in slow path
262+
const ext = path.extname(relPath).toLowerCase();
263+
if (!CFG_EXTENSIONS.has(ext)) continue;
264+
hasCfgFile = true;
265+
266+
if (!hasNativeCfgForFile(symbols)) return false;
267+
}
268+
// Return false when no CFG files found (empty map, all _tree, or all non-CFG
269+
// extensions) to avoid vacuously triggering the fast path.
270+
return hasCfgFile;
271+
}
272+
249273
export async function buildCFGData(
250274
db: BetterSqlite3Database,
251275
fileSymbols: Map<string, FileSymbols>,
252276
rootDir: string,
253277
_engineOpts?: unknown,
254278
): Promise<void> {
279+
// Fast path: when all function/method defs already have native CFG data,
280+
// skip WASM parser init, tree parsing, and JS visitor entirely — just persist.
281+
const allNative = allCfgNative(fileSymbols);
282+
255283
const extToLang = buildExtToLangMap();
256-
const { parsers, getParserFn } = await initCfgParsers(fileSymbols);
284+
let parsers: unknown = null;
285+
let getParserFn: unknown = null;
286+
287+
if (!allNative) {
288+
({ parsers, getParserFn } = await initCfgParsers(fileSymbols));
289+
}
257290

258291
const insertBlock = db.prepare(
259292
`INSERT INTO cfg_blocks (function_node_id, block_index, block_type, start_line, end_line, label)
@@ -270,6 +303,35 @@ export async function buildCFGData(
270303
const ext = path.extname(relPath).toLowerCase();
271304
if (!CFG_EXTENSIONS.has(ext)) continue;
272305

306+
// Native fast path: skip tree/visitor setup when all CFG is pre-computed.
307+
// Only apply to files without _tree — files with _tree were WASM-parsed
308+
// and need the slow path (visitor) to compute CFG.
309+
if (allNative && !symbols._tree) {
310+
for (const def of symbols.definitions) {
311+
if (def.kind !== 'function' && def.kind !== 'method') continue;
312+
if (!def.line) continue;
313+
314+
const nodeId = getFunctionNodeId(db, def.name, relPath, def.line);
315+
if (!nodeId) continue;
316+
317+
// Always delete stale CFG rows (handles body-removed case)
318+
deleteCfgForNode(db, nodeId);
319+
if (!def.cfg?.blocks?.length) continue;
320+
321+
persistCfg(
322+
def.cfg as unknown as { blocks: CfgBuildBlock[]; edges: CfgBuildEdge[] },
323+
nodeId,
324+
insertBlock,
325+
insertEdge,
326+
);
327+
analyzed++;
328+
}
329+
continue;
330+
}
331+
332+
// When allNative=true, parsers/getParserFn are null. This is safe because
333+
// _tree files use symbols._tree directly in getTreeAndLang (the parser
334+
// code path is never reached). Non-_tree files are handled by the fast path above.
273335
const treeLang = getTreeAndLang(symbols, relPath, rootDir, extToLang, parsers, getParserFn);
274336
if (!treeLang) continue;
275337
const { tree, langId } = treeLang;
@@ -302,9 +364,10 @@ export async function buildCFGData(
302364
if (r) cfg = { blocks: r.blocks, edges: r.edges };
303365
}
304366

367+
// Always purge stale rows (handles body-removed case)
368+
deleteCfgForNode(db, nodeId);
305369
if (!cfg || cfg.blocks.length === 0) continue;
306370

307-
deleteCfgForNode(db, nodeId);
308371
persistCfg(cfg, nodeId, insertBlock, insertEdge);
309372
analyzed++;
310373
}

0 commit comments

Comments
 (0)