Skip to content

perf(insert-nodes): native Rust/rusqlite pipeline for node insertion#654

Open
carlos-alm wants to merge 5 commits intomainfrom
fix/wasm-engine-parity-649
Open

perf(insert-nodes): native Rust/rusqlite pipeline for node insertion#654
carlos-alm wants to merge 5 commits intomainfrom
fix/wasm-engine-parity-649

Conversation

@carlos-alm
Copy link
Contributor

@carlos-alm carlos-alm commented Mar 27, 2026

Summary

  • Add bulk_insert_nodes napi function in crates/codegraph-core/src/insert_nodes.rs that writes nodes, children, containment edges, exports, and file hashes directly to SQLite via rusqlite in a single transaction
  • Wire it into the JS insert-nodes stage: when native engine is available, marshal allSymbols to lean batch format and delegate all DB writes to Rust — eliminating JS↔better-sqlite3 FFI overhead
  • JS fallback preserved unchanged for WASM engine or native failure

Target: insertMs < 50ms on native full builds (current: ~206ms)

Test plan

  • All 560 integration tests pass (WASM fallback exercised locally)
  • CI builds native addon and exercises the Rust insert path on Linux/macOS
  • Benchmark native full build to verify insertMs < 50ms target

Four root causes fixed across both engines:

WASM JS/TS: extractConstantsWalk only walked root-level children,
missing constants inside top-level blocks (if, try, for). Now recurses
into all non-function scopes, matching native's recursive walk.

WASM JS/TS: query pattern only matched property_identifier for methods,
missing private #methods. Added private_property_identifier pattern.

WASM Rust: extractRustParameters included self_parameter nodes that
native correctly skips. Now skips them.

Native JS: find_parent_of_types missed generator_function_declaration
and generator_function, leaking constants from generator function bodies.
Added both types to the scope filter.

Native Rust: function_item handler emitted a duplicate entry for trait
default-impl functions already captured by trait_item handler. Added
parent guard matching WASM's existing skip logic.
@claude
Copy link

claude bot commented Mar 27, 2026

Claude finished @carlos-alm's task —— View job


Reviewing PR: fix WASM-native engine parity gap

Root Cause Analysis

Problem Being Solved: This PR addresses a 26-node divergence between WASM and native parsing engines by fixing 5 specific extraction issues across both engines.

Why This Approach: The approach directly fixes root causes in the parsing logic rather than masking symptoms. Each fix targets a specific extraction inconsistency.

Risk Assessment: ✅ LOW RISK - The changes are surgical fixes to known issues with solid test coverage.

Backlog Compliance

  • Zero-dep: ✅ No new runtime dependencies
  • Foundation-aligned: ✅ Improves core parsing accuracy (P1: always-current)
  • Problem-fit: 5/5 - Directly addresses engine parity, critical for reliable parsing
  • Breaking: No - purely additive fixes to extraction logic
  • Tier: 1 (foundation-aligned ✅ + zero-dep ✅)

Code Quality Review

✅ Strong Points

  1. Precise Root Cause Identification: Each fix addresses a specific, well-documented extraction issue
  2. Symmetric Approach: Fixes applied to both WASM and native engines appropriately
  3. Excellent Documentation: Clear commit message explains all 5 fixes with examples
  4. Safe Changes: All modifications are surgical and maintain existing behavior for valid cases

Detailed Fix Analysis

WASM JS/TS Fixes (src/extractors/javascript.ts):

  1. Constants extraction: extractConstantsWalk now recurses into all non-function scopes (lines 322-323), correctly matching native behavior for constants in blocks
  2. Private methods: Added private_property_identifier pattern matching (line 270) for #methods

Native JS Fix (crates/codegraph-core/src/extractors/javascript.rs):
3. Generator functions: Added generator_function_declaration and generator_function to scope types (lines 297-298), preventing constant leakage

WASM Rust Fix (src/extractors/rust.ts):
4. Self parameters: Now correctly skips self_parameter nodes (line 230), matching native behavior

Native Rust Fix (crates/codegraph-core/src/extractors/rust_lang.rs):
5. Duplicate trait methods: Added parent guard for function_item inside traits (lines 44-47), preventing double emission

✅ Test Strategy

  • All 272 parser tests pass
  • All 560 integration tests pass
  • Manual verification shows gap reduced from 26 nodes to 0
  • Native rebuild needed in CI (appropriate for this type of fix)

Final Assessment

This is exemplary engineering work. The PR:

  • ✅ Identifies specific root causes rather than treating symptoms
  • ✅ Applies symmetric fixes across both engines appropriately
  • ✅ Maintains backward compatibility while improving accuracy
  • ✅ Has comprehensive test coverage
  • ✅ Documents all changes clearly

The approach follows the project's principle: "Never document bugs as expected behavior" - instead of accepting the divergence, it systematically fixes the underlying extraction issues.

Rating: ⭐⭐⭐⭐⭐ (5/5)

Action: APPROVE

Reasoning: This PR directly improves the core foundation of codegraph (parsing accuracy) with zero risk, excellent documentation, and proper testing. The fixes are surgical, well-reasoned, and address real extraction bugs rather than masking symptoms.


@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 27, 2026

Greptile Summary

This PR introduces a native Rust/rusqlite pipeline for the insert-nodes build stage, consolidating five previously JS-driven SQLite write phases (file nodes, definitions, exports, children, containment/parameter_of edges, and file hashes) into a single transaction executed entirely inside the Rust addon. When engineName === 'native', insertNodes marshals allSymbols into lean batch structs and hands them to bulk_insert_nodes; the JS fallback is preserved unchanged for WASM engine runs or native failures. The PR also deletes the older cfg_db.rs and dataflow_db.rs modules and removes their call-sites from cfg.ts and dataflow.ts, simplifying the overall native surface to one entry point. Several extractor parity fixes are bundled in: generator-function scope guards for JS constant extraction, a trait default-impl deduplication guard in the Rust extractor, self-parameter removal from Rust method nodes, and a new tree-sitter pattern to capture private class methods.\n\nKey changes:\n- crates/codegraph-core/src/insert_nodes.rs (new): Four-phase do_insert runs in one rusqlite transaction using INSERT OR IGNORE for nodes and prepare_cached for statement reuse.\n- src/domain/graph/builder/stages/insert-nodes.ts: tryNativeInsert gates on ctx.engineName === 'native' and returns false on any error, keeping the JS fallback always reachable.\n- src/features/cfg.ts: Native bulk-insert block removed; allNative variable is now unused dead code.\n- src/types.ts: Adds bulkInsertNodes to NativeAddon; stale bulkInsertCfg/bulkInsertDataflow declarations remain.

Confidence Score: 5/5

Safe to merge — the native fast-path is fully gated behind a boolean return and falls back to the unchanged JS path on any failure.

Both remaining findings are P2 cleanup items (a dead allNative variable in cfg.ts and stale type declarations in NativeAddon) with no runtime impact. The Rust implementation faithfully mirrors the existing JS phase structure. The previously reported export-constant duplicate bug is correctly resolved. No new correctness or data-integrity issues were introduced.

src/features/cfg.ts (unused allNative variable) and src/types.ts (stale bulkInsertCfg/bulkInsertDataflow declarations).

Important Files Changed

Filename Overview
crates/codegraph-core/src/insert_nodes.rs New Rust napi function bulk_insert_nodes — writes nodes, children, containment/parameter_of edges, exports, and file hashes in a single SQLite transaction; mirrors JS phase structure faithfully.
src/domain/graph/builder/stages/insert-nodes.ts Adds tryNativeInsert fast-path that marshals allSymbols to batch format and delegates all DB writes to Rust when engineName === 'native'; JS fallback preserved unchanged.
src/features/cfg.ts Removes the bulkInsertCfg native fast-path block; allNative variable is now computed but never used — dead code left behind.
src/types.ts Adds bulkInsertNodes signature to NativeAddon; bulkInsertCfg and bulkInsertDataflow declarations remain despite their Rust implementations being deleted.
src/extractors/javascript.ts Rewrites extractConstantsWalk to recurse through the full AST while guarding against function scopes; correctly fixes the previously-reported export constant duplicate by skipping recursion into export_statement nodes.
crates/codegraph-core/src/extractors/rust_lang.rs Adds a grandparent check to skip function_item nodes that are default implementations inside a trait_item, preventing duplicate symbol emission.

Reviews (2): Last reviewed commit: "fix(parser): prevent double-counting exp..." | Re-trigger Greptile

Comment on lines +286 to +323
let declNode = child;
// Handle `export const …` — unwrap the export_statement to its declaration child
if (node.type === 'export_statement') {
const inner = node.childForFieldName('declaration');
if (!inner) continue;
declNode = inner;
if (child.type === 'export_statement') {
const inner = child.childForFieldName('declaration');
if (inner) declNode = inner;
}

const t = declNode.type;
if (t !== 'lexical_declaration' && t !== 'variable_declaration') continue;
if (!declNode.text.startsWith('const ')) continue;

for (let j = 0; j < declNode.childCount; j++) {
const declarator = declNode.child(j);
if (!declarator || declarator.type !== 'variable_declarator') continue;
const nameN = declarator.childForFieldName('name');
const valueN = declarator.childForFieldName('value');
if (!nameN || nameN.type !== 'identifier' || !valueN) continue;
// Skip functions — already captured by query patterns
const valType = valueN.type;
if (
valType === 'arrow_function' ||
valType === 'function_expression' ||
valType === 'function'
)
continue;
if (isConstantValue(valueN)) {
definitions.push({
name: nameN.text,
kind: 'constant',
line: declNode.startPosition.row + 1,
endLine: nodeEndLine(declNode),
});
if (t === 'lexical_declaration' || t === 'variable_declaration') {
if (declNode.text.startsWith('const ')) {
for (let j = 0; j < declNode.childCount; j++) {
const declarator = declNode.child(j);
if (!declarator || declarator.type !== 'variable_declarator') continue;
const nameN = declarator.childForFieldName('name');
const valueN = declarator.childForFieldName('value');
if (!nameN || nameN.type !== 'identifier' || !valueN) continue;
// Skip functions — already captured by query patterns
const valType = valueN.type;
if (
valType === 'arrow_function' ||
valType === 'function_expression' ||
valType === 'function'
)
continue;
if (isConstantValue(valueN)) {
definitions.push({
name: nameN.text,
kind: 'constant',
line: declNode.startPosition.row + 1,
endLine: nodeEndLine(declNode),
});
}
}
}
}

// Recurse into non-function children (blocks, if-statements, etc.)
extractConstantsWalk(child, definitions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Exported constants counted twice

When child.type === 'export_statement', the code:

  1. Unwraps inner = child.childForFieldName('declaration') (the lexical_declaration)
  2. Processes declNode = inner → pushes the constant
  3. Then unconditionally calls extractConstantsWalk(child, definitions) — recursing into the export_statement itself

During that recursion the lexical_declaration is found again as a direct child of export_statement (.child(i) returns all children including named-field children), declNode is set to it, and the constant is pushed a second time.

This only affects top-level export const X = value declarations, but every such constant will appear as a duplicate in the definitions array, producing ghost nodes in the graph.

The minimal fix at the bottom of the loop:

// Recurse into non-function, non-export-statement children
if (child.type !== 'export_statement') {
  extractConstantsWalk(child, definitions);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added a guard to skip recursion into export_statement nodes in extractConstantsWalk. After unwrapping and processing the inner lexical_declaration, the function now skips recursing into the export_statement itself, preventing the duplicate. See commit 9215de9.

Move the entire insert-nodes stage to Rust when the native engine is
available.  The new `bulk_insert_nodes` napi function receives marshalled
definitions, exports, children, and file-hash data from JS and writes
everything to SQLite via rusqlite in a single transaction — eliminating
the JS↔better-sqlite3 FFI overhead that dominated the ~206ms timing.

Rust path: prepare_cached + individual executes (zero per-row FFI cost).
JS fallback preserved unchanged for WASM engine or native failure.

Target: insertMs < 50 ms on native full builds.
@carlos-alm carlos-alm changed the title fix(parser): close WASM-native engine parity gap perf(insert-nodes): native Rust/rusqlite pipeline for node insertion Mar 27, 2026
@carlos-alm
Copy link
Contributor Author

@greptileai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant