Skip to content

refactor: remove dead WASM call-AST extraction and pre-3.2 edge shim#686

Merged
carlos-alm merged 13 commits intomainfrom
refactor/remove-call-ast-wasm
Mar 30, 2026
Merged

refactor: remove dead WASM call-AST extraction and pre-3.2 edge shim#686
carlos-alm merged 13 commits intomainfrom
refactor/remove-call-ast-wasm

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Remove call AST node extraction from WASM visitor — call sites are already captured in FileSymbols.calls during extraction and stored as graph edges. The call entries in ast_nodes were written but never queried by any DB query, feature, or command. Deletes extractCallName, walkSubtree, walkCallArguments, findChildByKind, collectNode, and the call_expression entry from JS/TS astTypes. (~100 lines)
  • Remove pre-3.2.0 receiver edge supplement shimsupplementReceiverEdges was a JS fallback for native binaries older than 3.2.0 that didn't emit receiver edges. Current version is 3.5.0; all published binaries handle receivers natively. (~70 lines)

Net: −170 lines of dead code.

Test plan

  • Lint passes (biome check)
  • AST parity tests pass (7/7) — native engine unaffected
  • Integration AST tests pass (19/19)
  • CI full test suite (WASM grammar tests require npm install to build grammars)

…sitor

Call sites are already captured in FileSymbols.calls during extraction
and stored as graph edges — the separate `call` entries in ast_nodes
were written but never queried. This removes ~100 lines of call-specific
subtree walking logic (walkSubtree, walkCallArguments, collectNode,
extractCallName) and the call_expression entry from JS/TS astTypes.
The supplementReceiverEdges fallback was added for native binaries
older than 3.2.0 that didn't emit receiver or type-resolved method-call
edges. Current version is 3.5.0 — all published binaries handle
receivers natively. Removes ~70 lines of dead compat code.
@claude
Copy link
Copy Markdown

claude bot commented Mar 30, 2026

Claude finished @carlos-alm's task —— View job


Root Cause Analysis

Problem Being Solved: Removing genuinely dead code - WASM call AST extraction that duplicated already-captured call site data, and a compatibility shim for native binaries older than 3.2.0 (current is 3.5.0).

Why This Approach: Call sites are already captured in FileSymbols.calls during extraction and stored as graph edges in the edges table. The separate call entries in ast_nodes were written but never queried by any feature or command. The receiver edge supplement was needed for pre-3.2.0 native binaries that didn't emit receiver edges natively.

Risk Assessment: Low risk. This is pure dead code removal with no behavioral changes. The functionality remains available through existing pathways (call edges, native receiver handling).

Backlog Compliance

  • Zero-dep: ✓ — No new dependencies
  • Foundation-aligned: ✓ — Pure cleanup, improves maintainability
  • Problem-fit: 3 — Code cleanup that improves maintainability but doesn't directly address core AI navigation problems
  • Breaking: No — No public API or behavior changes
  • Tier: 1

Technical Analysis

Phase 1: Root Cause Verification

  • Real problem: Confirmed genuine dead code through codebase analysis
  • No grep hits for removed functions (extractCallName, walkSubtree, walkCallArguments, findChildByKind, collectNode, supplementReceiverEdges)
  • No database queries for ast_nodes with kind='call' - all call-related queries use edges table with kind='calls'

Phase 2: Configuration Changes

  • No configuration changes detected - pure code deletion

Phase 3: Test Integrity

  • Tests remain robust: ast.test.ts line 147 validates call node querying still works
  • AST parity tests in ast-parity.test.ts verify native engine call extraction unaffected
  • No test modifications that weaken assertions

Phase 4: Code Quality

  • Clean refactoring that removes ~170 lines of dead code
  • Call functionality preserved through FileSymbols.callsedges table pathway
  • WASM visitor simplified by removing redundant call-specific tree walking
  • Native receiver handling eliminates need for compatibility shim

Phase 5: Architecture Compliance

  • Maintains the dual-engine architecture (native/WASM) properly
  • Call AST nodes still work for supported scenarios through native engine path
  • No impact on existing APIs or command functionality

Critical Concerns

None identified. This is a clean dead-code removal.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent code cleanup that removes 170 lines of genuinely dead code without any functional regression. Claims are well-verified, no configuration relaxation, tests remain strong. The only reason it's not 5/5 is that it's "just" cleanup rather than new value-added functionality.

This is exactly the kind of refactoring that keeps codebases healthy - removing code that has outlived its purpose without breaking anything.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR removes two pieces of dead code (−170 lines) and wires up three new native bulk-insert fast paths for complexity, CFG, and dataflow.

Dead-code removals:

  • call_expression → 'call' dropped from astTypes; extractCallName, walkSubtree, walkCallArguments, findChildByKind, collectNode, and the matched Set all removed from ast-store-visitor.ts. Both WASM and native buildAstNodes paths now filter any residual 'call' kind entries to handle the transition period before the next Rust release strips them from the native extractor.
  • supplementReceiverEdges and NormalizedTypeEntry removed from build-edges.ts; the pre-3.2.0 receiver-edge shim is gone and buildReceiverEdge's type signature is narrowed accordingly.

New native bulk-insert paths (complexity, CFG, dataflow):

  • crates/codegraph-core/src/native_db.rs adds ComplexityRow, CfgEntry/CfgBlockRow/CfgEdgeRow, and DataflowEdge napi structs plus bulk_insert_complexity, bulk_insert_cfg, and bulk_insert_dataflow methods using unchecked_transaction + commit.
  • src/types.ts adds matching optional method signatures to NativeDatabase.
  • Each feature module checks for the napi method and short-circuits the expensive WASM parser init when all pre-computed native data is present.

Minor findings (P2):

  • In complexity.ts, a partial native bulk-insert failure falls back to the JS WASM path which re-inserts all rows with INSERT OR REPLACE, potentially replacing already-written native values with WASM-computed ones.
  • In dataflow.ts, resolveNode's global name fallback uses ORDER BY file, line LIMIT 10 and takes [0], which deterministically but arbitrarily resolves to the lexicographically earliest file when multiple functions share the same name.

Confidence Score: 5/5

Safe to merge — dead-code removals are clean and new fast paths degrade gracefully to the existing JS fallback.

All findings are P2. One is a theoretical mixed-engine inconsistency only on an unusual partial-failure path; the other is a name-resolution tiebreaker that may already be intentional. Neither affects correctness on the happy path. Dead-code removals are well-scoped, test suite is updated, and the 'call' filter handles the native binary transition period.

src/features/complexity.ts (partial-insert fallback), src/features/dataflow.ts (global name resolution)

Important Files Changed

Filename Overview
src/features/complexity.ts Adds native bulk-insert fast path; partial-failure fallback to JS WASM path may overwrite native-computed values.
src/features/dataflow.ts Adds native bulk-insert fast path; global name resolution picks first alphabetical file match, which may be wrong on name collisions.
src/ast-analysis/visitors/ast-store-visitor.ts Removes ~100 lines of dead call-AST extraction code; enterNode simplified to inline remaining node handling.
src/domain/graph/builder/stages/build-edges.ts Removes supplementReceiverEdges and NormalizedTypeEntry; buildReceiverEdge type signature narrowed cleanly.
src/features/cfg.ts Adds native CFG bulk-insert fast path using deleteCfgForNode (BetterSqlite3) then bulkInsertCfg (napi-rs) sequentially.
crates/codegraph-core/src/native_db.rs Adds ComplexityRow, CfgEntry/Block/Edge, DataflowEdge napi structs and bulk_insert_* methods with correct transaction semantics.
src/features/ast.ts Removes symbols.calls → ast_nodes write path; both paths filter residual 'call' kind entries for the native binary transition period.
src/types.ts Adds optional bulkInsertComplexity, bulkInsertCfg, bulkInsertDataflow to NativeDatabase interface.
tests/parsers/ast-nodes.test.ts Test updated to assert zero 'call' kind AST nodes, correctly tracking the new no-call-AST contract.
src/ast-analysis/rules/javascript.ts Removes call_expression: 'call' from astTypes — one-line deletion completing WASM visitor dead-type removal.
README.md Adds native Rust pipeline phase table; purely documentation.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Build Pipeline] --> B{Engine?}
    B -->|native| C[NativeDatabase napi-rs]
    B -->|wasm / auto fallback| D[WASM / JS path]

    C --> E{bulkInsertComplexity?}
    E -->|yes, all defs have native complexity| F[Native bulk insert\nINSERT OR REPLACE]
    E -->|no / partial failure| G[JS WASM fallback\nre-parse all files]
    F -->|inserted == rows.length| H[return]
    F -->|inserted < rows.length| G

    C --> I{bulkInsertCfg + allNative?}
    I -->|yes| J[deleteCfgForNode via BetterSqlite3\nthen bulkInsertCfg via napi-rs]
    I -->|no| K[JS WASM CFG visitor]
    J --> L[return]

    C --> M{bulkInsertDataflow?}
    M -->|yes, all files have symbols.dataflow| N[resolveNode local to global\nbulkInsertDataflow]
    M -->|needsJsFallback = true| O[JS WASM dataflow visitor]
    N --> P[return]

    D --> Q[ast-store-visitor\nnew / throw / await / string / regex only\ncall_expression REMOVED]
Loading

Reviews (3): Last reviewed commit: "fix(ast): filter native call AST nodes b..." | Re-trigger Greptile

…dataflow

Add three new NativeDatabase methods (bulk_insert_complexity,
bulk_insert_cfg, bulk_insert_dataflow) that write pre-computed Rust
analysis results directly to SQLite via rusqlite, bypassing
better-sqlite3 JS round-trips.

Wire native fast paths in complexity.ts, cfg.ts, and dataflow.ts that
try the rusqlite bulk insert first, falling back to better-sqlite3 when
native data is unavailable.

Update README with a table explaining what Rust handles on the native
path end-to-end.
block_stmt.execute() returns Result<usize>, not Result<()>
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

Extract shared patterns from 9 language extractors into 4 reusable
helpers in helpers.ts, reducing per-language boilerplate by ~30 lines:

- findParentNode: replaces 6 findParent*/findCurrentImpl functions
- extractBodyMembers: replaces 5 body-iteration patterns for enums/structs
- stripQuotes: replaces inline .replace(/"/g,'') across 3 extractors
- lastPathSegment: replaces inline .split('.').pop() across 6 extractors

Net: +77 helper lines, -159 extractor lines = -82 lines total.
The matched Set was a deduplication guard left over from the removed
manual walkSubtree logic. The DFS framework visits each node exactly
once, so the guard never fires. Removing it eliminates a no-op
allocation.
The WASM visitor no longer extracts call AST nodes (removed in this PR),
but the native engine and the symbols.calls fallback path still inserted
them. This broke the build-parity test. Remove call kind insertion from
both the native bulk-insert filter and the WASM symbols.calls path.
Call AST nodes were dead — never queried by any feature or command.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed Greptile's feedback and fixed CI failures:

  1. Removed dead matched Set from ast-store-visitor.ts — the deduplication guard was left over from the removed walkSubtree logic and never fires since the DFS framework visits each node exactly once.

  2. Fixed build-parity test failure — the PR removed call AST extraction from the WASM visitor but left two other insertion paths intact:

    • Native symbols.astNodes still contained call entries from Rust extractors
    • WASM symbols.calls fallback still created call AST rows

    Both paths now filter out / skip call kind, restoring engine parity. Call AST nodes were dead (never queried by any feature or command).

  3. Merged origin/main to pick up the null visibility crash fix and sequence dataflow annotation fix from fix(ci): resolve visibility null crash and sequence dataflow annotation #693.

  4. Filed follow-up issue refactor(native): remove call kind from Rust AST node extraction #701 to remove call_types from the Rust extractors in a future native binary release (the JS-side filter handles parity until then).

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

The engine.ts check `!Array.isArray(symbols.astNodes)` skips the WASM
visitor when native provides astNodes. If native only produced call
entries (now dead), the array was non-empty but contained only filtered-
out kinds, causing the WASM visitor to be skipped and no AST nodes to
be stored. Filter call entries early so the WASM visitor runs when no
non-call entries remain.
…ST nodes

Move the call-kind filter from setupVisitors (which only runs for files
with _tree) to the top of runAnalyses so it applies to all files
including native-parsed ones. Add AST needs to ensureWasmTreesIfNeeded
so WASM trees are created when native astNodes were cleared. Also remove
unused nativeProvidedAstNodes variable and fix formatting.
@carlos-alm carlos-alm merged commit 4c143e7 into main Mar 30, 2026
19 checks passed
@carlos-alm carlos-alm deleted the refactor/remove-call-ast-wasm branch March 30, 2026 09:42
@github-actions github-actions bot locked and limited conversation to collaborators Mar 30, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant