perf(build): native Rust/rusqlite for roles & edge insertion (6.12)#658
perf(build): native Rust/rusqlite for roles & edge insertion (6.12)#658carlos-alm wants to merge 8 commits intomainfrom
Conversation
…nsertion (6.12) Roles: move classifyNodeRolesFull/Incremental SQL + classification logic to Rust (roles_db.rs). Single rusqlite connection runs fan-in/fan-out queries, computes medians, classifies roles, and batch-updates nodes — eliminates ~10 JS<->SQLite round-trips. Edges: add bulk_insert_edges (edges_db.rs) that writes computed edges directly to SQLite via rusqlite instead of marshaling back to JS. Restructure buildEdges to run edge computation in better-sqlite3 transaction, then native insert outside to avoid connection contention. 1-file regression fix: skip native call-edge path for small incremental builds (≤3 files) where napi-rs marshaling overhead exceeds savings. Both paths fall back gracefully to JS when native is unavailable.
|
Claude finished @carlos-alm's task —— View job PR Review: Native Rust/rusqlite for roles & edge insertionReview Progress
Phase 1: Root Cause Analysis ✅Problem Being Solved: Performance bottleneck in role classification and edge insertion operations where ~10 JS<->SQLite round-trips and marshaling overhead were creating latency in incremental builds. Why This Approach: Moving compute-heavy SQLite operations to native Rust eliminates marshaling overhead and allows for batch operations within single database connections. The approach includes graceful fallback to JS and regression handling for small incremental builds. Risk Assessment: Low risk - maintains dual-engine architecture philosophy, includes comprehensive fallback mechanisms, and follows existing patterns. Phase 2: Configuration Change Detection ✅No Configuration Changes: No changes to biome.json, linting rules, TypeScript config, build config, or test timeouts. Only additive changes to source code. Phase 3: Test Integrity ✅Test Coverage: Existing role classification tests remain intact (roles.test.ts with 30+ test cases). Integration tests cover incremental builds and edge parity. No test assertions were weakened or removed. Regression Testing: PR includes manual testing claims but relies on existing comprehensive test suite for validation. Phase 4: Code Quality Analysis ✅Rust Code Quality:
TypeScript Integration:
Minor Observations:
Phase 5: Backlog Compliance Assessment ✅Zero-dep: ✅ - No new runtime dependencies, builds on existing rusqlite/napi-rs foundation Foundation-aligned: ✅ - Fully aligned with dual-engine architecture (P3), maintains local-first principle (P1), and preserves JS API compatibility (P5) Problem-fit: 4/5 - Directly addresses AI agent efficiency by reducing build latency that blocks rapid iteration cycles. Performance improvements in core graph operations benefit all downstream analysis. Breaking: No - Purely additive with graceful fallback. Existing APIs unchanged. Tier: 1 (Zero-dep ✅ + Foundation-aligned ✅) Root Cause AnalysisProblem Being Solved: JS<->SQLite marshaling overhead causing 10+ round-trips in role classification and edge insertion, creating latency bottlenecks in incremental builds. Backlog Compliance
Critical ConcernsNone. Code follows security best practices, maintains type safety, and includes appropriate performance optimizations. Final Recommendation
|
Greptile SummaryThis PR moves role classification and bulk edge insertion from JS/better-sqlite3 into native Rust/rusqlite, eliminating ~10 JS↔SQLite round-trips per build. The two new Rust modules ( Key changes:
One P1 finding: the native-fails→JS-fallback edge-insertion path is non-transactional; wrapping it in Confidence Score: 4/5Safe to merge after addressing the non-transactional fallback in build-edges.ts; all other paths are correct and the Rust implementations match the JS reference faithfully. One P1 finding: the batchInsertEdges fallback after a native insert failure runs outside any transaction, so a mid-batch SQLite error would leave a partial edge set durably committed with no recovery signal. This is a one-line fix. The Rust implementations are solid — SQL queries match the JS originals, the ENTRY_PATH_PATTERNS regression from prior review is confirmed fixed, and the unused-import issue is also confirmed resolved. src/domain/graph/builder/stages/build-edges.ts — the fallback batchInsertEdges call at the end of buildEdges needs to be wrapped in a transaction. Important Files Changed
Sequence DiagramsequenceDiagram
participant JS as build-edges.ts
participant BSQ as better-sqlite3 tx
participant NAT as rusqlite (bulk_insert_edges)
participant DB as SQLite DB
JS->>BSQ: computeEdgesTx() — barrel deletion + edge computation
BSQ->>DB: DELETE barrel edges
BSQ->>BSQ: buildImportEdges / buildCallEdges → allEdgeRows[]
alt JS path (no native bulkInsertEdges)
BSQ->>DB: batchInsertEdges (inside tx, atomic)
end
BSQ-->>JS: tx committed
alt native?.bulkInsertEdges && allEdgeRows.length > 0
JS->>NAT: native.bulkInsertEdges(db.name, edges)
NAT->>DB: BEGIN / INSERT OR IGNORE chunks / COMMIT
alt returns true
NAT-->>JS: ok=true ✓
else returns false
NAT-->>JS: ok=false
JS->>DB: batchInsertEdges (⚠ no tx — non-atomic)
end
end
|
| stmt.raw_bind_parameter(1, *role)?; | ||
| for (i, id) in chunk.iter().enumerate() { | ||
| stmt.raw_bind_parameter(i as i32 + 2, *id)?; | ||
| } | ||
| stmt.raw_execute()?; | ||
| } | ||
| } | ||
| Ok(()) | ||
| } | ||
|
|
||
| // ── Full classification ────────────────────────────────────────────── | ||
|
|
||
| fn do_classify_full(conn: &mut Connection) -> rusqlite::Result<RoleSummary> { | ||
| let tx = conn.transaction()?; | ||
| let mut summary = RoleSummary::default(); | ||
|
|
||
| // 1. Leaf kinds → dead-leaf (skip expensive fan-in/fan-out JOINs) | ||
| let leaf_ids: Vec<i64> = { |
There was a problem hiding this comment.
dead-entry detection diverges from JS implementation
In the JS classifyDeadSubRole, dead-entry is detected via file-path patterns (ENTRY_PATH_PATTERNS):
const ENTRY_PATH_PATTERNS: readonly RegExp[] = [
/cli[/\\]commands[/\\]/,
/mcp[/\\]/,
/routes?[/\\]/,
/handlers?[/\\]/,
/middleware[/\\]/,
];
if (ENTRY_PATH_PATTERNS.some((p) => p.test(node.file!))) return 'dead-entry';The Rust implementation instead reuses FRAMEWORK_ENTRY_PREFIXES — name prefixes ("route:", "event:", "command:"):
if FRAMEWORK_ENTRY_PREFIXES.iter().any(|p| name.starts_with(p)) {
return "dead-entry";
}This branch is effectively dead code: any node whose name starts with route:, event:, or command: is already classified as "entry" in the outer classify_node check and will never reach classify_dead_sub_role. More critically, nodes inside mcp/, cli/commands/, routes/, handlers/, or middleware/ directories with zero fan-in will receive dead-entry from the JS path but dead-unresolved from the native path, causing silent role divergence between the two code paths.
The fix is to port the file-path-pattern check from the JS implementation:
const ENTRY_PATH_PATTERNS: &[&str] = &[
"cli/commands/", "cli\\commands\\",
"mcp/", "mcp\\",
"routes/", "route/",
"handlers/", "handler/",
"middleware/",
];
if ENTRY_PATH_PATTERNS.iter().any(|p| file.contains(p)) {
return "dead-entry";
}There was a problem hiding this comment.
Fixed in 54d66a5. Ported the file-path-pattern logic from JS ENTRY_PATH_PATTERNS into Rust as a const ENTRY_PATH_PATTERNS array with both forward-slash and backslash variants. The classify_dead_sub_role function now checks file.contains(pattern) instead of name.starts_with(prefix), matching the JS implementation exactly. The old name-prefix check was indeed dead code as noted.
| fan_out: u32, | ||
| is_exported: bool, | ||
| production_fan_in: u32, | ||
| median_fan_in: u32, |
There was a problem hiding this comment.
Unused
params import (also in edges_db.rs)
params is imported from rusqlite but never used — the code uses raw_bind_parameter throughout both new files. This will produce a compiler warning unused import: 'params' in both roles_db.rs (line 121) and edges_db.rs (line 14).
| median_fan_in: u32, | |
| use rusqlite::{Connection, OpenFlags}; |
Same fix applies to crates/codegraph-core/src/edges_db.rs line 14.
There was a problem hiding this comment.
Already fixed in prior commit 0849b36 — the params import was removed from both roles_db.rs and edges_db.rs. The current code only imports {Connection, OpenFlags} from rusqlite in both files.
…sify_dead_sub_role (#658)
|
Addressed all Greptile review feedback:
|
…th (#658) The comment claimed barrel-edge deletion and re-insertion were atomic, but with the native rusqlite path the insertion happens in Phase 2 on a separate connection. Updated the comment to accurately describe the atomicity guarantee: JS path is fully atomic; native path has a transient gap that self-heals on next incremental rebuild.
|
Addressed both new findings from the re-review: P1 — barrel-edge crash-atomicity comment: Fixed in e8c848f. The old comment incorrectly claimed deletion and re-creation were atomic across both code paths. Updated the Phase 1 comment to accurately describe the tradeoff: the JS path keeps deletion + insertion atomic within a single transaction, while the native rusqlite path has a transient gap between Phase 1 commit and Phase 2 insert. A crash in that window would leave barrel edges missing until the next incremental rebuild re-creates them. The failure mode is bounded and self-healing, matching your assessment. P2 — |
|
Claude finished @carlos-alm's task —— View job ✅ PR Review: Native Rust/rusqlite for roles & edge insertion (v2)SummaryThis is a well-executed performance optimization that ports the two most expensive build operations (role classification and edge insertion) from JS/better-sqlite3 to native Rust/rusqlite. The implementation follows the codebase's dual-engine philosophy with proper fallback mechanisms and includes thoughtful performance optimizations. Todo List
🔍 Code Quality AssessmentRust Implementation (
|
…aram safety (#658) 200 rows × 5 params = 1000 bind parameters, which exceeds the legacy SQLITE_MAX_VARIABLE_NUMBER default of 999. While bundled SQLite 3.43+ raises the limit, reducing to 199 (995 params) removes the risk for any SQLite build with the old default.
#658) The native edge insert fallback path was silent, making it hard to diagnose when the native path fails. Added a debug() call so the fallback is visible in verbose/debug output.
|
Addressed the P2 suggestions from the re-review:
|
Summary
roles_db.rs): movesclassifyNodeRolesFullandclassifyNodeRolesIncrementalto Rust/rusqlite — fan-in/fan-out queries, median computation, role classification, and batch UPDATE all happen in a single Rust function, eliminating ~10 JS<->SQLite round-tripsedges_db.rs): writes computed call/receiver/extends/implements edges directly to SQLite via rusqlite chunked multi-value INSERT, bypassing the JSbatchInsertEdgespathTarget:
rolesMs < 15ms,edgesMs < 30mson native full builds.Affected files
crates/codegraph-core/src/roles_db.rscrates/codegraph-core/src/edges_db.rscrates/codegraph-core/src/lib.rssrc/types.tsclassifyRolesFull,classifyRolesIncremental,bulkInsertEdgesto NativeAddonsrc/domain/graph/builder/stages/build-structure.tssrc/domain/graph/builder/stages/build-edges.tsTest plan