perf(db): migrate build pipeline writes to NativeDatabase (6.15)#669
perf(db): migrate build pipeline writes to NativeDatabase (6.15)#669carlos-alm merged 4 commits intomainfrom
Conversation
…fecycle (6.13) Foundation for moving all DB operations to rusqlite on the native engine path. Creates a persistent rusqlite::Connection holder exposed to JS, handling schema migrations and build metadata KV — eliminating redundant per-call connection open/close in the native build pipeline.
…connection (6.15) Consolidate all build-pipeline write operations into NativeDatabase methods that reuse a single persistent rusqlite connection, eliminating the per-call connection open/close overhead from the standalone functions. Rust: Refactor standalone modules (insert_nodes, edges_db, ast_db, roles_db) to expose pub(crate) functions accepting &Connection, then add wrapper methods on NativeDatabase: bulkInsertNodes, bulkInsertEdges, bulkInsertAstNodes, classifyRolesFull, classifyRolesIncremental, and purgeFilesData. Standalone #[napi] functions preserved for backward compatibility. TypeScript: Wire all build stages (insert-nodes, build-edges, build-structure, detect-changes, ast) to prefer ctx.nativeDb methods when available, falling back to standalone native functions then JS. Thread nativeDb through EngineOpts for analysis-phase AST insertion.
|
Claude finished @carlos-alm's task —— View job Root Cause AnalysisProblem Being Solved: This PR addresses a real performance bottleneck in the build pipeline. Each write operation (nodes, edges, AST nodes, roles) in phases 6.9-6.12 was creating a new SQLite connection, introducing per-call connection overhead that slows down builds. Why This Approach: The solution migrates write operations to use Risk Assessment: Low risk. This is an internal optimization that doesn't change external APIs or behavior, only improves performance by reducing connection overhead. Backlog Compliance
Configuration Change Detection✅ No HIGH-RISK changes detected
Test Integrity Assessment✅ Test integrity maintained
Code Quality ReviewRust Layer (
|
Greptile SummaryThis PR migrates all build-pipeline write operations (nodes, edges, AST nodes, roles, purge) to reuse Key changes:
Confidence Score: 5/5Safe to merge — all remaining findings are P2 observability/style concerns that do not affect correctness or data integrity. The architecture is sound:
Important Files Changed
Sequence DiagramsequenceDiagram
participant TS as Pipeline Stage (TS)
participant NDB as NativeDatabase<br/>(persistent conn)
participant NAT as Standalone Native fn<br/>(new conn per call)
participant JS as JS / better-sqlite3
Note over TS,JS: Tier 1 — NativeDatabase (Phase 6.15)
TS->>NDB: bulk_insert_nodes / bulk_insert_edges<br/>bulk_insert_ast_nodes / classify_roles_*<br/>purge_files_data
alt success
NDB-->>TS: true / count / RoleSummary
else rusqlite error
NDB-->>TS: false / 0 / null
Note over TS,NAT: Tier 2 — Standalone native (Phase 6.12)
TS->>NAT: bulkInsertNodes(dbPath, …)
alt success
NAT-->>TS: true / count / RoleSummary
else error
NAT-->>TS: false / 0 / null
Note over TS,JS: Tier 3 — JS fallback
TS->>JS: batchInsertEdges / classifyNodeRoles / …
JS-->>TS: result
end
end
Reviews (2): Last reviewed commit: "fix: rename do_insert to do_insert_nodes..." | Re-trigger Greptile |
|
|
||
| fn do_insert( | ||
| conn: &mut Connection, | ||
| pub(crate) fn do_insert( |
There was a problem hiding this comment.
Inconsistent naming of internal
pub(crate) function
The parallel functions in the sibling modules were renamed to include the entity type:
edges_db.rs→do_insert_edgesast_db.rs→do_insert_ast_nodes
But this function remains do_insert, which is ambiguous when referenced in native_db.rs as insert_nodes::do_insert. Consider renaming to do_insert_nodes for consistency with the other modules.
| pub(crate) fn do_insert( | |
| pub(crate) fn do_insert_nodes( |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed — renamed do_insert to do_insert_nodes in insert_nodes.rs for consistency with do_insert_edges and do_insert_ast_nodes in sibling modules. Updated both call sites (standalone #[napi] export and NativeDatabase::bulk_insert_nodes).
| /// Bulk-insert edge rows using chunked multi-value INSERT statements. | ||
| /// Returns `true` on success, `false` on failure. |
There was a problem hiding this comment.
Errors silently swallowed in bulk insert methods
Both bulk_insert_nodes and bulk_insert_edges convert Err results to Ok(false) via .is_ok(), permanently discarding the rusqlite error. The TypeScript caller falls back to the JS path on false, which handles correctness, but debugging a silent failure (e.g. a schema mismatch or WAL lock timeout) becomes very difficult.
Consider at minimum logging the error before dropping it:
Ok(insert_nodes::do_insert(conn, &batches, &file_hashes, &removed_files)
.inspect_err(|e| eprintln!("[NativeDatabase] bulk_insert_nodes failed: {e}"))
.is_ok())The same pattern applies to bulk_insert_edges at the corresponding line.
There was a problem hiding this comment.
Fixed — added .inspect_err(|e| eprintln!(...)) before .is_ok() in both bulk_insert_nodes and bulk_insert_edges. Errors will now be logged to stderr before being converted to false, making silent failures diagnosable.
| const filesToPurge = [...ctx.removed, ...changePaths]; | ||
| // Prefer NativeDatabase persistent connection for purge (6.15) | ||
| if (ctx.nativeDb?.purgeFilesData) { | ||
| ctx.nativeDb.purgeFilesData(filesToPurge, false); | ||
| } else { | ||
| purgeFilesFromGraph(db, filesToPurge, { purgeHashes: false }); | ||
| } |
There was a problem hiding this comment.
Split atomicity between
purgeFilesData and reverse-dep edge deletions
purgeFilesData commits its own transaction via the persistent NativeDatabase connection, while the immediately following reverse-dependency edge purge (lines 337–341) still runs through the better-sqlite3 db handle:
const deleteOutgoingEdgesForFile = db.prepare(
'DELETE FROM edges WHERE source_id IN (SELECT id FROM nodes WHERE file = ?)',
);These are two sequential writes to the same SQLite file through two different connections. If the process is interrupted between the purgeFilesData commit and the deleteOutgoingEdgesForFile.run() calls, the database will contain dangling edges pointing to already-purged nodes. This atomicity gap was pre-existing (the old purgeFilesFromGraph was also separate from the reverse-dep loop), so this isn't a regression, but it's worth noting as a future hardening opportunity — the reverse-dep edge deletion could be folded into purgeFilesData to give true atomicity.
There was a problem hiding this comment.
Agreed this is a pre-existing atomicity gap. Created tracking issue #670 to fold the reverse-dep edge deletion into purgeFilesData for true single-transaction atomicity in a follow-up.
|
Addressed all review feedback: Greptile (3 comments):
Merge conflicts: Resolved 3 files ( |
Summary
NativeDatabase's persistent rusqlite connection, eliminating per-call connection overhead from phases 6.9–6.12insert_nodes,edges_db,ast_db,roles_db) to exposepub(crate)functions accepting&Connectionwithunchecked_transaction(), enabling zero-duplication reuse by both standalone#[napi]exports andNativeDatabasemethodsctx.nativeDbthroughEngineOptsand all pipeline stages with a 3-tier fallback chain: persistentNativeDatabase→ standalone native function → JS/better-sqlite3Changes by layer
Rust (
crates/codegraph-core/):native_db.rs— 6 new#[napi]methods:bulk_insert_nodes,bulk_insert_edges,bulk_insert_ast_nodes,classify_roles_full,classify_roles_incremental,purge_files_datainsert_nodes.rs,edges_db.rs,ast_db.rs,roles_db.rs— refactored topub(crate)+&Connection+unchecked_transaction()TypeScript (
src/):types.ts— extendedNativeDatabaseinterface with 6 new methods; addednativeDbtoEngineOptspipeline.ts— threadsctx.nativeDbintoengineOptsinsert-nodes.ts,build-edges.ts,build-structure.ts,detect-changes.ts,ast.ts— each stage now prefersctx.nativeDbmethods over standalone native callsTest plan
#[napi]method signatures.nodebinary with new methodscodegraph buildwith--engine nativeconfirms persistent connection path