chore: add benchmark npm script and stale embeddings warning #604
carlos-alm merged 5 commits into main
Add `npm run benchmark` script to make benchmark execution discoverable instead of requiring manual `node --import ./scripts/ts-resolve-loader.js` invocation. Warn users when embeddings predate the last graph rebuild so they know to re-run `codegraph embed` for fresh search results. Impact: 1 functions changed, 8 affected
Claude finished @carlos-alm's task

PR Review Complete

Phase 1: Root Cause Analysis ✓
Problem Being Solved: This PR addresses two developer experience gaps:
Why This Approach: Both solutions are straightforward and appropriate:
Risk Assessment: Low risk - purely additive changes with no behavioral modifications to existing functionality

Phase 2: Configuration Change Detection ✓
Analysis: No high-risk configuration changes detected:

Phase 3: Test Integrity Assessment ✓
Analysis: No test integrity concerns:

Phase 4: Code Quality Review ✓
Code Analysis:
package.json:46 - The benchmark script addition is clean and follows existing patterns: "benchmark": "node --experimental-strip-types --import ./scripts/ts-resolve-loader.js scripts/benchmark.ts"
✅ Matches the existing Node.js flag usage pattern from other scripts
src/domain/graph/builder/stages/finalize.ts:86-112 - Stale embeddings warning implementation:
No issues identified - the implementation is well-structured and follows the established codebase patterns.

Phase 5: Backlog Compliance Rating ✓
Root Cause Analysis
Problem Being Solved: Two developer experience improvements - discoverable benchmark execution and stale embeddings awareness
Backlog Compliance
Critical Concerns
None identified. This is a clean, low-risk developer experience improvement.
Final Recommendation
Greptile Summary
This PR makes two targeted improvements: it adds a benchmark npm script and a stale embeddings warning. The finalize changes correctly address both concerns raised in the prior review cycle:
One note on the warning semantics:
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant Finalize
participant DB
User->>Finalize: codegraph build
Finalize->>Finalize: capture buildNow timestamp
Finalize->>DB: getBuildMeta — reads previous build counts
Finalize->>Finalize: incremental drift detection
Finalize->>DB: setBuildMeta with current built_at
Note over Finalize,DB: Moved early so downstream<br/>stale check sees current build time
Finalize->>DB: count orphaned embeddings
alt orphaned count is positive
Finalize->>User: warn orphaned embeddings
end
Finalize->>DB: query embedding_meta built_at
alt embedTime exists and precedes buildNow
Finalize->>User: warn stale embeddings
end
Finalize->>DB: closeDb
Finalize->>User: build complete
Reviews (2): Last reviewed commit: "Merge remote-tracking branch 'origin/cho..."
if (embedTime < now && !Number.isNaN(embedTime)) {
  const prevBuildAt = getBuildMeta(db, 'built_at');
  if (prevBuildAt) {
    const prevBuildTime = new Date(prevBuildAt).getTime();
    if (embedTime < prevBuildTime) {
      warn(
        'Embeddings were built before the last graph rebuild. Run "codegraph embed" to update.',
      );
    }
  }
}
Stale warning fires one build cycle too late
The comparison uses prevBuildAt — which is the previous build's built_at, because setBuildMeta (line 141) hasn't run yet for the current build. This means the warning fires one complete rebuild cycle after embeddings actually become stale.
Concrete example:
1. `codegraph build` → `built_at` = T0 stored in metadata
2. `codegraph embed` → `embedding_meta.built_at` = T1 (T1 > T0)
3. `codegraph build` (first rebuild after embed) → `prevBuildAt` = T0, `embedTime` = T1, `T1 < T0` is false → no warning (but embeddings ARE now stale relative to this rebuild)
4. `codegraph build` (second rebuild) → `prevBuildAt` = T2 > T1 → warning fires ✓
This also explains why the test plan step — "run codegraph embed, then codegraph build --no-incremental, confirm warning appears" — would not produce a warning on the first try unless there was already a completed build after the embed in the stored metadata.
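The timeline above can be replayed in a small simulation. This is only a sketch: `Meta`, `staleVsPrevious`, `staleVsCurrent`, and `replay` are hypothetical names, the timestamps are plain integers standing in for T0, T1, and so on, and none of it is taken from the actual `finalize.ts`.

```typescript
// Hypothetical model of the two stored timestamps (not the real schema).
interface Meta {
  builtAt: number | null; // graph metadata built_at
  embedBuiltAt: number | null; // embedding_meta built_at
}

// Check as originally written: compare against the PREVIOUS build's built_at.
function staleVsPrevious(meta: Meta): boolean {
  if (meta.embedBuiltAt === null || meta.builtAt === null) return false;
  return meta.embedBuiltAt < meta.builtAt;
}

// Alternative: compare against the CURRENT build's timestamp.
function staleVsCurrent(meta: Meta, buildNow: number): boolean {
  if (meta.embedBuiltAt === null) return false;
  return meta.embedBuiltAt < buildNow;
}

// Replay: build at t=0, embed at t=1, rebuild at t=2, rebuild at t=3.
function replay(): { previous: boolean[]; current: boolean[] } {
  const meta: Meta = { builtAt: null, embedBuiltAt: null };
  const previous: boolean[] = [];
  const current: boolean[] = [];
  const build = (t: number): void => {
    // Both checks run before built_at is updated, as in the reviewed code.
    previous.push(staleVsPrevious(meta));
    current.push(staleVsCurrent(meta, t));
    meta.builtAt = t;
  };
  build(0);
  meta.embedBuiltAt = 1;
  build(2);
  build(3);
  return { previous, current };
}

const result = replay();
console.log(JSON.stringify(result.previous)); // [false,false,true]
console.log(JSON.stringify(result.current)); // [false,true,true]
```

In this model the previous-build comparison stays silent on the first rebuild after the embed and only warns on the second, while the current-build comparison warns on the first.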
The fix is to compare embedTime against now (the current build's effective timestamp) rather than prevBuildAt. The now variable is already available but currently only used as a guard that is always true:
if (embedBuiltAt) {
  const embedTime = new Date(embedBuiltAt).getTime();
  if (!Number.isNaN(embedTime)) {
    const now = Date.now();
    if (embedTime < now) {
      warn(
        'Embeddings were built before the last graph rebuild. Run "codegraph embed" to update.',
      );
    }
  }
}

Note: with this approach, prevBuildAt is no longer needed. If the concern is false-positives (always warning), consider storing a build-start timestamp in ctx and comparing against that instead, or move the setBuildMeta call for built_at to the top of finalize so prevBuildAt reflects the current build.
Fixed in 4157d7b. Moved setBuildMeta before the stale-embeddings check so built_at in the DB reflects the current build timestamp when the comparison runs. This eliminates the off-by-one: on the first rebuild after an embed, the warning now fires immediately.
Additionally captured a single buildNow = new Date() at the top of finalize and use it for both the built_at metadata write and the stale comparison, ensuring consistency.
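The ordering described in this reply might look roughly like the sketch below. It is an assumption-laden illustration, not the code from 4157d7b: `MetaStore` and `finalizeSketch` are hypothetical, in-memory maps stand in for the database, and warnings are returned as strings rather than passed to `warn`.

```typescript
// In-memory stand-ins for the metadata tables (assumption: the real code uses a DB).
type MetaStore = Map<string, string>;

// Hypothetical helper: returns warning messages instead of calling warn().
function finalizeSketch(
  buildMeta: MetaStore,
  embeddingMeta: MetaStore,
  buildNow: Date = new Date(),
): string[] {
  const warnings: string[] = [];

  // 1. Record the current build time first, so later checks see this build.
  buildMeta.set("built_at", buildNow.toISOString());

  // 2. Compare the embedding timestamp against the same buildNow value.
  const embedBuiltAt = embeddingMeta.get("built_at");
  if (embedBuiltAt) {
    const embedTime = new Date(embedBuiltAt).getTime();
    // Validate before comparing; an unparseable date string yields NaN.
    if (!Number.isNaN(embedTime) && embedTime < buildNow.getTime()) {
      warnings.push(
        'Embeddings were built before the last graph rebuild. Run "codegraph embed" to update.',
      );
    }
  }
  return warnings;
}

// First rebuild after an embed: the warning is produced immediately.
const buildMeta: MetaStore = new Map();
const embeddingMeta: MetaStore = new Map([["built_at", "2024-01-01T00:00:00.000Z"]]);
const firstRebuild = finalizeSketch(
  buildMeta,
  embeddingMeta,
  new Date("2024-01-02T00:00:00.000Z"),
);
console.log(firstRebuild.length); // 1
```

Using one `buildNow` value for both the metadata write and the comparison keeps the two from drifting apart if the function takes measurable time to run.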
const embedTime = new Date(embedBuiltAt).getTime();
const now = Date.now();
if (embedTime < now && !Number.isNaN(embedTime)) {
Redundant guard and inverted NaN-check order
embedTime < now is effectively always true for any real timestamp (it only fails for future-dated or NaN values), so it adds no filtering. Additionally, the !Number.isNaN(embedTime) guard should precede the comparison rather than follow it — evaluating NaN < now is safe (it returns false) but the intent reads incorrectly.
    - const embedTime = new Date(embedBuiltAt).getTime();
    - const now = Date.now();
    - if (embedTime < now && !Number.isNaN(embedTime)) {
    + const embedTime = new Date(embedBuiltAt).getTime();
    + if (!Number.isNaN(embedTime)) {
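The NaN behaviour mentioned in this comment can be checked in isolation. The sketch below is illustrative only; `parseTimestamp` and `isOlderThan` are hypothetical helpers that do not appear in the PR.

```typescript
// Parse a stored timestamp; returns null when the string is not a valid date.
function parseTimestamp(value: string): number | null {
  const ms = new Date(value).getTime();
  return Number.isNaN(ms) ? null : ms;
}

// Hypothetical helper: true only when the timestamp is valid AND precedes referenceMs.
// The validity check comes first, so the intent reads in the same order it executes.
function isOlderThan(value: string, referenceMs: number): boolean {
  const ms = parseTimestamp(value);
  return ms !== null && ms < referenceMs;
}

const reference = Date.parse("2024-06-01T00:00:00.000Z");
console.log(Number.NaN < reference); // false: any ordered comparison with NaN is false
console.log(isOlderThan("not-a-date", reference)); // false
console.log(isOlderThan("2024-01-01T00:00:00.000Z", reference)); // true
console.log(isOlderThan("2030-01-01T00:00:00.000Z", reference)); // false
```

Either guard order gives the same result at runtime, since `NaN < x` is always false; putting validation first mainly makes the condition easier to read.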
Fixed in 4157d7b. The NaN check now precedes the comparison, and the redundant `embedTime < now` guard is replaced with a direct comparison against `buildNow` (the current build's wall-clock timestamp captured at the top of finalize). The old always-true `embedTime < now` pattern is gone.
- Move setBuildMeta before stale-embeddings check so built_at reflects the current build, fixing the off-by-one where the warning fired one rebuild cycle late
- Fix NaN-check order: validate embedTime before comparing
- Remove redundant embedTime < now guard (always true for real timestamps)
- Use single buildNow timestamp for both metadata and comparison

Impact: 1 functions changed, 8 affected
…' into fix/review-pr-604 Impact: 16 functions changed, 52 affected
Summary
- Add `npm run benchmark` script to `package.json`: makes benchmark execution discoverable instead of requiring manual `node --experimental-strip-types --import ./scripts/ts-resolve-loader.js scripts/benchmark.ts` invocation
- Warn users when embeddings predate the last graph rebuild so they know to re-run `codegraph embed` for fresh search results

Test plan
- `npm run build` passes
- `npx biome check` passes on modified files
- `npm run benchmark` launches correctly
- Run `codegraph embed`, then `codegraph build --no-incremental`, confirm warning appears