Fix relaunch re-OCR: 1 ms tolerance on mtime fingerprint check#5
Merged
Conversation
Verified in production against a 4,756-row index: ~1,785 files were being re-OCR'd on every relaunch because the fingerprint guard used exact `Date == Date`. After the round-trip `Date → timeIntervalSince1970 → SQLite REAL → Date(timeIntervalSince1970:)` a Double ULP at 2026 timestamps (~400 ns at 1.78e9 s) drifts the value below APFS's ns-resolution mtime, so a random subset of rows stopped matching themselves. The existing unit test for `fingerprint(for:)` already compared with `accuracy: 0.001`, which is the tolerance production needed all along. Route both Indexer call sites through a new `ScreenshotStore.isAlreadyIndexed(at:mtime:size:)` that applies a 1 ms mtime tolerance and an exact size match. Size still catches real edits (which almost always change byte count), so the slack on mtime doesn't mask genuine changes. No schema change; existing rows benefit immediately on next launch. Added a regression test that upserts a high-precision fractional mtime (`Date(timeIntervalSince1970: 1_776_864_366.2835145)`) and asserts `isAlreadyIndexed` returns `true` for the same `(mtime, size)` and `false` when size differs. Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: GitButler <gitbutler@gitbutler.com>
There was a problem hiding this comment.
Pull request overview
Fixes unnecessary re-OCR on relaunch by making the “already indexed” fingerprint check resilient to floating-point drift in mtimes after a SQLite REAL round-trip.
Changes:
- Added
ScreenshotStore.isAlreadyIndexed(at:mtime:size:)that compares mtime with ~1ms tolerance and size exactly. - Updated
Indexerto use the new helper at both fingerprint-check call sites. - Added a regression test covering high-precision fractional mtimes and size mismatch behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
Sources/VistaCore/ScreenshotStore.swift |
Introduces tolerance-based “already indexed” check to avoid exact Date equality pitfalls. |
Sources/VistaCore/Indexer.swift |
Routes incremental-scan and single-file skip logic through isAlreadyIndexed. |
Tests/VistaCoreTests/ScreenshotStoreTests.swift |
Adds regression coverage for mtime precision drift across SQLite round-trips. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Four bot-review follow-ups on the mtime tolerance fix: - Extract the 0.001 s literal into `ScreenshotStore.mtimeTolerance` so the value lives in one place and tests can reference it instead of re-declaring a magic number. - Switch the comparison from `drift < 0.001` to `drift <= Self.mtimeTolerance` so the doc's "1 ms tolerance" and the code agree on inclusive equality at exactly 1 ms. - Rewrite the "below APFS's ns-resolution mtime" phrase in both the `ScreenshotStore` doc comment and the regression test comment: a ~400 ns Double ULP is *coarser* than APFS's 1 ns resolution, which is what makes some timestamps unrepresentable as Double — the previous wording read backwards. - Added a drift-outside-tolerance assertion to the regression test using `mtimeTolerance * 10` as the epsilon (well beyond the ~400 ns round-trip slop, so deterministic). Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: GitButler <gitbutler@gitbutler.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
2972 already indexed, 1785 new file(s) to OCRon startup, despite no real changes to those files. Root cause is exactDate == Dateequality on mtime after a SQLite round-trip — a Double ULP at year-2026 timestamps (~400 ns) falls below APFS's nanosecond mtime resolution, so a random subset of rows stopped matching themselves bit-for-bit.ScreenshotStore.isAlreadyIndexed(at:mtime:size:)that compares mtime with 1 ms tolerance and size exactly. Size still catches genuine edits (byte count changes almost always), so the slack on mtime doesn't mask real changes. No schema change; the existing DB benefits immediately on next launch.Date(timeIntervalSince1970: 1_776_864_366.2835145)) and assertsisAlreadyIndexedreturnstruefor the same(mtime, size),falsewhen size differs. This is the exact shape of a filesystem mtime that was breaking equality in production.testFingerprintReturnsLastKnowncompares mtime withaccuracy: 0.001— the test tolerated what the indexer rejected.Test plan
swift test --parallel— 36 tests green including new regression4756 already indexed, 0 new file(s) to OCR(not2972 already indexed, 1785 new)N already indexed, 1 new)