tests: broaden fixture corpus quality coverage by gaelic-ghost · Pull Request #17 · gaelic-ghost/SwiftlyFetch

gaelic-ghost · 2026-05-02T21:27:11Z

Summary

add synthetic near-miss and longer-body records to the Gutenberg fixture corpus
cover focused-vs-scattered all-term ranking and longer-body snippet selection in in-memory FetchKit tests
add SearchKit parity coverage for the same fixture behavior
reward tighter all-term evidence in the in-memory ranker so focused passages beat scattered matches
document why the synthetic fixture records exist
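
One way to picture "tighter all-term evidence" is as the smallest character span that contains at least one occurrence of every query term. The sketch below is an illustration under that assumption, not the ranker shipped in this PR; `smallestCoveringSpan` and its input shape (one array of offsets per term) are hypothetical names introduced here.

```swift
/// Hypothetical helper: given the match offsets for each query term,
/// return the width of the narrowest window that contains at least one
/// occurrence of every term, or nil if any term has no occurrences.
func smallestCoveringSpan(_ locationsByTerm: [[Int]]) -> Int? {
    guard !locationsByTerm.isEmpty,
          locationsByTerm.allSatisfy({ !$0.isEmpty }) else { return nil }

    // Flatten to (offset, term) pairs and sort by offset.
    var hits: [(offset: Int, term: Int)] = []
    for (termIndex, offsets) in locationsByTerm.enumerated() {
        for offset in offsets {
            hits.append((offset: offset, term: termIndex))
        }
    }
    hits.sort { $0.offset < $1.offset }

    // Sliding window over the sorted hits, tracking how many distinct
    // terms the window currently covers.
    var counts = [Int](repeating: 0, count: locationsByTerm.count)
    var covered = 0
    var left = 0
    var best: Int? = nil

    for right in hits.indices {
        if counts[hits[right].term] == 0 { covered += 1 }
        counts[hits[right].term] += 1

        // Shrink from the left while every term is still covered.
        while covered == locationsByTerm.count {
            let span = hits[right].offset - hits[left].offset
            if best == nil || span < best! { best = span }
            counts[hits[left].term] -= 1
            if counts[hits[left].term] == 0 { covered -= 1 }
            left += 1
        }
    }
    return best
}
```

Under this assumption, a focused passage yields a small span and a scattered match yields a large one, so a ranker could add a bonus that shrinks as the span grows. How the actual in-memory ranker weights that evidence is not shown in this PR description.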

Verification

swift test --filter FixtureCorpusQualityTests
swift test --filter SearchKitFetchIndexTests/testSearchKitFetchIndexMatchesFixtureCorpusNearMissAndLongBodyBehavior
swift test
scripts/repo-maintenance/validate-all.sh

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0305735ba0


chatgpt-codex-connector · 2026-05-02T21:29:48Z

+
+        while searchStart < lowercaseText.endIndex,
+              let range = lowercaseText.range(of: term, range: searchStart..<lowercaseText.endIndex) {
+            locations.append(lowercaseText.distance(from: lowercaseText.startIndex, to: range.lowerBound))


Avoid quadratic index-distance scans in term location loop

The new compactness scorer makes every .allTerms query walk each matching document body via termLocations, and this loop computes distance(from: startIndex, to:) for every hit. On long texts with frequent terms (for example, a common word appearing thousands of times), those repeated distance calculations accumulate to roughly O(n²) work per term, which can make in-memory search latency spike substantially compared with the previous constant-time scoring path. Converting the search text to a random-access representation once (or tracking offsets incrementally) avoids this regression.
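
The "tracking offsets incrementally" option could look roughly like the sketch below. It is a minimal illustration, not the PR's code: the function signature, the empty-term guard, and the choice to resume searching at `range.upperBound` (non-overlapping matches) are assumptions, since the quoted diff does not show the rest of the loop. The idea is to measure only the distance from the previous hit to the current one, so the total `distance` work across all hits is bounded by a single pass over the text.

```swift
import Foundation

/// Hypothetical variant of the term-location loop that keeps a running
/// character offset instead of measuring from startIndex on every hit.
func termLocations(of term: String, in lowercaseText: String) -> [Int] {
    guard !term.isEmpty else { return [] }

    var locations: [Int] = []
    var searchStart = lowercaseText.startIndex
    var lastIndex = lowercaseText.startIndex  // index of the previous hit
    var lastOffset = 0                        // its character offset

    while searchStart < lowercaseText.endIndex,
          let range = lowercaseText.range(of: term, range: searchStart..<lowercaseText.endIndex) {
        // Guard against a zero-width match so the loop always advances.
        if range.isEmpty { break }

        // Only walk the gap between the previous hit and this one.
        lastOffset += lowercaseText.distance(from: lastIndex, to: range.lowerBound)
        lastIndex = range.lowerBound
        locations.append(lastOffset)

        // Assumption: matches are non-overlapping.
        searchStart = range.upperBound
    }
    return locations
}
```

For example, `termLocations(of: "ab", in: "abxxab")` yields `[0, 4]`. The alternative the review mentions, converting the text once to a random-access representation such as an array of characters, trades extra memory for constant-time offset arithmetic.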


tests: broaden fixture corpus quality coverage

0305735

gaelic-ghost added the tests label May 2, 2026

gaelic-ghost merged commit 532579a into main May 2, 2026
1 check passed

gaelic-ghost deleted the tests/broader-corpus-quality branch May 2, 2026 21:28

chatgpt-codex-connector Bot reviewed May 2, 2026

gaelic-ghost merged 1 commit into main from tests/broader-corpus-quality
