
Improve performance and readability of recall and precision measurements#647

Open
tlwillke wants to merge 1 commit into main from accuracy-metrics

Conversation

@tlwillke (Collaborator) commented Mar 19, 2026

Summary

This PR optimizes AccuracyMetrics to improve the performance and readability of recall and precision measurements. The primary focus is shifting from $O(N^2)$ list scanning to $O(N)$ Set-based lookups and reducing object-allocation overhead by operating directly on raw NodeScore[] arrays.

Key changes

Logic cleanup and "dead code" removal

  • Unused Signature Removal: Removed the redundant topKCorrect(List, List, ...) signature, consolidating logic into a single private method that operates directly on SearchResult.
  • Branch Elimination: Removed the unreachable if (gtView.size() > retrieved.size()) branch. Due to existing guard clauses (kGT <= kRetrieved and kRetrieved <= retrieved.size()), this condition was mathematically impossible.
  • Parity Preservation: Preserved specific, argument-focused exception messaging to ensure clear feedback for invalid k parameters.
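
The unreachable-branch reasoning can be sketched as follows. The method name `checkArgs` and the exception messages are hypothetical reconstructions, not the PR's actual code:

```java
public class GuardSketch {

    // Once both guards pass, gtView.size() (== kGT) <= kRetrieved <= retrieved.size(),
    // so the removed branch `gtView.size() > retrieved.size()` can never be taken.
    static void checkArgs(int kGT, int kRetrieved, int retrievedSize) {
        if (kGT > kRetrieved) {
            throw new IllegalArgumentException("kGT must be <= kRetrieved");
        }
        if (kRetrieved > retrievedSize) {
            throw new IllegalArgumentException("kRetrieved must be <= number of retrieved results");
        }
    }

    public static void main(String[] args) {
        checkArgs(10, 100, 1000); // valid parameters: no exception thrown
    }
}
```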

Performance optimizations

  • Algorithmic Shift: Replaced $O(N)$ List.contains() calls with $O(1)$ HashSet.contains(). This moves the core intersection logic from quadratic to linear time complexity.
  • AP Nested Scan Removal: In averagePrecisionAtK, replaced the $O(i)$ subList(0, i).contains(p) duplicate check with a HashSet lookup. This transforms the AP calculation from an $O(K^2)$ operation to $O(K)$.
  • Allocation Reduction: Eliminated intermediate boxing and ArrayList creation in the SearchResult path. The logic now iterates directly over the NodeScore[] array, significantly reducing GC pressure during large benchmark runs.
  • Loop Efficiency: Replaced Stream API calls with manual for loops in high-frequency paths to avoid the object overhead and "setup tax" of the Stream API.
  • HashSet Pre-Sizing: Sets are now initialized with explicit capacities ($K / 0.75$) to prevent expensive internal rehashing.
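
The Set-based intersection and pre-sizing described above can be sketched as follows. `NodeScore[]` is simplified here to a plain `int[]` of node IDs, and `recallAtK` is a hypothetical helper, not the PR's actual method:

```java
import java.util.HashSet;
import java.util.Set;

public class RecallSketch {

    // Recall@K: fraction of the top-kGT ground-truth IDs that appear in the
    // top-kRetrieved results. Ground truth is loaded into a pre-sized HashSet
    // so each membership check is O(1) instead of an O(N) List.contains().
    static double recallAtK(int[] groundTruth, int[] retrieved, int kGT, int kRetrieved) {
        if (kGT > kRetrieved) {
            throw new IllegalArgumentException("kGT must be <= kRetrieved");
        }
        // Pre-size to kGT / 0.75 (+1) to avoid internal rehashing as the set fills.
        Set<Integer> gtSet = new HashSet<>((int) (kGT / 0.75f) + 1);
        for (int i = 0; i < kGT; i++) {
            gtSet.add(groundTruth[i]);
        }
        int hits = 0;
        for (int i = 0; i < kRetrieved; i++) { // manual loop, no Stream setup overhead
            if (gtSet.contains(retrieved[i])) {
                hits++;
            }
        }
        return (double) hits / kGT;
    }

    public static void main(String[] args) {
        int[] gt = {1, 2, 3, 4, 5};
        int[] results = {1, 9, 3, 8, 5, 7, 2, 6, 0, 4};
        System.out.println(recallAtK(gt, results, 5, 10)); // prints 1.0
    }
}
```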

Test coverage

  • Functional Validation: Verified Recall, AP, and MAP calculations against hand-calculated results to ensure mathematical parity with standard IR definitions.
  • Duplicate Handling: Confirmed that duplicate IDs in search results are handled via the "seen" Set, preventing artificial score inflation while keeping lookups at $O(1)$.
  • Exception Parity: Explicitly verified that IllegalArgumentException messages remain 1:1 identical to the original implementation, so downstream error-parsing does not break.
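
A minimal sketch of the duplicate-safe AP@K loop described above, using one common IR normalization (dividing by min(K, |GT|)); the names and the choice of normalization are assumptions, not lifted from the PR:

```java
import java.util.HashSet;
import java.util.Set;

public class ApSketch {

    // AP@K with O(1) duplicate detection: a "seen" set replaces the old
    // O(i) subList(0, i).contains(p) scan, making the loop O(K) overall.
    // Assumes k <= retrieved.length.
    static double averagePrecisionAtK(int[] groundTruth, int[] retrieved, int k) {
        Set<Integer> gtSet = new HashSet<>((int) (groundTruth.length / 0.75f) + 1);
        for (int id : groundTruth) {
            gtSet.add(id);
        }
        Set<Integer> seen = new HashSet<>((int) (k / 0.75f) + 1);
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            int id = retrieved[i];
            if (!seen.add(id)) {
                continue; // duplicate result: skip so it cannot inflate the score
            }
            if (gtSet.contains(id)) {
                hits++;
                sum += (double) hits / (i + 1); // precision at this rank
            }
        }
        return hits == 0 ? 0.0 : sum / Math.min(k, groundTruth.length);
    }

    public static void main(String[] args) {
        // GT {1,2}; hits at ranks 1 and 3 -> (1/1 + 2/3) / 2 = 5/6
        System.out.println(averagePrecisionAtK(new int[]{1, 2}, new int[]{1, 3, 2}, 3));
    }
}
```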

Performance results

Benchmarks were conducted using 10,000 queries on the gecko-100k dataset (768 dimensions).

Metric             Original time   Optimized time   Delta
recall@10          ~8 ms           ~same            -
recall@100         ~58 ms          ~25 ms           ~57% reduction
AP@100 / MAP@100   ~62 ms          ~26 ms           ~58% reduction

While the benefit is negligible for $K=10$, the optimization becomes critical as $K$ increases. The reduction in AP measurement time is particularly significant as it eliminates the $O(K^2)$ complexity previously caused by nested sublist scans.

Notes / limitations

  • Autoboxing: While this PR eliminates intermediate List allocations, it still utilizes HashSet<Integer>. Further gains could be achieved using primitive-specific collections (like IntHashSet) if necessary in the future.
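
To illustrate what a primitive-specialized set buys, here is a toy fixed-capacity, linear-probing int set; real libraries such as fastutil's IntOpenHashSet or Eclipse Collections' IntHashSet do this (with resizing) and avoid Integer boxing entirely:

```java
public class IntSetSketch {
    private final int[] slots;
    private final boolean[] used;

    // Fixed-capacity open addressing over primitive ints: add/contains never
    // box. A real IntHashSet also resizes; this sketch does not, so capacity
    // must comfortably exceed the number of insertions.
    IntSetSketch(int capacity) {
        slots = new int[capacity];
        used = new boolean[capacity];
    }

    boolean add(int v) {
        int i = (v & 0x7fffffff) % slots.length;
        while (used[i]) {
            if (slots[i] == v) {
                return false; // already present
            }
            i = (i + 1) % slots.length; // linear probing
        }
        used[i] = true;
        slots[i] = v;
        return true;
    }

    boolean contains(int v) {
        int i = (v & 0x7fffffff) % slots.length;
        while (used[i]) {
            if (slots[i] == v) {
                return true;
            }
            i = (i + 1) % slots.length;
        }
        return false;
    }

    public static void main(String[] args) {
        IntSetSketch s = new IntSetSketch(16);
        s.add(42);
        System.out.println(s.contains(42)); // prints true
    }
}
```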

@github-actions bot (Contributor) commented Mar 19, 2026

Before you submit for review:

  • Does your PR follow guidelines from CONTRIBUTIONS.md?
  • Did you summarize what this PR does clearly and concisely?
  • Did you include performance data for changes which may be performance impacting?
  • Did you include useful docs for any user-facing changes or features?
  • Did you include useful javadocs for developer oriented changes, explaining new concepts or key changes?
  • Did you trigger and review regression testing results against the base branch via Run Bench Main?
  • Did you adhere to the code formatting guidelines (TBD)?
  • Did you group your changes for easy review, providing meaningful descriptions for each commit?
  • Did you ensure that all files contain the correct copyright header?

If you did not complete any of these, then please explain below.
