fix(udf): l2_distance returns L2sq to match USearch and DuckDB #8

Merged
anoop-narang merged 1 commit into main from fix/l2-distance-remove-sqrt on Mar 18, 2026
@anoop-narang
Collaborator

Summary

  • Remove .sqrt() from l2_kernel so l2_distance UDF returns squared L2, matching USearch's MetricKind::L2sq and DuckDB VSS's array_distance
  • Previously the UDF returned actual L2 (with sqrt) while all rewritten paths returned L2sq, causing the same query to produce different numeric distance values depending on whether the optimizer fired
  • Update README distance metrics documentation
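To make the numeric difference concrete, here is a minimal sketch of the two behaviors. The function names `l2sq_kernel` and `l2_kernel_with_sqrt` are hypothetical stand-ins, and the slice-based signature is an assumption; the crate's actual `l2_kernel` may look different.

```rust
// Hypothetical stand-in for the kernel after this change: squared L2, no sqrt.
// The real l2_kernel signature in this crate may differ.
fn l2sq_kernel(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
}

// Hypothetical stand-in for the previous behavior: true Euclidean distance.
fn l2_kernel_with_sqrt(a: &[f32], b: &[f32]) -> f32 {
    l2sq_kernel(a, b).sqrt()
}

fn main() {
    let a = [1.0_f32, 2.0, 3.0];
    let b = [4.0_f32, 6.0, 3.0];
    // Component differences are (3, 4, 0), so the squared sum is 25 and its root is 5.
    println!("L2sq = {}", l2sq_kernel(&a, &b));
    println!("L2   = {}", l2_kernel_with_sqrt(&a, &b));
}
```

The same pair of vectors yields 25.0 on one path and 5.0 on the other, which is the kind of mismatch described above when only some queries are rewritten by the optimizer.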

Test plan

  • cargo fmt --check
  • cargo clippy -- -D warnings
  • cargo test — all 32 tests pass (no tests assert on specific distance values, only ordering)
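Ordering-only assertions are unaffected by this change because sqrt is monotonically increasing on non-negative values, so ranking by L2sq and ranking by L2 agree. The sketch below illustrates this with made-up vectors; `l2sq` and `rank_by` are hypothetical helpers, not functions from this repository.

```rust
// Hypothetical helper: squared L2 distance between two equal-length slices.
fn l2sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
}

// Hypothetical helper: indices of `scores` sorted by ascending score.
fn rank_by(scores: &[f32]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&i, &j| scores[i].total_cmp(&scores[j]));
    idx
}

fn main() {
    let query = [0.0_f32, 0.0];
    let items = [[3.0_f32, 4.0], [1.0, 1.0], [6.0, 8.0]];

    let squared: Vec<f32> = items.iter().map(|v| l2sq(v, &query)).collect();
    let rooted: Vec<f32> = squared.iter().map(|d| d.sqrt()).collect();

    // The ranking is identical under both metrics...
    assert_eq!(rank_by(&squared), rank_by(&rooted));
    // ...even though the numeric values are not.
    assert_ne!(squared, rooted);
    println!("ranking: {:?}", rank_by(&squared));
}
```

This is why a suite that checks only result order cannot distinguish the two metrics.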

claude[bot]
claude bot previously approved these changes Mar 18, 2026
@claude claude bot left a comment

One stale doc comment not covered by the diff: src/lib.rs:96 still says — Euclidean distance (L2). Should be updated to — squared L2 distance (L2sq, no sqrt) to match the new semantic. Non-blocking.

claude[bot]
claude bot previously approved these changes Mar 18, 2026
.map(|(x, y)| (x - y) * (x - y))
.sum::<f32>()
.sqrt()
}
P2 (non-blocking): No comment here explaining that the sqrt omission is intentional. A future maintainer may add .sqrt() back thinking it's a missing step, silently reintroducing the inconsistency. Consider:

Suggested change
}
// Returns L2sq (no sqrt) — matches USearch MetricKind::L2sq and keeps numeric
// values consistent between the UDF path and the optimizer-rewritten index path.
}

@claude claude bot left a comment

Review

Issues

P1 — No numeric regression test (tests/execution.rs)

The entire motivation for this PR is fixing a numeric inconsistency (UDF returned L2, index returned L2sq). However, the test suite only asserts on ordering, not on actual distance values. This means the fix has zero regression coverage: if someone adds .sqrt() back to l2_kernel, every test will still pass.

A minimal test would cover the gap — e.g., using a non-trivial query where L2 ≠ L2sq:

// [3,4,0,0] vs [0,0,0,0] → L2sq = 25.0, L2 = 5.0
// Assert the returned value is 25.0, not 5.0
SELECT l2_distance(vector, ARRAY[3.0::float, 4.0::float, 0.0::float, 0.0::float]) AS d
FROM items WHERE id = 3

Without this, the consistency guarantee introduced by the fix is not tested.

Action Required

Add at least one test that asserts a specific numeric value for l2_distance so the squared-L2 semantics are regression-protected.
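One possible shape for such a check, sketched at the kernel level rather than through SQL because the harness in tests/execution.rs is not shown here. The `l2_kernel` below is a hypothetical stand-in with an assumed slice-based signature, and `approx_eq` is an illustrative helper. In the real suite the check function would presumably carry `#[test]` instead of being called from `main`.

```rust
// Hypothetical stand-in for the crate's kernel; the real signature may differ.
fn l2_kernel(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
}

// Illustrative tolerance-based comparison for f32 results.
fn approx_eq(a: f32, b: f32) -> bool {
    (a - b).abs() < 1e-6
}

// Pins the squared-L2 semantics: [3,4,0,0] vs the origin must give 25.0, not 5.0.
fn check_l2_distance_is_squared() {
    let d = l2_kernel(&[3.0, 4.0, 0.0, 0.0], &[0.0, 0.0, 0.0, 0.0]);
    assert!(approx_eq(d, 25.0), "expected squared L2 (25.0), got {}", d);
    assert!(!approx_eq(d, 5.0), "kernel appears to apply sqrt again");
}

fn main() {
    check_l2_distance_is_squared();
    println!("l2_kernel returns squared L2");
}
```

Choosing an input where L2 and L2sq differ matters: a unit-distance pair would give 1.0 under both metrics and would not catch a reintroduced `.sqrt()`.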

l2_distance UDF was computing actual L2 (with sqrt) while USearch and
the rewritten execution paths all use L2sq (no sqrt). This caused the
same query to return different numeric distance values depending on
whether the optimizer rewrote it.

Remove sqrt to match USearch's MetricKind::L2sq and DuckDB VSS's
array_distance behavior. All paths now return consistent L2sq values.
@anoop-narang anoop-narang force-pushed the fix/l2-distance-remove-sqrt branch from b78963f to 04cfd4d on March 18, 2026 07:14
@anoop-narang anoop-narang merged commit 3f303d3 into main Mar 18, 2026
5 checks passed
@anoop-narang anoop-narang deleted the fix/l2-distance-remove-sqrt branch March 18, 2026 07:24