Expose max_distance as an optional setting in multi vector queries#478
Expose max_distance as an optional setting in multi vector queries#478justin-cechmanek merged 6 commits intomainfrom
Conversation
|
Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset. In case there are security findings, they will be communicated to you as a comment inside the PR. Hope you’ll enjoy using Jit. Questions? Comments? Want to learn more? Get in touch with us. |
There was a problem hiding this comment.
Pull request overview
This pull request adds support for configurable max_distance parameters in multi-vector queries, allowing users to specify different distance thresholds for each vector field. Previously, the max_distance was hardcoded to 2.0.
Changes:
- Added
max_distancefield to theVectorclass with default value of 2.0 and validation range [0.0, 2.0] - Modified
MultiVectorQueryto use per-vector max_distance values in VECTOR_RANGE queries - Changed query operator from
|(OR) toANDfor combining multiple vector range queries - Added comprehensive unit and integration tests for the new max_distance functionality
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| redisvl/query/aggregate.py | Added max_distance field to Vector class with validation; updated query building to use per-vector max_distance values; changed operator from OR to AND |
| tests/unit/test_aggregation_types.py | Added test coverage for max_distance parameter including validation tests and query string generation tests |
| tests/integration/test_aggregation.py | Added parameterized integration test to verify max_distance filtering behavior with various distance combinations |
| tests/integration/test_search_index.py | Minor formatting: added blank line for code style consistency |
| tests/integration/test_redis_cluster_support.py | Minor formatting: added blank lines for code style consistency |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if not isinstance(max_distance, (float, int)): | ||
| raise ValueError("max_distance must be a value between 0.0 and 2.0") |
There was a problem hiding this comment.
The type check isinstance(max_distance, (float, int)) on line 43 is redundant because Pydantic's field validation already ensures the field is of type float. Additionally, this check will always pass since the field is declared as max_distance: float = 2.0. The more important validation is the range check on line 45, which should remain.
| if not isinstance(max_distance, (float, int)): | |
| raise ValueError("max_distance must be a value between 0.0 and 2.0") |
vishal-bala
left a comment
There was a problem hiding this comment.
LGTM! Just one note where you can use built-in Pydantic validation instead of custom
Co-authored-by: Vishal Bala <vishal-bala@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @field_validator("max_distance") | ||
| @classmethod | ||
| def validate_max_distance(cls, max_distance: float) -> float: | ||
| if not isinstance(max_distance, (float, int)): | ||
| raise ValueError("max_distance must be a value between 0.0 and 2.0") | ||
| if max_distance < 0.0 or max_distance > 2.0: | ||
| raise ValueError("max_distance must be a value between 0.0 and 2.0") | ||
| return max_distance |
There was a problem hiding this comment.
validate_max_distance currently rejects infinities but will still allow NaN (all comparisons to NaN are false, so it passes through). NaN would produce an invalid VECTOR_RANGE radius in the query string. Consider explicitly rejecting non-finite values (and also disallowing bool, since it’s an int subclass).
| f"@{field}:[VECTOR_RANGE {max_dist} $vector_{i}]=>{{$YIELD_DISTANCE_AS: distance_{i}}}" | ||
| ) | ||
|
|
||
| range_query = " | ".join(range_queries) | ||
| range_query = " AND ".join(range_queries) |
There was a problem hiding this comment.
This changes the multi-vector query from OR semantics (|) to AND semantics when combining vector range clauses. That’s a behavior change beyond “expose max_distance” and could be breaking for existing callers. If the AND behavior is intentional, it should be called out in the PR description/release notes (or consider preserving prior behavior via an option).
No description provided.