
feat: Add normalized metadata reranking and query expansion (Fixes #40) #94

Open

zohaib-7035 wants to merge 2 commits into INCF:main from zohaib-7035:feature/metadata-rerank-v2

Conversation

@zohaib-7035 (Contributor)

Summary

This PR re-introduces the Metadata Reranking and Query Expansion features requested in issue #40, with the mathematical scoring logic completely rewritten to address the maintainer feedback about distorted search results and flat additive boosts.

Changes

  • Log-normalization: applies math.log10() to citation counts so massive outliers (e.g. 10,000 citations) are safely compressed.
  • Min-max scaling: year and citation scores are scaled proportionally into [0.0, 1.0] within each result batch.
  • Strictly bounded multipliers: instead of flat, unbounded additions (e.g. + 10.0), we apply capped multipliers: citations (max 1.15x), year (max 1.10x), trusted source (max 1.05x). This caps the maximum possible metadata boost at +30%.
  • Semantic score quality can no longer be "drowned out" by metadata: a weak vector match stays weak, preserving semantic/keyword query relevance.
  • Re-added the synonym mapping for Query Expansion (mouse brain, eeg, fmri, etc.).
  • Added strict new pytest unit tests in test_metadata_rerank.py that verify the multiplier bounds hold.
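The bullets above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function name `rerank_results`, the `_score`/`_rerank_multiplier` field names (taken from the test snippets below), and the input dictionary keys are all assumptions.

```python
import math

# Bonus caps from the PR description: +15% citations, +10% year, +5% trusted
# source, for a hard ceiling of a 1.30x multiplier (+30%).
CITATION_BONUS_MAX = 0.15
YEAR_BONUS_MAX = 0.10
TRUSTED_BONUS = 0.05

def _min_max(values):
    """Scale values proportionally into [0.0, 1.0] within the batch."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def rerank_results(results):
    """Multiply each semantic score by a metadata factor bounded at 1.30x."""
    if not results:
        return results
    # log10 compresses outliers: 10,000 citations becomes ~4.0, not 10,000.
    log_cites = [math.log10(r.get("citations", 0) + 1) for r in results]
    cite_norm = _min_max(log_cites)
    year_norm = _min_max([r.get("year", 0) for r in results])
    for r, c, y in zip(results, cite_norm, year_norm):
        mult = 1.0 + c * CITATION_BONUS_MAX + y * YEAR_BONUS_MAX
        if r.get("trusted_source"):
            mult += TRUSTED_BONUS
        r["_rerank_multiplier"] = mult          # always in [1.0, 1.30]
        r["_score"] = r["semantic_score"] * mult
    return sorted(results, key=lambda r: r["_score"], reverse=True)
```

Because the multiplier never exceeds 1.30, a document with a weak semantic score cannot leapfrog a strong match no matter how many citations it has.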

Test Output (All Passing)

All three pytest cases pass, confirming that the multiplier bounds hold:

============================= test session starts =============================
platform win32 -- Python 3.12.10, pytest-9.0.2, pluggy-1.6.0
collected 3 items

backend/tests/test_metadata_rerank.py::test_rerank_max_bounds PASSED     [ 33%]
backend/tests/test_metadata_rerank.py::test_rerank_log_normalization PASSED [ 66%]
backend/tests/test_metadata_rerank.py::test_rerank_empty_metadata_handling PASSED [100%]

============================== 3 passed in 0.28s ==============================

@QuantumByte-01 (Collaborator) left a comment

The reranking logic is sound (bounded multipliers, log-normalization) and the test structure is good. Three issues to fix:

1. Factual error in QUERY_SYNONYMS (critical)

"mouse brain": ["Rattus norvegicus", ...]

Rattus norvegicus is rat, not mouse. Mouse is Mus musculus. Expanding a 'mouse brain' query to return rat datasets would give users wrong results. Fix the mapping.
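A corrected mapping could look like the following; only the "mouse brain" fix is from this review, and the other entries are illustrative assumptions rather than the PR's actual dictionary:

```python
# "mouse brain" now maps to Mus musculus (mouse), not Rattus norvegicus (rat).
QUERY_SYNONYMS = {
    "mouse brain": ["Mus musculus", "murine brain"],
    "rat brain": ["Rattus norvegicus"],   # rat keeps its own separate entry
    "eeg": ["electroencephalography", "electroencephalogram"],
    "fmri": ["functional magnetic resonance imaging", "functional MRI"],
}
```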

2. expand_query is only wired into smart_knowledge_search
smart_knowledge_search is only called when filters are provided. The primary search paths (general_search, general_search_async) don't use expand_query. Either apply it consistently across all search entry points, or add a comment explaining why it is intentionally selective.
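One way to wire it consistently is to call the expansion at each entry point. The names `expand_query` and `general_search` come from the PR discussion; the signatures, bodies, and the stand-in backend below are assumptions for illustration:

```python
# Assumed minimal synonym table and signatures; not the project's real code.
QUERY_SYNONYMS = {"mouse brain": ["Mus musculus"]}

def expand_query(query: str) -> str:
    """Append mapped synonym terms for any phrase found in the query."""
    extra = [alt
             for phrase, alts in QUERY_SYNONYMS.items()
             if phrase in query.lower()
             for alt in alts]
    return f"{query} {' '.join(extra)}" if extra else query

def _run_search(query: str) -> dict:
    # Stand-in for the real vector/keyword search backend.
    return {"query": query, "hits": []}

def general_search(query: str) -> dict:
    # Expansion applied at this entry point too, not only inside
    # smart_knowledge_search, so every path sees the same expanded query.
    return _run_search(expand_query(query))
```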

3. Float precision risk in tests

assert ranked[0]["_rerank_multiplier"] == 1.30  # can be 1.3000000000000003
assert ranked[0]["_score"] == 130.0

IEEE 754 float arithmetic (1.0 + 0.10 + 0.15 + 0.05) can produce 1.3000000000000003. Use pytest.approx:

assert ranked[0]["_rerank_multiplier"] == pytest.approx(1.30)
assert ranked[0]["_score"] == pytest.approx(130.0)
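The drift is easy to demonstrate in isolation; the classic `0.1 + 0.2` case shows the same IEEE 754 effect, and `pytest.approx` absorbs it with a sensible default tolerance:

```python
import pytest

# 0.1 and 0.2 have no exact binary representation, so their sum drifts.
total = 0.1 + 0.2
assert total != 0.3                   # exact comparison fails
assert total == pytest.approx(0.3)    # approx tolerates the drift

# The same pattern applied to the rerank assertions from this review:
multiplier = sum([1.0, 0.10, 0.15, 0.05])
assert multiplier == pytest.approx(1.30)
```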

…pand_query consistently, use pytest.approx
@zohaib-7035 (Contributor, Author)

zohaib-7035 commented Mar 20, 2026

Hi @QuantumByte-01, I've addressed all three requested changes:

  • Fixed QUERY_SYNONYMS: replaced "Rattus norvegicus" with "Mus musculus" under the "mouse brain" key. Good catch!
  • Wired expand_query consistently across general_search(), general_search_async(), and smart_knowledge_search() so query expansion works everywhere.
  • Updated the float assertions in test_metadata_rerank.py to use pytest.approx() for better handling of precision edge cases.

All tests are now passing. Ready for your re-review; let me know if anything else is needed. Thanks!
