feat: Add normalized metadata reranking and query expansion (Fixes #40) by zohaib-7035 · Pull Request #94 · INCF/knowledge-space-agent

zohaib-7035 · 2026-03-16T21:05:26Z

Summary

This PR re-introduces the metadata reranking and query expansion features requested in issue #40. The scoring logic has been rewritten to address maintainer feedback that the previous version distorted search results by adding flat, unbounded values to the score.

Changes

- Log-normalization: applies math.log10() to citation counts to compress large outliers (e.g. 10,000 citations).
- Min-max scaling: publication-year and citation scores are scaled to the range [0.0, 1.0] within each batch of retrieved results.
- Bounded multipliers: instead of flat, unbounded additions (e.g. +10.0), each metadata signal contributes a capped bonus to a single multiplier: citations up to +0.15, year up to +0.10, trusted source up to +0.05. The multiplier is 1.0 plus the sum of these bonuses, so the maximum possible metadata boost is 1.30x (+30%).
- Semantic relevance can no longer be drowned out by metadata. Because metadata scales the base score instead of adding to it, a weak vector match stays weak: a result can only overtake another whose base score is less than 1.3 times its own.
- Re-added the synonym mapping used for query expansion (e.g. "mouse brain", "eeg", "fmri").
- Added new pytest unit tests in test_metadata_rerank.py that check the multiplier's upper bound, the log normalization, and the handling of empty metadata.
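A minimal sketch of the scoring described above, assuming each result is a dict carrying a base "_score" and optional "citations", "year" and "source" fields. Only the weights and the "_score"/"_rerank_multiplier" keys come from this PR; the function names, the other field names and TRUSTED_SOURCES are hypothetical. Using log10(1 + c) instead of log10(c) is also an assumption, made so that uncited datasets do not raise a math domain error.

```python
import math
from typing import Any, Optional

# Caps taken from the PR description; everything else is a hypothetical illustration.
CITATION_WEIGHT = 0.15
YEAR_WEIGHT = 0.10
TRUSTED_WEIGHT = 0.05
TRUSTED_SOURCES = {"example-trusted-repo"}  # hypothetical placeholder


def _to_float(value: Any) -> Optional[float]:
    """Return value as a float, or None when it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def _min_max(values: list[Optional[float]]) -> list[float]:
    """Scale present values to [0, 1] within the batch.

    Missing values and constant batches get 0.0, so they earn no bonus.
    """
    present = [v for v in values if v is not None]
    if not present:
        return [0.0] * len(values)
    lo, hi = min(present), max(present)
    span = hi - lo
    return [0.0 if v is None or span == 0 else (v - lo) / span for v in values]


def rerank_with_metadata(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Boost each result's "_score" by a multiplier in [1.0, 1.30] and re-sort."""
    citations: list[Optional[float]] = []
    for r in results:
        c = _to_float(r.get("citations"))
        # log10(1 + c) compresses outliers and stays defined for zero citations.
        citations.append(None if c is None else math.log10(1.0 + max(c, 0.0)))
    years = [_to_float(r.get("year")) for r in results]

    ranked = []
    for r, c, y in zip(results, _min_max(citations), _min_max(years)):
        bonus = CITATION_WEIGHT * c + YEAR_WEIGHT * y
        if r.get("source") in TRUSTED_SOURCES:
            bonus += TRUSTED_WEIGHT
        # c and y lie in [0, 1], so bonus <= 0.30 and the multiplier <= 1.30.
        multiplier = 1.0 + bonus
        ranked.append({**r, "_rerank_multiplier": multiplier,
                       "_score": r["_score"] * multiplier})
    return sorted(ranked, key=lambda r: r["_score"], reverse=True)
```

The bonuses are summed into one multiplier rather than multiplied together. Multiplying 1.15 x 1.10 x 1.05 would give about 1.328x, not the stated 1.30x cap.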

Test Output

All three tests pass, including the check that the multiplier stays within its upper bound:

============================= test session starts =============================
platform win32 -- Python 3.12.10, pytest-9.0.2, pluggy-1.6.0
collected 3 items

backend/tests/test_metadata_rerank.py::test_rerank_max_bounds PASSED     [ 33%]
backend/tests/test_metadata_rerank.py::test_rerank_log_normalization PASSED [ 66%]
backend/tests/test_metadata_rerank.py::test_rerank_empty_metadata_handling PASSED [100%]

============================== 3 passed in 0.28s ==============================

QuantumByte-01

The reranking logic is sound (bounded multipliers, log-normalization) and the test structure is good. Three issues to fix:

1. Factual error in QUERY_SYNONYMS (critical)

"mouse brain": ["Rattus norvegicus", ...]

Rattus norvegicus is the rat, not the mouse; the mouse is Mus musculus. Expanding a "mouse brain" query to return rat datasets would give users wrong results. Fix the mapping.
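A minimal sketch of synonym-based query expansion with the corrected mapping. QUERY_SYNONYMS and expand_query are names from this PR, but the dictionary below is an illustrative subset, not the PR's actual contents. The string-in, string-out signature is likewise an assumption.

```python
import re

# Illustrative subset only; the real QUERY_SYNONYMS entries are not reproduced here.
QUERY_SYNONYMS: dict[str, list[str]] = {
    "mouse brain": ["Mus musculus"],  # mouse, not rat (Rattus norvegicus)
    "eeg": ["electroencephalography"],
    "fmri": ["functional magnetic resonance imaging"],
}


def expand_query(query: str) -> str:
    """Append synonyms for each key phrase found in the query (case-insensitive)."""
    lowered = query.lower()
    extra: list[str] = []
    for phrase, synonyms in QUERY_SYNONYMS.items():
        # Word boundaries stop a short key such as "eeg" matching inside other words.
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            # Skip synonyms already present so repeated expansion is a no-op.
            extra.extend(s for s in synonyms if s.lower() not in lowered)
    return " ".join([query, *extra]) if extra else query
```

For example, expand_query("mouse brain atlas") returns "mouse brain atlas Mus musculus".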

2. expand_query is only wired into smart_knowledge_search

smart_knowledge_search is only called when filters are provided. The primary search paths (general_search, general_search_async) don't use expand_query. Either apply it consistently across all search entry points, or add a comment explaining why it is intentionally selective.
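A sketch of applying expansion at every entry point. The three function names come from the review, but their signatures, the _run_search backend stub and the simplified expand_query stand-in are hypothetical. They exist only to show that each path expands the query before searching.

```python
import asyncio
from typing import Any, Optional


def expand_query(query: str) -> str:
    # Simplified stand-in for the PR's expand_query.
    return query + " Mus musculus" if "mouse brain" in query.lower() else query


def _run_search(query: str, filters: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    # Hypothetical backend stub: records the query it received instead of searching.
    return {"query": query, "filters": filters or {}}


def general_search(query: str) -> dict[str, Any]:
    return _run_search(expand_query(query))


async def general_search_async(query: str) -> dict[str, Any]:
    # Expand before handing off to a worker thread so sync and async paths match.
    return await asyncio.to_thread(_run_search, expand_query(query))


def smart_knowledge_search(query: str, filters: dict[str, Any]) -> dict[str, Any]:
    return _run_search(expand_query(query), filters)
```

An alternative design is to call expand_query once inside the shared backend, so a new entry point cannot forget it. The trade-off is that callers then cannot opt out.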

3. Float precision risk in tests

assert ranked[0]["_rerank_multiplier"] == 1.30  # can be 1.3000000000000003
assert ranked[0]["_score"] == 130.0

IEEE 754 float arithmetic can make a sum such as 1.0 + 0.10 + 0.15 + 0.05 land slightly off 1.30 (e.g. 1.3000000000000003), depending on the order of the additions. Use pytest.approx:

assert ranked[0]["_rerank_multiplier"] == pytest.approx(1.30)
assert ranked[0]["_score"] == pytest.approx(130.0)
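The same idea outside pytest, as a minimal sketch using only the standard library. The approx_equal helper is hypothetical. It borrows pytest.approx's default tolerances (rel=1e-6, abs=1e-12), but it is not an exact equivalent, because math.isclose scales the relative tolerance by the larger magnitude rather than by the expected value.

```python
import math


def approx_equal(actual: float, expected: float,
                 rel: float = 1e-6, abs_: float = 1e-12) -> bool:
    """Tolerance check modelled on pytest.approx's default rel/abs tolerances."""
    return math.isclose(actual, expected, rel_tol=rel, abs_tol=abs_)


# The classic case: exact equality fails even though the values "should" match.
naive = 0.1 + 0.2
print(naive == 0.3)              # False
print(approx_equal(naive, 0.3))  # True

# The reviewer's multiplier check, written with a tolerance instead of ==.
multiplier = 1.0 + 0.10 + 0.15 + 0.05
print(approx_equal(multiplier, 1.30), approx_equal(100.0 * multiplier, 130.0))  # True True
```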


zohaib-7035 · 2026-03-20T07:40:49Z

Hi @QuantumByte-01, I've addressed all three requested changes:

- Fixed QUERY_SYNONYMS: replaced "Rattus norvegicus" with "Mus musculus" under the "mouse brain" key. Good catch!
- Wired expand_query consistently into general_search(), general_search_async(), and smart_knowledge_search(), so query expansion applies on every search path.
- Updated the float assertions in test_metadata_rerank.py to use pytest.approx(), which tolerates floating-point precision edge cases.

All tests now pass. Ready for your re-review; let me know if anything else is needed. Thanks!

feat: re-implement metadata reranking with strict log and min-max bounds

36ab106

QuantumByte-01 mentioned this pull request Mar 19, 2026

fix(retrieval): resolve multiple bugs in retrieval.py (Fixes #74) #97

Merged

QuantumByte-01 requested changes Mar 19, 2026


fix: address PR INCF#94 review feedback - fix QUERY_SYNONYMS, wire expand_query consistently, use pytest.approx

e4852ba

zohaib-7035 wants to merge 2 commits into INCF:main from zohaib-7035:feature/metadata-rerank-v2
