
Add full page context expansion with neighboring page support #338

Merged
shubhadeepd merged 11 commits into develop from dev/pranjald/page-context-expansion on Mar 31, 2026

Conversation

@nv-pranjald
Collaborator

@nv-pranjald nv-pranjald commented Feb 11, 2026

Adds context expansion support for PDF files: when enabled, the content of the entire matched page is fetched and sent to the LLM for generation.

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • All commits are signed-off (git commit -s) and GPG signed (git commit -S).
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

Summary by CodeRabbit

  • New Features
    • Added page-context expansion: retrieve all chunks for matched pages when enabled
    • Added neighboring-page retrieval: optionally include pages before/after retrieved results
    • Introduced page-based context organization for improved response structure and grouping
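
The neighboring-page behavior described above can be sketched as follows. This is only an illustration, not the PR's implementation: the function name `expand_pages_with_neighbors` is hypothetical, and the sketch assumes page numbers are 1-based integers.

```python
def expand_pages_with_neighbors(pages: set[int], neighbors: int) -> list[int]:
    """Return the matched pages plus `neighbors` pages on each side, sorted."""
    if neighbors < 0:
        raise ValueError("neighbors must be >= 0")
    expanded: set[int] = set()
    for page in pages:
        # Assumption: page numbers are 1-based, so never go below page 1.
        start = max(1, page - neighbors)
        expanded.update(range(start, page + neighbors + 1))
    return sorted(expanded)


print(expand_pages_with_neighbors({3, 7}, 1))  # [2, 3, 4, 6, 7, 8]
```

The sketch does not know how long a document is, so it may request pages past the end; a lookup for such a page would simply return no chunks.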

@nv-pranjald nv-pranjald self-assigned this Feb 11, 2026
@nv-pranjald nv-pranjald added the enhancement New feature or request label Feb 11, 2026
@nv-pranjald nv-pranjald marked this pull request as draft February 11, 2026 07:51
@nv-pranjald nv-pranjald force-pushed the dev/pranjald/page-context-expansion branch from da27ea4 to c18624c Compare February 11, 2026 07:52
Comment thread src/nvidia_rag/rag_server/vlm.py Dismissed
@nv-pranjald
Collaborator Author

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Feb 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Feb 13, 2026

Warning

.coderabbit.yaml has a parsing error

The CodeRabbit configuration file in this repository has a parsing error and default settings were used instead. Please fix the error(s) in the configuration file. You can initialize chat with CodeRabbit to get help with the configuration file.

💥 Parsing errors (1)
Validation error: Invalid regex pattern for base branch. Received: "release-**" at "reviews.auto_review.base_branches[2]"
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Walkthrough

This PR adds page-context expansion capabilities to the RAG pipeline, enabling retrieval of full pages and neighboring pages alongside initial retrievals. Changes include new environment variables, configuration with validation, backend VDB filtering methods, main RAG logic for page extraction and expansion, VLM page-organized content assembly, and comprehensive tests for the new functionality.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration & Environment: deploy/compose/docker-compose-rag-server.yaml, src/nvidia_rag/utils/configuration.py | Added APP_FETCH_FULL_PAGE_CONTEXT and APP_FETCH_NEIGHBORING_PAGES environment variables with field validators enforcing that neighboring-pages expansion requires full-page context. |
| RAG Pipeline Core: src/nvidia_rag/rag_server/main.py | Major additions to support page-context expansion: new parameters propagated through generate() and _rag_chain(), helper methods for page extraction/expansion/formatting, VDB re-fetching of missing chunks, page-aware context formatting, and extensive logging for retrieved pages and context structure. |
| API & Server Layer: src/nvidia_rag/rag_server/server.py | Added Prompt model fields fetch_full_page_context and fetch_neighboring_pages with CONFIG defaults, threaded through /generate endpoint to backend RAG.generate() call. |
| VLM Integration: src/nvidia_rag/rag_server/vlm.py | Added organize_by_page mode to extract_and_process_messages and stream_with_messages to interleave text/images per page; new helpers for building page-organized content_parts, extracting images, and logging content structure. |
| Vector Database Layer: src/nvidia_rag/utils/vdb/elasticsearch/es_queries.py, src/nvidia_rag/utils/vdb/elasticsearch/elastic_vdb.py, src/nvidia_rag/utils/vdb/milvus/milvus_vdb.py, src/nvidia_rag/utils/vdb/vdb_base.py | Added abstract retrieve_chunks_by_filter() method and implementations in Elasticsearch and Milvus backends to fetch chunks by source name and page numbers; added Elasticsearch query builder for filtering by source and pages. |
| Unit Tests: tests/unit/test_rag_server/test_page_context_organization.py, tests/unit/test_utils/test_configuration.py | Comprehensive test coverage for page extraction, expansion, formatting helpers; validation tests for configuration cross-field rules. |
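
The cross-field rule in the Configuration & Environment row (neighboring pages require full-page context) might look roughly like the sketch below. It uses a standard-library dataclass as a stand-in for the project's validated configuration; the class name `PageContextOptions` and the helper `is_valid` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageContextOptions:
    """Hypothetical stand-in for the two new settings (not the PR's class)."""

    fetch_full_page_context: bool = False
    fetch_neighboring_pages: int = 0

    def __post_init__(self) -> None:
        if self.fetch_neighboring_pages < 0:
            raise ValueError("fetch_neighboring_pages must be >= 0")
        # Cross-field rule: neighbors only make sense when full pages are fetched.
        if self.fetch_neighboring_pages > 0 and not self.fetch_full_page_context:
            raise ValueError(
                "fetch_neighboring_pages > 0 requires fetch_full_page_context=True"
            )


def is_valid(full_page: bool, neighbors: int) -> bool:
    """Report whether a combination of the two settings passes validation."""
    try:
        PageContextOptions(full_page, neighbors)
        return True
    except ValueError:
        return False


print(is_valid(True, 2))   # True
print(is_valid(False, 2))  # False
```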

Sequence Diagram

sequenceDiagram
    actor Client
    participant Server
    participant RAG as RAG Pipeline
    participant VDB as Vector Database
    participant VLM as Vision Language Model

    Client->>Server: POST /generate (fetch_full_page_context, fetch_neighboring_pages)
    Server->>RAG: generate(prompt, fetch_full_page_context, fetch_neighboring_pages)
    RAG->>VDB: semantic_search (initial retrieval)
    VDB-->>RAG: initial_documents
    RAG->>RAG: _extract_page_set_from_docs()
    RAG->>RAG: _expand_page_set_with_neighbors()
    RAG->>VDB: retrieve_chunks_by_filter(source, expanded_pages)
    VDB-->>RAG: full_page_chunks
    RAG->>RAG: _expand_and_organize_context()
    RAG->>RAG: _format_context_by_page() or existing formatter
    RAG->>VLM: stream_with_messages(docs, organize_by_page)
    VLM->>VLM: _build_content_parts_by_page()
    VLM->>VLM: _extract_images_from_docs()
    VLM-->>Client: streaming response
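
The page-aware formatting step in the diagram can be pictured with a small sketch. It assumes chunks are plain dicts with hypothetical flat keys `source_name`, `page_number`, and `text` (the real documents nest these under metadata), and it borrows the `=== Page N (filename) ===` marker quoted later in the review. It is not the PR's `_format_context_by_page`.

```python
import os
from collections import defaultdict


def format_context_by_page(docs: list[dict]) -> str:
    """Group chunk texts by (source, page) and emit one marked section per page."""
    grouped: dict[tuple[str, int], list[str]] = defaultdict(list)
    for doc in docs:
        grouped[(doc["source_name"], doc["page_number"])].append(doc["text"])

    parts = []
    for source_name, page_num in sorted(grouped):
        # File name without directory or extension, as in the review's marker.
        filename = os.path.splitext(os.path.basename(source_name))[0]
        marker = f"=== Page {page_num} ({filename}) ===\n"
        parts.append(marker + "\n\n".join(grouped[(source_name, page_num)]))
    return "\n\n".join(parts)


docs = [
    {"source_name": "/data/report.pdf", "page_number": 2, "text": "Second page."},
    {"source_name": "/data/report.pdf", "page_number": 1, "text": "Intro."},
    {"source_name": "/data/report.pdf", "page_number": 1, "text": "More intro."},
]
print(format_context_by_page(docs))
```

Sorting the `(source, page)` keys keeps pages in reading order within each source, while chunks on the same page stay in their retrieval order.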

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

  • Integrate nemotron-nano-12b-v2-vl VLM with RAG #73: Modifies VLM message-streaming integration (extract_and_process_messages, stream_with_messages signatures) used by the RAG generate/_rag_chain paths, creating a code-level dependency with this PR's VLM changes.

Suggested reviewers

  • smasurekar
  • nv-nikkulkarni
  • shubhadeepd

Poem

🐰 A rabbit's ode to expanding pages:

Hops through chapters, page by page,
Neighboring neighbors join the stage,
Full context gathered, no chunk amiss,
VLM vision arranged like this—
Long-eared logic, wisely sage! 📖✨

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Merge Conflict Detection ⚠️ Warning ❌ Merge conflicts detected (34 files):

⚔️ .github/workflows/ci-pipeline.yml (content)
⚔️ README.md (content)
⚔️ deploy/compose/docker-compose-rag-server.yaml (content)
⚔️ docs/api-ingestor.md (content)
⚔️ docs/api-rag.md (content)
⚔️ docs/assets/arch_diagram.png (content)
⚔️ docs/change-model.md (content)
⚔️ docs/deploy-docker-self-hosted.md (content)
⚔️ docs/deploy-helm.md (content)
⚔️ docs/mig-deployment.md (content)
⚔️ docs/multi-collection-retrieval.md (content)
⚔️ docs/observability.md (content)
⚔️ docs/python-client.md (content)
⚔️ docs/release-notes.md (content)
⚔️ docs/support-matrix.md (content)
⚔️ docs/text_only_ingest.md (content)
⚔️ docs/troubleshooting.md (content)
⚔️ docs/user-interface.md (content)
⚔️ examples/nvidia_rag_mcp/mcp_server.py (content)
⚔️ examples/rag_react_agent/pyproject.toml (content)
⚔️ examples/rag_react_agent/uv.lock (content)
⚔️ notebooks/launchable.ipynb (content)
⚔️ notebooks/nat_mcp_integration.ipynb (content)
⚔️ src/nvidia_rag/rag_server/main.py (content)
⚔️ src/nvidia_rag/rag_server/server.py (content)
⚔️ src/nvidia_rag/rag_server/vlm.py (content)
⚔️ src/nvidia_rag/utils/configuration.py (content)
⚔️ src/nvidia_rag/utils/vdb/elasticsearch/elastic_vdb.py (content)
⚔️ src/nvidia_rag/utils/vdb/elasticsearch/es_queries.py (content)
⚔️ src/nvidia_rag/utils/vdb/milvus/milvus_vdb.py (content)
⚔️ src/nvidia_rag/utils/vdb/vdb_base.py (content)
⚔️ tests/integration/README.md (content)
⚔️ tests/unit/test_utils/test_configuration.py (content)
⚔️ uv.lock (content)

These conflicts must be resolved before merging into develop.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | Docstring coverage is 86.27%, which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: adding full page context expansion with neighboring page support, which is the primary objective reflected across all modified files. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev/pranjald/page-context-expansion


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/nvidia_rag/rag_server/vlm.py (1)

706-717: ⚠️ Potential issue | 🟠 Major

analyze_with_messages lacks the organize_by_page parameter present in stream_with_messages.

stream_with_messages (line 753) accepts and passes organize_by_page to extract_and_process_messages, but analyze_with_messages (line 657) does not have this parameter at all. Since extract_and_process_messages supports organize_by_page, the non-streaming VLM path cannot utilize per-page organization regardless of caller intent.

Should analyze_with_messages also accept and forward the organize_by_page parameter for API consistency?

src/nvidia_rag/rag_server/server.py (1)

1355-1393: ⚠️ Potential issue | 🟡 Minor

New fields missing from request_data logging dict.

All other Prompt fields are logged in request_data, but fetch_full_page_context and fetch_neighboring_pages are omitted. This makes debugging page-context issues harder.

Proposed fix
         "filter_expr": prompt.filter_expr,
         "confidence_threshold": prompt.confidence_threshold,
+        "fetch_full_page_context": prompt.fetch_full_page_context,
+        "fetch_neighboring_pages": prompt.fetch_neighboring_pages,
     }
🤖 Fix all issues with AI agents
In `@src/nvidia_rag/rag_server/main.py`:
- Around line 3468-3492: The loop computing grouped keys contains an unused
`filename` variable; replace that variable with `_` (or remove its assignment)
so only the intended loop variables `(filename, source_path, page_num, doc)` are
used for grouping, and remove the redundant filename recomputation earlier to
avoid confusion. In the `keys_sorted` iteration use the computed `filename` only
once when creating `marker = f"=== Page {page_num} ({filename}) ===\n"` (keep
the existing os.path.splitext/os.path.basename expression there) and delete the
earlier unused `filename` assignment. For the `no_page` branch, append a single
combined string (e.g., `"=== Additional context ===\n" +
"\n\n".join(format_fn(d) for d in no_page)`) instead of two separate parts so
formatting matches the `has_page` path and prevents an extra "\n\n" from
appearing when joining `parts`. Ensure references to `grouped`, `keys_sorted`,
`format_fn`, `marker`, and `no_page` are updated accordingly.
- Around line 3365-3440: The _expand_and_organize_context method currently
performs synchronous VDB network calls (vdb_op.retrieve_chunks_by_filter) inside
the async RAG path which will block the event loop; update this by removing the
unused collection_names parameter from the signature, and change the fetch loop
to run retrieve_chunks_by_filter on a worker thread (use asyncio.to_thread or
submit to ThreadPoolExecutor) and schedule the per-(coll,source) fetches in
parallel (gather the tasks and then merge results), and replace the misleading
hasattr(vdb_op, "retrieve_chunks_by_filter") guard with a direct call wrapped in
try/except that explicitly catches NotImplementedError and logs warnings; keep
the existing dedup logic (doc_key, seen, merged) but perform deduping after
collected parallel fetch results before returning merged.
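
The suggestion above (offload blocking fetches to worker threads, run them in parallel, then dedupe) could be sketched like this. `retrieve_chunks_blocking` is a hypothetical stand-in for a synchronous VDB call, and chunks are reduced to plain strings; the sketch needs Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import time


def retrieve_chunks_blocking(collection: str, source: str, pages: list[int]) -> list[str]:
    """Hypothetical stand-in for a synchronous VDB call."""
    time.sleep(0.05)  # simulate network latency
    return [f"{collection}:{source}:p{page}" for page in pages]


async def fetch_all(requests: list[tuple[str, str, list[int]]]) -> list[str]:
    # Run each blocking call on a worker thread so the event loop stays free,
    # and let the calls overlap instead of running one after another.
    tasks = [asyncio.to_thread(retrieve_chunks_blocking, *req) for req in requests]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    merged: list[str] = []
    seen: set[str] = set()
    for result in results:
        if isinstance(result, BaseException):
            continue  # a real implementation would log a warning here
        for chunk in result:
            if chunk not in seen:  # dedupe after all fetches have completed
                seen.add(chunk)
                merged.append(chunk)
    return merged


requests = [("coll_a", "a.pdf", [1, 2]), ("coll_a", "b.pdf", [2]), ("coll_a", "a.pdf", [2, 3])]
chunks = asyncio.run(fetch_all(requests))
print(chunks)
```

`asyncio.gather` returns results in the order the tasks were passed, so the merged list stays deterministic even though the fetches overlap.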

In `@src/nvidia_rag/rag_server/server.py`:
- Around line 583-593: Add a cross-field `@model_validator` to the Prompt pydantic
model that enforces the same rule as validate_page_context_options: if
fetch_neighboring_pages > 0 then fetch_full_page_context must be True; otherwise
raise a ValidationError (or ValueError) with a clear message. Place the
validator inside the Prompt class near the existing field definitions for
fetch_full_page_context and fetch_neighboring_pages and name/reference it
similarly (e.g., validate_page_context_options) so API inputs like
fetch_neighboring_pages=5, fetch_full_page_context=False are rejected.

In `@src/nvidia_rag/rag_server/vlm.py`:
- Around line 400-414: The method _build_content_parts_by_page currently ignores
the textual_context parameter and assigns q but never uses it; change the
human_template.format call (and any later question_text usage inside
_build_content_parts_by_page) to use the provided textual_context and the
trimmed q variable (q = (question_text or "").strip()) instead of empty strings
or raw question_text, so the preformatted context and normalized question are
respected; if you decide not to use textual_context here, remove the unused
parameter and q, otherwise update the caller to supply the textual_context
argument so the method can include it when building the intro and any question
formatting.

In `@src/nvidia_rag/utils/vdb/milvus/milvus_vdb.py`:
- Around line 1206-1221: The Milvus query builds filter_expr with substring
matching (like "%{source_name}%") and interpolates source_name raw, causing
inconsistent behavior with the Elasticsearch implementation (which uses exact
term match) and risk of syntax-breaking characters; update the code that
constructs filter_expr in milvus_vdb.py (the filter_expr variable used with
MilvusClient.query) to perform an exact equality check on source["source_name"]
(to mirror es_queries.py's term on metadata.source.source_name.keyword) and
properly escape/quote source_name before interpolation (e.g., ensure internal
quotes/backslashes are escaped and the value is wrapped in quotes) so the filter
string is safe and semantics match across backends.

In `@tests/unit/test_rag_server/test_page_context_organization.py`:
- Around line 198-227: The test test_fetch_full_page_context_calls_vdb should
use call_args.kwargs for clarity and strengthen the final assertion: replace
accessing mock_vdb.retrieve_chunks_by_filter.call_args[1] with
mock_vdb.retrieve_chunks_by_filter.call_args.kwargs to explicitly read keyword
arguments, and change the weak assertion assert len(result) >= 1 to assert
len(result) == 1 because the mock returns an empty list and deduplication in
rag._expand_and_organize_context should leave only the original document.
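
To illustrate the `call_args.kwargs` point from the last item, here is a small standalone sketch with `unittest.mock`. The keyword names passed to `retrieve_chunks_by_filter` are hypothetical, since the method's actual signature is not shown in this conversation.

```python
from unittest.mock import MagicMock

mock_vdb = MagicMock()
mock_vdb.retrieve_chunks_by_filter.return_value = []

# Keyword names below are hypothetical; the PR's actual signature is not shown.
mock_vdb.retrieve_chunks_by_filter(
    collection_name="docs", source_name="a.pdf", page_numbers=[1, 2]
)

call = mock_vdb.retrieve_chunks_by_filter.call_args
assert call.kwargs == call[1]  # same data, but .kwargs states the intent
assert call.kwargs["page_numbers"] == [1, 2]
assert call.args == ()  # nothing was passed positionally
```
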
🧹 Nitpick comments (9)
src/nvidia_rag/utils/configuration.py (1)

773-801: Consider an upper bound for fetch_neighboring_pages.

The validator ensures non-negative values, but there's no upper bound. A very large value (e.g., 1000) would cause the system to attempt fetching thousands of pages per retrieved chunk, which could degrade performance or overwhelm the VDB. A reasonable cap (e.g., 10 or 20) would provide a safety net.

💡 Optional: Add an upper bound
     `@field_validator`("fetch_neighboring_pages")
     `@classmethod`
     def validate_fetch_neighboring_pages(cls, v: int) -> int:
         if not isinstance(v, int) or isinstance(v, bool):
             raise TypeError(
                 f"fetch_neighboring_pages must be an integer, got {type(v).__name__}"
             )
         if v < 0:
             raise ValueError(
                 f"fetch_neighboring_pages must be >= 0, got {v}"
             )
+        if v > 20:
+            raise ValueError(
+                f"fetch_neighboring_pages must be <= 20, got {v}"
+            )
         return v
src/nvidia_rag/utils/vdb/milvus/milvus_vdb.py (1)

1226-1235: Entity-to-Document conversion is duplicated with retrieval_image_langchain.

Lines 1226-1233 are nearly identical to the conversion loop in retrieval_image_langchain (lines 1180-1187). Consider extracting a shared helper method.

♻️ Optional: Extract helper
+    `@staticmethod`
+    def _entities_to_documents(entities: list[dict]) -> list[Document]:
+        """Convert Milvus entities to LangChain Document objects."""
+        docs: list[Document] = []
+        for item in entities:
+            page_content = item.get("text") or item.get("chunk") or ""
+            metadata = {
+                "source": item.get("source"),
+                "content_metadata": item.get("content_metadata", {}),
+            }
+            docs.append(Document(page_content=page_content, metadata=metadata))
+        return docs

Then use it in both methods:

docs = self._entities_to_documents(entities)
return self._add_collection_name_to_retreived_docs(docs, collection_name)
src/nvidia_rag/rag_server/vlm.py (2)

376-397: Silent exception swallowing hinders debugging.

The bare except Exception: continue at line 396 silently discards all errors during image extraction (e.g., MinIO connection failures, malformed payloads). Adding a debug-level log would help diagnose issues in production without cluttering normal output.

💡 Suggested improvement
-            except Exception:
+            except Exception as e:
+                logger.debug("Skipping image extraction for doc: %s", e)
                 continue

365-370: Nested ternary expression is hard to read.

The source_id extraction has a confusing double isinstance check:

source_id = (
    source_meta.get("source_id", "")
    or (source_meta.get("source_name", "") if isinstance(source_meta, dict) else "")
    if isinstance(source_meta, dict)
    else ""
)

The inner isinstance(source_meta, dict) check is redundant since the outer one already guards the entire expression.

♻️ Simplified version
-            source_id = (
-                source_meta.get("source_id", "")
-                or (source_meta.get("source_name", "") if isinstance(source_meta, dict) else "")
-                if isinstance(source_meta, dict)
-                else ""
-            )
+            source_id = (
+                (source_meta.get("source_id", "") or source_meta.get("source_name", ""))
+                if isinstance(source_meta, dict)
+                else ""
+            )
tests/unit/test_utils/test_configuration.py (1)

847-862: Good validation coverage. Consider adding a positive test case.

The negative validation paths are well tested. Consider adding a test for the valid configuration to ensure it doesn't raise:

💡 Suggested additional test
def test_fetch_full_page_context_with_neighboring_pages_valid(self):
    """Test that fetch_neighboring_pages > 0 with fetch_full_page_context=True is valid."""
    config = RetrieverConfig(
        fetch_full_page_context=True,
        fetch_neighboring_pages=2,
    )
    assert config.fetch_full_page_context is True
    assert config.fetch_neighboring_pages == 2
src/nvidia_rag/utils/vdb/elasticsearch/elastic_vdb.py (1)

952-996: LGTM — solid implementation matching the base class contract.

The method correctly handles empty page_numbers, caps the result size, and gracefully returns an empty list on errors. The document construction mirrors existing patterns in the class.

Two observations worth noting:

  1. Source matching inconsistency across backends: This ES implementation uses exact keyword term matching (term: {"metadata.source.source_name.keyword": source_name}), while the Milvus implementation uses substring matching (like "%{source_name}%"). These have different semantics—ES will only match exact source names, while Milvus will match source names containing the string. If source naming conventions differ across deployments, this could yield different retrieval results. Consider documenting whether this behavioral difference is intentional.

  2. Static analysis: except Exception (line 974) is broad; consider catching specific Elasticsearch exceptions if feasible. Also, logger.error on line 975 could be logger.exception to include the traceback automatically.
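
One way to bring the Milvus path in line with the exact-match semantics noted in the first observation is to build the filter string with an equality check and an escaped source name. The sketch below is an assumption-laden illustration: `build_source_page_filter` is a hypothetical helper, the `content_metadata["page_number"]` field path is inferred from the metadata layout discussed in this review, and backslash escaping of quotes inside string literals should be verified against the Milvus version in use.

```python
def build_source_page_filter(source_name: str, page_numbers: list[int]) -> str:
    """Build an exact-match filter string with the source name safely quoted."""
    # Escape backslashes first, then double quotes, so the literal stays intact.
    escaped = source_name.replace("\\", "\\\\").replace('"', '\\"')
    pages = ", ".join(str(int(p)) for p in page_numbers)
    # Assumption: field names follow the metadata layout described in the review.
    return (
        f'source["source_name"] == "{escaped}" '
        f'and content_metadata["page_number"] in [{pages}]'
    )


print(build_source_page_filter('my "draft".pdf', [1, 2]))
```

Casting each page through `int()` also keeps non-numeric input out of the expression.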

src/nvidia_rag/rag_server/main.py (3)

3389-3401: Dedup key truncates content at 300 chars — potential for false merges.

If two distinct chunks on the same page share the first 300 characters (e.g., repeated headers, table rows, or structured data), one will be silently dropped. Consider using a hash of the full content instead:

♻️ Safer dedup key
+        import hashlib
+
         def doc_key(d: Document) -> tuple[str, str, int, str]:
             meta = getattr(d, "metadata", {}) or {}
             content_md = meta.get("content_metadata", {}) or {}
             source = meta.get("source", {})
             source_path = (
                 source.get("source_name", "") if isinstance(source, dict) else source
             )
             coll = meta.get("collection_name", "")
             page_num = content_md.get("page_number", 0)
-            content_preview = (getattr(d, "page_content", "") or "")[:300]
-            return (str(coll), str(source_path), int(page_num), content_preview)
+            content_hash = hashlib.md5(
+                (getattr(d, "page_content", "") or "").encode()
+            ).hexdigest()
+            return (str(coll), str(source_path), int(page_num), content_hash)
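
A tiny standalone demonstration of the false-merge risk, using made-up chunk text. SHA-256 is used here rather than the MD5 from the diff above purely as an illustrative choice; either works as a dedup key.

```python
import hashlib

header = "Quarterly revenue by region\n" * 15  # shared prefix longer than 300 chars
chunk_a = header + "EMEA: 10"
chunk_b = header + "APAC: 12"

# Truncated keys collide even though the chunks differ.
print(chunk_a[:300] == chunk_b[:300])  # True

# Hashing the full content keeps the keys distinct.
digest_a = hashlib.sha256(chunk_a.encode("utf-8")).hexdigest()
digest_b = hashlib.sha256(chunk_b.encode("utf-8")).hexdigest()
print(digest_a == digest_b)  # False
```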

3293-3319: Unused max_chars parameter.

max_chars (line 3297) is declared but never referenced in the method body. Remove it to keep the signature clean.

♻️ Proposed fix
     def _log_context_structure(
         self,
         context_str: str,
         prefix: str = "Context structure",
-        max_chars: int = 60,
     ) -> None:

3217-3291: Consider extracting shared metadata extraction logic.

Both _log_retrieved_pages and _log_expanded_context_layout (as well as _extract_page_set_from_docs, _format_context_by_page, and doc_key inside _expand_and_organize_context) repeat the same metadata extraction pattern — pulling content_metadata.page_number and source.source_name from nested doc.metadata. A small shared helper (e.g., _extract_doc_page_info(doc) -> tuple[str, str | None, int | None]) would reduce duplication across these five call sites.
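
A shared helper along these lines might look like the sketch below. It operates on a plain metadata dict rather than a LangChain `Document`, and the tuple layout (collection name, source name, page number) is an assumption, since the review only suggests the return type.

```python
from typing import Any, Optional


def extract_doc_page_info(
    metadata: Optional[dict[str, Any]],
) -> tuple[str, Optional[str], Optional[int]]:
    """Return (collection_name, source_name, page_number) from nested metadata."""
    metadata = metadata or {}
    content_md = metadata.get("content_metadata") or {}
    source = metadata.get("source")
    # `source` may be a dict with `source_name` or a plain string path.
    if isinstance(source, dict):
        source_name = source.get("source_name") or None
    else:
        source_name = str(source) if source else None
    page = content_md.get("page_number")
    page_number = int(page) if page is not None else None
    return (str(metadata.get("collection_name", "")), source_name, page_number)


meta = {
    "collection_name": "docs",
    "source": {"source_name": "/data/a.pdf"},
    "content_metadata": {"page_number": 4},
}
print(extract_doc_page_info(meta))                 # ('docs', '/data/a.pdf', 4)
print(extract_doc_page_info({"source": "b.pdf"}))  # ('', 'b.pdf', None)
```

Returning `None` for a missing page, instead of defaulting to 0, lets callers tell "no page information" apart from a real page number.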

Comment thread src/nvidia_rag/rag_server/main.py
Comment thread src/nvidia_rag/rag_server/main.py
Comment thread src/nvidia_rag/rag_server/server.py
Comment thread src/nvidia_rag/rag_server/vlm.py
Comment thread src/nvidia_rag/utils/vdb/milvus/milvus_vdb.py
Comment thread tests/unit/test_rag_server/test_page_context_organization.py
@nv-pranjald nv-pranjald force-pushed the dev/pranjald/page-context-expansion branch from c18624c to 37e356a Compare February 17, 2026 11:59
@nv-pranjald nv-pranjald force-pushed the dev/pranjald/page-context-expansion branch from 37e356a to 6e2a6f8 Compare March 6, 2026 07:00
@nv-pranjald nv-pranjald marked this pull request as ready for review March 10, 2026 10:07
@nv-pranjald nv-pranjald changed the title Add full page context expansion with neighboring page support for pag… Add full page context expansion with neighboring page support Mar 10, 2026
Comment thread deploy/compose/docker-compose-rag-server.yaml
@nv-pranjald nv-pranjald force-pushed the dev/pranjald/page-context-expansion branch from bb4a218 to a7c8a0e Compare March 24, 2026 13:34
@nv-pranjald nv-pranjald force-pushed the dev/pranjald/page-context-expansion branch 3 times, most recently from a4bb5bb to f3e5eed Compare March 30, 2026 12:55
@nv-pranjald nv-pranjald force-pushed the dev/pranjald/page-context-expansion branch from ca4914d to c70ac73 Compare March 30, 2026 13:13
@shubhadeepd shubhadeepd merged commit 9e4e0a8 into develop Mar 31, 2026
6 checks passed
@shubhadeepd shubhadeepd deleted the dev/pranjald/page-context-expansion branch March 31, 2026 07:21

Labels

enhancement New feature or request


3 participants