Kheiss/readme quickstart #1492

Merged
sosahi merged 36 commits into NVIDIA:main from kheiss-uwzoo:kheiss/readme-quickstart
Mar 9, 2026

Collaborator

@kheiss-uwzoo kheiss-uwzoo commented Mar 5, 2026

Pull Request Summary: NeMo Retriever README — NVIDIA Style Guide and PRD Alignment

Overview

Updates to nemo_retriever/README.md to align with the NVIDIA Writing Style Guide and the Ingest 2.0 PRD for NeMo Retriever Library, covering voice and tone, formatting, links, acronyms, structure, and naming/positioning.


Changes

Voice and Tone (PACE)

  • Contractions: Replaced "it is" with "it's" for a more conversational tone.
  • Latinisms: Replaced "via" with "through" (per guidance to prefer "by" or "through" instead of "via").

Acronyms and First Use

  • RAG: First use now spelled out as "retrieval-augmented generation (RAG) ingestion pipeline."
  • OCR: First use now spelled out as "If optical character recognition (OCR) fails…".
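
The first-use rule above can be checked mechanically. The following is a minimal sketch, not part of this PR: the helper name `first_use_is_expanded` is hypothetical, and it assumes the expected pattern is the spelled-out term immediately followed by the acronym in parentheses.

```python
import re


def first_use_is_expanded(text: str, acronym: str, expansion: str) -> bool:
    """Return True if the first occurrence of `acronym` reads 'expansion (ACRONYM)'.

    Hypothetical helper for illustration; the comparison is case-insensitive
    on the expansion so a sentence-initial capital is accepted.
    """
    match = re.search(rf"\b{re.escape(acronym)}\b", text)
    if match is None:
        return True  # The acronym is never used, so there is nothing to expand.
    prefix = text[: match.start()].lower()
    closes = text[match.end() : match.end() + 1] == ")"
    return prefix.endswith(f"{expansion.lower()} (") and closes


good = "If optical character recognition (OCR) fails, retry. OCR is optional."
bad = "OCR is used. Optical character recognition (OCR) extracts text."
print(first_use_is_expanded(good, "OCR", "optical character recognition"))
print(first_use_is_expanded(bad, "OCR", "optical character recognition"))
```

The check looks only at the first occurrence, so later bare uses of the acronym are accepted.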

Links

Replaced generic or bare link text with descriptive text that matches the destination (avoiding "here," "read more," and raw URLs):

  • [docs.nvidia](...benchmarking/) → NeMo Retriever extraction benchmarking documentation
  • [docs.nvidia](...extraction/audio/) → NeMo Retriever audio extraction documentation
  • [docs.nvidia](...25.6.3/extraction/audio/) → NeMo Retriever audio extraction documentation (25.6.3)
  • [docs.nvidia](...ray.html) → NeMo Ray run guide
  • [huggingface](...parakeet-ctc-1.1b) → Parakeet CTC 1.1B model on Hugging Face
  • [discuss.ray](...) → Connecting to a remote Ray cluster on Kubernetes
  • [cohesity](...) → How Cohesity uses NVIDIA NeMo Retriever microservices to improve RAG AI retrieval recall (Cohesity blog)
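
A link-text check like the one described above could be scripted roughly as follows. This is an illustrative sketch rather than tooling used in this PR: the function name `find_weak_links` and the `GENERIC_TEXT` set are assumptions, and only inline `[text](target)` links are considered.

```python
import re

# Inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
# Any http(s) URL; applied only after inline links have been removed.
URL_RE = re.compile(r"https?://\S+")

# Assumed list of link texts to avoid; extend it to match your style guide.
GENERIC_TEXT = {"here", "read more", "click here", "link"}


def find_weak_links(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for generic link text and bare URLs."""
    findings = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for text, _target in LINK_RE.findall(line):
            if text.strip().lower() in GENERIC_TEXT:
                findings.append((lineno, f"generic link text: {text!r}"))
        # Drop well-formed links so their targets are not reported as bare URLs.
        remainder = LINK_RE.sub("", line)
        for url in URL_RE.findall(remainder):
            findings.append((lineno, f"bare URL: {url}"))
    return findings


sample = (
    "See [here](https://example.com/a) for details.\n"
    "Docs: https://example.com/b\n"
    "Read the [audio extraction documentation](https://example.com/c).\n"
)
for lineno, message in find_weak_links(sample):
    print(lineno, message)
```

Reference-style links and URLs inside fenced code blocks are not handled; a real check would need to account for both.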

Formatting and Structure

  • Wrapped the LD_LIBRARY_PATH example in a fenced code block and introduced it with a full sentence and colon.
  • Removed a duplicate "Quick end‑to‑end test" section (repeated ## 8 heading, explanatory paragraph, and extra horizontal rule).
  • Added a missing comma in the leading sentence: "For example, the following command…".
  • Fixed a double space in "In this step, you uninstall" to a single space after "you".
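
Two of the fixes above (the repeated heading and the double space) are the kind of issue a small script can flag. The sketch below is an assumption about how such a check might look, not something included in this PR; `duplicate_headings` and `double_space_lines` are hypothetical names, and fenced code blocks are skipped so aligned code is not reported.

```python
import re
from collections import Counter

# ATX headings such as "## 8. Title"; group 2 is the heading text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
# Two or more spaces between non-space characters (ignores trailing line-break spaces).
DOUBLE_SPACE_RE = re.compile(r"(?<=\S) {2,}(?=\S)")


def _prose_lines(markdown: str):
    """Yield (line number, line) for lines outside fenced code blocks."""
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if not in_fence:
            yield lineno, line


def duplicate_headings(markdown: str) -> list[str]:
    """Return heading texts that appear more than once."""
    titles = []
    for _lineno, line in _prose_lines(markdown):
        match = HEADING_RE.match(line)
        if match:
            titles.append(match.group(2))
    return [title for title, count in Counter(titles).items() if count > 1]


def double_space_lines(markdown: str) -> list[int]:
    """Return line numbers that contain a run of two or more spaces mid-line."""
    return [n for n, line in _prose_lines(markdown) if DOUBLE_SPACE_RE.search(line)]


sample = (
    "## 8. Quick test\n"
    "In this step, you  uninstall the package.\n"
    "```\n"
    "a  =  1\n"
    "```\n"
    "## 8. Quick test\n"
)
print(duplicate_headings(sample))
print(double_space_lines(sample))
```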

PRD Conformance

Conformance: Yes (with 2 fixes applied)

The README was checked against the Ingest 2.0 PRD (NeMo Retriever Library). It already matched the PRD’s naming and positioning; two small fixes were made so it fully conforms.

Fixes Applied

  • Python package name (PRD: nemo_retriever):
    • Was: “installs the nemoretriever Python package”
    • Now: “installs the nemo_retriever Python package”
    • PRD specifies the Python import as nemo_retriever (lowercase, underscore).
  • Typo in file extension:
    • Was: “used for .txt ingestion” (with a trailing space)
    • Now: “used for .txt ingestion”

What Already Matched the PRD

  • Product name: “NeMo Retriever Library” (title case, space‑separated) used consistently.
  • GitHub repo: NVIDIA/NeMo-Retriever referenced correctly.
  • PyPI: nemo-retriever (lowercase, hyphenated) in install commands.
  • Python: nemo_retriever in paths and module references (aside from the one nemoretriever fix).
  • No legacy names: no nv-ingest, nv_ingest, or abbreviations like nr / nemo-ret.
  • Hyphens vs underscores: external identifiers use hyphens (nemo-retriever); Python/internal use underscores (nemo_retriever).
  • Scope: library‑first install, Ray, NIM, HuggingFace, PDF/HTML/text/audio, LanceDB, and the benchmark harness all reflected as in the PRD.
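
The naming checks listed above can be approximated with a simple scan. The following sketch is illustrative only and was not part of the conformance review: `find_naming_issues` is a hypothetical helper, and the pattern covers just the legacy and misspelled names mentioned in this description (the two-letter abbreviation `nr` is left out because it would be too noisy to match).

```python
import re

# Names this description says should not appear in the README.
DISALLOWED_RE = re.compile(r"\b(nv-ingest|nv_ingest|nemoretriever|nemo-ret)\b")


def find_naming_issues(text: str) -> list[tuple[int, str]]:
    """Return (line number, offending name) pairs for disallowed names."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in DISALLOWED_RE.finditer(line):
            issues.append((lineno, match.group(1)))
    return issues


sample = (
    "pip install nemo-retriever\n"
    "This installs the nemoretriever Python package.\n"
    "import nemo_retriever\n"
    "Legacy names: nv-ingest and nv_ingest\n"
)
for lineno, name in find_naming_issues(sample):
    print(lineno, name)
```

The word boundary after `nemo-ret` keeps the hyphenated PyPI name `nemo-retriever` from being flagged, and `nemo_retriever` passes because no pattern matches it.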

Optional Follow-up

  • Benchmark CLI naming: the PRD says the benchmark CLI is nemo-retriever-bench, while the README currently documents retriever harness (for example, retriever harness run).
    • If the shipped command is actually nemo-retriever-bench, add a short note or adjustment in the benchmark section.
    • If retriever harness is the correct, shipped interface, the README is fine as‑is.

A short conformance report was saved at docs/README_PRD_Conformance.md for reference and for inclusion in the PR if desired.


Files Changed

  • nemo_retriever/README.md

Reference

  • NVIDIA Style Guide (April 2025): Voice and Tone, Links, Abbreviations and Acronyms, Latinisms, Formatting, Technical Content.
  • Ingest 2.0 PRD (NeMo Retriever Library): naming, positioning, and scope.

@kheiss-uwzoo kheiss-uwzoo added the doc label (Improvements or additions to documentation) Mar 5, 2026
Collaborator Author

@kheiss-uwzoo kheiss-uwzoo left a comment

changed title and opening paragraph

@kheiss-uwzoo kheiss-uwzoo marked this pull request as ready for review March 5, 2026 20:05
@kheiss-uwzoo kheiss-uwzoo requested a review from a team as a code owner March 5, 2026 20:05
Changed opening paragraph to be more specific
Collaborator Author

@kheiss-uwzoo kheiss-uwzoo left a comment

updated opening

## Prerequisites
This quick start guide shows how to run NeMo Retriever in **library mode**, directly from your application, without Docker. In library mode, NeMo Retriever Library supports two deployment options:
- Load Hugging Face models locally on your GPU.
- Use locally deployed NeMo Retriever NIM endpoints for embedding and OCR.
Collaborator

It's not just for embedding and OCR. It includes all the NIMs: page-elements, OCR, and embedding by default, plus graphic-elements and table-structure optionally.
@edknv and @ChrisJar, please confirm this statement.

kheiss-uwzoo and others added 11 commits March 6, 2026 07:11
Co-authored-by: nkmcalli <nkmcalli@yahoo.com>
Co-authored-by: nkmcalli <nkmcalli@yahoo.com>
Co-authored-by: nkmcalli <nkmcalli@yahoo.com>
Co-authored-by: nkmcalli <nkmcalli@yahoo.com>
Co-authored-by: nkmcalli <nkmcalli@yahoo.com>
Co-authored-by: nkmcalli <nkmcalli@yahoo.com>
Co-authored-by: nkmcalli <nkmcalli@yahoo.com>
updated per Nicole's review
Fixed formatting per Nicole's review
Updated formatting per Nicole's review
changed from step to procedure
Collaborator Author

@kheiss-uwzoo kheiss-uwzoo left a comment

updated per Nicole review comments

@kheiss-uwzoo kheiss-uwzoo requested a review from nkmcalli March 6, 2026 16:23
@kheiss-uwzoo kheiss-uwzoo requested a review from sosahi March 9, 2026 18:16
@sosahi sosahi merged commit 549b2d9 into NVIDIA:main Mar 9, 2026
8 checks passed

Labels

doc Improvements or additions to documentation

3 participants