(retriever) - Pin HuggingFace model revisions #1499

Merged: jdye64 merged 3 commits into NVIDIA:main from jdye64:pin-hf-model-rev on Mar 6, 2026

@jdye64 jdye64 commented Mar 6, 2026

Pin HuggingFace model revisions to immutable commit SHAs

Problem

All 14 from_pretrained calls across the codebase were loading models from HuggingFace without specifying a revision, which defaults to the main branch. Any push to main on any of our upstream HF model repos could silently change what gets downloaded, potentially breaking inference at runtime.

Solution

Introduced a central model revision registry (hf_model_registry.py) that maps each HuggingFace model ID to a pinned git commit SHA. Every from_pretrained call now passes revision=get_hf_revision(model_id), locking downloads to an exact, immutable snapshot.

If a model ID isn't in the registry (e.g. a user-supplied custom model), get_hf_revision returns None, which preserves the default main branch behavior -- so no existing flexibility is lost.
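The registry described above could be as small as a dictionary plus a lookup function. The following is a minimal sketch, not the contents of hf_model_registry.py: the dictionary name `HF_MODEL_REVISIONS` is an assumption, while `get_hf_revision` and its return-None-when-unregistered behavior come from the description, and the SHAs are copied from the "Pinned revisions" section of this PR.

```python
from typing import Dict, Optional

# Hypothetical name for the registry mapping; the SHAs are the ones listed
# under "Pinned revisions" in this PR description.
HF_MODEL_REVISIONS: Dict[str, str] = {
    "nvidia/llama-3.2-nv-embedqa-1b-v2": "cefc2394cc541737b7867df197984cf23f05367f",
    "nvidia/parakeet-ctc-1.1b": "a707e818195cb97c8f7da2fc36b221a29f69a5db",
    "nvidia/NVIDIA-Nemotron-Parse-v1.2": "f42c8040b12ee64370922d108778ab655b722c5d",
    "nvidia/llama-nemotron-embed-vl-1b-v2": "859e1f2dac29c56c37a5279cf55f53f3e74efc6b",
    "meta-llama/Llama-3.2-1B": "4e20de362430cd3b72f300e6b0f18e50e7166e08",
    "intfloat/e5-large-unsupervised": "15af9288f69a6291f37bfb89b47e71abc747b206",
}


def get_hf_revision(model_id: str) -> Optional[str]:
    """Return the pinned commit SHA for model_id, or None if it is not registered."""
    return HF_MODEL_REVISIONS.get(model_id)
```

Returning None rather than raising is what lets user-supplied model IDs keep working: the caller can pass the result straight through as `revision` without a special case.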

Changes

New file:

  • nemo_retriever/src/nemo_retriever/utils/hf_model_registry.py -- single source of truth for all pinned model revisions

Updated files (14 from_pretrained calls pinned):

  • nemo_retriever/src/nemo_retriever/model/local/llama_nemotron_embed_1b_v2_embedder.py -- 2 calls (AutoTokenizer + AutoModel)
  • nemo_retriever/src/nemo_retriever/model/local/llama_nemotron_embed_vl_1b_v2_embedder.py -- 1 call (AutoModel)
  • nemo_retriever/src/nemo_retriever/model/local/nemotron_parse_v1_2.py -- 4 calls (AutoModel + AutoTokenizer + AutoProcessor + GenerationConfig)
  • nemo_retriever/src/nemo_retriever/model/local/parakeet_ctc_1_1b_asr.py -- 2 calls (AutoProcessor + AutoModelForCTC)
  • nemo_retriever/src/nemo_retriever/txt/split.py -- 1 call (AutoTokenizer)
  • api/src/nv_ingest_api/internal/transform/split_text.py -- 1 call (AutoTokenizer)
  • docker/scripts/post_build_triggers.py -- 1 call per model path (AutoTokenizer)
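The call-site change in each of the files above follows one pattern: look up the revision, then pass it as `revision=`. The sketch below illustrates that pattern without importing transformers or touching the network. `load_pinned` and `fake_from_pretrained` are hypothetical names used only for illustration (the PR passes `revision=get_hf_revision(model_id)` directly at each call site), and the one-entry registry is a stand-in for the real one.

```python
from typing import Any, Callable, Dict, Optional

# Stand-in registry with a single entry from the "Pinned revisions" section.
_PINNED: Dict[str, str] = {
    "meta-llama/Llama-3.2-1B": "4e20de362430cd3b72f300e6b0f18e50e7166e08",
}


def get_hf_revision(model_id: str) -> Optional[str]:
    """Return the pinned SHA, or None for models outside the registry."""
    return _PINNED.get(model_id)


def load_pinned(from_pretrained: Callable[..., Any], model_id: str, **kwargs: Any) -> Any:
    """Hypothetical helper: call a from_pretrained-style loader with the pinned revision."""
    return from_pretrained(model_id, revision=get_hf_revision(model_id), **kwargs)


def fake_from_pretrained(model_id: str, revision: Optional[str] = None, **kwargs: Any) -> dict:
    """Stand-in loader that records what it was asked to fetch."""
    return {"model_id": model_id, "revision": revision}


# With the real library, the equivalent call would look like:
#   AutoTokenizer.from_pretrained(model_id, revision=get_hf_revision(model_id))
pinned = load_pinned(fake_from_pretrained, "meta-llama/Llama-3.2-1B")
custom = load_pinned(fake_from_pretrained, "my-org/custom-model")
```

Here `pinned["revision"]` carries the registered SHA, while `custom["revision"]` is None, which per the description above leaves the default branch behavior in place.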

Pinned revisions

| Model | Commit SHA |
| --- | --- |
| nvidia/llama-3.2-nv-embedqa-1b-v2 | cefc2394cc541737b7867df197984cf23f05367f |
| nvidia/parakeet-ctc-1.1b | a707e818195cb97c8f7da2fc36b221a29f69a5db |
| nvidia/NVIDIA-Nemotron-Parse-v1.2 | f42c8040b12ee64370922d108778ab655b722c5d |
| nvidia/llama-nemotron-embed-vl-1b-v2 | 859e1f2dac29c56c37a5279cf55f53f3e74efc6b |
| meta-llama/Llama-3.2-1B | 4e20de362430cd3b72f300e6b0f18e50e7166e08 |
| intfloat/e5-large-unsupervised | 15af9288f69a6291f37bfb89b47e71abc747b206 |

How to bump a model version

Update the single SHA entry in hf_model_registry.py. All call sites will automatically pick up the new revision.
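Because the pin is only immutable when the entry is a full commit SHA, a bump could be guarded by a small check that rejects branch names, tags, and abbreviated SHAs. This is a sketch of one possible check, not something included in this PR; `unpinned_entries` is a hypothetical name, and the assumption is that registry values are 40-character lowercase hex git SHAs like those listed above.

```python
import re
from typing import Dict, List

# A full git commit SHA-1: exactly 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_entries(registry: Dict[str, str]) -> List[str]:
    """Return the model IDs whose revision is not a full 40-character commit SHA."""
    return sorted(
        model_id
        for model_id, revision in registry.items()
        if not _FULL_SHA.fullmatch(revision)
    )


# Example registry: one valid entry, one branch name, one abbreviated SHA.
example = {
    "nvidia/parakeet-ctc-1.1b": "a707e818195cb97c8f7da2fc36b221a29f69a5db",
    "example-org/branch-pinned": "main",
    "example-org/short-sha": "a707e81",
}
bad = unpinned_entries(example)
```

In this example `bad` contains the two `example-org` entries, so a unit test asserting that the list is empty for the real registry would fail before a mutable reference is merged.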

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

@jdye64 jdye64 requested a review from a team as a code owner March 6, 2026 18:40
@jdye64 jdye64 requested a review from edknv March 6, 2026 18:40
@jioffe502 jioffe502 left a comment

lgtm

Comment thread nemo_retriever/src/nemo_retriever/utils/hf_model_registry.py Outdated
@jdye64 jdye64 merged commit a2c6410 into NVIDIA:main Mar 6, 2026
9 checks passed
2 participants