
fix: respect HF_HUB_OFFLINE in download_model to avoid network calls#614

Merged
joein merged 1 commit into qdrant:main from amasolov:fix/respect-hf-hub-offline-in-download-model on Mar 23, 2026

Conversation

@amasolov
Contributor

@amasolov amasolov commented Mar 16, 2026

Fixes #615
Related: #565

Summary

When HF_HUB_OFFLINE is set to a truthy value (1, true, yes, on), download_model() should not attempt any network calls. Currently, download_model() first tries the HF cache with local_files_only=True, but if that pass fails (e.g. because a metadata file is missing), the retry loop still calls download_files_from_huggingface() without local_files_only. That triggers model_info(), a network API call that immediately raises EnvironmentError in offline mode. The result is an unnecessary fallback to GCS that downloads ~83 MB from storage.googleapis.com on every startup.

In air-gapped / restricted environments where both HuggingFace and Google Cloud Storage are unreachable, this means fastembed cannot load models that are already present in the local cache.

Fix

Set local_files_only=True at the top of download_model() when HF_HUB_OFFLINE is set to a truthy value. This ensures:

  1. The HF local cache pass uses snapshot_download(..., local_files_only=True) — works if the model is cached
  2. retries is set to 1 (no unnecessary retries)
  3. The retry loop skips the network-dependent HF path (the condition hf_source and not local_files_only evaluates to False)
  4. retrieve_model_gcs() only checks for local fast-* directories without downloading
  5. Zero network traffic
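To make the consequences above concrete, here is a toy model of the control flow. It is not fastembed's code: `toy_download_model` and its four injected callables are hypothetical stand-ins for the HF cache pass, download_files_from_huggingface(), the local fast-* directory check, and the GCS download. The point is only to show which branches can reach the network depending on local_files_only.

```python
from typing import Callable, Optional


def toy_download_model(
    hf_source: Optional[str],
    local_files_only: bool,
    fetch_hf_cache: Callable[[str], Optional[str]],  # hypothetical: disk-only HF lookup
    fetch_hf_network: Callable[[str], str],          # hypothetical: needs network
    check_local_gcs_dir: Callable[[], Optional[str]],  # hypothetical: local fast-* dir
    download_gcs: Callable[[], str],                 # hypothetical: needs network
    retries: int = 3,
) -> Optional[str]:
    # Pass 1: HF local cache only; this never touches the network.
    if hf_source:
        path = fetch_hf_cache(hf_source)
        if path is not None:
            return path

    if local_files_only:
        retries = 1  # nothing can be downloaded, so retrying is pointless

    for _ in range(retries):
        # Network-bound HF path: skipped entirely when local_files_only is True.
        if hf_source and not local_files_only:
            try:
                return fetch_hf_network(hf_source)
            except OSError:  # EnvironmentError is an alias of OSError
                pass
        # GCS fallback: in local-only mode, only look for an existing directory.
        path = check_local_gcs_dir()
        if path is not None:
            return path
        if not local_files_only:
            try:
                return download_gcs()
            except OSError:
                pass
    return None
```

With local_files_only=True the loop runs once and neither network-bound callable is invoked; with it False, every retry calls both.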

The truthy value check (1, TRUE, YES, ON, case-insensitive) aligns with huggingface_hub's own parsing of HF_HUB_OFFLINE.
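A minimal sketch of that parsing, using the truthy set listed above. `hf_hub_offline_enabled` is a hypothetical helper name, not part of fastembed or huggingface_hub; it takes an optional mapping so it can be exercised without touching os.environ.

```python
import os
from typing import Mapping, Optional

# Truthy spellings listed in this PR, compared case-insensitively.
_TRUTHY_VALUES = frozenset({"1", "TRUE", "YES", "ON"})


def hf_hub_offline_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if HF_HUB_OFFLINE holds a truthy value (hypothetical helper)."""
    env = os.environ if environ is None else environ
    value = env.get("HF_HUB_OFFLINE")
    if value is None:
        return False  # unset means online mode
    return value.upper() in _TRUTHY_VALUES
```

Any other value, including "0", "false" or an empty string, leaves the online behavior unchanged.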

Context

This was discovered while deploying NeMo Guardrails on Red Hat OpenShift AI in corporate air-gapped environments. With HF_HUB_OFFLINE=1, fastembed's all-MiniLM-L6-v2 model (pre-cached in the container image during build) could not be loaded — the HF path raised an offline error, and the GCS fallback tried to download from storage.googleapis.com which was also blocked.


All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  • Does your submission pass the existing tests?
  • Have you added tests for your feature?
  • Have you installed pre-commit with pip3 install pre-commit and set up hooks with pre-commit install?

@coderabbitai

coderabbitai bot commented Mar 16, 2026

Walkthrough

The fastembed/common/model_management.py change adds a guard in download_model that reads the HF_HUB_OFFLINE environment variable. If that variable is set to a truthy value (e.g., "1", "TRUE", "YES", "ON") and local_files_only is not already true, the code forces local_files_only = True and updates kwargs["local_files_only"] = True, ensuring subsequent operations in download_model operate in offline/local mode.
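The guard described in the walkthrough can be sketched as a small standalone function. This is an illustration, not the merged code: `apply_offline_guard` is a hypothetical name, and per the PR description the real logic sits inline at the top of download_model().

```python
import os
from typing import Any, Dict

# Truthy spellings described in the walkthrough, compared case-insensitively.
_TRUTHY_VALUES = frozenset({"1", "TRUE", "YES", "ON"})


def apply_offline_guard(local_files_only: bool, kwargs: Dict[str, Any]) -> bool:
    """Force local-only mode when HF_HUB_OFFLINE is truthy (hypothetical helper).

    Returns the effective local_files_only flag. When the guard fires it also
    sets kwargs["local_files_only"] = True, so calls that receive **kwargs
    downstream see the same setting.
    """
    offline = os.environ.get("HF_HUB_OFFLINE", "").upper() in _TRUTHY_VALUES
    if offline and not local_files_only:
        local_files_only = True
        kwargs["local_files_only"] = True
    return local_files_only
```

Updating both the local flag and kwargs matters because the flag steers download_model()'s own branches, while kwargs is what gets forwarded to the download helpers.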

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title accurately summarizes the main change: respecting HF_HUB_OFFLINE to prevent unnecessary network calls in download_model().
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Description check: ✅ Passed. The PR description clearly relates to the changeset: it explains the fix for respecting the HF_HUB_OFFLINE environment variable in download_model(), with detailed context about the problem and solution.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Inline comments:
In `@fastembed/common/model_management.py`:
- Around lines 398-400: Update the HF_HUB_OFFLINE check to treat common truthy
variants (e.g., "1", "true", "yes", "on") case-insensitively instead of only
"1". Read env = os.environ.get("HF_HUB_OFFLINE", "0").lower(); if env is in
{"1", "true", "yes", "on"} and local_files_only is not already set, set both
local_files_only = True and kwargs["local_files_only"] = True (apply this where
local_files_only and kwargs["local_files_only"] are currently set).

📥 Commits

Reviewing files that changed from the base of the PR and between ea55268 and 51f5789.

📒 Files selected for processing (1)
  • fastembed/common/model_management.py

When HF_HUB_OFFLINE is set to a truthy value (1, true, yes, on),
download_model() should force local_files_only=True to avoid any
network calls. Currently, even with the local-cache-first pass (which
may fail due to missing metadata), the retry loop still calls
download_files_from_huggingface() without local_files_only, which
triggers model_info() — a network API call that immediately fails in
offline mode. This causes an unnecessary fallback to GCS download from
storage.googleapis.com.

By setting local_files_only=True when HF_HUB_OFFLINE is enabled:

1. The HF local cache pass works if the model is cached
2. The retry loop skips the network-dependent HF path entirely
3. retrieve_model_gcs() only checks for local fast-* directories
4. No network calls are attempted at all

The truthy value check aligns with huggingface_hub's own parsing of
HF_HUB_OFFLINE, which accepts "1", "true", "yes", "on" (case-insensitive).

This is critical for air-gapped / restricted environments where both
HuggingFace and Google Cloud Storage are unreachable.

Made-with: Cursor
@amasolov amasolov force-pushed the fix/respect-hf-hub-offline-in-download-model branch from 51f5789 to 5b9e072 on March 16, 2026 06:14
@joein
Member

joein commented Mar 23, 2026

Hey @amasolov

Thanks for pointing it out and creating a fix!

@joein
Member

joein commented Mar 23, 2026

Though, @amasolov, are you sure you were using the latest version of fastembed?

In the latest version there is this code:

        if hf_source:
            try:
                cache_kwargs = deepcopy(kwargs)
                cache_kwargs["local_files_only"] = True
                return Path(
                    cls.download_files_from_huggingface(
                        hf_source,
                        cache_dir=cache_dir,
                        extra_patterns=extra_patterns,
                        **cache_kwargs,
                    )
                )
            except Exception:
                pass
            finally:
                enable_progress_bars()

This tries to read the model from disk if it exists; if it does not, it falls back to the normal download process.
So even if you don't set HF_HUB_OFFLINE=1, fastembed should still avoid making any network calls when the model is already available on disk.
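The cache-first pattern in the excerpt above, reduced to a generic sketch. `cache_first` is a hypothetical helper, and `fetch` stands in for any downloader that accepts a `local_files_only` keyword (huggingface_hub's `snapshot_download` is one such function). Only the fallback call can touch the network.

```python
from copy import deepcopy
from typing import Any, Callable, Dict


def cache_first(fetch: Callable[..., str], repo_id: str, **kwargs: Any) -> str:
    """Try the local cache first, then fall back to a normal fetch (sketch)."""
    # Copy kwargs so the cache-only flag does not leak into the fallback call.
    cache_kwargs: Dict[str, Any] = deepcopy(kwargs)
    cache_kwargs["local_files_only"] = True
    try:
        return fetch(repo_id, **cache_kwargs)  # disk only, no network
    except Exception:
        pass  # not cached, or the cached copy is incomplete
    return fetch(repo_id, **kwargs)  # normal path; may reach the network
```

If the caller itself passes local_files_only=True, the fallback stays local as well, which is the behavior this PR enforces when HF_HUB_OFFLINE is set.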

I ran this simple snippet twice, once with an internet connection available (to download the model) and then with no internet connection at all, and it ran successfully:

from fastembed import TextEmbedding

te = TextEmbedding(cache_dir='./offline_models')

print(next(te.embed('qwerty')))

@joein joein self-requested a review March 23, 2026 15:39
@joein joein merged commit 52ebfba into qdrant:main Mar 23, 2026
6 checks passed
@joein
Member

joein commented Mar 23, 2026

Nevertheless, I still think it's a good thing to add. It is available as of fastembed 0.8.0.

@amasolov amasolov deleted the fix/respect-hf-hub-offline-in-download-model branch March 23, 2026 22:17

Development

Successfully merging this pull request may close these issues.

[Bug]: HF_HUB_OFFLINE=1 bypasses local HF cache and triggers GCS download

2 participants