new: add gemmaembedding-300m #592

Merged
joein merged 2 commits into main from gemma-embed
Mar 25, 2026

Conversation


@joein joein commented Dec 15, 2025

Add support for gemmaembedding-300m

There is a version of the Gemma embedding model with weights specifically designed for sentence transformers and retrieval:
https://huggingface.co/google/embeddinggemma-300m#usage

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)

It applies 2 projection layers after the pooling, and then puts the pooled result into sentence_embedding instead of last_hidden_state. sentence_embedding is the second output of the ONNX model, whereas we usually take the first one.

I expect more models to adopt this design, so this PR introduces a new class for it.
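The output-selection behavior described above can be pictured with a rough sketch (a simplified mock, not the actual fastembed code — MockSession stands in for an onnxruntime InferenceSession whose graph exposes last_hidden_state and sentence_embedding as its two outputs):

```python
# Simplified illustration of selecting the second ONNX output
# (sentence_embedding) instead of the usual first one (last_hidden_state).
# MockSession is a stand-in; the real code uses onnxruntime.InferenceSession.

class MockSession:
    def run(self, output_names, onnx_input):
        last_hidden_state = [[[0.1, 0.2], [0.3, 0.4]]]  # (batch, seq, dim)
        sentence_embedding = [[0.25, 0.35]]  # (batch, dim), pooled in-graph
        return [last_hidden_state, sentence_embedding]


def run_model(session, onnx_input):
    outputs = session.run(None, onnx_input)
    if len(outputs) < 2:  # guard against models without built-in pooling
        raise ValueError(f"expected >= 2 ONNX outputs, got {len(outputs)}")
    return outputs[1]  # the already pooled and normalized embeddings


print(run_model(MockSession(), {"input_ids": [[1, 2]]}))  # [[0.25, 0.35]]
```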

@joein joein requested a review from tbung December 15, 2025 12:29

coderabbitai bot commented Dec 15, 2025

📝 Walkthrough

Adds a new builtin ONNX-backed sentence embedding implementation and registers it: fastembed/text/builtin_sentence_embedding.py defines BuiltinSentenceEmbedding and BuiltinSentenceEmbeddingWorker and a supported_builtin_sentence_embedding_models entry for google/embeddinggemma-300m. Refactors ONNX execution by adding _run_model to OnnxTextModel and updates onnx_embed to use it. Registers BuiltinSentenceEmbedding in TextEmbedding.EMBEDDINGS_REGISTRY. Adds tests and canonical vectors plus query-embedding tests to tests/test_text_onnx_embeddings.py.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • hh-space-invader
🚥 Pre-merge checks: 1 passed, 2 failed (1 warning, 1 inconclusive)

❌ Failed checks

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 9.09%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title check — ❓ Inconclusive: the title references 'gemmaembedding-300m', which matches the model being added, but uses incorrect capitalization compared to the actual model name 'google/embeddinggemma-300m' and 'new' is vague. Resolution: consider a more precise title such as 'Add support for google/embeddinggemma-300m model'.

✅ Passed checks

  • Description check — ✅ Passed: the description clearly explains the addition of gemmaembedding-300m support, the model's design (projection layers, sentence_embedding output), and the rationale for introducing a new class.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
fastembed/text/onnx_text_model.py (1)

95-103: Good abstraction of ONNX execution into _run_model

Routing onnx_embed through _run_model and defaulting to the first ONNX output keeps existing behavior while giving subclasses a clean override point for multi-output models (like the new built-in sentence embeddings). The OnnxOutputContext wiring still exposes both attention_mask and input_ids after preprocessing, which is useful for downstream pooling logic. A brief docstring on _run_model (e.g., that it must return the batch-major tensor used as model_output) would make the contract clearer for future subclasses, but the current implementation looks correct.

Also applies to: 105-109
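The override point described in this comment can be pictured with a toy hierarchy (the class names mirror the PR, but this is an illustration, not the fastembed source):

```python
# Toy version of the override point: the base class defaults to the first
# ONNX output, and a subclass for models with built-in pooling overrides
# _run_model to take the second. Names mirror the PR but are illustrative.

class OnnxTextModelSketch:
    def _run_model(self, outputs):
        return outputs[0]  # default: last_hidden_state


class BuiltinSentenceEmbeddingSketch(OnnxTextModelSketch):
    def _run_model(self, outputs):
        return outputs[1]  # built-in pooling: sentence_embedding


outputs = ["last_hidden_state", "sentence_embedding"]
print(OnnxTextModelSketch()._run_model(outputs))             # last_hidden_state
print(BuiltinSentenceEmbeddingSketch()._run_model(outputs))  # sentence_embedding
```

The rest of the pipeline stays unchanged because only the output-selection step differs between the two classes.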

fastembed/text/builtin_sentence_embedding.py (1)

46-49: Silence Ruff ARG002 for unused kwargs in _post_process_onnx_output

The kwargs parameter is required to match the base signature, but it’s unused here, which triggers Ruff’s ARG002. You can keep the interface and satisfy Ruff by discarding or renaming the argument, e.g.:

-    def _post_process_onnx_output(
-        self, output: OnnxOutputContext, **kwargs: Any
-    ) -> Iterable[NumpyArray]:
-        return output.model_output
+    def _post_process_onnx_output(
+        self, output: OnnxOutputContext, **_kwargs: Any
+    ) -> Iterable[NumpyArray]:
+        return output.model_output

(or add an explicit del kwargs inside the method).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 685fd9b and e14914a.

📒 Files selected for processing (4)
  • fastembed/text/builtin_sentence_embedding.py (1 hunks)
  • fastembed/text/onnx_text_model.py (1 hunks)
  • fastembed/text/text_embedding.py (2 hunks)
  • tests/test_text_onnx_embeddings.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
fastembed/text/onnx_text_model.py (1)
fastembed/common/onnx_model.py (1)
  • OnnxOutputContext (20-23)
tests/test_text_onnx_embeddings.py (4)
fastembed/text/text_embedding.py (3)
  • TextEmbedding (17-230)
  • _list_supported_models (38-42)
  • query_embed (191-202)
fastembed/late_interaction/late_interaction_text_embedding.py (2)
  • _list_supported_models (45-49)
  • query_embed (141-153)
fastembed/text/onnx_embedding.py (1)
  • _list_supported_models (190-197)
tests/utils.py (1)
  • should_test_model (42-67)
fastembed/text/text_embedding.py (1)
fastembed/text/builtin_sentence_embedding.py (1)
  • BuiltinSentenceEmbedding (30-54)
🪛 Ruff (0.14.8)
fastembed/text/builtin_sentence_embedding.py

47-47: Unused method argument: kwargs

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Python 3.13.x on ubuntu-latest test
  • GitHub Check: Python 3.11.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on windows-latest test
  • GitHub Check: Python 3.10.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on macos-latest test
  • GitHub Check: Python 3.12.x on ubuntu-latest test
🔇 Additional comments (4)
fastembed/text/text_embedding.py (1)

11-11: Registry wiring for BuiltinSentenceEmbedding looks correct

Importing BuiltinSentenceEmbedding and adding it to EMBEDDINGS_REGISTRY before CustomTextEmbedding ensures google/embeddinggemma-300m is resolved to the new backend without affecting existing models or user-defined custom models.

Also applies to: 24-26

tests/test_text_onnx_embeddings.py (2)

71-87: Canonical vectors and prefixes for google/embeddinggemma-300m are consistent

Adding a CANONICAL_VECTOR_VALUES entry plus DOC_PREFIXES/QUERY_PREFIXES and CANONICAL_QUERY_VECTOR_VALUES for google/embeddinggemma-300m keeps the test suite aligned with the model’s documented prompt format. Using "title: none | text: " and "task: search result | query: " here mirrors the description in the model metadata, so both document and query tests exercise the recommended usage.
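The prompt templates mentioned here can be illustrated with a small sketch (the prefix strings come from the model card as quoted above; the helper functions are illustrative, not the actual test code):

```python
# Illustrative only: forming embeddinggemma-300m prompts from the document
# and query prefixes mentioned above. Not the actual fastembed test helpers.

DOC_PREFIX = "title: none | text: "
QUERY_PREFIX = "task: search result | query: "


def format_document(text: str) -> str:
    return DOC_PREFIX + text


def format_query(text: str) -> str:
    return QUERY_PREFIX + text


print(format_document("fastembed is light and fast"))
print(format_query("what is fastembed?"))
```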


138-140: Embedding and query tests correctly respect model-specific prefixes and env gating

Conditionally prefixing docs and queries based on DOC_PREFIXES/QUERY_PREFIXES ensures the new model is tested with its required prompt templates without impacting other models. Reusing should_test_model in test_query_embedding matches the existing CI/manual/local split and avoids pulling in the heavy google/embeddinggemma-300m model except when explicitly allowed, while still validating shape and canonical values when it is tested. The structure is consistent with test_embedding and looks solid.

Also applies to: 151-181

fastembed/text/builtin_sentence_embedding.py (1)

10-27: Builtin sentence embedding integration is structurally sound

The DenseModelDescription for google/embeddinggemma-300m plus the BuiltinSentenceEmbedding / BuiltinSentenceEmbeddingWorker pair cleanly plug this model into the existing ONNX text stack. Overriding _run_model to take the second ONNX output and letting _post_process_onnx_output pass output.model_output through means the built-in pooling/normalization stays inside the ONNX graph while the rest of the pipeline (batching, workers, registry) remains unchanged. Using threads=1 in the worker is also consistent with other single-session worker patterns.

Also applies to: 30-55, 57-69

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/test_text_onnx_embeddings.py (1)

151-166: Consider adding parametrization for consistency with test_embedding.

The test uses an empty string in should_test_model(model_desc, "", is_ci, is_manual) and filters by CANONICAL_QUERY_VECTOR_VALUES membership. While functional, this pattern differs from test_embedding which uses @pytest.mark.parametrize. Consider either:

  1. Adding @pytest.mark.parametrize("model_name", list(CANONICAL_QUERY_VECTOR_VALUES.keys())) for explicit control
  2. Adding a brief comment explaining the intentional deviation

This is a minor consistency observation; the test logic itself is correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_text_onnx_embeddings.py` around lines 151 - 166, The
test_query_embedding loop should be made consistent with test_embedding by
either adding pytest parametrization or documenting the intentional difference:
either add `@pytest.mark.parametrize`("model_name",
list(CANONICAL_QUERY_VECTOR_VALUES.keys())) to the test_query_embedding function
and use model_name instead of iterating TextEmbedding._list_supported_models(),
or add a short comment above the loop explaining why the test filters by
CANONICAL_QUERY_VECTOR_VALUES and calls should_test_model(model_desc, "", is_ci,
is_manual) with an empty query; update references in the test accordingly
(test_query_embedding, TextEmbedding._list_supported_models,
CANONICAL_QUERY_VECTOR_VALUES, should_test_model).
fastembed/text/builtin_sentence_embedding.py (2)

30-31: Consider expanding the docstring to document the output selection behavior.

The class has a critical behavioral difference from its parent: it selects ONNX output index [1] (sentence_embedding) instead of [0] (last_hidden_state). Documenting this would help future maintainers understand why this class exists.

📝 Suggested docstring enhancement
 class BuiltinSentenceEmbedding(OnnxTextEmbedding):
-    """Builtin Sentence Embedding uses built-in pooling and normalization of underlying onnx models"""
+    """Builtin Sentence Embedding uses built-in pooling and normalization of underlying ONNX models.
+    
+    Unlike the base OnnxTextEmbedding which uses the first ONNX output (last_hidden_state),
+    this class uses the second output (sentence_embedding) which contains the already-pooled
+    and normalized embeddings from models with built-in projection layers.
+    """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fastembed/text/builtin_sentence_embedding.py` around lines 30 - 31,
BuiltinSentenceEmbedding's docstring is too brief and doesn't explain that it
overrides the parent's output selection: update the BuiltinSentenceEmbedding
class docstring to state that, unlike OnnxTextEmbedding which uses ONNX output
index [0] (last_hidden_state), this class selects output index [1]
(sentence_embedding), and briefly describe the pooling/normalization behavior
and why the different output is chosen; reference the BuiltinSentenceEmbedding
and OnnxTextEmbedding names so maintainers can find the behavior difference
easily.

46-49: Consider adding a shape assertion for defensive validation.

The method returns output.model_output directly, trusting the ONNX model's built-in pooling/normalization. Adding a simple shape check would catch unexpected model output changes early.

🛡️ Proposed defensive validation
 def _post_process_onnx_output(
     self, output: OnnxOutputContext, **kwargs: Any
 ) -> Iterable[NumpyArray]:
+    if output.model_output.ndim != 2:
+        raise ValueError(
+            f"Expected 2D output from builtin sentence embedding, got shape: {output.model_output.shape}"
+        )
     return output.model_output
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fastembed/text/builtin_sentence_embedding.py` around lines 46 - 49, Add
defensive shape validation in _post_process_onnx_output: instead of returning
output.model_output directly, iterate the arrays in
OnnxOutputContext.model_output and assert their dimensionality and embedding
size match the model expectation (e.g., ndim is 1 or 2 and the last dimension
equals self.embedding_dim or the class field that defines embedding size). If a
mismatch occurs, raise a clear ValueError mentioning _post_process_onnx_output
and the offending array shape so unexpected ONNX output changes fail fast and
are easy to debug.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9682476e-09c7-45cd-95f2-26e03a472b3a

📥 Commits

Reviewing files that changed from the base of the PR and between e14914a and eafcbca.

📒 Files selected for processing (4)
  • fastembed/text/builtin_sentence_embedding.py
  • fastembed/text/onnx_text_model.py
  • fastembed/text/text_embedding.py
  • tests/test_text_onnx_embeddings.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • fastembed/text/onnx_text_model.py

Comment on lines +51 to +54
    def _run_model(
        self, onnx_input: dict[str, Any], onnx_output_names: list[str] | None = None
    ) -> NumpyArray:
        return self.model.run(onnx_output_names, onnx_input)[1]  # type: ignore[union-attr]
⚠️ Potential issue | 🟡 Minor

Hardcoded output index [1] is brittle and lacks validation.

The method assumes the ONNX model always has at least two outputs with sentence_embedding at index 1. If the model's output structure changes or differs across versions, this will raise an IndexError without a clear diagnostic message.

Consider adding validation or a more descriptive error:

🛡️ Proposed fix with validation
 def _run_model(
     self, onnx_input: dict[str, Any], onnx_output_names: list[str] | None = None
 ) -> NumpyArray:
-    return self.model.run(onnx_output_names, onnx_input)[1]  # type: ignore[union-attr]
+    result = self.model.run(onnx_output_names, onnx_input)  # type: ignore[union-attr]
+    if len(result) < 2:
+        raise ValueError(
+            f"Expected at least 2 ONNX outputs (last_hidden_state, sentence_embedding), "
+            f"but got {len(result)}. Ensure the model has built-in pooling."
+        )
+    return result[1]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fastembed/text/builtin_sentence_embedding.py` around lines 51 - 54, The
_run_model currently returns self.model.run(...)[1] which assumes a second
output exists and contains sentence embeddings; instead validate the outputs and
pick the correct tensor: if onnx_output_names is provided, use its index or name
to extract the desired output; otherwise inspect the actual output names
returned by self.model (or the model metadata) to locate "sentence_embedding"
(or fall back to the last/only output if that's the intended behavior), and
raise a clear ValueError indicating available outputs when the expected output
name/index is missing; update the _run_model implementation to perform these
checks before accessing an index and include the symbols _run_model, model.run,
and "sentence_embedding" in your logic and error message.

@joein joein merged commit 87678dd into main Mar 25, 2026
13 checks passed
@joein joein deleted the gemma-embed branch March 25, 2026 08:22