Skip to content

Replace OpenAI embeddings with local transformers.js embeddings#5

Open
JoeNerdan wants to merge 2 commits intoappwrite:mainfrom
JoeNerdan:local-transformers-embeddings
Open

Replace OpenAI embeddings with local transformers.js embeddings#5
JoeNerdan wants to merge 2 commits intoappwrite:mainfrom
JoeNerdan:local-transformers-embeddings

Conversation

@JoeNerdan
Copy link

@JoeNerdan JoeNerdan commented Feb 2, 2026

Summary

This PR replaces the OpenAI embedding API with local transformers.js embeddings, enabling fully local self-hosting without requiring an external API key.

  • Replace OpenAI's text-embedding-3-small with Xenova/all-mpnet-base-v2 via @xenova/transformers
  • Update scripts/init-vector-store.ts to use local embedding pipeline
  • Update src/lib/utils/query-vector-store.ts to use local embedding pipeline
  • Change vector dimension from 1536 to 768 (mpnet-base-v2 output size)

Problem

Self-hosting currently requires an OpenAI API key for generating embeddings, which:

  • Adds external dependency and potential point of failure
  • Incurs ongoing API costs
  • Requires users to create an OpenAI account and manage API keys

Solution

Use the @xenova/transformers library to run the all-mpnet-base-v2 sentence transformer model locally. This model runs entirely in the Node.js runtime with no external API calls.

Trade-offs

Aspect Before (OpenAI) After (Local)
Vector dimensions 1536 768
API key required Yes No
External dependency Yes No
API costs Per-token pricing Free
Model size Large cloud model Local model

The all-mpnet-base-v2 model provides excellent semantic similarity for documentation search use cases while being completely free to run locally.

Test plan

  • Verified the init-vector-store script runs successfully and generates embeddings
  • Verified search queries return semantically relevant results
  • Confirmed no OpenAI API key is needed in environment

🤖 Generated with Claude Code

Use Xenova/all-MiniLM-L6-v2 model via @xenova/transformers instead of
OpenAI's text-embedding-3-small API. This removes the requirement for
an OpenAI API key, making self-hosting fully local without external
dependencies.

Changes:
- Add @xenova/transformers dependency
- Update init-vector-store.ts to use local embeddings pipeline
- Update query-vector-store.ts to use local embeddings pipeline
- Change vector dimension from 1536 to 384 (MiniLM-L6-v2 output size)

Trade-off: MiniLM-L6-v2 is a smaller model than OpenAI's embedding model,
but it provides good quality for semantic search while being completely
free and running locally.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Feb 2, 2026

Walkthrough

The PR adds "@xenova/transformers": "^2.17.2" to package.json and replaces OpenAI-based embedding calls with a local Xenova feature-extraction pipeline ("Xenova/all-mpnet-base-v2") in scripts/init-vector-store.ts and src/lib/utils/query-vector-store.ts. OPENAI_API_KEY checks and OpenAI-specific imports/calls are removed. Embedding generation now uses batching (default BATCH_SIZE=50), computes embeddings via the Xenova pipeline with mean pooling and normalization, and the vector dimension is updated to 768. The scripts/init-vector-store.ts function signature for upsertDocsPageEmbeddings now accepts embeddings: number[][].

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: replacing OpenAI embeddings with local transformers.js embeddings, which is the core objective of all file modifications.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@package.json`:
- Line 30: The package.json currently pins "@xenova/transformers": "^2.17.2"
which lacks Bun support and pulls in onnxruntime-node causing runtime failures;
update the dependency to the Transformers.js v3 package (published as
"@huggingface/transformers" v3+) by replacing the "@xenova/transformers" entry
with the appropriate "@huggingface/transformers" v3 version (and bump any
lockfile), then scan code that imports or calls pipeline or model constructors
(e.g., any uses of pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2") or
direct imports from "@xenova/transformers") and adapt import paths/names to the
new package if needed so Bun-compatible runtime paths are used.

In `@scripts/init-vector-store.ts`:
- Line 4: The pipeline import currently relies on the default cache location;
import env from "@xenova/transformers" and explicitly set env.cacheDir to a
persistent path before calling pipeline (e.g., set env.cacheDir =
process.env.MY_CACHE_PATH or a CI/container path) so the model download/cache
does not default to ".cache"; also add a short comment near the import noting
Bun incompatibility with `@xenova/transformers` (onnxruntime-node backend) and
consider migrating to `@huggingface/transformers` v3+ if Bun runtime support is
required.
🧹 Nitpick comments (2)
src/lib/utils/query-vector-store.ts (1)

17-22: Add a defensive embedding-size check (384) to catch configuration mismatches early.
The queryVectorStore function generates embeddings without validating the output size. Since the vector store index is created with dimension 384 (hardcoded for Xenova/all-MiniLM-L6-v2), a mismatch would fail silently at vectorStore.query(). An explicit check makes the constraint clear and catches misconfiguration immediately if the model or pooling settings change.

Proposed change
     const output = await embeddingPipeline(query, { pooling: "mean", normalize: true });
-    queryEmbedding = Array.from(output.data);
+    const vector = Array.from(output.data);
+    if (vector.length !== 384) {
+      throw new Error(`Unexpected embedding size: ${vector.length}`);
+    }
+    queryEmbedding = vector;

Consider applying the same validation in embedDocsPage() (scripts/init-vector-store.ts) for consistency.

scripts/init-vector-store.ts (1)

30-30: Add documentation about embedding model upgrade impact.

If the embedding model changes in future upgrades, operators must re-run the initialization script to recreate indexes with the new embedding dimension. Add a note to upgrade runbooks or deployment docs clarifying this requirement to prevent silent failures from dimension mismatches.

"@mastra/mcp": "^0.10.12",
"@mastra/rag": "^1.1.0",
"@modelcontextprotocol/sdk": "^1.17.4",
"@xenova/transformers": "^2.17.2",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

@xenova/transformers 2.17.2 Node.js 20 compatibility requirements

💡 Result:

For @xenova/transformers@2.17.2 on Node.js 20, the practical compatibility requirements are:

  • Node.js: 18+ (so Node 20 is supported) [1]

  • npm: 9+ [1]

  • Runtime/backend deps bundled by the package:

    • Uses onnxruntime-web@1.14.0 by default [2]
    • Depends on sharp@^0.32.0 (native module; your platform must be able to install/resolve it) [2]
    • Optional (Node-only, faster): onnxruntime-node@1.14.0 (optionalDependency; install it explicitly if you want the native CPU backend) [2]
  • Module system: Works with both ESM and CommonJS in Node (you choose based on your project setup) [1]

Sources: [1] [2]


🌐 Web query:

@xenova/transformers feature-extraction pipeline Xenova/all-MiniLM-L6-v2 support

💡 Result:

Yes—Xenova/all-MiniLM-L6-v2 is explicitly packaged with ONNX weights for Transformers.js / @xenova/transformers, and it’s intended to be used via the feature-extraction pipeline (to produce sentence embeddings). It supports common embedding options like mean pooling and normalization. [1]

import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const texts = ['This is an example sentence', 'Each sentence is converted'];
const embeddings = await extractor(texts, { pooling: 'mean', normalize: true });

// embeddings is a Tensor (typically 384-dim per text for this model)
console.log(embeddings.dims);      // e.g. [2, 384]
console.log(embeddings.tolist());  // nested JS arrays

Sources: [1]


🌐 Web query:

@xenova/transformers 2.17.2 Bun support runtime

💡 Result:

@xenova/transformers@2.17.2 does not have explicit/official “Bun runtime support”. In v2, the ONNX backend selector is essentially “Node vs Browser”: when it thinks it’s running in Node, it uses onnxruntime-node; in the browser it uses onnxruntime-web. (huggingface.co)

Because Bun is Node-compatible, v2 typically goes down the onnxruntime-node path, and there are known Bun issues when running the basic pipeline example (errors during onnxruntime-node session construction, fallback to WASM, and the process not terminating). (github.com)

If you need real Bun support, the project added it in Transformers.js v3 (published as @huggingface/transformers, not @xenova/transformers). (huggingface.co)

Citations:


🏁 Script executed:

# Check if there's any additional context in package.json or related files
head -n 50 package.json | tail -n 30

Repository: appwrite/mcp-for-docs

Length of output: 787


@xenova/transformers 2.17.2 requires @huggingface/transformers v3+ for Bun support.

Node 20.9 is supported (requires Node 18+), and pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2") is available. However, v2.17.2 does not support Bun—it attempts to use onnxruntime-node in Bun, causing session construction failures and process termination issues. Bun support was added in Transformers.js v3 (published as @huggingface/transformers). If Bun compatibility is required, upgrade to the v3 package.

🤖 Prompt for AI Agents
In `@package.json` at line 30, The package.json currently pins
"@xenova/transformers": "^2.17.2" which lacks Bun support and pulls in
onnxruntime-node causing runtime failures; update the dependency to the
Transformers.js v3 package (published as "@huggingface/transformers" v3+) by
replacing the "@xenova/transformers" entry with the appropriate
"@huggingface/transformers" v3 version (and bump any lockfile), then scan code
that imports or calls pipeline or model constructors (e.g., any uses of
pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2") or direct imports from
"@xenova/transformers") and adapt import paths/names to the new package if
needed so Bun-compatible runtime paths are used.

import { getContent } from "../src/lib/utils/content";
import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";
import { pipeline } from "@xenova/transformers";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n scripts/init-vector-store.ts

Repository: appwrite/mcp-for-docs

Length of output: 5063


🌐 Web query:

transformers.js xenova cache environment variable TRANSFORMERS_CACHE

💡 Result:

TRANSFORMERS_CACHE is a Python transformers environment variable (and is deprecated there in favor of HF_HUB_CACHE / HF_HOME)—it is not the way you configure caching in Transformers.js. [1]

In Transformers.js (@xenova/transformers / @huggingface/transformers), you set the cache location via the runtime config object:

import { env } from '@xenova/transformers';
env.cacheDir = '/path/to/cache/';   // default is "./.cache"

Transformers.js also lets you choose the caching mechanism (filesystem vs browser cache) via flags like env.useFSCache / env.useBrowserCache. [2]

In Node environments you’ll typically see models cached under a .cache directory (often within the package folder, depending on how/where it’s bundled). [3]

Sources: [1] [2] [3]


🌐 Web query:

@xenova/transformers Bun runtime support compatibility

💡 Result:

  • Official Bun support: Bun compatibility is officially supported in Transformers.js v3, published on NPM as @huggingface/transformers (the project moved from @xenova/transformers used in v1/v2). [1]
  • Status of @xenova/transformers on Bun: @xenova/transformers (v1/v2) can run on Bun in some cases, but there have been Bun-specific runtime issues around the onnxruntime-node backend (errors during session creation, fallback to WASM, and the Bun process not exiting cleanly), and the related Bun issue was closed “not planned.” [2]
  • Recommendation: If you need Bun today, prefer migrating to @huggingface/transformers (v3+) rather than relying on @xenova/transformers. [1]

Sources:
[1] Hugging Face blog: Transformers.js v3 … Node.js, Deno, and Bun compatibility; new NPM package @huggingface/transformers
[2] Bun GitHub issue #11825: error when running simple example from @xenova/transformers


Configure model cache location explicitly; note Bun compatibility issues with current library.

The pipeline downloads and caches the model, which defaults to .cache directory. In CI/containers, configure the cache path using env.cacheDir from @xenova/transformers (not TRANSFORMERS_CACHE/XENOVA_CACHE, which don't apply to Transformers.js):

import { env } from '@xenova/transformers';
env.cacheDir = '/path/to/persistent/cache';

Bun runtime concern: @xenova/transformers (v1/v2) has known incompatibilities with Bun, particularly around the onnxruntime-node backend. Official Bun support is only available in @huggingface/transformers v3+. If Bun compatibility is required, consider migrating to that package.

Also applies to: 6-7

🤖 Prompt for AI Agents
In `@scripts/init-vector-store.ts` at line 4, The pipeline import currently relies
on the default cache location; import env from "@xenova/transformers" and
explicitly set env.cacheDir to a persistent path before calling pipeline (e.g.,
set env.cacheDir = process.env.MY_CACHE_PATH or a CI/container path) so the
model download/cache does not default to ".cache"; also add a short comment near
the import noting Bun incompatibility with `@xenova/transformers`
(onnxruntime-node backend) and consider migrating to `@huggingface/transformers`
v3+ if Bun runtime support is required.

- Upgraded from all-MiniLM-L6-v2 (384 dims) to all-mpnet-base-v2 (768 dims)
- Better semantic understanding with minimal latency impact (~40ms vs ~15ms per query)
- Still fully local, no API key required

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/init-vector-store.ts (1)

14-21: ⚠️ Potential issue | 🟡 Minor

Empty catch block silently swallows errors.

The empty catch block on line 21 hides any errors from deleteIndex. If the index doesn't exist, that may be expected, but other errors (connection issues, permission problems) would be silently ignored.

🛡️ Proposed fix to log non-expected errors
   try {
     console.log("Deleting index contents...");
     await vectorStore.deleteIndex({
       indexName: "docs",
     });
     console.log("Index contents deleted");
-  } catch (error) {}
+  } catch (error) {
+    // Index may not exist on first run - only log unexpected errors
+    if (error instanceof Error && !error.message.includes("not found")) {
+      console.warn("Warning deleting index:", error.message);
+    }
+  }
🧹 Nitpick comments (5)
src/lib/utils/query-vector-store.ts (3)

9-10: Top-level await blocks module loading during model initialization.

The pipeline initialization happens at module load time. On first run, this downloads the model (~90MB for mpnet-base-v2), blocking any imports of this module until complete. Consider lazy initialization to avoid blocking the application startup.

♻️ Proposed lazy initialization pattern
-// Local embedding model - no API key needed
-const embeddingPipeline = await pipeline("feature-extraction", "Xenova/all-mpnet-base-v2");
+// Local embedding model - no API key needed
+let embeddingPipeline: Awaited<ReturnType<typeof pipeline>> | null = null;
+
+async function getEmbeddingPipeline() {
+  if (!embeddingPipeline) {
+    embeddingPipeline = await pipeline("feature-extraction", "Xenova/all-mpnet-base-v2");
+  }
+  return embeddingPipeline;
+}

Then in queryVectorStore:

-    const output = await embeddingPipeline(query, { pooling: "mean", normalize: true });
+    const pipe = await getEmbeddingPipeline();
+    const output = await pipe(query, { pooling: "mean", normalize: true });

3-10: Configure model cache location explicitly for CI/container environments.

Similar to the init script, the pipeline downloads and caches the model to a default .cache directory. For reproducible builds in CI/containers, configure the cache path explicitly using env.cacheDir:

import { pipeline, env } from "@xenova/transformers";
env.cacheDir = process.env.TRANSFORMERS_CACHE_DIR || "./.cache";

28-38: Consider removing the as any type casts.

The as any casts on lines 30 and 34 suppress TypeScript's type checking. If the LibSQLVector types don't match, consider extending the types or using a more specific cast rather than any.

scripts/init-vector-store.ts (2)

23-24: Hardcoded 3-second delay lacks explanation.

The purpose of this delay is unclear. If it's waiting for index deletion to propagate, consider documenting why or implementing a proper ready-check instead.

📝 Add comment explaining the delay
-  // Wait 3 seconds
-  await new Promise((resolve) => setTimeout(resolve, 3000));
+  // Wait for index deletion to fully propagate before recreating
+  // TODO: Replace with proper ready-check if LibSQLVector supports it
+  await new Promise((resolve) => setTimeout(resolve, 3000));

111-118: Batch chunks in a single pipeline call for better throughput.

Chunks are processed sequentially in a for-loop. For documents with multiple chunks, pass all texts as an array to embeddingPipeline@xenova/transformers supports batched inputs and returns a Tensor where each row is an embedding.

The suggested implementation should use the Tensor's .tolist() method instead of manual slicing:

♻️ Option: Batch chunks with simpler handling
 async function embedDocsPage(chunks: Awaited<ReturnType<MDocument["chunk"]>>) {
-  const embeddings: number[][] = [];
-  for (const chunk of chunks) {
-    const output = await embeddingPipeline(chunk.text, { pooling: "mean", normalize: true });
-    embeddings.push(Array.from(output.data));
-  }
-  return embeddings;
+  const texts = chunks.map(chunk => chunk.text);
+  const output = await embeddingPipeline(texts, { pooling: "mean", normalize: true });
+  return output.tolist();
 }

@ChiragAgg5k
Copy link
Member

@JoeNerdan it would be better to allow option to use local embeddings instead of replacing openai altogether

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants