Replace OpenAI embeddings with local transformers.js embeddings #5
JoeNerdan wants to merge 2 commits into appwrite:main
Conversation
Use the Xenova/all-MiniLM-L6-v2 model via @xenova/transformers instead of OpenAI's text-embedding-3-small API. This removes the requirement for an OpenAI API key, making self-hosting fully local without external dependencies.

Changes:
- Add @xenova/transformers dependency
- Update init-vector-store.ts to use the local embeddings pipeline
- Update query-vector-store.ts to use the local embeddings pipeline
- Change vector dimension from 1536 to 384 (MiniLM-L6-v2 output size)

Trade-off: MiniLM-L6-v2 is a smaller model than OpenAI's embedding model, but it provides good quality for semantic search while being completely free and running locally.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
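For reference, a minimal sketch of the resulting call path (the query string is illustrative; the pipeline options mirror the ones described in this PR):

```ts
import { pipeline } from "@xenova/transformers";

// Downloads the model on first run and caches it locally; no API key involved.
const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

const output = await extractor("How do I create a database?", {
  pooling: "mean",   // average token vectors into a single sentence embedding
  normalize: true,   // unit-length output, so cosine similarity equals dot product
});

const embedding: number[] = Array.from(output.data);
console.log(embedding.length); // 384 for MiniLM-L6-v2
```

The 384-dimension output is why the vector index dimension changes from 1536.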
Walkthrough

The PR adds "@xenova/transformers": "^2.17.2" to package.json and replaces OpenAI-based embedding calls with a local Xenova feature-extraction pipeline ("Xenova/all-mpnet-base-v2") in scripts/init-vector-store.ts and src/lib/utils/query-vector-store.ts. OPENAI_API_KEY checks and OpenAI-specific imports/calls are removed. Embedding generation now uses batching (default BATCH_SIZE=50), computes embeddings via the Xenova pipeline with mean pooling and normalization, and the vector dimension is updated to 768. The scripts/init-vector-store.ts function signature for upsertDocsPageEmbeddings now accepts embeddings: number[][].

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes

🚥 Pre-merge checks: 2 passed, 1 failed (warning)
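A rough sketch of the batched generation the walkthrough describes (BATCH_SIZE matches the stated default; the helper name is illustrative, and the actual implementation lives in scripts/init-vector-store.ts):

```ts
import { pipeline } from "@xenova/transformers";

const BATCH_SIZE = 50; // default batch size mentioned in the walkthrough
const embeddingPipeline = await pipeline("feature-extraction", "Xenova/all-mpnet-base-v2");

// Hypothetical helper: embed texts in batches to bound memory use.
async function embedInBatches(texts: string[]): Promise<number[][]> {
  const embeddings: number[][] = [];
  for (let i = 0; i < texts.length; i += BATCH_SIZE) {
    const batch = texts.slice(i, i + BATCH_SIZE);
    // The pipeline accepts an array of inputs and returns a [batch, 768] tensor.
    const output = await embeddingPipeline(batch, { pooling: "mean", normalize: true });
    embeddings.push(...(output.tolist() as number[][]));
  }
  return embeddings;
}
```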
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@package.json`:
- Line 30: The package.json currently pins "@xenova/transformers": "^2.17.2"
which lacks Bun support and pulls in onnxruntime-node causing runtime failures;
update the dependency to the Transformers.js v3 package (published as
"@huggingface/transformers" v3+) by replacing the "@xenova/transformers" entry
with the appropriate "@huggingface/transformers" v3 version (and bump any
lockfile), then scan code that imports or calls pipeline or model constructors
(e.g., any uses of pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2") or
direct imports from "@xenova/transformers") and adapt import paths/names to the
new package if needed so Bun-compatible runtime paths are used.
In `@scripts/init-vector-store.ts`:
- Line 4: The pipeline import currently relies on the default cache location;
import env from "@xenova/transformers" and explicitly set env.cacheDir to a
persistent path before calling pipeline (e.g., set env.cacheDir =
process.env.MY_CACHE_PATH or a CI/container path) so the model download/cache
does not default to ".cache"; also add a short comment near the import noting
Bun incompatibility with `@xenova/transformers` (onnxruntime-node backend) and
consider migrating to `@huggingface/transformers` v3+ if Bun runtime support is
required.
🧹 Nitpick comments (2)
src/lib/utils/query-vector-store.ts (1)
17-22: Add a defensive embedding-size check (384) to catch configuration mismatches early.

The queryVectorStore function generates embeddings without validating the output size. Since the vector store index is created with dimension 384 (hardcoded for Xenova/all-MiniLM-L6-v2), a mismatch would fail silently at vectorStore.query(). An explicit check makes the constraint clear and catches misconfiguration immediately if the model or pooling settings change.

Proposed change:

```diff
   const output = await embeddingPipeline(query, { pooling: "mean", normalize: true });
-  queryEmbedding = Array.from(output.data);
+  const vector = Array.from(output.data);
+  if (vector.length !== 384) {
+    throw new Error(`Unexpected embedding size: ${vector.length}`);
+  }
+  queryEmbedding = vector;
```

Consider applying the same validation in embedDocsPage() (scripts/init-vector-store.ts) for consistency.

scripts/init-vector-store.ts (1)

30-30: Add documentation about embedding model upgrade impact.

If the embedding model changes in future upgrades, operators must re-run the initialization script to recreate indexes with the new embedding dimension. Add a note to upgrade runbooks or deployment docs clarifying this requirement to prevent silent failures from dimension mismatches.
| "@mastra/mcp": "^0.10.12", | ||
| "@mastra/rag": "^1.1.0", | ||
| "@modelcontextprotocol/sdk": "^1.17.4", | ||
| "@xenova/transformers": "^2.17.2", |
🧩 Analysis chain
🌐 Web query:
@xenova/transformers 2.17.2 Node.js 20 compatibility requirements
💡 Result:
For @xenova/transformers@2.17.2 on Node.js 20, the practical compatibility requirements are:

- Node.js: 18+ (so Node 20 is supported) [1]
- npm: 9+ [1]
- Runtime/backend deps bundled by the package:
  - Uses onnxruntime-web@1.14.0 by default [2]
  - Depends on sharp@^0.32.0 (native module; your platform must be able to install/resolve it) [2]
  - Optional (Node-only, faster): onnxruntime-node@1.14.0 (optionalDependency; install it explicitly if you want the native CPU backend) [2]
- Module system: Works with both ESM and CommonJS in Node (you choose based on your project setup) [1]

Sources: [1] [2]
🌐 Web query:
@xenova/transformers feature-extraction pipeline Xenova/all-MiniLM-L6-v2 support
💡 Result:
Yes—Xenova/all-MiniLM-L6-v2 is explicitly packaged with ONNX weights for Transformers.js / @xenova/transformers, and it’s intended to be used via the feature-extraction pipeline (to produce sentence embeddings). It supports common embedding options like mean pooling and normalization. [1]
```js
import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const texts = ['This is an example sentence', 'Each sentence is converted'];
const embeddings = await extractor(texts, { pooling: 'mean', normalize: true });

// embeddings is a Tensor (typically 384-dim per text for this model)
console.log(embeddings.dims);     // e.g. [2, 384]
console.log(embeddings.tolist()); // nested JS arrays
```

Sources: [1]
🌐 Web query:
@xenova/transformers 2.17.2 Bun support runtime
💡 Result:
@xenova/transformers@2.17.2 does not have explicit/official “Bun runtime support”. In v2, the ONNX backend selector is essentially “Node vs Browser”: when it thinks it’s running in Node, it uses onnxruntime-node; in the browser it uses onnxruntime-web. (huggingface.co)
Because Bun is Node-compatible, v2 typically goes down the onnxruntime-node path, and there are known Bun issues when running the basic pipeline example (errors during onnxruntime-node session construction, fallback to WASM, and the process not terminating). (github.com)
If you need real Bun support, the project added it in Transformers.js v3 (published as @huggingface/transformers, not @xenova/transformers). (huggingface.co)
Citations:
- 1: https://huggingface.co/docs/transformers.js/v2.17.2/en/api/backends/onnx?utm_source=openai
- 2: error when running simple example from @xenova/transformers (oven-sh/bun#11825)
- 3: https://huggingface.co/blog/transformersjs-v3?utm_source=openai
🏁 Script executed:
```shell
# Check if there's any additional context in package.json or related files
head -n 50 package.json | tail -n 30
```

Repository: appwrite/mcp-for-docs
Length of output: 787
@xenova/transformers 2.17.2 lacks Bun support; Transformers.js v3 (@huggingface/transformers) is required for Bun.
Node 20.9 is supported (requires Node 18+), and pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2") is available. However, v2.17.2 does not support Bun—it attempts to use onnxruntime-node in Bun, causing session construction failures and process termination issues. Bun support was added in Transformers.js v3 (published as @huggingface/transformers). If Bun compatibility is required, upgrade to the v3 package.
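If the v3 migration is taken up, the change should in principle be limited to the dependency and the import path, since Transformers.js v3 keeps the same pipeline API; a sketch, to be verified against the v3 docs:

```ts
// package.json: replace "@xenova/transformers" with "@huggingface/transformers" (v3+)
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
const output = await extractor("hello world", { pooling: "mean", normalize: true });
console.log(output.dims); // expected [1, 384]
```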
```ts
import { getContent } from "../src/lib/utils/content";
import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";
import { pipeline } from "@xenova/transformers";
```
🧩 Analysis chain
🏁 Script executed:
```shell
cat -n scripts/init-vector-store.ts
```

Repository: appwrite/mcp-for-docs
Length of output: 5063
🌐 Web query:
transformers.js xenova cache environment variable TRANSFORMERS_CACHE
💡 Result:
TRANSFORMERS_CACHE is a Python transformers environment variable (and is deprecated there in favor of HF_HUB_CACHE / HF_HOME)—it is not the way you configure caching in Transformers.js. [1]
In Transformers.js (@xenova/transformers / @huggingface/transformers), you set the cache location via the runtime config object:
```js
import { env } from '@xenova/transformers';
env.cacheDir = '/path/to/cache/'; // default is "./.cache"
```

Transformers.js also lets you choose the caching mechanism (filesystem vs browser cache) via flags like env.useFSCache / env.useBrowserCache. [2]
In Node environments you’ll typically see models cached under a .cache directory (often within the package folder, depending on how/where it’s bundled). [3]
Sources: [1] [2] [3]
🌐 Web query:
@xenova/transformers Bun runtime support compatibility
💡 Result:
- Official Bun support: Bun compatibility is officially supported in Transformers.js v3, published on NPM as @huggingface/transformers (the project moved from @xenova/transformers used in v1/v2). [1]
- Status of @xenova/transformers on Bun: @xenova/transformers (v1/v2) can run on Bun in some cases, but there have been Bun-specific runtime issues around the onnxruntime-node backend (errors during session creation, fallback to WASM, and the Bun process not exiting cleanly), and the related Bun issue was closed "not planned." [2]
- Recommendation: If you need Bun today, prefer migrating to @huggingface/transformers (v3+) rather than relying on @xenova/transformers. [1]
Sources:
[1] Hugging Face blog: Transformers.js v3 … Node.js, Deno, and Bun compatibility; new NPM package @huggingface/transformers
[2] Bun GitHub issue #11825: error when running simple example from @xenova/transformers
Configure model cache location explicitly; note Bun compatibility issues with current library.
The pipeline downloads and caches the model, which defaults to the .cache directory. In CI/containers, configure the cache path using env.cacheDir from @xenova/transformers (not TRANSFORMERS_CACHE/XENOVA_CACHE, which don't apply to Transformers.js):

```js
import { env } from '@xenova/transformers';
env.cacheDir = '/path/to/persistent/cache';
```

Bun runtime concern: @xenova/transformers (v1/v2) has known incompatibilities with Bun, particularly around the onnxruntime-node backend. Official Bun support is only available in @huggingface/transformers v3+. If Bun compatibility is required, consider migrating to that package.
Also applies to: 6-7
- Upgraded from all-MiniLM-L6-v2 (384 dims) to all-mpnet-base-v2 (768 dims)
- Better semantic understanding with minimal latency impact (~40ms vs ~15ms per query)
- Still fully local, no API key required

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/init-vector-store.ts (1)
14-21: ⚠️ Potential issue | 🟡 Minor

Empty catch block silently swallows errors.

The empty catch block on line 21 hides any errors from deleteIndex. If the index doesn't exist, that may be expected, but other errors (connection issues, permission problems) would be silently ignored.

🛡️ Proposed fix to log non-expected errors:

```diff
   try {
     console.log("Deleting index contents...");
     await vectorStore.deleteIndex({
       indexName: "docs",
     });
     console.log("Index contents deleted");
-  } catch (error) {}
+  } catch (error) {
+    // Index may not exist on first run - only log unexpected errors
+    if (error instanceof Error && !error.message.includes("not found")) {
+      console.warn("Warning deleting index:", error.message);
+    }
+  }
```
🧹 Nitpick comments (5)
src/lib/utils/query-vector-store.ts (3)
9-10: Top-level await blocks module loading during model initialization.

The pipeline initialization happens at module load time. On first run, this downloads the model (~90MB for mpnet-base-v2), blocking any imports of this module until complete. Consider lazy initialization to avoid blocking the application startup.

♻️ Proposed lazy initialization pattern:

```diff
-// Local embedding model - no API key needed
-const embeddingPipeline = await pipeline("feature-extraction", "Xenova/all-mpnet-base-v2");
+// Local embedding model - no API key needed
+let embeddingPipeline: Awaited<ReturnType<typeof pipeline>> | null = null;
+
+async function getEmbeddingPipeline() {
+  if (!embeddingPipeline) {
+    embeddingPipeline = await pipeline("feature-extraction", "Xenova/all-mpnet-base-v2");
+  }
+  return embeddingPipeline;
+}
```

Then in queryVectorStore:

```diff
-  const output = await embeddingPipeline(query, { pooling: "mean", normalize: true });
+  const pipe = await getEmbeddingPipeline();
+  const output = await pipe(query, { pooling: "mean", normalize: true });
```

3-10: Configure model cache location explicitly for CI/container environments.

Similar to the init script, the pipeline downloads and caches the model to a default .cache directory. For reproducible builds in CI/containers, configure the cache path explicitly using env.cacheDir:

```ts
import { pipeline, env } from "@xenova/transformers";

env.cacheDir = process.env.TRANSFORMERS_CACHE_DIR || "./.cache";
```

28-38: Consider removing the as any type casts.

The as any casts on lines 30 and 34 suppress TypeScript's type checking. If the LibSQLVector types don't match, consider extending the types or using a more specific cast rather than any.

scripts/init-vector-store.ts (2)

23-24: Hardcoded 3-second delay lacks explanation.

The purpose of this delay is unclear. If it's waiting for index deletion to propagate, consider documenting why or implementing a proper ready-check instead.

📝 Add comment explaining the delay:

```diff
-  // Wait 3 seconds
-  await new Promise((resolve) => setTimeout(resolve, 3000));
+  // Wait for index deletion to fully propagate before recreating
+  // TODO: Replace with proper ready-check if LibSQLVector supports it
+  await new Promise((resolve) => setTimeout(resolve, 3000));
```

111-118: Batch chunks in a single pipeline call for better throughput.

Chunks are processed sequentially in a for-loop. For documents with multiple chunks, pass all texts as an array to embeddingPipeline; @xenova/transformers supports batched inputs and returns a Tensor where each row is an embedding. The suggested implementation should use the Tensor's .tolist() method instead of manual slicing.

♻️ Option: Batch chunks with simpler handling:

```diff
 async function embedDocsPage(chunks: Awaited<ReturnType<MDocument["chunk"]>>) {
-  const embeddings: number[][] = [];
-  for (const chunk of chunks) {
-    const output = await embeddingPipeline(chunk.text, { pooling: "mean", normalize: true });
-    embeddings.push(Array.from(output.data));
-  }
-  return embeddings;
+  const texts = chunks.map(chunk => chunk.text);
+  const output = await embeddingPipeline(texts, { pooling: "mean", normalize: true });
+  return output.tolist();
 }
```
@JoeNerdan it would be better to allow an option to use local embeddings instead of replacing openai altogether
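A minimal sketch of that suggestion, keeping OpenAI as the default and switching to the local pipeline behind a flag (the EMBEDDINGS_PROVIDER variable and the helper name are hypothetical; the OpenAI path reuses the embedMany call this PR removes). Note the index dimension would also have to follow the chosen provider (1536 vs. 768):

```ts
import { pipeline } from "@xenova/transformers";
import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";

// Hypothetical provider switch: OpenAI by default, local embeddings when opted in.
async function embedTexts(texts: string[]): Promise<number[][]> {
  if (process.env.EMBEDDINGS_PROVIDER === "local") {
    const extractor = await pipeline("feature-extraction", "Xenova/all-mpnet-base-v2");
    const output = await extractor(texts, { pooling: "mean", normalize: true });
    return output.tolist() as number[][]; // 768-dim vectors
  }

  const { embeddings } = await embedMany({
    model: openai.embedding("text-embedding-3-small"),
    values: texts,
  });
  return embeddings; // 1536-dim vectors
}
```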
Summary
This PR replaces the OpenAI embedding API with local transformers.js embeddings, enabling fully local self-hosting without requiring an external API key.
- Replaces text-embedding-3-small with Xenova/all-mpnet-base-v2 via @xenova/transformers
- Updates scripts/init-vector-store.ts to use the local embedding pipeline
- Updates src/lib/utils/query-vector-store.ts to use the local embedding pipeline

Problem
Self-hosting currently requires an OpenAI API key for generating embeddings, which adds an external dependency and keeps the setup from being fully local.
Solution
Use the @xenova/transformers library to run the all-mpnet-base-v2 sentence transformer model locally. This model runs entirely in the Node.js runtime with no external API calls.

Trade-offs
The all-mpnet-base-v2 model provides excellent semantic similarity for documentation search use cases while being completely free to run locally.
Test plan
🤖 Generated with Claude Code