A TypeScript module for text vectorization and similarity search using transformer models.
The embedding module converts text into numerical vectors (embeddings) that capture semantic meaning, enabling:
- Semantic Search: Find similar content based on meaning, not just keywords
- Text Similarity: Compare documents, sentences, or phrases
- Content Clustering: Group related text together
- Recommendation Systems: Suggest similar content to users
Install the dependency:

```bash
npm install @xenova/transformers
```

Generate embeddings with the `Encoder`:

```typescript
import { Encoder } from '@core/embedding/Encoder'

// Initialize encoder
const encoder = new Encoder()
await encoder.initialize()

// Convert text to vectors
const sentences = ['Hello world', 'How are you?', 'Good morning']
const result = await encoder.extract(sentences)

console.log(result.data)       // [[0.1, 0.2, ...], [0.3, 0.4, ...], ...]
console.log(result.dimensions) // Vector dimensions (varies by model)
console.log(result.count)      // Number of sentences processed
```

Store and search content with the `Decoder`:

```typescript
import { Decoder } from '@core/embedding/Decoder'

// Initialize decoder
await Decoder.initialize()

// Store some content
await Decoder.encode('Machine learning algorithms process data', '/path/to/ml.txt')
await Decoder.encode('Artificial intelligence systems help users', '/path/to/ai.txt')
await Decoder.encode('Cooking recipes provide instructions', '/path/to/cooking.txt')

// Find similar content
const similar = await Decoder.query('AI and ML topics')
console.log(similar)
// [
//   { vector: [...], filePath: '/path/to/ai.txt', similarity: 0.89 },
//   { vector: [...], filePath: '/path/to/ml.txt', similarity: 0.85 },
//   { vector: [...], filePath: '/path/to/cooking.txt', similarity: 0.12 }
// ]
```

Extraction options:

```typescript
const result = await encoder.extract(sentences, {
  pooling: 'mean',  // 'mean' | 'cls' | 'max' (default: 'mean')
  normalize: true,  // Normalize vectors (default: true)
  batchSize: 32     // Process in batches (default: 32)
})
```

To use a custom model:

```typescript
await encoder.initialize('Xenova/model-name')
```

API overview:

- `encoder.initialize()`: Loads the transformer model for embedding extraction.
- `encoder.extract()`: Converts text sentences into vector embeddings.
- Readiness check: Reports whether the encoder is ready for use.
- `encoder.dispose()`: Cleans up resources and resets state.
- `Decoder.initialize()`: Initializes the underlying encoder for operations.
- `Decoder.encode()`: Converts text to a vector and stores it for similarity queries.
- `Decoder.query()`: Finds similar stored content using cosine similarity.
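Based on the usage examples above, the result returned by `extract()` can be typed roughly as follows. This is an inferred sketch; the `ExtractionResult` name is hypothetical, and the authoritative definitions live in `@interfaces/Embedding`:

```typescript
// Assumed shape of the extract() result, inferred from the examples in this document
interface ExtractionResult {
  data: number[][]     // one embedding vector per input sentence
  dimensions: number   // embedding dimensionality (model-dependent)
  count: number        // number of sentences processed
}

// A consistent result satisfies:
//   result.count === result.data.length
//   result.data.every(v => v.length === result.dimensions)
```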
- Text Input: You provide text sentences or documents
- Vectorization: The transformer model converts text to numerical vectors
- Storage: Vectors are stored in memory with optional file associations
- Similarity: Cosine similarity finds the most similar stored vectors
- Results: Returns ranked results with similarity scores (0-1)
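The similarity and ranking steps above can be sketched as a plain cosine-similarity computation over stored vectors. This is a minimal illustration of the technique, not the module's actual implementation; `VectorEntry` and `rankBySimilarity` are hypothetical names:

```typescript
interface VectorEntry {
  vector: number[]
  filePath?: string
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
// For non-negative (or normalized, same-direction) vectors this falls in [0, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Score every stored entry against the query vector, most similar first
function rankBySimilarity(query: number[], entries: VectorEntry[]) {
  return entries
    .map(e => ({ ...e, similarity: cosineSimilarity(query, e.vector) }))
    .sort((x, y) => y.similarity - x.similarity)
}
```

For example, `rankBySimilarity([1, 0], [{ vector: [0, 1] }, { vector: [1, 0] }])` ranks the identical vector first with similarity 1 and the orthogonal one last with similarity 0.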
Semantic search over documents:

```typescript
// Index documents
await Decoder.initialize()
await Decoder.encode(document1, '/path/to/doc1.txt')
await Decoder.encode(document2, '/path/to/doc2.txt')

// Search for relevant documents
const results = await Decoder.query('machine learning algorithms')
```

Content recommendations:

```typescript
// Store user preferences
await Decoder.encode('User prefers sci-fi movies', 'user_prefs')
await Decoder.encode('User enjoys action movies', 'user_prefs')

// Find similar content
const recommendations = await Decoder.query('space adventure films')
```

Content clustering:

```typescript
// Process multiple texts
const texts = ['AI research', 'Machine learning', 'Cooking tips', 'Recipe ideas']
for (const text of texts) {
  await Decoder.encode(text)
}

// Group similar content
const aiContent = await Decoder.query('artificial intelligence')
const cookingContent = await Decoder.query('food preparation')
```

Performance tips:

- Batch Processing: Process multiple texts at once for better performance
- Memory Management: Call `dispose()` when done to free resources
- Model Size: The default model is lightweight; use larger models for better accuracy
- Batch Size: Adjust `batchSize` based on available memory
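The batching tips above can be illustrated with a simple chunking helper. This is a sketch; `chunk` is a hypothetical utility, not part of the module's API:

```typescript
// Split an array into consecutive batches of at most `size` items
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Processing a large corpus batch-by-batch keeps memory bounded, e.g.:
// for (const batch of chunk(texts, 32)) {
//   await encoder.extract(batch, { batchSize: 32 })
// }
```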
Error handling:

```typescript
try {
  await encoder.initialize()
  const result = await encoder.extract(['Hello world'])
} catch (error) {
  console.error('Embedding failed:', error.message)
}
```

Dependencies:

- `@xenova/transformers`: Transformer model execution
- `@interfaces/Embedding`: TypeScript type definitions
Note: The embedding module uses a lightweight model by default, which provides a balance between accuracy and performance for most use cases.