Embenx — Agentic Memory Layer for Python AI Agents 🚀

The Agentic Memory Layer & Universal Retrieval Toolkit.
Synthetic data generation, 20+ vector backends, hybrid search, and MCP-native memory for AI agents.



What is Embenx?

Embenx is a Python-native retrieval library that sits between raw vector indices and full-blown vector databases. It provides a high-level Collection API for managing embeddings and metadata, supporting advanced features like filtering, reranking, and quantization across 20+ backends.

🌟 New in v1.5.1: OpenSearch Integration

Embenx now natively supports OpenSearch as a vector backend. Scale your agentic memory to production clusters with native k-NN vector search and enterprise-grade durability.


Quickstart

Get up and running in 60 seconds.

Step 1 — Install

```shell
pip install embenx
```

Step 2 — Create a collection and add embeddings

```python
import numpy as np
from embenx import Collection

# 768-dim FAISS-HNSW index (in-memory, no extra config needed)
col = Collection(dimension=768, indexer_type="faiss-hnsw")

vectors = np.random.rand(10, 768).astype("float32")
metadata = [{"id": i, "text": f"Document {i}"} for i in range(10)]
col.add(vectors, metadata)
```

Step 3 — Search

```python
query = np.random.rand(768).astype("float32")
results = col.search(query, top_k=3)

for meta, dist in results:
    print(f"{meta['text']}  (distance: {dist:.4f})")
```
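For intuition, the `simple` (NumPy Exact) backend listed below boils down to brute-force distance ranking. A minimal sketch of that idea in plain NumPy, independent of Embenx (the `exact_search` helper is ours, not part of the library's API):

```python
import numpy as np

def exact_search(vectors, query, top_k=3):
    # L2 distance from the query to every stored vector
    dists = np.linalg.norm(vectors - query, axis=1)
    # Indices of the top_k smallest distances, ascending
    idx = np.argsort(dists)[:top_k]
    return [(int(i), float(dists[i])) for i in idx]

vectors = np.random.rand(10, 768).astype("float32")
query = np.random.rand(768).astype("float32")
for i, dist in exact_search(vectors, query):
    print(f"Document {i}  (distance: {dist:.4f})")
```

Approximate indexes like HNSW trade a little recall for dramatically faster search on large collections; this exact scan is the baseline they are measured against.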

Library Usage

🚀 Production Deployment (OpenSearch, Qdrant, Milvus)

Embenx makes it easy to transition from local development to production-grade vector clusters.

```python
import numpy as np
from embenx import Collection

# Initialize with OpenSearch (assumes http://localhost:9200;
# set the OPENSEARCH_URL env var to override)
col = Collection(dimension=128, indexer_type="opensearch")

vectors = np.random.rand(100, 128).astype("float32")
metadata = [{"id": i} for i in range(100)]

# Add data directly to OpenSearch
col.add(vectors, metadata)

# Search with native k-NN
query_vector = np.random.rand(128).astype("float32")
results = col.search(query_vector, top_k=5)
```

🧠 Agentic Memory & Hybrid Search

Combine semantic search with keyword retrieval and self-healing feedback loops.

```python
import numpy as np
from embenx import Collection

# Hybrid setup: dense FAISS-HNSW index plus sparse BM25
col = Collection(dimension=768, indexer_type="faiss-hnsw", sparse_indexer_type="bm25")

# Hybrid search fuses the dense and sparse rankings with RRF (Reciprocal Rank Fusion)
query_vec = np.random.rand(768).astype("float32")
results = col.hybrid_search(
    query_vector=query_vec,
    query_text="What is the capital of France?",
    top_k=5,
)

# Self-healing feedback: label results to improve future ranking
col.feedback(doc_id="doc_123", label="good")
```
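For a sense of what RRF does, it scores each document as the sum of 1/(k + rank) over the ranked lists being fused (k = 60 is the conventional constant). A pure-Python sketch of the fusion step, with hypothetical document IDs, illustrating the idea rather than Embenx's internal implementation:

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["d3", "d1", "d7"]   # from the vector index
sparse_hits = ["d1", "d9", "d3"]  # from BM25
print(rrf_fuse([dense_hits, sparse_hits]))  # → ['d1', 'd3', 'd9', 'd7']
```

Because RRF works on ranks rather than raw scores, it needs no score normalization between the dense and sparse retrievers, which is why it is a common default for hybrid search.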

🧪 Synthetic Data Generation

Generate high-quality query-document pairs to train or evaluate your retrieval pipelines.

```python
# Continues from a populated collection `col` as in the examples above
results = col.generate_synthetic_queries(
    n_queries_per_doc=2,
    num_docs=100,
    model="gpt-4o-mini",  # or "ollama/llama3"
    output_path="eval_data.jsonl",
)
```

Agentic Memory (MCP)

Embenx ships with a built-in Model Context Protocol (MCP) server. This allows AI agents (like Claude Desktop) to use Embenx collections as their own long-term memory.

1. Start the server

```shell
embenx mcp-start
```
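To wire the server into an MCP client such as Claude Desktop, an entry along these lines goes in the client's config file (here `claude_desktop_config.json`). This shape is an assumption based on the standard MCP server-config convention, not taken from the Embenx docs; only the `embenx mcp-start` command above is from the source:

```json
{
  "mcpServers": {
    "embenx": {
      "command": "embenx",
      "args": ["mcp-start"]
    }
  }
}
```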

Visual Explorer

Embenx provides a built-in web UI to visualize your vector collections, including an interactive HNSW Graph Visualizer and a RAG Playground.

```shell
embenx explorer
```

Features

  • 20+ Vector Backends — Native support for OpenSearch, Qdrant, Milvus, FAISS, PGVector, and more.
  • Synthetic Data Generation — Create high-quality query-document pairs using LLMs for training and evaluation.
  • Multimodal Support — Native support for image embeddings (CLIP).
  • RAG Playground — Test retrieval quality with an integrated LLM chat loop.
  • HNSW Graph Visualizer — Interactive 3D visualization of navigation layers.
  • Agentic Memory (MCP) — Native Model Context Protocol support for AI agents.
  • Self-Healing Retrieval — Integrated feedback loops to automatically improve ranking accuracy.
  • Temporal Memory (Echo) — Recency-biased retrieval and time-window filtering.
  • Spatial Memory (ESWM) — Neuroscience-inspired spatial cognitive maps for navigation.
  • Hybrid Search — Combine dense vectors with sparse BM25 retrieval using RRF.
  • Portable Formats — Native support for Parquet, NumPy (.npy/.npz), and FAISS (.index).
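Recency-biased retrieval typically down-weights older memories before ranking. A common scheme is exponential half-life decay; this sketch is purely illustrative of the concept (the function and parameters are ours, not Embenx's actual Echo API):

```python
def recency_score(similarity, age_seconds, half_life=3600.0):
    # Exponential decay: the weight halves every `half_life` seconds
    return similarity * 0.5 ** (age_seconds / half_life)

# A slightly less similar but fresh memory can outrank an older, closer one
print(recency_score(0.80, age_seconds=0))     # 0.8
print(recency_score(0.95, age_seconds=7200))  # 0.95 * 0.25 = 0.2375
```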

Supported Indexers

| Indexer Key | Family / Algorithm | Best For |
| --- | --- | --- |
| opensearch | OpenSearch | Native k-NN vector search (Production) |
| faiss-hnsw | FAISS HNSW | High-recall in-memory search |
| qdrant | Qdrant | Filtered vector search at scale |
| milvus | Milvus Cluster | Distributed production workloads |
| pgvector | PostgreSQL pgvector | Embeddings next to relational data |
| elasticsearch | Elasticsearch | Full-text + vector search combined |
| scann | ScaNN Tree-AH | State-of-the-art speed/recall (Linux) |
| usearch | USearch HNSW | High-performance C++, low latency |
| hnswlib | HNSWLib | Pure HNSW, easy to tune |
| weaviate | Weaviate | Multi-tenant, schema-driven search |
| duckdb | DuckDB | Analytical + vector hybrid queries |
| lance | LanceDB Columnar | Large disk-based datasets |
| bm25 | BM25 (sparse) | Keyword / sparse retrieval baseline |
| simple | NumPy Exact | Exact search, zero dependencies |

...and 8 more variants including quantized (PQ/SQ8) and half-precision (f16/i8) indices.
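For a sense of what the quantized variants trade off, scalar quantization to int8 (SQ8) maps each float32 dimension onto 256 levels, shrinking storage 4x at the cost of a small reconstruction error. A pure-NumPy sketch of the idea (illustrative only, not Embenx's implementation):

```python
import numpy as np

def sq8_quantize(vectors):
    # Per-dimension min/max define a 256-level scalar grid
    lo = vectors.min(axis=0)
    hi = vectors.max(axis=0)
    scale = (hi - lo) / 255.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on constant dims
    codes = np.round((vectors - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def sq8_decode(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

vecs = np.random.rand(1000, 64).astype("float32")
codes, lo, scale = sq8_quantize(vecs)
approx = sq8_decode(codes, lo, scale)
print(f"4x smaller, max reconstruction error {np.abs(vecs - approx).max():.4f}")
```

The reconstruction error is bounded by half a grid step per dimension, which is usually negligible relative to the distances being compared.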


Installation

Requires Python 3.11+.

```shell
pip install embenx
```

License

Distributed under the MIT License. See LICENSE for more information.


Built with ❤️ for the AI engineering community by adityak74
