
Moss

Real-time semantic search for AI agents. Sub-10 ms.


Website · Docs · Discord · Blog


Moss is a sub-10 ms semantic search runtime built for conversational AI agents. Hybrid retrieval (semantic + keyword search), built-in embeddings, metadata filtering, and a WebAssembly build that runs in the browser - all from a single SDK that embeds in your application.

No network hop on the hot path. No clusters to tune. Point the SDK at Moss Cloud, load your index, and query it in under 10 ms. SDKs are available for Python, TypeScript, Elixir, and C.


Quickstart

Before you start, sign up at moss.dev to get a project_id and project_key (free tier available).

The snippets below need Python 3.10+ or Node.js 20+.

Python

pip install moss
import asyncio

from moss import MossClient, QueryOptions

async def main():
    client = MossClient("your_project_id", "your_project_key")

    # Create an index and add documents
    await client.create_index("support-docs", [
        {"id": "1", "text": "Refunds are processed within 3-5 business days."},
        {"id": "2", "text": "You can track your order on the dashboard."},
        {"id": "3", "text": "We offer 24/7 live chat support."},
    ])

    # Load and query — results in <10 ms
    await client.load_index("support-docs")
    results = await client.query("support-docs", "how long do refunds take?", QueryOptions(top_k=3))

    for doc in results.docs:
        print(f"[{doc.score:.3f}] {doc.text}")
    print(f"Returned in {results.time_taken_ms} ms")

asyncio.run(main())

TypeScript

npm install @moss-dev/moss
import { MossClient } from "@moss-dev/moss";

const client = new MossClient("your_project_id", "your_project_key");

// Create an index and add documents
await client.createIndex("support-docs", [
  { id: "1", text: "Refunds are processed within 3-5 business days." },
  { id: "2", text: "You can track your order on the dashboard." },
  { id: "3", text: "We offer 24/7 live chat support." },
]);

// Load and query — results in <10 ms
await client.loadIndex("support-docs");
const results = await client.query("support-docs", "how long do refunds take?", { topK: 3 });

results.docs.forEach((doc) => {
  console.log(`[${doc.score.toFixed(3)}] ${doc.text}`);
});
console.log(`Returned in ${results.timeTakenInMs} ms`);

Why Moss?

Most retrieval stacks call out to a remote vector database, and the round trip alone takes 200–500 ms - enough to break the flow of a real-time conversation.

Moss runs search and embedding inside your process. There's no network hop on the hot path, so query latency lands in the single digits - fast enough that retrieval disappears from the latency budget. If you're building a voice bot, a copilot, or any agent that talks to humans, that's the difference between a tool that feels alive and one that feels laggy.

Benchmarks

End-to-end query latency (embedding + search) on 100,000 documents, 750 measured queries, top_k=5. Tested on a MacBook Pro (M4 Pro, 24 GB).

System      P50       P95       P99       Mean
Moss          3.1 ms    4.3 ms    5.4 ms    3.3 ms
Pinecone    432.6 ms  732.1 ms  934.2 ms  485.8 ms
Qdrant      597.6 ms  682.0 ms  771.4 ms  596.5 ms
ChromaDB    351.8 ms  423.5 ms  538.5 ms  358.0 ms

Moss includes embedding in the measurement — the competitors call an external embedding service (Modal), and Pinecone and Qdrant are queried as managed cloud services.

Reproduce these benchmarks →

Moss isn't a database! It's a search runtime. You don't manage clusters, tune HNSW parameters, or worry about sharding. You index documents, load them into the runtime, and query. That's it.

Features

  • Sub-10 ms semantic search - single-digit-ms p99 in our benchmarks
  • Hybrid search - semantic + keyword in a single query
  • Built-in embedding models - no OpenAI key required (or bring your own)
  • Metadata filtering - $eq, $and, $in, $near operators
  • Runs in the browser too - separate WebAssembly SDK (@moss-dev/moss-web) for client-side semantic search with no server
  • Database connectors - ingest directly from SQLite, MongoDB, MySQL, and Supabase (packages/moss-data-connector/)
  • CLI - manage indexes and query from the terminal (packages/moss-cli/)
  • SDKs - Python (3.10+), TypeScript / Node.js (20+), Elixir, and C (libmoss)
  • Framework integrations - LangChain, DSPy, LlamaIndex, Pipecat, LiveKit, Vapi, ElevenLabs, Strands Agents

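As an illustration of the metadata-filter operators listed above, here is a hedged sketch of composing a filter expression. The nesting shown (field, then operator, then value, combined under $and) and the idea of passing it through a query-options parameter are assumptions based on common filter DSLs — check the Moss docs for the exact shape.

```python
# Hypothetical filter expression using the operators the feature list
# names ($eq, $in, $and). The exact structure Moss expects may differ;
# treat this as a sketch, not the documented API.
filter_expr = {
    "$and": [
        {"category": {"$eq": "billing"}},   # exact match on a field
        {"region": {"$in": ["us", "eu"]}},  # membership in a set of values
    ]
}

# Passing it alongside a query might look like this (parameter name assumed):
# results = await client.query("support-docs", "refund policy",
#                              QueryOptions(top_k=3, filter=filter_expr))
```

The point of the sketch is the composition: leaf conditions pair a field with an operator, and $and combines them into one predicate evaluated during search.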
Examples

This repo contains working examples you can copy straight into your project:

examples/
├── python/                  # Python SDK samples
│   ├── load_and_query_sample.py
│   ├── comprehensive_sample.py
│   ├── custom_embedding_sample.py
│   └── metadata_filtering.py
├── python-classification/   # Classification example
├── javascript/              # TypeScript SDK samples
│   ├── load_and_query_sample.ts
│   ├── comprehensive_sample.ts
│   └── custom_embedding_sample.ts
├── javascript-web/          # Browser / WASM SDK samples
├── c/                       # C SDK samples (libmoss)
└── cookbook/                # Framework integrations
    ├── langchain/           # LangChain retriever
    ├── dspy/                # DSPy module
    ├── crewai/              # CrewAI integration
    ├── haystack/            # Haystack retriever
    ├── autogen/             # AutoGen integration
    ├── mastra/              # Mastra retriever
    ├── pydantic-ai/         # Pydantic AI integration
    └── daytona/             # Daytona sandbox example

apps/
├── next-js/                 # Next.js semantic search UI
├── pipecat-moss/            # Pipecat voice agent with Moss retrieval
├── vapi-moss/               # Vapi voice agent with Moss retrieval
├── elevenlabs-moss/         # ElevenLabs voice agent with Moss retrieval
├── livekit-moss-vercel/     # LiveKit voice agent on Vercel
├── agora-moss/              # Agora Conversational AI MCP server with Moss retrieval
├── moss-llamaindex/         # LlamaIndex RAG backend + frontend
├── moss-bun/                # Bun runtime example
└── docker/                  # Dockerized examples (ECS/K8s pattern)

Run the Python examples

cd examples/python
pip install -r requirements.txt
cp ../../.env.example .env   # Add your credentials
python load_and_query_sample.py

Run the TypeScript examples

cd examples/javascript
npm install
cp ../../.env.example .env   # Add your credentials
npx tsx load_and_query_sample.ts

Run the Next.js app

cd apps/next-js
npm install
cp ../../.env.example .env   # Add your credentials
npm run dev                  # Open http://localhost:3000

Run the Pipecat voice agent

Sub-10 ms retrieval plugged into Pipecat's real-time voice pipeline — a customer support agent that actually keeps up with conversation.

cd apps/pipecat-moss/pipecat-quickstart
# See README for setup and Pipecat Cloud deployment

Run the fully-local voice agent (Ollama + Moss + Pipecat)

A privacy-first voice AI stack: Ollama for LLM inference, Moss for retrieval, Pipecat for real-time audio - the LLM and retrieval both run on your machine.

cd apps/pipecat-moss/ollama-local
docker compose up

Full API reference: docs.moss.dev.

Integrations

Framework        Status     Example
LangChain        Available  examples/cookbook/langchain/
DSPy             Available  examples/cookbook/dspy/
LlamaIndex       Available  apps/moss-llamaindex/
CrewAI           Available  examples/cookbook/crewai/
AutoGen          Available  examples/cookbook/autogen/
Haystack         Available  examples/cookbook/haystack/
Mastra           Available  examples/cookbook/mastra/
Pydantic AI      Available  examples/cookbook/pydantic-ai/
Pipecat          Available  apps/pipecat-moss/
LiveKit          Available  apps/livekit-moss-vercel/
Vapi             Available  apps/vapi-moss/
ElevenLabs       Available  apps/elevenlabs-moss/
Agora            Available  apps/agora-moss/
Strands Agents   Available  packages/strands-agents-moss/
Next.js          Available  apps/next-js/
VitePress        Available  packages/vitepress-plugin-moss/
Vercel AI SDK    Available  packages/vercel-sdk/

Architecture

Moss runtime architecture

Three parts:

  • Moss Cloud - handles ingestion, document embedding, storage, and distribution. Point the SDK at it with a project ID and key.
  • Index - your documents and their vectors, packaged as a single artifact that lives on Moss Cloud.
  • Runtime - embedded in your application. It pulls indexes over HTTPS, holds them in memory, and serves queries locally.

Once an index is loaded, queries don't leave your process - that's where the sub-10 ms latency comes from. Document changes flow through Moss Cloud and the runtime stays in sync.
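The load-once, query-many pattern this implies can be made concrete. The sketch below is a generic async cache that guarantees each index is loaded at most once even when many requests arrive concurrently; in a real server the loader would be the client.load_index coroutine from the quickstart (that wiring is an assumption, shown only as a comment).

```python
import asyncio
from typing import Any, Awaitable, Callable

class IndexCache:
    """Ensure each index is loaded at most once, even with concurrent callers."""

    def __init__(self, loader: Callable[[str], Awaitable[Any]]):
        self._loader = loader
        self._locks: dict[str, asyncio.Lock] = {}
        self._loaded: set[str] = set()

    async def ensure_loaded(self, name: str) -> None:
        # One lock per index name serializes the first load; later callers
        # see the index in self._loaded and return immediately.
        lock = self._locks.setdefault(name, asyncio.Lock())
        async with lock:
            if name not in self._loaded:
                await self._loader(name)  # e.g. client.load_index(name) in practice
                self._loaded.add(name)

# Demo with a stub loader that just records calls.
calls: list[str] = []

async def fake_load(name: str) -> None:
    calls.append(name)

async def demo() -> None:
    cache = IndexCache(fake_load)
    # Ten concurrent requests for the same index trigger a single load.
    await asyncio.gather(*(cache.ensure_loaded("support-docs") for _ in range(10)))

asyncio.run(demo())
print(calls)  # ['support-docs']
```

Loading at startup (or on first request, as above) keeps the hot path purely in-process, which is what the sub-10 ms figure depends on.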

Two ways to run the runtime

  • Server-side - moss (Python) and @moss-dev/moss (Node.js 20+) embed the runtime in your backend. Use this when your agent runs on a server.
  • Browser - @moss-dev/moss-web is a WebAssembly build that downloads the index and runs queries entirely client-side, no server required. Use this for static sites, browser extensions, and offline-first apps. See examples/javascript-web/.

Full Python SDK source code is available at sdks/python/.

Contributing

We welcome contributions! Here's where the community can have the most impact:

  • New SDK bindings — Swift, Go, and more
  • Framework integrations — frameworks not yet covered in the cookbook
  • Reranking support — plug in cross-encoder rerankers
  • Doc-parsing connectors — PDF, DOCX, HTML, Markdown ingestion
  • Examples and tutorials — if you build something with Moss, we'd love to feature it

See our Contributing Guide for setup instructions and our Roadmap for what's planned.

Check out issues labeled good first issue to get started.

Contributors


Community

  • Discord — ask questions, share what you're building
  • GitHub Issues — bug reports and feature requests
  • Twitter — announcements and updates

License

BSD 2-Clause License — the SDKs, examples, and integrations in this repo are fully open source.


Built by the team at Moss · Backed by Y Combinator