Complete knowledge base system with semantic search for Claude Code CLI
Features • Quick Start • Architecture • Usage • Docs
Give your Claude Code CLI semantic search superpowers over your local documents. This project integrates CocoIndex with Claude Code via the Model Context Protocol (MCP), enabling intelligent document search, LLM-powered metadata extraction, and a complete local knowledge base.
- Semantic Search - Find documents by meaning, not just keywords
- MCP Integration - Direct access from Claude Code CLI via MCP
- LLM Extraction - Automatically extract titles, summaries, and key points using Claude
- Vector Database - PostgreSQL with pgvector for fast similarity search
- Live Updates - Documents are re-indexed automatically when changed
- FastAPI Server - REST API endpoints for programmatic access
- CocoInsight - Web UI for flow visualization
graph TB
subgraph "Claude Code CLI"
CC[Claude Code] -->|MCP Protocol| MCP[CocoIndex MCP Server]
end
subgraph "CocoIndex Engine"
MCP --> Search[Semantic Search]
MCP --> Index[Document Indexing]
MCP --> Extract[LLM Extraction]
end
subgraph "Data Layer"
Search --> PG[(PostgreSQL + pgvector)]
Index --> PG
Extract --> PG
end
subgraph "External APIs"
Index -->|Embeddings| OpenAI[OpenAI API]
Extract -->|Extraction| Anthropic[Anthropic Claude]
end
Docs[Your Documents] -->|Watch| Index
- Docker Desktop - For PostgreSQL with pgvector
- Python 3.11+ - Required by CocoIndex
- OpenAI API Key - For text embeddings
- Anthropic API Key - For LLM extraction (optional)
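As a quick pre-flight check for the two API keys listed above, a small helper along these lines could be run before indexing. It is only a sketch: the function name `check_api_keys` is hypothetical and not part of this project, and it assumes the keys are exposed as the environment variables `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`.

```python
import os

REQUIRED_KEYS = ["OPENAI_API_KEY"]      # needed for text embeddings
OPTIONAL_KEYS = ["ANTHROPIC_API_KEY"]   # only needed for LLM extraction


def check_api_keys(env):
    """Return (missing_required, missing_optional) for a mapping of env vars.

    Hypothetical helper for illustration; not part of the project.
    """
    missing_required = [key for key in REQUIRED_KEYS if not env.get(key)]
    missing_optional = [key for key in OPTIONAL_KEYS if not env.get(key)]
    return missing_required, missing_optional


# Report on the current environment without exiting.
required, optional = check_api_keys(os.environ)
for key in required:
    print(f"missing required key: {key}")
for key in optional:
    print(f"optional key not set: {key} (LLM extraction would be unavailable)")
```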
# Clone the repository
git clone https://github.com/puneet8800/cocoindex-claude-code.git
cd cocoindex-claude-code
# Run the setup script
./scripts/setup.sh
This will:
- Start PostgreSQL with pgvector via Docker
- Create a Python virtual environment
- Install all dependencies
- Set up the CocoIndex backend tables
# Copy the example environment file
cp .env.example .env
# Edit .env and add your API keys
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# Add markdown, text, or Python files to be indexed
cp your-docs/*.md data/documents/
source .venv/bin/activate
cocoindex update main.py
Add to your ~/.claude.json:
{
  "mcpServers": {
    "cocoindex": {
      "command": "/path/to/cocoindex-claude-code/.venv/bin/python",
      "args": ["/path/to/cocoindex-claude-code/mcp_server.py"],
      "env": {
        "COCOINDEX_DATABASE_URL": "postgres://cocoindex:cocoindex@localhost/cocoindex"
      }
    }
  }
}
Now in Claude Code, you can use semantic search:
> Search my documents for authentication best practices
> What documents mention API rate limiting?
> Find all content related to database migrations
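The `mcpServers` entry shown earlier needs absolute paths, which are easy to mistype by hand. The sketch below shows one way the entry could be built and merged into an existing configuration without disturbing other servers. Both helper names are hypothetical, and the example only prints the merged JSON rather than writing to `~/.claude.json`.

```python
import json
from pathlib import PurePosixPath


def build_mcp_entry(project_dir):
    """Build the cocoindex server entry with absolute paths under project_dir."""
    root = PurePosixPath(project_dir)
    return {
        "command": str(root / ".venv" / "bin" / "python"),
        "args": [str(root / "mcp_server.py")],
        "env": {
            "COCOINDEX_DATABASE_URL": "postgres://cocoindex:cocoindex@localhost/cocoindex"
        },
    }


def merge_mcp_server(config, name, entry):
    """Return a copy of config with entry registered under mcpServers[name]."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Illustrative existing config that already has another (made-up) server.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
updated = merge_mcp_server(
    existing, "cocoindex", build_mcp_entry("/path/to/cocoindex-claude-code")
)
print(json.dumps(updated, indent=2))
```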
sequenceDiagram
participant U as User
participant CC as Claude Code
participant MCP as MCP Server
participant CI as CocoIndex
participant DB as PostgreSQL
participant OAI as OpenAI
U->>CC: "Search for auth docs"
CC->>MCP: cocoindex_search(query)
MCP->>CI: search(query)
CI->>OAI: Generate query embedding
OAI-->>CI: Vector [1536 dims]
CI->>DB: Vector similarity search
DB-->>CI: Top K results
CI-->>MCP: Formatted results
MCP-->>CC: JSON response
CC-->>U: Relevant document chunks
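To make the "vector similarity search" and "Top K results" steps in the diagram concrete, here is a minimal in-memory sketch of ranking chunks by cosine similarity. In the project itself this ranking happens inside PostgreSQL via pgvector, and real embeddings have 1536 dimensions; the toy 3-dimensional vectors and the function names below are illustrative assumptions only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=5):
    """Rank (text, vector) pairs against query_vec and return the k best as (text, score)."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy data: made-up chunk texts with made-up 3-dimensional embeddings.
chunks = [
    ("OAuth token refresh", [0.9, 0.1, 0.0]),
    ("Database migrations", [0.0, 0.2, 0.9]),
    ("Password hashing", [0.8, 0.3, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], chunks, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

A database-side search avoids loading every vector into Python, which is why the project delegates this step to pgvector.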
flowchart LR
A[Documents] -->|LocalFile Source| B[Split into Chunks]
B -->|2000 chars, 500 overlap| C[Generate Embeddings]
C -->|text-embedding-3-small| D[Store Vectors]
D -->|pgvector| E[(PostgreSQL)]
When connected to Claude Code, these tools become available:
| Tool | Description |
|---|---|
| cocoindex_search | Semantic search across indexed documents |
| cocoindex_index | Re-index all documents |
| cocoindex_list | List all indexed documents |
| cocoindex_metadata | Get LLM-extracted metadata for a document |
| cocoindex_add | Add a new document to the index |
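Under MCP, a client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for `cocoindex_search`. The argument names `query` and `limit` are assumptions for illustration; the actual parameter names are defined in `mcp_server.py`.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names here are assumed, not taken from the project.
message = build_tool_call(
    1,
    "cocoindex_search",
    {"query": "authentication best practices", "limit": 5},
)
print(json.dumps(message, indent=2))
```

Claude Code builds and sends these messages itself, so this is only useful for understanding or debugging the exchange.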
# Activate virtual environment
source .venv/bin/activate
# Interactive search
python scripts/query.py
# Single query
python scripts/query.py "machine learning best practices"
# With result limit
python scripts/query.py -l 10 "python async patterns"
# Start the API server
python main.py
# Or with live reloading
uvicorn main:app --reload
API Endpoints:
- GET / - API information
- GET /health - Health check
- GET /flows - List registered flows
- GET /search?q=query&limit=5 - Search documents
- GET /docs - Swagger UI
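The search endpoint can be called from any HTTP client. Below is a minimal standard-library sketch. The base URL `http://localhost:8000` is an assumption (uvicorn's default port); adjust it to wherever the server actually listens. The response is returned as parsed JSON without assuming any particular shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: default uvicorn host and port


def build_search_url(query, limit=5, base_url=BASE_URL):
    """Build the GET /search URL with a properly escaped query string."""
    return f"{base_url}/search?{urlencode({'q': query, 'limit': limit})}"


def search(query, limit=5, base_url=BASE_URL):
    """Call GET /search and return the decoded JSON body (server must be running)."""
    with urlopen(build_search_url(query, limit, base_url), timeout=10) as response:
        return json.load(response)


# Only the URL is built here, so this runs without a live server.
print(build_search_url("python async patterns", limit=10))
```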
# Start server with CocoInsight support
cocoindex server main.py -ci
# Open https://cocoindex.io and connect to localhost:49344
cocoindex-claude-code/
├── flows/                  # CocoIndex flow definitions
│   ├── text_embedding.py   # Vector search flow (OpenAI embeddings)
│   └── llm_extraction.py   # LLM metadata extraction (Claude)
├── scripts/                # Utility scripts
│   ├── setup.sh            # Initial setup
│   ├── start.sh            # Start services
│   ├── stop.sh             # Stop services
│   └── query.py            # Interactive search CLI
├── docker/                 # Docker configuration
│   └── compose.yaml        # PostgreSQL with pgvector
├── data/documents/         # Your documents go here
├── docs/                   # Documentation
├── main.py                 # FastAPI entry point
├── mcp_server.py           # MCP server for Claude Code
├── pyproject.toml          # Python dependencies
└── .env.example            # Environment template
- Getting Started Guide - Detailed setup instructions
- Architecture - System design and components
- MCP Integration - Claude Code configuration
- Flows Explained - CocoIndex flows documentation
- Troubleshooting - Common issues and solutions
Indexes documents for semantic search:
- Reads files from data/documents/
- Splits into chunks (2000 chars, 500 overlap)
- Generates embeddings with OpenAI text-embedding-3-small
- Stores in PostgreSQL with vector index
Supported file types: .md, .txt, .py
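The chunking step above (2000 characters with a 500-character overlap) can be pictured as a sliding window that advances 1500 characters at a time. The sketch below is a simplified fixed-window version for illustration; the project's flow may split on structural boundaries instead, and the function name is hypothetical.

```python
def split_into_chunks(text, chunk_size=2000, overlap=500):
    """Split text into (start_offset, chunk) pairs using a fixed sliding window."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A synthetic 5000-character document.
document = "".join(str(i % 10) for i in range(5000))
chunks = split_into_chunks(document)
print([(start, len(chunk)) for start, chunk in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.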
Extracts structured metadata using Claude:
- Title
- Summary
- Key points
- Topics
- Document type
Requires: ANTHROPIC_API_KEY in .env
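The extracted fields listed above map naturally onto a small typed record. The dataclass below is a sketch of what such a schema might look like. The class and field names are assumptions derived from the list, not the definitions in `flows/llm_extraction.py`, and the example values are invented.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DocumentMetadata:
    """Hypothetical shape of the metadata extracted for one document."""

    title: str
    summary: str
    key_points: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    document_type: str = "unknown"


# Invented example values, for illustration only.
example = DocumentMetadata(
    title="API Rate Limiting Guide",
    summary="Explains how request quotas are enforced.",
    key_points=["Limits apply per API key", "Clients should back off and retry"],
    topics=["api", "rate limiting"],
    document_type="guide",
)
print(asdict(example))
```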
# List all flows
cocoindex ls main.py
# Show flow details
cocoindex show main.py:TextEmbedding
# Setup backend tables
cocoindex setup main.py -f
# Update index (one-time)
cocoindex update main.py
# Update index (continuous/live)
cocoindex update main.py -L
# Evaluate without exporting
cocoindex evaluate main.py -o ./output
# Start server with CocoInsight
cocoindex server main.py -ci -L --reload
# Drop all backend tables
cocoindex drop main.py
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.
- CocoIndex - The indexing engine powering this project
- Claude Code - Anthropic's CLI for Claude
- pgvector - Vector similarity search for PostgreSQL
Made with ❤️ for the Claude Code community
