CocoIndex Claude Code

A complete knowledge base system with semantic search for the Claude Code CLI

Features • Quick Start • Architecture • Usage • Docs

Give your Claude Code CLI semantic search superpowers over your local documents. This project integrates CocoIndex with Claude Code via the Model Context Protocol (MCP), enabling intelligent document search, LLM-powered metadata extraction, and a complete local knowledge base.

Features

Semantic Search - Find documents by meaning, not just keywords
MCP Integration - Direct access from the Claude Code CLI via MCP
LLM Extraction - Automatically extract titles, summaries, and key points using Claude
Vector Database - PostgreSQL with pgvector for fast similarity search
Live Updates - Documents are re-indexed automatically when changed
FastAPI Server - REST API endpoints for programmatic access
CocoInsight - Web UI for flow visualization

Architecture

graph TB
    subgraph "Claude Code CLI"
        CC[Claude Code] -->|MCP Protocol| MCP[CocoIndex MCP Server]
    end

    subgraph "CocoIndex Engine"
        MCP --> Search[Semantic Search]
        MCP --> Index[Document Indexing]
        MCP --> Extract[LLM Extraction]
    end

    subgraph "Data Layer"
        Search --> PG[(PostgreSQL + pgvector)]
        Index --> PG
        Extract --> PG
    end

    subgraph "External APIs"
        Index -->|Embeddings| OpenAI[OpenAI API]
        Extract -->|Extraction| Anthropic[Anthropic Claude]
    end

    Docs[Your Documents] -->|Watch| Index

Quick Start

Prerequisites

Docker Desktop - For PostgreSQL with pgvector
Python 3.11+ - Required by CocoIndex
OpenAI API Key - For text embeddings
Anthropic API Key - For LLM extraction (optional)

1. Clone & Setup

# Clone the repository
git clone https://github.com/puneet8800/cocoindex-claude-code.git
cd cocoindex-claude-code

# Run the setup script
./scripts/setup.sh

This will:

Start PostgreSQL with pgvector via Docker
Create a Python virtual environment
Install all dependencies
Set up the CocoIndex backend tables

2. Configure Environment

# Copy the example environment file
cp .env.example .env

# Edit .env and add your API keys
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
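
Before indexing, it can help to confirm that the keys actually made it into .env. The following is a minimal sketch, not part of this repository: the helper names `parse_env` and `missing_keys` are hypothetical, and the parser only handles plain KEY=VALUE lines.

```python
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY",)      # needed for text embeddings
OPTIONAL_KEYS = ("ANTHROPIC_API_KEY",)   # only needed for LLM extraction


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], keys=REQUIRED_KEYS) -> list[str]:
    """Return the keys that are absent or set to an empty value."""
    return [k for k in keys if not env.get(k)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    for key in missing_keys(env):
        print(f"missing required key: {key}")
    for key in missing_keys(env, OPTIONAL_KEYS):
        print(f"optional key not set: {key}")
```

Run it from the project root after editing .env. It prints nothing when both keys are present.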

3. Add Your Documents

# Add Markdown, text, or Python files to be indexed
cp your-docs/*.md data/documents/

4. Index Documents

source .venv/bin/activate
cocoindex update main.py

5. Configure Claude Code

Add to your ~/.claude.json:

{
  "mcpServers": {
    "cocoindex": {
      "command": "/path/to/cocoindex-claude-code/.venv/bin/python",
      "args": ["/path/to/cocoindex-claude-code/mcp_server.py"],
      "env": {
        "COCOINDEX_DATABASE_URL": "postgres://cocoindex:cocoindex@localhost/cocoindex"
      }
    }
  }
}
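
The /path/to/ placeholders above need to be replaced with absolute paths. As a rough sketch, the entry can be generated from the project directory instead of typed by hand. The helper names `build_mcp_entry` and `merge_into_config` are hypothetical, and the script only prints the result so it can be reviewed before anything is copied into ~/.claude.json.

```python
import json
from pathlib import Path


def build_mcp_entry(project_dir: Path, database_url: str) -> dict:
    """Build the "cocoindex" server entry using absolute paths."""
    project_dir = project_dir.resolve()
    return {
        "command": str(project_dir / ".venv" / "bin" / "python"),
        "args": [str(project_dir / "mcp_server.py")],
        "env": {"COCOINDEX_DATABASE_URL": database_url},
    }


def merge_into_config(config: dict, entry: dict) -> dict:
    """Return a copy of config with the entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["cocoindex"] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    entry = build_mcp_entry(
        Path.cwd(), "postgres://cocoindex:cocoindex@localhost/cocoindex"
    )
    # Print rather than write, so existing settings are never overwritten.
    print(json.dumps(merge_into_config({}, entry), indent=2))
```

Merging into a copy keeps any other MCP servers already listed in the configuration.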

6. Use in Claude Code

Now in Claude Code, you can use semantic search:

> Search my documents for authentication best practices
> What documents mention API rate limiting?
> Find all content related to database migrations

How It Works

Data Flow

sequenceDiagram
    participant U as User
    participant CC as Claude Code
    participant MCP as MCP Server
    participant CI as CocoIndex
    participant DB as PostgreSQL
    participant OAI as OpenAI

    U->>CC: "Search for auth docs"
    CC->>MCP: cocoindex_search(query)
    MCP->>CI: search(query)
    CI->>OAI: Generate query embedding
    OAI-->>CI: Vector [1536 dims]
    CI->>DB: Vector similarity search
    DB-->>CI: Top K results
    CI-->>MCP: Formatted results
    MCP-->>CC: JSON response
    CC-->>U: Relevant document chunks
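
The "Vector similarity search" and "Top K results" steps can be illustrated in plain Python. This is a toy sketch under stated assumptions: it uses cosine similarity (pgvector also offers other distance functions, and which one this project uses is not stated here), 3-dimensional vectors instead of 1536, and an in-memory list instead of PostgreSQL. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=5):
    """chunks is a list of (text, vector); returns [(score, text)], best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data standing in for stored chunk embeddings.
chunks = [
    ("auth guide", [1.0, 0.0, 0.0]),
    ("db migrations", [0.0, 1.0, 0.0]),
    ("login flow", [0.9, 0.1, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], chunks, k=2)
```

In the real system the database performs this ranking, so only the top results travel back to the MCP server.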

Indexing Pipeline

flowchart LR
    A[📄 Documents] -->|LocalFile Source| B[Split into Chunks]
    B -->|2000 chars, 500 overlap| C[Generate Embeddings]
    C -->|text-embedding-3-small| D[Store Vectors]
    D -->|pgvector| E[(PostgreSQL)]
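
To make the "2000 chars, 500 overlap" step concrete, here is a simplified sliding-window splitter. It is only an illustration: the function name `split_into_chunks` is hypothetical, and a real splitter may also respect paragraph or sentence boundaries rather than cutting at fixed offsets.

```python
def split_into_chunks(text, chunk_size=2000, overlap=500):
    """Cut text into fixed-size windows that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts 1500 chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(5000))
sample_chunks = split_into_chunks(sample)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.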

MCP Tools Available

When connected to Claude Code, these tools become available:

Tool	Description
`cocoindex_search`	Semantic search across indexed documents
`cocoindex_index`	Re-index all documents
`cocoindex_list`	List all indexed documents
`cocoindex_metadata`	Get LLM-extracted metadata for a document
`cocoindex_add`	Add a new document to the index
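
Under the hood, MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests. The sketch below shows roughly what such a request looks like for `cocoindex_search`. The argument names `query` and `limit` are assumptions for illustration; the actual input schema is whatever mcp_server.py declares.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def make_tool_call(tool_name, arguments):
    """Build an MCP tools/call request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = make_tool_call(
    "cocoindex_search",
    {"query": "authentication best practices", "limit": 5},  # assumed schema
)
print(json.dumps(request, indent=2))
```

Claude Code builds and sends these requests itself, so this is useful mainly for debugging the server by hand.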

Usage

Command Line Search

# Activate virtual environment
source .venv/bin/activate

# Interactive search
python scripts/query.py

# Single query
python scripts/query.py "machine learning best practices"

# With result limit
python scripts/query.py -l 10 "python async patterns"

FastAPI Server

# Start the API server
python main.py

# Or with live reloading
uvicorn main:app --reload

API Endpoints:

GET / - API information
GET /health - Health check
GET /flows - List registered flows
GET /search?q=query&limit=5 - Search documents
GET /docs - Swagger UI
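
The search endpoint can also be called from a script. A minimal client sketch using only the standard library follows. It assumes the server listens on http://localhost:8000 (uvicorn's default port) and returns JSON; adjust `BASE_URL` if main.py binds elsewhere. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default port


def build_search_url(query, limit=5, base_url=BASE_URL):
    """URL-encode the query so spaces and symbols are sent safely."""
    return f"{base_url}/search?{urlencode({'q': query, 'limit': limit})}"


def search(query, limit=5, base_url=BASE_URL):
    """Call GET /search and decode the JSON body (needs a running server)."""
    with urlopen(build_search_url(query, limit, base_url), timeout=10) as resp:
        return json.load(resp)
```

For example, `search("python async patterns", limit=10)` requests the same results as the command-line query shown earlier.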

CocoInsight Web UI

# Start server with CocoInsight support
cocoindex server main.py -ci

# Open https://cocoindex.io/cocoinsight and connect to localhost:49344

Project Structure

cocoindex-claude-code/
├── flows/                  # CocoIndex flow definitions
│   ├── text_embedding.py   # Vector search flow (OpenAI embeddings)
│   └── llm_extraction.py   # LLM metadata extraction (Claude)
├── scripts/                # Utility scripts
│   ├── setup.sh            # Initial setup
│   ├── start.sh            # Start services
│   ├── stop.sh             # Stop services
│   └── query.py            # Interactive search CLI
├── docker/                 # Docker configuration
│   └── compose.yaml        # PostgreSQL with pgvector
├── data/documents/         # Your documents go here
├── docs/                   # Documentation
├── main.py                 # FastAPI entry point
├── mcp_server.py           # MCP server for Claude Code
├── pyproject.toml          # Python dependencies
└── .env.example            # Environment template

Documentation

Getting Started Guide - Detailed setup instructions
Architecture - System design and components
MCP Integration - Claude Code configuration
Flows Explained - CocoIndex flows documentation
Troubleshooting - Common issues and solutions

Flows

TextEmbedding Flow

Indexes documents for semantic search:

Reads files from data/documents/
Splits into chunks (2000 chars, 500 overlap)
Generates embeddings with OpenAI text-embedding-3-small
Stores in PostgreSQL with vector index

Supported file types: .md, .txt, .py
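
To preview which files under data/documents/ would be picked up, a small helper like the one below can be used. It is a sketch only: the names `is_indexable` and `find_indexable_files` are hypothetical, and the recursive search and case-insensitive suffix match are assumptions that may not mirror the flow's actual file patterns.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".md", ".txt", ".py"}


def is_indexable(name):
    """True if the file name ends with one of the supported suffixes."""
    return Path(name).suffix.lower() in SUPPORTED_SUFFIXES


def find_indexable_files(root="data/documents"):
    """List supported files under root, including subdirectories."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and is_indexable(p.name)
    )
```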

LLMExtraction Flow

Extracts structured metadata using Claude:

Title
Summary
Key points
Topics
Document type

Requires: ANTHROPIC_API_KEY in .env
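
The extracted fields map naturally onto a small record type. The following dataclass is a sketch of what such a record could look like; the class name, field names, and defaults are assumptions and are not taken from flows/llm_extraction.py.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DocumentMetadata:
    """Hypothetical shape of the metadata extracted for one document."""
    title: str
    summary: str
    key_points: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    document_type: str = "unknown"


example = DocumentMetadata(
    title="API Rate Limiting",
    summary="How request quotas are enforced.",
    key_points=["Limits apply per API key"],
    topics=["api", "rate limiting"],
    document_type="guide",
)
```

Calling `asdict(example)` turns the record into a plain dictionary that can be serialized to JSON.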

CLI Commands Reference

# List all flows
cocoindex ls main.py

# Show flow details
cocoindex show main.py:TextEmbedding

# Set up backend tables
cocoindex setup main.py -f

# Update index (one-time)
cocoindex update main.py

# Update index (continuous/live)
cocoindex update main.py -L

# Evaluate the flow and dump the results to ./output without updating the index
cocoindex evaluate main.py -o ./output

# Start server with CocoInsight
cocoindex server main.py -ci -L --reload

# Drop all backend tables
cocoindex drop main.py

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

CocoIndex - The indexing engine powering this project
Claude Code - Anthropic's CLI for Claude
pgvector - Vector similarity search for PostgreSQL

Made with ❤️ for the Claude Code community
