Complete knowledge base system with semantic search for Claude Code CLI. Powered by CocoIndex, PostgreSQL + pgvector, and MCP protocol.

pkmdev-sec/cocoindex-claude-code

CocoIndex Claude Code

Complete knowledge base system with semantic search for Claude Code CLI

Features • Quick Start • Architecture • Usage • Docs

Python 3.11+ • MIT License • Claude Code • MCP Compatible • CocoIndex


Give your Claude Code CLI semantic search superpowers over your local documents. This project integrates CocoIndex with Claude Code via the Model Context Protocol (MCP), enabling intelligent document search, LLM-powered metadata extraction, and a complete local knowledge base.

Features

  • Semantic Search - Find documents by meaning, not just keywords
  • MCP Integration - Direct access from Claude Code CLI via the Model Context Protocol (MCP)
  • LLM Extraction - Automatically extract titles, summaries, and key points using Claude
  • Vector Database - PostgreSQL with pgvector for fast similarity search
  • Live Updates - Documents are re-indexed automatically when changed while live mode is running (cocoindex update main.py -L)
  • FastAPI Server - REST API endpoints for programmatic access
  • CocoInsight - Web UI for flow visualization

Architecture

graph TB
    subgraph "Claude Code CLI"
        CC[Claude Code] -->|MCP Protocol| MCP[CocoIndex MCP Server]
    end

    subgraph "CocoIndex Engine"
        MCP --> Search[Semantic Search]
        MCP --> Index[Document Indexing]
        MCP --> Extract[LLM Extraction]
    end

    subgraph "Data Layer"
        Search --> PG[(PostgreSQL + pgvector)]
        Index --> PG
        Extract --> PG
    end

    subgraph "External APIs"
        Index -->|Embeddings| OpenAI[OpenAI API]
        Extract -->|Extraction| Anthropic[Anthropic Claude]
    end

    Docs[Your Documents] -->|Watch| Index

Quick Start

Prerequisites

  • Docker Desktop - For PostgreSQL with pgvector
  • Python 3.11+ - Required by CocoIndex
  • OpenAI API Key - For text embeddings
  • Anthropic API Key - For LLM extraction (optional)

1. Clone & Setup

# Clone the repository
git clone https://github.com/puneet8800/cocoindex-claude-code.git
cd cocoindex-claude-code

# Run the setup script
./scripts/setup.sh

This will:

  • Start PostgreSQL with pgvector via Docker
  • Create a Python virtual environment
  • Install all dependencies
  • Set up the CocoIndex backend tables

2. Configure Environment

# Copy the example environment file
cp .env.example .env

# Edit .env and add your API keys
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
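
Before indexing, it can help to confirm that the keys are visible to the process. The following is a minimal sketch, assuming the two variable names shown in the `.env` example above; `missing_keys` is a hypothetical helper and is not part of this repository.

```python
import os

# Names taken from the .env example above. Treating ANTHROPIC_API_KEY as
# optional follows the Prerequisites list; the grouping itself is an assumption.
REQUIRED = ("OPENAI_API_KEY",)
OPTIONAL = ("ANTHROPIC_API_KEY",)


def missing_keys(env, names):
    """Return the names that are unset or empty in the given mapping."""
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for name in missing_keys(os.environ, REQUIRED):
        print(f"missing required key: {name}")
    for name in missing_keys(os.environ, OPTIONAL):
        print(f"optional key not set: {name}")
```

Note that this only inspects the current process environment; values in `.env` must already be exported or loaded for the check to see them.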

3. Add Your Documents

# Add markdown, text, or Python files to be indexed
cp your-docs/*.md data/documents/

4. Index Documents

source .venv/bin/activate
cocoindex update main.py

5. Configure Claude Code

Add to your ~/.claude.json:

{
  "mcpServers": {
    "cocoindex": {
      "command": "/path/to/cocoindex-claude-code/.venv/bin/python",
      "args": ["/path/to/cocoindex-claude-code/mcp_server.py"],
      "env": {
        "COCOINDEX_DATABASE_URL": "postgres://cocoindex:cocoindex@localhost/cocoindex"
      }
    }
  }
}
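
If `~/.claude.json` already lists other MCP servers, the entry above has to be merged in rather than pasted over the file. A rough sketch of that merge, working on plain dictionaries so nothing is written to disk; `add_cocoindex_server` is a hypothetical helper, and the paths are the same placeholders used in the JSON above.

```python
import json
from pathlib import PurePosixPath


def add_cocoindex_server(config, project_dir, database_url):
    """Return a copy of a Claude Code config dict with the cocoindex entry set."""
    project = PurePosixPath(project_dir)
    updated = dict(config)
    # Copy the server map so the caller's dict is left untouched.
    servers = dict(updated.get("mcpServers", {}))
    servers["cocoindex"] = {
        "command": str(project / ".venv" / "bin" / "python"),
        "args": [str(project / "mcp_server.py")],
        "env": {"COCOINDEX_DATABASE_URL": database_url},
    }
    updated["mcpServers"] = servers
    return updated


# "other" stands in for any server that is already configured.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
merged = add_cocoindex_server(
    existing,
    "/path/to/cocoindex-claude-code",
    "postgres://cocoindex:cocoindex@localhost/cocoindex",
)
print(json.dumps(merged, indent=2))
```

To apply it for real, load the existing file with `json.load`, pass the result through the helper, and write it back with `json.dump`.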

6. Use in Claude Code

Now in Claude Code, you can use semantic search:

> Search my documents for authentication best practices
> What documents mention API rate limiting?
> Find all content related to database migrations

How It Works

Data Flow

sequenceDiagram
    participant U as User
    participant CC as Claude Code
    participant MCP as MCP Server
    participant CI as CocoIndex
    participant DB as PostgreSQL
    participant OAI as OpenAI

    U->>CC: "Search for auth docs"
    CC->>MCP: cocoindex_search(query)
    MCP->>CI: search(query)
    CI->>OAI: Generate query embedding
    OAI-->>CI: Vector [1536 dims]
    CI->>DB: Vector similarity search
    DB-->>CI: Top K results
    CI-->>MCP: Formatted results
    MCP-->>CC: JSON response
    CC-->>U: Relevant document chunks
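
The "Vector similarity search" and "Top K results" steps in the diagram can be illustrated in plain Python. This is only a sketch of the idea: in this project the ranking is done inside PostgreSQL by pgvector, and the use of cosine similarity here is an assumption (pgvector exposes cosine distance through its `<=>` operator, but the README does not say which metric the flow uses). The three-dimensional vectors are toy stand-ins for the 1536-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=5):
    """rows is a list of (text, vector) pairs; return the k best (score, text)."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data: made-up chunk labels and vectors, for illustration only.
rows = [
    ("auth guide", [1.0, 0.0, 0.0]),
    ("rate limiting", [0.0, 1.0, 0.0]),
    ("login flow", [0.9, 0.1, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], rows, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```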

Indexing Pipeline

flowchart LR
    A[📄 Documents] -->|LocalFile Source| B[Split into Chunks]
    B -->|2000 chars, 500 overlap| C[Generate Embeddings]
    C -->|text-embedding-3-small| D[Store Vectors]
    D -->|pgvector| E[(PostgreSQL)]

MCP Tools Available

When connected to Claude Code, these tools become available:

| Tool | Description |
| --- | --- |
| cocoindex_search | Semantic search across indexed documents |
| cocoindex_index | Re-index all documents |
| cocoindex_list | List all indexed documents |
| cocoindex_metadata | Get LLM-extracted metadata for a document |
| cocoindex_add | Add a new document to the index |
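
For orientation, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for `cocoindex_search`. The argument names `query` and `limit` are assumptions, since the README does not document the tool's input schema; check `mcp_server.py` for the real parameter names.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "cocoindex_search",
    # Hypothetical argument names; the actual schema is defined by the server.
    {"query": "authentication best practices", "limit": 5},
)
print(json.dumps(request, indent=2))
```

Claude Code sends these messages on your behalf, so this is useful mainly for debugging the server by hand.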

Usage

Command Line Search

# Activate virtual environment
source .venv/bin/activate

# Interactive search
python scripts/query.py

# Single query
python scripts/query.py "machine learning best practices"

# With result limit
python scripts/query.py -l 10 "python async patterns"

FastAPI Server

# Start the API server
python main.py

# Or with live reloading
uvicorn main:app --reload

API Endpoints:

  • GET / - API information
  • GET /health - Health check
  • GET /flows - List registered flows
  • GET /search?q=query&limit=5 - Search documents
  • GET /docs - Swagger UI
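
A small client for the search endpoint might look like the sketch below. It assumes the server listens on `http://localhost:8000` (uvicorn's default port; the README does not state the port) and that the endpoint returns JSON. `search_url` and `fetch_results` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, query, limit=5):
    """Build the GET /search URL with properly escaped query parameters."""
    params = urlencode({"q": query, "limit": limit})
    return f"{base_url.rstrip('/')}/search?{params}"


def fetch_results(base_url, query, limit=5):
    """Call the endpoint and decode the body, assuming a JSON response."""
    with urlopen(search_url(base_url, query, limit)) as response:
        return json.load(response)


url = search_url("http://localhost:8000", "python async patterns", limit=10)
print(url)
```

Call `fetch_results` only once the API server is running; building the URL alone needs no network access.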

CocoInsight Web UI

# Start server with CocoInsight support
cocoindex server main.py -ci

# Open https://cocoindex.io and connect to localhost:49344

Project Structure

cocoindex-claude-code/
├── flows/                  # CocoIndex flow definitions
│   ├── text_embedding.py   # Vector search flow (OpenAI embeddings)
│   └── llm_extraction.py   # LLM metadata extraction (Claude)
├── scripts/                # Utility scripts
│   ├── setup.sh            # Initial setup
│   ├── start.sh            # Start services
│   ├── stop.sh             # Stop services
│   └── query.py            # Interactive search CLI
├── docker/                 # Docker configuration
│   └── compose.yaml        # PostgreSQL with pgvector
├── data/documents/         # Your documents go here
├── docs/                   # Documentation
├── main.py                 # FastAPI entry point
├── mcp_server.py           # MCP server for Claude Code
├── pyproject.toml          # Python dependencies
└── .env.example            # Environment template

Documentation

Flows

TextEmbedding Flow

Indexes documents for semantic search:

  1. Reads files from data/documents/
  2. Splits into chunks (2000 chars, 500 overlap)
  3. Generates embeddings with OpenAI text-embedding-3-small
  4. Stores in PostgreSQL with vector index

Supported file types: .md, .txt, .py
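
To make the chunking parameters concrete, here is a simplified sketch of fixed-size splitting with overlap, plus a filter for the supported extensions. It is an illustration only: the splitter used by the actual flow may choose chunk boundaries differently (for example at paragraph or sentence breaks), and `split_into_chunks` and `is_supported` are hypothetical names.

```python
from pathlib import PurePosixPath

SUPPORTED_EXTENSIONS = {".md", ".txt", ".py"}


def is_supported(filename):
    """True if the file extension is one of the indexed types."""
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_into_chunks(text, chunk_size=2000, overlap=500):
    """Cut text into windows of chunk_size that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


sample = "".join(str(i % 10) for i in range(5000))
chunks = split_into_chunks(sample)
print(len(chunks), [len(c) for c in chunks])
```

With these defaults each new chunk starts 1500 characters after the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.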

LLMExtraction Flow

Extracts structured metadata using Claude:

  • Title
  • Summary
  • Key points
  • Topics
  • Document type

Requires: ANTHROPIC_API_KEY in .env
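
The metadata fields listed above could be modeled as a dataclass along the following lines. This is a sketch under assumptions: the field names and types are guesses based on the list, not the schema defined in `flows/llm_extraction.py`, and the example values are made up.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DocumentMetadata:
    """Hypothetical shape of the structured metadata extracted per document."""
    title: str
    summary: str
    key_points: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    document_type: str = "unknown"


# Illustrative values only; real values would come from the LLM extraction.
example = DocumentMetadata(
    title="Authentication Guide",
    summary="How to configure login and token refresh.",
    key_points=["Use short-lived tokens", "Rotate secrets regularly"],
    topics=["authentication", "security"],
    document_type="guide",
)
print(asdict(example))
```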

CLI Commands Reference

# List all flows
cocoindex ls main.py

# Show flow details
cocoindex show main.py:TextEmbedding

# Set up backend tables
cocoindex setup main.py -f

# Update index (one-time)
cocoindex update main.py

# Update index (continuous/live)
cocoindex update main.py -L

# Evaluate flows and write the results to ./output without updating the index
cocoindex evaluate main.py -o ./output

# Start server with CocoInsight
cocoindex server main.py -ci -L --reload

# Drop all backend tables
cocoindex drop main.py

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

  • CocoIndex - The indexing engine powering this project
  • Claude Code - Anthropic's CLI for Claude
  • pgvector - Vector similarity search for PostgreSQL

Made with ❤️ for the Claude Code community
