Skip to content

Latest commit

 

History

History
106 lines (78 loc) · 2.48 KB

File metadata and controls

106 lines (78 loc) · 2.48 KB

doc-agent

Document extraction and semantic search CLI with MCP integration. Extract structured data from invoices, receipts, and bank statements using Vision AI.

Features

  • 🔍 Document Extraction: Extract structured data from PDFs and images using Vision AI
  • 🦙 Ollama-First: Privacy-first default using local llama3.2-vision model
  • 🔧 Zero Setup: Auto-installs Ollama via Homebrew if needed, auto-pulls models
  • 📄 Multi-Format: Supports PDFs and images (PNG, JPEG, WebP)
  • 🔬 OCR-Enhanced: Uses Tesseract.js for accurate text extraction from receipts
  • 💾 Local Storage: All data persists to local SQLite database
  • 🔎 Semantic Search: Natural language search over indexed documents (coming soon)
  • 🤖 MCP Integration: Use via Claude Desktop or any MCP-compatible assistant
  • 🔒 Privacy-First: Data stays on your machine (unless you opt for cloud AI)

Quick Start

Installation

npm install -g doc-agent

Usage

Extract document data (uses Ollama by default):

doc extract invoice.pdf

💡 Don't have Ollama? No problem! The CLI will offer to install it for you via Homebrew.

With Gemini (cloud, higher accuracy):

export GEMINI_API_KEY=your_key_here
doc extract invoice.pdf --provider gemini

Start MCP server:

doc mcp

Claude Desktop Integration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "doc-agent": {
      "command": "npx",
      "args": ["-y", "doc-agent", "mcp"],
      "env": {
        "GEMINI_API_KEY": "your_key_here"
      }
    }
  }
}

Then in Claude Desktop:

"Extract data from ~/Downloads/invoice.pdf"

Development

# Clone and install dependencies
git clone https://github.com/prosdevlab/doc-agent
cd doc-agent
pnpm install

# Build the project
pnpm build

# Run CLI locally
pnpm dev extract examples/invoice.pdf

# Run tests
pnpm test

# Start MCP server
pnpm mcp

Architecture

The CLI is built with Ink (React for CLIs) for rich interactive output:

packages/
├── cli/           # Ink-based CLI with services, hooks, and components
├── core/          # Shared types and interfaces
├── extract/       # Document extraction (Gemini, Ollama) + OCR
├── storage/       # SQLite persistence (Drizzle ORM)
└── vector-store/  # Vector database for semantic search

Roadmap

See ROADMAP.md for the project plan.

License

MIT