
DocuChat - Local Document Intelligence with Knowledge Graphs

A privacy-first document chat system that combines vector search with knowledge graphs for deeper document understanding.

Python · License: MIT · CI · Code style: black · PRs Welcome


📸 Demo

Interactive Chat Interface

Rich CLI interface with interactive chat, source citations, and explanation modes

One-Click Setup

Automated setup script that handles all dependencies: ChromaDB, Neo4j, Ollama, and spaCy


Why DocuChat?

  • 🔒 Complete Privacy: With the default local LLM, all processing happens on your machine, so your documents never leave it (cloud models such as Gemini are opt-in)
  • 🧠 Hybrid Intelligence: ChromaDB (vector) + Neo4j (knowledge graph) for deeper understanding
  • 🎯 Multi-step Reasoning: Q*-inspired planning breaks complex queries into logical steps
  • 📊 Explainable AI: Shows the reasoning behind every answer, with source citations
  • 🔍 Entity Intelligence: spaCy named-entity recognition keeps entity boundaries intact during chunking
  • 📄 Broad Format Support: PDF, DOCX, TXT, MD, and HTML, with OCR for scanned documents
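The entity-boundary idea behind Entity Intelligence can be sketched in plain Python. This is a minimal illustration, not the code in document_processor.py: the function `entity_aware_chunks` and its split-at-spaces strategy are assumptions, and entity spans are assumed to be non-overlapping character offsets such as spaCy's `ent.start_char` / `ent.end_char`.

```python
from typing import List, Tuple

# (start_char, end_char) of one entity; end is exclusive. With spaCy these could
# come from [(ent.start_char, ent.end_char) for ent in nlp(text).ents].
Span = Tuple[int, int]


def entity_aware_chunks(text: str, entities: List[Span], max_chars: int = 200) -> List[str]:
    """Split text into chunks of roughly max_chars, cutting at spaces but
    never inside an entity span (spans are assumed not to overlap)."""
    chunks: List[str] = []
    n = len(text)
    start = 0
    while start < n:
        # Skip whitespace left over from the previous cut.
        while start < n and text[start].isspace():
            start += 1
        if start >= n:
            break
        end = min(start + max_chars, n)
        if end < n:
            # Prefer the last space before the size limit.
            cut = text.rfind(" ", start, end)
            if cut <= start:
                cut = end  # no space available: hard cut
            # If the cut lands inside an entity, move it to the entity's start,
            # or past its end when the entity begins this chunk.
            for ent_start, ent_end in entities:
                if ent_start < cut < ent_end:
                    cut = ent_start if ent_start > start else ent_end
                    break
            end = cut
        chunks.append(text[start:end].strip())
        start = end
    return chunks
```

With `max_chars=16`, "Alice met New York Times editors in Paris" splits into "Alice met New" / "York Times" / ... when no entities are given, but keeps "New York Times" whole once its span is passed in. An entity longer than `max_chars` is kept intact rather than cut.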

Technical Comparison

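| Feature               | DocuChat      | AnythingLLM | PrivateGPT | GraphRAG       |
|-----------------------|---------------|-------------|------------|----------------|
| Knowledge Graph       | Neo4j         | ❌          | ❌         | Built-in       |
| Multi-step Planning   | Q*-inspired   | ❌          | ❌         | Framework Only |
| Self-Critique         | Built-in      | ❌          | ❌         | Framework Only |
| Entity-aware Chunking | spaCy NER     | Standard    | Standard   | LLM-based      |
| Interface             | CLI + Rich UI | Web GUI     | API/CLI    | Python Library |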

Quick Start

Installation

# Option 1: clone and run the automated setup script
git clone https://github.com/dondetir/docuchat-agent_cli.git
cd docuchat-agent_cli
./setup_docuchat.sh

# Option 2: manual setup (run inside the cloned repository)
python -m venv venv && source venv/bin/activate
pip install -e .

Basic Usage

# 1. Ingest documents
docuchat ingest ./documents

# 2. Start chatting
docuchat chat

# 3. Process web content
docuchat url https://example.com/docs

Example Conversation

You: What security vulnerabilities were mentioned?

DocuChat: Found 3 security vulnerabilities:

1. SQL Injection Risk (Critical)
   📄 Source: security_audit.pdf, page 15
   Details: Unvalidated user input in payment module

2. Outdated SSL Certificates (Medium)
   📄 Source: infrastructure_review.docx, section 4
   Details: 5 certificates expiring within 30 days

3. Weak Password Policy (Low)
   📄 Source: compliance_report.pdf, page 8
   Details: No complexity requirements enforced
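One common way to get answers with per-source citations like these is to label each retrieved passage with its source before it reaches the LLM (the Context Builder stage in the architecture below). A minimal sketch, assuming a hypothetical `RetrievedChunk` record and `build_context` helper; neither is DocuChat's actual API, and the sample data simply reuses the conversation above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RetrievedChunk:
    """Hypothetical record for one retrieved passage."""
    text: str
    source: str                     # file name, e.g. "security_audit.pdf"
    location: Optional[str] = None  # e.g. "page 15" or "section 4"


def build_context(chunks: List[RetrievedChunk], max_chars: int = 2000) -> str:
    """Number each chunk and label it with its source so the LLM can cite it.
    Chunks are added in ranked order until the character budget is reached."""
    parts: List[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        where = f"{chunk.source}, {chunk.location}" if chunk.location else chunk.source
        block = f"[{i}] Source: {where}\n{chunk.text}"
        if used + len(block) > max_chars:
            break  # stop rather than truncate a passage midway
        parts.append(block)
        used += len(block) + 2  # +2 for the blank-line separator
    return "\n\n".join(parts)


chunks = [
    RetrievedChunk("Unvalidated user input in payment module", "security_audit.pdf", "page 15"),
    RetrievedChunk("5 certificates expiring within 30 days", "infrastructure_review.docx", "section 4"),
]
print(build_context(chunks))
```

Dropping whole passages at the budget, instead of cutting one in half, keeps every citation attached to complete text.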

System Architecture

┌─────────────────────────────────────────────────────────┐
│                   DOCUCHAT SYSTEM                       │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  📄 Documents → Processing → Dual Storage → AI Agents   │
│                                                         │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐    │
│  │   Loaders   │   │  Vector DB  │   │    Query    │    │
│  │ (PDF,DOCX,  │→  │ (ChromaDB)  │→  │  Analyzer   │    │
│  │  TXT,MD)    │   │             │   │             │    │
│  └─────────────┘   └─────────────┘   └─────────────┘    │
│         │                 │                 │           │
│         ↓                 │                 ↓           │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐    │
│  │ Entity NER  │   │ Knowledge   │   │ Reasoning   │    │
│  │  (spaCy)    │→  │ Graph(Neo4j)│→  │  Planner    │    │
│  │             │   │             │   │(Q*-inspired)│    │
│  └─────────────┘   └─────────────┘   └─────────────┘    │
│                                             │           │
│                           ┌─────────────────┘           │
│                           ↓                             │
│  ┌──────────────────────────────────────────────────┐   │
│  │         PARALLEL RETRIEVAL ENGINE                │   │
│  │  ┌─────────────────┐  ┌────────────────────────┐ │   │
│  │  │ Vector Search   │  │   Graph Traversal      │ │   │
│  │  │ (Similarity)    │  │ (Entity Relations)     │ │   │
│  │  └─────────────────┘  └────────────────────────┘ │   │
│  └──────────────────────────────────────────────────┘   │
│                           │                             │
│                           ↓                             │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐    │
│  │ Context     │   │ Response    │   │ Self-       │    │
│  │ Builder     │→  │ Generator   │→  │ Critique    │    │
│  │             │   │ (LLM)       │   │ Loop        │    │
│  └─────────────┘   └─────────────┘   └─────────────┘    │
└─────────────────────────────────────────────────────────┘
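The parallel retrieval engine in the diagram runs vector search and graph traversal side by side. A minimal sketch of that pattern, assuming both retrievers are plain callables that return ranked chunk IDs; `hybrid_retrieve` is hypothetical, and merging the two rankings with reciprocal rank fusion (k = 60) is a swapped-in choice, not necessarily what rag_workflow.py does.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Retriever = Callable[[str], List[str]]  # query -> ranked list of chunk IDs


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: each item scores the sum of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_retrieve(query: str, vector_search: Retriever, graph_search: Retriever) -> List[str]:
    # Run both retrievers concurrently, then fuse their rankings.
    with ThreadPoolExecutor(max_workers=2) as pool:
        vec_future = pool.submit(vector_search, query)
        graph_future = pool.submit(graph_search, query)
        rankings = [vec_future.result(), graph_future.result()]
    return reciprocal_rank_fusion(rankings)


# Toy usage with stub retrievers standing in for ChromaDB and Neo4j.
print(hybrid_retrieve("q", lambda q: ["c1", "c2", "c3"], lambda q: ["c3", "c4"]))
```

A chunk found by both retrievers ("c3" here) rises to the top even though vector search ranked it last, which is the point of combining similarity with entity relations.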

CLI Commands

| Command         | Description                     | Example                              |
|-----------------|---------------------------------|--------------------------------------|
| ingest <folder> | Process documents from a folder | docuchat ingest ./docs               |
| chat            | Interactive chat with documents | docuchat chat --llm gemini:2.5-flash |
| url <url>       | Process web content             | docuchat url https://blog.com/post   |
| status          | Check system health             | docuchat status --check-services     |
| reset-data      | Clear documents only            | docuchat reset-data                  |
| reset-all       | Complete system reset           | docuchat reset-all                   |

Chat Options

# LLM models
docuchat chat --llm gemma3:1b          # Local (default)
docuchat chat --llm gemini:2.5-flash   # Cloud (fast)
docuchat chat --llm gemini:2.5-pro     # Cloud (high quality)

# Personas
docuchat chat --persona medical        # Medical expert responses
docuchat chat --persona technical      # Technical expert responses

# Verbosity
docuchat chat --verbose                # Show reasoning traces

Performance Specifications

  • Memory Usage: ~5.5GB with models loaded
  • Query Response: 2-5 seconds typical
  • Document Processing: ~50 documents/minute
  • Tested Hardware: Intel i7-6500U (2 cores, 4 threads)
  • Optimization: Automatic CPU feature detection and tuning
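The ~50 documents/minute figure turns into a rough ingestion estimate by simple division. A sketch, assuming throughput stays linear; real speed will likely vary with hardware, document size, and OCR. `estimate_ingest_minutes` is a hypothetical helper, not part of the CLI.

```python
import math

# Approximate throughput quoted above for the tested hardware.
DOCS_PER_MINUTE = 50


def estimate_ingest_minutes(num_docs: int, docs_per_minute: float = DOCS_PER_MINUTE) -> int:
    """Rough ingestion time in whole minutes, rounded up."""
    if num_docs <= 0:
        return 0
    return math.ceil(num_docs / docs_per_minute)


print(estimate_ingest_minutes(1000))  # 20
print(estimate_ingest_minutes(120))   # 3
```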

Unique Capabilities

Verified in the codebase, these capabilities distinguish DocuChat from the alternatives compared above:

  1. Parallel Hybrid Retrieval - Simultaneous vector and graph search (rag_workflow.py:213)
  2. Q*-Inspired Planning - Multi-step reasoning for complex queries (reasoning_planner.py)
  3. Entity-Aware Chunking - Preserves semantic boundaries (document_processor.py:645)
  4. Self-Critique Loop - LLM evaluates its own responses (response_generator.py:951-1176)
  5. Explanation Modes - Four levels of reasoning transparency
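The Self-Critique Loop follows a generate, critique, revise pattern. A minimal sketch with hypothetical callables standing in for LLM calls; the accept/feedback protocol and the three-round default are assumptions, not the logic in response_generator.py.

```python
from typing import Callable, Tuple

# Hypothetical stand-ins for LLM calls.
Generate = Callable[[str, str], str]               # (question, feedback) -> draft
Critique = Callable[[str, str], Tuple[bool, str]]  # (question, draft) -> (accepted, feedback)


def answer_with_self_critique(question: str, generate: Generate,
                              critique: Critique, max_rounds: int = 3) -> str:
    """Draft an answer, let a critic review it, and revise until the critic
    accepts it or the round limit is reached; returns the last draft."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        draft = generate(question, feedback)
        accepted, feedback = critique(question, draft)
        if accepted:
            break
    return draft


# Toy usage: the critic insists on a citation.
def toy_generate(question: str, feedback: str) -> str:
    return "Found 3 vulnerabilities" + (" (security_audit.pdf, page 15)" if feedback else "")


def toy_critique(question: str, draft: str) -> Tuple[bool, str]:
    cited = "(" in draft
    return cited, "" if cited else "Cite a source."


print(answer_with_self_critique("What vulnerabilities?", toy_generate, toy_critique))
```

The round limit bounds cost: if the critic never accepts, the loop still returns its latest draft instead of running indefinitely.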

Use Cases

Ideal for:

  • Developers needing explainable AI with reasoning transparency
  • Organizations requiring complete data privacy and local processing
  • Complex document analysis involving entity relationships
  • Research projects needing multi-step reasoning capabilities

Project Structure

docuchat/
├── cli/              # Command-line interface
├── core/             # Document processing and business logic
├── agents/           # LangGraph workflow implementation
├── integrations/     # ChromaDB and Neo4j clients
├── models/           # Data models and schemas
├── utils/            # Hardware optimization and utilities
└── config/           # Configuration management

Limitations

  • CLI only (no web GUI currently)
  • Requires Neo4j and ChromaDB setup
  • Higher memory usage than vector-only solutions
  • Newer project with a growing community

Development

Setup

# Development install
git clone https://github.com/dondetir/docuchat-agent_cli.git
cd docuchat-agent_cli
python -m venv venv && source venv/bin/activate
pip install -e ".[dev]"

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Run quality checks
  5. Submit a pull request

License

MIT License

DocuChat is free and open-source software licensed under the MIT License. See LICENSE for details.

We appreciate:

  • ⭐ Stars on GitHub
  • πŸ› Bug reports and feature requests
  • 🀝 Pull requests and contributions
  • πŸ“’ Sharing DocuChat with others

Example attribution (optional but appreciated):

Powered by DocuChat (https://github.com/dondetir/docuchat-agent_cli)

Built for privacy, transparency, and developer empowerment.
