A privacy-first document chat system that combines vector search with knowledge graphs for deeper document understanding.
Rich CLI interface with interactive chat, source citations, and explanation modes
Automated setup script handles all dependencies - ChromaDB, Neo4j, Ollama, and spaCy
- Why DocuChat?
- Technical Comparison
- Quick Start
- System Architecture
- CLI Commands
- Performance Specifications
- Use Cases
- Limitations
- Development
- License
- Support
- Complete Privacy: All processing happens locally - your documents never leave your machine
- Hybrid Intelligence: ChromaDB (vector) + Neo4j (knowledge graph) for deeper understanding
- Multi-step Reasoning: Q*-inspired planning breaks complex queries into logical steps
- Explainable AI: Shows reasoning behind every answer with source citations
- Entity Intelligence: spaCy NLP preserves entity boundaries during processing
- Universal Support: PDF, DOCX, TXT, MD, HTML with OCR for scanned documents
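The entity-boundary idea in the list above can be illustrated with a minimal sketch. This is not DocuChat's implementation: the function name `entity_aware_chunks`, the character-offset span format, and the `max_len` parameter are assumptions made for illustration. In practice the spans would come from an NER model such as spaCy; here they are passed in directly so the example has no dependencies.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def entity_aware_chunks(text: str, entities: List[Span], max_len: int) -> List[str]:
    """Split text into chunks of about max_len characters without cutting an entity.

    Assumes entity spans do not overlap and max_len >= 1 (hypothetical
    interface). If a proposed cut falls inside an entity, the cut moves back
    to the entity's start; if that would leave an empty chunk, it moves
    forward to the entity's end, so a chunk may exceed max_len.
    """
    chunks: List[str] = []
    start = 0
    while start < len(text):
        cut = min(start + max_len, len(text))
        for ent_start, ent_end in entities:
            if ent_start < cut < ent_end:
                cut = ent_start if ent_start > start else ent_end
                break
        chunks.append(text[start:cut])
        start = cut
    return chunks


# Example data: spans are located with str.find instead of a real NER model.
text = "Alice Johnson met Bob Smith in New York City."
names = ["Alice Johnson", "Bob Smith", "New York City"]
spans = [(text.find(n), text.find(n) + len(n)) for n in names]

for chunk in entity_aware_chunks(text, spans, max_len=20):
    print(repr(chunk))
```

A naive fixed-width split at 20 characters would cut "Bob Smith" in half; the sketch moves the cut so each name stays inside a single chunk.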
| Feature | DocuChat | AnythingLLM | PrivateGPT | GraphRAG |
|---|---|---|---|---|
| Knowledge Graph | Neo4j | No | No | Built-in |
| Multi-step Planning | Q*-inspired | No | No | Framework Only |
| Self-Critique | Built-in | No | No | Framework Only |
| Entity-aware Chunking | spaCy NER | Standard | Standard | LLM-based |
| Interface | CLI + Rich UI | Web GUI | API/CLI | Python Library |
# Clone and setup
git clone https://github.com/dondetir/docuchat-agent_cli.git
cd docuchat-agent_cli
./setup_docuchat.sh
# Manual setup
python -m venv venv && source venv/bin/activate
pip install -e .
# 1. Ingest documents
docuchat ingest ./documents
# 2. Start chatting
docuchat chat
# 3. Process web content
docuchat url https://example.com/docs
You: What security vulnerabilities were mentioned?
DocuChat: Found 3 security vulnerabilities:
1. SQL Injection Risk (Critical)
Source: security_audit.pdf, page 15
Details: Unvalidated user input in payment module
2. Outdated SSL Certificates (Medium)
Source: infrastructure_review.docx, section 4
Details: 5 certificates expiring within 30 days
3. Weak Password Policy (Low)
Source: compliance_report.pdf, page 8
Details: No complexity requirements enforced
+-----------------------------------------------------------+
|                      DOCUCHAT SYSTEM                      |
+-----------------------------------------------------------+
|                                                           |
|  Documents -> Processing -> Dual Storage -> AI Agents     |
|                                                           |
|  +-------------+    +-------------+    +-------------+    |
|  |   Loaders   |    |  Vector DB  |    |    Query    |    |
|  | (PDF,DOCX,  | -> | (ChromaDB)  | -> |  Analyzer   |    |
|  |  TXT,MD)    |    |             |    |             |    |
|  +-------------+    +-------------+    +-------------+    |
|         |                  |                  |           |
|         v                  v                  v           |
|  +-------------+    +-------------+    +-------------+    |
|  | Entity NER  |    |  Knowledge  |    |  Reasoning  |    |
|  |   (spaCy)   | -> | Graph(Neo4j)| -> |   Planner   |    |
|  |             |    |             |    |(Q*-inspired)|    |
|  +-------------+    +-------------+    +-------------+    |
|                            |                              |
|                            v                              |
|  +-----------------------------------------------------+  |
|  |              PARALLEL RETRIEVAL ENGINE              |  |
|  |  +-----------------+  +--------------------------+  |  |
|  |  |  Vector Search  |  |     Graph Traversal      |  |  |
|  |  |  (Similarity)   |  |    (Entity Relations)    |  |  |
|  |  +-----------------+  +--------------------------+  |  |
|  +-----------------------------------------------------+  |
|                            |                              |
|                            v                              |
|  +-------------+    +-------------+    +-------------+    |
|  |   Context   |    |  Response   |    |    Self-    |    |
|  |   Builder   | -> |  Generator  | -> |  Critique   |    |
|  |             |    |    (LLM)    |    |    Loop     |    |
|  +-------------+    +-------------+    +-------------+    |
+-----------------------------------------------------------+
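The parallel retrieval stage in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the repository: `parallel_retrieve` and the two stub search functions are hypothetical names, and the stubs return fixed strings where real code would query ChromaDB and Neo4j.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

SearchFn = Callable[[str], List[str]]


def parallel_retrieve(query: str, vector_search: SearchFn, graph_search: SearchFn) -> List[str]:
    """Run both searches concurrently and merge the hits without duplicates."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        vector_future = pool.submit(vector_search, query)
        graph_future = pool.submit(graph_search, query)
        vector_hits = vector_future.result()
        graph_hits = graph_future.result()

    # Keep first-seen order: vector hits first, then graph hits not already present.
    seen = set()
    merged: List[str] = []
    for hit in vector_hits + graph_hits:
        if hit not in seen:
            seen.add(hit)
            merged.append(hit)
    return merged


# Hypothetical stand-ins for the real vector and graph backends.
def fake_vector_search(query: str) -> List[str]:
    return ["chunk-1", "chunk-2"]


def fake_graph_search(query: str) -> List[str]:
    return ["chunk-2", "chunk-3"]


hits = parallel_retrieve("security vulnerabilities", fake_vector_search, fake_graph_search)
print(hits)
```

Threads suit this sketch because both lookups would be I/O-bound calls to external services; how the actual project schedules and ranks the two result sets is not specified here.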
| Command | Description | Example |
|---|---|---|
| ingest <folder> | Process documents from folder | docuchat ingest ./docs |
| chat | Interactive chat with documents | docuchat chat --llm gemini:2.5-flash |
| url <url> | Process web content | docuchat url https://blog.com/post |
| status | Check system health | docuchat status --check-services |
| reset-data | Clear documents only | docuchat reset-data |
| reset-all | Complete system reset | docuchat reset-all |
# LLM models
docuchat chat --llm gemma3:1b # Local (default)
docuchat chat --llm gemini:2.5-flash # Cloud (fast)
docuchat chat --llm gemini:2.5-pro # Cloud (high quality)
# Personas
docuchat chat --persona medical # Medical expert responses
docuchat chat --persona technical # Technical expert responses
# Verbosity
docuchat chat --verbose # Show reasoning traces
- Memory Usage: ~5.5GB with models loaded
- Query Response: 2-5 seconds typical
- Document Processing: ~50 documents/minute
- Tested Hardware: Intel i7-6500U (2 cores, 4 threads)
- Optimization: Automatic CPU feature detection and tuning
Features verified in the codebase that distinguish DocuChat from alternatives:
- Parallel Hybrid Retrieval - Simultaneous vector and graph search (rag_workflow.py:213)
- Q*-Inspired Planning - Multi-step reasoning for complex queries (reasoning_planner.py)
- Entity-Aware Chunking - Preserves semantic boundaries (document_processor.py:645)
- Self-Critique Loop - LLM evaluates its own responses (response_generator.py:951-1176)
- Explanation Modes - Four levels of reasoning transparency
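As a rough illustration of the self-critique loop listed above, the sketch below regenerates an answer until a critic accepts it or a round limit is reached. It does not reproduce `response_generator.py`: the function `answer_with_critique`, its callback signatures, and the `max_rounds` parameter are assumptions, and the two stubs stand in for LLM calls.

```python
from typing import Callable, Optional, Tuple

GenerateFn = Callable[[str, Optional[str]], str]
CritiqueFn = Callable[[str, str], Tuple[bool, Optional[str]]]


def answer_with_critique(
    question: str,
    generate: GenerateFn,
    critique: CritiqueFn,
    max_rounds: int = 3,
) -> str:
    """Generate an answer, then revise it using critic feedback until accepted."""
    feedback: Optional[str] = None
    answer = ""
    for _ in range(max_rounds):
        answer = generate(question, feedback)
        accepted, feedback = critique(question, answer)
        if accepted:
            break
    return answer  # last draft, accepted or not, once the round limit is hit


# Hypothetical stubs standing in for LLM calls.
def fake_generate(question: str, feedback: Optional[str]) -> str:
    return "draft" if feedback is None else "draft + citation"


def fake_critique(question: str, answer: str) -> Tuple[bool, Optional[str]]:
    if "citation" in answer:
        return True, None
    return False, "add a source citation"


final = answer_with_critique("What vulnerabilities were mentioned?", fake_generate, fake_critique)
print(final)
```

The round limit bounds latency when the critic never accepts a draft; the limit of three is an arbitrary choice for the example.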
Ideal for:
- Developers needing explainable AI with reasoning transparency
- Organizations requiring complete data privacy and local processing
- Complex document analysis involving entity relationships
- Research projects needing multi-step reasoning capabilities
docuchat/
├── cli/ # Command-line interface
├── core/ # Document processing and business logic
├── agents/ # LangGraph workflow implementation
├── integrations/ # ChromaDB and Neo4j clients
├── models/ # Data models and schemas
├── utils/ # Hardware optimization and utilities
└── config/ # Configuration management
- CLI interface only (no web GUI currently)
- Requires Neo4j and ChromaDB setup
- Higher memory usage than vector-only solutions
- Newer project with growing community
# Development install
git clone https://github.com/dondetir/docuchat-agent_cli.git
cd docuchat-agent_cli
python -m venv venv && source venv/bin/activate
pip install -e ".[dev]"
- Fork the repository
- Create feature branch (git checkout -b feature/amazing-feature)
- Make changes and add tests
- Run quality checks
- Submit pull request
MIT License
DocuChat is free and open-source software licensed under the MIT License. See LICENSE for details.
We appreciate:
- Stars on GitHub
- Bug reports and feature requests
- Pull requests and contributions
- Sharing DocuChat with others
Example attribution (optional but appreciated):
Powered by DocuChat (https://github.com/dondetir/docuchat-agent_cli)
- Documentation: See docs/ folder
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Contributing: See CONTRIBUTING.md
- Security: See SECURITY.md
Built for privacy, transparency, and developer empowerment.