This document provides comprehensive information for AI assistants working with the PDF-RAG codebase. It contains technical details, architecture patterns, and development guidelines.
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│     Next.js     │     │     Node.js     │     │     Python      │
│     Client      │◄───►│   API Server    │◄───►│   Processing    │
│   (Port 3500)   │     │   (Port 3000)   │     │     Service     │
└─────────────────┘     └─────────────────┘     │   (Port 8000)   │
                                 │              └─────────────────┘
                                 │                       │
                                 ▼                       ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │    RabbitMQ     │◄───►│   PostgreSQL    │
                        │    (Message     │     │   + pgvector    │
                        │     Queue)      │     └─────────────────┘
                        └─────────────────┘
- Upload: Client β API Server β RabbitMQ Queue
- Processing: RabbitMQ β Processing Service β PostgreSQL
- Chat: Client β API Server β Processing Service β LLM β Client
pdf-RAG/
├── client/                     # Next.js frontend
│   ├── src/app/
│   │   ├── components/         # React components
│   │   │   ├── PDFDropzone.tsx
│   │   │   ├── ProcessingStatus.tsx
│   │   │   └── ui/             # Reusable UI components
│   │   ├── utils/              # Utility functions
│   │   └── page.tsx            # Main page
│   └── package.json
├── server/                     # Node.js API server
│   ├── src/
│   │   ├── api/
│   │   │   ├── controllers/    # Request handlers
│   │   │   └── routes/         # API endpoints
│   │   ├── services/
│   │   │   ├── chat/           # Chat management
│   │   │   ├── llm/            # LLM providers
│   │   │   ├── queue/          # Message queue
│   │   │   └── websocket/      # WebSocket handling
│   │   ├── middleware/         # Express middleware
│   │   ├── types/              # TypeScript types
│   │   └── index.ts            # Server entry point
│   └── package.json
├── processing-service/         # Python processing service
│   ├── src/
│   │   ├── api/                # FastAPI endpoints
│   │   ├── process_pipeline/   # Document processing
│   │   │   ├── extract.py      # Text extraction (Docling)
│   │   │   ├── chunk.py        # Text chunking
│   │   │   └── embed.py        # Embedding generation
│   │   ├── rag/                # RAG components
│   │   │   └── search.py       # Vector search
│   │   ├── storage/            # Database management
│   │   ├── notifier/           # Status notifications
│   │   └── main.py             # Service entry point
│   └── requirements/
└── docker-compose.dev.yml      # Development setup
- Framework: Next.js 15 with App Router
- Language: TypeScript
- Styling: Tailwind CSS
- Components: Radix UI primitives
- File Handling: react-dropzone
- State Management: React hooks (useState, useEffect)
- Runtime: Node.js with TypeScript
- Framework: Express.js
- WebSocket: ws library
- Queue: amqplib (RabbitMQ client)
- File Upload: multer
- LLM Integration: @anthropic-ai/sdk, openai
- Framework: FastAPI
- Document Processing: Docling 2.24.0
- Embeddings: OpenAI text-embedding-3-large
- Database: psycopg2-binary, pgvector
- Queue: pika (RabbitMQ client)
- Async Processing: Threading
- Database: PostgreSQL with pgvector extension
- Message Queue: RabbitMQ
- Containerization: Docker & Docker Compose
- Database Admin: PgAdmin
-- Documents metadata
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
filename TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Text chunks with embeddings
CREATE TABLE chunks (
id SERIAL PRIMARY KEY,
document_id INTEGER REFERENCES documents(id),
chunk_text TEXT,
embedding vector(1536), -- OpenAI embedding dimension
page_numbers INTEGER[],
metadata JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Vector similarity index
CREATE INDEX chunks_embedding_idx
ON chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
- Upload: File uploaded to /api/document/upload
- Queue: Job queued in RabbitMQ with metadata
- Extraction: Docling extracts text and structure
- Chunking: Text split into chunks of configurable size (chunking, embedding, and storage are sketched after this list)
- Embedding: OpenAI generates 1536-dimension vectors
- Storage: Chunks stored in PostgreSQL with metadata
- Notification: WebSocket notifies completion
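A minimal sketch of the chunking, embedding, and storage steps, assuming the OpenAI Python SDK and a plain psycopg2 cursor, and assuming text-embedding-3-large is requested at 1536 dimensions (via the API's `dimensions` parameter) to match the `vector(1536)` schema. Helper names and chunking parameters are illustrative, not the actual `chunk.py`/`embed.py` implementation:

```python
# Illustrative sketch only; the real logic lives in process_pipeline/.
from typing import Any

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking with overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """One batched embeddings call; 1536 dims to match the vector(1536) column."""
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=chunks,
        dimensions=1536,
    )
    return [item.embedding for item in response.data]


def store_chunks(cursor: Any, document_id: int,
                 chunks: list[str], embeddings: list[list[float]]) -> None:
    """Insert each chunk with its embedding; the string form '[x, y, ...]'
    is accepted by pgvector's ::vector cast."""
    for chunk, embedding in zip(chunks, embeddings):
        cursor.execute(
            "INSERT INTO chunks (document_id, chunk_text, embedding) "
            "VALUES (%s, %s, %s::vector)",
            (document_id, chunk, str(embedding)),
        )
```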
- Query: User message sent to /api/chat/chat
- Embedding: Query converted to a vector
- Search: Vector similarity search in PostgreSQL (see the query sketch after this list)
- Context: Top-k results combined with conversation history
- Generation: LLM generates response with context
- Response: Answer returned to client
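The search step maps directly onto pgvector's cosine-distance operator, which the ivfflat index in the schema above is built for. A minimal sketch (the actual implementation lives in `rag/search.py` and may differ):

```python
# Illustrative pgvector search; <=> is the cosine-distance operator,
# so 1 - distance yields a similarity score.
def search_chunks(conn, query_embedding: list[float], top_k: int = 5):
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT chunk_text, page_numbers,
                   1 - (embedding <=> %s::vector) AS score
            FROM chunks
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (str(query_embedding), str(query_embedding), top_k),
        )
        return cur.fetchall()
```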
- Server: Express error middleware with structured responses
- Processing: Try-catch with retry logic and queue rejection (see the sketch after this list)
- Client: Error boundaries and user-friendly messages
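On the processing side, a retry-then-reject consumer callback might look like this, using pika's redelivered flag as a one-shot retry signal. The wiring is illustrative; `process_document` stands in for the pipeline entry point:

```python
# Illustrative consumer callback: ack on success, requeue once on failure,
# reject permanently on the second failure so poison messages cannot loop.
import json
import logging

logger = logging.getLogger(__name__)


def on_message(channel, method, properties, body):
    job = json.loads(body)
    try:
        process_document(job["file_id"], job["file_path"])  # pipeline entry point (illustrative name)
        channel.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        logger.exception("Processing failed for %s", job.get("file_id"))
        channel.basic_nack(
            delivery_tag=method.delivery_tag,
            requeue=not method.redelivered,  # retry once, then drop/dead-letter
        )
```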
- Server: Console logging with structured format
- Processing: Python logging with configurable levels (example below)
- Docker: JSON file logging with rotation
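A typical configurable-level setup for the Python service (the `LOG_LEVEL` variable name is an assumption, not necessarily what the service uses):

```python
import logging
import os

logging.basicConfig(
    level=os.environ.get("LOG_LEVEL", "INFO"),  # LOG_LEVEL is an assumed name
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)
```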
- Environment Variables: All secrets and config via .env
- Docker: Environment variables in docker-compose
- TypeScript: Strong typing for all interfaces
class ChatManager {
  // Handles chat flow orchestration
  async handleMessage(request: ChatRequest): Promise<ChatResponse>
  private async getVectorSearchResults(query: string): Promise<VectorSearchResult[]>
  private buildContext(searchResults: VectorSearchResult[], conversationId: string): string
  private async generateResponse(context: string, userMessage: string): Promise<string>
}

class LLMService {
  // Multi-provider LLM integration
  async generateResponse(request: LLMRequest): Promise<LLMResponse>
  setProvider(provider: 'anthropic' | 'openai'): void
}

class DocumentProcessor:
    # Main processing pipeline
    def process_document(self, file_id: str, file_path: str,
                         metadata: Optional[Dict] = None) -> Dict

class VectorSearch:
    # Vector similarity search
    def search(self, query: str, document_id: Optional[int] = None,
               top_k: int = 5, min_score: float = 0.0) -> List[Dict[str, Any]]

    def _generate_embedding(self, text: str) -> List[float]

- Problem: Processing service runs out of memory (OOM) on large documents
- Solution: Increase memory limits in docker-compose.yml
- Monitoring: `docker stats` to check memory usage
- Problem: RabbitMQ connection drops during processing
- Solution: Implement connection retry logic
- Monitoring: Check RabbitMQ management interface
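A reconnect wrapper around the blocking consumer is usually enough for this; a sketch (URL, queue name, and backoff are illustrative):

```python
# Illustrative reconnect loop for the pika consumer.
import time

import pika


def consume_forever(amqp_url: str, queue: str, on_message) -> None:
    while True:
        try:
            connection = pika.BlockingConnection(pika.URLParameters(amqp_url))
            channel = connection.channel()
            channel.basic_consume(queue=queue, on_message_callback=on_message)
            channel.start_consuming()
        except pika.exceptions.AMQPConnectionError:
            time.sleep(5)  # back off before re-dialing the broker
```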
- Problem: PostgreSQL connection pool exhaustion
- Solution: Implement connection pooling and proper cleanup
- Monitoring: Check database connection count
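psycopg2 ships a thread-safe pool that covers this; a sketch, with an illustrative DSN and pool size:

```python
# Illustrative pooling helper: connections are always returned, even on error.
from contextlib import contextmanager

from psycopg2.pool import ThreadedConnectionPool

pool = ThreadedConnectionPool(
    minconn=1, maxconn=10,
    dsn="postgresql://postgres:postgres@localhost:5432/ragdb",  # illustrative
)


@contextmanager
def get_connection():
    conn = pool.getconn()
    try:
        yield conn
    finally:
        pool.putconn(conn)
```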
- Problem: WebSocket connections drop during long processing
- Solution: Implement heartbeat and reconnection logic
- Monitoring: Check WebSocket connection status
# Start infrastructure
docker-compose -f docker-compose.dev.yml up postgres rabbitmq pgadmin
# Start API server
cd server && npm run dev
# Start processing service
cd processing-service && python src/main.py
# Start client
cd client && npm run dev

# Full development environment
npm run dev
# Production build
npm run build && npm start
# View logs
npm run logs

# Connect to database
docker exec -it pdf_chat_rag-postgres-1 psql -U postgres -d ragdb
# Check vector extension
\dx
# View tables
\dt
# Query chunks
SELECT COUNT(*) FROM chunks;
- Docling Models: ~6.5GB for OCR models (cached after first run)
- Processing Service: 2-7GB depending on document size
- Database: Varies with document count and chunk size
- Small PDFs (< 10 pages): 30-60 seconds
- Medium PDFs (10-50 pages): 2-5 minutes
- Large PDFs (50+ pages): 5-15 minutes
- Use connection pooling for database
- Implement caching for frequent queries (see the sketch after this list)
- Batch embedding generation when possible
- Use appropriate chunk sizes for your use case
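The caching item can start as simple memoization of query embeddings, so repeated chat questions skip the OpenAI round-trip (`generate_embedding` is a hypothetical stand-in for the service's embedding helper):

```python
# Illustrative cache; tuples are hashable, so lru_cache can store them.
from functools import lru_cache


@lru_cache(maxsize=1024)
def cached_query_embedding(query: str) -> tuple[float, ...]:
    # generate_embedding is the service's embedding call (hypothetical name)
    return tuple(generate_embedding(query))
```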
- Store in environment variables only
- Never commit to version control
- Use different keys for development/production
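A fail-fast startup check keeps a missing key from surfacing as a confusing mid-request error; a sketch (the variable names are the SDKs' conventional defaults, adjust to the actual config):

```python
# Illustrative startup check for required secrets.
import os

# Names are conventional SDK defaults; adjust to the actual configuration.
REQUIRED = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "DATABASE_URL")
missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```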
- Validate file types and sizes
- Sanitize filenames
- Implement rate limiting
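The upload itself goes through multer on the Node server, but the checks look the same in any language; a Python sketch with illustrative limits:

```python
# Illustrative validation: allow-list the content type, cap the size,
# and strip path components plus unsafe characters from the filename.
import re

MAX_SIZE = 50 * 1024 * 1024  # 50 MB, illustrative
ALLOWED_TYPES = {"application/pdf"}


def validate_upload(filename: str, content_type: str, size: int) -> str:
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"Unsupported content type: {content_type}")
    if size > MAX_SIZE:
        raise ValueError("File too large")
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", filename.rsplit("/", 1)[-1])
    return safe or "upload.pdf"
```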
- Use connection strings with credentials
- Implement proper access controls
- Regular security updates
- Test individual functions and classes
- Mock external dependencies (example after this list)
- Test error handling paths
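For instance, mocking the OpenAI client keeps embedding tests offline and fast (`embed_query` is a hypothetical helper standing in for the service's real one):

```python
# Illustrative pytest case with a mocked embeddings client.
from unittest.mock import MagicMock


def embed_query(client, text: str) -> list[float]:
    # Hypothetical helper mirroring the service's embedding call.
    response = client.embeddings.create(
        model="text-embedding-3-large", input=[text], dimensions=1536
    )
    return response.data[0].embedding


def test_embed_query_returns_vector():
    fake_client = MagicMock()
    fake_client.embeddings.create.return_value.data = [MagicMock(embedding=[0.1, 0.2])]
    assert embed_query(fake_client, "hello") == [0.1, 0.2]
    fake_client.embeddings.create.assert_called_once()
```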
- Test API endpoints
- Test database operations
- Test queue processing
- Test complete user workflows
- Test file upload and processing
- Test chat functionality
- API server health endpoint
- Processing service health endpoint
- Database connectivity checks
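A health endpoint that also exercises the database keeps these checks in one place; a FastAPI sketch (`get_connection` is the pooled helper sketched earlier):

```python
# Illustrative health endpoint: 200 when the DB answers, 503 otherwise.
from fastapi import FastAPI, Response

app = FastAPI()


@app.get("/health")
def health(response: Response) -> dict:
    try:
        with get_connection() as conn, conn.cursor() as cur:
            cur.execute("SELECT 1")
        return {"status": "ok", "database": "up"}
    except Exception:
        response.status_code = 503
        return {"status": "degraded", "database": "down"}
```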
- Processing time per document
- Queue depth and processing rate
- Database query performance
- Memory and CPU usage
- Structured logging with timestamps
- Error tracking and alerting
- Performance metrics logging
- Use production-grade database
- Implement proper backup strategies
- Set up monitoring and alerting
- Configure SSL/TLS certificates
- Horizontal scaling of services
- Database read replicas
- Load balancing for API servers
- Queue partitioning for processing
- Regular database maintenance
- Model updates and migrations
- Security patches and updates
- Performance optimization
This project demonstrates:
- Microservices Architecture: Service separation and communication
- RAG Implementation: Vector search and context retrieval
- Async Processing: Message queues and background jobs
- Real-time Communication: WebSocket implementation
- Container Orchestration: Docker Compose
- Vector Databases: PostgreSQL with pgvector
- LLM Integration: Multiple provider support
- TypeScript: Strong typing and interfaces
- Python: FastAPI and async processing
- React: Modern hooks and component patterns
- Follow TypeScript best practices
- Use meaningful variable and function names
- Add proper error handling
- Include type annotations
- Update README for new features
- Add inline code comments
- Document API changes
- Update this agents.md file
- Add tests for new features
- Test error scenarios
- Verify integration points
- Test performance implications