
RAG Model - Document Q&A System

A sophisticated Retrieval-Augmented Generation (RAG) system built with FastAPI that enables intelligent document-based question answering. This application combines the power of OpenAI's language models with vector search capabilities to provide contextually accurate responses based on uploaded documents.

🚀 Features

Core Functionality

  • Document Upload & Processing: Upload PDF documents that are automatically processed and vectorized
  • Intelligent Q&A: Ask questions about your documents and get contextually relevant answers
  • Vector Search: Advanced semantic search using Qdrant vector database
  • Chat History: Persistent conversation history with context awareness
  • User Authentication: Secure JWT-based authentication system
  • Real-time Processing: Efficient document chunking and embedding generation

Technical Highlights

  • RAG Architecture: Combines retrieval and generation for accurate responses
  • Vector Embeddings: Uses OpenAI's text-embedding-3-large model (3072 dimensions)
  • Scalable Database: PostgreSQL for relational data, Qdrant for vector storage
  • Modern API: RESTful API with automatic OpenAPI documentation
  • Production Ready: Comprehensive logging, error handling, and database migrations

🏗️ Architecture

System Overview

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   FastAPI App   │    │   PostgreSQL    │    │     Qdrant      │
│                 │    │                 │    │   Vector DB     │
│  ┌───────────┐  │    │  ┌───────────┐  │    │  ┌───────────┐  │
│  │ Auth      │  │◄──►│  │   Users   │  │    │  │ Embeddings│  │
│  │ Routes    │  │    │  │ Documents │  │    │  │ Vectors   │  │
│  └───────────┘  │    │  │ Messages  │  │    │  │ Metadata  │  │
│  ┌───────────┐  │    │  └───────────┘  │    │  └───────────┘  │
│  │ Chat      │  │    └─────────────────┘    └─────────────────┘
│  │ Routes    │  │              │                       │
│  └───────────┘  │              │                       │
│  ┌───────────┐  │              │                       │
│  │ Doc       │  │              │                       │
│  │ Routes    │  │              │                       │
│  └───────────┘  │              │                       │
└─────────────────┘              │                       │
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   OpenAI API    │    │   SQLAlchemy    │    │ Qdrant Client   │
│                 │    │      ORM        │    │                 │
│ ┌─────────────┐ │    │                 │    │                 │
│ │ GPT-4.1     │ │    │                 │    │                 │
│ │ Embeddings  │ │    │                 │    │                 │
│ └─────────────┘ │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘

RAG Pipeline

  1. Document Ingestion: PDF files are uploaded and processed
  2. Text Extraction: Content is extracted using LlamaIndex PDFReader
  3. Chunking: Text is split into manageable chunks (1000 chars, 100 overlap)
  4. Vectorization: Chunks are converted to embeddings using OpenAI
  5. Storage: Vectors stored in Qdrant with metadata
  6. Query Processing: User questions are vectorized and matched
  7. Context Retrieval: Relevant chunks are retrieved based on similarity
  8. Response Generation: OpenAI generates answers using retrieved context
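The chunking step (1000 chars with 100-char overlap) can be sketched in a few lines. This is an illustrative helper, not the project's `utilities/utility.py` implementation; the parameters match the defaults quoted above.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks with a sliding overlap.

    Mirrors the 1000-char / 100-char-overlap scheme described above;
    a real implementation would also respect sentence boundaries.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap ensures that a sentence cut at a chunk boundary still appears whole in the neighboring chunk, which improves retrieval recall.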

Technology Stack

  • Backend Framework: FastAPI 0.116.1+
  • Language Model: OpenAI GPT-4.1
  • Embeddings: OpenAI text-embedding-3-large (3072D)
  • Vector Database: Qdrant 1.16.1
  • Relational Database: PostgreSQL with SQLAlchemy 2.0.44
  • Authentication: JWT with python-jose 3.5.0
  • Password Hashing: Argon2 via passlib 1.7.4
  • Document Processing: LlamaIndex for PDF parsing
  • Migration Management: Alembic 1.17.2
  • Environment Management: UV package manager

🛠️ Installation

Prerequisites

  • Python: 3.13+ (specified in pyproject.toml)
  • PostgreSQL: 12+ for relational data storage
  • Qdrant: Vector database (can run via Docker)
  • OpenAI API Key: For language model and embeddings

Quick Start

  1. Clone the Repository

    git clone <repository-url>
    cd Rag-Model
  2. Set Up Python Environment

    # Using UV (recommended)
    uv venv
    uv pip install -e .
    
    # Or using pip
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -e .
  3. Install Dependencies

    # Production dependencies
    uv pip install -r pyproject.toml
    
    # Development dependencies (optional)
    uv pip install -e .[dev]
  4. Set Up Databases

    # Start Qdrant (using Docker)
    docker run -p 6333:6333 qdrant/qdrant
    
    # Ensure PostgreSQL is running
    # Create database: rag_model
  5. Configure Environment

    cp .env.example .env
    # Edit .env with your configuration
  6. Run Database Migrations

    alembic upgrade head
  7. Start the Application

    uvicorn main:app --reload --host 0.0.0.0 --port 8000

⚙️ Configuration

Environment Variables

Create a .env file in the project root with the following variables:

# OpenAI Configuration
OPENAPI_API_KEY=<--value goes here-->               (required)

# Model Configuration
MODEL_NAME=<--value goes here-->                    (default: gpt-4.1)
EMBED_MODEL=<--value goes here-->                   (default: text-embedding-3-large)
EMBED_SIZE=<--value goes here-->                    (default: 3072)

# Database Configuration
DB_STRING=<--value goes here-->                     (required)
ECHO_SQL=<--value goes here-->                      (default: False)
DB_SCHEMA=<--value goes here-->                     (default: rag_model)

# Vector Database
VECTOR_DB_URL=<--value goes here-->                 (default: http://localhost:6333)
VECTOR_INCLUSION_THRESHOLD=<--value goes here-->    (default: 0.5)

# Authentication
JWT_SECRET_KEY=<--value goes here-->                (required)
JWT_ALGORITHM=<--value goes here-->                 (default: HS256)

# Application Settings
FALLBACK_MESSAGE=Sorry, Could not generate a message. Please try again later.
LOG_FILE=app.log

Configuration Details

Model Settings

  • MODEL_NAME: OpenAI model for chat completions (default: gpt-4.1)
  • EMBED_MODEL: Embedding model (default: text-embedding-3-large)
  • EMBED_SIZE: Embedding dimensions (3072 for text-embedding-3-large)

Database Settings

  • DB_STRING: PostgreSQL connection string
  • VECTOR_DB_URL: Qdrant server URL
  • VECTOR_INCLUSION_THRESHOLD: Minimum similarity score for including documents (0.0-1.0)
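VECTOR_INCLUSION_THRESHOLD filters retrieved chunks by cosine similarity. Qdrant computes this scoring server-side; the stdlib sketch below (helper names are illustrative, not the project's API) shows what the 0.5 default controls.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_hits(query_vec, hits, threshold=0.5):
    """Keep only chunks whose similarity meets the threshold,
    ranked best-first. `hits` is a list of (vector, payload) pairs."""
    scored = [(cosine_similarity(query_vec, vec), payload) for vec, payload in hits]
    return sorted((s, p) for s, p in scored if s >= threshold)[::-1]
```

Raising the threshold makes answers more conservative (fewer, more relevant chunks); lowering it admits weaker matches into the prompt.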

Security Settings

  • JWT_SECRET_KEY: Secret key for JWT token signing (use a strong, random key)
  • JWT_ALGORITHM: JWT signing algorithm (HS256 recommended)

📖 Usage

Starting the Application

  1. Development Mode

    uvicorn main:app --reload --host 0.0.0.0 --port 8000
  2. Production Mode

    uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
  3. Access the Application

    Once the server is running, FastAPI serves interactive OpenAPI documentation (by default at http://localhost:8000/docs for Swagger UI and http://localhost:8000/redoc for ReDoc).

Basic Workflow

  1. Register/Login

    # Register a new user
    curl -X POST "http://localhost:8000/auth/signup" \
         -H "Content-Type: application/json" \
         -d '{"name": "John Doe", "email": "john@example.com", "password": "securepassword123"}'
    
    # Login
    curl -X POST "http://localhost:8000/auth/login" \
         -H "Content-Type: application/json" \
         -d '{"email": "john@example.com", "password": "securepassword123"}'
  2. Upload Documents

    curl -X POST "http://localhost:8000/docs/upload-pdf-document" \
         -H "Authorization: Bearer YOUR_JWT_TOKEN" \
         -F "file=@document.pdf"
  3. Ask Questions

    curl -X POST "http://localhost:8000/chats/send-message" \
         -H "Authorization: Bearer YOUR_JWT_TOKEN" \
         -H "Content-Type: application/json" \
         -d '{"message": "What is the main topic of the document?"}'
  4. View Chat History

    curl -X GET "http://localhost:8000/chats/messages" \
         -H "Authorization: Bearer YOUR_JWT_TOKEN"

📚 API Documentation

Authentication Endpoints

POST /auth/signup

Register a new user account.

Request Body:

{
  "name": "John Doe",
  "email": "john@example.com",
  "password": "securepassword123"
}

Response:

{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...",
  "token_type": "bearer",
  "user_id": 1,
  "email": "john@example.com",
  "name": "John Doe"
}

POST /auth/login

Authenticate user and receive JWT token.

Request Body:

{
  "email": "john@example.com",
  "password": "securepassword123"
}

Response:

{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...",
  "token_type": "bearer",
  "user_id": 1,
  "email": "john@example.com",
  "name": "John Doe"
}

GET /auth/me

Get current authenticated user information.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN

Response:

{
  "user_id": 1,
  "name": "John Doe",
  "email": "john@example.com",
  "is_active": true,
  "is_verified": false,
  "created_at": "2024-12-09T08:25:00Z"
}

Document Management Endpoints

POST /docs/upload-pdf-document

Upload and process a PDF document.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN
Content-Type: multipart/form-data

Request Body:

file: (PDF file)

Response:

{
  "message": "Document 'example.pdf' uploaded and processed successfully",
  "doc_uuid": "550e8400-e29b-41d4-a716-446655440000",
  "status": "success"
}

GET /docs/list-documents

List all uploaded documents for the current user.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN

Response:

{
  "documents": [
    {
      "doc_uuid": "550e8400-e29b-41d4-a716-446655440000",
      "file_url": "Not Available",
      "file_size": 1048576,
      "original_filename": "example.pdf",
      "mime_type": "application/pdf",
      "created_at": "2024-12-09T08:25:00Z",
      "total_chunks": 42
    }
  ],
  "count": 1,
  "status": "success"
}

DELETE /docs/delete-document/{doc_uuid}

Delete a document and its associated vectors.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN

Response:

{
  "message": "Document 'example.pdf' deleted successfully",
  "status": "success"
}

Chat Endpoints

POST /chats/send-message

Send a message and get an AI response based on uploaded documents.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN
Content-Type: application/json

Request Body:

{
  "message": "What are the main points discussed in the document?",
  "message_history_count": 20
}

Response:

{
  "status": "success",
  "total_input_vectors": 1536,
  "total_query_hits": 5,
  "total_output_tokens": 150,
  "query_hit_doc_uuids": ["550e8400-e29b-41d4-a716-446655440000"],
  "model_response": "Based on the uploaded document, the main points discussed are..."
}

GET /chats/messages

Retrieve chat history for the current user.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN

Response:

{
  "status": "success",
  "messages": [
    {
      "role": "user",
      "content": "What are the main points?",
      "model_used": "gpt-4.1",
      "tokens_used": 1536,
      "response_time_ms": 0,
      "ai_prompt": null,
      "context_document_uuid": null,
      "created_at": "2024-12-09T08:25:00Z"
    },
    {
      "role": "assistant",
      "content": "Based on the document...",
      "model_used": "gpt-4.1",
      "tokens_used": 150,
      "response_time_ms": 2500,
      "ai_prompt": "System: You are a helpful assistant...",
      "context_document_uuid": ["550e8400-e29b-41d4-a716-446655440000"],
      "created_at": "2024-12-09T08:25:02Z"
    }
  ]
}

DELETE /chats/clear-history

Clear all chat history for the current user.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN

Response:

{
  "status": "success"
}

Health Check

GET /health

Check application health status.

Response:

{
  "status": "healthy",
  "message": "RAG API is running."
}

📁 Project Structure

Rag-Model/
├── 📁 alembic/                    # Database migrations
│   ├── 📁 versions/               # Migration files
│   │   └── c938cf66d0f6_initial_setup.py
│   └── env.py                     # Alembic environment configuration
├── 📁 authentication/             # Authentication module
│   ├── __init__.py               # Module exports
│   ├── auth_models.py            # Authentication data models
│   └── utils.py                  # JWT and password utilities
├── 📁 database/                   # Database layer
│   ├── models.py                 # SQLAlchemy models
│   ├── postgres_db.py            # PostgreSQL connection
│   └── vector_db.py              # Qdrant vector database client
├── 📁 llm/                       # Language model integration
│   ├── models.py                 # LLM data models
│   └── openai_client.py          # OpenAI API client
├── 📁 log_config/                # Logging configuration
│   ├── __init__.py               # Logger factory
│   └── logging_config.py         # Logging setup
├── 📁 route_models/              # API request/response models
│   ├── auth_models.py            # Authentication models
│   ├── chat_models.py            # Chat endpoint models
│   └── doc_models.py             # Document endpoint models
├── 📁 routers/                   # FastAPI route handlers
│   ├── __init__.py               # Router exports
│   ├── auth_routes.py            # Authentication endpoints
│   ├── chat_routes.py            # Chat endpoints
│   └── doc_routes.py             # Document endpoints
├── 📁 utilities/                 # Utility functions
│   ├── __init__.py               # Utility exports
│   └── utility.py                # PDF processing and vectorization
├── 📁 logs/                      # Application logs (auto-created)
├── 📁 vector_db_storage/         # Qdrant data storage (auto-created)
├── .env                          # Environment variables
├── .gitignore                    # Git ignore rules
├── .python-version               # Python version specification
├── alembic.ini                   # Alembic configuration
├── main.py                       # FastAPI application entry point
├── pyproject.toml                # Project dependencies and metadata
├── settings.py                   # Application configuration
├── uv.lock                       # UV lock file
└── README.md                     # This file

Key Components

Core Application (main.py)

  • FastAPI application setup with CORS middleware
  • Application lifespan management
  • Database connection initialization
  • Router registration and API documentation

Authentication System (authentication/)

  • JWT Token Management: Secure token creation and validation
  • Password Security: Argon2 hashing for password storage
  • Role-Based Access: User role authorization system
  • Dependency Injection: FastAPI dependencies for route protection
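The app delegates password hashing to passlib's Argon2 backend; the salted hash-and-verify pattern it relies on looks like the following (shown with stdlib scrypt purely for illustration — this is not the project's actual code or algorithm):

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Derive a salted hash; store salt and digest, never the raw password."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)
```

Argon2 (via passlib's `CryptContext`) additionally encodes the salt and cost parameters into a single hash string, so only one column is needed in the Users table.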

Database Layer (database/)

  • PostgreSQL Models: User, Document, and Message entities
  • Vector Database: Qdrant integration for embeddings storage
  • Connection Management: Session handling and connection pooling
  • Migration Support: Alembic for database schema management

Language Model Integration (llm/)

  • OpenAI Client: GPT-4.1 and embedding model integration
  • RAG Pipeline: Document processing and context retrieval
  • Response Generation: Contextual answer generation
  • Token Management: Usage tracking and optimization
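The context-retrieval step amounts to assembling retrieved chunks into a grounded prompt before calling the chat model. A hedged sketch (the actual prompt template in `llm/` may differ; the chunk dicts mirror the Qdrant payload schema):

```python
def build_rag_prompt(question: str, chunks: list[dict], fallback: str) -> str:
    """Assemble retrieved chunks into a grounded prompt.

    Each chunk dict carries 'source', 'text', and 'uuid' as in the
    Qdrant payload schema; instructing the model to answer only from
    the supplied context is what keeps responses document-grounded.
    """
    if not chunks:
        return fallback
    context = "\n\n".join(f"[{c['source']}]\n{c['text']}" for c in chunks)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The resulting string is what gets sent as the prompt to GPT-4.1, alongside any chat history the endpoint includes.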

API Routes (routers/)

  • Authentication Routes: Login, signup, user management
  • Document Routes: Upload, list, delete PDF documents
  • Chat Routes: Message sending, history management
  • Error Handling: Comprehensive exception management

Utilities (utilities/)

  • PDF Processing: Document parsing and text extraction
  • Vectorization: Text-to-embedding conversion
  • Chunking: Intelligent text segmentation

Qdrant Vector Database

Collection: "rag"

  • Vector Dimension: 3072 (OpenAI text-embedding-3-large)
  • Distance Metric: Cosine similarity
  • Payload Schema:
    {
      "source": "document_filename.pdf",
      "text": "chunk_content_text",
      "uuid": "document_uuid"
    }

Relationships

  • Users → Documents: One-to-many (cascade delete)
  • Users → Messages: One-to-many (cascade delete)
  • Documents → Vector Embeddings: One-to-many (via UUID)

🔧 Development

Setting Up Development Environment

  1. Install Development Dependencies

    uv pip install -e .[dev]
  2. Code Formatting

    black .

Development Tools

Available Dependencies

  • black: Code formatting (25.12.0+)
  • icecream: Enhanced debugging (2.1.8+)

Database Migrations

Create a New Migration

alembic revision --autogenerate -m "Description of changes"

Apply Migrations

alembic upgrade head

Rollback Migration

alembic downgrade -1

Logging

The application uses a comprehensive logging system:

  • File Logging: Rotating logs in logs/app.log
  • Console Logging: Real-time output during development
  • Log Levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
  • Structured Format: Timestamp, module, level, file:line, message
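A minimal sketch of the rotating file-plus-console setup described above (the project's `log_config/` module will differ in details such as rotation size and format string):

```python
import logging
from logging.handlers import RotatingFileHandler

def get_logger(name: str, log_file: str = "logs/app.log") -> logging.Logger:
    """File and console logging in a structured timestamp/module/level format."""
    logger = logging.getLogger(name)
    if logger.handlers:  # avoid duplicate handlers on repeated calls
        return logger
    logger.setLevel(logging.DEBUG)
    fmt = logging.Formatter(
        "%(asctime)s | %(name)s | %(levelname)s | %(filename)s:%(lineno)d | %(message)s"
    )
    file_handler = RotatingFileHandler(log_file, maxBytes=5_000_000, backupCount=3)
    file_handler.setFormatter(fmt)
    console = logging.StreamHandler()
    console.setFormatter(fmt)
    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger
```

`RotatingFileHandler` rolls the file over once it reaches `maxBytes`, keeping `backupCount` old copies, so the `logs/` directory stays bounded.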

Code Style Guidelines

  1. Import Organization

    # standard library imports
    import os
    from typing import List
    
    # 3rd party imports
    from fastapi import FastAPI
    from sqlalchemy import create_engine
    
    # local imports
    from settings import project_settings
    from database.models import User
  2. Function Documentation

    def example_function(param1: str, param2: int = 10) -> bool:
        """
        Brief description of the function.
        
        Args:
            param1 (str): Description of param1.
            param2 (int): Description of param2. Defaults to 10.
        
        Returns:
            bool: Description of return value.
        """
  3. Error Handling

    try:
        # operation
        result = perform_operation()
        logger.info("Operation successful")
        return result
    except SpecificException as e:
        logger.error(f"Specific error: {str(e)}")
        raise HTTPException(status_code=400, detail="Specific error message")
    except Exception as e:
        logger.exception(f"Unexpected error: {str(e)}")
        raise HTTPException(status_code=500, detail="Internal server error")

🤝 Contributing

Getting Started

  1. Fork the Repository

  2. Create a Feature Branch

    git checkout -b feature/your-feature-name
  3. Make Changes

    • Follow code style guidelines
    • Add tests for new functionality
    • Update documentation as needed
  4. Test Your Changes

    # Format code
    black .
    
    # Run tests (when available)
    pytest
    
    # Test API endpoints
    curl -X GET "http://localhost:8000/health"
  5. Submit a Pull Request

    • Provide clear description of changes
    • Reference any related issues
    • Ensure all checks pass

Development Guidelines

  1. Code Quality

    • Follow PEP 8 style guidelines
    • Use type hints consistently
    • Write comprehensive docstrings
    • Handle errors gracefully
  2. Testing

    • Write unit tests for new functions
    • Test API endpoints thoroughly
    • Verify database operations
    • Test error conditions
  3. Documentation

    • Update README for new features
    • Document API changes
    • Add inline code comments
    • Update configuration examples

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI: For providing powerful language models and embeddings
  • Qdrant: For the excellent vector database solution
  • FastAPI: For the modern, fast web framework
  • LlamaIndex: For document processing capabilities
  • SQLAlchemy: For robust database ORM
  • Contributors: All developers who have contributed to this project

Built with ❤️ using FastAPI, OpenAI, and Qdrant