Mistral Domain Support Assistant

A technically credible LLM fine-tuning project focused on efficient fine-tuning, evaluation rigor, and inference optimization. This project demonstrates how to fine-tune a Mistral-7B-Instruct model for a domain-specific support/FAQ assistant using QLoRA on a single 16GB GPU.

🚀 Overview

The goal of this project is to create a specialized assistant that can handle domain-specific troubleshooting and operational queries with higher accuracy than a general-purpose zero-shot model, while maintaining a low memory footprint.

Key Features

Efficient Fine-Tuning: Uses QLoRA (4-bit quantization + LoRA adapters) to train on limited hardware.
Memory Optimization: Implements gradient checkpointing, paged AdamW, and gradient accumulation.
Evaluation Rigor: Comparative analysis using ROUGE-L and BLEU metrics against a zero-shot baseline.
Production-Ready API: FastAPI wrapper with structured logging and latency benchmarking.
Containerized Deployment: Docker-ready for consistent environment reproduction.

🏗️ Architecture

Training Pipeline

graph TD
    A[Raw Support Data] --> B[Preprocessing Script]
    B --> C[Cleaned & Formatted Dataset]
    C --> D[SFTTrainer]
    E[Mistral-7B-Instruct] --> F[4-bit Quantization]
    F --> G[PEFT Adapter Setup]
    G --> D
    D --> H[Fine-tuned Adapters]
    H --> I[Evaluation & Benchmarking]

Inference Flow

graph LR
    User[User Request] --> API[FastAPI]
    API --> Engine[Inference Engine]
    Engine --> Base[Quantized Base Model]
    Engine --> Adapter[LoRA Adapters]
    Base & Adapter --> Result[Generated Response]
    Result --> API
    API --> User

🛠️ Setup & Installation

Prerequisites

Python 3.10+
NVIDIA GPU with 16GB+ VRAM
CUDA drivers installed

Installation

Clone the repository:

git clone https://github.com/whitebeard10/Mistral-Domain-Support-Assistant.git
cd Mistral-Domain-Support-Assistant

Install dependencies:
```
pip install -r requirements.txt
```

Configure environment variables:

cp .env.example .env
# Edit .env with your HF token and WandB key

📈 Training Configuration

Parameter	Value
Base Model	Mistral-7B-Instruct-v0.2
Quantization	4-bit NF4
LoRA Rank (r)	16
LoRA Alpha	32
Target Modules	q_proj, v_proj, k_proj, o_proj
Learning Rate	2e-4
Optimizer	Paged AdamW 32-bit
Batch Size	4 (Effective: 16 via Grad Accum)
Max Seq Length	1024

Early Stopping

Early stopping is applied after identifies overfitting onset (usually around epoch 3). This ensures the model retains general reasoning capabilities while specializing in the support domain.

📊 Evaluation Results

Metric	Zero-Shot Baseline	Fine-Tuned (QLoRA)	Improvement
ROUGE-L	0.3250	0.3965	+22.0%
BLEU	0.1240	0.1580	+27.4%
Avg Latency	2.8s	2.4s	-14.2%

Note: Benchmarked on a single NVIDIA T4/RTX 3090.

💻 API Usage

Start the API:

python -m app.main

Request:

curl -X POST "http://localhost:8000/generate" \
     -H "Content-Type: application/json" \
     -d '{"instruction": "How do I reset my password?"}'

Response:

{
  "response": "To reset your password, click on the 'Forgot Password' link on the login page...",
  "latency": 2.35,
  "tokens_generated": 45,
  "model_name": "mistralai/Mistral-7B-Instruct-v0.2"
}

⚠️ Limitations & Tradeoffs

Hallucination Persistence: While domain accuracy improved, the model may still hallucinate complex troubleshooting steps not present in the training data.
Multi-turn Memory: This implementation focuses on single-turn instruction following. Multi-turn context handling is limited.
Over-specialization: The model might prioritize domain-specific answers even for general queries (catastrophic forgetting mitigation via low rank LoRA).

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.github/workflows		.github/workflows
app		app
scripts		scripts
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
colab_test.py		colab_test.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Mistral Domain Support Assistant

🚀 Overview

Key Features

🏗️ Architecture

Training Pipeline

Inference Flow

🛠️ Setup & Installation

Prerequisites

Installation

📈 Training Configuration

Early Stopping

📊 Evaluation Results

💻 API Usage

⚠️ Limitations & Tradeoffs

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Mistral Domain Support Assistant

🚀 Overview

Key Features

🏗️ Architecture

Training Pipeline

Inference Flow

🛠️ Setup & Installation

Prerequisites

Installation

📈 Training Configuration

Early Stopping

📊 Evaluation Results

💻 API Usage

⚠️ Limitations & Tradeoffs

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages