Multi Model RAG

A local multimodal RAG system that understands and reasons over text, tables, and images in PDFs to generate grounded, context-aware answers. It provides:

  • text extraction
  • table extraction
  • image extraction + image captioning
  • FAISS vector search
  • LLM answer generation
  • chat-style Streamlit UI

The project uses Ollama models locally:

  • nomic-embed-text for embeddings
  • llama3 for answer generation
  • llava for image understanding
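
All three models are served through Ollama's local HTTP API. As a quick smoke test, each one can be exercised directly with Python's requests package; this is a minimal sketch, not how the repo's wrapper modules under RAG/ necessarily call Ollama, and data/images/example.png is a placeholder path:

import base64
import requests

OLLAMA = "http://localhost:11434"

# Embedding with nomic-embed-text
emb = requests.post(f"{OLLAMA}/api/embeddings",
                    json={"model": "nomic-embed-text", "prompt": "hello world"})
print(len(emb.json()["embedding"]))  # embedding dimension

# Answer generation with llama3
gen = requests.post(f"{OLLAMA}/api/generate",
                    json={"model": "llama3", "prompt": "Say hello.", "stream": False})
print(gen.json()["response"])

# Image captioning with llava; images are passed as base64-encoded strings
with open("data/images/example.png", "rb") as f:  # placeholder path
    img_b64 = base64.b64encode(f.read()).decode()
cap = requests.post(f"{OLLAMA}/api/generate",
                    json={"model": "llava", "prompt": "Describe this image.",
                          "images": [img_b64], "stream": False})
print(cap.json()["response"])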

Features

  • Chat UI (streamlit_app.py) with message history and source display
  • PDF ingestion pipeline (scripts/ingest.py)
  • CLI query demo (scripts/query_demo.py)
  • FastAPI endpoints (/health, /query)
  • Source-aware answers with page references in prompt context

Project Structure

rag/
  app/
    main.py
    routes/query.py
  RAG/
    augmentation/prompt_builder.py
    embeddings/ollama_embed.py
    generation/llm.py
    indexing/pdf_loader.py
    indexing/chunker.py
    indexing/table_extractor.py
    indexing/image_extractor.py
    indexing/build_index.py
    multimodel/table_parser.py
    multimodel/image_captioner.py
    retrieval/retriever.py
  scripts/
    ingest.py
    query_demo.py
  streamlit_app.py
  requirements.txt

Prerequisites

  1. Python 3.10+ (3.12 works).
  2. Ollama installed and available in PATH.
  3. Models pulled in Ollama:
    • llama3
    • nomic-embed-text
    • llava

For table extraction on Windows, install Ghostscript: Camelot's default lattice flavor depends on it.
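
For reference, Camelot-based table extraction looks roughly like this (a minimal sketch; the repo's RAG/indexing/table_extractor.py may differ, and data/raw/report.pdf is a placeholder path):

import camelot  # camelot-py

# The default "lattice" flavor needs Ghostscript; the "stream" flavor does not.
tables = camelot.read_pdf("data/raw/report.pdf", pages="all", flavor="lattice")
for table in tables:
    # Each table wraps a pandas DataFrame; to_markdown() requires the tabulate package.
    print(table.df.to_markdown())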

Installation

From project root (rag/):

python -m venv .venv
.\.venv\Scripts\activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

Deploy with Docker Compose (Recommended)

This repo now includes:

  • Dockerfile
  • docker-compose.yml
  • .env.example

1) Prepare env file

Copy-Item .env.example .env

2) Start services

docker compose up -d --build

3) Pull Ollama models (one-time)

docker compose exec ollama ollama pull llama3
docker compose exec ollama ollama pull nomic-embed-text
docker compose exec ollama ollama pull llava

4) Add PDFs and build FAISS index

Put PDFs into data/raw/, then run:

docker compose --profile jobs run --rm ingest

5) Access apps

  • Streamlit UI: http://localhost:8501
  • FastAPI: http://localhost:8000
  • Health check: http://localhost:8000/health

6) Stop services

docker compose down

To stop the containers without removing them:

docker compose stop

Local Setup (Without Docker)

If you are running without Docker, start the Ollama server:

ollama serve

In another terminal:

ollama pull llama3
ollama pull nomic-embed-text
ollama pull llava

Deploy on a Cloud VM

For production, use a VM or VPS rather than a serverless platform (the FAISS index, the Ollama models, and the long-running server all need persistent local storage), then:

  1. Install Docker + Compose.
  2. Clone repo.
  3. Run the same Docker Compose commands above.
  4. Expose only ports you need (usually 8501 and/or 8000) behind Nginx/Caddy with TLS.
  5. Keep persistent storage for:
    • Ollama model volume (ollama_data)
    • data/
    • vectorstore/

Data Preparation

Place your PDFs in:

data/raw/

Example:

New-Item -ItemType Directory -Force data\raw | Out-Null

Build the Index (Ingestion)

python scripts\ingest.py

This creates:

  • vectorstore/faiss_index/index.bin
  • vectorstore/faiss_index/meta.pkl
  • extracted images in data/images/ (if present in PDFs)
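
To sanity-check the build, you can open the index and metadata directly. This is a minimal sketch; the exact structure of meta.pkl is defined by the ingestion code, so the assumed one-record-per-vector layout below may differ:

import pickle
import faiss  # faiss-cpu

index = faiss.read_index("vectorstore/faiss_index/index.bin")
print("vectors:", index.ntotal, "dimension:", index.d)

with open("vectorstore/faiss_index/meta.pkl", "rb") as f:
    meta = pickle.load(f)
# Assumed layout: one metadata record per vector (e.g. dicts with type/page/text).
print(len(meta), "metadata records")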

Run the Chat UI (Recommended)

streamlit run streamlit_app.py

In the sidebar:

  1. Upload PDFs
  2. Click Save PDFs
  3. Click Run Ingestion
  4. Start chatting in the input box at the bottom

Run the API

uvicorn app.main:app --reload --port 8000

Endpoints:

  • GET /health
  • GET /query?q=your_question&top_k=5
  • POST /query with JSON:
{
  "q": "What is the revenue trend?",
  "top_k": 5
}

PowerShell example:

curl.exe -X POST "http://127.0.0.1:8000/query" `
  -H "Content-Type: application/json" `
  -d "{\"q\":\"Summarize the document\",\"top_k\":5}"

Run CLI Query Demo

python scripts\query_demo.py "What is the profit margin?" 5

How It Works

  1. scripts/ingest.py loads PDFs from data/raw.
  2. Text is chunked and stored as type=text.
  3. Tables are extracted and converted to markdown (type=table).
  4. Images are extracted and captioned with LLaVA (type=image).
  5. All chunks are embedded and indexed in FAISS.
  6. Query flow (sketched in Python below):
    • embed question
    • retrieve top-k chunks
    • build prompt with context
    • generate answer with llama3
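
End to end, the query flow looks roughly like this. It is a minimal sketch, not the repo's actual modules: it assumes the index paths above, a list-of-dicts meta.pkl with "text" and "page" fields, and Ollama on its default port:

import pickle

import faiss
import numpy as np
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> np.ndarray:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return np.array(r.json()["embedding"], dtype="float32")

# Load the index and metadata produced by ingestion
index = faiss.read_index("vectorstore/faiss_index/index.bin")
with open("vectorstore/faiss_index/meta.pkl", "rb") as f:
    meta = pickle.load(f)  # assumed: list of chunk dicts with "text" and "page"

question = "What is the revenue trend?"
_, ids = index.search(embed(question).reshape(1, -1), 5)  # retrieve top-5 chunks

# Build a source-aware prompt with page references
context = "\n\n".join(
    f"[page {meta[i].get('page', '?')}] {meta[i]['text']}" for i in ids[0]
)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

r = requests.post(f"{OLLAMA}/api/generate",
                  json={"model": "llama3", "prompt": prompt, "stream": False})
print(r.json()["response"])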

Troubleshooting

  • Index files are missing...

    • Run: python scripts\ingest.py
  • No PDF files found in data/raw

    • Add PDFs to data/raw and re-run ingestion.
  • Table extraction failed: camelot-py is not installed

    • Install dependencies again: python -m pip install -r requirements.txt
    • On Windows, install Ghostscript if needed.
  • Ollama connection/model errors

    • Ensure ollama serve is running.
    • Verify models with ollama list.
  • Slow response

    • Local inference speed depends on your hardware; CPU-only machines are significantly slower than GPU-accelerated ones.

Notes

  • The system is fully local (embedding + generation + image captioning via Ollama).
  • You can change model names in:
    • RAG/embeddings/ollama_embed.py
    • RAG/generation/llm.py
    • RAG/multimodel/image_captioner.py
  • For container deployment, model/host settings are controlled by .env and passed via docker-compose.yml.
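
For example, the wrapper modules can read these settings from the environment at startup. This is a sketch only; the actual variable names are defined in .env.example and may differ from the names assumed here:

import os

# Illustrative variable names; check .env.example for the real keys.
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
LLM_MODEL = os.getenv("LLM_MODEL", "llama3")
EMBED_MODEL = os.getenv("EMBED_MODEL", "nomic-embed-text")
CAPTION_MODEL = os.getenv("CAPTION_MODEL", "llava")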
