A local multimodal RAG system for PDFs with:
- text extraction
- table extraction
- image extraction + image captioning
- FAISS vector search
- LLM answer generation
- chat-style Streamlit UI
The project uses Ollama models locally:
- `nomic-embed-text` for embeddings
- `llama3` for answer generation
- `llava` for image understanding
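Under the hood, each embedding is a single HTTP POST to the local Ollama server. A minimal sketch of that call (this mirrors Ollama's `/api/embeddings` endpoint, not the repo's actual wrapper in `RAG/embeddings/ollama_embed.py`):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default port

def build_embed_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build the JSON POST request Ollama expects for an embedding call."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )

def embed(text: str) -> list:
    """Send the request and return the embedding vector (requires `ollama serve`)."""
    with urllib.request.urlopen(build_embed_request(text)) as resp:
        return json.load(resp)["embedding"]
```

Calling `embed("some chunk")` with the server running returns a float vector suitable for FAISS.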
- Chat UI (`streamlit_app.py`) with message history and source display
- PDF ingestion pipeline (`scripts/ingest.py`)
- CLI query demo (`scripts/query_demo.py`)
- FastAPI endpoints (`/health`, `/query`)
- Source-aware answers with page references in prompt context
```
rag/
  app/
    main.py
    routes/query.py
  RAG/
    augmentation/prompt_builder.py
    embeddings/ollama_embed.py
    generation/llm.py
    indexing/{pdf_loader,chunker,table_extractor,image_extractor,build_index}.py
    multimodel/{table_parser,image_captioner}.py
    retrieval/retriever.py
  scripts/
    ingest.py
    query_demo.py
  streamlit_app.py
  requirements.txt
```
- Python 3.10+ (3.12 works).
- Ollama installed and available in PATH.
- Models pulled in Ollama:
  - `llama3`
  - `nomic-embed-text`
  - `llava`
For table extraction on Windows, install Ghostscript if Camelot requires it.
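To confirm the three models are actually pulled, you can compare the output of `ollama list` against the required names. A small sketch (the parsing is an assumption about `ollama list`'s tabular output, where each line starts with a `name:tag` pair):

```python
REQUIRED_MODELS = ("llama3", "nomic-embed-text", "llava")

def missing_models(ollama_list_output: str, required=REQUIRED_MODELS) -> list:
    """Return required model names absent from `ollama list` output.

    Skips the header line, then matches on the model name before the tag
    (e.g. `llama3:latest` -> `llama3`).
    """
    installed = set()
    for line in ollama_list_output.splitlines()[1:]:
        if line.strip():
            installed.add(line.split()[0].split(":")[0])
    return [m for m in required if m not in installed]
```

Feed it the captured stdout of `ollama list`; an empty result means all three models are present.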
From project root (rag/):
```
python -m venv .venv
.\.venv\Scripts\activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```

This repo now includes:
- `Dockerfile`
- `docker-compose.yml`
- `.env.example`

```
Copy-Item .env.example .env
docker compose up -d --build
docker compose exec ollama ollama pull llama3
docker compose exec ollama ollama pull nomic-embed-text
docker compose exec ollama ollama pull llava
```

Put PDFs into `data/raw/`, then run:
```
docker compose --profile jobs run --rm ingest
```

- Streamlit UI: http://localhost:8501
- FastAPI: http://localhost:8000
- Health check: http://localhost:8000/health

To stop and remove the containers:

```
docker compose down
```

To keep the containers but only stop them:

```
docker compose stop
```

Start the Ollama server:

```
ollama serve
```

In another terminal:

```
ollama pull llama3
ollama pull nomic-embed-text
ollama pull llava
```

For production, use a VM/VPS (not serverless), then:
- Install Docker + Compose.
- Clone repo.
- Run the same Docker Compose commands above.
- Expose only ports you need (usually 8501 and/or 8000) behind Nginx/Caddy with TLS.
- Keep persistent storage for:
  - the Ollama model volume (`ollama_data`)
  - `data/vectorstore/`
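As an example of the reverse-proxy step, a minimal Caddyfile sketch (the domains are placeholders; Caddy provisions TLS certificates automatically for public hostnames):

```
# Caddyfile sketch; rag.example.com / rag-api.example.com are placeholder domains
rag.example.com {
    # Streamlit UI
    reverse_proxy localhost:8501
}

rag-api.example.com {
    # FastAPI
    reverse_proxy localhost:8000
}
```

With this in place, only ports 80/443 need to be exposed publicly.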
Place your PDFs in:
`data/raw/`

Example:

```
New-Item -ItemType Directory -Force data\raw | Out-Null
```

Then run:

```
python scripts\ingest.py
```

This creates:
- `vectorstore/faiss_index/index.bin`
- `vectorstore/faiss_index/meta.pkl`
- extracted images in `data/images/` (if the PDFs contain images)
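During ingestion, extracted text is split into overlapping chunks before embedding. A minimal sliding-window chunker sketch (the real implementation lives in `RAG/indexing/chunker.py`; the sizes here are illustrative):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list:
    """Split text into fixed-size character windows with overlap.

    Overlap keeps sentences that straddle a chunk boundary retrievable
    from at least one chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append(piece)
    return chunks
```

Each chunk is then embedded and written to the FAISS index alongside its metadata (source file, page, `type`).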
```
streamlit run streamlit_app.py
```

In the sidebar:
- Upload PDFs
- Click `Save PDFs`
- Click `Run Ingestion`
- Start chatting in the input box at the bottom
```
uvicorn app.main:app --reload --port 8000
```

Endpoints:
- `GET /health`
- `GET /query?q=your_question&top_k=5`
- `POST /query` with JSON:

```json
{
  "q": "What is the revenue trend?",
  "top_k": 5
}
```

PowerShell example:
```
curl.exe -X POST "http://127.0.0.1:8000/query" `
  -H "Content-Type: application/json" `
  -d "{\"q\":\"Summarize the document\",\"top_k\":5}"
```

CLI query demo:

```
python scripts\query_demo.py "What is the profit margin?" 5
```

- `scripts/ingest.py` loads PDFs from `data/raw`.
- Text is chunked and stored as `type=text`.
- Tables are extracted and converted to markdown (`type=table`).
- Images are extracted and captioned with LLaVA (`type=image`).
- All chunks are embedded and indexed in FAISS.
- Query flow:
  - embed the question
  - retrieve the top-k chunks
  - build a prompt with the retrieved context
  - generate the answer with `llama3`
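The query flow above can be sketched end to end. Here brute-force cosine similarity stands in for the FAISS lookup, and the chunk metadata fields (`page`, `type`, `text`) are assumptions about the repo's stored format:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, top_k=5):
    """Return top-k chunk metadata; `index` is a list of (vector, meta) pairs."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [meta for _, meta in scored[:top_k]]

def build_prompt(question, chunks):
    """Assemble the retrieved chunks, with page references, into the LLM prompt."""
    context = "\n\n".join(
        f"[page {c['page']}, {c['type']}] {c['text']}" for c in chunks
    )
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The resulting prompt string is what gets sent to `llama3` for generation; FAISS replaces the `sorted` scan with an approximate nearest-neighbor lookup over the same vectors.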
- `Index files are missing...`
  - Run: `python scripts\ingest.py`
- `No PDF files found in data/raw`
  - Add PDFs to `data/raw` and re-run ingestion.
- `Table extraction failed: camelot-py is not installed`
  - Install dependencies again: `python -m pip install -r requirements.txt`
  - On Windows, install Ghostscript if needed.
- Ollama connection/model errors
  - Ensure `ollama serve` is running.
  - Verify models with `ollama list`.
- Slow responses
  - Local inference speed depends on hardware; CPU-only runs are slower.
- The system is fully local (embedding + generation + image captioning via Ollama).
- You can change model names in:
  - `RAG/embeddings/ollama_embed.py`
  - `RAG/generation/llm.py`
  - `RAG/multimodel/image_captioner.py`
- For container deployment, model/host settings are controlled by `.env` and passed via `docker-compose.yml`.
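For illustration, such a `.env` might look like the following (the variable names are assumptions; check `.env.example` in the repo for the real keys):

```
# Hypothetical .env sketch; see .env.example for the actual variable names
OLLAMA_HOST=http://ollama:11434
LLM_MODEL=llama3
EMBED_MODEL=nomic-embed-text
VISION_MODEL=llava
```

Inside Docker Compose, the Ollama host points at the `ollama` service name rather than `localhost`.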