AI RAG System

A Retrieval-Augmented Generation (RAG) chatbot for a mathematics PDF using Gemini models, FAISS vector search, LangChain orchestration, and a Streamlit chat UI.

System Overview

This system answers user questions from a local PDF by combining semantic retrieval with a constrained LLM prompt.

PDF ingestion

The PDF is loaded from pdfs/mathematics.pdf.

Chunking

Text is split with RecursiveCharacterTextSplitter.
Current settings: chunk_size=1000, chunk_overlap=150.
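
To make the effect of these two settings concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and sentences, which this sliding window does not do. The helper name `chunk_text` is hypothetical and not taken from app.py.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 150) -> list:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Toy input: 2500 characters cycling through the alphabet.
sample = "".join(chr(97 + i % 26) for i in range(2500))
chunks = chunk_text(sample)
chunk_lengths = [len(c) for c in chunks]
```

With the current settings each chunk starts 850 characters after the previous one, so the last 150 characters of one chunk are repeated at the start of the next.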

Embedding + indexing

Chunks are embedded with models/gemini-embedding-001.
Vectors are indexed and saved to vectorstores/mathematics_faiss_gemini using FAISS.

Retrieval

For each question, the retriever returns top-k chunks (k=4).
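
The idea behind top-k retrieval can be illustrated with a small, dependency-free sketch. It ranks toy two-dimensional vectors by cosine similarity; in the app this work is delegated to FAISS, whose distance metric and index layout may differ. The function names and vectors below are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=4):
    """Return the indices of the k chunk vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy embeddings standing in for real chunk vectors.
query = [1.0, 0.0]
chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0], [0.5, 0.5]]
best_four = top_k(query, chunk_vectors, k=4)
```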

Answer generation

gemini-2.5-flash receives retrieved context plus the user question.
The prompt instructs the model to answer only from the retrieved context.
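
A context-constrained prompt of this kind might be assembled as in the sketch below. The template wording, the chunk separator, and the `build_prompt` helper are assumptions made for illustration; the actual template in app.py may read differently.

```python
# Hypothetical template: the wording is an assumption, not the text used in app.py.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say you do not know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(chunks, question):
    """Join the retrieved chunks into one context block and fill in the template."""
    context = "\n\n---\n\n".join(chunk.strip() for chunk in chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question.strip())


prompt = build_prompt(
    ["A prime number has exactly two positive divisors.", "The set {2, 3, 5} contains primes."],
    "What is a prime number?",
)
```

Keeping the refusal instruction in the template gives the model an explicit alternative to guessing when retrieval returns nothing relevant.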

UI

Streamlit provides chat interaction and a button to create/refresh the vector store.

Key Features

Local PDF-to-chat workflow.
Persistent FAISS index for faster reuse.
Source-page display from retrieved chunks.
Streamlit interface with session message history.
Optional sidebar API key override.
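
The source-page display could be derived from chunk metadata roughly as follows. This is a minimal sketch, assuming each retrieved chunk carries a zero-based `page` entry in its metadata, as PyPDFLoader typically provides. `source_pages` and `format_sources` are hypothetical helper names, not functions from app.py.

```python
def source_pages(chunk_metadata):
    """Collect unique page numbers (converted to 1-based) from chunk metadata dicts."""
    pages = {
        meta["page"] + 1  # assumption: the loader stores zero-based page indices
        for meta in chunk_metadata
        if isinstance(meta.get("page"), int)
    }
    return sorted(pages)


def format_sources(chunk_metadata):
    """Render a short citation line such as 'Sources: p. 1, p. 3'."""
    pages = source_pages(chunk_metadata)
    if not pages:
        return "Sources: none"
    return "Sources: " + ", ".join(f"p. {page}" for page in pages)


# Toy metadata: a duplicate page and one chunk without page information.
citation_line = format_sources([{"page": 2}, {"page": 0}, {"page": 2}, {}])
```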

Project Structure

rag_chatbot/
├─ app.py
├─ rag_notebook.ipynb
├─ requirements.txt
├─ README.md
├─ images/
│  └─ dashboard.png
├─ pdfs/
│  └─ mathematics.pdf
└─ vectorstores/
   └─ mathematics_faiss_gemini/

Tech Stack

Python
Streamlit
LangChain
FAISS (faiss-cpu)
Google Gemini (langchain-google-genai)
PyPDF (pypdf / PyPDFLoader)

Prerequisites

Python 3.10+
A valid GOOGLE_API_KEY

Setup

Create and activate a virtual environment.
Install dependencies:

pip install -r requirements.txt

Create .env in the project root:

GOOGLE_API_KEY=your_api_key_here
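
How the key is picked up at runtime might look like the sketch below. It assumes the .env values have already been loaded into the process environment (for example by a dotenv loader) and that a non-empty sidebar entry takes precedence over the environment. The `resolve_api_key` helper is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(sidebar_value: Optional[str], env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer a non-empty sidebar entry, then fall back to GOOGLE_API_KEY in the environment."""
    env = os.environ if env is None else env
    candidate = (sidebar_value or "").strip() or env.get("GOOGLE_API_KEY", "").strip()
    if not candidate:
        raise RuntimeError("GOOGLE_API_KEY is missing: set it in .env or enter it in the sidebar")
    return candidate


# Toy environment passed explicitly so the example does not depend on real secrets.
fake_env = {"GOOGLE_API_KEY": "key-from-env"}
from_sidebar = resolve_api_key("  key-from-sidebar  ", fake_env)
from_env = resolve_api_key("", fake_env)
```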

Run

Start the Streamlit app:

streamlit run app.py

Open the local URL shown in the terminal (usually http://localhost:8501).

Configuration

You can tune behavior in app.py:

EMBEDDING_MODEL = "models/gemini-embedding-001"
CHAT_MODEL = "gemini-2.5-flash"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 150
Retriever k in search_kwargs={"k": 4}

Known Limitations

Retrieval is dense-only (no hybrid keyword retrieval yet).
Chunking is fixed-size oriented, not true semantic chunking.
No automated RAG evaluation harness yet.
Observability and cost controls are basic (minimal app logs only).

Improvement Dashboard

| Area | Current State | Target State | Priority | Status |
| --- | --- | --- | --- | --- |
| Retrieval Quality | Top-k dense retrieval (`k=4`) | Hybrid retrieval + reranking | High | Planned |
| Chunking Strategy | Fixed-size chunking | Semantic / hierarchical chunking | High | Planned |
| Evaluation | Manual checks | Automated RAG evaluation | High | Planned |
| Prompting | Single static template | Prompt variants + guardrails | Medium | Planned |
| Observability | Minimal app logs | Latency and retrieval-quality metrics | Medium | Planned |
| UX | Basic chat flow | Better citations and filtering controls | Medium | Planned |
| Security | Env/sidebar key entry | Managed secrets + safer deployment pattern | High | Planned |

Improvement Plan (Practical)

Phase 1: Quality Baseline

Create a small benchmark dataset (question, expected answer, source page).
Track baseline metrics: grounded-answer rate, citation accuracy, latency.
Tune k, chunk_size, and chunk_overlap with A/B runs.
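
The baseline metrics could be computed from benchmark runs along these lines. This is a sketch under stated assumptions: the record fields (`grounded`, `expected_page`, `cited_pages`, `latency_s`) are hypothetical, and `grounded` is taken to be a manual or automated judgement recorded per question.

```python
from statistics import median


def summarize(results):
    """Aggregate per-question benchmark records into the three baseline metrics."""
    if not results:
        raise ValueError("results must contain at least one record")
    total = len(results)
    return {
        # Share of answers judged to be supported by the retrieved context.
        "grounded_rate": sum(1 for r in results if r["grounded"]) / total,
        # Share of answers whose cited pages include the expected source page.
        "citation_accuracy": sum(1 for r in results if r["expected_page"] in r["cited_pages"]) / total,
        "median_latency_s": median(r["latency_s"] for r in results),
    }


# Toy records standing in for a real benchmark run.
runs = [
    {"grounded": True, "expected_page": 3, "cited_pages": [3, 4], "latency_s": 1.0},
    {"grounded": True, "expected_page": 7, "cited_pages": [7], "latency_s": 2.0},
    {"grounded": True, "expected_page": 9, "cited_pages": [2], "latency_s": 3.0},
    {"grounded": False, "expected_page": 1, "cited_pages": [1], "latency_s": 4.0},
]
baseline = summarize(runs)
```

Running the same summary for each (k, chunk_size, chunk_overlap) combination gives comparable numbers for the A/B runs.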

Phase 2: Retrieval Upgrades

Add MMR or reranking to reduce duplicate and noisy chunks.
Add metadata-aware retrieval (sections/pages/topics).
Evaluate hybrid retrieval (dense + BM25).
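
As an illustration of the first item, the following is a small, dependency-free sketch of maximal marginal relevance (MMR). It greedily picks the chunk that balances relevance to the query against similarity to chunks already selected. The vectors, the `lambda_mult` default, and the function names are illustrative assumptions, not the app's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, chunk_vecs, k=4, lambda_mult=0.5):
    """Greedy MMR: lambda_mult=1.0 is pure relevance, lower values favour diversity."""
    selected = []
    candidates = list(range(len(chunk_vecs)))
    while candidates and len(selected) < k:
        def score(idx):
            relevance = cosine(query_vec, chunk_vecs[idx])
            # Penalise similarity to the closest chunk that was already picked.
            redundancy = max((cosine(chunk_vecs[idx], chunk_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Chunks 0 and 1 are near duplicates; chunk 2 covers a different direction.
query = [0.8, 0.6]
vectors = [[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]]
diverse = mmr(query, vectors, k=2, lambda_mult=0.5)
relevance_only = mmr(query, vectors, k=2, lambda_mult=1.0)
```

In this toy case pure relevance returns both near duplicates, while the balanced setting swaps one of them for the chunk that adds new information.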

Phase 3: Reliability and Ops

Add structured logs and request tracing.
Add retries/timeouts for model and embedding calls.
Add regression checks in CI for retrieval and answer quality.
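
Retries for model and embedding calls could follow a pattern like the one below. This is a minimal sketch of retry with exponential backoff; it does not enforce timeouts, which are assumed to be configured on the client making the call. The `call_with_retries` helper and its parameters are hypothetical.

```python
import time


def call_with_retries(fn, max_attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Call fn(), retrying on any exception with delays of base, 2*base, 4*base, ..."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:  # in real code, narrow this to transient errors only
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay_s * 2 ** (attempt - 1))


# Toy call that fails twice before succeeding; delays are recorded instead of slept.
attempts = {"count": 0}
recorded_delays = []


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


outcome = call_with_retries(flaky_call, sleep=recorded_delays.append)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.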

Success Criteria

Grounded-answer rate >= 90% on evaluation set.
Citation accuracy >= 95%.
Median response latency <= 3 seconds for common questions.
Hallucination reports trend downward over time.
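
The three numeric criteria lend themselves to an automated check. The sketch below is one possible shape for it, assuming a metrics dictionary with the hypothetical keys `grounded_rate`, `citation_accuracy`, and `median_latency_s`. The hallucination trend is not covered because it needs data collected over time.

```python
# Thresholds taken from the success criteria above.
MIN_THRESHOLDS = {"grounded_rate": 0.90, "citation_accuracy": 0.95}
MAX_MEDIAN_LATENCY_S = 3.0


def failed_criteria(metrics):
    """Return the names of the criteria that the given metrics do not meet."""
    failures = [name for name, minimum in MIN_THRESHOLDS.items() if metrics[name] < minimum]
    if metrics["median_latency_s"] > MAX_MEDIAN_LATENCY_S:
        failures.append("median_latency_s")
    return failures


# Toy metrics: citation accuracy falls short, the other two criteria pass.
failures = failed_criteria(
    {"grounded_rate": 0.92, "citation_accuracy": 0.93, "median_latency_s": 2.1}
)
```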

Troubleshooting

GOOGLE_API_KEY missing:
- Set key in .env or enter in Streamlit sidebar.
PDF not found:
- Ensure file exists at pdfs/mathematics.pdf.
Empty or weak answers:
- Rebuild vector store and increase retriever k.
- Revisit chunk settings.
