A Retrieval-Augmented Generation (RAG) chatbot for a mathematics PDF using Gemini models, FAISS vector search, LangChain orchestration, and a Streamlit chat UI.
This system answers user questions from a local PDF by combining semantic retrieval with a constrained LLM prompt.
## How it works

1. **PDF ingestion**: the PDF is loaded from `pdfs/mathematics.pdf`.
2. **Chunking**: text is split with `RecursiveCharacterTextSplitter`. Current settings: `chunk_size=1000`, `chunk_overlap=150`.
3. **Embedding + indexing**: chunks are embedded with `models/gemini-embedding-001`, then indexed with FAISS and saved to `vectorstores/mathematics_faiss_gemini`.
4. **Retrieval**: for each question, the retriever returns the top-k chunks (`k=4`).
5. **Answer generation**: `gemini-2.5-flash` receives the retrieved context plus the user question; the prompt instructs the model to answer only from that context.
6. **UI**: Streamlit provides the chat interaction and a button to create/refresh the vector store.
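The pipeline above can be sketched end to end with toy stand-ins. This is a minimal illustration of the data flow only: the bag-of-words `embed` replaces `models/gemini-embedding-001`, a plain list scan replaces FAISS, and all function and constant names here are hypothetical, not taken from `app.py`.

```python
import math
from collections import Counter

CHUNK_SIZE = 1000      # characters per chunk, matching the current settings
CHUNK_OVERLAP = 150    # characters shared between consecutive chunks
TOP_K = 4              # retriever k

def chunk(text: str) -> list[str]:
    """Fixed-size character chunking with overlap (simplified splitter)."""
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE]
            for i in range(0, max(len(text) - CHUNK_OVERLAP, 1), step)]

def embed(text: str) -> Counter:
    """Stand-in for the Gemini embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(index: list[tuple[Counter, str]], question: str, k: int = TOP_K) -> list[str]:
    """Dense top-k retrieval: rank chunks by similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(pair[0], q), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(context_chunks: list[str], question: str) -> str:
    """The grounding constraint: answer only from the retrieved context."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer ONLY from the context below. If the answer is not in the "
        f"context, say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}"
    )

# Wiring (pdf_text and user_question supplied elsewhere):
# index = [(embed(c), c) for c in chunk(pdf_text)]
# prompt = build_prompt(retrieve(index, user_question), user_question)
```

In the real app these stages are handled by `PyPDFLoader`, `RecursiveCharacterTextSplitter`, FAISS, and the Gemini chat model via LangChain.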
## Features

- Local PDF-to-chat workflow.
- Persistent FAISS index for faster reuse.
- Source-page display from retrieved chunks.
- Streamlit interface with session message history.
- Optional sidebar API key override.
## Project structure

    rag_chatbot/
    ├─ app.py
    ├─ rag_notebook.ipynb
    ├─ requirements.txt
    ├─ README.md
    ├─ images/
    │  └─ dashboard.png
    ├─ pdfs/
    │  └─ mathematics.pdf
    └─ vectorstores/
       └─ mathematics_faiss_gemini/
## Tech stack

- Python
- Streamlit
- LangChain
- FAISS (`faiss-cpu`)
- Google Gemini (`langchain-google-genai`)
- PyPDF (`pypdf` / `PyPDFLoader`)
## Prerequisites

- Python 3.10+
- A valid `GOOGLE_API_KEY`
## Setup

1. Create and activate a virtual environment.
2. Install dependencies: `pip install -r requirements.txt`
3. Create `.env` in the project root: `GOOGLE_API_KEY=your_api_key_here`
4. Start the Streamlit app: `streamlit run app.py`
5. Open the local URL shown in the terminal (usually http://localhost:8501).
## Configuration

You can tune behavior in `app.py`:

- `EMBEDDING_MODEL = "models/gemini-embedding-001"`
- `CHAT_MODEL = "gemini-2.5-flash"`
- `CHUNK_SIZE = 1000`
- `CHUNK_OVERLAP = 150`
- Retriever `k` in `search_kwargs={"k": 4}`
## Limitations

- Retrieval is dense-only (no hybrid keyword retrieval yet).
- Chunking is fixed-size, not true semantic chunking.
- No automated RAG evaluation harness yet.
- Only basic observability and cost controls.
## Roadmap

| Area | Current State | Target State | Priority | Status |
|---|---|---|---|---|
| Retrieval Quality | Top-k dense retrieval (k=4) | Hybrid retrieval + reranking | High | Planned |
| Chunking Strategy | Fixed-size chunking | Semantic / hierarchical chunking | High | Planned |
| Evaluation | Manual checks | Automated RAG evaluation | High | Planned |
| Prompting | Single static template | Prompt variants + guardrails | Medium | Planned |
| Observability | Minimal app logs | Latency and retrieval-quality metrics | Medium | Planned |
| UX | Basic chat flow | Better citations and filtering controls | Medium | Planned |
| Security | Env/sidebar key entry | Managed secrets + safer deployment pattern | High | Planned |
## Next steps

- Create a small benchmark dataset (`question`, `expected answer`, `source page`).
- Track baseline metrics: grounded-answer rate, citation accuracy, latency.
- Tune `k`, `chunk_size`, and `chunk_overlap` with A/B runs.
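A benchmark like the one described above can be scored with a small loop. Everything here is a sketch: `ask` is a hypothetical stub for the real chatbot call, and the substring and exact-page checks are simplifying stand-ins for real grounding and citation metrics (latency tracking is omitted).

```python
# Tiny benchmark: question, expected answer fragment, expected source page.
benchmark = [
    {"question": "What is the derivative of x^2?", "expected": "2x", "page": 12},
    {"question": "Define a prime number.", "expected": "exactly two divisors", "page": 3},
]

def ask(question: str) -> tuple[str, int]:
    """Hypothetical stub for the chatbot: returns (answer, cited_page)."""
    canned = {
        "What is the derivative of x^2?": ("The derivative is 2x.", 12),
        "Define a prime number.": ("A prime has exactly two divisors.", 3),
    }
    return canned[question]

def evaluate(benchmark: list[dict]) -> dict:
    """Compute grounded-answer rate and citation accuracy over the set."""
    grounded = cited = 0
    for case in benchmark:
        answer, page = ask(case["question"])
        grounded += case["expected"].lower() in answer.lower()  # crude grounding check
        cited += page == case["page"]                           # exact-page citation match
    n = len(benchmark)
    return {"grounded_answer_rate": grounded / n, "citation_accuracy": cited / n}
```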
- Add MMR or reranking to reduce duplicate and noisy chunks.
- Add metadata-aware retrieval (sections/pages/topics).
- Evaluate hybrid retrieval (dense + BM25).
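For the MMR option, the selection rule can be sketched in a few lines, assuming a generic `similarity` function returning values in [0, 1] (cosine over embeddings in practice):

```python
def mmr(query, candidates, similarity, k=4, lambda_mult=0.5):
    """Maximal Marginal Relevance: pick up to k candidates that balance
    relevance to the query against redundancy with already-picked items."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            relevance = similarity(query, c)
            # Penalize similarity to anything already selected.
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

LangChain's vector stores expose this behavior without custom code, e.g. `vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4})`.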
- Add structured logs and request tracing.
- Add retries/timeouts for model and embedding calls.
- Add regression checks in CI for retrieval and answer quality.
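A retry policy for model and embedding calls can be as small as a decorator. This sketch retries on any exception with exponential backoff; real code would narrow it to transient error types and pair it with per-call timeouts.

```python
import time
from functools import wraps

def with_retries(max_attempts=3, base_delay=0.5):
    """Decorator: retry the wrapped call with exponential backoff."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

# Usage sketch (the decorated function is illustrative):
# @with_retries(max_attempts=3)
# def embed_batch(chunks): ...
```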
## Success metrics

- Grounded-answer rate >= 90% on the evaluation set.
- Citation accuracy >= 95%.
- Median response latency <= 3 seconds for common questions.
- Hallucination reports trend downward over time.
## Troubleshooting

- `GOOGLE_API_KEY` missing:
  - Set the key in `.env` or enter it in the Streamlit sidebar.
- PDF not found:
  - Ensure the file exists at `pdfs/mathematics.pdf`.
- Empty or weak answers:
  - Rebuild the vector store and increase retriever `k`.
  - Revisit chunk settings.
