Upload any PDF, ask questions, get answers — powered by sentence-transformers, FAISS, and DeepSeek via HuggingFace Inference API.
FastAPI · FAISS · Sentence Transformers · PyMuPDF · Google Cloud Run · Docker
A lightweight Retrieval-Augmented Generation (RAG) API that turns any PDF into a queryable knowledge base in seconds.
Upload a PDF, ask a question in natural language, and get a context-grounded answer — the LLM is constrained to the retrieved chunks, so answers come from your document rather than from hallucinated outside knowledge.
Deployed on Google Cloud Run via Cloud Build CI/CD.
User uploads PDF + question
│
▼
┌─────────────────────┐
│ FastAPI endpoint │ POST /query_pdf
│ /query_pdf │
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ PyMuPDF (fitz) │ Extract raw text from PDF pages
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ Text chunker │ Sentence-aware splitting (500 char max)
│ LocalRag │ Normalizes whitespace, preserves sentences
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ Sentence │ all-MiniLM-L6-v2
│ Transformers │ Batch encode chunks → normalized embeddings
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ FAISS Index │ IndexFlatIP (inner product on normalized
│ │ vectors = cosine similarity)
└────────┬────────────┘
│ Top-K retrieval (k=5)
▼
┌─────────────────────┐
│ HF Inference API │ DeepSeek-V3 via InferenceClient
│ (DeepSeek-V3) │ Grounded prompt with retrieved context
└────────┬────────────┘
│
▼
{"answer": "..."}
| Layer | Technology |
|---|---|
| API framework | FastAPI |
| PDF extraction | PyMuPDF (fitz) |
| Embeddings | sentence-transformers · all-MiniLM-L6-v2 |
| Vector store | FAISS (IndexFlatIP — cosine via normalized IP) |
| LLM | DeepSeek-V3 via HuggingFace Inference API |
| Containerization | Docker |
| Cloud deployment | Google Cloud Run |
| CI/CD | Google Cloud Build (cloudbuild.yaml) |
Health check — confirms the API is running.
{"message": "RAG API is running. Use /docs to interact with it."}Upload a PDF and ask a question about it.
Request (multipart/form-data):
| Field | Type | Description |
|---|---|---|
| `file` | PDF file | The document to query |
| `question` | string | Your natural language question |
Response:
```json
{
  "answer": "The contract termination clause requires 30 days written notice..."
}
```

Error responses:

- `400` — Not a PDF, empty file, or no text extractable
- `500` — Extraction or RAG pipeline failure
Sentence-aware splitting with a 500-character hard cap per chunk. The chunker normalizes whitespace once, splits on sentence boundaries ([.!?]), and accumulates sentences until the cap is reached — avoiding mid-sentence cuts that break context.
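A minimal sketch of that chunking strategy (function name and regex are illustrative, not the repo's actual `LocalRag` code):

```python
import re

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Sentence-aware chunking: normalize whitespace once, then pack
    whole sentences into chunks of at most max_chars characters."""
    text = re.sub(r"\s+", " ", text).strip()      # normalize whitespace once
    sentences = re.split(r"(?<=[.!?])\s+", text)  # split after . ! ?
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)                # flush the filled chunk
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    # Note: in this sketch a single sentence longer than max_chars is kept whole
    return chunks
```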
all-MiniLM-L6-v2 from sentence-transformers — a fast, lightweight model that produces 384-dimensional embeddings. Embeddings are L2-normalized before indexing so inner product search is equivalent to cosine similarity.
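The equivalence can be checked numerically — on L2-normalized vectors, a plain dot product equals cosine similarity (random numpy vectors stand in for real MiniLM embeddings here):

```python
import numpy as np

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    # Divide each row by its Euclidean norm, mirroring what the pipeline
    # does to embeddings before adding them to the index
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

# Random stand-ins for 384-dimensional MiniLM embeddings
rng = np.random.default_rng(42)
emb = rng.normal(size=(10, 384))
normed = l2_normalize(emb)

a, b = emb[0], emb[1]
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
inner = normed[0] @ normed[1]   # inner product on normalized vectors == cosine
```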
IndexFlatIP (flat inner product index) — exact nearest-neighbor search, no approximation. Top-5 chunks are retrieved per query.
Retrieved chunks are concatenated (capped at 2000 characters to keep inference fast) and passed to DeepSeek-V3 via the HuggingFace Inference API. The prompt explicitly instructs the model to answer only from context and say "I don't know" if the answer isn't present — minimizing hallucination.
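The grounding step amounts to prompt construction — a sketch with illustrative wording and helper names (the actual generation call goes through huggingface_hub's InferenceClient):

```python
def build_grounded_prompt(chunks: list[str], question: str,
                          max_context_chars: int = 2000) -> str:
    # Concatenate retrieved chunks, capped to keep inference fast
    context = "\n\n".join(chunks)[:max_context_chars]
    # Constrain the model to the retrieved context only
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say \"I don't know\".\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# The resulting prompt is then sent to DeepSeek-V3 via the
# HuggingFace Inference API (InferenceClient) in the real pipeline.
```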
The FAISS index and chunk list can be saved to disk (save_index()) and reloaded (_load_index()) to avoid rebuilding on every request for large documents.
- Python 3.11+
- HuggingFace API key with Inference API access
```bash
# Clone the repo
git clone https://github.com/DebugJedi/InformationExtraction.git
cd InformationExtraction

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set up environment
echo "HF_API_KEY=your_huggingface_api_key" > .env
```

Run the dev server:

```bash
uvicorn app:app --reload --port 8000
```

Open http://localhost:8000/docs for the interactive API UI.
```bash
curl -X POST http://localhost:8000/query_pdf \
  -F "file=@your_document.pdf" \
  -F "question=What are the key findings in this report?"
```

```bash
# Build
docker build -t rag-api .

# Run
docker run -p 8000:8000 -e HF_API_KEY=your_key rag-api
```

This repo includes a full `cloudbuild.yaml` that builds, pushes, and deploys automatically.
```bash
# Authenticate
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Enable required APIs
gcloud services enable run.googleapis.com cloudbuild.googleapis.com artifactregistry.googleapis.com

# Create Artifact Registry repo
gcloud artifacts repositories create ocr-rag-app-repo \
  --repository-format=docker \
  --location=us-west1
```

Submit a build:

```bash
gcloud builds submit --config cloudbuild.yaml \
  --substitutions _SERVICE=my-app,_REGION=us-west1
```

Or connect your GitHub repo to Cloud Build for automatic deploys on every push.
```bash
gcloud run services update my-app \
  --region=us-west1 \
  --set-env-vars HF_API_KEY=your_key
```

```
InformationExtraction/
├── app.py                   ← FastAPI app · /query_pdf endpoint
├── src/
│   ├── config.py            ← LocalRag class · FAISS · embeddings · generation
│   └── utils/
│       └── extract_text.py  ← PyMuPDF PDF + txt extraction
├── Dockerfile
├── cloudbuild.yaml          ← GCP Cloud Build CI/CD pipeline
├── requirements.txt
└── .gitignore
```
| Variable | Required | Description |
|---|---|---|
| `HF_API_KEY` | ✅ Yes | HuggingFace API key for Inference API access |
- PDF upload and text extraction
- Sentence-aware text chunking
- FAISS vector index with cosine similarity
- HuggingFace Inference API generation
- FAISS index persistence (save/load)
- Docker + Google Cloud Run deployment
- Cloud Build CI/CD pipeline
- Multi-document knowledge base (persistent store)
- Support for `.txt` / `.docx` uploads
- Streaming responses
- Auth middleware for API key protection
Built and maintained by Priyank Rao — Data Scientist / ML Engineer
Portfolio · GitHub
MIT