Built to demonstrate backend engineering for enterprise RAG workflows using FastAPI, PostgreSQL/pgvector, AWS S3, and AWS Bedrock.
P_AI ingests PDF documents, stores source files in S3, extracts and chunks text, generates vector embeddings, and serves retrieval-augmented answers through API endpoints.
The project is designed as a practical backend MVP for internal knowledge retrieval use cases.
- PDF ingestion pipeline (`/ingest/file`)
  - Upload PDF
  - Store original file in S3
  - Extract text (PyMuPDF with pdfminer fallback)
  - Clean OCR artifacts
  - Chunk text into retrieval units
  - Create embeddings via Bedrock Titan
  - Persist documents/chunks/embeddings in PostgreSQL (pgvector)
- Retrieval and answering
  - `POST /query/search`: vector retrieval only
  - `POST /query/query`: retrieval + Bedrock generation
- Configurable model/provider setup via environment variables
- Support for AWS profile-based local development
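The chunking step above might look like the following minimal sketch. The function name, parameters, and whitespace-backoff heuristic are illustrative assumptions, not the actual API of the repo's `services/chunker.py`:

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks, preferring whitespace boundaries."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back off to the last space so words are not cut mid-token.
            ws = text.rfind(" ", start, end)
            if ws > start:
                end = ws
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end >= len(text):
            break
        # Overlap keeps context across chunk borders; never move backwards.
        start = max(end - overlap, start + 1)
    return chunks
```

Overlapping chunks trade a little storage for better recall: a fact that straddles a chunk border still appears whole in at least one retrieval unit.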
- Client uploads a PDF to `POST /ingest/file`
- API stores bytes in S3
- API extracts text from PDF and cleans OCR noise
- API chunks text
- API generates embeddings per chunk
- API saves `documents`, `chunks`, and `embeddings`
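The per-chunk embedding step can be sketched with boto3's Bedrock runtime client. The `inputText` request field and `embedding` response field follow Titan's documented request/response shape; the function names and lazy import are illustrative, and the model id would come from `BEDROCK_EMBED_MODEL`:

```python
import json


def titan_embed_body(text: str) -> str:
    """Build the request body for a Titan text-embedding model
    (inputText is the documented field for amazon.titan-embed-text models)."""
    return json.dumps({"inputText": text})


def embed_text(text: str, model_id: str, region: str) -> list[float]:
    """Embed one chunk via the Bedrock runtime (needs AWS credentials)."""
    import boto3  # imported lazily so the sketch loads without AWS deps

    client = boto3.client("bedrock-runtime", region_name=region)
    resp = client.invoke_model(
        modelId=model_id,
        contentType="application/json",
        body=titan_embed_body(text),
    )
    payload = json.loads(resp["body"].read())
    return payload["embedding"]
```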
- Client submits question to `POST /query/query`
- API embeds the question
- API retrieves top-k nearest chunks using pgvector distance (`<->`)
- API builds a grounded prompt from snippets
- API generates answer with Bedrock LLM
- API returns answer + snippets
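The nearest-neighbor step can be sketched as a raw SQL query using pgvector's `<->` distance operator. The table and column names here (`chunks.text`, `embeddings.embedding`, `chunk_id`) are assumptions about the schema, not the repo's exact definitions:

```python
def to_pgvector_literal(vec: list[float]) -> str:
    """pgvector accepts query vectors as '[x1,x2,...]' text literals."""
    return "[" + ",".join(f"{x:g}" for x in vec) + "]"


TOP_K_SQL = """
SELECT c.id, c.text, e.embedding <-> %(q)s::vector AS distance
FROM chunks c
JOIN embeddings e ON e.chunk_id = c.id
ORDER BY distance
LIMIT %(k)s
"""

# Example with psycopg (not executed here):
#   cur.execute(TOP_K_SQL, {"q": to_pgvector_literal(query_vec), "k": top_k})
```

`<->` is pgvector's L2 distance operator; ordering by it ascending returns the closest chunks first.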
- FastAPI
- SQLAlchemy
- PostgreSQL + pgvector
- AWS S3
- AWS Bedrock (Titan embeddings + configurable chat model)
- PyMuPDF / pdfminer.six
- `backend/app/main.py` - FastAPI app setup and router registration
- `backend/app/routers/ingest.py` - ingestion API
- `backend/app/routers/query.py` - search/query APIs
- `backend/app/models.py` - SQLAlchemy models (`documents`, `chunks`, `embeddings`)
- `backend/app/services/embeddings.py` - Bedrock embedding service
- `backend/app/services/retriever.py` - pgvector nearest-neighbor retrieval
- `backend/app/services/generator.py` - Bedrock generation + prompting
- `backend/app/services/pdf.py` - PDF text extraction/cleaning
- `backend/app/services/chunker.py` - safe chunking logic
- `backend/app/services/s3.py` - S3 upload/read helpers
- `backend/.env.example` - environment template
- Python 3.11+
- PostgreSQL with `pgvector` extension enabled
- AWS credentials with access to:
  - S3 bucket
  - Bedrock runtime (embed + generation models)
Create and edit the env file:

```bash
cd backend
cp .env.example .env
```

Set required values in `.env`:

- `DB_URL`
- `S3_BUCKET`
- `AWS_REGION`
- `BEDROCK_REGION`
- `BEDROCK_EMBED_MODEL`
- `BEDROCK_CHAT_MODEL`
- `AWS_PROFILE` (optional, for local profile-based auth)
From project root:

```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -e .
uvicorn app.main:app --reload --port 8080
```

Health check:

```bash
curl http://127.0.0.1:8080/healthz
```

Endpoints:

- `GET /healthz`
- `POST /ingest/file`
- `POST /query/search`
- `POST /query/query`
```bash
curl -X POST http://127.0.0.1:8080/ingest/file \
  -F "file=@sample.pdf"
```

```bash
curl -X POST http://127.0.0.1:8080/query/search \
  -H "Content-Type: application/json" \
  -d '{"query":"Summarize the key findings","top_k":4}'
```

```bash
curl -X POST http://127.0.0.1:8080/query/query \
  -H "Content-Type: application/json" \
  -d '{"question":"What are the main conclusions?","top_k":4}'
```

- `documents`: source metadata (`title`, `s3_uri`, `mime`)
- `chunks`: chunk text and sequence number
- `embeddings`: pgvector embedding per chunk
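The grounded-prompt step from the query flow (`services/generator.py`) might be sketched as follows; the instruction wording and snippet numbering are illustrative, not the repo's actual prompt:

```python
def build_grounded_prompt(question: str, snippets: list[str]) -> str:
    """Assemble a prompt that instructs the model to answer only from the
    retrieved snippets (numbered so the answer can point back to sources)."""
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Numbering the snippets makes it easy to return answer-plus-snippets responses where the client can verify which chunk grounded which claim.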
- No auth/multi-tenant isolation (MVP)
- No background queue for large ingest jobs
- Minimal automated testing in current repo
- Prompting/retrieval evaluation pipeline not yet formalized
- Add sentence-transformers fallback for local/offline embedding workflows
- Harden AWS deployment (ECS/EC2 + RDS + IAM least privilege)
- Add async/background ingestion for large files
- Add evaluation suite for retrieval quality and answer grounding
- Add API auth, rate limits, and observability
MIT License (see LICENSE).