Run AI on your own computer. No cloud. No API keys. No data leaves your machine.
PocketRAG is a fully offline, self-hosted Retrieval-Augmented Generation (RAG) application that lets you chat with any PDF using local AI models. Built with Ollama, Weaviate, and Next.js — it runs entirely on your laptop, even without internet.
Perfect for: private documents, sensitive data, offline environments, air-gapped systems, or anyone who wants a free, local alternative to ChatGPT PDF tools.
- 🔒 100% Offline & Private — No data sent to any server. Everything runs locally.
- 🤖 Local LLM — Uses small language models (Gemma 4, Phi-3 Mini) via Ollama. No GPU needed.
- 🔍 Hybrid Search — Combines dense vector (semantic) search + BM25 keyword search for high-precision retrieval.
- 📄 PDF Document Q&A — Upload any text-based PDF and ask questions in plain English.
- 🗄️ Local Vector Database — Weaviate runs in Docker on your machine. No cloud vector DB needed.
- ⚡ Grounded Answers — The model is prompted to answer only from your document, not from outside knowledge.
- 🧩 Switchable Models — Toggle between Gemma 4 (capable) and Phi-3 Mini (faster) in the UI.
- 🛠️ Developer-Friendly — Built-in debug mode writes chunk logs and Q&A traces to local files.
RAG = Retrieval-Augmented Generation
Imagine you give a very smart assistant a book to read. Instead of memorizing the whole book, the assistant bookmarks the most relevant pages when you ask a question, then answers using only those pages.
That's exactly what this app does with your PDF:
- Reads your PDF and breaks it into small searchable pieces (called "chunks")
- Stores those chunks in a local vector database (Weaviate)
- When you ask a question, it runs hybrid search to find the most relevant chunks
- Feeds those chunks to a local LLM (via Ollama) to generate a human-readable answer
The model is instructed to answer only from your document rather than guessing from its training data. This technique is called Retrieval-Augmented Generation (RAG).
| Tool | What it does | Why we use it |
|---|---|---|
| Ollama | Runs AI models locally on your computer | So no data ever leaves your machine |
| Gemma 4 / Phi-3 Mini | The LLM that reads chunks and writes answers | Small, fast models that run without a GPU |
| nomic-embed-text | Converts text into numbers (vectors) for smart search | A strong open-source embedding model for retrieval |
| Weaviate | Local vector database that stores and searches chunks | Supports Hybrid Search (keyword + semantic) out of the box |
| Docker | Runs Weaviate as a container | Easy, no-install way to run Weaviate locally |
| pdfjs-dist | Extracts text from PDF files page by page | Mozilla's official PDF library — works with Node.js 24, no native dependencies |
| Next.js | The web framework for the chat UI | React-based, runs locally in your browser |
Your PDF
→ pdfjs-dist extracts text from each page (text-based PDFs only)
→ Split into ~350 character chunks (with 50 char overlap)
→ Each chunk is labelled with its page number
→ nomic-embed-text converts each chunk into a vector (list of numbers)
→ Text + vector stored together in Weaviate
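The chunking step above can be sketched as a simple sliding window in TypeScript (a simplified illustration; the app's real splitter lives in `lib/rag.ts` and may differ in detail):

```typescript
// A chunk of page text, labelled with its source page number.
interface Chunk {
  text: string;
  page: number;
}

// Split one page of text into ~350-character chunks with 50 characters
// of overlap, so sentences cut at a boundary still appear whole in the
// neighbouring chunk.
function chunkPage(
  pageText: string,
  page: number,
  size = 350,
  overlap = 50
): Chunk[] {
  const chunks: Chunk[] = [];
  const step = size - overlap; // advance 300 chars per window
  for (let start = 0; start < pageText.length; start += step) {
    chunks.push({ text: pageText.slice(start, start + size), page });
    if (start + size >= pageText.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk carries its page number, so answers can later point back to where the text came from.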
Your Question
→ Weaviate runs TWO searches simultaneously:
1. Dense (Semantic) Search — finds chunks with similar meaning
2. BM25 Keyword Search — finds chunks with exact word matches
→ Both results are blended 50/50 (alpha = 0.5)
→ Top 5 most relevant chunks are returned
Why Hybrid Search? Pure AI search often misses exact names, IDs, and numbers. BM25 catches those. Together, they give the best of both worlds.
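Weaviate performs this fusion internally, but the idea is easy to illustrate: blend each chunk's two scores with a weight alpha, then keep the top k. The sketch below is a toy weighted-sum illustration, not Weaviate's actual fusion algorithm:

```typescript
// Blend a chunk's semantic (vector) score and BM25 keyword score.
// alpha = 1 → pure vector search, alpha = 0 → pure BM25, 0.5 → 50/50.
function hybridScore(vectorScore: number, bm25Score: number, alpha = 0.5): number {
  return alpha * vectorScore + (1 - alpha) * bm25Score;
}

// Rank chunk IDs by blended score and keep the k best.
function topK(
  scores: Map<string, { vector: number; bm25: number }>,
  k = 5,
  alpha = 0.5
): string[] {
  return [...scores.entries()]
    .map(([id, s]) => ({ id, score: hybridScore(s.vector, s.bm25, alpha) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.id);
}
```

A chunk that scores well on either signal — an exact invoice number for BM25, or a paraphrase for the vector search — can still make the top 5.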
Top 5 Chunks + Your Question
→ Sent to Gemma 4 (or Phi-3 Mini) via Ollama
→ Model is instructed: "Answer ONLY using the context below"
→ Final human-readable answer is shown in the chat
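The grounding step can be pictured as a prompt template like the following (the wording and function name are illustrative; the app's actual prompt is defined in `lib/rag.ts`):

```typescript
// Build a grounded prompt: the model may only use the retrieved chunks,
// each labelled with its source page.
function buildPrompt(
  chunks: { text: string; page: number }[],
  question: string
): string {
  const context = chunks
    .map((c) => `[Page ${c.page}] ${c.text}`)
    .join("\n\n");
  return [
    "Answer ONLY using the context below.",
    'If the answer is not in the context, say "I don\'t know."',
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Because the context is pasted directly into the prompt, the model has no reason (or room) to reach for outside knowledge.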
> [!NOTE]
> PocketRAG works on macOS, Windows, and Linux. All tools below are cross-platform.
Download from: https://nodejs.org/
To verify: open your terminal and run:
```shell
node --version
# Should print: v24.15.0
```

This project was built and tested on Node.js v24.15.0. Use this version to avoid compatibility issues.
Docker lets you run Weaviate without installing it directly.
Download from: https://www.docker.com/products/docker-desktop/
After installing, open Docker Desktop and make sure it's running (you'll see the Docker whale icon in your taskbar/menu bar).
Ollama is the most important piece. It lets you run AI models (like Gemma and Phi) entirely on your own computer, for free, with no API keys.
Download and install from: https://ollama.com/download
After installing, Ollama runs silently in the background. To verify it's working, open your terminal and run:
```shell
ollama --version
# Should print something like: ollama version 0.x.x
```

Open your terminal and run this Docker command:

```shell
docker run -d \
  -p 8080:8080 \
  -p 50051:50051 \
  cr.weaviate.io/semitechnologies/weaviate:1.27.0
```

To verify it's running: open http://localhost:8080/v1/meta in your browser. You should see a JSON response.
You only need to run this once. Next time, start the existing container with `docker start <container-id>` or restart it from Docker Desktop.
Open your terminal and run these commands one at a time:
```shell
# This is the embedding model — converts text into searchable vectors
# Size: ~274 MB
ollama pull nomic-embed-text
```

```shell
# This is the main LLM — reads your chunks and writes answers
# Size: ~3.3 GB (takes a few minutes to download)
ollama pull gemma4
```

```shell
# This is a smaller, faster alternative LLM (optional but recommended)
# Size: ~2.2 GB
ollama pull phi3:mini
```

These models only need to be downloaded once. After that, they live on your machine permanently.
```shell
# Install dependencies
npm install --legacy-peer-deps
```

The `--legacy-peer-deps` flag is needed because some LangChain packages have minor version conflicts. This is safe and required.

```shell
npm run dev
```

Open http://localhost:3000 in your browser. You should see the PDF Assistant chat interface.
- Select a PDF — Click "Select PDF" in the top bar and choose any PDF file from your computer.
- Upload & Index — Click Upload. Wait for the status to change to "PDF Indexed ✅". (This may take 30–60 seconds for large PDFs.)
- Choose a Model — Use the dropdown to switch between Gemma 4 (more capable) and Phi-3 Mini (faster).
- Ask Questions — Type your question and press Enter or click the send button.
- Clear & Reload — Click Clear DB to remove the current PDF and upload a new one.
> [!IMPORTANT]
> Only text-based PDFs are supported.
This app extracts text directly from the PDF file. It does not support:
- ❌ Scanned PDFs — PDFs that are photos/images of documents (no selectable text layer)
- ❌ Image-only PDFs — PDFs where content is embedded as pictures
- ❌ Handwritten documents — even if scanned
- ❌ Password-protected PDFs
- ❌ OCR (Optical Character Recognition) — not built in
How to check if your PDF is supported: Open it in any viewer and try to highlight text with your cursor. If you can select individual words, it will work. If the cursor selects the whole page like an image, it won't.
If you upload an unsupported PDF, the app will show a clear error message in the chat.
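If you want to check programmatically, the same "can I select text?" heuristic can be approximated by counting the non-whitespace characters that pdfjs-dist extracts. The helper below is hypothetical, not part of the app:

```typescript
// Heuristic: a text-based PDF yields substantial extracted text, while
// scanned or image-only PDFs typically produce empty or near-empty output.
// minChars is an arbitrary threshold chosen for illustration.
function looksTextBased(extractedText: string, minChars = 50): boolean {
  const visible = extractedText.replace(/\s+/g, "");
  return visible.length >= minChars;
}
```

A check like this is how the app can reject an unsupported PDF early instead of silently indexing nothing.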
| Problem | Fix |
|---|---|
| Status shows "No PDF uploaded ❌" after refresh even though I uploaded | Make sure the Weaviate Docker container is still running |
| App is slow to respond | The LLM is running on your CPU. Phi-3 Mini is faster if you need quicker responses |
| `ollama pull` fails | Make sure Ollama is installed and running (`ollama serve` in terminal) |
| Docker command fails | Make sure Docker Desktop is open and running |
| `npm install` errors | Try `npm install --legacy-peer-deps` — the flag is required |
For development, the app has a built-in debug mode controlled by `lib/settings.ts`:

```typescript
// lib/settings.ts
const settings = {
  LOCAL_DEBUGGING: true, // set to false to disable
};
```

When `LOCAL_DEBUGGING` is true, two files are automatically written to the `_local_debug/` folder:
| File | Written when | Contents |
|---|---|---|
| `debug_uploaded_file_chunks.txt` | PDF is uploaded | Total chunk count + full text of every chunk |
| `debug_qna.txt` | Question is asked | Timestamp, question, retrieved context, LLM answer |
Use these to diagnose:
- Is the PDF being parsed correctly? (`debug_uploaded_file_chunks.txt`)
- Is Weaviate retrieving the right context? (`debug_qna.txt`)
- Is the LLM answering from context or hallucinating?
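For reference, a trace entry in the style of `debug_qna.txt` could be produced by a small formatter like this (the function name and exact layout are illustrative, not the app's actual format):

```typescript
// Format one Q&A trace entry: timestamp, question, retrieved context,
// and the model's answer, separated so entries are easy to scan.
function formatQnaTrace(
  question: string,
  context: string,
  answer: string,
  when = new Date()
): string {
  return [
    `--- ${when.toISOString()} ---`,
    `Q: ${question}`,
    `Context:\n${context}`,
    `A: ${answer}`,
    "",
  ].join("\n");
}
```

Appending each entry to a file (e.g. with Node's `fs.appendFileSync`) gives a running log you can diff against the retrieved chunks.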
> [!NOTE]
> The `_local_debug/` folder is git-ignored — debug files are never pushed to GitHub.
```
doc-search-app/
├── app/
│   ├── api/
│   │   ├── ask/route.ts       # POST /api/ask — runs hybrid search + LLM
│   │   ├── delete/route.ts    # POST /api/delete — wipes Weaviate collection
│   │   ├── status/route.ts    # GET /api/status — checks if PDF is indexed
│   │   └── upload/route.ts    # POST /api/upload — ingests PDF into Weaviate
│   ├── page.tsx               # Chat UI
│   └── layout.tsx             # Root layout
├── lib/
│   ├── rag.ts                 # Core RAG logic (loadPDF, askQuestion, deletePDFData)
│   └── settings.ts            # App-wide settings (LOCAL_DEBUGGING toggle)
├── _local_debug/              # Debug output — git-ignored, local use only
│   ├── debug_uploaded_file_chunks.txt
│   └── debug_qna.txt
└── uploads/                   # Temporary storage for uploaded PDFs
```