
PocketRAG — Local RAG for PDF Documents

Run AI on your own computer. No cloud. No API keys. No data leaves your machine.

PocketRAG is a fully offline, self-hosted Retrieval-Augmented Generation (RAG) application that lets you chat with any PDF using local AI models. Built with Ollama, Weaviate, and Next.js — it runs entirely on your laptop, even without internet.

Perfect for: private documents, sensitive data, offline environments, air-gapped systems, or anyone who wants a free, local alternative to ChatGPT PDF tools.


Key Features

  • 🔒 100% Offline & Private — No data sent to any server. Everything runs locally.
  • 🤖 Local LLM — Uses small language models (Gemma 4, Phi-3 Mini) via Ollama. No GPU needed.
  • 🔍 Hybrid Search — Combines dense vector (semantic) search + BM25 keyword search for high-precision retrieval.
  • 📄 PDF Document Q&A — Upload any text-based PDF and ask questions in plain English.
  • 🗄️ Local Vector Database — Weaviate runs in Docker on your machine. No cloud vector DB needed.
  • Grounded Answers — The model only answers from your document. No hallucination from external knowledge.
  • 🧩 Switchable Models — Toggle between Gemma 4 (capable) and Phi-3 Mini (faster) in the UI.
  • 🛠️ Developer-Friendly — Built-in debug mode writes chunk logs and Q&A traces to local files.

What is RAG? (Simple Explanation)

RAG = Retrieval-Augmented Generation

Imagine you give a very smart assistant a book to read. Instead of memorizing the whole book, the assistant bookmarks the most relevant pages when you ask a question, then answers using only those pages.

That's exactly what this app does with your PDF:

  1. Reads your PDF and breaks it into small searchable pieces (called "chunks")
  2. Stores those chunks in a local vector database (Weaviate)
  3. When you ask a question, it runs hybrid search to find the most relevant chunks
  4. Feeds those chunks to a local LLM (via Ollama) to generate a human-readable answer

Instead of answering from its general knowledge, the model is instructed to answer only from your document. That retrieve-then-generate loop is what Retrieval-Augmented Generation (RAG) means.


What Tools Are Used (and Why)

| Tool | What it does | Why we use it |
| --- | --- | --- |
| Ollama | Runs AI models locally on your computer | So no data ever leaves your machine |
| Gemma 4 / Phi-3 Mini | The LLM that reads chunks and writes answers | Small, fast models that run without a GPU |
| nomic-embed-text | Converts text into numbers (vectors) for semantic search | A strong open-source embedding model for retrieval |
| Weaviate | Local vector database that stores and searches chunks | Supports Hybrid Search (keyword + semantic) out of the box |
| Docker | Runs Weaviate as a container | Easy, no-install way to run Weaviate locally |
| pdfjs-dist | Extracts text from PDF files page by page | Mozilla's PDF.js library — works with Node.js 24, no native dependencies |
| Next.js | The web framework for the chat UI | React-based, runs locally in your browser |

How It Works (Under the Hood)

Step 1 — PDF Upload & Indexing

Your PDF
  → pdfjs-dist extracts text from each page (text-based PDFs only)
  → Split into ~350 character chunks (with 50 char overlap)
  → Each chunk is labelled with its page number
  → nomic-embed-text converts each chunk into a vector (list of numbers)
  → Text + vector stored together in Weaviate
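
To make the chunking step concrete, here is a minimal sketch in TypeScript. The ~350/50 values come from the description above; the names and types are illustrative, not the app's exact code in lib/rag.ts.

// Minimal sketch of fixed-size chunking with overlap (illustrative, not the app's exact code).
interface Chunk {
  text: string;
  page: number; // each chunk keeps the page number it came from
}

function chunkPage(pageText: string, page: number, size = 350, overlap = 50): Chunk[] {
  const chunks: Chunk[] = [];
  for (let start = 0; start < pageText.length; start += size - overlap) {
    chunks.push({ text: pageText.slice(start, start + size), page });
    if (start + size >= pageText.length) break; // last window already covers the end of the page
  }
  return chunks;
}

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, so it stays retrievable.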

Step 2 — Asking a Question (Hybrid Search)

Your Question
  → Weaviate runs TWO searches simultaneously:
      1. Dense (Semantic) Search — finds chunks with similar meaning
      2. BM25 Keyword Search     — finds chunks with exact word matches
  → Both results are blended 50/50 (alpha = 0.5)
  → Top 5 most relevant chunks are returned

Why Hybrid Search? Pure semantic (vector) search often misses exact names, IDs, and numbers. BM25 keyword search catches those. Together, they give the best of both worlds.
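
For reference, a hybrid query against the local Weaviate instance looks roughly like this with the v3 TypeScript weaviate-client. The collection name DocumentChunk is a placeholder (the real one lives in lib/rag.ts), and the app itself may go through a LangChain wrapper rather than calling the client directly.

import weaviate from 'weaviate-client';

// Rough sketch of the hybrid query (weaviate-client v3). "DocumentChunk" is a placeholder
// collection name; check lib/rag.ts for the one the app actually creates.
async function hybridSearch(question: string) {
  const client = await weaviate.connectToLocal(); // the Docker container from the setup steps
  const chunks = client.collections.get('DocumentChunk');

  const result = await chunks.query.hybrid(question, {
    alpha: 0.5, // 0 = pure BM25 keywords, 1 = pure vector search, 0.5 = 50/50 blend
    limit: 5,   // top 5 chunks
  });

  return result.objects.map((obj) => obj.properties);
}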

Step 3 — Answer Generation

Top 5 Chunks + Your Question
  → Sent to Gemma 4 (or Phi-3 Mini) via Ollama
  → Model is instructed: "Answer ONLY using the context below"
  → Final human-readable answer is shown in the chat
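
Conceptually, this step is one request to Ollama's local HTTP API (port 11434). A minimal sketch, with an illustrative prompt rather than the app's exact one:

// Minimal sketch of grounded answer generation via Ollama's HTTP API.
// The prompt wording is illustrative; the app's real prompt lives in lib/rag.ts.
async function generateAnswer(question: string, contextChunks: string[]): Promise<string> {
  const prompt = [
    'Answer ONLY using the context below. If the answer is not in the context, say so.',
    '--- CONTEXT ---',
    ...contextChunks,
    '--- QUESTION ---',
    question,
  ].join('\n');

  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'phi3:mini', prompt, stream: false }), // or the Gemma model
  });

  const data = await res.json();
  return data.response; // Ollama returns the generated text in the "response" field
}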

Prerequisites — Install These First

Note

PocketRAG works on macOS, Windows, and Linux. All tools below are cross-platform.

1. Node.js (v24.15.0)

Download from: https://nodejs.org/

To verify: open your terminal and run:

node --version
# Should print: v24.15.0

This project was built and tested on Node.js v24.15.0. Use this version to avoid compatibility issues.

2. Docker Desktop

Docker lets you run Weaviate without installing it directly.

Download from: https://www.docker.com/products/docker-desktop/

After installing, open Docker Desktop and make sure it's running (you'll see the Docker whale icon in your taskbar/menu bar).

3. Ollama — Run AI Models Locally

Ollama is the most important piece. It lets you run AI models (like Gemma and Phi) entirely on your own computer, for free, with no API keys.

Download and install from: https://ollama.com/download

After installing, Ollama runs silently in the background. To verify it's working, open your terminal and run:

ollama --version
# Should print something like: ollama version 0.x.x

Setup (Step by Step)

Step 1 — Start Weaviate (the database)

Open your terminal and run this Docker command:

docker run -d \
  -p 8080:8080 \
  -p 50051:50051 \
  cr.weaviate.io/semitechnologies/weaviate:1.27.0

To verify it's running: open http://localhost:8080/v1/meta in your browser. You should see a JSON response.

You only need to run this once. Next time, use docker start <container-id> or restart from Docker Desktop.

Step 2 — Download the AI Models via Ollama

Open your terminal and run these commands one at a time:

# This is the embedding model — converts text into searchable vectors
# Size: ~274 MB
ollama pull nomic-embed-text
# This is the main LLM — reads your chunks and writes answers
# Size: ~3.3 GB (takes a few minutes to download)
ollama pull gemma4
# This is a smaller, faster alternative LLM (optional but recommended)
# Size: ~2.2 GB
ollama pull phi3:mini

These only need to be downloaded once. After that, they live on your machine permanently.

Step 3 — Clone and Install the App

Clone this repository, then install the dependencies from the project folder:

# Install dependencies
npm install --legacy-peer-deps

The --legacy-peer-deps flag is needed because some LangChain packages have minor version conflicts. This is safe and required.

Step 4 — Run the App

npm run dev

Open http://localhost:3000 in your browser. You should see the PDF Assistant chat interface.


Usage

  1. Select a PDF — Click "Select PDF" in the top bar and choose any PDF file from your computer.
  2. Upload & Index — Click Upload. Wait for the status to change to "PDF Indexed ✅". (This may take 30–60 seconds for large PDFs.)
  3. Choose a Model — Use the dropdown to switch between Gemma 4 (more capable) and Phi-3 Mini (faster).
  4. Ask Questions — Type your question and press Enter or click the send button.
  5. Clear & Reload — Click Clear DB to remove the current PDF and upload a new one.

Limitations

Important

Only text-based PDFs are supported.

This app extracts text directly from the PDF file. It does not support:

  • Scanned PDFs — PDFs that are photos/images of documents (no selectable text layer)
  • Image-only PDFs — PDFs where content is embedded as pictures
  • Handwritten documents — even if scanned
  • Password-protected PDFs
  • OCR (Optical Character Recognition) — not built in

How to check if your PDF is supported: Open it in any viewer and try to highlight text with your cursor. If you can select individual words, it will work. If the cursor selects the whole page like an image, it won't.

If you upload an unsupported PDF, the app will show a clear error message in the chat.


Troubleshooting

| Problem | Fix |
| --- | --- |
| Status shows "No PDF uploaded ❌" after refresh even though I uploaded | Make sure the Weaviate Docker container is still running |
| App is slow to respond | The LLM is running on your CPU. Switch to Phi-3 Mini for quicker responses |
| ollama pull fails | Make sure Ollama is installed and running (run ollama serve in a terminal) |
| Docker command fails | Make sure Docker Desktop is open and running |
| npm install errors | Try npm install --legacy-peer-deps — the flag is required |

Local Debugging

For development, the app has a built-in debug mode controlled by lib/settings.ts:

// lib/settings.ts
const settings = {
  LOCAL_DEBUGGING: true,  // set to false to disable
};

When LOCAL_DEBUGGING is true, two files are automatically written to the _local_debug/ folder:

| File | Written when | Contents |
| --- | --- | --- |
| debug_uploaded_file_chunks.txt | A PDF is uploaded | Total chunk count + full text of every chunk |
| debug_qna.txt | A question is asked | Timestamp, question, retrieved context, LLM answer |

Use these to diagnose:

  • Is the PDF being parsed correctly? (debug_uploaded_file_chunks.txt)
  • Is Weaviate retrieving the right context? (debug_qna.txt)
  • Is the LLM answering from context or hallucinating?

Note

The _local_debug/ folder is git-ignored — debug files are never pushed to GitHub.


Project Structure

doc-search-app/
├── app/
│   ├── api/
│   │   ├── ask/route.ts       # POST /api/ask — runs hybrid search + LLM
│   │   ├── delete/route.ts    # POST /api/delete — wipes Weaviate collection
│   │   ├── status/route.ts    # GET /api/status — checks if PDF is indexed
│   │   └── upload/route.ts    # POST /api/upload — ingests PDF into Weaviate
│   ├── page.tsx               # Chat UI
│   └── layout.tsx             # Root layout
├── lib/
│   ├── rag.ts                 # Core RAG logic (loadPDF, askQuestion, deletePDFData)
│   └── settings.ts            # App-wide settings (LOCAL_DEBUGGING toggle)
├── _local_debug/              # Debug output — git-ignored, local use only
│   ├── debug_uploaded_file_chunks.txt
│   └── debug_qna.txt
└── uploads/                   # Temporary storage for uploaded PDFs
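
During development you can also exercise these routes directly, outside the UI. The sketch below assumes a small Node ESM script; the JSON field names (e.g. "question") are guesses for illustration, so check the route handlers under app/api/ for the actual shapes.

// Sketch of calling the API routes directly while the dev server is running.
// Field names like "question" are assumptions; check app/api/*/route.ts for the real shapes.
const BASE = 'http://localhost:3000';

// Is a PDF currently indexed?
const status = await fetch(`${BASE}/api/status`).then((r) => r.json());
console.log(status);

// Ask a question against the indexed PDF.
const answer = await fetch(`${BASE}/api/ask`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ question: 'What is this document about?' }),
}).then((r) => r.json());
console.log(answer);

// Wipe the Weaviate collection to start over.
await fetch(`${BASE}/api/delete`, { method: 'POST' });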

UI

(Screenshot of the PDF Assistant chat interface.)

