A production-ready Retrieval-Augmented Generation (RAG) system that combines vector search with LLM capabilities to answer questions from your documents.
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances LLM responses by retrieving relevant context from a knowledge base before generating an answer. Grounding responses in retrieved documents reduces hallucinations and lets the model answer questions about your private documents.
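The core loop can be sketched in a few lines of JavaScript. This is a toy illustration only: word-overlap scoring stands in for real embeddings, and `score`, `retrieve`, and `buildPrompt` are hypothetical names, not part of this project.

```javascript
// Toy sketch of the RAG loop: retrieve relevant text, then prepend it to the prompt.
// Real systems use learned embeddings; here naive word overlap stands in for similarity.
const docs = [
  "Transformers use self-attention to process sequences.",
  "Gradient descent minimizes a loss function iteratively.",
  "Pinecone stores vectors for fast similarity search.",
];

// Score a document by how many query words it contains (stand-in for cosine similarity).
function score(query, doc) {
  const words = new Set(query.toLowerCase().split(/\W+/));
  return doc.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length;
}

// Retrieve the top-k documents for a query.
function retrieve(query, k = 2) {
  return docs
    .map((doc) => ({ doc, score: score(query, doc) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Build the augmented prompt an LLM would receive.
function buildPrompt(query) {
  const context = retrieve(query).map((r) => r.doc).join("\n");
  return `Context:\n${context}\n\nQuestion: ${query}`;
}
```

In a real system, `retrieve` becomes a vector-database query and `buildPrompt` becomes the system/context message sent to the LLM.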
┌─────────────────────────────────────────────────────────────────────────────┐
│ RAG ARCHITECTURE │
└─────────────────────────────────────────────────────────────────────────────┘
📄 Document 🔍 Query 💬 Response
│ │ ▲
▼ ▼ │
┌───────────┐ ┌───────────┐ ┌───────────┐
│ Chunking │ │ Embed │ │ LLM │
│ & Embed │ │ Query │ │ Generate │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
│ │ │
▼ ▼ │
┌─────────────────────────────────────────────┐ │
│ 🗄️ PINECONE VECTOR DATABASE │◄───────────────────┘
│ │ Retrieved
│ [████] doc-1 similarity: 0.92 │ Context
│ [████] doc-2 similarity: 0.87 │
│ [████] doc-3 similarity: 0.81 │
└─────────────────────────────────────────────┘
| Feature | Description |
|---|---|
| 📤 Document Upload | Upload PDFs and text files via web UI |
| 🔍 Semantic Search | Find relevant content using vector similarity |
| 🎯 Reranking | Improve search accuracy with BGE reranker |
| 💬 Chat Interface | Modern, responsive chat UI |
| 🧠 LLM Integration | Groq's Llama 3.3 70B for fast responses |
| 📊 Context Display | View retrieved sources for transparency |
| 🔄 Session Memory | Multi-turn conversations with context |
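The session-memory feature can be illustrated with a minimal in-memory store. This is a sketch only; `appendTurn`, `MAX_TURNS`, and the Map-based layout are assumptions, not this project's actual implementation.

```javascript
// Minimal in-memory session store for multi-turn chat (hypothetical sketch;
// the actual server may structure history differently).
const MAX_TURNS = 10; // keep only the most recent messages

const sessions = new Map(); // sessionId -> [{ role, content }, ...]

function appendTurn(sessionId, role, content) {
  const history = sessions.get(sessionId) ?? [];
  history.push({ role, content });
  // Trim the oldest messages so the LLM context window stays bounded.
  while (history.length > MAX_TURNS) history.shift();
  sessions.set(sessionId, history);
  return history;
}
```

A bounded history like this is what lets the `sessionId` field in `/api/chat` carry context across turns without the prompt growing indefinitely.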
┌──────────────────────────────────────────────────────────────────────────────┐
│ SYSTEM OVERVIEW │
└──────────────────────────────────────────────────────────────────────────────┘
┌─────────────┐
│ Browser │
│ (Chat UI) │
└──────┬──────┘
│
HTTP POST /api/chat, /api/ingest
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ EXPRESS.JS SERVER │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ /api/chat │ │ /api/ingest │ │ /api/health │ │
│ │ │ │ │ │ │ │
│ │ • Receive msg │ │ • Upload file │ │ • Index stats │ │
│ │ • RAG search │ │ • Chunk text │ │ • Health check │ │
│ │ • LLM call │ │ • Store embeds │ │ │ │
│ └────────┬────────┘ └────────┬────────┘ └─────────────────┘ │
└───────────┼─────────────────────┼────────────────────────────────────────────┘
│ │
▼ ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ PINECONE │ │ PDF LOADER │ │ GROQ │
│ Vector Store │ │ Text Splitter │ │ LLM (Llama 3) │
│ │ │ │ │ │
│ • Store vectors │ │ • Parse PDFs │ │ • Generate answer │
│ • Semantic search │ │ • Chunk @ 500 │ │ • Tool calling │
│ • BGE reranking │ │ • 100 overlap │ │ • Fast inference │
└───────────────────┘ └───────────────────┘ └───────────────────┘
Prerequisites:

- Node.js 18+ installed
- Pinecone account (free tier works)
- Groq account (free tier works)
```bash
git clone <your-repo-url>
cd RAG
npm install
```

Create a `.env` file:

```bash
# Pinecone - Get from https://console.pinecone.io
PINECONE_API_KEY=pcsk_xxxxxxxxxxxxx

# Groq - Get from https://console.groq.com
GROQ_API_KEY=gsk_xxxxxxxxxxxxx

# Optional
OPENAI_API_KEY=sk-xxxxxxxxxxxxx
```

The index uses integrated embeddings (Pinecone generates embeddings automatically):

```bash
# First time only - creates index with llama-text-embed-v2 model
npm run dev
```

Or create it manually via the Pinecone CLI:

```bash
pc index create -n rag-embedded-index -m cosine -c aws -r us-east-1 \
  --model llama-text-embed-v2 --field_map text=content
```

Start the server:

```bash
npm start
# or for development with hot reload:
npm run server
```

Navigate to http://localhost:3000 and start chatting!
```
RAG/
├── 📄 server.js          # Express server with API endpoints
├── 📄 index.js           # Document ingestion utilities
├── 📁 public/
│   └── 📄 index.html     # Chat interface (single-page app)
├── 📁 data/              # Sample documents
├── 📁 uploads/           # Temporary upload storage
├── 📄 package.json       # Dependencies
├── 📄 .env               # Environment variables
└── 📄 README.md          # You are here!
```
```http
POST /api/chat
Content-Type: application/json

{
  "message": "What are the key ML concepts?",
  "sessionId": "optional-session-id"
}
```

Response:

```json
{
  "response": "Based on the knowledge base, key ML concepts include...",
  "toolsUsed": ["rag_search"],
  "context": "[Source 1] (Score: 0.92)\nML basics include..."
}
```

```http
POST /api/ingest
Content-Type: multipart/form-data

file: <PDF or TXT file>
```

Response:

```json
{
  "success": true,
  "message": "Successfully ingested document.pdf",
  "chunksCreated": 15,
  "totalRecords": 24
}
```

```http
GET /api/health
```

Response:

```json
{
  "status": "ok",
  "index": "rag-embedded-index",
  "records": 24
}
```

┌─────────────────────────────────────────────────────────────────────────────┐
│ DOCUMENT INGESTION PIPELINE │
└─────────────────────────────────────────────────────────────────────────────┘
📄 PDF/TXT File
│
▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Upload │───▶│ Parse │───▶│ Chunk │───▶│ Store │
│ (Multer) │ │ (PDFLoader) │ │ (500 chars) │ │ (Pinecone) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│
▼
┌─────────────────┐
│ Chunk 1: "..." │
│ Chunk 2: "..." │
│ Chunk 3: "..." │
│ ... │
└─────────────────┘
│
▼
┌─────────────────┐
│ Pinecone │
│ Auto-Embeds │
│ (llama-text) │
└─────────────────┘
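The 500-character chunks with 100-character overlap can be approximated without LangChain. The sketch below is a naive character slicer; the project's actual `RecursiveCharacterTextSplitter` additionally prefers breaking on paragraph and sentence boundaries.

```javascript
// Naive fixed-size chunker: 500-char windows advancing by 400 chars,
// giving 100 chars of overlap between consecutive chunks.
function chunkText(text, chunkSize = 500, overlap = 100) {
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap ensures a sentence cut at a chunk boundary still appears whole in the neighboring chunk, which keeps retrieval from missing context that straddles two chunks.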
┌─────────────────────────────────────────────────────────────────────────────┐
│ QUERY PROCESSING FLOW │
└─────────────────────────────────────────────────────────────────────────────┘
👤 User: "What is transformer architecture?"
│
▼
┌─────────────┐
│ Groq LLM │──── Decides to call rag_search tool
└──────┬──────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PINECONE SEARCH │
│ │
│ Query: "transformer architecture" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ SEMANTIC SEARCH (Top 6) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ BGE RERANKER (Top 3) │ │
│ │ │ │
│ │ #1 [0.92] "Transformer architecture uses attention..." │ │
│ │ #2 [0.87] "Attention mechanism allows the model..." │ │
│ │ #3 [0.81] "Tokenization is the process of..." │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────┐
│ Groq LLM │──── Generates answer using retrieved context
└──────┬──────┘
│
▼
💬 "Transformer architecture is a neural network design that uses
self-attention mechanisms to process sequences..."
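After reranking, the server flattens the top hits into the context string shown in the `/api/chat` response. A sketch of that formatting step follows; the flat hit shape `{ score, content }` is an assumption, since the real Pinecone response nests these fields.

```javascript
// Format reranked hits into the "[Source N] (Score: 0.92)" context string
// returned by /api/chat. The { score, content } shape is assumed for illustration.
function formatContext(hits) {
  return hits
    .map((hit, i) => `[Source ${i + 1}] (Score: ${hit.score.toFixed(2)})\n${hit.content}`)
    .join("\n\n");
}
```

Exposing scores alongside each source is what powers the Context Display feature: users can see exactly which chunks, and how confidently, backed the answer.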
| Category | Technology | Purpose |
|---|---|---|
| Runtime | Node.js 18+ | JavaScript runtime |
| Server | Express 5.x | HTTP server & routing |
| Vector DB | Pinecone | Vector storage & search |
| Embeddings | llama-text-embed-v2 | Text to vectors (integrated) |
| Reranker | bge-reranker-v2-m3 | Result reranking |
| LLM | Groq (Llama 3.3 70B) | Response generation |
| PDF Parsing | LangChain PDFLoader | Document extraction |
| Chunking | RecursiveCharacterTextSplitter | Text segmentation |
| File Upload | Multer | Multipart form handling |
| Metric | Value | Notes |
|---|---|---|
| Embedding Dimension | 1024 | llama-text-embed-v2 |
| Chunk Size | 500 chars | With 100 char overlap |
| Search + Rerank | ~200ms | Pinecone serverless |
| LLM Response | ~1-3s | Groq inference |
| Max Upload | ~10MB | PDF/TXT files |
Modify in `server.js`:

```javascript
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,    // Characters per chunk
  chunkOverlap: 100  // Overlap between chunks
});
```

Search and reranking parameters:

```javascript
const results = await index.namespace(NAMESPACE).searchRecords({
  query: {
    topK: 6,                 // Initial candidates
    inputs: { text: query }
  },
  rerank: {
    model: "bge-reranker-v2-m3",
    topN: 3,                 // Final results after reranking
    rankFields: ["content"]
  }
});
```

Planned enhancements:

- Multi-file batch upload - Upload multiple documents at once
- Document management - Delete/update specific documents
- Namespace support - Separate knowledge bases per user/topic
- Streaming responses - Real-time token streaming
- Authentication - User login and access control
- Analytics dashboard - Query logs and usage metrics
- Hybrid search - Combine semantic + keyword search
To contribute:

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing`)
- Open a Pull Request
MIT License - feel free to use this project for learning or production!
- Pinecone for vector database infrastructure
- Groq for blazing-fast LLM inference
- LangChain for document processing utilities
Built with ❤️ for the AI Engineering community
