fariha-batool/Agentic-Rag-for-Harry-Potter


Agentic RAG (Retrieval-Augmented Generation) CLI

An interactive command-line assistant that answers questions using local documents and web search, powered by Groq LLMs and LangGraph. It supports memory, reranking, and citation-based answers with source attribution.

Features

  • Retrieval-Augmented Generation — Combines vector search (Chroma) and BM25 with optional web search for robust answers.
  • Source attribution — Answers cite sources with labels: [L1] (local docs), [W1] (web), [M1] (memory).
  • Memory store — Optional recall and storage of facts across sessions.
  • Web search — DuckDuckGo for live web results.
  • Document ingestion — PDF and text files in data/, chunked and indexed for retrieval.
  • Reranking — Cross-encoder to select the most relevant context before generation.
  • Interactive CLI — Rich interface with session management and command history.
  • Critic module — Verifies factual claims against evidence; can trigger a second pass (e.g. web) when more evidence is needed.
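The bracketed citation labels can be pulled back out of a generated answer and mapped to their source snippets. A minimal sketch (the regex and helper name are illustrative, not taken from the repo):

```python
import re

# Matches source labels such as [L1] (local), [W2] (web), [M3] (memory).
CITATION_RE = re.compile(r"\[([LWM])(\d+)\]")

def extract_citations(answer: str) -> list[str]:
    """Return citation labels in order of first appearance, deduplicated."""
    seen: list[str] = []
    for kind, num in CITATION_RE.findall(answer):
        label = f"{kind}{num}"
        if label not in seen:
            seen.append(label)
    return seen
```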

Requirements

  • Python 3.12+
  • Groq API key (required for the LLM)

Setup

  1. Clone and install dependencies

    git clone https://github.com/fariha-batool/Agentic-Rag-for-Harry-Potter.git
    cd Agentic-Rag-for-Harry-Potter
    uv sync
    # or: pip install -e .
  2. Configure environment

    cp .env.example .env

    Edit .env and set at least GROQ_API_KEY (see Environment variables below).

  3. Ingest documents

    Place PDFs or text files in data/ (e.g. data/harrypotter.pdf), then run:

    python -m src.ingest

    This builds the vector index (Chroma), BM25 index, and parent map under ./storage/. Re-run after adding or changing documents.
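The chunking that ingestion performs can be sketched as fixed-size windows with overlap, so a sentence that straddles a boundary stays retrievable from at least one chunk. The sizes and function name below are assumptions for illustration, not the repo's actual parameters:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows for indexing.

    Illustrative sketch: real ingestion pipelines typically split on
    sentence or paragraph boundaries rather than raw characters.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks: list[str] = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break
    return chunks
```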

Usage

Interactive CLI (default)

python main.py

Type a question and press Enter. Use:

  • /new — start a new session (new thread id)
  • /id — show current session id
  • /quit — exit
  • /help — list commands
  • /clear-mem — clear long-term memories
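The command surface above could be dispatched roughly as follows. This is a sketch only; the actual CLI is built with Rich and has its own dispatch, and the session dict shape here is an assumption:

```python
import uuid

def handle_command(cmd: str, session: dict) -> str:
    """Dispatch a slash command against a hypothetical session dict."""
    if cmd == "/new":
        session["thread_id"] = str(uuid.uuid4())  # fresh thread id
        return f"started session {session['thread_id']}"
    if cmd == "/id":
        return session["thread_id"]
    if cmd == "/help":
        return "/new /id /quit /help /clear-mem"
    if cmd == "/clear-mem":
        session["memories"] = []  # drop long-term memories
        return "memories cleared"
    if cmd == "/quit":
        return "bye"
    return f"unknown command: {cmd}"
```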

Single question (non-interactive)

python main.py -q "Who is Harry Potter?"
python main.py --thread-id my-session -q "What house is Harry in?"

Environment variables

Variable                     Required  Default  Description
GROQ_API_KEY                 Yes       (none)   Groq API key for the LLM.
RAG_USE_MEMORY               No        1        Use memory recall: 1 or 0.
RAG_ALLOW_MEMORY_CITATIONS   No        0        Allow citing memory (M#) in answers: 1 or 0.
RAG_REMEMBER_ANSWERS         No        0        Persist answers into memory after each run: 1 or 0.

See .env.example for a copy-paste template.
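The 1-or-0 flags above can be read with a small helper like this. The variable names match the table; the parsing helper itself is illustrative, not the repo's code:

```python
import os

def env_flag(name: str, default: str = "0") -> bool:
    """Interpret a 1/0 environment variable as a boolean."""
    return os.getenv(name, default).strip() == "1"

# Defaults mirror the table above.
USE_MEMORY = env_flag("RAG_USE_MEMORY", default="1")
ALLOW_MEMORY_CITATIONS = env_flag("RAG_ALLOW_MEMORY_CITATIONS")
REMEMBER_ANSWERS = env_flag("RAG_REMEMBER_ANSWERS")
```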

Project layout

├── main.py              # Entry point: CLI, single-question mode
├── data/                 # Documents to ingest (PDF, txt, etc.)
├── storage/              # Created by ingest: Chroma, BM25, parent map
├── state.sqlite          # LangGraph checkpoint state (created on first run)
├── src/
│   ├── graph.py          # LangGraph pipeline: plan → retrieve → maybe_web → rerank → answer → critic
│   ├── ingest.py         # Document chunking, embeddings, Chroma + BM25
│   ├── retrieval.py      # Vector + BM25 search, RRF merge, parent expand
│   ├── compression.py    # Contextual compression of snippets
│   ├── rerank.py         # Cross-encoder reranking
│   ├── webtools.py       # Web search (DuckDuckGo) + page fetch
│   ├── llm.py            # Groq chat wrapper
│   └── memory_store.py   # Long-term memory (recall / remember)
├── .env.example
└── README.md

How it works

  1. Plan — Classifies the query as local-only or web-augmented (e.g. “actor”, “movie”, “today” → web).
  2. Retrieve — Vector + BM25 over storage/, RRF merge, parent expansion, contextual compression; optional memory recall.
  3. Maybe web — If routed to web, runs search, fetches pages, compresses and adds as W# sources.
  4. Rerank — Cross-encoder narrows context to top snippets (keeps at least one web snippet when web was used).
  5. Answer — LLM generates an answer with citations only from the allowed labels.
  6. Critic — Second LLM checks facts against evidence; if it returns NEED_MORE:, the graph loops back (e.g. forces a web pass) and continues.
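The routing in step 1 can be sketched as keyword matching; the keyword list and function name below are assumptions based on the examples given ("actor", "movie", "today"), not the repo's actual classifier:

```python
# Keywords suggesting the answer needs live web data rather than the
# local corpus. Illustrative list, not the one used by the project.
WEB_HINTS = ("actor", "movie", "today", "news", "latest")

def plan(query: str) -> str:
    """Classify a query as 'local' (docs only) or 'web' (web-augmented)."""
    q = query.lower()
    return "web" if any(hint in q for hint in WEB_HINTS) else "local"
```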

State is checkpointed in state.sqlite per --thread-id, so multi-turn conversations and session state persist across runs.
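The RRF (Reciprocal Rank Fusion) merge in step 2 combines the vector and BM25 result lists by summing 1 / (k + rank) for each document across lists. A sketch using the conventional k = 60 (the function itself is illustrative):

```python
from collections import defaultdict

def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that rank well in both lists float to the top even if neither ranker put them first, which is why RRF is a common score-free way to merge dense and sparse retrieval.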
