An interactive command-line assistant that answers questions using local documents and web search, powered by Groq LLMs and LangGraph. It supports memory, reranking, and citation-based answers with source attribution.
- Retrieval-Augmented Generation — Combines vector search (Chroma) and BM25 with optional web search for robust answers.
- Source attribution — Answers cite sources with labels: `[L1]` (local docs), `[W1]` (web), `[M1]` (memory).
- Memory store — Optional recall and storage of facts across sessions.
- Web search — DuckDuckGo for live web results.
- Document ingestion — PDF and text files in `data/`, chunked and indexed for retrieval.
- Reranking — Cross-encoder to select the most relevant context before generation.
- Interactive CLI — Rich interface with session management and command history.
- Critic module — Verifies factual claims against evidence; can trigger a second pass (e.g. web) when more evidence is needed.
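The citation labels above can be illustrated with a small sketch. This is not the project's actual code, just a minimal example of how snippets from different origins might be numbered into `L#`/`W#`/`M#` labels:

```python
# Illustrative sketch (not the project's implementation): assign citation
# labels to retrieved snippets based on their origin.
def label_sources(snippets):
    """Map (text, origin) pairs to labeled citations like [L1], [W1], [M1]."""
    prefixes = {"local": "L", "web": "W", "memory": "M"}
    counters = {"L": 0, "W": 0, "M": 0}
    labeled = []
    for text, origin in snippets:
        prefix = prefixes[origin]
        counters[prefix] += 1
        labeled.append((f"[{prefix}{counters[prefix]}]", text))
    return labeled

labels = label_sources([
    ("Harry attends Hogwarts.", "local"),
    ("The first film premiered in 2001.", "web"),
    ("User previously asked about Gryffindor.", "memory"),
])
# labels[0][0] == "[L1]", labels[1][0] == "[W1]", labels[2][0] == "[M1]"
```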
- Python 3.12+
- Groq API key (required for the LLM)
- Clone and install dependencies

  ```sh
  cd Agentic-Rag-for-Harry-Potter
  uv sync  # or: pip install -e .
  ```
- Configure environment

  ```sh
  cp .env.example .env
  ```

  Edit `.env` and set at least `GROQ_API_KEY` — get one at the Groq Console.
- Ingest documents

  Place PDFs or text files in `data/` (e.g. `data/harrypotter.pdf`), then run:

  ```sh
  python -m src.ingest
  ```

  This builds the vector index (Chroma), BM25 index, and parent map under `./storage/`. Re-run after adding or changing documents.
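Ingestion splits documents into overlapping chunks before indexing. A minimal sketch of fixed-size chunking with overlap (the sizes here are illustrative assumptions, not the project's actual parameters):

```python
# Sketch of fixed-size character chunking with overlap, similar in spirit
# to what an ingest step does before building vector and BM25 indexes.
def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows of `size` chars."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 500, size=200, overlap=50)
# 500 chars with step 150 -> windows starting at 0, 150, 300 -> 3 chunks
```

Overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.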
Interactive CLI (default):

```sh
python main.py
```

Type a question and press Enter. Available commands:

- `/new` — start a new session (new thread id)
- `/id` — show the current session id
- `/quit` — exit
- `/help` — list commands
- `/clear-mem` — clear long-term memories
Single question (non-interactive):

```sh
python main.py -q "Who is Harry Potter?"
python main.py --thread-id my-session -q "What house is Harry in?"
```

| Variable | Required | Default | Description |
|---|---|---|---|
| `GROQ_API_KEY` | Yes | — | Groq API key for the LLM. |
| `RAG_USE_MEMORY` | No | `1` | Use memory recall: `1` or `0`. |
| `RAG_ALLOW_MEMORY_CITATIONS` | No | `0` | Allow citing memory (`M#`) in answers: `1` or `0`. |
| `RAG_REMEMBER_ANSWERS` | No | `0` | Persist answers into memory after each run: `1` or `0`. |
See `.env.example` for a copy-paste template.
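A minimal `.env` using the variables above might look like this (the key value is a placeholder):

```sh
GROQ_API_KEY=gsk_your_key_here
RAG_USE_MEMORY=1
RAG_ALLOW_MEMORY_CITATIONS=0
RAG_REMEMBER_ANSWERS=0
```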
```
├── main.py               # Entry point: CLI, single-question mode
├── data/                 # Documents to ingest (PDF, txt, etc.)
├── storage/              # Created by ingest: Chroma, BM25, parent map
├── state.sqlite          # LangGraph checkpoint state (created on first run)
├── src/
│   ├── graph.py          # LangGraph pipeline: plan → retrieve → maybe_web → rerank → answer → critic
│   ├── ingest.py         # Document chunking, embeddings, Chroma + BM25
│   ├── retrieval.py      # Vector + BM25 search, RRF merge, parent expand
│   ├── compression.py    # Contextual compression of snippets
│   ├── rerank.py         # Cross-encoder reranking
│   ├── webtools.py       # Web search (DuckDuckGo) + page fetch
│   ├── llm.py            # Groq chat wrapper
│   └── memory_store.py   # Long-term memory (recall / remember)
├── .env.example
└── README.md
```
- Plan — Classifies the query as local-only or web-augmented (e.g. “actor”, “movie”, “today” → web).
- Retrieve — Vector + BM25 over `storage/`, RRF merge, parent expansion, contextual compression; optional memory recall.
- Maybe web — If routed to web, runs search, fetches pages, compresses the results, and adds them as `W#` sources.
- Rerank — Cross-encoder narrows context to the top snippets (keeps at least one web snippet when web was used).
- Answer — LLM generates an answer with citations only from the allowed labels.
- Critic — A second LLM pass checks facts against the evidence; if it returns `NEED_MORE:`, the graph loops back (e.g. forces a web pass) and continues.
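The RRF merge in the Retrieve step can be sketched as a generic Reciprocal Rank Fusion over the vector and BM25 result lists. This uses the conventional `k = 60` smoothing constant and is not the project's exact code:

```python
# Generic Reciprocal Rank Fusion: fuse ranked lists (e.g. vector search
# and BM25 results) into a single ranking. Each document scores
# 1 / (k + rank) per list it appears in; k=60 is the usual constant.
def rrf_merge(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d1", "d2", "d3"]
bm25_hits = ["d3", "d1", "d4"]
merged = rrf_merge([vector_hits, bm25_hits])
# d1 ranks highly in both lists, so it leads the fused ranking
```

RRF avoids comparing raw vector similarities against BM25 scores directly: only ranks matter, so the two retrievers need no score calibration.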
State is checkpointed in `state.sqlite` per `--thread-id`, so multi-turn conversations and sessions are stable across runs.