BookX - AI-Powered Book Explainer & Learning Platform
BookX is an AI-powered book explainer and learning platform that turns any book or document into an interactive study experience. Upload your content as a PDF, and BookX generates a structured table of contents, explains selected pages with voice narration, answers questions by text or voice, creates flashcards and quizzes, and provides a context-aware chat assistant. These features are built on Gemini, Groq, and Minimax models.
Features
Upload & Management — Upload books or documents as PDFs via Cloudinary (max 25 MB); Gemini auto-generates a title, description, and structured index (chapters/sections with page mappings) in the background.
AI Explanation with Voice — Select any page range from the index; Gemini 2.5 Flash generates a concise explanation that is converted to speech via Minimax TTS (fallback: Gemini TTS) and cached in MongoDB.
Interactive Q&A (Voice + Text) — Ask follow-up questions by text or microphone (audio → Groq Whisper STT → Groq Llama 3.1 answering → Minimax/Gemini TTS audio response).
WebSocket Explain Mode — Real-time streaming explanation over WebSocket (/ws/explain/{pdf_id}) with pause/resume/stop controls and sentence-level playback sync.
Flashcard Generation — Gemini 2.5 Flash reads extracted PDF pages and produces exactly 10 Q&A flashcards per topic/section.
Quiz Generation & Submission — Gemini produces multiple-choice quizzes with answer explanations; quiz attempts (score, time) are stored in MongoDB.
PDF Chat — Send any query with a page range; Gemini 2.0 Flash answers with the selected PDF pages as context.
Page Extraction — Backend can return a selected page range as a raw PDF file or as base64-encoded PNG images (with configurable zoom).
Google OAuth Authentication — Sign in with Google; JWT tokens (HS256, ~50-hour expiry) used for all protected endpoints.
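Several features above rely on Minimax TTS with Gemini TTS as a fallback. The pattern can be sketched roughly as below. This is only an illustration: the backend functions are hypothetical stand-ins that return dummy bytes, not the real Minimax or Gemini client calls, and `synthesize_with_fallback` is a name invented for this example.

```python
from typing import Callable, Optional, Sequence

# A TTS backend is modeled as a function from text to audio bytes.
# Real Minimax/Gemini client calls are NOT shown; these are placeholders.
TTSBackend = Callable[[str], bytes]


def synthesize_with_fallback(text: str, backends: Sequence[TTSBackend]) -> bytes:
    """Try each backend in order and return the first non-empty audio result."""
    last_error: Optional[Exception] = None
    for backend in backends:
        try:
            audio = backend(text)
            if audio:
                return audio
        except Exception as exc:  # sketch only; real code would catch narrower errors
            last_error = exc
    raise RuntimeError("all TTS backends failed") from last_error


def fake_minimax(text: str) -> bytes:
    # Hypothetical stand-in that simulates an outage or a missing API key.
    raise ConnectionError("Minimax unavailable")


def fake_gemini(text: str) -> bytes:
    # Hypothetical stand-in returning dummy bytes instead of real audio.
    return b"AUDIO:" + text.encode("utf-8")


# The primary backend fails, so the second one supplies the audio.
audio = synthesize_with_fallback("Chapter 1 summary", [fake_minimax, fake_gemini])
```

Passing the backends as an ordered list keeps the fallback order explicit and makes it easy to skip Minimax entirely when no API key is configured.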
Prerequisites
Google Cloud Console project (OAuth 2.0 client ID)
Cloudinary account + unsigned upload preset
Google AI Studio API key (Gemini)
Groq API key
Minimax API key (optional — Gemini TTS used as fallback)
Backend Setup
cd backend
# Install dependencies
pip install -r requirements.txt
# Copy and fill in environment variables
cp .env.example .env
# Start the server
python main.py
Note: BookX uses PDF as its input format, but it is designed to work with any book, textbook, research paper, or other document; PDF is simply the medium for delivering your content to the AI.
Environment Variables
Backend (backend/.env)
| Variable | Description |
| --- | --- |
| MONGODB_URL | MongoDB connection string |
| DATABASE_NAME | Database name (e.g. bookx) |
| JWT_SECRET_KEY | Secret key for signing JWT tokens |
| GOOGLE_CLIENT_ID | Google OAuth 2.0 client ID |
| GOOGLE_API_KEY / GEMINI_API_KEY | Google Gemini API key |
| CLOUDINARY_CLOUD_NAME | Cloudinary cloud name |
| CLOUDINARY_UPLOAD_PRESET | Cloudinary unsigned upload preset |
| GROQ_API_KEY | Groq API key (LLM + Whisper STT) |
| GROQ_CHAT_MODEL | Groq chat model (default: llama-3.1-8b-instant) |
| GROQ_STT_MODEL | Groq STT model (default: whisper-large-v3) |
| MINIMAX_API_KEY | Minimax TTS API key (optional) |
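As an illustration of how the backend variables and their documented defaults fit together, here is a minimal sketch of a settings loader. It is not the project's actual configuration code: the `Settings` class, the `load_settings` function, the precedence between `GOOGLE_API_KEY` and `GEMINI_API_KEY`, and the placeholder values in the example are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    database_name: str
    jwt_secret_key: str
    gemini_api_key: str
    groq_api_key: str
    groq_chat_model: str
    groq_stt_model: str
    minimax_api_key: Optional[str]  # None means only Gemini TTS is available


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Either name may carry the Gemini key; checking GOOGLE_API_KEY first is an assumption.
    gemini_key = env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY")
    if not gemini_key:
        raise ValueError("set GOOGLE_API_KEY or GEMINI_API_KEY")
    return Settings(
        mongodb_url=env["MONGODB_URL"],
        database_name=env["DATABASE_NAME"],
        jwt_secret_key=env["JWT_SECRET_KEY"],
        gemini_api_key=gemini_key,
        groq_api_key=env["GROQ_API_KEY"],
        # Defaults taken from the table above.
        groq_chat_model=env.get("GROQ_CHAT_MODEL", "llama-3.1-8b-instant"),
        groq_stt_model=env.get("GROQ_STT_MODEL", "whisper-large-v3"),
        minimax_api_key=env.get("MINIMAX_API_KEY") or None,
    )


# Placeholder values for illustration only.
example_env = {
    "MONGODB_URL": "mongodb://localhost:27017",
    "DATABASE_NAME": "bookx",
    "JWT_SECRET_KEY": "change-me",
    "GEMINI_API_KEY": "placeholder-gemini-key",
    "GROQ_API_KEY": "placeholder-groq-key",
}
settings = load_settings(example_env)
```

Required variables raise a `KeyError` when missing, so a misconfigured deployment fails at startup rather than on the first request.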
Frontend (frontend/.env.local)
| Variable | Description |
| --- | --- |
| NEXT_PUBLIC_API_URL | Backend API base URL |
| NEXT_PUBLIC_GOOGLE_CLIENT_ID | Google OAuth 2.0 client ID |
API Endpoints
Authentication (/auth)
| Method | Path | Description |
| --- | --- | --- |
| POST | /auth/google | Verify Google ID token → return JWT + user info |
| POST | /auth/logout | Logout (client-side token removal) |
| GET | /auth/verify | Validate JWT token |
| GET | /auth/me | Get current authenticated user |
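To make the HS256 token with a roughly 50-hour expiry more concrete, the following sketch issues and verifies such a token using only the Python standard library. It is not the project's implementation: the claim names (`sub`, `iat`, `exp`), the function names, and the example secret are assumptions, and a real backend would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id: str, secret: str, now: Optional[int] = None, ttl_hours: int = 50) -> str:
    """Build a compact HS256 JWT that expires ttl_hours after `now`."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": now, "exp": now + ttl_hours * 3600}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: str, now: Optional[int] = None) -> dict:
    """Return the payload if the signature is valid and the token has not expired."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload


# Illustrative values only.
token = issue_token("user-123", "dev-secret", now=1_700_000_000)
claims = verify_token(token, "dev-secret", now=1_700_000_000 + 3600)
```

The signature is checked with `hmac.compare_digest` before the payload is parsed, so untrusted claims are never read from a token that fails verification.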
PDF Management (/pdfs)
| Method | Path | Description |
| --- | --- | --- |
| POST | /pdfs/process | Register Cloudinary-uploaded PDF; runs Gemini analysis + background index extraction |
| GET | /pdfs/ | List all PDFs for the authenticated user |
| GET | /pdfs/{pdf_id} | Get PDF metadata & index |
| DELETE | /pdfs/{pdf_id} | Delete a PDF |
| POST | /pdfs/{pdf_id}/analyze | Trigger (re-)index extraction in background |
| GET | /pdfs/{pdf_id}/pages | Stream selected page range as a PDF file |
| GET | /pdfs/{pdf_id}/pages/images | Return selected pages as base64 PNG images |
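A client request for page images might be assembled as in the sketch below. The query parameter names (`start_page`, `end_page`, `zoom`), the `Authorization: Bearer` header scheme, and the base URL are assumptions made for illustration; the README does not specify them, so check the backend routes before relying on this.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_page_images_request(
    api_url: str,
    pdf_id: str,
    token: str,
    start_page: int,
    end_page: int,
    zoom: float = 2.0,
) -> Request:
    """Build (but do not send) a GET request for a range of page images."""
    if start_page < 1 or end_page < start_page:
        raise ValueError("page range must satisfy 1 <= start_page <= end_page")
    # Parameter names are hypothetical; the real API may use different ones.
    query = urlencode({"start_page": start_page, "end_page": end_page, "zoom": zoom})
    url = f"{api_url.rstrip('/')}/pdfs/{pdf_id}/pages/images?{query}"
    # Sending the JWT as a bearer token is an assumption about the backend.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Placeholder base URL, PDF id, and token.
request = build_page_images_request("http://localhost:8000/", "abc123", "TOKEN", 3, 5)
```

Validating the page range on the client avoids a round trip for requests the server would have to reject anyway.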
AI Features (/pdfs/{pdf_id}/content, /api/pdfs/...)
| Method | Path | Description |
| --- | --- | --- |
| POST | /pdfs/{pdf_id}/content | Generate read/explain content for a page range |
| POST | /api/pdfs/{pdf_id}/explain | Generate AI explanation + TTS audio (cached in MongoDB) |
| POST | /api/pdfs/{pdf_id}/qa | Text Q&A → Groq Llama answer → TTS audio (base64) |
| POST | /api/pdfs/{pdf_id}/qa/audio | Voice Q&A → Groq Whisper STT → Llama → TTS audio |
| POST | /api/pdfs/{pdf_id}/chat | Chat with PDF pages as context via Gemini |
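The explain endpoint caches its results so that a repeated request for the same pages does not regenerate the explanation and audio. A rough sketch of that idea follows, with a plain dictionary standing in for the MongoDB collection. The cache key of PDF id plus page range, the `ExplanationCache` class, and the shape of the generated result are assumptions, not the project's actual schema.

```python
from typing import Callable, Dict, Tuple

# Assumed cache key: (pdf_id, start_page, end_page).
CacheKey = Tuple[str, int, int]


class ExplanationCache:
    """Cache-aside wrapper around an expensive explanation generator."""

    def __init__(self, generate: Callable[[str, int, int], dict]) -> None:
        self._generate = generate
        self._store: Dict[CacheKey, dict] = {}  # stands in for a MongoDB collection
        self.misses = 0

    def get(self, pdf_id: str, start_page: int, end_page: int) -> dict:
        key = (pdf_id, start_page, end_page)
        if key not in self._store:
            # Only a cache miss pays for the LLM call and the TTS call.
            self.misses += 1
            self._store[key] = self._generate(pdf_id, start_page, end_page)
        return self._store[key]


def fake_generate(pdf_id: str, start_page: int, end_page: int) -> dict:
    # Hypothetical stand-in for "Gemini explanation + TTS audio".
    return {"text": f"Explanation of pages {start_page}-{end_page}", "audio": b"..."}


cache = ExplanationCache(fake_generate)
first = cache.get("abc123", 10, 12)
second = cache.get("abc123", 10, 12)  # served from the cache
other = cache.get("abc123", 13, 15)   # different range, generated again
```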
Flashcards
| Method | Path | Description |
| --- | --- | --- |
| POST | /pdfs/{pdf_id}/flashcards | Generate 10 flashcards for a page range/topic via Gemini |
| GET | /pdfs/{pdf_id}/flashcards | List saved flashcard sets |
| GET | /flashcards/{flashcard_id} | Get specific flashcard set |
| DELETE | /flashcards/{flashcard_id} | Delete flashcard set |
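Because flashcard generation is expected to yield exactly 10 Q&A cards, model output generally needs to be validated before it is saved. The sketch below shows one way that check could look. It assumes the model returns a JSON array of objects with `question` and `answer` fields; that format and the `parse_flashcards` name are illustrative, not taken from the project.

```python
import json
from typing import Dict, List

EXPECTED_CARDS = 10


def parse_flashcards(raw: str) -> List[Dict[str, str]]:
    """Parse model output and enforce the 10-card contract."""
    cards = json.loads(raw)
    if not isinstance(cards, list) or len(cards) != EXPECTED_CARDS:
        raise ValueError(f"expected a list of {EXPECTED_CARDS} flashcards")
    cleaned: List[Dict[str, str]] = []
    for position, card in enumerate(cards, start=1):
        if not isinstance(card, dict):
            raise ValueError(f"flashcard {position} is not an object")
        question = str(card.get("question", "")).strip()
        answer = str(card.get("answer", "")).strip()
        if not question or not answer:
            raise ValueError(f"flashcard {position} is missing a question or answer")
        cleaned.append({"question": question, "answer": answer})
    return cleaned


# Simulated model output with ten cards.
raw_output = json.dumps(
    [{"question": f"Question {n}?", "answer": f"Answer {n}."} for n in range(1, 11)]
)
flashcards = parse_flashcards(raw_output)
```

Rejecting a malformed response outright lets the caller retry the generation instead of storing a partial set.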
Quizzes
| Method | Path | Description |
| --- | --- | --- |
| POST | /pdfs/{pdf_id}/quizzes | Generate multiple-choice quiz for a page range/topic |
| GET | /pdfs/{pdf_id}/quizzes | List quizzes for a PDF |
| GET | /quizzes/{quiz_id} | Get specific quiz |
| DELETE | /quizzes/{quiz_id} | Delete quiz |
| POST | /quizzes/{quiz_id}/submit | Submit quiz attempt (saves score + time) |
| GET | /quizzes/{quiz_id}/attempts | Get all attempts for a quiz |
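Submitting a quiz attempt records a score and the time taken. A simple grading routine could look like the sketch below. The data shapes are assumptions: the answer key is taken to be a list of correct option indices, and the submission a mapping from question index to the chosen option. The names `grade_attempt` and `AttemptResult` are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class AttemptResult:
    correct: int
    total: int
    score_percent: float
    time_seconds: int


def grade_attempt(
    answer_key: Sequence[int],
    answers: Dict[int, int],
    time_seconds: int,
) -> AttemptResult:
    """Count correct answers; unanswered questions count as wrong."""
    correct = sum(
        1 for index, right in enumerate(answer_key) if answers.get(index) == right
    )
    total = len(answer_key)
    score = round(100 * correct / total, 1) if total else 0.0
    return AttemptResult(correct, total, score, time_seconds)


# Four questions; the user answers three and gets two right.
result = grade_attempt(
    answer_key=[1, 0, 3, 2],
    answers={0: 1, 1: 2, 2: 3},
    time_seconds=95,
)
```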
WebSocket
| Path | Description |
| --- | --- |
| WS /ws/explain/{pdf_id} | Real-time streaming explanation with pause/resume/stop and audio input support |
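The pause/resume/stop behavior of a sentence-by-sentence stream can be modeled with an `asyncio.Event`, as in the sketch below. This is only a model of the control flow, not the project's WebSocket handler: the message shapes (`type`, `index`, `text`), the `PlaybackController` class, and the `send` callback that stands in for a WebSocket send are all assumptions.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

Send = Callable[[dict], Awaitable[None]]


class PlaybackController:
    """Gate a sentence stream behind pause/resume/stop controls."""

    def __init__(self) -> None:
        self._running = asyncio.Event()
        self._running.set()  # start in the "playing" state
        self._stopped = False

    def pause(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    def stop(self) -> None:
        self._stopped = True
        self._running.set()  # wake a paused stream so it can exit

    async def stream(self, sentences: Sequence[str], send: Send) -> None:
        for index, sentence in enumerate(sentences):
            await self._running.wait()  # blocks here while paused
            if self._stopped:
                break
            # The index lets a client keep audio playback in step with the text.
            await send({"type": "sentence", "index": index, "text": sentence})
        await send({"type": "done"})


async def demo() -> List[dict]:
    sent: List[dict] = []
    controller = PlaybackController()

    async def send(message: dict) -> None:
        sent.append(message)
        if message.get("index") == 1:
            controller.stop()  # simulate the client sending "stop" after sentence 1

    await controller.stream(["One.", "Two.", "Three."], send)
    return sent


messages = asyncio.run(demo())
```

In a real handler, `pause`, `resume`, and `stop` would be called from the task that reads control messages off the socket while `stream` runs concurrently.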
User Workflow
Sign In — Authenticate with Google OAuth
Upload PDF — Upload via Cloudinary widget (max 25 MB)
Auto-Analysis — Gemini extracts title, description, and a structured chapter/section index in the background
Browse Index — Navigate chapters and sections from the generated table of contents
Learn — Choose a section to:
Read — View selected pages
Listen — Get an AI-generated explanation read aloud (Minimax/Gemini TTS, cached)
Ask Questions — Follow up by text or voice; get spoken answers from Groq + TTS
Flashcards — Generate 10 Q&A cards for active recall
Quiz — Take a multiple-choice quiz with scoring and time tracking
Chat — Ask free-form questions about any page range
License
MIT License