BookX - AI-Powered Book Explainer & Learning Platform

BookX is an AI-powered book explainer and learning platform that turns any book or document into an interactive study experience. Upload your content as a PDF, and BookX generates a structured table of contents, explanations with voice narration, interactive Q&A (text and voice), flashcards, quizzes, and a context-aware chat assistant, using Google Gemini, Groq, and Minimax models.

Features

- Upload & Management: Upload books or documents as PDFs via Cloudinary (max 25 MB); Gemini generates a title, description, and structured index (chapters/sections with page mappings) in the background.
- AI Explanation with Voice: Select any page range from the index; Gemini 2.5 Flash generates a concise explanation, which is converted to speech via Minimax TTS (fallback: Gemini TTS) and cached in MongoDB.
- Interactive Q&A (Voice + Text): Ask follow-up questions by text or microphone (audio → Groq Whisper STT → Groq Llama 3.1 answer → Minimax/Gemini TTS audio response).
- WebSocket Explain Mode: Real-time streaming explanation over WebSocket (`/ws/explain/{pdf_id}`) with pause/resume/stop controls and sentence-level playback sync.
- Flashcard Generation: Gemini 2.5 Flash reads the extracted PDF pages and produces exactly 10 Q&A flashcards per topic/section.
- Quiz Generation & Submission: Gemini produces multiple-choice quizzes with answer explanations; quiz attempts (score, time) are stored in MongoDB.
- PDF Chat: Send any query with a page range; Gemini 2.0 Flash answers using the selected PDF pages as context.
- Page Extraction: The backend can return a selected page range as a raw PDF file or as base64-encoded PNG images (with configurable zoom).
- Google OAuth Authentication: Sign in with Google; JWTs (HS256, ~50-hour expiry) are used for all protected endpoints.
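
The primary/fallback TTS behaviour with caching can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: `synthesize_with_fallback`, `minimax_tts`, and `gemini_tts` are hypothetical names, the provider functions are stand-ins for the real API calls, and a plain dict stands in for the MongoDB cache.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical provider signature: text in, audio bytes out.
TTSProvider = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str,
    providers: Sequence[Tuple[str, TTSProvider]],
    cache: Dict[str, bytes],
) -> bytes:
    """Return cached audio if present, otherwise try each provider in order."""
    if text in cache:
        return cache[text]
    errors = []
    for name, provider in providers:
        try:
            audio = provider(text)
        except Exception as exc:  # a real service would catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        cache[text] = audio  # store the first successful result
        return audio
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))


# Stand-ins for the real Minimax and Gemini TTS calls (assumptions).
def minimax_tts(text: str) -> bytes:
    raise ConnectionError("Minimax unavailable")


def gemini_tts(text: str) -> bytes:
    return b"AUDIO:" + text.encode("utf-8")


cache: Dict[str, bytes] = {}
audio = synthesize_with_fallback(
    "Chapter 1 summary",
    [("minimax", minimax_tts), ("gemini", gemini_tts)],
    cache,
)
```

Here the first provider fails, so the second one produces the audio and the result is cached; a later call with the same text returns the cached bytes without contacting either provider.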

Tech Stack

Layer	Technology
Frontend	Next.js 14, React 18, TailwindCSS 3
Backend	FastAPI (Python), Uvicorn
Database	MongoDB Atlas (async Motor driver)
Authentication	Google OAuth 2.0, JWT (python-jose)
File Storage	Cloudinary (unsigned uploads)
AI - PDF Analysis & Explanation	Google Gemini 2.5 Flash (`gemini-2.5-flash`)
AI - PDF Chat	Google Gemini 2.0 Flash (`gemini-2.0-flash-exp`)
AI - TTS (primary)	Minimax TTS (`speech-2.5-hd-preview`)
AI - TTS (fallback)	Google Gemini TTS (`gemini-2.5-flash-preview-tts`)
AI - Q&A LLM	Groq — Llama 3.1 8B Instant
AI - Speech-to-Text	Groq — Whisper Large v3
PDF Processing	PyMuPDF (fitz), PyPDF2

Project Structure

BookX/
├── backend/                        # FastAPI Python application
│   ├── main.py                     # App entry point, all route registrations, WebSocket
│   ├── auth.py                     # Google OAuth verification, JWT creation/validation
│   ├── pdfs.py                     # PDF CRUD, page extraction (PDF & images), background analysis
│   ├── models.py                   # Pydantic models (User, PDF, Note, Flashcard, Quiz, etc.)
│   ├── database.py                 # Motor async MongoDB connection & collection helpers
│   ├── gemini_service.py           # PDF metadata analysis + structured index extraction
│   ├── gemini_tts_service.py       # Minimax/Gemini TTS + Cloudinary audio upload
│   ├── explanation_service.py      # HTTP explanation API, Q&A (text/audio), PDF chat
│   ├── explain_websocket_service.py  # WebSocket real-time explain mode
│   ├── content_service.py          # PDF loading, page extraction (bytes + images)
│   ├── flashcard_service.py        # Flashcard generation via Gemini
│   ├── quiz_service.py             # Quiz generation, attempt submission & retrieval
│   └── requirements.txt
├── frontend/                       # Next.js 14 application
│   ├── app/
│   │   ├── page.js                 # Home: Google sign-in, PDF upload & library
│   │   ├── layout.js               # Root layout
│   │   ├── globals.css
│   │   ├── lib/
│   │   │   ├── auth.js             # Auth service (token storage, user state)
│   │   │   └── api.js              # API client helpers
│   │   ├── components/
│   │   │   ├── GoogleSignIn.js     # Google OAuth button
│   │   │   ├── PDFUpload.js        # Cloudinary upload widget
│   │   │   ├── PDFList.js          # PDF library grid
│   │   │   ├── AboutDocument.js    # PDF metadata & document info panel
│   │   │   ├── IndexContent.js     # Structured table of contents navigation
│   │   │   ├── ReadingExplanation.js # AI explanation + TTS player + WebSocket Q&A
│   │   │   ├── FlashcardComponent.js  # Interactive flashcard UI
│   │   │   ├── FlashcardListComponent.js
│   │   │   ├── QuizComponent.js    # Quiz taking UI with timer & scoring
│   │   │   ├── QuizListComponent.js
│   │   │   └── PDFChatInterface.js # PDF chat query interface
│   │   └── pdf/
│   │       └── [id]/
│   │           ├── page.js         # PDF detail page (index + section navigation)
│   │           ├── read/           # Reading mode page
│   │           ├── explain/        # AI explanation + voice mode page
│   │           ├── flashcard/      # Flashcard generation page
│   │           ├── quiz/           # Quiz page
│   │           └── chat/           # PDF chat page
│   ├── package.json
│   ├── next.config.js
│   └── tailwind.config.js

Quick Start

Prerequisites

Node.js 18+
Python 3.9+
MongoDB Atlas cluster (or local MongoDB)
Google Cloud Console project (OAuth 2.0 client ID)
Cloudinary account + unsigned upload preset
Google AI Studio API key (Gemini)
Groq API key
Minimax API key (optional — Gemini TTS used as fallback)

Backend Setup

cd backend

# Install dependencies
pip install -r requirements.txt

# Copy and fill in environment variables
cp .env.example .env

# Start the server
python main.py

Backend runs at http://localhost:8000, with interactive API docs at http://localhost:8000/docs.

Frontend Setup

cd frontend

# Install dependencies
npm install

# Copy and fill in environment variables
cp .env.example .env.local

# Start the dev server
npm run dev

Frontend runs at http://localhost:3000.

Note: BookX accepts PDF as its input format, but the content can be any book, textbook, research paper, or other document.

Environment Variables

Backend (`backend/.env`)

Variable	Description
`MONGODB_URL`	MongoDB connection string
`DATABASE_NAME`	Database name (e.g. `bookx`)
`JWT_SECRET_KEY`	Secret key for signing JWT tokens
`GOOGLE_CLIENT_ID`	Google OAuth 2.0 client ID
`GOOGLE_API_KEY` / `GEMINI_API_KEY`	Google Gemini API key
`CLOUDINARY_CLOUD_NAME`	Cloudinary cloud name
`CLOUDINARY_UPLOAD_PRESET`	Cloudinary unsigned upload preset
`GROQ_API_KEY`	Groq API key (LLM + Whisper STT)
`GROQ_CHAT_MODEL`	Groq chat model (default: `llama-3.1-8b-instant`)
`GROQ_STT_MODEL`	Groq STT model (default: `whisper-large-v3`)
`MINIMAX_API_KEY`	Minimax TTS API key (optional)
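
As an illustration of how these variables fit together, the sketch below reads them into a settings object and applies the two documented Groq defaults. It is an assumption-laden example, not the backend's actual loader: `Settings` and `load_settings` are hypothetical names, the order in which `GOOGLE_API_KEY` and `GEMINI_API_KEY` are checked is a guess, and the Google client ID and Cloudinary variables are omitted for brevity.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    database_name: str
    jwt_secret_key: str
    gemini_api_key: str
    groq_api_key: str
    groq_chat_model: str
    groq_stt_model: str
    minimax_api_key: Optional[str]  # optional: Gemini TTS is the fallback


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Assumption: either variable name may carry the Gemini key.
    gemini_key = env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY")
    if not gemini_key:
        raise KeyError("GOOGLE_API_KEY or GEMINI_API_KEY")
    return Settings(
        mongodb_url=env["MONGODB_URL"],
        database_name=env["DATABASE_NAME"],
        jwt_secret_key=env["JWT_SECRET_KEY"],
        gemini_api_key=gemini_key,
        groq_api_key=env["GROQ_API_KEY"],
        groq_chat_model=env.get("GROQ_CHAT_MODEL", "llama-3.1-8b-instant"),
        groq_stt_model=env.get("GROQ_STT_MODEL", "whisper-large-v3"),
        minimax_api_key=env.get("MINIMAX_API_KEY") or None,
    )


# Example with placeholder values instead of the real process environment.
example_env = {
    "MONGODB_URL": "mongodb://localhost:27017",
    "DATABASE_NAME": "bookx",
    "JWT_SECRET_KEY": "change-me",
    "GEMINI_API_KEY": "placeholder-gemini-key",
    "GROQ_API_KEY": "placeholder-groq-key",
}
settings = load_settings(example_env)
```

Required variables raise `KeyError` when missing, so a misconfigured `.env` fails at startup rather than on the first request.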

Frontend (`frontend/.env.local`)

Variable	Description
`NEXT_PUBLIC_API_URL`	Backend API base URL
`NEXT_PUBLIC_GOOGLE_CLIENT_ID`	Google OAuth 2.0 client ID

API Endpoints

Authentication (`/auth`)

Method	Path	Description
`POST`	`/auth/google`	Verify Google ID token → return JWT + user info
`POST`	`/auth/logout`	Logout (client-side token removal)
`GET`	`/auth/verify`	Validate JWT token
`GET`	`/auth/me`	Get current authenticated user
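
To show what an HS256 token with a roughly 50-hour expiry looks like, here is a sketch that signs and verifies one using only the Python standard library. The backend itself uses python-jose; this stand-in exists purely to make the token structure visible. The claim names (`sub`, `iat`, `exp`) and the function names are assumptions, not taken from `auth.py`.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWTs strip before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(subject: str, secret: str, ttl_hours: int = 50,
                 now: Optional[float] = None) -> str:
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued_at, "exp": issued_at + ttl_hours * 3600}
    parts = [
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url_encode(_sign(signing_input, secret))


def verify_token(token: str, secret: str, now: Optional[float] = None) -> Dict[str, Any]:
    header_b64, payload_b64, signature_b64 = token.split(".")  # ValueError if malformed
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload


token = create_token("user-123", "change-me")
claims = verify_token(token, "change-me")
```

In a real deployment a maintained library such as python-jose should do this work; the sketch only illustrates why a changed secret or an elapsed expiry causes verification to fail.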

PDF Management (`/pdfs`)

Method	Path	Description
`POST`	`/pdfs/process`	Register Cloudinary-uploaded PDF; runs Gemini analysis + background index extraction
`GET`	`/pdfs/`	List all PDFs for the authenticated user
`GET`	`/pdfs/{pdf_id}`	Get PDF metadata & index
`DELETE`	`/pdfs/{pdf_id}`	Delete a PDF
`POST`	`/pdfs/{pdf_id}/analyze`	Trigger (re-)index extraction in background
`GET`	`/pdfs/{pdf_id}/pages`	Stream selected page range as a PDF file
`GET`	`/pdfs/{pdf_id}/pages/images`	Return selected pages as base64 PNG images
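
On the client side, the base64 PNG data returned by the images endpoint has to be decoded before it can be saved or displayed. The helper below is a small sketch of that step. The exact JSON shape of the response is not documented here, so the function only handles a single encoded string; `decode_page_image` is a hypothetical name, and the tolerance for a data-URI prefix is an assumption.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def decode_page_image(encoded: str) -> bytes:
    """Decode one base64 string and check that it holds PNG data."""
    # Assumption: tolerate an optional prefix such as "data:image/png;base64,".
    if encoded.startswith("data:"):
        _, _, encoded = encoded.partition(",")
    raw = base64.b64decode(encoded, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return raw


# Placeholder bytes that only carry the PNG signature, for demonstration.
sample = base64.b64encode(PNG_SIGNATURE + b"placeholder").decode("ascii")
image_bytes = decode_page_image(sample)
```

The returned bytes can be written straight to a `.png` file or passed to an image library.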

AI Features (`/pdfs/{pdf_id}/content`, `/api/pdfs/...`)

Method	Path	Description
`POST`	`/pdfs/{pdf_id}/content`	Generate read/explain content for a page range
`POST`	`/api/pdfs/{pdf_id}/explain`	Generate AI explanation + TTS audio (cached in MongoDB)
`POST`	`/api/pdfs/{pdf_id}/qa`	Text Q&A → Groq Llama answer → TTS audio (base64)
`POST`	`/api/pdfs/{pdf_id}/qa/audio`	Voice Q&A → Groq Whisper STT → Llama → TTS audio
`POST`	`/api/pdfs/{pdf_id}/chat`	Chat with PDF pages as context via Gemini

Flashcards

Method	Path	Description
`POST`	`/pdfs/{pdf_id}/flashcards`	Generate 10 flashcards for a page range/topic via Gemini
`GET`	`/pdfs/{pdf_id}/flashcards`	List saved flashcard sets
`GET`	`/flashcards/{flashcard_id}`	Get specific flashcard set
`DELETE`	`/flashcards/{flashcard_id}`	Delete flashcard set
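
Because each generated set is meant to contain exactly 10 cards, model output is worth validating before it is stored. The sketch below shows one way to do that. It assumes the model returns a JSON array of objects with `question` and `answer` keys, which is a guess at the format; `parse_flashcards` is a hypothetical helper, not the code in `flashcard_service.py`.

```python
import json
from typing import Dict, List

EXPECTED_CARDS = 10  # the README states exactly 10 cards per topic/section


def parse_flashcards(model_output: str) -> List[Dict[str, str]]:
    """Parse model output and reject anything that is not 10 complete cards."""
    cards = json.loads(model_output)
    if not isinstance(cards, list) or len(cards) != EXPECTED_CARDS:
        raise ValueError(f"expected a list of {EXPECTED_CARDS} flashcards")
    cleaned = []
    for index, card in enumerate(cards, start=1):
        if not isinstance(card, dict):
            raise ValueError(f"card {index} is not an object")
        question = str(card.get("question", "")).strip()
        answer = str(card.get("answer", "")).strip()
        if not question or not answer:
            raise ValueError(f"card {index} is missing a question or an answer")
        cleaned.append({"question": question, "answer": answer})
    return cleaned


# Placeholder output standing in for a real model response.
raw_output = json.dumps(
    [{"question": f"Question {n}?", "answer": f"Answer {n}"} for n in range(1, 11)]
)
flashcards = parse_flashcards(raw_output)
```

A failed validation is a natural point to retry the generation request instead of saving an incomplete set.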

Quizzes

Method	Path	Description
`POST`	`/pdfs/{pdf_id}/quizzes`	Generate multiple-choice quiz for a page range/topic
`GET`	`/pdfs/{pdf_id}/quizzes`	List quizzes for a PDF
`GET`	`/quizzes/{quiz_id}`	Get specific quiz
`DELETE`	`/quizzes/{quiz_id}`	Delete quiz
`POST`	`/quizzes/{quiz_id}/submit`	Submit quiz attempt (saves score + time)
`GET`	`/quizzes/{quiz_id}/attempts`	Get all attempts for a quiz
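
Scoring a submitted attempt could look like the sketch below, which compares the chosen option for each question against the correct one and records the time taken. The data shapes are assumptions made for illustration (correct options as a list of indices, answers as a question-index to option-index mapping), and `QuizAttempt` and `score_attempt` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QuizAttempt:
    score: int
    total: int
    time_taken_seconds: float

    @property
    def percent(self) -> float:
        return 100.0 * self.score / self.total if self.total else 0.0


def score_attempt(correct_options: List[int], answers: Dict[int, int],
                  time_taken_seconds: float) -> QuizAttempt:
    """Count the questions whose chosen option matches the correct option."""
    score = sum(
        1
        for question_index, correct in enumerate(correct_options)
        if answers.get(question_index) == correct  # unanswered counts as wrong
    )
    return QuizAttempt(score, len(correct_options), time_taken_seconds)


# Four questions; the user answers three of them and gets two right.
attempt = score_attempt([0, 2, 1, 3], {0: 0, 1: 2, 2: 3}, time_taken_seconds=95.0)
```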

WebSocket

Path	Description
`WS /ws/explain/{pdf_id}`	Real-time streaming explanation with pause/resume/stop and audio input support
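
The pause/resume/stop controls with sentence-level playback can be modelled as a small state machine, sketched below without any network code. The command names come from the endpoint description; everything else is an assumption, including the `ExplainSession` class, the plain-string command format, and the rule that a stopped session ignores further commands.

```python
from enum import Enum
from typing import List, Optional


class State(Enum):
    PLAYING = "playing"
    PAUSED = "paused"
    STOPPED = "stopped"


class ExplainSession:
    """Tracks which sentence to send next and whether playback is active."""

    def __init__(self, sentences: List[str]) -> None:
        self.sentences = list(sentences)
        self.position = 0  # index of the next sentence to send
        self.state = State.PLAYING

    def handle(self, command: str) -> None:
        if self.state is State.STOPPED:
            return  # assumption: a stopped session ignores further commands
        if command == "pause":
            self.state = State.PAUSED
        elif command == "resume":
            self.state = State.PLAYING
        elif command == "stop":
            self.state = State.STOPPED
        else:
            raise ValueError(f"unknown command: {command}")

    def next_sentence(self) -> Optional[str]:
        """Return the next sentence while playing; None when paused, stopped, or done."""
        if self.state is not State.PLAYING or self.position >= len(self.sentences):
            return None
        sentence = self.sentences[self.position]
        self.position += 1
        return sentence


session = ExplainSession(["First sentence.", "Second sentence.", "Third sentence."])
first = session.next_sentence()
session.handle("pause")
while_paused = session.next_sentence()
session.handle("resume")
second = session.next_sentence()
```

Keeping the position at sentence granularity means a resume continues from the sentence after the last one sent, which is what lets the client keep audio and text in sync.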

User Workflow

1. Sign In: Authenticate with Google OAuth
2. Upload PDF: Upload via Cloudinary widget (max 25 MB)
3. Auto-Analysis: Gemini extracts title, description, and a structured chapter/section index in the background
4. Browse Index: Navigate chapters and sections from the generated table of contents
5. Learn: Choose a section to:
   - Read: View selected pages
   - Listen: Get an AI-generated explanation read aloud (Minimax/Gemini TTS, cached)
   - Ask Questions: Follow up by text or voice; get spoken answers from Groq + TTS
   - Flashcards: Generate 10 Q&A cards for active recall
   - Quiz: Take a multiple-choice quiz with scoring and time tracking
   - Chat: Ask free-form questions about any page range

License

MIT License
