harsh8423/BookX-AI_based_digital_content_explainer

BookX - AI-Powered Book Explainer & Learning Platform

BookX is an AI-powered book explainer and learning platform that turns a book or document into an interactive study experience. Upload your content as a PDF, and BookX generates a structured table of contents, explains selected pages with voice narration, answers follow-up questions by text or voice, creates flashcards and quizzes, and provides a chat assistant that uses the selected pages as context. These features are built on Gemini, Groq, and Minimax models.

Features

  • Upload & Management — Upload books or documents as PDFs via Cloudinary (max 25 MB); Gemini auto-generates a title, description, and structured index (chapters/sections with page mappings) in the background.
  • AI Explanation with Voice — Select any page range from the index; Gemini 2.5 Flash generates a concise explanation that is converted to speech via Minimax TTS (fallback: Gemini TTS) and cached in MongoDB.
  • Interactive Q&A (Voice + Text) — Ask follow-up questions by text or microphone (audio → Groq Whisper STT → Groq Llama 3.1 answering → Minimax/Gemini TTS audio response).
  • WebSocket Explain Mode — Real-time streaming explanation over WebSocket (/ws/explain/{pdf_id}) with pause/resume/stop controls and sentence-level playback sync.
  • Flashcard Generation — Gemini 2.5 Flash reads extracted PDF pages and produces exactly 10 Q&A flashcards per topic/section.
  • Quiz Generation & Submission — Gemini produces multiple-choice quizzes with answer explanations; quiz attempts (score, time) are stored in MongoDB.
  • PDF Chat — Send any query with a page range; Gemini 2.0 Flash answers with the selected PDF pages as context.
  • Page Extraction — Backend can return a selected page range as a raw PDF file or as base64-encoded PNG images (with configurable zoom).
  • Google OAuth Authentication — Sign in with Google; JWT tokens (HS256, ~50-hour expiry) used for all protected endpoints.
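
The explanation feature above describes a primary/fallback TTS chain whose output is cached. The following is a minimal sketch of that pattern, not the project's actual code: `minimax_tts` and `gemini_tts` are hypothetical stand-ins for the real services, and a plain dict stands in for the MongoDB cache.

```python
import hashlib
from typing import Callable, Dict, Sequence

# Hypothetical type: a TTS provider takes text and returns audio bytes.
TTSProvider = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str,
    providers: Sequence[TTSProvider],
    cache: Dict[str, bytes],
) -> bytes:
    """Return cached audio if present, otherwise try each provider in order."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key in cache:
        return cache[key]

    last_error = None
    for provider in providers:
        try:
            audio = provider(text)
        except Exception as exc:  # a real service would catch narrower errors
            last_error = exc
            continue
        cache[key] = audio  # only successful results are cached
        return audio
    raise RuntimeError("all TTS providers failed") from last_error


# Stand-ins for the real services (assumptions for illustration only).
def minimax_tts(text: str) -> bytes:
    raise ConnectionError("primary provider unavailable")


def gemini_tts(text: str) -> bytes:
    return b"AUDIO:" + text.encode("utf-8")


cache: Dict[str, bytes] = {}
audio = synthesize_with_fallback("Hello", [minimax_tts, gemini_tts], cache)
```

Keying the cache on a hash of the text is one possible choice; the real backend may key on PDF ID and page range instead.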

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 14, React 18, TailwindCSS 3 |
| Backend | FastAPI (Python), Uvicorn |
| Database | MongoDB Atlas (async Motor driver) |
| Authentication | Google OAuth 2.0, JWT (python-jose) |
| File Storage | Cloudinary (unsigned uploads) |
| AI - PDF Analysis & Explanation | Google Gemini 2.5 Flash (gemini-2.5-flash) |
| AI - Q&A Chat | Google Gemini 2.0 Flash (gemini-2.0-flash-exp) |
| AI - TTS (primary) | Minimax TTS (speech-2.5-hd-preview) |
| AI - TTS (fallback) | Google Gemini TTS (gemini-2.5-flash-preview-tts) |
| AI - Q&A LLM | Groq: Llama 3.1 8B Instant |
| AI - Speech-to-Text | Groq: Whisper Large v3 |
| PDF Processing | PyMuPDF (fitz), PyPDF2 |

Project Structure

BookX/
├── backend/                        # FastAPI Python application
│   ├── main.py                     # App entry point, all route registrations, WebSocket
│   ├── auth.py                     # Google OAuth verification, JWT creation/validation
│   ├── pdfs.py                     # PDF CRUD, page extraction (PDF & images), background analysis
│   ├── models.py                   # Pydantic models (User, PDF, Note, Flashcard, Quiz, etc.)
│   ├── database.py                 # Motor async MongoDB connection & collection helpers
│   ├── gemini_service.py           # PDF metadata analysis + structured index extraction
│   ├── gemini_tts_service.py       # Minimax/Gemini TTS + Cloudinary audio upload
│   ├── explanation_service.py      # HTTP explanation API, Q&A (text/audio), PDF chat
│   ├── explain_websocket_service.py# WebSocket real-time explain mode
│   ├── content_service.py          # PDF loading, page extraction (bytes + images)
│   ├── flashcard_service.py        # Flashcard generation via Gemini
│   ├── quiz_service.py             # Quiz generation, attempt submission & retrieval
│   └── requirements.txt
├── frontend/                       # Next.js 14 application
│   ├── app/
│   │   ├── page.js                 # Home: Google sign-in, PDF upload & library
│   │   ├── layout.js               # Root layout
│   │   ├── globals.css
│   │   ├── lib/
│   │   │   ├── auth.js             # Auth service (token storage, user state)
│   │   │   └── api.js              # API client helpers
│   │   ├── components/
│   │   │   ├── GoogleSignIn.js     # Google OAuth button
│   │   │   ├── PDFUpload.js        # Cloudinary upload widget
│   │   │   ├── PDFList.js          # PDF library grid
│   │   │   ├── AboutDocument.js    # PDF metadata & document info panel
│   │   │   ├── IndexContent.js     # Structured table of contents navigation
│   │   │   ├── ReadingExplanation.js # AI explanation + TTS player + WebSocket Q&A
│   │   │   ├── FlashcardComponent.js  # Interactive flashcard UI
│   │   │   ├── FlashcardListComponent.js
│   │   │   ├── QuizComponent.js    # Quiz taking UI with timer & scoring
│   │   │   ├── QuizListComponent.js
│   │   │   └── PDFChatInterface.js # PDF chat query interface
│   │   └── pdf/
│   │       └── [id]/
│   │           ├── page.js         # PDF detail page (index + section navigation)
│   │           ├── read/           # Reading mode page
│   │           ├── explain/        # AI explanation + voice mode page
│   │           ├── flashcard/      # Flashcard generation page
│   │           ├── quiz/           # Quiz page
│   │           └── chat/           # PDF chat page
│   ├── package.json
│   ├── next.config.js
│   └── tailwind.config.js

Quick Start

Prerequisites

  • Node.js 18+
  • Python 3.9+
  • MongoDB Atlas cluster (or local MongoDB)
  • Google Cloud Console project (OAuth 2.0 client ID)
  • Cloudinary account + unsigned upload preset
  • Google AI Studio API key (Gemini)
  • Groq API key
  • Minimax API key (optional — Gemini TTS used as fallback)

Backend Setup

cd backend

# Install dependencies
pip install -r requirements.txt

# Copy and fill in environment variables
cp .env.example .env

# Start the server
python main.py

Backend runs at http://localhost:8000. Interactive API docs at http://localhost:8000/docs.

Frontend Setup

cd frontend

# Install dependencies
npm install

# Copy and fill in environment variables
cp .env.example .env.local

# Start the dev server
npm run dev

Frontend runs at http://localhost:3000.

Note: BookX uses PDF as its input format, but it is designed to work with any book, textbook, research paper, or document — the PDF format is simply the medium for delivering your content to the AI.

Environment Variables

Backend (backend/.env)

| Variable | Description |
| --- | --- |
| MONGODB_URL | MongoDB connection string |
| DATABASE_NAME | Database name (e.g. bookx) |
| JWT_SECRET_KEY | Secret key for signing JWT tokens |
| GOOGLE_CLIENT_ID | Google OAuth 2.0 client ID |
| GOOGLE_API_KEY / GEMINI_API_KEY | Google Gemini API key |
| CLOUDINARY_CLOUD_NAME | Cloudinary cloud name |
| CLOUDINARY_UPLOAD_PRESET | Cloudinary unsigned upload preset |
| GROQ_API_KEY | Groq API key (LLM + Whisper STT) |
| GROQ_CHAT_MODEL | Groq chat model (default: llama-3.1-8b-instant) |
| GROQ_STT_MODEL | Groq STT model (default: whisper-large-v3) |
| MINIMAX_API_KEY | Minimax TTS API key (optional) |
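
As an illustration of how these variables might be read at startup, here is a small sketch. The `Settings` class and `load_settings` function are hypothetical names, not taken from the backend, and the precedence of `GOOGLE_API_KEY` over `GEMINI_API_KEY` is an assumption. Only the two Groq defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    database_name: str
    jwt_secret_key: str
    google_client_id: str
    gemini_api_key: str
    cloudinary_cloud_name: str
    cloudinary_upload_preset: str
    groq_api_key: str
    groq_chat_model: str
    groq_stt_model: str
    minimax_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing fast on gaps."""

    def required(name: str) -> str:
        value = env.get(name)
        if not value:
            raise RuntimeError(f"missing required environment variable: {name}")
        return value

    # Assumption: either key name is accepted, GOOGLE_API_KEY checked first.
    gemini_key = env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY")
    if not gemini_key:
        raise RuntimeError("set GOOGLE_API_KEY or GEMINI_API_KEY")

    return Settings(
        mongodb_url=required("MONGODB_URL"),
        database_name=required("DATABASE_NAME"),
        jwt_secret_key=required("JWT_SECRET_KEY"),
        google_client_id=required("GOOGLE_CLIENT_ID"),
        gemini_api_key=gemini_key,
        cloudinary_cloud_name=required("CLOUDINARY_CLOUD_NAME"),
        cloudinary_upload_preset=required("CLOUDINARY_UPLOAD_PRESET"),
        groq_api_key=required("GROQ_API_KEY"),
        # Defaults as documented in the table.
        groq_chat_model=env.get("GROQ_CHAT_MODEL") or "llama-3.1-8b-instant",
        groq_stt_model=env.get("GROQ_STT_MODEL") or "whisper-large-v3",
        # Optional: when absent, Gemini TTS would be the only TTS provider.
        minimax_api_key=env.get("MINIMAX_API_KEY") or None,
    )
```

Passing the mapping as a parameter keeps the loader testable without touching the real process environment.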

Frontend (frontend/.env.local)

| Variable | Description |
| --- | --- |
| NEXT_PUBLIC_API_URL | Backend API base URL |
| NEXT_PUBLIC_GOOGLE_CLIENT_ID | Google OAuth 2.0 client ID |

API Endpoints

Authentication (/auth)

| Method | Path | Description |
| --- | --- | --- |
| POST | /auth/google | Verify Google ID token → return JWT + user info |
| POST | /auth/logout | Logout (client-side token removal) |
| GET | /auth/verify | Validate JWT token |
| GET | /auth/me | Get current authenticated user |
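
To make the token format concrete, the sketch below signs and verifies an HS256 JWT with a roughly 50-hour expiry, as described under Features. It uses only the standard library for illustration; the backend itself is listed as using python-jose, and the function names and the `sub` claim layout here are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

TOKEN_TTL_SECONDS = 50 * 60 * 60  # the ~50-hour expiry mentioned under Features


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(user_id: str, secret: str, now: Optional[int] = None) -> str:
    """Create a compact HS256 JWT with sub, iat and exp claims."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": issued_at, "exp": issued_at + TOKEN_TTL_SECONDS}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: str, now: Optional[int] = None) -> dict:
    """Check signature, algorithm and expiry; return the payload or raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    current = int(time.time()) if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload


token = create_token("user-1", "s3cret", now=1000)
claims = verify_token(token, "s3cret", now=1000)
```

In practice a maintained library such as python-jose should do this work; the sketch only shows what the claims and signature checks amount to.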

PDF Management (/pdfs)

| Method | Path | Description |
| --- | --- | --- |
| POST | /pdfs/process | Register Cloudinary-uploaded PDF; runs Gemini analysis + background index extraction |
| GET | /pdfs/ | List all PDFs for the authenticated user |
| GET | /pdfs/{pdf_id} | Get PDF metadata & index |
| DELETE | /pdfs/{pdf_id} | Delete a PDF |
| POST | /pdfs/{pdf_id}/analyze | Trigger (re-)index extraction in background |
| GET | /pdfs/{pdf_id}/pages | Stream selected page range as a PDF file |
| GET | /pdfs/{pdf_id}/pages/images | Return selected pages as base64 PNG images |
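
A client call to the page-images endpoint could be prepared roughly as follows. This is a sketch under several assumptions: the query parameter names (`start_page`, `end_page`, `zoom`), 1-based page numbering, and the `Bearer` authorization scheme are not confirmed by this README, so check the interactive docs at /docs for the real contract.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_page_images_request(
    base_url: str,
    pdf_id: str,
    start_page: int,
    end_page: int,
    token: str,
    zoom: float = 2.0,
) -> Request:
    """Prepare (but do not send) a GET request for a page range as images."""
    # Assumption: pages are 1-based and the range is inclusive.
    if start_page < 1 or end_page < start_page:
        raise ValueError("page range must satisfy 1 <= start_page <= end_page")
    # Hypothetical parameter names; the real endpoint may differ.
    query = urlencode({"start_page": start_page, "end_page": end_page, "zoom": zoom})
    url = f"{base_url.rstrip('/')}/pdfs/{quote(pdf_id, safe='')}/pages/images?{query}"
    # Assumption: the JWT is sent as a Bearer token.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


request = build_page_images_request(
    "http://localhost:8000", "abc123", 3, 5, token="<jwt>"
)
```

The returned object can be passed to `urllib.request.urlopen` once the backend is running.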

AI Features (/pdfs/{pdf_id}/content, /api/pdfs/...)

| Method | Path | Description |
| --- | --- | --- |
| POST | /pdfs/{pdf_id}/content | Generate read/explain content for a page range |
| POST | /api/pdfs/{pdf_id}/explain | Generate AI explanation + TTS audio (cached in MongoDB) |
| POST | /api/pdfs/{pdf_id}/qa | Text Q&A → Groq Llama answer → TTS audio (base64) |
| POST | /api/pdfs/{pdf_id}/qa/audio | Voice Q&A → Groq Whisper STT → Llama → TTS audio |
| POST | /api/pdfs/{pdf_id}/chat | Chat with PDF pages as context via Gemini |
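
The voice Q&A endpoint chains three stages: speech-to-text, answering, and text-to-speech. The sketch below shows that chain with the three services injected as plain callables. All names (`answer_voice_question`, `QAResult`, and the stub functions) are hypothetical, and the stubs merely imitate what Groq Whisper, Groq Llama, and the TTS provider would return.

```python
import base64
from dataclasses import dataclass
from typing import Callable


@dataclass
class QAResult:
    question: str      # transcript of the spoken question
    answer: str        # LLM answer text
    audio_base64: str  # spoken answer, base64-encoded for JSON transport


def answer_voice_question(
    audio: bytes,
    context: str,
    transcribe: Callable[[bytes], str],
    answer: Callable[[str, str], str],
    synthesize: Callable[[str], bytes],
) -> QAResult:
    """Run audio -> STT -> LLM -> TTS and package the result."""
    question = transcribe(audio).strip()
    if not question:
        raise ValueError("no speech detected in audio")
    reply = answer(question, context)
    speech = synthesize(reply)
    return QAResult(question, reply, base64.b64encode(speech).decode("ascii"))


# Stub services for illustration only.
def fake_stt(audio: bytes) -> str:
    return " What is a neuron? "


def fake_llm(question: str, context: str) -> str:
    return f"Based on the pages: {context}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


result = answer_voice_question(
    b"\x00\x01", "A neuron is a nerve cell.", fake_stt, fake_llm, fake_tts
)
```

Injecting the stages keeps the orchestration independent of any one vendor, which matches the README's primary/fallback TTS arrangement.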

Flashcards

| Method | Path | Description |
| --- | --- | --- |
| POST | /pdfs/{pdf_id}/flashcards | Generate 10 flashcards for a page range/topic via Gemini |
| GET | /pdfs/{pdf_id}/flashcards | List saved flashcard sets |
| GET | /flashcards/{flashcard_id} | Get specific flashcard set |
| DELETE | /flashcards/{flashcard_id} | Delete flashcard set |
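
Because the generator is expected to return exactly 10 cards, model output is worth validating before it is saved. The following sketch assumes the model replies with a JSON array of objects with `question` and `answer` keys; that shape, and the names `Flashcard` and `parse_flashcards`, are illustrative assumptions rather than the project's schema.

```python
import json
from dataclasses import dataclass
from typing import List

EXPECTED_CARDS = 10  # the README states exactly 10 flashcards per topic/section


@dataclass(frozen=True)
class Flashcard:
    question: str
    answer: str


def parse_flashcards(raw: str) -> List[Flashcard]:
    """Parse model output and enforce the expected card count and fields."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of flashcards")

    cards = []
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each flashcard must be a JSON object")
        question = str(item.get("question", "")).strip()
        answer = str(item.get("answer", "")).strip()
        if not question or not answer:
            raise ValueError("each flashcard needs a question and an answer")
        cards.append(Flashcard(question, answer))

    if len(cards) != EXPECTED_CARDS:
        raise ValueError(f"expected {EXPECTED_CARDS} flashcards, got {len(cards)}")
    return cards


sample = json.dumps(
    [{"question": f"Q{i}?", "answer": f"A{i}"} for i in range(1, 11)]
)
cards = parse_flashcards(sample)
```

A caller could retry generation when validation fails instead of storing a short or malformed set.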

Quizzes

| Method | Path | Description |
| --- | --- | --- |
| POST | /pdfs/{pdf_id}/quizzes | Generate multiple-choice quiz for a page range/topic |
| GET | /pdfs/{pdf_id}/quizzes | List quizzes for a PDF |
| GET | /quizzes/{quiz_id} | Get specific quiz |
| DELETE | /quizzes/{quiz_id} | Delete quiz |
| POST | /quizzes/{quiz_id}/submit | Submit quiz attempt (saves score + time) |
| GET | /quizzes/{quiz_id}/attempts | Get all attempts for a quiz |
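
Submitting an attempt stores a score and the time taken. One way to compute both is sketched below; the `QuizAttempt` fields, the percentage scoring, and the representation of answers as option indices are assumptions, not the stored MongoDB schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QuizAttempt:
    correct: int
    total: int
    score_percent: float
    time_taken_seconds: float


def grade_attempt(
    answer_key: List[int],
    selected: Dict[int, int],
    started_at: float,
    submitted_at: float,
) -> QuizAttempt:
    """Grade a multiple-choice attempt.

    answer_key[i] is the correct option index for question i;
    selected maps question index to the chosen option index.
    Unanswered questions count as incorrect.
    """
    if submitted_at < started_at:
        raise ValueError("submission time precedes start time")
    correct = sum(
        1 for index, right in enumerate(answer_key) if selected.get(index) == right
    )
    total = len(answer_key)
    score = round(100.0 * correct / total, 1) if total else 0.0
    return QuizAttempt(correct, total, score, submitted_at - started_at)


attempt = grade_attempt(
    answer_key=[1, 0, 3, 2],
    selected={0: 1, 1: 2, 2: 3},  # question 3 left unanswered
    started_at=5.0,
    submitted_at=95.0,
)
```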

WebSocket

| Path | Description |
| --- | --- |
| WS /ws/explain/{pdf_id} | Real-time streaming explanation with pause/resume/stop and audio input support |
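
The pause/resume/stop controls and sentence-level playback can be modelled as a small state machine that sits behind the socket handler. This sketch leaves out the WebSocket transport entirely; the class name, the control strings (`pause`, `resume`, `stop`), and the naive sentence splitter are all assumptions rather than the protocol used by explain_websocket_service.py.

```python
import re
from enum import Enum
from typing import List, Optional


class PlaybackState(Enum):
    PLAYING = "playing"
    PAUSED = "paused"
    STOPPED = "stopped"


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [part.strip() for part in re.split(r"(?<=[.!?])\s+", text) if part.strip()]


class ExplainSession:
    """Tracks which sentence to send next and whether sending is allowed."""

    def __init__(self, explanation: str) -> None:
        self.sentences = split_sentences(explanation)
        self.position = 0
        self.state = PlaybackState.PLAYING

    def handle_control(self, action: str) -> None:
        if self.state is PlaybackState.STOPPED:
            return  # a stopped session ignores further controls
        if action == "pause":
            self.state = PlaybackState.PAUSED
        elif action == "resume":
            self.state = PlaybackState.PLAYING
        elif action == "stop":
            self.state = PlaybackState.STOPPED
        else:
            raise ValueError(f"unknown control action: {action}")

    def next_sentence(self) -> Optional[str]:
        """Return the next sentence, or None when paused, stopped or finished."""
        if self.state is not PlaybackState.PLAYING:
            return None
        if self.position >= len(self.sentences):
            return None
        sentence = self.sentences[self.position]
        self.position += 1
        return sentence


session = ExplainSession("First point. Second point! Third?")
first = session.next_sentence()
session.handle_control("pause")
while_paused = session.next_sentence()
session.handle_control("resume")
second = session.next_sentence()
```

Keeping the position counter on the server side is what lets a resumed session continue from the sentence where it paused.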

User Workflow

  1. Sign In — Authenticate with Google OAuth
  2. Upload PDF — Upload via Cloudinary widget (max 25 MB)
  3. Auto-Analysis — Gemini extracts title, description, and a structured chapter/section index in the background
  4. Browse Index — Navigate chapters and sections from the generated table of contents
  5. Learn — Choose a section to:
    • Read — View selected pages
    • Listen — Get an AI-generated explanation read aloud (Minimax/Gemini TTS, cached)
    • Ask Questions — Follow up by text or voice; get spoken answers from Groq + TTS
    • Flashcards — Generate 10 Q&A cards for active recall
    • Quiz — Take a multiple-choice quiz with scoring and time tracking
    • Chat — Ask free-form questions about any page range
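
Steps 3 to 5 rely on an index that maps chapters and sections to page ranges, so that choosing an entry yields the pages to read, explain, or quiz on. A possible in-memory shape for that index is sketched here; the class and field names are hypothetical and the stored format in MongoDB may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Section:
    title: str
    start_page: int
    end_page: int


@dataclass
class Chapter:
    title: str
    start_page: int
    end_page: int
    sections: List[Section] = field(default_factory=list)


def find_page_range(index: List[Chapter], title: str) -> Optional[Tuple[int, int]]:
    """Return (start_page, end_page) for a chapter or section title, if found."""
    for chapter in index:
        if chapter.title == title:
            return (chapter.start_page, chapter.end_page)
        for section in chapter.sections:
            if section.title == title:
                return (section.start_page, section.end_page)
    return None


# Example index with made-up titles and page numbers.
index = [
    Chapter(
        "1. Cells",
        start_page=1,
        end_page=20,
        sections=[
            Section("1.1 Membranes", 1, 8),
            Section("1.2 Organelles", 9, 20),
        ],
    ),
    Chapter("2. Genetics", start_page=21, end_page=40),
]
selected_range = find_page_range(index, "1.2 Organelles")
```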

License

MIT License
