TubeMind

TubeMind is a board-based research app for learning from YouTube. Instead of treating each question as a one-off search, it groups related questions into topic-bound boards, pulls transcript evidence from relevant videos, indexes that material with LightRAG, and turns each answer into a reusable note with linked source evidence.

The current app is a server-rendered FastHTML application with a redesigned UI, a persistent light/dark theme toggle, Google OAuth or demo auth, durable SQLite state, and Railway-ready deployment support.

What TubeMind Does

Creates a new board automatically from your first question.
Keeps follow-up questions inside the same topic region instead of starting from scratch each time.
Searches YouTube for caption-friendly, embeddable videos when the current board does not already have enough evidence.
Fetches transcripts, normalizes them, caches them on disk, and indexes them into a per-board LightRAG knowledge base.
Generates note answers backed by transcript chunks, then stores those notes so the board becomes more useful over time.
Lets you open note detail pages with evidence excerpts, linked timestamps, and the original search queries that expanded the board.
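
As an illustration of the linked timestamps mentioned above, a note's evidence excerpt could link back to the moment in the source video. This is a minimal sketch: the function name is hypothetical and is not taken from the TubeMind codebase. It relies only on the standard YouTube watch URL with its `t` parameter.

```python
from urllib.parse import urlencode


def timestamp_link(video_id: str, start_seconds: float) -> str:
    """Build a YouTube watch URL that starts playback at the given offset."""
    # Clamp negatives and drop fractions; the t parameter takes whole seconds.
    seconds = max(0, int(start_seconds))
    query = urlencode({"v": video_id, "t": f"{seconds}s"})
    return f"https://www.youtube.com/watch?{query}"
```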

How It Works

A user asks a question in a board.
TubeMind first queries the existing board corpus.
If the board does not have enough evidence yet, TubeMind plans or falls back to YouTube search queries.
It searches YouTube, preferring caption-friendly, embeddable videos, which are more likely to yield transcripts in hosted environments.
It fetches transcripts through a layered chain, trying each source in order:
- TranscriptAPI
- youtube-transcript-api
- yt-dlp subtitle download fallback
It stores cleaned transcript artifacts under the app data directory and indexes them into that board's LightRAG store.
It answers the question, stores the note, stores the source chunks, and refreshes the board summary over time.
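
The retrieve-then-expand loop in the steps above can be sketched as follows. This is a minimal illustration, not TubeMind's actual code: the function name, the injected `retrieve` and `expand` callables, and the `min_chunks` threshold standing in for "enough evidence" are all assumptions.

```python
from typing import Callable, List


def answer_with_expansion(
    question: str,
    retrieve: Callable[[str], List[str]],
    expand: Callable[[str], None],
    min_chunks: int = 3,  # hypothetical threshold for "enough evidence"
) -> List[str]:
    """Query the board corpus first; expand it from YouTube only if needed."""
    chunks = retrieve(question)
    if len(chunks) < min_chunks:
        # Not enough evidence yet: search, fetch transcripts, index, then retry.
        expand(question)
        chunks = retrieve(question)
    return chunks
```

Passing the retrieval and expansion steps in as callables keeps the sketch independent of LightRAG and the YouTube API, so the control flow can be exercised with in-memory fakes.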

Each board has its own transcript cache and LightRAG working directory, so follow-up notes stay grounded in the same topic instead of polluting a single global corpus.
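
One way to picture the per-board isolation is a helper that derives both directories from the data directory and a board id. The directory names and the `boards/<id>` layout below are assumptions for illustration; the actual layout used by TubeMind is not documented here.

```python
from pathlib import Path
from typing import Dict


def board_paths(data_dir: str, board_id: str) -> Dict[str, Path]:
    """Return hypothetical per-board directories under the app data directory."""
    root = Path(data_dir) / "boards" / board_id  # assumed layout
    return {
        "transcripts": root / "transcripts",  # cached transcript artifacts
        "lightrag": root / "lightrag",        # LightRAG working directory
    }
```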

Stack

FastHTML for the server-rendered app and route layer
HTMX for incremental UI interactions
LightRAG for retrieval and graph-backed indexing
OpenAI API for planning, synthesis, and answer generation
YouTube Data API v3 for video search and metadata
TranscriptAPI plus transcript fallbacks for transcript acquisition
SQLite for durable user, board, note, and evidence metadata
uv for dependency management and running the app

Product Structure

The main files are:

tubemind/routes.py: app factory, routes, theme bootstrap, auth guards, health endpoint
tubemind/ui.py: server-rendered UI builders for login, workspace, note detail, topbar, and theme toggle
tubemind/services.py: board runtime orchestration, YouTube search, transcript fetching, indexing, retrieval, answer generation
tubemind/auth.py: Google OAuth helpers, demo auth, SQLite tables, board persistence
tubemind/config.py: environment loading, app constants, path configuration
static/tubemind.css: full visual system for light and dark themes
tubemind/__main__.py: python -m tubemind entrypoint

Local Development

Prerequisites

Python 3.12+
uv
OpenAI API key
YouTube Data API key
TranscriptAPI key
Google OAuth credentials if you want Google login locally

Environment Variables

Create a .env file in the repo root. At minimum, local development needs:

OPENAI_API_KEY=your_openai_key
OPENAI_MODEL=gpt-4.1-nano
YOUTUBE_API_KEY=your_youtube_api_key
TRANSCRIPTAPI_API_KEY=your_transcriptapi_key
BASE_URL=http://localhost:5001
SESSION_SECRET=any-long-random-string

If you want Google OAuth locally, also set:

GOOGLE_CLIENT_ID=your_google_client_id
GOOGLE_CLIENT_SECRET=your_google_client_secret

Optional variables:

DEMO_AUTH_ENABLED=false
DEMO_USER_ID=demo-user
DEMO_USER_NAME=Coursework Demo
DEMO_USER_EMAIL=demo@tubemind.local
DEMO_USER_PICTURE=
TUBEMIND_DATA_DIR=.local
YOUTUBE_TRANSCRIPT_COOKIES_FILE=
YOUTUBE_COOKIES_BROWSER=
PORT=5001
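
For orientation, reading these variables could look roughly like the sketch below. It is not the contents of tubemind/config.py: the `load_settings` name, the choice of which variables to treat as required, and the fallback defaults (copied from the example values above) are assumptions.

```python
import os
from typing import Mapping, Optional

# Assumed to be required; the README lists them under the local minimum.
REQUIRED = (
    "OPENAI_API_KEY",
    "YOUTUBE_API_KEY",
    "TRANSCRIPTAPI_API_KEY",
    "SESSION_SECRET",
)


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Validate required variables and apply assumed defaults to the rest."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    return {
        "openai_model": env.get("OPENAI_MODEL", "gpt-4.1-nano"),
        "base_url": env.get("BASE_URL", "http://localhost:5001"),
        "data_dir": env.get("TUBEMIND_DATA_DIR", ".local"),
        "port": int(env.get("PORT", "5001")),
        "demo_auth": env.get("DEMO_AUTH_ENABLED", "false").strip().lower() == "true",
    }
```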

Run Locally

cd TubeMind
UV_CACHE_DIR=.local/uv-cache uv sync
UV_CACHE_DIR=.local/uv-cache uv run python -m tubemind

Open:

http://127.0.0.1:5001

Stop the server with Ctrl+C.

Authentication Modes

TubeMind supports two sign-in modes:

Google OAuth:
- best for normal usage
- requires GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, and a matching BASE_URL
Demo auth:
- best for coursework demos or simpler hosted deployments
- enable with DEMO_AUTH_ENABLED=true
- creates a synthetic local user session without Google sign-in

If both are configured, the login page shows both options.
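
The rule above can be expressed as a small function that inspects the environment. This is a sketch under assumptions: `available_auth_modes` is a hypothetical name, and treating Google OAuth as configured only when both credentials are set is an inference from the variables listed, not confirmed behavior.

```python
from typing import List, Mapping


def available_auth_modes(env: Mapping[str, str]) -> List[str]:
    """List the sign-in options the login page would offer for this config."""
    modes = []
    # Assumption: Google OAuth needs both the client id and the secret.
    if env.get("GOOGLE_CLIENT_ID") and env.get("GOOGLE_CLIENT_SECRET"):
        modes.append("google")
    if env.get("DEMO_AUTH_ENABLED", "false").strip().lower() == "true":
        modes.append("demo")
    return modes
```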

Data Model

TubeMind stores two kinds of state:

SQLite app state:
- users
- boards
- notes
- board search queries
- note evidence chunks
- indexed video metadata
Board filesystem state:
- transcript artifacts
- per-board LightRAG working directories

Both kinds of state live under the directory set by TUBEMIND_DATA_DIR. In production, mount that directory on persistent storage.
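
To make the SQLite side more concrete, here is a minimal schema sketch covering four of the entities listed above. Every table and column name is hypothetical; the real tables are defined in tubemind/auth.py and may differ.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE users (
    id TEXT PRIMARY KEY,
    name TEXT,
    email TEXT
);
CREATE TABLE boards (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL REFERENCES users(id),
    title TEXT
);
CREATE TABLE notes (
    id INTEGER PRIMARY KEY,
    board_id TEXT NOT NULL REFERENCES boards(id),
    question TEXT,
    answer TEXT
);
CREATE TABLE note_evidence (
    id INTEGER PRIMARY KEY,
    note_id INTEGER NOT NULL REFERENCES notes(id),
    video_id TEXT,
    start_seconds REAL,
    excerpt TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the illustrative tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```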

Railway Deployment

Railway is the recommended hosted path for this repo.

Recommended Setup

Push the repo to GitHub.
Create a Railway service from the repo.
Add a volume to the same service.
Mount the volume at:

/data/tubemind

Set:

TUBEMIND_DATA_DIR=/data/tubemind

That path is correct for the current production setup.

Required Railway Variables

OPENAI_API_KEY=...
OPENAI_MODEL=gpt-4.1-nano
YOUTUBE_API_KEY=...
TRANSCRIPTAPI_API_KEY=...
BASE_URL=https://your-service.up.railway.app
SESSION_SECRET=choose-a-long-random-string
TUBEMIND_DATA_DIR=/data/tubemind

Choose One Auth Mode

For Google OAuth:

GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
DEMO_AUTH_ENABLED=false

For coursework/demo mode:

DEMO_AUTH_ENABLED=true

Optional Hosted Transcript Variable

If yt-dlp needs cookies to get around YouTube bot checks on hosted infrastructure, add:

YOUTUBE_TRANSCRIPT_COOKIES_FILE=/data/tubemind/youtube-cookies.txt

Do not commit youtube-cookies.txt to GitHub. Upload it only to the mounted Railway volume.

Health Check

The app exposes:

/health

It returns:

{"ok": true}
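
A deployment script or monitor could poll this endpoint with nothing more than the standard library. The helper names below are hypothetical; only the /health path and the response body come from this README.

```python
import json
from urllib.request import urlopen


def parse_health(body: bytes) -> bool:
    """Return True only when the body is a JSON object with "ok" set to true."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Request <base_url>/health and report whether the app answered ok."""
    try:
        with urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as resp:
            return parse_health(resp.read())
    except OSError:  # connection refused, timeout, HTTP error status
        return False
```

For a local run, `check_health("http://127.0.0.1:5001")` would be the natural call.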

Deploy Behavior

The app reads Railway's injected PORT automatically.
The Docker image runs python -m tubemind.
The stylesheet URL is cache-busted so browsers pick up CSS changes after a deploy instead of serving a stale cached copy.
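
Cache busting is commonly done by appending a version token to the asset URL. The sketch below derives that token from a hash of the file contents; this is one common technique, not necessarily the one TubeMind uses, and the function name and `v` query parameter are assumptions.

```python
import hashlib


def cache_busted_url(url: str, content: bytes) -> str:
    """Append a short content hash so the URL changes whenever the file does."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{url}?v={digest}"
```

Because the token depends only on the file contents, unchanged CSS keeps the same URL across deploys and stays cached.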

Transcript Pipeline

Transcript fetching is intentionally layered because transcript requests from hosted infrastructure fail more often than requests from a local machine, for example when they trigger YouTube bot checks.

Primary path:

TranscriptAPI using the YouTube-specific transcript endpoint

Fallbacks:

youtube-transcript-api
yt-dlp subtitle download

TubeMind also prefers caption-friendly and embeddable YouTube search results to improve transcript success on Railway.
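
The layered chain described above amounts to trying each source in order and returning the first non-empty transcript. The sketch below shows that pattern with injected fetcher callables. It deliberately does not call TranscriptAPI, youtube-transcript-api, or yt-dlp, and the names `fetch_transcript` and `Fetcher` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A fetcher takes a video id and returns transcript text, or None if it has none.
Fetcher = Callable[[str], Optional[str]]


def fetch_transcript(
    video_id: str, fetchers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each (name, fetcher) pair in order; return (source name, text)."""
    errors: List[str] = []
    for name, fetch in fetchers:
        try:
            text = fetch(video_id)
        except Exception as exc:  # any failure just moves on to the next source
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty transcript")
    raise LookupError(f"no transcript for {video_id}: " + "; ".join(errors))
```

Collecting the per-source errors makes it easier to see why a video failed on hosted infrastructure when all three sources come up empty.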

Security Notes

Never commit youtube-cookies.txt.
Rotate any API keys or OAuth secrets that were accidentally exposed.
