Epstein File Explorer

A public record explorer for the Jeffrey Epstein case files released by the U.S. Department of Justice. Browse, search, and analyze documents across 12 data sets including court filings, depositions, FBI reports, flight logs, financial records, and more.

Live at epstein-file-explorer.com

Features

Document Browser — Paginated, filterable view of all documents with PDF/image/video viewers, redaction status, AI-generated summaries, and keyboard navigation (prev/next)
Document Comparison — Side-by-side document comparison view
Full-Text Page Search — Search across 3.5M+ extracted document pages with highlighted snippets and direct page links
People Directory — 200+ named individuals with categories (key figures, associates, victims, witnesses, legal, political), document counts, and connection counts
Person Profiles — Card-based overview with AI-generated summaries, background sections, key facts, top contacts, email counts, and linked timeline events
Wikipedia Integration — Automated person data enrichment from Wikipedia, including profile photos displayed in the network graph
Network Graph — Interactive D3 force-directed graph visualizing connections between persons, with category/connection-type filtering, time range slider, keyword search, and Wikipedia profile photos
Timeline — 5,400+ chronological events with significance scoring, linked to people and documents
Cross-Entity Search — Search across documents, people, and events with saved searches, search history, and bookmarks
AI Insights — DeepSeek-powered analysis extracting persons, connections, events, locations, key facts, and document classifications from extracted text
Export — JSON and CSV export for documents, persons, and search results
Dark/Light Theme — Full theme support with system preference detection

Tech Stack

Layer	Technology
Frontend	React 18, TypeScript, Tailwind CSS, shadcn/ui, Radix UI, D3.js, Recharts, Framer Motion
Routing	Wouter
Data	TanStack React Query
Backend	Express 5, TypeScript, Drizzle ORM, Zod
Database	PostgreSQL (with full-text search indexes)
Storage	Cloudflare R2 (documents), local filesystem (staging)
AI	DeepSeek API (document analysis, person classification)
Deployment	Fly.io (US East), Docker multi-stage build

Data Sources

Data origin: U.S. Department of Justice — official public releases of case files across 12 data sets
Distribution: yung-megafone/Epstein-Files — community archive preserving publicly released materials via torrents after DOJ removed several data sets (9, 10, 11) from their site in February 2026
Person data: Wikipedia — scraped and enriched with AI classification for categories, roles, and occupations

All data in this project comes from publicly released government records.

Getting Started

Prerequisites

Node.js 20+
PostgreSQL
aria2 (for torrent downloads): brew install aria2

Setup

# Install dependencies
npm install

# Configure environment
cp .env.example .env
# Edit .env with your PostgreSQL connection string, R2 credentials, DeepSeek API key

# Push database schema
npm run db:push

# Start development server
npm run dev

The app runs on port 3000 in development (port 5000 in production).

Environment Variables

Variable	Description
`DATABASE_URL`	PostgreSQL connection string
`R2_ACCOUNT_ID`	Cloudflare R2 account ID
`R2_ACCESS_KEY_ID`	R2 access key
`R2_SECRET_ACCESS_KEY`	R2 secret key
`R2_BUCKET_NAME`	R2 bucket name
`R2_PUBLIC_URL`	R2 public URL for serving documents
`DEEPSEEK_API_KEY`	DeepSeek API key for AI analysis

Pipeline

The data pipeline handles downloading, processing, and analyzing documents:

scrape-wikipedia → download-torrent → import-downloads → upload-r2 → process →
classify-media → analyze-ai → load-persons → load-documents →
load-ai-results → extract-connections → update-counts → dedup-persons

Running Pipeline Stages

# Run a specific stage
npx tsx scripts/pipeline/run-pipeline.ts <stage>

# Download specific data sets via torrent
npx tsx scripts/pipeline/run-pipeline.ts download-torrent --data-sets 1,3,8

# Run all stages
npx tsx scripts/pipeline/run-pipeline.ts all

Pipeline Stages

Stage	Description
`scrape-wikipedia`	Scrapes Wikipedia for person list, enriches with DeepSeek AI classification
`download-torrent`	Downloads data sets via BitTorrent (aria2c), extracts archives
`import-downloads`	Imports downloaded files into the database
`upload-r2`	Uploads documents to Cloudflare R2 storage
`process`	Extracts text from PDFs using pdf.js
`classify-media`	Classifies documents by media type, assigns AI analysis priority (1-5)
`analyze-ai`	Two-tier AI analysis: rule-based (free) + DeepSeek API with budget tracking
`load-persons`	Loads persons from Wikipedia scrape into database
`load-documents`	Loads document metadata from DOJ catalog
`load-ai-results`	Upserts AI analysis results into database (persons, connections, events)
`extract-connections`	Extracts relationships between persons from descriptions
`update-counts`	Recalculates document and connection counts
`dedup-persons`	Merges duplicate person records with fuzzy matching

Data Sets

DS	Description	Size	Status
1-8	Court documents, legal filings	1-10 GB	Available via DOJ + torrent
9	Communications, emails, media	~143 GB	DOJ offline — torrent only
10	Visual media (180K+ images, 2K+ videos)	~79 GB	DOJ offline — torrent only
11	Financial ledgers, flight manifests	~28 GB	DOJ offline — torrent only
12	Court documents	114 MB	Available via DOJ + torrent

API

Documents

GET /api/documents — Paginated list with server-side filtering (search, type, dataSet, redacted, mediaType)
GET /api/documents/:id — Document detail with associated persons and timeline events
GET /api/documents/:id/adjacent — Previous/next document IDs for navigation
GET /api/documents/:id/pdf — PDF proxy (streams from R2 or local, handles CORS)
GET /api/documents/:id/image — Image proxy/redirect
GET /api/documents/:id/video — Video proxy with Range request support
GET /api/documents/:id/content-url — Presigned R2 URL
GET /api/documents/filters — Available filter options

Persons

GET /api/persons — List all persons (with optional pagination)
GET /api/persons/:id — Person detail with documents, connections, timeline events, and AI mentions

Search

GET /api/search — Cross-entity search (persons, documents, events)
GET /api/search/pages — Full-text page search with headline snippets

Network & Timeline

GET /api/network — Network graph data (persons + connections with year ranges)
GET /api/timeline — Timeline events with significance scoring

AI Analysis

GET /api/ai-analyses — List all AI analyses (with optional pagination)
GET /api/ai-analyses/aggregate — Aggregate AI analysis statistics (database-driven)
GET /api/ai-analyses/:fileName — Individual AI analysis detail

Bookmarks

GET /api/bookmarks — User bookmarks
POST /api/bookmarks — Create bookmark (person/document/search)
DELETE /api/bookmarks/:id — Delete bookmark

Export

GET /api/export/persons — Export persons (JSON/CSV)
GET /api/export/documents — Export documents (JSON/CSV)
GET /api/export/search — Export search results (JSON/CSV)

Stats & Pipeline

GET /api/stats — Dashboard statistics
GET /api/sidebar-counts — Sidebar navigation counts
GET /api/pipeline/jobs — Pipeline job status
GET /api/pipeline/stats — Pipeline statistics
GET /api/budget — AI cost budget summary

Database Schema

9 primary tables managed with Drizzle ORM:

Table	Description
`persons`	Named individuals with categories, aliases, Wikipedia data, profile sections (JSONB), top contacts
`documents`	Documents with metadata, processing status, R2 storage keys, AI analysis status
`document_pages`	Extracted page content with full-text search indexes
`connections`	Relationships between persons with type, strength (1-5), and source documents
`person_documents`	Join table linking persons to documents with context and mention type
`timeline_events`	Chronological events with significance scoring, linked to person and document IDs
`pipeline_jobs`	Pipeline task tracking with retry logic
`budget_tracking`	AI analysis cost tracking per document/job
`bookmarks`	User bookmarks for persons, documents, and searches

Project Structure

client/src/
  pages/              # 12 route pages
    dashboard.tsx      # Home with overview stats
    documents.tsx      # Paginated document browser
    document-detail.tsx # Document viewer (PDF/image/video)
    document-compare.tsx # Side-by-side comparison
    people.tsx         # People directory
    person-detail.tsx  # Person profile (Overview, Documents, Connections, Timeline tabs)
    network.tsx        # D3 force-directed network graph
    timeline.tsx       # Chronological event viewer
    search.tsx         # Cross-entity + full-text page search
    ai-insights.tsx    # AI analysis dashboard
  components/
    network-graph.tsx  # D3 force simulation with zoom/pan/search
    pdf-viewer.tsx     # PDF renderer (pdf.js)
    timeline-viz.tsx   # Timeline visualization
    person-hover-card.tsx # Quick person info popover
    app-sidebar.tsx    # Main navigation
    export-button.tsx  # CSV/JSON export
  hooks/
    use-bookmarks.ts   # Bookmark management
    use-keyboard-shortcuts.ts # Global keyboard shortcuts
    use-search-history.ts # Local search history
    use-url-filters.ts # URL-synced filter state
  components/ui/       # 67 shadcn/ui components
server/
  routes.ts            # 30+ API endpoints
  storage.ts           # Database queries (Drizzle ORM)
  db.ts                # PostgreSQL connection pool
shared/
  schema.ts            # Drizzle schema (9 tables) + Zod validation + TypeScript types
scripts/pipeline/
  run-pipeline.ts      # Pipeline orchestrator (13 stages)
  wikipedia-scraper.ts # Wikipedia person list scraper
  torrent-downloader.ts # BitTorrent download via aria2c
  ai-analyzer.ts       # Two-tier AI analysis (rule-based + DeepSeek)
  db-loader.ts         # Database loading operations
  pdf-processor.ts     # PDF text extraction
  media-classifier.ts  # Media type classification
  r2-migration.ts      # R2 storage upload
  load-pages.ts        # Document page content loader
  generate-profiles.ts # Person profile generation
  doj-scraper.ts       # DOJ catalog scraper
data/                  # Local data (gitignored)
  downloads/           # Downloaded documents by data set
  ai-analyzed/         # AI analysis JSON results

Deployment

Deployed on Fly.io with Docker multi-stage build:

Region: iad (US East — Virginia)
VM: 2 shared CPUs, 1 GB RAM
Auto-scaling: Suspends when idle, auto-starts on requests
Concurrency: 25 soft / 50 hard request limit
HTTPS: Forced

# Deploy
fly deploy

# View logs
fly logs

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 255 Commits
.claude		.claude
.github/workflows		.github/workflows
attached_assets		attached_assets
client		client
data		data
e2e		e2e
script		script
scripts		scripts
server		server
shared		shared
test		test
tmp		tmp
typesense		typesense
.dockerignore		.dockerignore
.gitignore		.gitignore
.tool-versions		.tool-versions
Dockerfile		Dockerfile
README.md		README.md
components.json		components.json
drizzle.config.ts		drizzle.config.ts
fly.toml		fly.toml
generated-icon.png		generated-icon.png
package-lock.json		package-lock.json
package.json		package.json
playwright.config.ts		playwright.config.ts
postcss.config.js		postcss.config.js
tailwind.config.ts		tailwind.config.ts
tsconfig.json		tsconfig.json
vite.config.ts		vite.config.ts
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Epstein File Explorer

Features

Tech Stack

Data Sources

Getting Started

Prerequisites

Setup

Environment Variables

Pipeline

Running Pipeline Stages

Pipeline Stages

Data Sets

API

Documents

Persons

Search

Network & Timeline

AI Analysis

Bookmarks

Export

Stats & Pipeline

Database Schema

Project Structure

Deployment

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Epstein File Explorer

Features

Tech Stack

Data Sources

Getting Started

Prerequisites

Setup

Environment Variables

Pipeline

Running Pipeline Stages

Pipeline Stages

Data Sets

API

Documents

Persons

Search

Network & Timeline

AI Analysis

Bookmarks

Export

Stats & Pipeline

Database Schema

Project Structure

Deployment

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages