A public record explorer for the Jeffrey Epstein case files released by the U.S. Department of Justice. Browse, search, and analyze documents across 12 data sets including court filings, depositions, FBI reports, flight logs, financial records, and more.
Live at epstein-file-explorer.com
- Document Browser — Paginated, filterable view of all documents with PDF/image/video viewers, redaction status, AI-generated summaries, and keyboard navigation (prev/next)
- Document Comparison — Side-by-side document comparison view
- Full-Text Page Search — Search across 3.5M+ extracted document pages with highlighted snippets and direct page links
- People Directory — 200+ named individuals with categories (key figures, associates, victims, witnesses, legal, political), document counts, and connection counts
- Person Profiles — Card-based overview with AI-generated summaries, background sections, key facts, top contacts, email counts, and linked timeline events
- Wikipedia Integration — Automated person data enrichment from Wikipedia, including profile photos displayed in the network graph
- Network Graph — Interactive D3 force-directed graph visualizing connections between persons, with category/connection-type filtering, time range slider, keyword search, and Wikipedia profile photos
- Timeline — 5,400+ chronological events with significance scoring, linked to people and documents
- Cross-Entity Search — Search across documents, people, and events with saved searches, search history, and bookmarks
- AI Insights — DeepSeek-powered analysis extracting persons, connections, events, locations, key facts, and document classifications from extracted text
- Export — JSON and CSV export for documents, persons, and search results
- Dark/Light Theme — Full theme support with system preference detection
| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript, Tailwind CSS, shadcn/ui, Radix UI, D3.js, Recharts, Framer Motion |
| Routing | Wouter |
| Data | TanStack React Query |
| Backend | Express 5, TypeScript, Drizzle ORM, Zod |
| Database | PostgreSQL (with full-text search indexes) |
| Storage | Cloudflare R2 (documents), local filesystem (staging) |
| AI | DeepSeek API (document analysis, person classification) |
| Deployment | Fly.io (US East), Docker multi-stage build |
- Data origin: U.S. Department of Justice — official public releases of case files across 12 data sets
- Distribution: yung-megafone/Epstein-Files — community archive preserving publicly released materials via torrents after DOJ removed several data sets (9, 10, 11) from their site in February 2026
- Person data: Wikipedia — scraped and enriched with AI classification for categories, roles, and occupations
All data in this project comes from publicly released government records.
- Node.js 20+
- PostgreSQL
- aria2 (for torrent downloads):
brew install aria2
# Install dependencies
npm install
# Configure environment
cp .env.example .env
# Edit .env with your PostgreSQL connection string, R2 credentials, DeepSeek API key
# Push database schema
npm run db:push
# Start development server
npm run devThe app runs on port 3000 in development (port 5000 in production).
| Variable | Description |
|---|---|
DATABASE_URL |
PostgreSQL connection string |
R2_ACCOUNT_ID |
Cloudflare R2 account ID |
R2_ACCESS_KEY_ID |
R2 access key |
R2_SECRET_ACCESS_KEY |
R2 secret key |
R2_BUCKET_NAME |
R2 bucket name |
R2_PUBLIC_URL |
R2 public URL for serving documents |
DEEPSEEK_API_KEY |
DeepSeek API key for AI analysis |
The data pipeline handles downloading, processing, and analyzing documents:
scrape-wikipedia → download-torrent → import-downloads → upload-r2 → process →
classify-media → analyze-ai → load-persons → load-documents →
load-ai-results → extract-connections → update-counts → dedup-persons
# Run a specific stage
npx tsx scripts/pipeline/run-pipeline.ts <stage>
# Download specific data sets via torrent
npx tsx scripts/pipeline/run-pipeline.ts download-torrent --data-sets 1,3,8
# Run all stages
npx tsx scripts/pipeline/run-pipeline.ts all| Stage | Description |
|---|---|
scrape-wikipedia |
Scrapes Wikipedia for person list, enriches with DeepSeek AI classification |
download-torrent |
Downloads data sets via BitTorrent (aria2c), extracts archives |
import-downloads |
Imports downloaded files into the database |
upload-r2 |
Uploads documents to Cloudflare R2 storage |
process |
Extracts text from PDFs using pdf.js |
classify-media |
Classifies documents by media type, assigns AI analysis priority (1-5) |
analyze-ai |
Two-tier AI analysis: rule-based (free) + DeepSeek API with budget tracking |
load-persons |
Loads persons from Wikipedia scrape into database |
load-documents |
Loads document metadata from DOJ catalog |
load-ai-results |
Upserts AI analysis results into database (persons, connections, events) |
extract-connections |
Extracts relationships between persons from descriptions |
update-counts |
Recalculates document and connection counts |
dedup-persons |
Merges duplicate person records with fuzzy matching |
| DS | Description | Size | Status |
|---|---|---|---|
| 1-8 | Court documents, legal filings | 1-10 GB | Available via DOJ + torrent |
| 9 | Communications, emails, media | ~143 GB | DOJ offline — torrent only |
| 10 | Visual media (180K+ images, 2K+ videos) | ~79 GB | DOJ offline — torrent only |
| 11 | Financial ledgers, flight manifests | ~28 GB | DOJ offline — torrent only |
| 12 | Court documents | 114 MB | Available via DOJ + torrent |
GET /api/documents— Paginated list with server-side filtering (search, type, dataSet, redacted, mediaType)GET /api/documents/:id— Document detail with associated persons and timeline eventsGET /api/documents/:id/adjacent— Previous/next document IDs for navigationGET /api/documents/:id/pdf— PDF proxy (streams from R2 or local, handles CORS)GET /api/documents/:id/image— Image proxy/redirectGET /api/documents/:id/video— Video proxy with Range request supportGET /api/documents/:id/content-url— Presigned R2 URLGET /api/documents/filters— Available filter options
GET /api/persons— List all persons (with optional pagination)GET /api/persons/:id— Person detail with documents, connections, timeline events, and AI mentions
GET /api/search— Cross-entity search (persons, documents, events)GET /api/search/pages— Full-text page search with headline snippets
GET /api/network— Network graph data (persons + connections with year ranges)GET /api/timeline— Timeline events with significance scoring
GET /api/ai-analyses— List all AI analyses (with optional pagination)GET /api/ai-analyses/aggregate— Aggregate AI analysis statistics (database-driven)GET /api/ai-analyses/:fileName— Individual AI analysis detail
GET /api/bookmarks— User bookmarksPOST /api/bookmarks— Create bookmark (person/document/search)DELETE /api/bookmarks/:id— Delete bookmark
GET /api/export/persons— Export persons (JSON/CSV)GET /api/export/documents— Export documents (JSON/CSV)GET /api/export/search— Export search results (JSON/CSV)
GET /api/stats— Dashboard statisticsGET /api/sidebar-counts— Sidebar navigation countsGET /api/pipeline/jobs— Pipeline job statusGET /api/pipeline/stats— Pipeline statisticsGET /api/budget— AI cost budget summary
9 primary tables managed with Drizzle ORM:
| Table | Description |
|---|---|
persons |
Named individuals with categories, aliases, Wikipedia data, profile sections (JSONB), top contacts |
documents |
Documents with metadata, processing status, R2 storage keys, AI analysis status |
document_pages |
Extracted page content with full-text search indexes |
connections |
Relationships between persons with type, strength (1-5), and source documents |
person_documents |
Join table linking persons to documents with context and mention type |
timeline_events |
Chronological events with significance scoring, linked to person and document IDs |
pipeline_jobs |
Pipeline task tracking with retry logic |
budget_tracking |
AI analysis cost tracking per document/job |
bookmarks |
User bookmarks for persons, documents, and searches |
client/src/
pages/ # 12 route pages
dashboard.tsx # Home with overview stats
documents.tsx # Paginated document browser
document-detail.tsx # Document viewer (PDF/image/video)
document-compare.tsx # Side-by-side comparison
people.tsx # People directory
person-detail.tsx # Person profile (Overview, Documents, Connections, Timeline tabs)
network.tsx # D3 force-directed network graph
timeline.tsx # Chronological event viewer
search.tsx # Cross-entity + full-text page search
ai-insights.tsx # AI analysis dashboard
components/
network-graph.tsx # D3 force simulation with zoom/pan/search
pdf-viewer.tsx # PDF renderer (pdf.js)
timeline-viz.tsx # Timeline visualization
person-hover-card.tsx # Quick person info popover
app-sidebar.tsx # Main navigation
export-button.tsx # CSV/JSON export
hooks/
use-bookmarks.ts # Bookmark management
use-keyboard-shortcuts.ts # Global keyboard shortcuts
use-search-history.ts # Local search history
use-url-filters.ts # URL-synced filter state
components/ui/ # 67 shadcn/ui components
server/
routes.ts # 30+ API endpoints
storage.ts # Database queries (Drizzle ORM)
db.ts # PostgreSQL connection pool
shared/
schema.ts # Drizzle schema (9 tables) + Zod validation + TypeScript types
scripts/pipeline/
run-pipeline.ts # Pipeline orchestrator (13 stages)
wikipedia-scraper.ts # Wikipedia person list scraper
torrent-downloader.ts # BitTorrent download via aria2c
ai-analyzer.ts # Two-tier AI analysis (rule-based + DeepSeek)
db-loader.ts # Database loading operations
pdf-processor.ts # PDF text extraction
media-classifier.ts # Media type classification
r2-migration.ts # R2 storage upload
load-pages.ts # Document page content loader
generate-profiles.ts # Person profile generation
doj-scraper.ts # DOJ catalog scraper
data/ # Local data (gitignored)
downloads/ # Downloaded documents by data set
ai-analyzed/ # AI analysis JSON results
Deployed on Fly.io with Docker multi-stage build:
- Region:
iad(US East — Virginia) - VM: 2 shared CPUs, 1 GB RAM
- Auto-scaling: Suspends when idle, auto-starts on requests
- Concurrency: 25 soft / 50 hard request limit
- HTTPS: Forced
# Deploy
fly deploy
# View logs
fly logsMIT