VideoAnnotator is a modern video analysis toolkit that uses AI models for comprehensive behavioral annotation. This guide covers installation using our modern uv-based workflow for fast, reliable dependency management.
- Python 3.12+ (required)
- Git for cloning repositories
- uv package manager (fast, modern Python dependency management)
- CUDA Toolkit 12.4+ (recommended for GPU acceleration)
- NVIDIA GPU with CUDA support (GTX 1060 6GB+ or better recommended)
Minimum requirements:
- CPU: Intel Core i5 (8th gen) or AMD Ryzen 5 (2000 series) or better
- RAM: 16GB
- Storage: 20GB free space
- GPU: NVIDIA GTX 1060 6GB (for CUDA acceleration)
- OS: Windows 10/11, Ubuntu 20.04+, or macOS 11+
Recommended:
- CPU: Intel Core i7/i9 (11th gen+) or AMD Ryzen 7/9 (5000 series+)
- RAM: 32GB or more
- Storage: 50GB+ SSD
- GPU: NVIDIA RTX 3060 12GB or better
- OS: Windows 11, Ubuntu 22.04+, or macOS 12+
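To sanity-check a machine against these requirements, a minimal standard-library sketch can report the basics (Python version, OS, free disk). GPU and RAM checks need extra tooling such as `nvidia-smi` or `psutil`, so they are omitted here:

```python
import platform
import shutil
import sys

def check_requirements(min_free_gb: float = 20.0) -> dict:
    """Report basic facts relevant to the minimum requirements above."""
    free_bytes = shutil.disk_usage(".").free
    return {
        "python_ok": sys.version_info >= (3, 12),   # Python 3.12+ is required
        "os": platform.system(),                    # Windows / Linux / Darwin
        "disk_ok": free_bytes >= min_free_gb * 10**9,  # 20GB free space minimum
    }

print(check_requirements())
```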
These tips address the most common macOS installation issues.
- Architecture: Apple Silicon (M1/M2/M3) is supported in CPU mode. CUDA GPU acceleration is not available on macOS.
- Prerequisites (Homebrew):

  ```bash
  # Install Homebrew if missing: https://brew.sh
  brew install libomp ffmpeg node
  ```

  - libomp: fixes OpenMP errors and segfaults in some audio models.
  - ffmpeg: required by whisper/librosa/audio pipelines.
  - node: needed if you use the companion web viewer.
- uv in PATH:

  ```bash
  # If 'uv' is not found after install
  export PATH="$HOME/.local/bin:$PATH"
  ```
- Shell config permissions: If `~/.zshrc` or `~/.config` are owned by root (from earlier sudo usage), fix ownership before writing PATH updates:

  ```bash
  sudo chown -R "$USER" ~/.zshrc ~/.config
  chmod u+w ~/.zshrc
  # If .zshrc is missing, write PATH to ~/.zprofile instead for login shells
  ```
- Hugging Face token (for speaker diarization): set `HF_AUTH_TOKEN` in a `.env` file or your shell. See the Environment Setup guide.
- Viewer start command: inside the viewer repo, use `npm run dev` (not `npm start`).
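If you would rather not depend on a dotenv library, parsing a `.env` file by hand is straightforward. A minimal sketch; the token value below is a placeholder for illustration only:

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Minimal .env parser: KEY=VALUE lines; '#' comments and blanks ignored."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# Placeholder token for illustration -- use your real token from huggingface.co
secrets = parse_dotenv("# VideoAnnotator secrets\nHF_AUTH_TOKEN=hf_example_token\n")
print(secrets["HF_AUTH_TOKEN"])  # -> hf_example_token
```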
```bash
# Install uv (fast, modern Python package manager)
# Linux/Mac:
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell):
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

```bash
# Clone the repository
git clone https://github.com/your-org/VideoAnnotator.git
cd VideoAnnotator

# Install all dependencies with uv (fast and reliable)
uv sync

# Install development dependencies
uv sync --extra dev

# Initialize the local SQLite database (creates tables + admin API key)
uv run videoannotator setup-db --admin-email you@example.com --admin-username you
```

The `setup-db` command is idempotent. Re-run it after pulling new schema changes, or use `--force` when you want to drop and recreate tables. Pass `--skip-admin` if you prefer to manage API keys yourself later with `videoannotator generate-token`.
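To confirm `setup-db` actually created the schema, you can list the tables in the SQLite file directly. A sketch; the database filename here is an assumption, so check your configuration for the real path:

```python
import sqlite3
from pathlib import Path

# Hypothetical path: check your config for the actual database location.
db_path = Path("videoannotator.db")

def list_tables(path: Path) -> list[str]:
    """Return table names from a SQLite file; empty list if the file is absent."""
    if not path.exists():
        return []
    with sqlite3.connect(path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [r[0] for r in rows]

print(list_tables(db_path))  # [] until setup-db has been run
```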
```bash
# Note: This repo pins Torch sources via `pyproject.toml` to the CUDA 12.4 wheel index.
# In most cases `uv sync` is sufficient.
# If you need to force a reinstall of CUDA wheels in your local environment:
uv pip install --upgrade \
  "torch==2.8.*+cu124" "torchvision==0.21.*+cu124" "torchaudio==2.8.*+cu124" \
  --index-url https://download.pytorch.org/whl/cu124
```

Some dependencies, like dlib, require system-level installation:

```bash
# If using conda for native dependencies:
conda install -n videoannotator cmake dlib -c conda-forge
# Otherwise, install cmake system-wide from cmake.org
```

OpenFace 3.0 requires manual compilation. Follow these steps:
```bash
# Install additional dependencies
conda install -c conda-forge boost
conda install -c conda-forge openblas

# Clone OpenFace 3.0
git clone https://github.com/CMU-MultiComp-Lab/OpenFace-3.0.git
cd OpenFace-3.0

# Build OpenFace
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=Release ..
cmake --build . --config Release

# Install Python bindings
cd ../python
python setup.py install
```

On Ubuntu/Debian, the equivalent with system packages:

```bash
# Install system dependencies (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install build-essential cmake pkg-config
sudo apt-get install libopenblas-dev liblapack-dev
sudo apt-get install libboost-all-dev

# Clone and build OpenFace 3.0
git clone https://github.com/CMU-MultiComp-Lab/OpenFace-3.0.git
cd OpenFace-3.0
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=Release ..
make -j4

# Install Python bindings
cd ../python
python setup.py install
```

Verify that the core stack imports cleanly:

```bash
# Test the installation
uv run python -c "
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'CUDA version: {torch.version.cuda}')
    print(f'GPU device: {torch.cuda.get_device_name(0)}')

# Test YOLO11 installation
from ultralytics import YOLO
print('YOLO11 available!')

# Test other key components
import cv2, transformers, scenedetect
print('All core dependencies installed!')
"
```
```bash
# Test the API server
uv run videoannotator server --host 0.0.0.0 --port 18011
# Should start the server on http://localhost:18011
```

Once installed, use these commands for development:
```bash
# Start API server
uv run videoannotator server --host 0.0.0.0 --port 18011

# Run linting and formatting
uv run ruff check .
uv run ruff format .

# Run type checking
uv run mypy src

# Run tests
uv run pytest

# CLI help
uv run videoannotator --help

# API server (compatibility wrapper)
uv run python api_server.py
```

```bash
# Build and run CPU version
docker build -f Dockerfile.cpu -t videoannotator:cpu .
docker run --rm -v $(pwd)/data:/app/data videoannotator:cpu

# Build and run GPU version
docker build -f Dockerfile.gpu -t videoannotator:gpu .
docker run --gpus all --rm -v $(pwd)/data:/app/data videoannotator:gpu
```

Open the project in VS Code and use "Reopen in Container" for a complete GPU-enabled development environment.
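With the server running, you can poll it from Python. A minimal sketch, assuming a `/health` endpoint; the actual route may differ, so check the API docs:

```python
from urllib.error import URLError
from urllib.request import urlopen

def server_alive(base_url: str = "http://localhost:18011", path: str = "/health") -> bool:
    """Return True if the API server answers 200 at base_url + path."""
    try:
        with urlopen(base_url + path, timeout=2) as resp:
            return resp.status == 200
    except (URLError, OSError, ValueError):
        return False

print(server_alive())  # False until the server is started
```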
- uv command not found:

  ```bash
  # Restart your terminal or run:
  export PATH="$HOME/.local/bin:$PATH"
  ```

- Permission errors:

  ```bash
  # On Windows, run PowerShell as Administrator
  # On Linux/Mac, ensure you have write permissions to ~/.local/
  ```

- Slow dependency resolution: uv is typically very fast, but large ML dependencies can take time on first install.
- CUDA not found: Run `nvidia-smi` to check GPU detection. Update drivers if needed.
- PyTorch doesn't detect CUDA:

  ```bash
  # Verify CUDA-enabled PyTorch installation
  uv run python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
  ```

- CUDA version mismatch: Ensure the CUDA Toolkit matches the PyTorch CUDA version:
  - Check CUDA Toolkit: `nvcc --version`
  - Check PyTorch CUDA: `uv run python -c "import torch; print(torch.version.cuda)"`
- dlib/cmake errors: Install cmake system-wide or use conda for native dependencies.
- OpenFace compilation: Requires manual setup; see the OpenFace section above.
- Build tool errors: Ensure you have Visual Studio Build Tools (Windows) or GCC (Linux/Mac).
VideoAnnotator v1.2.0 uses:
- uv - Fast, reliable Python package management
- Ruff - Unified linting and formatting (replaces Black, isort, flake8)
- FastAPI - Modern API framework
- Hatchling/setuptools - Modern build backend
- Docker - CPU and GPU containerization
- Python 3.12+ - Latest Python with performance improvements
| Category | Tools | Purpose | CUDA Support |
|---|---|---|---|
| Detection | YOLO11 | Person/object detection | ✅ |
| Tracking | ByteTrack | Multi-object tracking | ✅ |
| Scene | PySceneDetect, OpenCLIP | Scene segmentation & classification | ✅ |
| Face | OpenFace 3.0, LAION Face | Face analysis & emotion | ✅ |
| Audio | Whisper, pyannote.audio | Speech & audio processing | ✅ |
| API | FastAPI, uvicorn | REST API server | N/A |
After installation:
- See `docs/usage/GETTING_STARTED.md` for usage examples
- Check `docs/development/` for development workflows
- Review `configs/` for configuration options
- Use `uv run videoannotator --help` to test the CLI
- Use `uv run videoannotator server --host 0.0.0.0 --port 18011` to start the server
- GPU acceleration: Install CUDA-enabled PyTorch for up to 10x speedup
- Batch processing: Process multiple videos for efficiency
- Memory management: Use appropriate model sizes for your GPU
- Container deployment: Use Docker for consistent environments
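The batch-processing tip can be sketched with a thread pool; `process_video` below is a hypothetical stand-in for whatever per-video call you use (CLI invocation or API job submission), and the worker count should be tuned to your GPU memory:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def process_video(path: Path) -> str:
    """Hypothetical placeholder: replace with a real VideoAnnotator call."""
    return f"annotated:{path.name}"

videos = [Path(f"clip_{i}.mp4") for i in range(4)]

# A small worker pool keeps the GPU fed without oversubscribing memory.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(process_video, videos))

print(results)
```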