AI video quality evaluation backend service powered by VBench.
User (Web UI / CLI)
↓ HTTPS POST (upload video)
Cloudflare Tunnel / LAN
↓
VBench API (FastAPI) ← Mac Mini (MPS)
↓
Evaluation Engine (VBench dimensions)
↓
JSON Results
# 1. Install dependencies
pip install -r requirements.txt
# 2. (Optional) Pre-download models
bash scripts/download_models.sh
# 3. Start the server
python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8999
# 4. (Optional) Expose via Cloudflare Tunnel
cloudflared tunnel run vbench-api

Model weights are downloaded on demand. When a dimension is evaluated for the first time, the service attempts to find the model in these locations (in order):

- `models/` (project directory; run `scripts/download_models.sh` to populate)
- `~/.cache/vbench/` (user cache directory)
- Auto-download to `~/.cache/vbench/` from the original source URL
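
The lookup order above can be sketched in a few lines of Python. This is a minimal illustration, not the service's actual code: the `resolve_model` helper and its `search_roots` parameter are hypothetical names, and the download fallback is left to the caller.

```python
from pathlib import Path
from typing import Optional, Sequence

# Search roots taken from the lookup order described above.
PROJECT_MODELS = Path("models")
USER_CACHE = Path.home() / ".cache" / "vbench"


def resolve_model(relative_path: str,
                  search_roots: Optional[Sequence[Path]] = None) -> Optional[Path]:
    """Return the first existing copy of a weight file, or None if none is found.

    Hypothetical helper: a None result means the caller would fall back to
    downloading the file into the user cache directory.
    """
    roots = search_roots if search_roots is not None else [PROJECT_MODELS, USER_CACHE]
    for root in roots:
        candidate = root / relative_path
        if candidate.is_file():
            return candidate
    return None
```

For example, `resolve_model("raft_model/raft-things.pth")` would check the project directory first and the user cache second.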
Required models (~199 MB total):

- `aesthetic_model/sa_0_4_vit_l_14_linear.pth`: Aesthetic quality scoring
- `amt_model/amt-s.pth`: Frame interpolation (motion smoothness)
- `raft_model/raft-things.pth`: Optical flow (motion/dynamic degree)
- `pyiqa_model/musiq_spaq_ckpt-358bb6af.pth`: Image quality scoring
Evaluate one or more videos.
Request (multipart/form-data):
| Field | Type | Required | Description |
|---|---|---|---|
| `videos` | File[] | Yes | Video files (mp4/gif), multiple allowed |
| `prompts` | string | No | JSON array string, one prompt per video |
| `dimensions` | string | No | Dimensions to evaluate, comma-separated. All by default |
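
As a rough client-side sketch of how the two optional fields are encoded, the helper below builds the non-file form fields. `build_form_fields` is a hypothetical name, and the check that the prompt count matches the video count is an assumption drawn from the "one prompt per video" description, not confirmed server behavior.

```python
import json
from typing import Optional, Sequence


def build_form_fields(video_count: int,
                      prompts: Optional[Sequence[str]] = None,
                      dimensions: Optional[Sequence[str]] = None) -> dict:
    """Build the optional text fields of the multipart request (hypothetical helper)."""
    fields = {}
    if prompts is not None:
        # Assumption: the service expects exactly one prompt per uploaded video.
        if len(prompts) != video_count:
            raise ValueError("expected one prompt per video")
        fields["prompts"] = json.dumps(list(prompts))  # JSON array string
    if dimensions:
        fields["dimensions"] = ",".join(dimensions)  # comma-separated names
    return fields
```

The resulting dict would be sent alongside the `videos` file parts by any HTTP client that supports multipart/form-data, for instance as the `data` argument of `requests.post(url, files=..., data=fields)`. Omitting `dimensions` leaves the field out, so the service evaluates all dimensions.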
Supported dimensions:

- `subject_consistency`: Subject consistency (DINO)
- `background_consistency`: Background consistency (CLIP)
- `temporal_flickering`: Temporal flickering (pixel MAE)
- `motion_smoothness`: Motion smoothness (AMT)
- `dynamic_degree`: Dynamic degree (RAFT optical flow)
- `aesthetic_quality`: Aesthetic quality (LAION Aesthetic)
- `imaging_quality`: Imaging quality (MUSIQ)
- `object_class`: Object class detection (CLIP zero-shot)
- `multiple_objects`: Multiple objects presence (CLIP)
- `color`: Color matching (CLIP)
- `spatial_relationship`: Spatial relationship (CLIP)
- `scene`: Scene matching (CLIP)
- `temporal_style`: Temporal style similarity (CLIP)
- `overall_consistency`: Overall consistency (CLIP)
- `human_action`: Human action matching (CLIP)
- `appearance_style`: Appearance style (CLIP)

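
One way the comma-separated `dimensions` field could be parsed and validated against the list above is sketched here. `parse_dimensions` is a hypothetical helper, and rejecting unknown names with an error is an assumption about desirable behavior rather than what the service is known to do.

```python
from typing import List, Optional

# The supported dimension names listed above.
SUPPORTED_DIMENSIONS = (
    "subject_consistency", "background_consistency", "temporal_flickering",
    "motion_smoothness", "dynamic_degree", "aesthetic_quality",
    "imaging_quality", "object_class", "multiple_objects", "color",
    "spatial_relationship", "scene", "temporal_style",
    "overall_consistency", "human_action", "appearance_style",
)


def parse_dimensions(raw: Optional[str]) -> List[str]:
    """Turn the raw form value into a list of dimension names (hypothetical helper)."""
    if raw is None or not raw.strip():
        return list(SUPPORTED_DIMENSIONS)  # field omitted: evaluate everything
    requested = [name.strip() for name in raw.split(",") if name.strip()]
    unknown = [name for name in requested if name not in SUPPORTED_DIMENSIONS]
    if unknown:
        # Assumption: unknown names are rejected rather than silently ignored.
        raise ValueError("unsupported dimensions: " + ", ".join(unknown))
    return requested
```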
Response (JSON):
{
"task_id": "uuid",
"status": "completed",
"results": {
"subject_consistency": 0.92,
"motion_smoothness": 0.85
},
"per_video": [
{
"filename": "demo.mp4",
"scores": {
"subject_consistency": 0.91,
"motion_smoothness": 0.86
}
}
]
}

Returns service status and cached model info.
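
For the evaluation response shown earlier, a client might index the per-video scores by filename. The sketch below parses a copy of that sample response; `scores_by_file` is a hypothetical helper, and it assumes only the fields that appear in the sample.

```python
import json

# A copy of the sample evaluation response from this README.
SAMPLE_RESPONSE = """
{
  "task_id": "uuid",
  "status": "completed",
  "results": {"subject_consistency": 0.92, "motion_smoothness": 0.85},
  "per_video": [
    {
      "filename": "demo.mp4",
      "scores": {"subject_consistency": 0.91, "motion_smoothness": 0.86}
    }
  ]
}
"""


def scores_by_file(response: dict) -> dict:
    """Map each filename to its score dict (hypothetical helper)."""
    return {entry["filename"]: entry["scores"] for entry in response.get("per_video", [])}


response = json.loads(SAMPLE_RESPONSE)
per_file = scores_by_file(response)
```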
vbench-api/
├── app/
│ ├── main.py # FastAPI entry point
│ ├── routers/
│ │ └── evaluate.py # Evaluation API routes
│ ├── services/
│ │ └── vbench_service.py # VBench wrapper
│ └── models/
│ └── schemas.py # Pydantic models
├── models/ # Model weights (run download script)
├── cache/ # Temporary upload files
├── scripts/
│ └── download_models.sh # Pre-download all model weights
├── requirements.txt
└── README.md