Convert Ultralytics YOLO-pose model outputs into DeepLabCut, SLEAP, and movement formats — and run a B-SOID-style behavioral clustering pipeline on top, with a PyQt6 desktop GUI built for non-technical lab users.
You have: a YOLO .pt model trained on your animal's keypoints, plus one or
more videos. You want: behavior labels per frame.
This project uses uv for everything. If you
don't have it yet:
curl -LsSf https://astral.sh/uv/install.sh | sh # macOS / Linux
# or: winget install --id=astral-sh.uv -e   # Windows

Then, from the project root:
# 1. Install everything yolo2pose needs (Python 3.10–3.13).
uv sync --extra all
# 2. Verify the install — every dependency listed should show ✓.
uv run yolo2pose doctor
# 3. Launch the GUI.
uv run yolo2pose ui

In the GUI:
- Click Choose folder… in the sidebar and pick an empty folder for your project.
- Step 1 · Load data — Browse to your DLC pose CSV(s) (or run yolo2pose convert video.mp4 --model best.pt --bodyparts ... first to make one). Optionally pair a source video. Click Save selection.
- Step 2 · Configure — defaults work for 30 fps mouse video. Hover any field for advice.
- Step 3 · Train — click Fit pipeline. Takes 30–90 s on CPU. Diagnostics populate automatically: cluster count, noise %, embedding scatter, feature importance.
- Step 4 · Review clusters — click Render cluster clips, watch the auto-rendered MP4s, type behavioral names (groom / rear / locomote / immobile / …) and save.
- Step 5 · Apply — point at a folder of new CSVs from other sessions; get a per-frame ethogram CSV for each.
- Step 6 · Live classifier — drop in a video + your YOLO .pt; get a labeled MP4 + ethogram + transition matrix in one pass.
If anything errors, run uv run yolo2pose doctor — it tells you exactly which package is missing and the uv add … command to fix it. See INSTALL.md for platform-specific notes (PyOpenGL on macOS, Python 3.13 wheels, GPU setup).
You trained a custom keypoint model with Ultralytics YOLO because it's fast and accurate. Now you want to do behavioral analysis, but every downstream tool — B-SOID, VAME, Keypoint-MoSeq, SimBA — expects DeepLabCut-style CSVs (or SLEAP HDF5s, or movement datasets).
yolo2pose is the missing bridge. It runs your .pt model on a video and emits
exactly the format your analysis tool expects, including the correct multi-row
header DLC requires.
yolo2pose uses uv for dependency
management. From the project root:
# Everything (recommended for most users)
uv sync --extra all
# Or pick the specific extras you want
uv sync --extra yolo # just the YOLO inference path
uv sync --extra movement # movement xarray writer
uv sync --extra sleap # SLEAP analysis HDF5 writer
uv sync --extra ui # the desktop GUI

For development:
git clone https://github.com/yourlab/yolo2pose
cd yolo2pose
uv sync --extra dev --extra all
uv run pytest

If you'd rather use plain pip, every uv sync --extra X has a pip install "yolo2pose[X]" equivalent.
Python ≥ 3.9. PyTorch is pulled in via Ultralytics; CPU-only works fine for inference, but a CUDA build is much faster.
yolo2pose convert mouse_session.mp4 \
--model best.pt \
--bodyparts snout,left_ear,right_ear,neck,body_center,left_hip,right_hip,tail_base \
--output mouse_session_DLC.csv \
--format dlc \
--smooth

Other formats:
yolo2pose convert video.mp4 -m best.pt -b snout,... -o out.sleap.h5 --format sleap
yolo2pose convert video.mp4 -m best.pt -b snout,... -o out.nc --format movement
yolo2pose convert video.mp4 -m best.pt -b snout,... -o out --format all # writes .csv, .sleap.h5, .nc

GPU is auto-detected; pass --device cuda:0 (or mps, or cpu) to be
explicit, and --require-gpu to error out instead of silently falling back
to CPU. See the GPU usage section.
Sanity-check before processing 100 videos:
yolo2pose overlay video.mp4 video_DLC.csv --output sanity.png --n 6

Open sanity.png and confirm each named keypoint sits where you'd expect.
This is the single most important step. See the warning below.
from yolo2pose import infer_video
from yolo2pose.converters import dlc, sleap, movement
from yolo2pose.filtering import smooth
KEYPOINTS = [
"snout", "left_ear", "right_ear", "neck",
"body_center", "left_hip", "right_hip", "tail_base",
]
pose = infer_video(
model_path="best.pt",
video_path="mouse_session.mp4",
keypoint_names=KEYPOINTS,
conf=0.25,
)
pose = smooth(pose, confidence_threshold=0.5, max_gap=5, median_window=5)
dlc.to_dlc_csv(pose, "out_DLC.csv")
sleap.to_sleap_analysis_h5(pose, "out.sleap.h5")
ds = movement.to_movement_dataset(pose) # in-memory xarray.Dataset
movement.to_movement_netcdf(pose, "out.nc") # or save to disk

PoseData is the format-agnostic container. Holds xy of shape
(n_frames, n_keypoints, 2), confidence of shape (n_frames, n_keypoints),
plus the keypoint name list, fps, and metadata. Every converter consumes it.
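For a quick look at what's inside before converting, here is a minimal sketch continuing the example above; the attribute names mirror the description (xy, confidence, keypoint_names, fps) but are assumptions — verify them with help(pose) in your install.

```python
# `pose` is the PoseData returned by infer_video() in the snippet above.
n_frames, n_keypoints, _ = pose.xy.shape
print(f"{n_frames} frames × {n_keypoints} keypoints at {pose.fps} fps")

# Fraction of frames below 0.5 confidence, per keypoint — a quick quality
# check before smoothing or converting.
low = (pose.confidence < 0.5).mean(axis=0)
for name, frac in zip(pose.keypoint_names, low):
    print(f"{name:15s} {frac:.1%} low-confidence frames")
```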
The order of keypoint_names must match the order your model was trained
with. There is no automatic check — YOLO emits keypoints by index, and we just
pair index i with name i.
Look at the data.yaml you trained with:
kpt_shape: [8, 3]
flip_idx: [0, 2, 1, 3, 4, 6, 5, 7]
names: ['mouse']
# Your labels follow the order:
# 0 snout
# 1 left_ear
# 2 right_ear
# 3 neck
# 4 body_center
# 5 left_hip
# 6 right_hip
# 7 tail_base

Pass that exact list. If your downstream behavioral clusters look weird,
this is the first thing to check. Always run yolo2pose overlay on at least
one video and confirm visually.
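If you prefer to spot-check from Python rather than the CLI, a rough equivalent with OpenCV is sketched below. It reuses the (assumed) pose attribute names from the Python example above and is an illustration, not what yolo2pose overlay actually runs.

```python
import cv2

# Draw frame 0's keypoints, with their names, onto the first video frame.
cap = cv2.VideoCapture("mouse_session.mp4")
ok, frame = cap.read()
cap.release()

for (x, y), name in zip(pose.xy[0], pose.keypoint_names):
    if x != x:  # NaN → keypoint not detected in this frame
        continue
    cv2.circle(frame, (int(x), int(y)), 4, (0, 255, 0), -1)
    cv2.putText(frame, name, (int(x) + 5, int(y) - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 255, 0), 1)
cv2.imwrite("manual_overlay.png", frame)
```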
Inference will use a GPU automatically if one is available, but Ultralytics
silently falls back to CPU if it can't find one — which can turn a 5-minute
job into a 50-minute one without any warning. yolo2pose validates the
device up front and prints what it resolved to.
Check what torch sees:
yolo2pose gpu
# PyTorch 2.3.0
# CUDA available: 1 device(s) (CUDA 12.1, cuDNN 8907)
# [0] NVIDIA RTX A6000 (47.99 GB, sm_86)
# MPS available: no

Pick a specific device:
yolo2pose convert video.mp4 -m best.pt -b ... -o out.csv --device cuda:0
yolo2pose convert video.mp4 -m best.pt -b ... -o out.csv --device mps # Apple Silicon
yolo2pose convert video.mp4 -m best.pt -b ... -o out.csv --device cpu # force CPU

Fail loudly if no GPU is available (recommended for production / cluster jobs):
yolo2pose convert video.mp4 -m best.pt -b ... -o out.csv --require-gpu
# RuntimeError: GPU was required but none is available...

In a SLURM job, do the pre-flight as the first step:
#SBATCH --gres=gpu:1
yolo2pose gpu --require-gpu || exit 1
yolo2pose convert ... --require-gpu

From Python:
from yolo2pose import infer_video, gpu_info, resolve_device
print(gpu_info()) # dict you can log / serialize
device = resolve_device("auto", require_gpu=True) # raises if no GPU
pose = infer_video(model, video, names, device=device, require_gpu=True)

infer_video echoes the resolved device on the first line of stdout so it's
easy to grep in logs:
[yolo2pose] device = cuda:0 (NVIDIA RTX A6000 (47.99 GB))
Run the built-in pre-flight diagnostic first:
yolo2pose doctor

It checks each link in the chain (torch installed → CUDA-enabled build →
NVIDIA driver visible → torch.cuda.is_available()) and tells you which
step failed. The two failure modes that account for ~95% of "my GPU isn't
seen" reports:
1. You installed a CPU-only PyTorch wheel. This is the default on PyPI
for many platforms. Diagnostic: python -c "import torch; print(torch.__version__, torch.version.cuda)" — if the version ends in +cpu or
torch.version.cuda is None, you have the CPU build. The fix:
# pip
pip uninstall torch torchvision
pip install torch --index-url https://download.pytorch.org/whl/cu121
# uv (recommended): add a dedicated index to pyproject.toml

For uv, plain uv sync will keep reinstalling whatever your
uv.lock resolved to. To get a CUDA wheel, configure the PyTorch index in
your pyproject.toml:
[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true
[tool.uv.sources]
torch = [
{ index = "pytorch-cu121", marker = "sys_platform != 'darwin'" },
]
torchvision = [
{ index = "pytorch-cu121", marker = "sys_platform != 'darwin'" },
]

Then:
uv lock --upgrade-package torch
uv sync
yolo2pose doctor # verify

The marker = "sys_platform != 'darwin'" clause means macOS still pulls
the standard PyPI wheel (which already includes MPS support); only Linux
and Windows go to the CUDA index.
If you're targeting a different CUDA version, swap cu121 for cu118,
cu124, etc. — the URL pattern is https://download.pytorch.org/whl/cuXYZ.
Match it to your driver's CUDA version (nvidia-smi shows it in the top right).
2. CUDA_VISIBLE_DEVICES is set to an empty string. Some job schedulers
do this when no GPUs are allocated. Check with echo $CUDA_VISIBLE_DEVICES;
if it's set to "" (rather than unset), your script saw --gres=gpu:0 or
forgot the GPU flag.
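A small check you can drop at the top of a job script to distinguish the empty-but-set case from a genuinely unset variable (plain Python, no yolo2pose involved):

```python
import os

v = os.environ.get("CUDA_VISIBLE_DEVICES")
if v is None:
    print("CUDA_VISIBLE_DEVICES unset — torch will enumerate every GPU on the node")
elif v.strip() == "":
    print("CUDA_VISIBLE_DEVICES is set but empty — torch will see zero GPUs")
else:
    print(f"CUDA_VISIBLE_DEVICES = {v!r} — torch restricted to those devices")
```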
For SLURM-style preflight:
#SBATCH --gres=gpu:1
yolo2pose doctor || exit 1
yolo2pose convert ... --require-gpu

YOLO-pose is jittery frame-to-frame compared to DLC. The default smoothing
pipeline (yolo2pose.filtering.smooth) does three things:
- mask — set xy to NaN where likelihood < threshold (default 0.5)
- interpolate — linear-fill NaN gaps shorter than max_gap frames
- median filter — temporal median, default window 5
Tune per dataset. For 30 fps mouse video with a well-trained model, defaults work; for noisier setups, raise the threshold and the median window.
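For reference, here is a standalone sketch of that mask → interpolate → median recipe on a single keypoint coordinate, written with numpy/pandas/scipy. The parameter names mirror smooth()'s, but this is an illustration, not the package's actual implementation.

```python
import numpy as np
import pandas as pd
from scipy.ndimage import median_filter

def smooth_coordinate(x, confidence, threshold=0.5, max_gap=5, window=5):
    """Mask low-confidence frames, bridge short gaps, then median-filter."""
    x = x.astype(float).copy()
    x[confidence < threshold] = np.nan                  # 1. mask
    filled = pd.Series(x).interpolate(limit=max_gap,    # 2. linear fill, short gaps only
                                      limit_area="inside")
    # 3. temporal median; gaps longer than max_gap stay NaN (and widen slightly here)
    return median_filter(filled.to_numpy(), size=window, mode="nearest")
```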
If you ever want to retrain in DLC, you can convert your YOLO label files:
yolo2pose labels images/ labels/ \
--bodyparts snout,left_ear,... \
--output labeled-data/session1 \
--scorer yolo_import

This produces a CollectedData_<scorer>.h5 and matching CSV in the standard
DLC labeled-data/<videoname>/ layout, ready to drop into a project created
via deeplabcut.create_new_project.
Once you have DLC CSVs, the package can take you the rest of the way to labeled behaviors. The pipeline is B-SOID-style: feature engineering → sliding-window aggregation → UMAP → HDBSCAN → random-forest classifier. The trained pipeline is portable, so once you've named clusters from one training run you can apply them to every future video without re-clustering.
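For orientation, here is that chain spelled out with the underlying libraries (umap-learn, hdbscan, scikit-learn). BehaviorPipeline wraps and extends this; the array below is a stand-in for the real windowed pose features, so treat it as a conceptual sketch only.

```python
import numpy as np
import umap
import hdbscan
from sklearn.ensemble import RandomForestClassifier

# Stand-in for per-window pose features (n_windows × n_features).
window_features = np.random.default_rng(0).random((2000, 64)).astype(np.float32)

# Embed, then density-cluster; HDBSCAN labels low-density windows -1 ("noise").
embedding = umap.UMAP(n_neighbors=30, n_components=2, random_state=0).fit_transform(window_features)
labels = hdbscan.HDBSCAN(min_cluster_size=50).fit_predict(embedding)

# Train a classifier on the non-noise windows so new sessions can be labeled
# without re-running UMAP/HDBSCAN.
keep = labels >= 0
clf = RandomForestClassifier(n_estimators=200).fit(window_features[keep], labels[keep])
```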
# Train on one or more sessions. Optionally pair videos for cluster-clip rendering.
yolo2pose behavior train \
sessions/*.csv \
--output models/cohort_v1/ \
--window 16 --stride 4 --min-cluster-size 50 \
--video sessions/session_001.mp4 \
--video sessions/session_002.mp4
# Apply trained pipeline to a new pose CSV.
yolo2pose behavior apply new_session.csv \
--model models/cohort_v1/ \
--output new_session_ethogram.csv

The output ethogram has one row per frame:
frame,cluster,name
0,3,locomote
1,3,locomote
2,7,rear
...
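Ethogram CSVs load straight into pandas if you want quick summaries — for example, seconds spent per behavior. File and column names follow the example above; the fps is whatever your video uses.

```python
import pandas as pd

eth = pd.read_csv("new_session_ethogram.csv")
fps = 30.0
seconds_per_behavior = eth.groupby("name")["frame"].count() / fps
print(seconds_per_behavior.sort_values(ascending=False))
```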
Once you've trained a pipeline and named the clusters, the live classifier takes one video and produces an annotated MP4 + ethogram + bout statistics + behavior transition matrix. Tracking and classification run in one pass — you don't need to convert the video to a DLC CSV first.
# Track + classify in one shot (recommended).
yolo2pose behavior live session.mp4 \
--model models/cohort_v1/ \
--yolo best.pt \
--bodyparts-file examples/bodyparts.yaml \
--output live_out/ \
--device cuda:0
# Or, if you've already converted the video to a DLC CSV.
yolo2pose behavior live session.mp4 \
--model models/cohort_v1/ \
--pose-csv session_DLC.csv \
--output live_out/

Outputs in live_out/:

- annotated.mp4 — H.264 video with the current behavior name burned onto every frame plus a colored bar matching the label palette. Plays anywhere (browser, QuickTime, VLC).
- ethogram.csv — one row per frame: frame, time, cluster, name.
- bouts.csv — segmented runs: start_frame, end_frame, duration, cluster, name.
- stats.csv — per-behavior summary: total time, fraction, n_bouts, mean & median bout duration.
- transitions.csv — bout-to-bout transition matrix P(next | current).
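The CSV outputs compose nicely with pandas. A small sketch — column names follow the list above, and the layout of transitions.csv (current behavior as rows, next behavior as columns) is an assumption to verify against your output:

```python
import pandas as pd

bouts = pd.read_csv("live_out/bouts.csv")
print(bouts.nlargest(5, "duration")[["name", "start_frame", "end_frame", "duration"]])

transitions = pd.read_csv("live_out/transitions.csv", index_col=0)
print(transitions.idxmax(axis=1))   # most likely next behavior after each behavior
```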
The same workflow is available in the GUI on Step 6 · Live classifier, with an embedded video player, an interactive Plotly ethogram, the stats table, and a transition-matrix heatmap.
pip install yolo2pose[ui]
yolo2pose ui

Native PyQt6 window with six wizard steps in a sidebar. Every path field has 📁/📄 Browse buttons that open Qt's native file dialogs.
- Load data — Browse to a folder of DLC CSVs (or pick individual files) and matching videos.
- Configure — spinboxes / radios for window length, stride, embedding (UMAP / PCA), HDBSCAN min cluster size, cluster selection method, NaN tolerance.
- Train — runs in a QThread so the window stays responsive. Shows headline metrics, per-keypoint reliability bars, embedding scatter (2D / 3D toggle, hide-noise toggle), cumulative-coverage chart, cluster sizes, feature-importance tabs (top features / by type / by keypoint).
- Review clusters — auto-renders short MP4s per cluster. Watch in an embedded QMediaPlayer, type a behavioral name, save.
- Apply — runs the trained pipeline on a folder (or list) of new CSVs and writes per-frame ethogram CSVs.
- Live classifier — drop in a video, run YOLO + behavior in one pass, get the annotated MP4 + ethogram + stats + transition heatmap, all in inline tabs.
State persists to <project>/state.json; closing and reopening the app picks up where you left off.
from yolo2pose.behavior import BehaviorPipeline
from yolo2pose.behavior.pipeline import PipelineConfig
from yolo2pose.behavior.features import FeatureSpec
from yolo2pose.converters.dlc import from_dlc_csv
# Train.
poses = [from_dlc_csv(p, fps=30.0) for p in csvs]
config = PipelineConfig(
feature_spec=FeatureSpec(body_axis=(0, poses[0].n_keypoints - 1)),
window=16, stride=4, embedding="umap", min_cluster_size=50,
)
pipe = BehaviorPipeline(config).fit(poses)
pipe.save("models/cohort_v1/")
# Apply.
pipe = BehaviorPipeline.load("models/cohort_v1/")
labels = pipe.predict(from_dlc_csv("new_session.csv", fps=30.0)) # (n_frames,)
names = [pipe.label_to_name(int(l)) for l in labels]

Three knobs matter most:

- window — how long is one behavioral unit? At 30 fps, 15 frames ≈ 500 ms (good for most mouse syllables). Halve for fast behaviors (paw twitches), double for sustained ones (long grooming bouts).
- min_cluster_size — smaller → more clusters → finer-grained behaviors but noisier. Start at 0.5–1% of total windows; if clusters look mushy, raise it.
- embedding — umap is what B-SOID uses; pca is faster and deterministic (useful for tests / very large datasets).
If too many windows end up as noise (HDBSCAN's -1 label), lower min_cluster_size or min_samples. If clusters merge that shouldn't, raise umap_n_neighbors.
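As a starting point, two example configs for those failure modes. This sketch assumes min_samples and umap_n_neighbors are PipelineConfig fields — consistent with the knobs named above, but check the actual signature in your install.

```python
from yolo2pose.behavior.pipeline import PipelineConfig
from yolo2pose.behavior.features import FeatureSpec

spec = FeatureSpec(body_axis=(0, 7))   # snout → tail_base for the 8-keypoint example

# Too many noise (-1) windows: relax the density requirements.
lenient = PipelineConfig(feature_spec=spec, window=16, stride=4,
                         min_cluster_size=30, min_samples=5)

# Distinct behaviors merging into one cluster: make the embedding more local.
finer = PipelineConfig(feature_spec=spec, window=16, stride=4,
                       min_cluster_size=50, umap_n_neighbors=15)
```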
This isn't a replacement for B-SOID, MoSeq, or VAME if you've already invested
in their analysis ecosystems. It's a lightweight, self-contained pipeline that
takes the DLC CSVs yolo2pose produces and runs them through the same kind of
unsupervised clustering, with a friendly GUI that meets non-technical users
where they are. Use it as a fast first pass; export to DLC CSV and feed those
into the more specialized tools when you need their specific outputs.
| Tool | Recommended format |
|---|---|
| B-SOID | DLC CSV |
| VAME | DLC CSV |
| Keypoint-MoSeq | DLC CSV or movement |
| SimBA | DLC CSV |
| sleap-anipose | SLEAP analysis HDF5 |
| movement-based pipelines | movement NetCDF / xarray |
- v0.2 — ✓ behavioral clustering pipeline (B-SOID-style) + Streamlit GUI
- v0.3 — ✓ live classifier (recorded video → annotated MP4 + ethogram + transitions)
- v0.4 — real-time webcam classifier (streamlit-webrtc + per-frame YOLO+behavior); HMM smoothing for cluster labels
- v0.5 — multi-animal support (model.track + ByteTrack/BoT-SORT, multi-individual DLC header); NWB pose extension export (ndx-pose)
- v0.6 — group-level statistics in the GUI; Anipose triangulation helper for multi-camera setups
MIT. See LICENSE.
If you use yolo2pose in published work, please cite this repository and the
underlying tools (Ultralytics YOLO, plus DeepLabCut / SLEAP / movement / your
behavioral-analysis tool of choice).