LaingLab/yolo2pose

yolo2pose

Convert Ultralytics YOLO-pose model outputs into DeepLabCut, SLEAP, and movement formats — and run a B-SOID-style behavioral clustering pipeline on top, with a PyQt6 desktop GUI built for non-technical lab users.

For researchers — 5-minute quickstart

You have: a YOLO .pt model trained on your animal's keypoints, plus one or more videos. You want: behavior labels per frame.

This project uses uv for everything. If you don't have it yet:

curl -LsSf https://astral.sh/uv/install.sh | sh   # macOS / Linux
# or: winget install --id=astral-sh.uv -e         # Windows

Then, from the project root:

# 1. Install everything yolo2pose needs (Python 3.10–3.13).
uv sync --extra all

# 2. Verify the install — every dependency listed should show ✓.
uv run yolo2pose doctor

# 3. Launch the GUI.
uv run yolo2pose ui

In the GUI:

  1. Click Choose folder… in the sidebar and pick an empty folder for your project.
  2. Step 1 · Load data — Browse to your DLC pose CSV(s) (or run yolo2pose convert video.mp4 --model best.pt --bodyparts ... first to make one). Optionally pair a source video. Click Save selection.
  3. Step 2 · Configure — defaults work for 30 fps mouse video. Hover any field for advice.
  4. Step 3 · Train — click Fit pipeline. Takes 30–90 s on CPU. Diagnostics populate automatically: cluster count, noise %, embedding scatter, feature importance.
  5. Step 4 · Review clusters — click Render cluster clips, watch the auto-rendered MP4s, type behavioral names (groom / rear / locomote / immobile / …) and save.
  6. Step 5 · Apply — point at a folder of new CSVs from other sessions; get a per-frame ethogram CSV for each.
  7. Step 6 · Live classifier — drop in a video + your YOLO .pt; get a labeled MP4 + ethogram + transition matrix in one pass.

If anything errors, run uv run yolo2pose doctor — it tells you exactly which package is missing and the uv add … command to fix it. See INSTALL.md for platform-specific notes (PyOpenGL on macOS, Python 3.13 wheels, GPU setup).

Why this exists

You trained a custom keypoint model with Ultralytics YOLO because it's fast and accurate. Now you want to do behavioral analysis, but every downstream tool — B-SOID, VAME, Keypoint-MoSeq, SimBA — expects DeepLabCut-style CSVs (or SLEAP HDF5s, or movement datasets).

yolo2pose is the missing bridge. It runs your .pt model on a video and emits exactly the format your analysis tool expects, including the correct multi-row header DLC requires.
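That multi-row header is the part most ad-hoc scripts get wrong. As an illustration only (not the package's writer), here is a sketch of what a DLC-style analysis CSV looks like, built with a pandas MultiIndex; the scorer name, bodyparts, and values are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical example: 2 keypoints, 3 frames.
scorer = "yolo2pose"
bodyparts = ["snout", "tail_base"]
coords = ["x", "y", "likelihood"]

# DLC's three header rows: scorer / bodyparts / coords.
columns = pd.MultiIndex.from_product(
    [[scorer], bodyparts, coords], names=["scorer", "bodyparts", "coords"]
)
data = np.arange(3 * len(bodyparts) * len(coords), dtype=float).reshape(3, -1)
df = pd.DataFrame(data, columns=columns)
df.to_csv("out_DLC.csv")  # index column carries the frame numbers
```

Reading it back with `pd.read_csv(..., header=[0, 1, 2], index_col=0)` round-trips the header, which is how most downstream tools load these files.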

Install

yolo2pose uses uv for dependency management. From the project root:

# Everything (recommended for most users)
uv sync --extra all

# Or pick the specific extras you want
uv sync --extra yolo         # just the YOLO inference path
uv sync --extra movement     # movement xarray writer
uv sync --extra sleap        # SLEAP analysis HDF5 writer
uv sync --extra ui           # the desktop GUI

For development:

git clone https://github.com/yourlab/yolo2pose
cd yolo2pose
uv sync --extra dev --extra all
uv run pytest

If you'd rather use plain pip, every uv sync --extra X has a pip install "yolo2pose[X]" equivalent.

Python 3.10–3.13. PyTorch is pulled in via Ultralytics; CPU-only works fine for inference, but a CUDA build is much faster.

Quickstart (CLI)

yolo2pose convert mouse_session.mp4 \
    --model best.pt \
    --bodyparts snout,left_ear,right_ear,neck,body_center,left_hip,right_hip,tail_base \
    --output mouse_session_DLC.csv \
    --format dlc \
    --smooth

Other formats:

yolo2pose convert video.mp4 -m best.pt -b snout,... -o out.sleap.h5 --format sleap
yolo2pose convert video.mp4 -m best.pt -b snout,... -o out.nc       --format movement
yolo2pose convert video.mp4 -m best.pt -b snout,... -o out          --format all   # writes .csv, .sleap.h5, .nc

GPU is auto-detected; pass --device cuda:0 (or mps, or cpu) to be explicit, and --require-gpu to error out instead of silently falling back to CPU. See the GPU usage section.

Sanity-check before processing 100 videos:

yolo2pose overlay video.mp4 video_DLC.csv --output sanity.png --n 6

Open sanity.png and confirm each named keypoint sits where you'd expect. This is the single most important step. See the warning below.

Quickstart (Python API)

from yolo2pose import infer_video
from yolo2pose.converters import dlc, sleap, movement
from yolo2pose.filtering import smooth

KEYPOINTS = [
    "snout", "left_ear", "right_ear", "neck",
    "body_center", "left_hip", "right_hip", "tail_base",
]

pose = infer_video(
    model_path="best.pt",
    video_path="mouse_session.mp4",
    keypoint_names=KEYPOINTS,
    conf=0.25,
)

pose = smooth(pose, confidence_threshold=0.5, max_gap=5, median_window=5)

dlc.to_dlc_csv(pose, "out_DLC.csv")
sleap.to_sleap_analysis_h5(pose, "out.sleap.h5")
ds = movement.to_movement_dataset(pose)         # in-memory xarray.Dataset
movement.to_movement_netcdf(pose, "out.nc")     # or save to disk

PoseData is the format-agnostic container: it holds xy of shape (n_frames, n_keypoints, 2), confidence of shape (n_frames, n_keypoints), plus the keypoint name list, fps, and metadata. Every converter consumes it.
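For intuition, a minimal stand-in with the same shape conventions looks like this. This is an illustrative sketch, not the real PoseData class:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class PoseDataSketch:
    """Illustrative stand-in mirroring the shapes described above."""
    xy: np.ndarray            # (n_frames, n_keypoints, 2)
    confidence: np.ndarray    # (n_frames, n_keypoints)
    keypoint_names: list
    fps: float
    metadata: dict = field(default_factory=dict)

    @property
    def n_frames(self) -> int:
        return self.xy.shape[0]

    @property
    def n_keypoints(self) -> int:
        return self.xy.shape[1]

pose = PoseDataSketch(
    xy=np.zeros((100, 8, 2)),
    confidence=np.ones((100, 8)),
    keypoint_names=["snout", "left_ear", "right_ear", "neck",
                    "body_center", "left_hip", "right_hip", "tail_base"],
    fps=30.0,
)
```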

⚠️ Body-part order is the silent bug

The order of keypoint_names must match the order your model was trained with. There is no automatic check — YOLO emits keypoints by index, and we just pair index i with name i.

Look at the data.yaml you trained with:

kpt_shape: [8, 3]
flip_idx: [0, 2, 1, 3, 4, 6, 5, 7]
names: ['mouse']
# Your labels follow the order:
#   0 snout
#   1 left_ear
#   2 right_ear
#   3 neck
#   4 body_center
#   5 left_hip
#   6 right_hip
#   7 tail_base

Pass that exact list. If your downstream behavioral clusters look weird, this is the first thing to check. Always run yolo2pose overlay on at least one video and confirm visually.
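flip_idx gives you one cheap consistency check for free: every left_/right_ keypoint should map to its partner, and midline points should map to themselves. A sketch of that check, assuming your names follow a left_/right_ prefix convention:

```python
def check_flip_idx(names, flip_idx):
    """Return (index, name, flipped_name) tuples that violate the
    left_/right_ pairing implied by flip_idx. Empty list = consistent.

    Assumes keypoints use a 'left_'/'right_' naming convention.
    """
    assert sorted(flip_idx) == list(range(len(names))), "flip_idx is not a permutation"
    problems = []
    for i, j in enumerate(flip_idx):
        a, b = names[i], names[j]
        if a.startswith("left_"):
            expected = "right_" + a[len("left_"):]
        elif a.startswith("right_"):
            expected = "left_" + a[len("right_"):]
        else:
            expected = a  # midline point should map to itself
        if b != expected:
            problems.append((i, a, b))
    return problems

names = ["snout", "left_ear", "right_ear", "neck",
         "body_center", "left_hip", "right_hip", "tail_base"]
flip_idx = [0, 2, 1, 3, 4, 6, 5, 7]
problems = check_flip_idx(names, flip_idx)   # [] means consistent
```

This won't catch a wholesale reordering (the overlay check does that), but it does catch the common left/right swap.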

GPU usage

Inference will use a GPU automatically if one is available, but Ultralytics silently falls back to CPU if it can't find one — which can turn a 5-minute job into a 50-minute one without any warning. yolo2pose validates the device up front and prints what it resolved to.

Check what torch sees:

yolo2pose gpu
# PyTorch 2.3.0
# CUDA available: 1 device(s) (CUDA 12.1, cuDNN 8907)
#   [0] NVIDIA RTX A6000  (47.99 GB, sm_86)
# MPS available: no

Pick a specific device:

yolo2pose convert video.mp4 -m best.pt -b ... -o out.csv --device cuda:0
yolo2pose convert video.mp4 -m best.pt -b ... -o out.csv --device mps     # Apple Silicon
yolo2pose convert video.mp4 -m best.pt -b ... -o out.csv --device cpu     # force CPU

Fail loudly if no GPU is available (recommended for production / cluster jobs):

yolo2pose convert video.mp4 -m best.pt -b ... -o out.csv --require-gpu
# RuntimeError: GPU was required but none is available...

In a SLURM job, do the pre-flight as the first step:

#SBATCH --gres=gpu:1
yolo2pose gpu --require-gpu || exit 1
yolo2pose convert ... --require-gpu

From Python:

from yolo2pose import infer_video, gpu_info, resolve_device

print(gpu_info())                       # dict you can log / serialize
device = resolve_device("auto", require_gpu=True)  # raises if no GPU
pose = infer_video(model, video, names, device=device, require_gpu=True)

infer_video echoes the resolved device on the first line of stdout so it's easy to grep in logs:

[yolo2pose] device = cuda:0  (NVIDIA RTX A6000 (47.99 GB))

Troubleshooting: GPU not seen

Run the built-in pre-flight diagnostic first:

yolo2pose doctor

It checks each link in the chain (torch installed → CUDA-enabled build → NVIDIA driver visible → torch.cuda.is_available()) and tells you which step failed. The two failure modes that account for ~95% of "my GPU isn't seen" reports:

1. You installed a CPU-only PyTorch wheel. This is the default on PyPI for many platforms. Diagnostic: python -c "import torch; print(torch.__version__, torch.version.cuda)" — if the version ends in +cpu or torch.version.cuda is None, you have the CPU build. The fix:

# pip
pip uninstall torch torchvision
pip install torch --index-url https://download.pytorch.org/whl/cu121

# uv (recommended): add a dedicated index to pyproject.toml

For uv, plain uv sync will keep reinstalling whatever your uv.lock resolved to. To get a CUDA wheel, configure the PyTorch index in your pyproject.toml:

[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[tool.uv.sources]
torch = [
    { index = "pytorch-cu121", marker = "sys_platform != 'darwin'" },
]
torchvision = [
    { index = "pytorch-cu121", marker = "sys_platform != 'darwin'" },
]

Then:

uv lock --upgrade-package torch
uv sync
yolo2pose doctor          # verify

The marker = "sys_platform != 'darwin'" clause means macOS still pulls the standard PyPI wheel (which already includes MPS support); only Linux and Windows go to the CUDA index.

If you're targeting a different CUDA version, swap cu121 for cu118, cu124, etc. — the URL pattern is https://download.pytorch.org/whl/cuXYZ. Match it to your driver's CUDA version (nvidia-smi shows it in the top right).

2. CUDA_VISIBLE_DEVICES is set to an empty string. Some job schedulers do this when no GPUs are allocated. Check with echo $CUDA_VISIBLE_DEVICES; if it's set to "" (rather than unset), your job was allocated no GPUs (e.g. it requested --gres=gpu:0 or omitted the GPU flag).

For SLURM-style preflight:

#SBATCH --gres=gpu:1
yolo2pose doctor || exit 1
yolo2pose convert ... --require-gpu

Smoothing

YOLO-pose is jittery frame-to-frame compared to DLC. The default smoothing pipeline (yolo2pose.filtering.smooth) does three things:

  1. mask — set xy to NaN where likelihood < threshold (default 0.5)
  2. interpolate — linear-fill NaN gaps shorter than max_gap frames
  3. median filter — temporal median, default window 5

Tune per dataset. For 30 fps mouse video with a well-trained model, defaults work; for noisier setups, raise the threshold and the median window.
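The three steps above can be sketched in a few lines of NumPy/pandas. This is an illustration of the idea, not the package's implementation (in particular, pandas' interpolate(limit=...) partially fills gaps longer than max_gap, a simplification relative to the description above):

```python
import numpy as np
import pandas as pd

def smooth_sketch(xy, conf, threshold=0.5, max_gap=5, median_window=5):
    """Mask low-confidence points, fill short gaps, median-filter.

    xy: (n_frames, n_keypoints, 2); conf: (n_frames, n_keypoints).
    """
    out = xy.astype(float).copy()
    out[conf < threshold] = np.nan                   # 1. mask
    for k in range(out.shape[1]):
        df = pd.DataFrame(out[:, k, :])
        # 2. linear-fill up to max_gap consecutive NaNs (interior only)
        df = df.interpolate(limit=max_gap, limit_area="inside")
        # 3. centered temporal median filter
        df = df.rolling(median_window, center=True, min_periods=1).median()
        out[:, k, :] = df.to_numpy()
    return out

# Tiny demo: one keypoint moving linearly, one low-confidence frame.
xy = np.zeros((10, 1, 2))
xy[:, 0, 0] = np.arange(10)
conf = np.ones((10, 1))
conf[5, 0] = 0.1                      # frame 5 gets masked, then filled back in
out = smooth_sketch(xy, conf)
```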

Importing YOLO labels into DeepLabCut

If you ever want to retrain in DLC, you can convert your YOLO label files:

yolo2pose labels images/ labels/ \
    --bodyparts snout,left_ear,... \
    --output labeled-data/session1 \
    --scorer yolo_import

This produces a CollectedData_<scorer>.h5 and matching CSV in the standard DLC labeled-data/<videoname>/ layout, ready to drop into a project created via deeplabcut.create_new_project.

Behavioral clustering (v0.2)

Once you have DLC CSVs, the package can take you the rest of the way to labeled behaviors. The pipeline is B-SOID-style: feature engineering → sliding-window aggregation → UMAP → HDBSCAN → random-forest classifier. The trained pipeline is portable, so once you've named clusters from one training run you can apply them to every future video without re-clustering.

CLI

# Train on one or more sessions. Optionally pair videos for cluster-clip rendering.
yolo2pose behavior train \
    sessions/*.csv \
    --output models/cohort_v1/ \
    --window 16 --stride 4 --min-cluster-size 50 \
    --video sessions/session_001.mp4 \
    --video sessions/session_002.mp4

# Apply trained pipeline to a new pose CSV.
yolo2pose behavior apply new_session.csv \
    --model models/cohort_v1/ \
    --output new_session_ethogram.csv

The output ethogram has one row per frame:

frame,cluster,name
0,3,locomote
1,3,locomote
2,7,rear
...
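Segmenting a per-frame ethogram like that into bouts is just run-length encoding over the label column. A sketch, assuming the (frame, cluster, name) layout shown above:

```python
from itertools import groupby

def ethogram_to_bouts(rows, fps=30.0):
    """Collapse per-frame (frame, cluster, name) rows into bouts.

    Returns (start_frame, end_frame, duration_s, cluster, name) tuples,
    with end_frame inclusive.
    """
    bouts = []
    for (cluster, name), grp in groupby(rows, key=lambda r: (r[1], r[2])):
        grp = list(grp)
        start, end = grp[0][0], grp[-1][0]
        bouts.append((start, end, (end - start + 1) / fps, cluster, name))
    return bouts

rows = [(0, 3, "locomote"), (1, 3, "locomote"), (2, 7, "rear"), (3, 7, "rear")]
bouts = ethogram_to_bouts(rows, fps=30.0)
```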

Live behavior classifier

Once you've trained a pipeline and named the clusters, the live classifier takes one video and produces an annotated MP4 + ethogram + bout statistics + behavior transition matrix. Both the tracking and classifying steps run in one pass — you don't need to convert the video to a DLC CSV first.

# Track + classify in one shot (recommended).
yolo2pose behavior live session.mp4 \
    --model models/cohort_v1/ \
    --yolo best.pt \
    --bodyparts-file examples/bodyparts.yaml \
    --output live_out/ \
    --device cuda:0

# Or, if you've already converted the video to a DLC CSV.
yolo2pose behavior live session.mp4 \
    --model models/cohort_v1/ \
    --pose-csv session_DLC.csv \
    --output live_out/

Outputs in live_out/:

  • annotated.mp4 — H.264 video with the current behavior name burned onto every frame plus a colored bar matching the label palette. Plays anywhere (browser, QuickTime, VLC).
  • ethogram.csv — one row per frame: frame, time, cluster, name.
  • bouts.csv — segmented runs: start_frame, end_frame, duration, cluster, name.
  • stats.csv — per-behavior summary: total time, fraction, n_bouts, mean & median bout duration.
  • transitions.csv — bout-to-bout transition matrix P(next | current).
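A bout-to-bout transition matrix is just counts of consecutive bout labels, row-normalized into P(next | current). A minimal sketch of that computation (not the package's code):

```python
import numpy as np

def transition_matrix(bout_labels, n_states):
    """P(next | current) from an ordered sequence of bout cluster labels."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(bout_labels[:-1], bout_labels[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        # Rows with no outgoing transitions stay all-zero.
        P = np.where(row_sums > 0, counts / row_sums, 0.0)
    return P

# Hypothetical bout sequence over 3 named states.
P = transition_matrix([0, 1, 0, 2, 0, 1], n_states=3)
```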

The same workflow is available in the desktop GUI on Step 6 · Live classifier with an embedded video player, an interactive Plotly ethogram, the stats table, and a transition-matrix heatmap.

Desktop GUI (for non-technical users)

pip install "yolo2pose[ui]"
yolo2pose ui

Native PyQt6 window with six wizard steps in a sidebar. Every path field has 📁/📄 Browse buttons that open Qt's native file dialogs.

  1. Load data — Browse to a folder of DLC CSVs (or pick individual files) and matching videos.
  2. Configure — spinboxes / radios for window length, stride, embedding (UMAP / PCA), HDBSCAN min cluster size, cluster selection method, NaN tolerance.
  3. Train — runs in a QThread so the window stays responsive. Shows headline metrics, per-keypoint reliability bars, embedding scatter (2D / 3D toggle, hide-noise toggle), cumulative-coverage chart, cluster sizes, feature-importance tabs (top features / by type / by keypoint).
  4. Review clusters — auto-renders short MP4s per cluster. Watch in an embedded QMediaPlayer, type a behavioral name, save.
  5. Apply — runs the trained pipeline on a folder (or list) of new CSVs and writes per-frame ethogram CSVs.
  6. Live classifier — drop in a video, run YOLO + behavior in one pass, get the annotated MP4 + ethogram + stats + transition heatmap, all in inline tabs.

State persists to <project>/state.json; closing and reopening the app picks up where you left off.

Python API

from yolo2pose.behavior import BehaviorPipeline
from yolo2pose.behavior.pipeline import PipelineConfig
from yolo2pose.behavior.features import FeatureSpec
from yolo2pose.converters.dlc import from_dlc_csv

# Train.
poses = [from_dlc_csv(p, fps=30.0) for p in csvs]
config = PipelineConfig(
    feature_spec=FeatureSpec(body_axis=(0, poses[0].n_keypoints - 1)),
    window=16, stride=4, embedding="umap", min_cluster_size=50,
)
pipe = BehaviorPipeline(config).fit(poses)
pipe.save("models/cohort_v1/")

# Apply.
pipe = BehaviorPipeline.load("models/cohort_v1/")
labels = pipe.predict(from_dlc_csv("new_session.csv", fps=30.0))   # (n_frames,)
names = [pipe.label_to_name(int(l)) for l in labels]

How to tune it

Three knobs matter most:

  • window — how long is one behavioral unit? At 30 fps, the default of 16 frames ≈ 530 ms (good for most mouse syllables). Halve for fast behaviors (paw twitches), double for sustained ones (long grooming bouts).
  • min_cluster_size — smaller → more clusters → finer-grained behaviors but noisier. Start at 0.5–1% of total windows; if clusters look mushy, raise it.
  • embedding — umap is what B-SOID uses; pca is faster and deterministic (useful for tests / very large datasets).

If too many windows end up as noise (HDBSCAN's -1 label), lower min_cluster_size or min_samples. If clusters merge that shouldn't, raise umap_n_neighbors.

What this is not

This isn't a replacement for B-SOID, MoSeq, or VAME if you've already invested in their analysis ecosystems. It's a lightweight, self-contained pipeline that takes the DLC CSVs yolo2pose produces and runs them through the same kind of unsupervised clustering, with a friendly GUI that meets non-technical users where they are. Use it as a fast first pass; export to DLC CSV and feed those into the more specialized tools when you need their specific outputs.

What works with what

Tool                        Recommended format
B-SOID                      DLC CSV
VAME                        DLC CSV
Keypoint-MoSeq              DLC CSV or movement
SimBA                       DLC CSV
sleap-anipose               SLEAP analysis HDF5
movement-based pipelines    movement NetCDF / xarray

Roadmap

  • v0.2 — ✓ behavioral clustering pipeline (B-SOID-style) + Streamlit GUI
  • v0.3 — ✓ live classifier (recorded video → annotated MP4 + ethogram + transitions)
  • v0.4 — real-time webcam classifier (streamlit-webrtc + per-frame YOLO+behavior); HMM smoothing for cluster labels
  • v0.5 — multi-animal support (model.track + ByteTrack/BoT-SORT, multi-individual DLC header); NWB pose extension export (ndx-pose)
  • v0.6 — group-level statistics in the GUI; Anipose triangulation helper for multi-camera setups

License

MIT. See LICENSE.

Citation

If you use yolo2pose in published work, please cite this repository and the underlying tools (Ultralytics YOLO, plus DeepLabCut / SLEAP / movement / your behavioral-analysis tool of choice).
