
[train] Add DRO (Direct Reward Optimization) policy loss #1259

Draft
tyler-griggs wants to merge 2 commits into main from tgriggs/dro-loss

Commits

Commits on Mar 3, 2026