gspo: GSPO loss + DeepSpeed parity fixes (loss/grad divisors, SDP, fp32_lm_head, docs_per_step, temperature) #502
Open
bigximik wants to merge 9 commits into grpo-metrics from