
[train] Add KL-in-advantages mode#1262

Draft
tyler-griggs wants to merge 1 commit into main from
tgriggs/kl-in-advantages

Conversation

@tyler-griggs
Member

Summary

  • New KL penalty mode: batch-centered relative KL applied to advantages after group normalization
  • advantage += coef * (avg_batch_KL - token_KL)
  • Tokens drifting more than average get penalized; tokens drifting less get a bonus
  • Configure with use_kl_in_advantages: true, kl_advantages_coef: 0.01

Test plan

  • Existing trainer tests pass
  • KL advantage sums to ~0 (batch-centered)

🤖 Generated with Claude Code

Adds a new KL penalty mode that modifies advantages after group
normalization with a batch-centered relative KL signal:

    advantage += coef * (avg_batch_KL - token_KL)

Tokens drifting more than the batch average from the reference get
penalized; tokens drifting less get a bonus. The sum is approximately
zero (variance-reducing). Avoids the gradient bias of KL-in-loss.

Configure with use_kl_in_advantages: true, kl_advantages_coef: 0.01.
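The adjustment above can be sketched in a few lines. This is a hypothetical illustration, not the PR's actual implementation: the function name `apply_kl_in_advantages`, its signature, and the optional `mask` argument are assumptions; only the update rule `advantage += coef * (avg_batch_KL - token_KL)` comes from the PR description.

```python
import numpy as np

def apply_kl_in_advantages(advantages, token_kl, coef=0.01, mask=None):
    """Add a batch-centered relative KL signal to (group-normalized) advantages.

    Hypothetical sketch of the mode described in this PR. `advantages` and
    `token_kl` are per-token arrays; `mask` (assumed, not in the PR text)
    optionally excludes padding tokens from the batch average.
    """
    if mask is None:
        mask = np.ones_like(token_kl)
    # Average reference-policy KL over valid tokens in the batch.
    avg_kl = (token_kl * mask).sum() / mask.sum()
    # Tokens drifting more than average are penalized; tokens drifting
    # less get a bonus. With a full mask the added term sums to exactly
    # zero, so the batch-mean advantage is unchanged (variance-reducing).
    return advantages + coef * (avg_kl - token_kl) * mask
```

Because the correction is centered on the batch mean, its contribution sums to ~0 across the batch, matching the test-plan check, and it leaves the policy-gradient estimate unbiased in a way that adding KL directly to the loss does not.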

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
