[train] Add per-token hard masking for off-policy correction#1264

Draft
tyler-griggs wants to merge 1 commit into main from tgriggs/per-token-masking

@tyler-griggs

Summary

  • Zeros individual tokens whose train/inference importance sampling (IS) ratio falls outside configurable bounds
  • Unlike outlier masking, which rejects entire sequences, this masks only the divergent tokens
  • Configure with off_policy_correction.token_mask_eps_low and off_policy_correction.token_mask_eps_high
  • For full IS-corrected masking, combine with tis_ratio_type: "token"
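
A minimal sketch of the per-token masking idea, not the code in this PR. It assumes the IS ratio is exp(train_logprob - infer_logprob), that the bounds are 1 - eps_low and 1 + eps_high (consistent with the example values 0.2 and 0.28 in the commit message), and that both bounds are inclusive. The function name `token_hard_mask` and its arguments are hypothetical.

```python
import math


def token_hard_mask(train_logprobs, infer_logprobs, loss_mask,
                    eps_low=0.2, eps_high=0.28):
    """Return a copy of loss_mask with out-of-bounds tokens zeroed.

    Hypothetical helper for illustration only. Assumes the per-token
    IS ratio is pi_train / pi_infer, computed from log-probabilities.
    """
    lower = 1.0 - eps_low    # assumed lower bound, e.g. 0.8
    upper = 1.0 + eps_high   # assumed upper bound, e.g. 1.28
    masked = []
    for lp_train, lp_infer, keep in zip(train_logprobs, infer_logprobs, loss_mask):
        ratio = math.exp(lp_train - lp_infer)
        in_bounds = lower <= ratio <= upper  # inclusive bounds are an assumption
        # Keep the original mask value for in-bounds tokens, zero the rest.
        masked.append(keep if in_bounds else 0.0)
    return masked


# One sequence of four tokens: two near on-policy, two divergent.
train_lp = [-1.0, -1.0, -1.0, -1.0]
infer_lp = [-1.0, -1.1, -2.0, -0.5]   # ratios ~ 1.00, 1.11, 2.72, 0.61
mask = token_hard_mask(train_lp, infer_lp, [1.0, 1.0, 1.0, 1.0])
print(mask)  # [1.0, 1.0, 0.0, 0.0]
```

Only the third and fourth tokens are dropped; the rest of the sequence still contributes to the loss, which is the contrast with sequence-level outlier rejection.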

Test plan

  • Existing off-policy correction tests pass
  • In-bounds tokens kept, divergent tokens zeroed

🤖 Generated with Claude Code

Zeros individual tokens where the train/infer importance ratio falls
outside configurable bounds, while keeping the rest of the sequence.
Unlike outlier_token_mask (which rejects entire sequences), this
surgically removes only the divergent tokens.

Configure with:
    off_policy_correction.token_mask_eps_low: 0.2    # lower bound = 1 - 0.2 = 0.8
    off_policy_correction.token_mask_eps_high: 0.28  # upper bound = 1 + 0.28 = 1.28

For full IS-corrected masking, combine with tis_ratio_type: "token".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>