
fix(sft): sanitize generation config to prevent save_pretrained crash #24

Merged
Neonkraft merged 4 commits into main from fix/olmo3-generation-config on May 1, 2026

Conversation

@Neonkraft (Collaborator)

Summary

OLMo-3 Think models ship a generation_config.json with temperature/top_p set but do_sample=False. This is harmless at training time (we never call model.generate), but transformers >= 5.x runs strict validation inside GenerationConfig.save_pretrained and rejects the inconsistency:

ValueError: GenerationConfig is invalid:
  - `temperature` is set to 0.6 -- this flag is only used in sample-based
    generation modes. You should set `do_sample=True` or unset `temperature`.

Since every checkpoint save calls model.save_pretrained, this crashes the job at the very first checkpoint. The fix sets do_sample=True in-memory on the trainer's model immediately after construction; the change is local to the run and leaves the upstream Hub files untouched.

AllenAI's open-instruct solves the same issue by stripping the sampling params instead; we prefer setting do_sample=True to preserve the model's recommended inference settings in saved checkpoints.

Type of change

  • [x] Bug fix
  • [ ] New feature
  • [ ] Refactor
  • [ ] Performance
  • [ ] Documentation
  • [ ] Maintenance

…LMo-3 Think

OLMo-3 Think models ship temperature/top_p with do_sample=False.
transformers >= 5.x strict validation rejects this in
GenerationConfig.save_pretrained, crashing every checkpoint save.
Set do_sample=True in-memory on the trainer's model after construction.
The upstream Hub files are unmodified; saved checkpoints preserve the
model's recommended inference settings.
@Neonkraft requested a review from KonstiNik on April 29, 2026 at 14:38
@KonstiNik (Collaborator) left a comment

Thanks for the fix. One question:

Is it worth applying this to dpo as well? dpo.py:59-66 constructs DPOTrainer(model=name_or_path) the same way and saves checkpoints via the same path. Suggest moving _sanitize_generation_config to common.py and calling it from both build_sft_trainer and build_dpo_trainer. What do you think?

@Neonkraft (Collaborator, Author)

Yes, makes sense. Fixed.

@KonstiNik (Collaborator) left a comment

Great that DPO is covered as well!
As a forward-looking note: HF's strict validator actually checks eight sampling-mode parameters, not just three — min_p, top_h, typical_p, epsilon_cutoff and eta_cutoff aren't covered by the current heuristic (configuration_utils.py:626-654). Fine for OLMo-3 Think specifically, but worth keeping in mind for future models that ship with those params set.

@Neonkraft changed the title from "fix(sft): sanitize generation config to prevent save_pretrained crash on OLMo-3 Think" to "fix(sft): sanitize generation config to prevent save_pretrained crash" on Apr 30, 2026
@Neonkraft (Collaborator, Author)

Thanks for pointing this out. Might as well deal with it now, per LeBlanc's Law ("later equals never"). Fixed.

@KonstiNik (Collaborator)

Great addition! Approving from my side.

@Neonkraft merged commit 7045e45 into main on May 1, 2026
2 checks passed