fix(sft): sanitize generation config to prevent save_pretrained crash #24
Conversation
OLMo-3 Think models ship `temperature`/`top_p` with `do_sample=False`. Strict validation in `transformers >= 5.x` rejects this in `GenerationConfig.save_pretrained`, crashing every checkpoint save. Set `do_sample=True` in-memory on the trainer's model after construction. The upstream Hub files are unmodified; saved checkpoints preserve the model's recommended inference settings.
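In essence (a simplified sketch of the idea, not the exact diff; the helper name matches the one discussed in review below):

```python
def _sanitize_generation_config(model) -> None:
    """Flip do_sample in-memory so GenerationConfig.save_pretrained
    passes strict validation; upstream Hub files stay untouched."""
    gen = getattr(model, "generation_config", None)
    if gen is None or gen.do_sample:
        return
    # transformers treats 1.0 as the neutral value for these params; any
    # other value combined with do_sample=False fails strict validation.
    if gen.temperature not in (None, 1.0) or gen.top_p not in (None, 1.0):
        gen.do_sample = True
```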
Thanks for the fix. One question:
Is it worth applying this to DPO as well? `dpo.py:59-66` constructs `DPOTrainer(model=name_or_path)` the same way and saves checkpoints via the same path. Suggest moving `_sanitize_generation_config` to `common.py` and calling it from both `build_sft_trainer` and `build_dpo_trainer`. What do you think?
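Something like this (untested; the builder signatures are guessed from the file references above):

```python
from trl import DPOTrainer, SFTTrainer


def _sanitize_generation_config(model) -> None:
    # Simplified stand-in for the helper from this PR; would live in common.py.
    gen = getattr(model, "generation_config", None)
    if gen is not None and not gen.do_sample:
        gen.do_sample = True


def build_sft_trainer(name_or_path, args, train_dataset):
    trainer = SFTTrainer(model=name_or_path, args=args, train_dataset=train_dataset)
    _sanitize_generation_config(trainer.model)  # sanitize right after construction
    return trainer


def build_dpo_trainer(name_or_path, args, train_dataset):
    trainer = DPOTrainer(model=name_or_path, args=args, train_dataset=train_dataset)
    _sanitize_generation_config(trainer.model)  # DPO saves via the same save path
    return trainer
```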
Yes, makes sense. Fixed.
KonstiNik left a comment
Great that DPO is covered as well!
As a forward-looking note: HF's strict validator actually checks eight sampling-mode parameters, not just three; `min_p`, `top_h`, `typical_p`, `epsilon_cutoff` and `eta_cutoff` aren't covered by the current heuristic (`configuration_utils.py:626-654`). Fine for OLMo-3 Think specifically, but worth keeping in mind for future models that ship with those params set.
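For future models, a fuller guard might look like this (the defaults below are my best guess at the validator's neutral values, not copied from transformers):

```python
# Parameters the strict validator checks against do_sample=False, with
# guessed neutral defaults (see configuration_utils.py:626-654).
_SAMPLING_PARAM_DEFAULTS = {
    "temperature": 1.0,
    "top_p": 1.0,
    "top_k": 50,
    "min_p": None,
    "top_h": None,
    "typical_p": 1.0,
    "epsilon_cutoff": 0.0,
    "eta_cutoff": 0.0,
}


def _has_sampling_params(gen_config) -> bool:
    # True if any sampling-mode parameter deviates from its neutral default,
    # i.e. saving with do_sample=False would fail strict validation.
    return any(
        getattr(gen_config, name, default) not in (None, default)
        for name, default in _SAMPLING_PARAM_DEFAULTS.items()
    )
```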
Thanks for pointing this out. Might as well deal with it now, because of LeBlanc's Law. Fixed.
Great addition! Approving from my side. |
Summary
OLMo-3 Think models ship a `generation_config.json` with `temperature`/`top_p` set but `do_sample=False`. This is harmless at training time (we never call `model.generate`), but `transformers >= 5.x` runs strict validation inside `GenerationConfig.save_pretrained` and rejects the inconsistency. Since every checkpoint save calls `model.save_pretrained`, this crashes the job at the very first checkpoint. The fix sets `do_sample=True` in-memory on the trainer's model immediately after construction: local to the run, upstream Hub files unmodified.

AllenAI's open-instruct solves the same issue by stripping the sampling params instead; we prefer setting `do_sample=True` to preserve the model's recommended inference settings in saved checkpoints.
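For contrast, the stripping approach looks roughly like this (a sketch, not open-instruct's actual code):

```python
def _strip_sampling_params(gen_config) -> None:
    # Alternative fix: reset sampling params to neutral values so validation
    # passes, at the cost of losing the recommended inference settings.
    gen_config.temperature = 1.0
    gen_config.top_p = 1.0
```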
Type of change