Skip to content

fix: forward warmup_ratio correctly to TrainingArguments#21

Merged
Neonkraft merged 2 commits intomainfrom
fix/warmup-steps
May 1, 2026
Merged

fix: forward warmup_ratio correctly to TrainingArguments#21
Neonkraft merged 2 commits intomainfrom
fix/warmup-steps

Conversation

@Neonkraft
Copy link
Copy Markdown
Collaborator

Summary

Drop warmup_ratio from TrainingConfig and make warmup_steps a float. The original bug: warmup_ratio (e.g. 0.03) was being forwarded to warmup_steps (an int field in TrainingArguments), silently truncating to 0 and disabling warmup entirely. HuggingFace TrainingArguments already interprets a float value for warmup_steps as a ratio, so a single float field covers both use cases with no special handling needed.

Type of change

  • Bug fix
  • New feature
  • Refactor
  • Performance
  • Documentation
  • Maintenance

warmup_ratio (float 0.03) was passed to warmup_steps (int), truncating
to 0 and silently disabling warmup entirely. Add a warmup_steps config
field (default 0) and forward both fields to TrainingArguments so each
reaches the correct parameter.
@Neonkraft Neonkraft requested a review from KonstiNik April 29, 2026 14:05
Copy link
Copy Markdown
Collaborator

@KonstiNik KonstiNik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up!

Tho, I'd argue we should follow HF's API more directly here. Maintaining warmup_ratio and warmup_steps as two separate fields for the same underlying concept doesn't really hold up — HF themselves consolidated this in 5.2 into a single warmup_steps: float where < 1 is interpreted as a ratio and >= 1 as absolute steps (training_args.py:788-793, training_args.py:2047-2054). One knob covers both use cases cleanly.

Concretely I suggest:

  1. Drop warmup_ratio from TrainingConfig and rename to warmup_steps. The piece I think matters most: users need to be told what to put in this field, since accepting both ratios and absolute step counts is non-obvious from the name alone. Use the field(metadata=...) form (same pattern HF uses in TrainingArguments), and we can surface it in --help or auto-generated docs later:
warmup_steps: float = field(
    default=0.03,
    metadata={
        "help": (
            "Linear warmup duration. Values in [0, 1) are interpreted as a "
            "ratio of total training steps; values >= 1 are interpreted as an "
            "absolute number of steps; 0 disables warmup."
        )
    },
)
  1. Forward only warmup_steps to TrainingArguments.
    Add a small shim in PostTrainingConfig.load (config.py:270, between OmegaConf.load and the schema merge) so old YAMLs don't break:
if training is not None and "warmup_ratio" in training:
    logger.warning(
        "training.warmup_ratio is deprecated; use training.warmup_steps "
        "(values < 1 are interpreted as a ratio). Auto-migrating."
    )
    training.warmup_steps = training.pop("warmup_ratio")

(Needs import logging + a module-level logger — neither exists in config.py today.)

…at field

Drop warmup_ratio; warmup_steps is now a float where values in [0, 1)
are treated as a ratio and values >= 1 as absolute steps, matching HF's
TrainingArguments behaviour. Adds a backward-compat shim that
auto-migrates warmup_ratio in old YAMLs with a deprecation warning.
@Neonkraft
Copy link
Copy Markdown
Collaborator Author

Done.

Copy link
Copy Markdown
Collaborator

@KonstiNik KonstiNik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor looks great. One small question: the default went from 0.03 to 0.0, so configs that didn't set warmup explicitly will now get no warmup instead of 3%. Was the default change intentional, or worth flipping back to 0.03 to keep the old behavior?

@Neonkraft
Copy link
Copy Markdown
Collaborator Author

Was the default change intentional, or worth flipping back to 0.03

It was intentional. TRL's default is 0.0 so I thought it best to mimic it.

@Neonkraft Neonkraft merged commit 895eab0 into main May 1, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants