[train][FullyAsync] Support customizable weight sync frequency #1205

@CharlieFRuan

Description

Currently, for fully async training, we force train_batch_size == mini_batch_size.

However, it does not have to be the case. For example, if train_batch_size = 4 * mini_batch_size, we would only pause generation, perform an in-flight weight update, and resume generation every 4 training steps.

This shouldn't be too hard to implement in fully_async_trainer.py.

This is a good first issue to tackle.

  • Reference: understand fully async training by reading the docs here and the papers it links to (PipelineRL and AReal)
  • Hardware: 4xL4 or equivalent for testing; 2xL4 or equivalent should be enough for development
  • Verification: a fully async gsm8k run with the new config suffices

Motivation comes from the GLM-5 paper, Section 4.1.1:

To reduce policy lag and keep the training approximately on-policy, the model weights used by the rollout engine are periodically synchronized with those of the training engine. The training engine updates the model parameters and pushes the new weights back to the inference engine every K gradient updates.
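The "every K gradient updates" scheme above can be sketched as a minimal training loop. This is an illustrative sketch only, assuming a hypothetical weight_sync_frequency config knob; the names (train_step, sync_weights) are not the actual fully_async_trainer.py API.

```python
def training_loop(num_steps, weight_sync_frequency, train_step, sync_weights):
    """Run training, syncing rollout weights every K gradient updates.

    weight_sync_frequency == train_batch_size // mini_batch_size, so the
    current behavior (train_batch_size == mini_batch_size) is K == 1.
    """
    for step in range(1, num_steps + 1):
        train_step()  # one gradient update on a mini batch
        if step % weight_sync_frequency == 0:
            # pause generation + in-flight weight update + resume generation
            sync_weights()
```

With K=1 this reduces to the current behavior of syncing after every step; with K=4 the rollout engine runs uninterrupted for 4 gradient updates between syncs, at the cost of slightly higher policy lag.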
