Currently, for fully async training, we force train_batch_size == mini_batch_size.
However, it does not have to be this way. Say train_batch_size = 4 * mini_batch_size: this would just mean we only pause generation, perform the in-flight weight update, and resume generation every 4 training steps.
This shouldn't be too hard to implement in fully_async_trainer.py.
This is a good first issue to tackle.
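For concreteness, here is a minimal sketch of the idea. The method names (optimizer_step, pause_generation, update_weights_in_flight, resume_generation) are hypothetical placeholders; the actual hooks in fully_async_trainer.py will differ:

```python
# Minimal sketch (hypothetical method names; the real hooks in
# fully_async_trainer.py may differ).
class FullyAsyncTrainer:
    def __init__(self, train_batch_size: int, mini_batch_size: int):
        assert train_batch_size % mini_batch_size == 0, (
            "train_batch_size must be a multiple of mini_batch_size"
        )
        # Sync weights to the rollout engine every K gradient updates,
        # where K = train_batch_size / mini_batch_size.
        self.sync_interval = train_batch_size // mini_batch_size
        self.step = 0

    def train_step(self, mini_batch):
        self.optimizer_step(mini_batch)  # one gradient update on a mini-batch
        self.step += 1
        # Previously this ran every step (train_batch_size == mini_batch_size);
        # with the new config it runs only every K steps.
        if self.step % self.sync_interval == 0:
            self.pause_generation()
            self.update_weights_in_flight()
            self.resume_generation()

    # Stubs standing in for the trainer's real methods.
    def optimizer_step(self, mini_batch): ...
    def pause_generation(self): ...
    def update_weights_in_flight(self): ...
    def resume_generation(self): ...
```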
- Reference: understand fully async training by reading the docs here and the papers they link to (PipelineRL and AReal)
- Hardware: 4xL4 or equiv. for testing, 2xL4 or equiv. for dev should be enough
- Verification: a fully async gsm8k run with the new config suffices
Motivation: GLM-5 paper, Section 4.1.1:
> To reduce policy lag and keep the training approximately on-policy, the model weights used by the rollout engine are periodically synchronized with those of the training engine. The training engine updates the model parameters and pushes the new weights back to the inference engine every K gradient updates.