Currently, for fully async training, we force train_batch_size == mini_batch_size.
However, it does not have to be this way. Say train_batch_size = 4 * mini_batch_size: this would just mean we only pause generation, perform the in-flight weight update, and resume generation every 4 training steps.
This shouldn't be too hard to implement in fully_async_trainer.py.
This is a good first issue to tackle.
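For concreteness, here is a minimal sketch of the idea. The method names (optimizer_step, pause_generation, update_weights_in_flight, resume_generation) are hypothetical placeholders; the actual hooks in fully_async_trainer.py will differ:

```python
# Minimal sketch (hypothetical method names; the real hooks in
# fully_async_trainer.py may differ).
class FullyAsyncTrainer:
    def __init__(self, train_batch_size: int, mini_batch_size: int):
        assert train_batch_size % mini_batch_size == 0, (
            "train_batch_size must be a multiple of mini_batch_size"
        )
        # Sync weights to the rollout engine every K gradient updates,
        # where K = train_batch_size / mini_batch_size.
        self.sync_interval = train_batch_size // mini_batch_size
        self.step = 0

    def train_step(self, mini_batch):
        self.optimizer_step(mini_batch)  # one gradient update on a mini-batch
        self.step += 1
        # Previously this ran every step (train_batch_size == mini_batch_size);
        # with the new config it runs only every K steps.
        if self.step % self.sync_interval == 0:
            self.pause_generation()
            self.update_weights_in_flight()
            self.resume_generation()

    # Stubs standing in for the trainer's real methods.
    def optimizer_step(self, mini_batch): ...
    def pause_generation(self): ...
    def update_weights_in_flight(self): ...
    def resume_generation(self): ...
```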
- Reference: understand fully async training by reading the docs here and the papers they link to (PipelineRL and AReal)
- Hardware: 4xL4 or equiv. for testing, 2xL4 or equiv. for dev should be enough
- Verification: a fully async gsm8k run with the new config suffices
Motivation: GLM-5 paper, Section 4.1.1:
> To reduce policy lag and keep the training approximately on-policy, the model weights used by the rollout engine are periodically synchronized with those of the training engine. The training engine updates the model parameters and pushes the new weights back to the inference engine every K gradient updates.