Investigate global-batch load balancing loss. #154

@avirtane-amd

Description

Investigate the global-batch load balancing loss used for expert routing in Qwen3 MoE, and check whether any of the Megatron repositories already implements it.

TODO: Test both the Megatron-LM and Megatron-Bridge CPT setups and compare the resulting losses.
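
For reference while investigating, here is a minimal sketch of how a global-batch load balancing loss could differ from the usual per-micro-batch one. It assumes the common auxiliary-loss form `num_experts * sum_i(f_i * p_i)`, where `f_i` is the fraction of routed token slots assigned to expert `i` and `p_i` is the mean router probability for expert `i`. The function names and the toy data are hypothetical and come from neither Qwen3 nor any Megatron repo. The global variant here simply pools statistics over all micro-batches; a real implementation would presumably all-reduce the expert counts across data-parallel ranks instead.

```python
from typing import List, Sequence, Tuple

# Router softmax outputs, indexed as [token][expert].
Probs = Sequence[Sequence[float]]


def expert_stats(probs: Probs, top_k: int) -> Tuple[List[float], List[float], int]:
    """Return per-expert selection counts, per-expert probability sums, and token count."""
    num_experts = len(probs[0])
    counts = [0.0] * num_experts
    prob_sums = [0.0] * num_experts
    for token_probs in probs:
        # Top-k routing: select the k experts with the highest router probability.
        ranked = sorted(range(num_experts), key=lambda e: token_probs[e], reverse=True)
        for e in ranked[:top_k]:
            counts[e] += 1.0
        for e, p in enumerate(token_probs):
            prob_sums[e] += p
    return counts, prob_sums, len(probs)


def lbl(counts: List[float], prob_sums: List[float], num_tokens: int, top_k: int) -> float:
    """Assumed auxiliary loss form: num_experts * sum_i(f_i * p_i)."""
    num_experts = len(counts)
    f = [c / (num_tokens * top_k) for c in counts]  # fraction of routed slots per expert
    p = [s / num_tokens for s in prob_sums]         # mean router probability per expert
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))


def micro_batch_lbl(micro_batches: Sequence[Probs], top_k: int) -> float:
    """Compute the loss inside each micro-batch, then average the losses."""
    losses = [lbl(*expert_stats(mb, top_k), top_k) for mb in micro_batches]
    return sum(losses) / len(losses)


def global_batch_lbl(micro_batches: Sequence[Probs], top_k: int) -> float:
    """Pool the statistics over all micro-batches, then compute a single loss."""
    num_experts = len(micro_batches[0][0])
    counts = [0.0] * num_experts
    prob_sums = [0.0] * num_experts
    num_tokens = 0
    for mb in micro_batches:
        c, s, n = expert_stats(mb, top_k)
        counts = [a + b for a, b in zip(counts, c)]
        prob_sums = [a + b for a, b in zip(prob_sums, s)]
        num_tokens += n
    return lbl(counts, prob_sums, num_tokens, top_k)


# Toy data: two experts, top-1 routing, each micro-batch prefers a different expert.
batches = [
    [[0.9, 0.1], [0.9, 0.1]],  # micro-batch 0 routes everything to expert 0
    [[0.1, 0.9], [0.1, 0.9]],  # micro-batch 1 routes everything to expert 1
]
print(micro_batch_lbl(batches, top_k=1))
print(global_batch_lbl(batches, top_k=1))
```

On this toy input the per-micro-batch loss is about 1.8, while the pooled loss is about 1.0, the value for perfectly balanced routing under this formula. In other words, the global variant does not penalize experts for specializing per micro-batch as long as the load is balanced over the whole batch, which is the kind of difference the CPT comparison would look for.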
