
Using FSDP activation checkpointing

Activation checkpointing avoids storing intermediate activations, freeing up memory at the cost of recomputing them during the backward pass.
The memory savings allow larger batch sizes, which can significantly increase total training throughput, making activation checkpointing an important tool for maximizing training speed.
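As a minimal sketch of the underlying mechanism, PyTorch's `torch.utils.checkpoint.checkpoint` wraps a forward call so that activations inside it are recomputed in backward rather than stored (the layer and shapes here are illustrative, not from the tutorial):

```python
import torch
from torch.utils.checkpoint import checkpoint

# An illustrative single layer; in practice this would be a larger block.
layer = torch.nn.Linear(64, 64)
x = torch.randn(4, 64, requires_grad=True)

# Inputs and outputs are saved, but intermediate activations inside
# `layer` are recomputed during backward instead of being kept in memory.
y = checkpoint(layer, x, use_reentrant=False)
y.sum().backward()
```

Gradients flow through exactly as without checkpointing; only the memory/compute tradeoff changes.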

FSDP checkpointing is shard-aware, so it must be applied after FSDP initialization and sharding.
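A sketch of applying shard-aware checkpointing with `apply_activation_checkpointing` from `torch.distributed.algorithms._checkpoint.checkpoint_wrapper` (the `Block` module and sizes here are assumptions for illustration; in real training you would wrap the model with FSDP first, then apply checkpointing to the sharded model):

```python
import functools
import torch
import torch.nn as nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    CheckpointImpl,
    apply_activation_checkpointing,
    checkpoint_wrapper,
)

# A toy stand-in for a transformer layer; replace with your own block type.
class Block(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.ff(x)

model = nn.Sequential(*[Block() for _ in range(4)])
# In real FSDP training: model = FSDP(model, ...) goes here, BEFORE the
# call below, since checkpointing must wrap the already-sharded modules.

non_reentrant_wrapper = functools.partial(
    checkpoint_wrapper, checkpoint_impl=CheckpointImpl.NO_REENTRANT
)
apply_activation_checkpointing(
    model,
    checkpoint_wrapper_fn=non_reentrant_wrapper,
    # Only wrap the layers whose activations we want recomputed.
    check_fn=lambda submodule: isinstance(submodule, Block),
)

# Each Block is now wrapped; its activations are recomputed in backward.
x = torch.randn(8, 32)
loss = model(x).sum()
loss.backward()
```

`check_fn` controls which submodules get wrapped, so you typically target the repeated transformer layers rather than the whole model.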

More details are in our video: https://www.loom.com/share/31749107033841959989aa8da45487c7

and in the accompanying notebook: activation_checkpointing_tutorial.ipynb