[LTX-2] Fix flash attention shard_map for sequence lengths not divisible by context mesh axis#363
Conversation
entrpn reviewed on Mar 24, 2026.
Author (Collaborator): @entrpn PTAL
Description:
When the sequence length (e.g., audio tokens) is not evenly divisible by the context mesh axis size, shard_map in _tpu_flash_attention raises a ValueError because it cannot partition the array evenly across devices.
For example, LTX-2 with 121 frames at 24 fps produces 126 audio latent tokens. On an 8-device context axis, 126 is not divisible by 8, causing the failure.
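The padded length the fix targets can be worked out directly; a minimal sketch of the arithmetic (illustrative values, not code from this PR):

```python
import math

context_axis_size = 8   # devices along the context mesh axis (assumed)
seq_len = 126           # audio latent tokens for 121 frames @ 24 fps

padded_len = math.ceil(seq_len / context_axis_size) * context_axis_size  # 128
per_device = padded_len // context_axis_size                             # 16 tokens per device
```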
The existing _pad_data_for_flash already pads sequences for flash block-size alignment inside shard_map, but the shard_map itself requires even partitioning before entry.
This fix pads query/key/value sequence dimensions to the nearest multiple of the context mesh axis size before shard_map, and trims the output back to the original length afterward. Segment-ID masking inside wrap_flash_attention ensures padded positions do not affect attention results.
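A minimal sketch of this pad-then-trim approach, assuming a `[batch, heads, seq, head_dim]` layout; `_pad_to_context_axis` and `sharded_attention_fn` are hypothetical placeholders for the PR's actual helpers and shard_map-wrapped kernel, not the repository's real function names:

```python
import jax.numpy as jnp

def _pad_to_context_axis(x, axis_size, seq_axis=2):
    """Pad the sequence axis up to a multiple of the context mesh axis size
    so shard_map can partition it evenly across devices."""
    seq_len = x.shape[seq_axis]
    padded_len = -(-seq_len // axis_size) * axis_size  # ceiling division
    if padded_len == seq_len:
        return x
    pad_widths = [(0, 0)] * x.ndim
    pad_widths[seq_axis] = (0, padded_len - seq_len)
    return jnp.pad(x, pad_widths)

def attention_with_context_padding(query, key, value, axis_size, sharded_attention_fn):
    # Pad query/key/value sequence dimensions before entering shard_map.
    orig_q_len = query.shape[2]
    query = _pad_to_context_axis(query, axis_size)
    key = _pad_to_context_axis(key, axis_size)
    value = _pad_to_context_axis(value, axis_size)

    # sharded_attention_fn stands in for the shard_map-wrapped flash attention;
    # padded positions are expected to be masked out via segment IDs inside it.
    out = sharded_attention_fn(query, key, value)

    # Trim the output back to the original query length.
    return out[:, :, :orig_q_len, :]
```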