[Feature] Add BSA (Bidirectional Sparse Attention) inference backend #1174
mergify[bot] merged 7 commits into hao-ai-lab:main
Conversation
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request integrates Bidirectional Sparse Attention (BSA) into the FastVideo framework, providing a new attention backend designed to accelerate video diffusion inference. By selectively pruning redundant queries and key-value blocks, BSA aims to improve computational efficiency without requiring model retraining. This initial implementation serves as a functional reference and lays the groundwork for future performance optimizations.
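Conceptually, the pruning described above can be sketched in pure PyTorch. This is only a minimal illustration, not the actual `bsa_attn.py` code: the function and parameter names (`bsa_reference`, `block_size`, `keep_ratio`) are made up here, query pruning is omitted for brevity, and only per-query-block top-k KV-block selection is shown.

```python
import torch
import torch.nn.functional as F

def bsa_reference(q, k, v, block_size=32, keep_ratio=0.5):
    """q, k, v: (batch, heads, seq, dim); seq must be divisible by block_size."""
    B, H, S, D = q.shape
    nb = S // block_size
    # Mean-pool each block to one token to score block-to-block importance.
    qb = q.view(B, H, nb, block_size, D).mean(dim=3)           # (B, H, nb, D)
    kb = k.view(B, H, nb, block_size, D).mean(dim=3)
    scores = qb @ kb.transpose(-1, -2) / D ** 0.5              # (B, H, nb, nb)
    k_keep = max(1, int(nb * keep_ratio))
    topk = scores.topk(k_keep, dim=-1).indices                 # (B, H, nb, k_keep)

    kv = k.view(B, H, nb, block_size, D)
    vv = v.view(B, H, nb, block_size, D)
    qv = q.view(B, H, nb, block_size, D)
    out = torch.zeros_like(q)
    for qi in range(nb):
        # Gather this query block's selected KV blocks, per batch and head.
        idx = topk[:, :, qi]                                   # (B, H, k_keep)
        idx = idx[..., None, None].expand(B, H, k_keep, block_size, D)
        k_sel = kv.gather(2, idx).reshape(B, H, k_keep * block_size, D)
        v_sel = vv.gather(2, idx).reshape(B, H, k_keep * block_size, D)
        o = F.scaled_dot_product_attention(qv[:, :, qi], k_sel, v_sel)
        out[:, :, qi * block_size:(qi + 1) * block_size] = o
    return out
```

With `keep_ratio=1.0` every KV block is kept, so the result matches dense attention up to floating-point summation order, which gives a simple sanity check for the sparse path.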
Welcome to FastVideo! Thanks for your first pull request.
How our CI works:
PRs run a two-tier CI system:
- Pre-commit — formatting (yapf), linting (ruff), type checking (mypy). Runs immediately on every PR.
- Fastcheck — core GPU tests (encoders, VAEs, transformers, kernels, unit tests). Runs automatically via Buildkite on relevant file changes (~10-15 min).
- Full Suite — integration tests, training pipelines, SSIM regression. Runs only when a reviewer adds the `ready` label.
Before your PR is reviewed:
- `pre-commit run --all-files` passes locally
- You've added or updated tests for your changes
- The PR description explains what and why
If pre-commit fails, a bot comment will explain how to fix it. Fastcheck and Full Suite results appear in the Checks section below.
Useful links:
Code Review
This pull request introduces a new attention backend, Bidirectional Sparse Attention (BSA), as a pure PyTorch reference implementation. The changes include the core BSA logic, integration into the attention backend dispatch system, and a suite of unit tests. My review focuses on the correctness and potential performance issues of the new implementation, as well as the completeness of the tests.
I've identified some critical correctness issues related to assumptions about uniform sparsity patterns across batches and heads, which could lead to incorrect results in more general scenarios. I've also pointed out an opportunity for performance improvement in the reconstruction logic and a minor code safety improvement. Additionally, the test suite could be expanded to cover the core attention computation and multi-batch scenarios to ensure robustness.
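To make the batch/head concern concrete: a sparsity pattern derived from a single batch and head (and then reused everywhere) genuinely diverges from patterns selected per (batch, head, query block). The shapes below are hypothetical, not the actual `bsa_attn.py` tensors.

```python
import torch

torch.manual_seed(0)
# (batch, heads, query_blocks, kv_blocks) block-importance scores.
scores = torch.randn(2, 4, 8, 8)
k_keep = 3

# Incorrect shortcut: reuse batch 0 / head 0's top-k pattern everywhere.
shared = scores[0, 0].topk(k_keep, dim=-1).indices   # (8, 3)
shared = shared.expand(2, 4, -1, -1)                 # broadcast to all

# Correct: each (batch, head, query block) selects its own KV blocks.
per_head = scores.topk(k_keep, dim=-1).indices       # (2, 4, 8, 3)

# With random scores the two disagree, so a shared pattern changes results.
assert not torch.equal(shared, per_head)
```

A unit test along these lines (non-uniform heads, batch size greater than one) would catch the uniform-sparsity assumption the review points out.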
Buildkite CI tests failed: Hi @Satyam-53, some Buildkite CI tests have failed. Check the build for details.
If the failure is unrelated to your changes, leave a comment explaining why.
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 PR merge requirements: Wonderful, this rule succeeded.
Pre-commit checks failed: Hi @Satyam-53, the pre-commit checks have failed. To fix them locally:

```shell
# Install pre-commit if you haven't already
uv pip install pre-commit
pre-commit install

# Run all checks and auto-fix what's possible
pre-commit run --all-files
```

After fixing, commit and push the changes. The checks will re-run automatically. For future commits, the installed hook runs these checks automatically.
❌ CI tests failed: @Satyam-53, to see what failed, view all builds for this branch on Buildkite.
If the failure looks unrelated to your changes, comment why and a maintainer will review.
Just left a round of review comments. Also, could you confirm that the e2e pipeline works for BSA?
Yes, I generated videos with the BSA_ATTN backend, and it successfully generates videos on the stock Wan checkpoint "Wan-AI/Wan2.1-T2V-1.3B-Diffusers". Note, however, that BSA is applied only at inference time here; the model itself was trained with full attention.
@Mergifyio rebase
✅ Branch has been successfully rebased
Force-pushed from 4ecb718 to 47fedbc (Compare)
Overall looks great. Could you add an end-to-end test for this inference backend on a specific model that we support and check the generation quality? You could also compare the performance and outputs of this backend against the FA backend, and post them here.
@Eigensystem I added an end-to-end test_bsa_inference.py file at fastvideo/tests/inference/bsa/. Here are the video generation logs from both full and BSA attention.
BSA ATTN -
Full Attention -
Could you post the two generated videos here? BTW, you should rebase onto main.
BSA_ATTN video -
Full Attention video -
Sure, I will rebase from main.
…ion and impl.forward
/merge
@Eigensystem can you please retrigger the failed tests?
/test transformer
Thank you @Satyam-53. Could you also try to implement the training part of BSA?
Yeah, sure @Eigensystem, but I would require GPU resources to work on it.
…ao-ai-lab#1174) Co-authored-by: Satyam Srivastava <satyam53@Mac.lan1> Co-authored-by: Satyam Srivastava <satyam53@Satyams-MacBook-Air.local> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Purpose
Adds Bidirectional Sparse Attention (BSA) as a new attention backend for
training-free inference, implementing the method from
"Bidirectional Sparse Attention for Faster Video Diffusion Training".
Related: #803
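For orientation, the enum-plus-dispatch wiring listed under Changes typically looks like the toy sketch below. The class and function names here are placeholders chosen for illustration, not FastVideo's real signatures.

```python
from enum import Enum

class AttentionBackendEnum(Enum):
    FLASH_ATTN = "FLASH_ATTN"
    BSA_ATTN = "BSA_ATTN"  # member added by this PR (per interface.py)

def get_attn_backend_cls(selected: "AttentionBackendEnum | None") -> str:
    """Return the import path of the chosen backend (paths hypothetical)."""
    if selected == AttentionBackendEnum.BSA_ATTN:
        # New dispatch case (per cuda.py in this PR); class name assumed.
        return "fastvideo.attention.backends.bsa_attn.BSAAttentionBackend"
    return "fastvideo.attention.backends.flash_attn.FlashAttentionBackend"
```

The pipeline and DiT config changes then only need to list `BSA_ATTN` among their supported backends for the dispatch case to be reachable.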
Changes
Files added:
- `fastvideo/attention/backends/bsa_attn.py` — backend, metadata, builder, impl
- `tests/test_bsa.py` — 19 unit tests

Files modified:
- `fastvideo/platforms/interface.py` — added `BSA_ATTN` enum
- `fastvideo/platforms/cuda.py` — added BSA dispatch case
- `fastvideo/pipelines/stages/denoising.py` — added BSA to supported backends
- `fastvideo/configs/models/dits/base.py` — added BSA to supported backends

Test Plan
Test Results
Unit tests: 19/19 passed
End-to-end inference on RTX 4090 with Wan2.1-1.3B:
Selected backend: `AttentionBackendEnum.BSA_ATTN`

Current limitations:
- not optimized for wall-clock speed yet
Follow-up work
Checklist
- Ran `pre-commit run --all-files` and fixed all issues

For model/pipeline changes, also check: