[Feature] Add BSA (Bidirectional Sparse Attention) inference backend #1174
mergify[bot] merged 7 commits into hao-ai-lab:main
Conversation
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request integrates Bidirectional Sparse Attention (BSA) into the FastVideo framework, providing a new attention backend designed to accelerate video diffusion inference. By selectively pruning redundant queries and key-value blocks, BSA aims to improve computational efficiency without requiring model retraining. This initial implementation serves as a functional reference and lays the groundwork for future performance optimizations.
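Conceptually, the pruning described above can be sketched in pure PyTorch. This is only a minimal illustration, not the actual `bsa_attn.py` code: the function and parameter names (`bsa_reference`, `block_size`, `keep_ratio`) are made up here, query pruning is omitted for brevity, and only per-query-block top-k KV-block selection is shown.

```python
import torch
import torch.nn.functional as F

def bsa_reference(q, k, v, block_size=32, keep_ratio=0.5):
    """q, k, v: (batch, heads, seq, dim); seq must be divisible by block_size."""
    B, H, S, D = q.shape
    nb = S // block_size
    # Mean-pool each block to one token to score block-to-block importance.
    qb = q.view(B, H, nb, block_size, D).mean(dim=3)           # (B, H, nb, D)
    kb = k.view(B, H, nb, block_size, D).mean(dim=3)
    scores = qb @ kb.transpose(-1, -2) / D ** 0.5              # (B, H, nb, nb)
    k_keep = max(1, int(nb * keep_ratio))
    topk = scores.topk(k_keep, dim=-1).indices                 # (B, H, nb, k_keep)

    kv = k.view(B, H, nb, block_size, D)
    vv = v.view(B, H, nb, block_size, D)
    qv = q.view(B, H, nb, block_size, D)
    out = torch.zeros_like(q)
    for qi in range(nb):
        # Gather this query block's selected KV blocks, per batch and head.
        idx = topk[:, :, qi]                                   # (B, H, k_keep)
        idx = idx[..., None, None].expand(B, H, k_keep, block_size, D)
        k_sel = kv.gather(2, idx).reshape(B, H, k_keep * block_size, D)
        v_sel = vv.gather(2, idx).reshape(B, H, k_keep * block_size, D)
        o = F.scaled_dot_product_attention(qv[:, :, qi], k_sel, v_sel)
        out[:, :, qi * block_size:(qi + 1) * block_size] = o
    return out
```

With `keep_ratio=1.0` every KV block is kept, so the result matches dense attention up to floating-point summation order, which gives a simple sanity check for the sparse path.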
Welcome to FastVideo! Thanks for your first pull request.
How our CI works:
PRs run a two-tier CI system:
- Pre-commit — formatting (yapf), linting (ruff), type checking (mypy). Runs immediately on every PR.
- Fastcheck — core GPU tests (encoders, VAEs, transformers, kernels, unit tests). Runs automatically via Buildkite on relevant file changes (~10-15 min).
- Full Suite — integration tests, training pipelines, SSIM regression. Runs only when a reviewer adds the `ready` label.
Before your PR is reviewed:
- `pre-commit run --all-files` passes locally
- You've added or updated tests for your changes
- The PR description explains what and why
If pre-commit fails, a bot comment will explain how to fix it. Fastcheck and Full Suite results appear in the Checks section below.
Useful links:
Code Review
This pull request introduces a new attention backend, Bidirectional Sparse Attention (BSA), as a pure PyTorch reference implementation. The changes include the core BSA logic, integration into the attention backend dispatch system, and a suite of unit tests. My review focuses on the correctness and potential performance issues of the new implementation, as well as the completeness of the tests.
I've identified some critical correctness issues related to assumptions about uniform sparsity patterns across batches and heads, which could lead to incorrect results in more general scenarios. I've also pointed out an opportunity for performance improvement in the reconstruction logic and a minor code safety improvement. Additionally, the test suite could be expanded to cover the core attention computation and multi-batch scenarios to ensure robustness.
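To make the batch/head concern concrete: a sparsity pattern derived from a single batch and head (and then reused everywhere) genuinely diverges from patterns selected per (batch, head, query block). The shapes below are hypothetical, not the actual `bsa_attn.py` tensors.

```python
import torch

torch.manual_seed(0)
# (batch, heads, query_blocks, kv_blocks) block-importance scores.
scores = torch.randn(2, 4, 8, 8)
k_keep = 3

# Incorrect shortcut: reuse batch 0 / head 0's top-k pattern everywhere.
shared = scores[0, 0].topk(k_keep, dim=-1).indices   # (8, 3)
shared = shared.expand(2, 4, -1, -1)                 # broadcast to all

# Correct: each (batch, head, query block) selects its own KV blocks.
per_head = scores.topk(k_keep, dim=-1).indices       # (2, 4, 8, 3)

# With random scores the two disagree, so a shared pattern changes results.
assert not torch.equal(shared, per_head)
```

A unit test along these lines (non-uniform heads, batch size greater than one) would catch the uniform-sparsity assumption the review points out.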
Buildkite CI tests failed: Hi @Satyam-53, some Buildkite CI tests have failed. Check the build for details.
If the failure is unrelated to your changes, leave a comment explaining why.
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 PR merge requirements: Wonderful, this rule succeeded.
Pre-commit checks failed: Hi @Satyam-53, the pre-commit checks have failed. To fix them locally:

```shell
# Install pre-commit if you haven't already
uv pip install pre-commit
pre-commit install

# Run all checks and auto-fix what's possible
pre-commit run --all-files
```

After fixing, commit and push the changes. The checks will re-run automatically. For future commits, the installed hook runs these checks automatically.
❌ CI tests failed: @Satyam-53, to see what failed, view all builds for this branch on Buildkite.
If the failure looks unrelated to your changes, comment why and a maintainer will review.
Just left a round of review comments. Also, could you confirm that the e2e pipeline works for BSA?
Yes, I generated videos with the BSA_ATTN backend, and it successfully generates videos on the stock Wan checkpoint "Wan-AI/Wan2.1-T2V-1.3B-Diffusers". Note, however, that BSA is applied only at inference time here; the model itself was trained with full attention.
@Mergifyio rebase
✅ Branch has been successfully rebased
Force-pushed from 4ecb718 to 47fedbc (Compare)
Overall looks great. Could you add an end-to-end test for this inference backend on a specific model that we support and check the generation quality? You could also compare the performance and outputs of this backend against the FA backend, and post them here.
@Eigensystem I added an end-to-end test_bsa_inference.py file at fastvideo/tests/inference/bsa/. Here are the video generation logs from both full and BSA attention.
BSA ATTN -
Full Attention -
Could you post the two generated videos here? BTW, you should rebase onto main.
BSA_ATTN video -
Full Attention video -
Sure, I will rebase from main.
…ion and impl.forward
/merge
@Eigensystem can you please retrigger the failed tests?
/test transformer
Thank you @Satyam-53. Could you also try to implement the training part of BSA?
Yeah, sure @Eigensystem, but I would require GPU resources to work on it.
…ao-ai-lab#1174) Co-authored-by: Satyam Srivastava <satyam53@Mac.lan1> Co-authored-by: Satyam Srivastava <satyam53@Satyams-MacBook-Air.local> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Purpose
Adds Bidirectional Sparse Attention (BSA) as a new attention backend for
training-free inference, implementing the method from
"Bidirectional Sparse Attention for Faster Video Diffusion Training".
Related: #803
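For orientation, the enum-plus-dispatch wiring listed under Changes typically looks like the toy sketch below. The class and function names here are placeholders chosen for illustration, not FastVideo's real signatures.

```python
from enum import Enum

class AttentionBackendEnum(Enum):
    FLASH_ATTN = "FLASH_ATTN"
    BSA_ATTN = "BSA_ATTN"  # member added by this PR (per interface.py)

def get_attn_backend_cls(selected: "AttentionBackendEnum | None") -> str:
    """Return the import path of the chosen backend (paths hypothetical)."""
    if selected == AttentionBackendEnum.BSA_ATTN:
        # New dispatch case (per cuda.py in this PR); class name assumed.
        return "fastvideo.attention.backends.bsa_attn.BSAAttentionBackend"
    return "fastvideo.attention.backends.flash_attn.FlashAttentionBackend"
```

The pipeline and DiT config changes then only need to list `BSA_ATTN` among their supported backends for the dispatch case to be reachable.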
Changes
Files added:
- `fastvideo/attention/backends/bsa_attn.py` — backend, metadata, builder, impl
- `tests/test_bsa.py` — 19 unit tests

Files modified:
- `fastvideo/platforms/interface.py` — added `BSA_ATTN` enum
- `fastvideo/platforms/cuda.py` — added BSA dispatch case
- `fastvideo/pipelines/stages/denoising.py` — added BSA to supported backends
- `fastvideo/configs/models/dits/base.py` — added BSA to supported backends

Test Plan
Test Results
Unit tests: 19/19 passed
End-to-end inference on RTX 4090 with Wan2.1-1.3B:
Selected backend: `AttentionBackendEnum.BSA_ATTN`

Current limitations:
- not optimized for wall-clock speed yet
Follow-up work
Checklist
- Ran `pre-commit run --all-files` and fixed all issues

For model/pipeline changes, also check: