
[Feature] Add BSA (Bidirectional Sparse Attention) inference backend#1174

Merged
mergify[bot] merged 7 commits into hao-ai-lab:main from Satyam-53:bsa-attention on Apr 5, 2026

Conversation

@Satyam-53
Contributor

Purpose

Adds Bidirectional Sparse Attention (BSA) as a new attention backend for
training-free inference, implementing the method from
"Bidirectional Sparse Attention for Faster Video Diffusion Training".

Related: #803
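
For reviewers unfamiliar with the method: BSA prunes redundant query and key-value blocks at block granularity. Below is a rough, illustrative block-sparse sketch of the idea, not the exact BSA algorithm from the paper or the code in this PR; the mean-pooled block scoring and the `kv_keep_ratio` knob are simplifications chosen for clarity.

```python
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, kv_keep_ratio=0.25):
    # q, k, v: [heads, seq, dim]; seq must be divisible by block_size.
    h, s, d = q.shape
    assert s % block_size == 0
    nb = s // block_size
    # Mean-pool each block into a coarse block descriptor.
    qb = q.reshape(h, nb, block_size, d).mean(dim=2)   # [h, nb, d]
    kb = k.reshape(h, nb, block_size, d).mean(dim=2)   # [h, nb, d]
    # Block-level affinity decides which KV blocks each query block keeps.
    affinity = qb @ kb.transpose(-1, -2)               # [h, nb, nb]
    keep = max(1, int(nb * kv_keep_ratio))
    topk = affinity.topk(keep, dim=-1).indices         # [h, nb, keep]
    out = torch.empty_like(q)
    for qi in range(nb):  # reference loop over query blocks (slow but clear)
        q_blk = q[:, qi * block_size:(qi + 1) * block_size]
        # Expand selected block indices into per-token KV indices.
        idx = topk[:, qi]                              # [h, keep]
        gather = (idx[:, :, None] * block_size +
                  torch.arange(block_size, device=q.device)[None, None, :]
                  ).reshape(h, -1)                     # [h, keep * block_size]
        k_sel = torch.take_along_dim(k, gather[:, :, None].expand(-1, -1, d), dim=1)
        v_sel = torch.take_along_dim(v, gather[:, :, None].expand(-1, -1, d), dim=1)
        out[:, qi * block_size:(qi + 1) * block_size] = \
            F.scaled_dot_product_attention(q_blk, k_sel, v_sel)
    return out
```

With `kv_keep_ratio=1.0` every KV block is kept (only permuted), so the output matches dense attention; smaller ratios trade accuracy for fewer attended keys.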

Changes

Files added:

  • fastvideo/attention/backends/bsa_attn.py — backend, metadata, builder, impl
  • tests/test_bsa.py — 19 unit tests

Files modified:

  • fastvideo/platforms/interface.py — added BSA_ATTN enum
  • fastvideo/platforms/cuda.py — added BSA dispatch case
  • fastvideo/pipelines/stages/denoising.py — added BSA to supported backends
  • fastvideo/configs/models/dits/base.py — added BSA to supported backends
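
The integration follows the existing enum-plus-dispatch pattern in the platform layer. A minimal mirror of the shape of those changes is sketched below; the class and function names here are illustrative stand-ins, not the actual FastVideo APIs.

```python
from enum import Enum, auto

class AttentionBackendEnum(Enum):
    # Existing members elided; BSA_ATTN is the member this PR adds.
    FLASH_ATTN = auto()
    TORCH_SDPA = auto()
    BSA_ATTN = auto()

def backend_cls_path(backend: AttentionBackendEnum) -> str:
    # Platform dispatch: map an enum member to a backend class path.
    # The paths below are hypothetical examples of the pattern.
    if backend == AttentionBackendEnum.BSA_ATTN:
        return "fastvideo.attention.backends.bsa_attn.BSAAttentionBackend"
    if backend == AttentionBackendEnum.FLASH_ATTN:
        return "fastvideo.attention.backends.flash_attn.FlashAttentionBackend"
    return "fastvideo.attention.backends.sdpa.SDPABackend"
```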

Test Plan

# Unit tests (Mac, CPU)
python -m pytest tests/test_bsa.py -v

# End-to-end video generation (RTX 4090)
export FASTVIDEO_ATTENTION_BACKEND=BSA_ATTN
python generate.py  # Wan-AI/Wan2.1-T2V-1.3B-Diffusers, 50 steps
FASTVIDEO_ATTENTION_BACKEND=BSA_ATTN python -c "
from fastvideo import VideoGenerator
generator = VideoGenerator.from_pretrained('Wan-AI/Wan2.1-T2V-1.3B-Diffusers', num_gpus=1)
generator.generate_video('A dog walking in a garden', output_path='./output_videos/', save_video=True)
"

Test Results

Unit tests: 19/19 passed

End-to-end inference on RTX 4090 with Wan2.1-1.3B:

  • Backend confirmed active: Selected backend: AttentionBackendEnum.BSA_ATTN
  • 50 denoising steps completed successfully
  • Output video is coherent (dog walking in garden, correct prompt adherence)
  • Generation time: ~10 min (expected — pure PyTorch reference, no kernel optimization)

Current limitations

  • Pure PyTorch reference implementation (loops over query blocks) —
    not optimized for wall-clock speed yet
  • KV mask assumed uniform across batch/head dimension
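
To make the second limitation concrete: if a per-(batch, head) KV block keep-mask were ever produced, the current implementation would need to collapse it to one shared pattern. The helper below is an illustrative sketch of the conservative collapse (keep a block if any batch/head keeps it), not code from this PR.

```python
import torch

def collapse_kv_block_mask(mask: torch.Tensor) -> torch.Tensor:
    """Collapse a [batch, heads, num_kv_blocks] boolean keep-mask into a
    single shared [num_kv_blocks] pattern, as the reference implementation
    assumes. Keeping a block if ANY (batch, head) keeps it preserves
    correctness at the cost of some lost sparsity."""
    return mask.reshape(-1, mask.shape[-1]).any(dim=0)
```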

Follow-up work

  • Replace loop with batched FlashAttention or block-sparse kernel
  • Add BSA-specific CLI args (query_keep_ratio, kv_threshold)
  • Benchmark quality vs full attention (VBench comparison)
  • Training support with differentiable query pruning

Checklist

  • I ran pre-commit run --all-files and fixed all issues
  • I added or updated tests for my changes
  • I updated documentation if needed
  • I considered GPU memory impact of my changes

For model/pipeline changes, also check:

  • N/A — new backend addition, no existing model behavior changed
  • I verified SSIM regression tests pass
  • I updated the support matrix if adding a new model

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates Bidirectional Sparse Attention (BSA) into the FastVideo framework, providing a novel attention backend designed to accelerate video diffusion inference. By selectively pruning redundant queries and key-value blocks, BSA aims to improve computational efficiency without requiring model retraining. This initial implementation serves as a functional reference, laying the groundwork for future performance optimizations.

Highlights

  • New Attention Backend: Introduced Bidirectional Sparse Attention (BSA) as a new attention backend for training-free inference in video diffusion models.
  • Methodology: Implemented the BSA method based on the paper "Bidirectional Sparse Attention for Faster Video Diffusion Training" (arXiv:2509.01085), which sparsifies both queries and key-value pairs.
  • Compatibility: Enabled BSA to work with any model trained with full attention by applying sparsity at inference time, without requiring retraining.




@github-actions github-actions bot left a comment


Welcome to FastVideo! Thanks for your first pull request.

How our CI works:

PRs run a three-tier CI system:

  1. Pre-commit — formatting (yapf), linting (ruff), type checking (mypy). Runs immediately on every PR.
  2. Fastcheck — core GPU tests (encoders, VAEs, transformers, kernels, unit tests). Runs automatically via Buildkite on relevant file changes (~10-15 min).
  3. Full Suite — integration tests, training pipelines, SSIM regression. Runs only when a reviewer adds the ready label.

Before your PR is reviewed:

  • pre-commit run --all-files passes locally
  • You've added or updated tests for your changes
  • The PR description explains what and why

If pre-commit fails, a bot comment will explain how to fix it. Fastcheck and Full Suite results appear in the Checks section below.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new attention backend, Bidirectional Sparse Attention (BSA), as a pure PyTorch reference implementation. The changes include the core BSA logic, integration into the attention backend dispatch system, and a suite of unit tests. My review focuses on the correctness and potential performance issues of the new implementation, as well as the completeness of the tests.

I've identified some critical correctness issues related to assumptions about uniform sparsity patterns across batches and heads, which could lead to incorrect results in more general scenarios. I've also pointed out an opportunity for performance improvement in the reconstruction logic and a minor code safety improvement. Additionally, the test suite could be expanded to cover the core attention computation and multi-batch scenarios to ensure robustness.

Comment thread fastvideo/attention/backends/bsa_attn.py Outdated
Comment thread fastvideo/attention/backends/bsa_attn.py Outdated
Comment thread tests/test_bsa.py
Comment thread fastvideo/attention/backends/bsa_attn.py Outdated
Comment thread fastvideo/attention/backends/bsa_attn.py Outdated
@Satyam-53 Satyam-53 marked this pull request as ready for review March 21, 2026 00:05
@Satyam-53 Satyam-53 marked this pull request as draft March 21, 2026 00:05
@SolitaryThinker SolitaryThinker requested a review from alexzms March 27, 2026 03:08
@alexzms alexzms marked this pull request as ready for review March 27, 2026 03:08
@SolitaryThinker SolitaryThinker added the ready PR is ready to merge label Mar 27, 2026
@mergify
Contributor

mergify bot commented Mar 30, 2026

Buildkite CI tests failed

Hi @Satyam-53, some Buildkite CI tests have failed. Check the build for details:
View Buildkite build →

Common causes:

  • Test failures: Check the failing step's output for assertion errors or tracebacks
  • Import errors: Make sure new dependencies are added to pyproject.toml
  • GPU memory: Some tests require specific GPU types (L40S, H100 NVL)
  • Kernel build: If you changed fastvideo-kernel/, the build may have failed

If the failure is unrelated to your changes, leave a comment explaining why.

@mergify mergify bot added type: feat New feature or capability scope: inference Inference pipeline, serving, CLI scope: attention Attention backends (VSA, STA, Flash, etc.) labels Mar 30, 2026
@mergify
Contributor

mergify bot commented Mar 30, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 PR merge requirements

Wonderful, this rule succeeded.
  • #approved-reviews-by>=1
  • check-success=fastcheck-passed
  • check-success=full-suite-passed
  • check-success~=pre-commit
  • title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model)\]
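The title rule above can be sanity-checked locally before pushing; a quick sketch, assuming Mergify's `~=` operator behaves like Python's `re.search`:

```python
import re

# Regex copied verbatim from the Mergify merge-protection rule.
TITLE_RE = r"(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model)\]"

assert re.search(TITLE_RE, "[Feature] Add BSA (Bidirectional Sparse Attention) inference backend")
assert re.search(TITLE_RE, "[New Model] add support") is not None  # "new.?model" allows a space
assert re.search(TITLE_RE, "Add BSA backend") is None              # missing bracketed type tag
```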

@mergify
Contributor

mergify bot commented Mar 30, 2026

Pre-commit checks failed

Hi @Satyam-53, the pre-commit checks have failed. To fix them locally:

# Install pre-commit if you haven't already
uv pip install pre-commit
pre-commit install

# Run all checks and auto-fix what's possible
pre-commit run --all-files

Common fixes:

  • yapf: yapf -i <file> (formatting)
  • ruff: ruff check --fix <file> (linting)
  • codespell: codespell --write-changes <file> (spelling)

After fixing, commit and push the changes. The checks will re-run automatically.

For future commits, pre-commit will run automatically on changed files before each commit.

@mergify
Contributor

mergify bot commented Mar 30, 2026

Buildkite CI tests failed

Hi @Satyam-53, some Buildkite CI tests have failed. Check the build for details:
View Buildkite build →

Common causes:

  • Test failures: Check the failing step's output for assertion errors or tracebacks
  • Import errors: Make sure new dependencies are added to pyproject.toml
  • GPU memory: Some tests require specific GPU types (L40S, H100 NVL)
  • Kernel build: If you changed fastvideo-kernel/, the build may have failed

If the failure is unrelated to your changes, leave a comment explaining why.

@Eigensystem Eigensystem removed the tests label Mar 30, 2026
@mergify
Contributor

mergify bot commented Mar 30, 2026

❌ CI tests failed

@Satyam-53 — to see what failed:

  1. Scroll to the Checks section below
  2. Find the check marked with ❌ (e.g. buildkite/ci/microscope-transformer-tests)
  3. Click Details to view the full build log

Or view all builds for this branch on Buildkite →

Common causes:

  • Assertion error / test failure — check the failing test's traceback
  • Import error — new dependency missing from pyproject.toml
  • OOM — some tests need specific GPUs (L40S, H100 NVL)

If the failure looks unrelated to your changes, comment why and a maintainer will review.

@alexzms
Collaborator

alexzms commented Mar 31, 2026

Just left a round of review comments; also, could you confirm that the e2e pipeline works for BSA?

@Satyam-53
Contributor Author

> Just left a round of review comments; also, could you confirm that the e2e pipeline works for BSA?

Yes, I generated videos with the BSA_ATTN backend and it successfully generates videos on the stock Wan checkpoint "Wan-AI/Wan2.1-T2V-1.3B-Diffusers". Note that this is inference-time BSA only, applied to a model trained with full attention.

@Eigensystem Eigensystem removed the ready PR is ready to merge label Apr 2, 2026
@Eigensystem
Collaborator

@Mergifyio rebase

@mergify
Contributor

mergify bot commented Apr 2, 2026

rebase

✅ Branch has been successfully rebased

Collaborator

@Eigensystem Eigensystem left a comment


Overall looks great. Could you add an end-to-end test for this inference backend on a specific model that we support and check the generation quality? You could also compare the performance and results of this backend against full attention (FA) and post them here.

@Satyam-53
Contributor Author

> Overall looks great. Could you add an end-to-end test for this inference backend on a specific model that we support and check the generation quality? You could also compare the performance and results of this backend against full attention (FA) and post them here.

@Eigensystem I added an end-to-end test_bsa_inference.py file at fastvideo/tests/inference/bsa/.
The generation quality matches full attention. However, I did not see a speedup in generation, since BSA here is inference-time only and adds its own overhead; the real speedup should come in training and in BSA + distillation inference.

here are the video generation logs from both the full and BSA attention.

BSA ATTN-

Trying FASTVIDEO_ATTENTION_BACKEND=BSA_ATTN
(Worker pid=1749557) INFO 04-04 21:28:18.641 [cuda.py:118] Selected backend: AttentionBackendEnum.BSA_ATTN
(Worker pid=1749557) INFO 04-04 21:28:18.647 [cuda.py:161] Using BSA Attention backend.
(Worker pid=1749557) INFO 04-04 21:28:18.647 [composed_pipeline_base.py:448] Running pipeline stages: dict_keys(['input_validation_stage', 'prompt_encoding_stage', 'conditioning_stage', 'timestep_preparation_stage', 'latent_preparation_stage', 'denoising_stage', 'decoding_stage'])
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [05:11<00:00,  6.23s/it]
(Worker pid=1749557) INFO 04-04 21:33:52.529 [multiproc_executor.py:656] Worker 0 starting event loop...
INFO 04-04 21:33:52.953 [video_generator.py:432] Generated successfully in 334.32 seconds
INFO 04-04 21:33:54.636 [video_generator.py:451] Saved video to outputs_video/bsa_1.3B/A majestic lion strides across the golden savanna, its powerful frame glistening under the warm afte_2.mp4
INFO 04-04 21:33:54.712 [multiproc_executor.py:316] Shutting down MultiprocExecutor...
(Worker pid=1749557) INFO 04-04 21:33:54.713 [gpu_worker.py:80] Worker 0 shutting down...
(Worker pid=1749557) INFO 04-04 21:33:54.749 [gpu_worker.py:89] Worker 0 shutdown complete
(Worker pid=1749557) INFO 04-04 21:33:54.750 [gpu_worker.py:80] Worker 0 shutting down...
(Worker pid=1749557) INFO 04-04 21:33:54.750 [gpu_worker.py:89] Worker 0 shutdown complete
INFO 04-04 21:34:01.905 [multiproc_executor.py:371] MultiprocExecutor shutdown complete

Full Attention-

Trying FASTVIDEO_ATTENTION_BACKEND=None
(Worker pid=1750178) INFO 04-04 21:51:09.743 [cuda.py:118] Selected backend: None
(Worker pid=1750178) INFO 04-04 21:51:09.746 [cuda.py:232] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Make sure that flash_attn was built and installed (on by default).
(Worker pid=1750178) INFO 04-04 21:51:09.746 [cuda.py:239] Using Torch SDPA backend.
(Worker pid=1750178) INFO 04-04 21:51:09.746 [composed_pipeline_base.py:448] Running pipeline stages: dict_keys(['input_validation_stage', 'prompt_encoding_stage', 'conditioning_stage', 'timestep_preparation_stage', 'latent_preparation_stage', 'denoising_stage', 'decoding_stage'])
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [05:05<00:00,  6.11s/it]
(Worker pid=1750178) INFO 04-04 21:56:37.587 [multiproc_executor.py:656] Worker 0 starting event loop...
INFO 04-04 21:56:38.018 [video_generator.py:432] Generated successfully in 328.28 seconds
INFO 04-04 21:56:39.804 [video_generator.py:451] Saved video to outputs_video/bsa_1.3B/A majestic lion strides across the golden savanna, its powerful frame glistening under the warm afte_3.mp4
INFO 04-04 21:56:39.877 [multiproc_executor.py:316] Shutting down MultiprocExecutor...
(Worker pid=1750178) INFO 04-04 21:56:39.878 [gpu_worker.py:80] Worker 0 shutting down...
(Worker pid=1750178) INFO 04-04 21:56:39.899 [gpu_worker.py:89] Worker 0 shutdown complete
(Worker pid=1750178) INFO 04-04 21:56:39.900 [gpu_worker.py:80] Worker 0 shutting down...
(Worker pid=1750178) INFO 04-04 21:56:39.900 [gpu_worker.py:89] Worker 0 shutdown complete
INFO 04-04 21:56:46.796 [multiproc_executor.py:371] MultiprocExecutor shutdown complete

@Satyam-53 Satyam-53 requested a review from Eigensystem April 4, 2026 22:10
@Eigensystem
Collaborator

> @Eigensystem I added an end to end test_bsa_inference.py file at fastvideo/tests/inference/bsa/ …

Could you post the two generated videos here? Also, you should rebase onto main instead of cherry-picking all the commits from main. Thank you.

@Satyam-53
Contributor Author

> Could you post the two generated videos here? Also, you should rebase onto main instead of cherry-picking all the commits from main.

BSA_ATTN video-
https://github.com/user-attachments/assets/1f6b5a84-1bda-42aa-9286-775f2be0a5c3

Full Attention video-
https://github.com/user-attachments/assets/5b0a1b45-3c92-41d5-ba29-762659bec64a

Sure, I will rebase onto main.

@Eigensystem
Collaborator

/merge

@github-actions github-actions bot added the ready PR is ready to merge label Apr 4, 2026
@Eigensystem Eigensystem removed scope: training Training pipeline, methods, configs scope: kernel CUDA kernels, fastvideo-kernel scope: data Data preprocessing, datasets scope: infra CI, tests, Docker, build scope: docs Documentation scope: ui Job Runner UI scope: model Model architecture (DiTs, encoders, VAEs) labels Apr 4, 2026
@mergify mergify bot added the scope: infra CI, tests, Docker, build label Apr 4, 2026
@Satyam-53
Contributor Author

@Eigensystem can you please retrigger the failed tests?

@Eigensystem
Collaborator

/test transformer

@mergify mergify bot merged commit f6e65ff into hao-ai-lab:main Apr 5, 2026
19 checks passed
@Eigensystem
Collaborator

Eigensystem commented Apr 5, 2026

Thank you @Satyam-53 . Could you also try to implement the training part of BSA?

@Satyam-53
Contributor Author

Satyam-53 commented Apr 5, 2026

Yeah, sure @Eigensystem, but I would need GPU resources to work on it.

shijiew555 pushed a commit to Gary-ChenJL/FastVideo that referenced this pull request Apr 8, 2026
…ao-ai-lab#1174)

Co-authored-by: Satyam Srivastava <satyam53@Mac.lan1>
Co-authored-by: Satyam Srivastava <satyam53@Satyams-MacBook-Air.local>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
alexzms pushed a commit to FoundationResearch/FastVideo that referenced this pull request Apr 8, 2026
…ao-ai-lab#1174)

Co-authored-by: Satyam Srivastava <satyam53@Mac.lan1>
Co-authored-by: Satyam Srivastava <satyam53@Satyams-MacBook-Air.local>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

Labels

ready PR is ready to merge scope: attention Attention backends (VSA, STA, Flash, etc.) scope: inference Inference pipeline, serving, CLI scope: infra CI, tests, Docker, build type: feat New feature or capability
