fix: add gate_compress parameter to all attention backends (#817) #1182
KyleNeverGivesUp wants to merge 3 commits into hao-ai-lab:main from
Conversation
Welcome to FastVideo! Thanks for your first pull request.
How our CI works:
PRs run a two-tier CI system:
- Pre-commit — formatting (yapf), linting (ruff), type checking (mypy). Runs immediately on every PR.
- Fastcheck — core GPU tests (encoders, VAEs, transformers, kernels, unit tests). Runs automatically via Buildkite on relevant file changes (~10-15 min).
- Full Suite — integration tests, training pipelines, SSIM regression. Runs only when a reviewer adds the ready label.
Before your PR is reviewed:
- pre-commit run --all-files passes locally
- You've added or updated tests for your changes
- The PR description explains what and why
If pre-commit fails, a bot comment will explain how to fix it. Fastcheck and Full Suite results appear in the Checks section below.
Useful links:
Code Review
This pull request introduces a new optional parameter gate_compress and makes the attn_metadata parameter optional in the forward method signatures across various attention backend implementations. However, making attn_metadata optional without corresponding None checks in the method bodies introduces potential AttributeError issues in sla.py and vmoba.py when attn_metadata is None. The reviewer suggests adding None checks before accessing attn_metadata attributes, or in the case of vmoba.py, considering if attn_metadata should remain non-optional.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d9f00fc717
gate_compress: torch.Tensor | None = None,
attn_metadata: SDPAMetadata | None = None,
Keep attn_metadata as the fourth positional parameter
This signature change makes existing positional call sites pass metadata into gate_compress instead of attn_metadata. For example, fastvideo/attention/layer.py:125,286 and fastvideo/models/dits/ltx2.py:1141,1234 still call forward(q, k, v, ctx_attn_metadata), so after this commit attn_metadata becomes None inside SDPA/Flash/Sage/SLA/VMOBA paths. In masked or variable-length attention flows, that silently drops metadata-derived masking behavior and can produce incorrect attention outputs rather than raising an error.
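The failure mode described above can be demonstrated with a standalone sketch (function names are illustrative, not the real backend code): inserting gate_compress before attn_metadata silently rebinds the fourth positional argument.

```python
# Old signature: metadata is the fourth positional parameter.
def forward_old(q, k, v, attn_metadata=None):
    return attn_metadata

# Changed signature from this commit: gate_compress inserted *before*
# attn_metadata, shifting the fourth positional slot.
def forward_new(q, k, v, gate_compress=None, attn_metadata=None):
    # A legacy positional call forward(q, k, v, ctx_attn_metadata) now binds
    # the metadata object to gate_compress, leaving attn_metadata as None.
    return attn_metadata

meta = {"mask": "causal"}
assert forward_old(1, 2, 3, meta) is meta   # metadata arrives as expected
assert forward_new(1, 2, 3, meta) is None   # metadata silently dropped
```

Because both parameters default to None, no error is raised; the masking behavior derived from the metadata is simply lost.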
Fixed. Moved gate_compress to after attn_metadata in all backend forward signatures, so existing call sites remain unaffected.
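The corrected ordering can be sketched like this (a simplified stand-in for the real backend signatures): appending gate_compress after attn_metadata keeps every existing positional call site working.

```python
# Sketch of the fixed signature: gate_compress appended last, so the fourth
# positional argument still binds to attn_metadata.
def forward(q, k, v, attn_metadata=None, gate_compress=None):
    return attn_metadata, gate_compress

meta = {"mask": "causal"}
gate = "gate-tensor"

# Legacy positional call site: metadata still lands where it should.
assert forward(1, 2, 3, meta) == (meta, None)
# New-style call site passes the added parameter by keyword.
assert forward(1, 2, 3, meta, gate_compress=gate) == (meta, gate)
```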
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🔴 PR merge requirements — this rule is failing.
Buildkite CI tests failed
Hi @KyleNeverGivesUp, some Buildkite CI tests have failed. Check the build for details. Common causes:
If the failure is unrelated to your changes, leave a comment explaining why.

Closing this PR. Issue solved by #1183
Purpose
Fixes #817
Changes
- Added gate_compress: torch.Tensor | None = None parameter to forward() in all attention backends (sdpa, flash_attn, sage_attn, sage_attn3, sla, vmoba, abstract)
- Fixes the TypeError: SDPAImpl.forward() takes 5 positional arguments but 6 were given error when DistributedAttention_VSA is used with non-VSA backends
- The # type: ignore[call-arg] comment in layer.py line 207 confirms this was a known type contract violation

Test Plan
pytest fastvideo/tests/ -k "sparse" -v

Test Results
Unable to run full tests locally due to no NVIDIA GPU available (Mac environment).
The code change is minimal — it only adds gate_compress: torch.Tensor | None = None as an optional parameter to existing forward() signatures, which should not affect existing functionality.
Checklist
- Ran pre-commit run --all-files and fixed all issues