
Kandinsky5 lite dit #1060

Closed
jaisurya27 wants to merge 3 commits into hao-ai-lab:main from jaisurya27:kandinsky5-lite-dit

Conversation

@jaisurya27
Contributor

  • Add Kandinsky5 DiT config + mapping so diffusers weights load cleanly.
  • Implement a FastVideo DiT wrapper around diffusers Kandinsky5Transformer3DModel for parity.
  • Register Kandinsky5 DiT in the model registry.
  • Add transformer‑only parity test comparing FastVideo vs diffusers with fixed inputs.
  • Materialize Kandinsky5 non‑persistent buffers after meta init to avoid meta‑device errors.
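The buffer-materialization step in the last bullet can be sketched as follows. This is a minimal illustration, not the actual FastVideo code: `TinyDiT`, the buffer name `freqs`, and the helper are all made up, and a real hook would recompute the buffer values rather than just allocate storage.

```python
import torch
import torch.nn as nn

class TinyDiT(nn.Module):
    """Stand-in module with a non-persistent buffer (e.g. RoPE frequencies)."""
    def __init__(self) -> None:
        super().__init__()
        # Non-persistent buffers are excluded from the state_dict, so weight
        # loading never fills them in and they stay on the meta device.
        self.register_buffer("freqs", torch.arange(4, dtype=torch.float32),
                             persistent=False)

def materialize_non_persistent_buffers(module: nn.Module,
                                       device: str = "cpu") -> None:
    # Replace every buffer still on the meta device with real storage.
    # (A real implementation would also recompute the buffer contents.)
    for sub in module.modules():
        for name, buf in list(sub.named_buffers(recurse=False)):
            if buf.is_meta:
                sub._buffers[name] = torch.empty_like(buf, device=device)

with torch.device("meta"):        # meta init: shapes only, no storage
    model = TinyDiT()
assert model.freqs.is_meta        # any op touching data would error here

materialize_non_persistent_buffers(model)
assert not model.freqs.is_meta    # buffer now has real (CPU) storage
```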

@gemini-code-assist
Contributor

Summary of Changes

Hello @jaisurya27, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the Kandinsky5 Diffusion Transformer (DiT) model into the FastVideo framework. It provides the necessary configuration, a compatibility wrapper for Diffusers weights, and ensures correct model registration and buffer handling. A dedicated parity test has been added to validate the new integration, alongside updates to core dependencies to support these changes.

Highlights

  • Kandinsky5 DiT Integration: Added configuration and mapping for the Kandinsky5 Diffusion Transformer (DiT) model, enabling its use within the FastVideo framework.
  • FastVideo Wrapper: Implemented a FastVideo DiT wrapper around diffusers.Kandinsky5Transformer3DModel to ensure clean loading of Diffusers weights and compatibility.
  • Model Registry Update: Registered the new Kandinsky5 DiT model within the FastVideo model registry, making it discoverable and usable.
  • Parity Testing: Introduced a transformer-only parity test to compare the FastVideo implementation against the original Diffusers model with fixed inputs.
  • Buffer Materialization: Ensured non-persistent buffers for Kandinsky5 are materialized after meta-initialization to prevent meta-device errors during model loading.
  • Dependency Updates: Updated Flash Attention wheel URLs in Dockerfiles and increased the minimum PyTorch version requirement to 2.10.0.


Changelog
  • docker/Dockerfile.python3.10
    • Updated Flash Attention wheel URL to v0.7.16 for cu128torch2.10.
  • docker/Dockerfile.python3.11
    • Updated Flash Attention wheel URL to v0.7.16 for cu128torch2.10.
  • docker/Dockerfile.python3.12
    • Updated Flash Attention wheel URL to v0.7.16 for cu128torch2.10.
  • fastvideo/configs/models/dits/__init__.py
    • Added Kandinsky5VideoConfig to the list of exported DiT configurations.
  • fastvideo/configs/models/dits/kandinsky5.py
    • Introduced new configuration classes (Kandinsky5ArchConfig, Kandinsky5VideoConfig) to define the architecture and specific settings for the Kandinsky5 DiT model, including FSDP sharding and parameter mapping.
  • fastvideo/models/dits/kandinsky5.py
    • Implemented the Kandinsky5Transformer3DModel class, which acts as a FastVideo wrapper for the Diffusers Kandinsky5 model, handling its forward pass and materializing non-persistent buffers.
  • fastvideo/models/loader/fsdp_load.py
    • Modified the FSDP model loading logic to call materialize_non_persistent_buffers if available on the loaded model, addressing meta-device initialization issues.
  • fastvideo/models/registry.py
    • Registered the new Kandinsky5Transformer3DModel in the _DIT_MODELS dictionary, making it discoverable by the framework.
  • pyproject.toml
    • Updated the minimum required PyTorch version to torch>=2.10.0.
  • pyproject_other.toml
    • Updated the minimum required PyTorch version to torch>=2.10.0.
  • tests/local_tests/transformers/test_kandinsky5_lite_transformer_parity.py
    • Added a comprehensive test to ensure numerical parity between the FastVideo Kandinsky5 implementation and the original Diffusers model.
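The parity-test pattern described in the last changelog entry can be sketched like this, using a stand-in `nn.Linear` in place of the real transformer pair; the variable names are illustrative, not the actual test code.

```python
import torch
from torch.testing import assert_close

torch.manual_seed(0)
ref = torch.nn.Linear(8, 8)           # stands in for the diffusers model
fv = torch.nn.Linear(8, 8)            # stands in for the FastVideo wrapper
fv.load_state_dict(ref.state_dict())  # identical weights, like a mapped checkpoint

x = torch.randn(2, 8)                 # fixed inputs
with torch.no_grad():
    ref_out, fv_out = ref(x), fv(x)

# Same assertions the real parity test applies to the transformer outputs.
assert ref_out.shape == fv_out.shape
assert ref_out.dtype == fv_out.dtype
assert_close(ref_out, fv_out, atol=1e-4, rtol=1e-4)
```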
Activity
  • The pull request was initiated by jaisurya27.
  • The author provided a detailed description outlining the key changes and their purpose.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the Kandinsky5 DiT model, including configuration, a model wrapper, and a parity test. The changes also update the torch dependency and corresponding pre-built wheels. The overall implementation is good, but I've identified a couple of high-severity bugs in the new model wrapper and its test, along with a medium-severity suggestion for code improvement. Please see my detailed comments.

Comment on lines +105 to +107
if return_dict:
    return outputs
return outputs.sample if hasattr(outputs, "sample") else outputs


Severity: high

The forward method's return type hint is torch.Tensor, but the current implementation can return a tuple, which violates the base class contract. When return_dict=False, the underlying diffusers model returns a tuple (sample,), and your code returns this tuple directly. This will likely cause issues downstream. The logic should be adjusted to always return a tensor, by extracting it from the output object or tuple.

        if hasattr(outputs, "sample"):
            return outputs.sample
        return outputs[0]

Comment on lines +154 to +156
assert ref_out.shape == fv_out.shape
assert ref_out.dtype == fv_out.dtype
assert_close(ref_out, fv_out, atol=1e-4, rtol=1e-4)


Severity: high

These assertions will fail because both the reference diffusers model and the current fastvideo wrapper implementation return a tuple (sample,) when return_dict=False. You are attempting to access attributes like .shape and .dtype on a tuple, which will raise an AttributeError. You need to extract the tensor from the tuple for both model outputs before performing the assertions.

Suggested change
-assert ref_out.shape == fv_out.shape
-assert ref_out.dtype == fv_out.dtype
-assert_close(ref_out, fv_out, atol=1e-4, rtol=1e-4)
+ref_out_tensor = ref_out[0]
+fv_out_tensor = fv_out[0]
+assert ref_out_tensor.shape == fv_out_tensor.shape
+assert ref_out_tensor.dtype == fv_out_tensor.dtype
+assert_close(ref_out_tensor, fv_out_tensor, atol=1e-4, rtol=1e-4)

Comment on lines +15 to +22
_fsdp_shard_conditions = Kandinsky5VideoConfig()._fsdp_shard_conditions
_compile_conditions = Kandinsky5VideoConfig()._compile_conditions
param_names_mapping = Kandinsky5VideoConfig().param_names_mapping
reverse_param_names_mapping = Kandinsky5VideoConfig().reverse_param_names_mapping
lora_param_names_mapping = Kandinsky5VideoConfig().lora_param_names_mapping
_supported_attention_backends = Kandinsky5VideoConfig()._supported_attention_backends


Severity: medium

Creating a new Kandinsky5VideoConfig instance for each class attribute is inefficient and can be simplified. It's better to create a single instance and reuse it for all attributes. This improves readability and avoids unnecessary object creation at module load time.

_config = Kandinsky5VideoConfig()
_fsdp_shard_conditions = _config._fsdp_shard_conditions
_compile_conditions = _config._compile_conditions
param_names_mapping = _config.param_names_mapping
reverse_param_names_mapping = _config.reverse_param_names_mapping
lora_param_names_mapping = _config.lora_param_names_mapping
_supported_attention_backends = _config._supported_attention_backends

@SolitaryThinker
Collaborator

Hi, thanks! Initial progress looks good. Could you port the DiT implementation directly into FastVideo rather than importing from diffusers? Thanks!

@github-actions

github-actions bot commented Apr 6, 2026

This pull request has been automatically marked as stale because it has not had any activity within 60 days. It will be automatically closed if no further activity occurs within 14 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale Inactive — will auto-close soon label Apr 6, 2026
@mergify mergify bot added scope: infra CI, tests, Docker, build scope: model Model architecture (DiTs, encoders, VAEs) labels Apr 6, 2026
@mergify

mergify bot commented Apr 6, 2026

⚠️ PR title format required

Your PR title must start with a type tag in brackets. Examples:

  • [feat] Add new model support
  • [bugfix] Fix VAE tiling corruption
  • [refactor] Restructure training pipeline
  • [perf] Optimize attention kernel
  • [ci] Update test infrastructure
  • [docs] Add inference guide
  • [misc] Clean up configs
  • [new-model] Port Flux2 to FastVideo

Valid tags: feat, feature, bugfix, fix, refactor, perf, ci, doc, docs, misc, chore, kernel, new-model

Please update your PR title and the merge protection check will pass automatically.

@mergify

mergify bot commented Apr 6, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

This rule is failing.
  • #approved-reviews-by>=1
  • check-success=fastcheck-passed
  • check-success=full-suite-passed
  • check-success~=pre-commit
  • title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model)\]

@github-actions github-actions bot added unstale Reactivated after being stale and removed stale Inactive — will auto-close soon labels Apr 7, 2026
@mergify

mergify bot commented Apr 7, 2026

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

@mergify mergify bot added the needs-rebase PR has merge conflicts label Apr 7, 2026
@jaisurya27 jaisurya27 closed this Apr 7, 2026