
Kandinsky5 lite dit #1060

Closed
jaisurya27 wants to merge 3 commits into hao-ai-lab:main from jaisurya27:kandinsky5-lite-dit

Conversation

@jaisurya27
Contributor

  • Add Kandinsky5 DiT config + mapping so diffusers weights load cleanly.
  • Implement a FastVideo DiT wrapper around diffusers Kandinsky5Transformer3DModel for parity.
  • Register Kandinsky5 DiT in the model registry.
  • Add transformer‑only parity test comparing FastVideo vs diffusers with fixed inputs.
  • Materialize Kandinsky5 non‑persistent buffers after meta init to avoid meta‑device errors.
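The buffer-materialization step in the last bullet can be sketched as follows. This is a minimal illustration, not the actual FastVideo code: `TinyDiT`, the buffer name `freqs`, and the helper are all made up, and a real hook would recompute the buffer values rather than just allocate storage.

```python
import torch
import torch.nn as nn

class TinyDiT(nn.Module):
    """Stand-in module with a non-persistent buffer (e.g. RoPE frequencies)."""
    def __init__(self) -> None:
        super().__init__()
        # Non-persistent buffers are excluded from the state_dict, so weight
        # loading never fills them in and they stay on the meta device.
        self.register_buffer("freqs", torch.arange(4, dtype=torch.float32),
                             persistent=False)

def materialize_non_persistent_buffers(module: nn.Module,
                                       device: str = "cpu") -> None:
    # Replace every buffer still on the meta device with real storage.
    # (A real implementation would also recompute the buffer contents.)
    for sub in module.modules():
        for name, buf in list(sub.named_buffers(recurse=False)):
            if buf.is_meta:
                sub._buffers[name] = torch.empty_like(buf, device=device)

with torch.device("meta"):        # meta init: shapes only, no storage
    model = TinyDiT()
assert model.freqs.is_meta        # any op touching data would error here

materialize_non_persistent_buffers(model)
assert not model.freqs.is_meta    # buffer now has real (CPU) storage
```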

@gemini-code-assist
Contributor

Summary of Changes

Hello @jaisurya27, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the Kandinsky5 Diffusion Transformer (DiT) model into the FastVideo framework. It provides the necessary configuration, a compatibility wrapper for Diffusers weights, and ensures correct model registration and buffer handling. A dedicated parity test has been added to validate the new integration, alongside updates to core dependencies to support these changes.

Highlights

  • Kandinsky5 DiT Integration: Added configuration and mapping for the Kandinsky5 Diffusion Transformer (DiT) model, enabling its use within the FastVideo framework.
  • FastVideo Wrapper: Implemented a FastVideo DiT wrapper around diffusers.Kandinsky5Transformer3DModel to ensure clean loading of Diffusers weights and compatibility.
  • Model Registry Update: Registered the new Kandinsky5 DiT model within the FastVideo model registry, making it discoverable and usable.
  • Parity Testing: Introduced a transformer-only parity test to compare the FastVideo implementation against the original Diffusers model with fixed inputs.
  • Buffer Materialization: Ensured non-persistent buffers for Kandinsky5 are materialized after meta-initialization to prevent meta-device errors during model loading.
  • Dependency Updates: Updated Flash Attention wheel URLs in Dockerfiles and increased the minimum PyTorch version requirement to 2.10.0.


Changelog
  • docker/Dockerfile.python3.10
    • Updated Flash Attention wheel URL to v0.7.16 for cu128torch2.10.
  • docker/Dockerfile.python3.11
    • Updated Flash Attention wheel URL to v0.7.16 for cu128torch2.10.
  • docker/Dockerfile.python3.12
    • Updated Flash Attention wheel URL to v0.7.16 for cu128torch2.10.
  • fastvideo/configs/models/dits/__init__.py
    • Added Kandinsky5VideoConfig to the list of exported DiT configurations.
  • fastvideo/configs/models/dits/kandinsky5.py
    • Introduced new configuration classes (Kandinsky5ArchConfig, Kandinsky5VideoConfig) to define the architecture and specific settings for the Kandinsky5 DiT model, including FSDP sharding and parameter mapping.
  • fastvideo/models/dits/kandinsky5.py
    • Implemented the Kandinsky5Transformer3DModel class, which acts as a FastVideo wrapper for the Diffusers Kandinsky5 model, handling its forward pass and materializing non-persistent buffers.
  • fastvideo/models/loader/fsdp_load.py
    • Modified the FSDP model loading logic to call materialize_non_persistent_buffers if available on the loaded model, addressing meta-device initialization issues.
  • fastvideo/models/registry.py
    • Registered the new Kandinsky5Transformer3DModel in the _DIT_MODELS dictionary, making it discoverable by the framework.
  • pyproject.toml
    • Updated the minimum required PyTorch version to torch>=2.10.0.
  • pyproject_other.toml
    • Updated the minimum required PyTorch version to torch>=2.10.0.
  • tests/local_tests/transformers/test_kandinsky5_lite_transformer_parity.py
    • Added a comprehensive test to ensure numerical parity between the FastVideo Kandinsky5 implementation and the original Diffusers model.
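The parity-test pattern described in the last changelog entry can be sketched like this, using a stand-in `nn.Linear` in place of the real transformer pair; the variable names are illustrative, not the actual test code.

```python
import torch
from torch.testing import assert_close

torch.manual_seed(0)
ref = torch.nn.Linear(8, 8)           # stands in for the diffusers model
fv = torch.nn.Linear(8, 8)            # stands in for the FastVideo wrapper
fv.load_state_dict(ref.state_dict())  # identical weights, like a mapped checkpoint

x = torch.randn(2, 8)                 # fixed inputs
with torch.no_grad():
    ref_out, fv_out = ref(x), fv(x)

# Same assertions the real parity test applies to the transformer outputs.
assert ref_out.shape == fv_out.shape
assert ref_out.dtype == fv_out.dtype
assert_close(ref_out, fv_out, atol=1e-4, rtol=1e-4)
```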
Activity
  • The pull request was initiated by jaisurya27.
  • The author provided a detailed description outlining the key changes and their purpose.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the Kandinsky5 DiT model, including configuration, a model wrapper, and a parity test. The changes also update the torch dependency and corresponding pre-built wheels. The overall implementation is good, but I've identified a couple of high-severity bugs in the new model wrapper and its test, along with a medium-severity suggestion for code improvement. Please see my detailed comments.

Comment on lines +105 to +107
if return_dict:
    return outputs
return outputs.sample if hasattr(outputs, "sample") else outputs


Severity: high

The forward method's return type hint is torch.Tensor, but the current implementation can return a tuple, which violates the base class contract. When return_dict=False, the underlying diffusers model returns a tuple (sample,), and your code returns this tuple directly. This will likely cause issues downstream. The logic should be adjusted to always return a tensor, by extracting it from the output object or tuple.

        if hasattr(outputs, "sample"):
            return outputs.sample
        return outputs[0]

Comment on lines +154 to +156
assert ref_out.shape == fv_out.shape
assert ref_out.dtype == fv_out.dtype
assert_close(ref_out, fv_out, atol=1e-4, rtol=1e-4)


Severity: high

These assertions will fail because both the reference diffusers model and the current fastvideo wrapper implementation return a tuple (sample,) when return_dict=False. You are attempting to access attributes like .shape and .dtype on a tuple, which will raise an AttributeError. You need to extract the tensor from the tuple for both model outputs before performing the assertions.

Suggested change
-assert ref_out.shape == fv_out.shape
-assert ref_out.dtype == fv_out.dtype
-assert_close(ref_out, fv_out, atol=1e-4, rtol=1e-4)
+ref_out_tensor = ref_out[0]
+fv_out_tensor = fv_out[0]
+assert ref_out_tensor.shape == fv_out_tensor.shape
+assert ref_out_tensor.dtype == fv_out_tensor.dtype
+assert_close(ref_out_tensor, fv_out_tensor, atol=1e-4, rtol=1e-4)

Comment on lines +15 to +22
_fsdp_shard_conditions = Kandinsky5VideoConfig()._fsdp_shard_conditions
_compile_conditions = Kandinsky5VideoConfig()._compile_conditions
param_names_mapping = Kandinsky5VideoConfig().param_names_mapping
reverse_param_names_mapping = Kandinsky5VideoConfig().reverse_param_names_mapping
lora_param_names_mapping = Kandinsky5VideoConfig().lora_param_names_mapping
_supported_attention_backends = Kandinsky5VideoConfig()._supported_attention_backends


Severity: medium

Creating a new Kandinsky5VideoConfig instance for each class attribute is inefficient and can be simplified. It's better to create a single instance and reuse it for all attributes. This improves readability and avoids unnecessary object creation at module load time.

_config = Kandinsky5VideoConfig()
_fsdp_shard_conditions = _config._fsdp_shard_conditions
_compile_conditions = _config._compile_conditions
param_names_mapping = _config.param_names_mapping
reverse_param_names_mapping = _config.reverse_param_names_mapping
lora_param_names_mapping = _config.lora_param_names_mapping
_supported_attention_backends = _config._supported_attention_backends

@SolitaryThinker
Collaborator

Hi, thanks! Initial progress looks good. Could you port the DiT implementation directly into FastVideo rather than importing from diffusers? Thanks!

@github-actions

github-actions bot commented Apr 6, 2026

This pull request has been automatically marked as stale because it has not had any activity within 60 days. It will be automatically closed if no further activity occurs within 14 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale Inactive — will auto-close soon label Apr 6, 2026
@mergify mergify bot added scope: infra CI, tests, Docker, build scope: model Model architecture (DiTs, encoders, VAEs) labels Apr 6, 2026
@mergify

mergify bot commented Apr 6, 2026

⚠️ PR title format required

Your PR title must start with a type tag in brackets. Examples:

  • [feat] Add new model support
  • [bugfix] Fix VAE tiling corruption
  • [refactor] Restructure training pipeline
  • [perf] Optimize attention kernel
  • [ci] Update test infrastructure
  • [docs] Add inference guide
  • [misc] Clean up configs
  • [new-model] Port Flux2 to FastVideo

Valid tags: feat, feature, bugfix, fix, refactor, perf, ci, doc, docs, misc, chore, kernel, new-model

Please update your PR title and the merge protection check will pass automatically.

@mergify

mergify bot commented Apr 6, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

This rule is failing.
  • #approved-reviews-by>=1
  • check-success=fastcheck-passed
  • check-success=full-suite-passed
  • check-success~=pre-commit
  • title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model)\]

@github-actions github-actions bot added unstale Reactivated after being stale and removed stale Inactive — will auto-close soon labels Apr 7, 2026
@mergify

mergify bot commented Apr 7, 2026

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

@mergify mergify bot added the needs-rebase PR has merge conflicts label Apr 7, 2026
@jaisurya27 jaisurya27 closed this Apr 7, 2026