Conversation
jaisurya27
commented
Feb 4, 2026
- Add Kandinsky5 DiT config + mapping so diffusers weights load cleanly.
- Implement a FastVideo DiT wrapper around diffusers Kandinsky5Transformer3DModel for parity.
- Register Kandinsky5 DiT in the model registry.
- Add transformer-only parity test comparing FastVideo vs diffusers with fixed inputs.
- Materialize Kandinsky5 non-persistent buffers after meta init to avoid meta-device errors.
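The registration step in the bullets above usually follows a decorator-based registry pattern. The sketch below is a generic, dependency-free illustration of that idea; `MODEL_REGISTRY`, `register_model`, and `Kandinsky5DiTWrapper` are hypothetical stand-ins, not FastVideo's actual API:

```python
# Generic model-registry sketch (names are assumptions, not FastVideo's API).
MODEL_REGISTRY: dict[str, type] = {}

def register_model(name: str):
    """Class decorator that records a wrapper class under the given name."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

@register_model("Kandinsky5Transformer3DModel")
class Kandinsky5DiTWrapper:
    """Stand-in for the FastVideo DiT wrapper class."""
    pass

# Lookup by name returns the registered class.
assert MODEL_REGISTRY["Kandinsky5Transformer3DModel"] is Kandinsky5DiTWrapper
```

A loader can then resolve the architecture name found in a checkpoint's config to the wrapper class via a single dictionary lookup.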
Summary of Changes

Hello @jaisurya27, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request integrates the Kandinsky5 Diffusion Transformer (DiT) model into the FastVideo framework. It provides the necessary configuration, a compatibility wrapper for Diffusers weights, and ensures correct model registration and buffer handling. A dedicated parity test has been added to validate the new integration, alongside updates to core dependencies to support these changes.
Code Review
This pull request adds support for the Kandinsky5 DiT model, including configuration, a model wrapper, and a parity test. The changes also update the torch dependency and corresponding pre-built wheels. The overall implementation is good, but I've identified a couple of high-severity bugs in the new model wrapper and its test, along with a medium-severity suggestion for code improvement. Please see my detailed comments.
```python
if return_dict:
    return outputs
return outputs.sample if hasattr(outputs, "sample") else outputs
```
The forward method's return type hint is torch.Tensor, but the current implementation can return a tuple, which violates the base class contract. When return_dict=False, the underlying diffusers model returns a tuple (sample,), and your code returns this tuple directly. This will likely cause issues downstream. The logic should be adjusted to always return a tensor, by extracting it from the output object or tuple.
```python
if hasattr(outputs, "sample"):
    return outputs.sample
return outputs[0]
```

```python
assert ref_out.shape == fv_out.shape
assert ref_out.dtype == fv_out.dtype
assert_close(ref_out, fv_out, atol=1e-4, rtol=1e-4)
```
These assertions will fail because both the reference diffusers model and the current fastvideo wrapper implementation return a tuple (sample,) when return_dict=False. You are attempting to access attributes like .shape and .dtype on a tuple, which will raise an AttributeError. You need to extract the tensor from the tuple for both model outputs before performing the assertions.
Suggested change:

```diff
-assert ref_out.shape == fv_out.shape
-assert ref_out.dtype == fv_out.dtype
-assert_close(ref_out, fv_out, atol=1e-4, rtol=1e-4)
+ref_out_tensor = ref_out[0]
+fv_out_tensor = fv_out[0]
+assert ref_out_tensor.shape == fv_out_tensor.shape
+assert ref_out_tensor.dtype == fv_out_tensor.dtype
+assert_close(ref_out_tensor, fv_out_tensor, atol=1e-4, rtol=1e-4)
```
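Both review comments reduce to the same normalization step: unwrap the sample before using it. Here is a dependency-free sketch of that pattern; `FakeTransformerOutput` and `unwrap_sample` are hypothetical stand-ins (the real code operates on torch tensors and diffusers output dataclasses):

```python
from dataclasses import dataclass

@dataclass
class FakeTransformerOutput:
    # stands in for diffusers' output dataclass, which exposes .sample
    sample: list

def unwrap_sample(outputs):
    """Return the bare sample whether the model handed back an output
    dataclass, a (sample,) tuple, or the sample itself."""
    if hasattr(outputs, "sample"):
        return outputs.sample
    if isinstance(outputs, tuple):
        return outputs[0]
    return outputs

sample = [[0.1, 0.2]]
assert unwrap_sample(FakeTransformerOutput(sample=sample)) is sample
assert unwrap_sample((sample,)) is sample
assert unwrap_sample(sample) is sample
```

Applying such a helper in both the wrapper's `forward` and the test would satisfy the tensor-only return contract and make the assertions valid regardless of `return_dict`.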
```python
_fsdp_shard_conditions = Kandinsky5VideoConfig()._fsdp_shard_conditions
_compile_conditions = Kandinsky5VideoConfig()._compile_conditions
param_names_mapping = Kandinsky5VideoConfig().param_names_mapping
reverse_param_names_mapping = Kandinsky5VideoConfig().reverse_param_names_mapping
lora_param_names_mapping = Kandinsky5VideoConfig().lora_param_names_mapping
_supported_attention_backends = Kandinsky5VideoConfig()._supported_attention_backends
```
Creating a new Kandinsky5VideoConfig instance for each class attribute is inefficient and can be simplified. It's better to create a single instance and reuse it for all attributes. This improves readability and avoids unnecessary object creation at module load time.
```python
_config = Kandinsky5VideoConfig()
_fsdp_shard_conditions = _config._fsdp_shard_conditions
_compile_conditions = _config._compile_conditions
param_names_mapping = _config.param_names_mapping
reverse_param_names_mapping = _config.reverse_param_names_mapping
lora_param_names_mapping = _config.lora_param_names_mapping
_supported_attention_backends = _config._supported_attention_backends
```
Hi, thanks! Initial progress looks good. Could you port the DiT implementation directly into FastVideo rather than importing it from diffusers? Thanks!
Force-pushed from afdea2e to 5e886c8.
This pull request has been automatically marked as stale because it has not had any activity within 60 days. It will be automatically closed if no further activity occurs within 14 days. Leave a comment if you feel this pull request should remain open. Thank you!
Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements: this rule is failing.
This PR has merge conflicts with the base branch. Please rebase:

```shell
git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease
```