Code Review
This pull request introduces support for MatrixGame 3.0, adding new DiT architectures, action modules, and I2V pipelines while refactoring existing MatrixGame 2.0 components for clarity. Feedback identifies a critical import mismatch for extrinsic builders and a batching issue where only the first item's actions are used for camera trajectories. Performance concerns were raised regarding inefficient VAE device transfers and redundant timestep embedding computations. Additionally, the review suggests using modulo for angle normalization, warns against silent tensor cropping, and notes potential compatibility regressions in the VAE normalization logic.
```python
batch.keyboard_cond = batch.keyboard_cond.to(device=device, dtype=target_dtype)
batch.mouse_cond = batch.mouse_cond.to(device=device, dtype=target_dtype)

extrinsics_all = build_matrixgame3_extrinsics_from_actions(batch.keyboard_cond[0], batch.mouse_cond[0]).to(device)
```
The camera extrinsics are computed using only the first item in the batch (batch.keyboard_cond[0]), which means all items in a batch will share the same camera trajectory even if they have different action inputs. This will lead to incorrect results for batch sizes greater than 1 when different actions are provided for each batch item.
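A minimal sketch of a batched fix, assuming `build_matrixgame3_extrinsics_from_actions` accepts a single item's keyboard/mouse tensors and returns that item's extrinsics (the helper name `build_batched_extrinsics` and the `build_fn` parameter are hypothetical, used here to keep the sketch self-contained):

```python
import torch

def build_batched_extrinsics(keyboard_cond, mouse_cond, device, build_fn):
    """Compute one extrinsics trajectory per batch item and stack them,
    instead of reusing item 0's trajectory for the whole batch."""
    per_item = [
        build_fn(keyboard_cond[i], mouse_cond[i]).to(device)
        for i in range(keyboard_cond.shape[0])
    ]
    return torch.stack(per_item, dim=0)  # (B, ...) batched extrinsics
```

If `build_fn` itself supports batched inputs, a single vectorized call would be preferable to the Python loop.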
```python
self.vae = self.vae.to(get_local_torch_device())

image = batch.pil_image
if isinstance(image, torch.Tensor):
    if image.dim() == 5:
        image = image[:, :, :1]
    elif image.dim() == 4:
        image = image.unsqueeze(2)
    else:
        raise ValueError(f"Unexpected tensor dimensions for MatrixGame3 image: {image.shape}")
    video_condition = image.to(get_local_torch_device(), dtype=torch.float32)
else:
    image = self.preprocess(image,
                            vae_scale_factor=self.vae.spatial_compression_ratio,
                            height=height,
                            width=width).to(get_local_torch_device(), dtype=torch.float32)
    video_condition = image.unsqueeze(2).to(get_local_torch_device(), dtype=torch.float32)

vae_dtype = PRECISION_TO_TYPE[fastvideo_args.pipeline_config.vae_precision]
vae_autocast_enabled = (vae_dtype != torch.float32) and not fastvideo_args.disable_autocast

with torch.autocast(device_type="cuda", dtype=vae_dtype, enabled=vae_autocast_enabled):
    if fastvideo_args.pipeline_config.vae_tiling:
        self.vae.enable_tiling()
    if not vae_autocast_enabled:
        video_condition = video_condition.to(vae_dtype)
    encoder_output = self.vae.encode(video_condition)

img_cond = encoder_output.mode()

if (hasattr(self.vae.config, 'latents_mean') and hasattr(self.vae.config, 'latents_std')):
    latents_mean = torch.tensor(self.vae.config.latents_mean, device=img_cond.device,
                                dtype=img_cond.dtype).view(1, -1, 1, 1, 1)
    latents_std = torch.tensor(self.vae.config.latents_std, device=img_cond.device,
                               dtype=img_cond.dtype).view(1, -1, 1, 1, 1)
    img_cond = (img_cond - latents_mean) / latents_std
elif (hasattr(self.vae, "shift_factor") and self.vae.shift_factor is not None):
    if isinstance(self.vae.shift_factor, torch.Tensor):
        img_cond -= self.vae.shift_factor.to(img_cond.device, img_cond.dtype)
    else:
        img_cond -= self.vae.shift_factor

if hasattr(self.vae, 'scaling_factor'):
    if isinstance(self.vae.scaling_factor, torch.Tensor):
        img_cond = img_cond * self.vae.scaling_factor.to(img_cond.device, img_cond.dtype)
    else:
        img_cond = img_cond * self.vae.scaling_factor

batch.image_latent = img_cond

if hasattr(self, 'maybe_free_model_hooks'):
    self.maybe_free_model_hooks()

self.vae.to("cpu")
```
Moving the VAE model to the GPU and back to CPU within the forward method is extremely inefficient. This triggers heavy synchronization and PCIe transfers for every batch, significantly impacting inference performance. Model placement should be managed at the pipeline level or via a dedicated component offloader.
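One hedged sketch of pipeline-level placement (the `VAEPlacement` class and `cpu_offload` flag are hypothetical; the point is that the per-forward `.to(device)` / `.to("cpu")` round-trip becomes opt-in rather than unconditional):

```python
class VAEPlacement:
    """Keep the VAE resident on its device by default; only round-trip
    to CPU when offloading is explicitly requested."""

    def __init__(self, vae, device, cpu_offload=False):
        self.vae = vae
        self.device = device
        self.cpu_offload = cpu_offload
        if not cpu_offload:
            self.vae.to(device)  # one transfer for the pipeline's lifetime

    def __enter__(self):
        if self.cpu_offload:
            self.vae.to(self.device)  # transfer only when offloading is on
        return self.vae

    def __exit__(self, *exc):
        if self.cpu_offload:
            self.vae.to("cpu")
        return False
```

The stage's forward would then do `with self.vae_placement as vae: ...`, and memory-constrained users pay the transfer cost only when they ask for it.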
```python
while new_yaw > 180:
    new_yaw -= 360
while new_yaw < -180:
    new_yaw += 360
```
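The loops can be replaced with a single modulo, as the summary suggests. A sketch (note one edge-case assumption: this maps exactly 180 to -180, whereas the loop version leaves 180 unchanged; pick the convention the rest of the codebase expects):

```python
def wrap_yaw(yaw: float) -> float:
    """Normalize an angle in degrees to the half-open interval [-180, 180)."""
    return (yaw + 180.0) % 360.0 - 180.0
```

Unlike the loops, this is constant-time regardless of how far the accumulated yaw has drifted.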
```python
if noise_pred.shape != current_latents.shape:
    aligned_t = min(noise_pred.shape[2], current_latents.shape[2])
    aligned_h = min(noise_pred.shape[3], current_latents.shape[3])
    aligned_w = min(noise_pred.shape[4], current_latents.shape[4])
    logger.warning(
        "Aligning noise/sample shapes before scheduler.step: noise=%s sample=%s -> (*,*,%d,%d,%d)",
        tuple(noise_pred.shape),
        tuple(current_latents.shape),
        aligned_t,
        aligned_h,
        aligned_w,
    )
    noise_pred = noise_pred[:, :, :aligned_t, :aligned_h, :aligned_w]
    current_latents = current_latents[:, :, :aligned_t, :aligned_h, :aligned_w]
```
Silently cropping the noise_pred or current_latents when shapes mismatch can hide underlying issues with model configuration, patch alignment, or padding logic. It is recommended to ensure consistent shapes through proper padding within the model wrapper or by investigating the root cause of the mismatch to avoid data loss at the edges of the video.
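A fail-fast alternative, as a sketch (the helper name `check_step_shapes` is hypothetical; the idea is to surface patch-alignment or padding bugs immediately instead of cropping data away):

```python
def check_step_shapes(noise_pred, current_latents):
    """Raise before scheduler.step if model output and sample disagree."""
    if noise_pred.shape != current_latents.shape:
        raise RuntimeError(
            f"noise_pred {tuple(noise_pred.shape)} and latents "
            f"{tuple(current_latents.shape)} disagree before scheduler.step; "
            "check patch-size alignment and padding in the model wrapper.")
```

If the mismatch turns out to be a legitimate, fixed patching artifact, padding the latents up to the model's patch multiple in the wrapper is still preferable to cropping the prediction.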
```python
timestep_tokens = timestep_tokens.unsqueeze(1).repeat(
    1, post_patch_num_frames * post_patch_height * post_patch_width
)
```
Repeating the timestep for every spatial token and then passing the flattened tensor to the condition_embedder leads to a massive amount of redundant computation in the TimestepEmbedder. Since the timesteps are identical for all tokens in this fallback case, it is much more efficient to compute the embedding once per batch and then expand it spatially.
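A sketch of the embed-once pattern (here `embed_fn` stands in for the `TimestepEmbedder`/`condition_embedder` call, assumed to map `(B,)` timesteps to `(B, D)` embeddings; the helper name is hypothetical):

```python
import torch

def embed_timesteps_once(timesteps, num_tokens, embed_fn):
    """Embed each batch item's timestep once, then broadcast across tokens."""
    emb = embed_fn(timesteps)                           # (B, D): one MLP pass per item
    return emb.unsqueeze(1).expand(-1, num_tokens, -1)  # (B, N, D) broadcast view
```

`expand` creates a view rather than copying, so both the embedder compute and the memory cost drop from O(B * N) to O(B).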
```python
dims = (1 if self.channel_first else -1)
rms = (x.pow(2).mean(dims, keepdim=True) + 1e-6).sqrt()
return (x / rms) * self.gamma + self.bias
```
The manual RMS calculation replaces F.normalize and removes self.scale. If self.scale was non-unity in the original implementation, this change will break compatibility with existing checkpoints. Additionally, the epsilon 1e-6 is hardcoded; it would be better to use a parameter or a standard constant for consistency.
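A sketch of a last-dim variant that keeps both concerns explicit (defaults and the constructor signature are assumptions; `gamma`/`bias` mirror the snippet above, while `scale` and `eps` are reinstated as parameters so checkpoints trained with a non-unity scale remain loadable):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMS normalization over the last dimension with explicit scale and eps."""

    def __init__(self, dim: int, scale: float = 1.0, eps: float = 1e-6):
        super().__init__()
        self.scale = scale  # preserved from the original implementation
        self.eps = eps      # configurable rather than hardcoded
        self.gamma = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).sqrt()
        return (x / rms) * self.scale * self.gamma + self.bias
```

For the channel-first path, `gamma`/`bias` would additionally need reshaping to broadcast over the channel dimension, which is omitted here for brevity.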
This PR has merge conflicts with the base branch. Please rebase:

```shell
git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease
```
Purpose

Add MatrixGame 3.0 inference support to FastVideo.

https://huggingface.co/FastVideo/Matrix-Game-3.0-Base-Distilled-Diffusers

Changes

- fastvideo/models/dits/matrixgame3.py
- fastvideo/pipelines/basic/matrixgame.py
- fastvideo/pipelines/stages/matrixgame_denoising.py

Attached videos: test.1.mp4, ours.mp4

TODO

Test Plan

# Commands you ran

Test Results

Test output

Checklist

- Ran `pre-commit run --all-files` and fixed all issues

For model/pipeline changes, also check: