
[feat] Add Waypoint-1-Small interactive world model support#1058

Open
Satvikmatta18 wants to merge 14 commits into hao-ai-lab:main from Satvikmatta18:feat/waypoint-1-small

Conversation

@Satvikmatta18

Summary

Adds support for Overworld's Waypoint-1-Small, a 2.3B parameter interactive world model for real-time video generation conditioned on text and controller inputs.

Changes

  • Transformer: fastvideo/models/dits/waypoint_transformer.py - Full DiT implementation with causal attention, GQA, and control conditioning
  • Config: fastvideo/configs/models/dits/waypoint_transformer.py - Architecture config
  • Pipeline: fastvideo/pipelines/basic/waypoint/ - Streaming pipeline with CtrlInput support
  • Pipeline Config: fastvideo/configs/pipelines/waypoint.py - Pipeline configuration
  • Sampling: fastvideo/configs/sample/waypoint.py - Default sampling parameters
  • Tests: Parity tests for transformer and pipeline smoke tests
  • Example: examples/inference/basic/basic_waypoint_streaming.py
  • Docs: Pipeline README
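
For reviewers unfamiliar with the control-input format, here is a minimal, hypothetical sketch of what the CtrlInput dataclass and its flattening into a conditioning vector could look like. Field names (mouse, buttons, scroll) follow the PR description, but the exact fields and the encode_ctrl helper are assumptions, not the actual fastvideo API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the CtrlInput dataclass described in this PR;
# field names and encode_ctrl are assumptions, not the real fastvideo API.
@dataclass
class CtrlInput:
    mouse: tuple[float, float] = (0.0, 0.0)         # normalized mouse delta (dx, dy)
    buttons: set[str] = field(default_factory=set)  # pressed keys, e.g. {"W"}
    scroll: float = 0.0                             # scroll-wheel delta

def encode_ctrl(ctrl: CtrlInput, button_vocab: list[str]) -> list[float]:
    """Flatten a CtrlInput into a fixed-size float vector for MLP fusion."""
    onehot = [1.0 if b in ctrl.buttons else 0.0 for b in button_vocab]
    return [ctrl.mouse[0], ctrl.mouse[1], ctrl.scroll] + onehot

vocab = ["W", "A", "S", "D"]
vec = encode_ctrl(CtrlInput(mouse=(0.5, -0.25), buttons={"W", "D"}), vocab)
print(vec)  # [0.5, -0.25, 0.0, 1.0, 0.0, 0.0, 1.0]
```

A fixed-size vector like this is the natural input for the MLP-fusion control conditioning described below.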

Testing

  • All transformer parity tests pass (weight loading + forward pass)
  • Pipeline smoke tests pass

References

@gemini-code-assist
Contributor

Summary of Changes

Hello @Satvikmatta18, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the integration of Overworld's Waypoint-1-Small, an interactive world model, into the FastVideo framework. The primary goal is to enable real-time video generation that responds dynamically to both text prompts and user-controlled inputs like mouse movements and keyboard presses. This significantly expands the framework's capabilities by adding support for highly interactive and controllable video synthesis, moving beyond static text-to-video generation.

Highlights

  • Waypoint-1-Small Model Integration: Added comprehensive support for Overworld's Waypoint-1-Small, a 2.3B parameter interactive world model, enabling real-time video generation conditioned on text and controller inputs.
  • New Transformer Implementation: Introduced WaypointWorldModel with a 22-layer DiT architecture featuring causal attention, Grouped Query Attention (GQA), and dedicated control conditioning via MLP fusion.
  • Streaming Inference Pipeline: Developed a new streaming pipeline (WaypointPipeline) that supports interactive generation, allowing users to provide real-time mouse, keyboard, and scroll inputs to influence video output.
  • Configuration and Example Usage: Included detailed configuration files for the model architecture, pipeline settings, and sampling parameters, along with an interactive example script (basic_waypoint_streaming.py) and comprehensive documentation.
  • Testing and Validation: Implemented parity tests for the transformer to ensure weight loading and forward pass correctness, and smoke tests for the pipeline to validate basic functionality.
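
As a quick illustration of the GQA mechanism highlighted above (many query heads sharing a smaller set of KV heads, with causal masking), here is a toy NumPy sketch; shapes are illustrative and this is not the WaypointWorldModel code:

```python
import numpy as np

# Toy sketch of Grouped Query Attention (GQA) with a causal mask, as
# described in the review summary; not the actual WaypointWorldModel code.
def gqa_attention(q, k, v, n_heads, n_kv_heads):
    # q: (T, n_heads, d); k, v: (T, n_kv_heads, d)
    group = n_heads // n_kv_heads
    k = np.repeat(k, group, axis=1)   # each KV head serves `group` query heads
    v = np.repeat(v, group, axis=1)
    d = q.shape[-1]
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d)
    # causal mask: each position attends only to itself and earlier positions
    T = q.shape[0]
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask[None], -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,khd->qhd", weights, v)

rng = np.random.default_rng(0)
out = gqa_attention(rng.normal(size=(5, 4, 8)),
                    rng.normal(size=(5, 2, 8)),
                    rng.normal(size=(5, 2, 8)),
                    n_heads=4, n_kv_heads=2)
print(out.shape)  # (5, 4, 8)
```

The real model reportedly uses 40 query heads over 20 KV heads, halving the KV-cache footprint relative to full multi-head attention, which matters for the streaming/kv-cache path.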


Changelog
  • examples/inference/basic/basic_waypoint_streaming.py
    • Added a new example script demonstrating basic streaming inference for the Waypoint-1-Small model.
    • Includes interactive keyboard and mouse control handling for real-time input.
  • fastvideo/configs/models/dits/init.py
    • Updated to import and expose the new WaypointConfig.
  • fastvideo/configs/models/dits/waypoint_transformer.py
    • Added a new configuration file defining the WaypointArchConfig for the Waypoint-1-Small transformer.
    • Specifies model architecture details including d_model, n_heads, n_kv_heads (GQA), n_layers, causal attention, and control/prompt conditioning parameters.
  • fastvideo/configs/pipelines/registry.py
    • Modified to import WaypointT2VConfig.
    • Registered WaypointT2VConfig under the model ID 'Overworld/Waypoint-1-Small' and added a detector for 'waypoint' or 'worldengine' IDs.
  • fastvideo/configs/pipelines/waypoint.py
    • Added a new pipeline configuration file for WaypointT2VConfig.
    • Defines specific settings for VAE (DCAE-based), UMT5-XL text encoder postprocessing, precision, and Waypoint-specific parameters like fixed sigma schedule and causal generation.
  • fastvideo/configs/sample/registry.py
    • Modified to import WaypointSamplingParam.
    • Registered WaypointSamplingParam for 'Overworld/Waypoint-1-Small' and added a detector for 'waypoint' or 'worldengine' IDs.
  • fastvideo/configs/sample/waypoint.py
    • Added a new sampling parameter configuration file for WaypointSamplingParam.
    • Specifies default video parameters (360p, 640x360, 60fps) and denoising parameters (4 inference steps, 1.0 guidance scale).
  • fastvideo/models/dits/waypoint_transformer.py
    • Added the core implementation of the WaypointWorldModel transformer.
    • Includes custom building blocks like MLP, AdaLN, CFG, ControllerInputEmbedding, NoiseConditioner, MLPFusion, CondHead, GatedSelfAttention (with GQA and per-head gating), and CrossAttention.
    • The WaypointBlock integrates these components for noise, prompt, and control conditioning.
    • The WaypointWorldModel handles patch embedding, transformer forward pass, and unpatching.
  • fastvideo/pipelines/basic/waypoint/README.md
    • Added documentation for the Waypoint-1-Small pipeline, covering overview, architecture, usage, control input format, configuration, related files, and hardware requirements.
  • fastvideo/pipelines/basic/waypoint/init.py
    • Added an __init__.py file to expose WaypointPipeline.
  • fastvideo/pipelines/basic/waypoint/waypoint_pipeline.py
    • Added the WaypointPipeline class, implementing the streaming inference logic.
    • Includes CtrlInput dataclass for controller inputs and StreamingContext for managing streaming state.
    • Provides streaming_reset, streaming_step, and streaming_clear methods for interactive generation.
  • fastvideo/pipelines/pipeline_registry.py
    • Updated _PIPELINE_NAME_TO_ARCHITECTURE_NAME to include WaypointPipeline.
  • tests/local_tests/pipelines/test_waypoint_pipeline_smoke.py
    • Added smoke tests for the Waypoint pipeline.
    • Verifies transformer loading, forward pass, pipeline import, and config loading.
  • tests/local_tests/transformers/test_waypoint_transformer.py
    • Added parity tests for the Waypoint transformer.
    • Ensures correct weight loading and functional forward pass, comparing against official checkpoint keys.
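The patch-embedding / unpatching round trip mentioned in the WaypointWorldModel changelog entry can be sketched as a toy NumPy example (illustrative shapes only; the real model operates on latent video tensors with learned projections):

```python
import numpy as np

# Toy sketch of the patchify / unpatchify round trip; not the actual
# WaypointWorldModel implementation.
def patchify(x, p):
    # x: (C, H, W) -> (num_patches, C * p * p)
    C, H, W = x.shape
    x = x.reshape(C, H // p, p, W // p, p)
    x = x.transpose(1, 3, 0, 2, 4)                 # (H/p, W/p, C, p, p)
    return x.reshape((H // p) * (W // p), C * p * p)

def unpatchify(tokens, C, H, W, p):
    x = tokens.reshape(H // p, W // p, C, p, p)
    x = x.transpose(2, 0, 3, 1, 4)                 # (C, H/p, p, W/p, p)
    return x.reshape(C, H, W)

frame = np.arange(2 * 8 * 8, dtype=np.float32).reshape(2, 8, 8)
tokens = patchify(frame, p=2)
assert tokens.shape == (16, 8)
assert np.array_equal(unpatchify(tokens, 2, 8, 8, 2), frame)  # lossless round trip
```

In the actual model the flattened patches would additionally pass through a learned linear projection before entering the transformer, and the inverse projection before unpatching.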
Activity
  • Initial implementation of the Waypoint-1-Small model and its associated streaming pipeline.
  • All transformer parity tests have passed, confirming correct weight loading and forward pass behavior.
  • Pipeline smoke tests have passed, validating the basic functionality of the interactive video generation pipeline.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces support for the Waypoint-1-Small interactive world model. The changes are comprehensive, including the model implementation, configurations, a streaming pipeline, tests, documentation, and an example script. The implementation is well-structured. However, I've identified a critical issue where the Rotary Position Embeddings (RoPE) are not implemented, which will significantly impact model performance. I've also noted several areas for improvement regarding code duplication, best practices, and clarity. Please see the detailed comments for suggestions.

Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
Comment thread examples/inference/basic/basic_waypoint_streaming.py Outdated
Comment thread examples/inference/basic/basic_waypoint_streaming.py Outdated
Comment thread examples/inference/basic/basic_waypoint_streaming.py Outdated
Comment thread examples/inference/basic/basic_waypoint_streaming.py Outdated
Comment thread fastvideo/configs/models/dits/waypoint_transformer.py Outdated
Comment thread fastvideo/pipelines/basic/waypoint/waypoint_pipeline.py Outdated
Comment thread examples/inference/basic/basic_waypoint_streaming.py Outdated
Comment thread fastvideo/models/dits/waypoint_transformer.py
@Eigensystem Eigensystem self-requested a review February 10, 2026 07:16
@Eigensystem Eigensystem added the go label Feb 12, 2026
Collaborator

@Eigensystem left a comment


plz fix pre-commit errors

@Eigensystem
Collaborator

Eigensystem commented Feb 12, 2026

Hi @Satvikmatta18, I tried to generate a video but the results were all blurry. Did you check the correctness / compare it with diffusers results?
(screenshot: blurry generated frames)

Comment thread fastvideo/models/dits/waypoint_transformer.py
Collaborator

@Eigensystem left a comment


Hi @Satvikmatta18. Could you please refer to the Google doc to check whether your implementation matches all the requirements at the architecture level? Thanks.
https://docs.google.com/document/d/1h7UOPEOsw9BwnHWGJLcm7FOAv4I1is3HWe2iWln6zyY/edit?tab=t.2w8cxbq3lg5x#heading=h.iaae91tz7zew

Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
@SolitaryThinker
Collaborator

Could you also address the Gemini comments, and resolve them once addressed (or if they don't make sense)? Thanks!

@Eigensystem
Collaborator

Eigensystem commented Feb 16, 2026

The generated video still looks blurry on my side. Could you write an SSIM test in fastvideo/tests/ssim to check correctness? @Satvikmatta18
(screenshot: blurry generated frames)

@Satvikmatta18
Author

Hi @Eigensystem,
Thanks for the review. I’ve made these changes to address the blurriness:

  • flex_attention – Waypoint now uses flex_attention (when available) for the kv_cache=None path, aligned with WanVideo/MatrixGame.
  • Sigma schedule – Updated to the official schedule [1.0, 0.861, 0.729, 0.321, 0.0].
  • Reproducible noise – Switched to per-frame torch.Generator seeding for consistent results.
  • Seed passing – The --seed flag is now passed into reset() for reproducibility.
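The reproducible-noise fix above can be illustrated with a small sketch. The PR uses per-frame torch.Generator seeding; the same idea is shown here with NumPy generators so the snippet runs anywhere. The sigma schedule is the official one quoted above:

```python
import numpy as np

# Official sigma schedule quoted in the comment above (4 denoising steps).
SIGMAS = [1.0, 0.861, 0.729, 0.321, 0.0]

# Sketch of the reproducibility fix: seed one generator per frame (the PR
# uses torch.Generator; NumPy is used here only for illustration), so the
# noise for frame i is identical regardless of how many frames ran first.
def frame_noise(seed: int, frame_idx: int, shape: tuple) -> np.ndarray:
    gen = np.random.default_rng(seed + frame_idx)  # per-frame generator
    return gen.standard_normal(shape)

# Noise for frame 3 is the same whether we generated frames 0..3 or just 3.
a = [frame_noise(42, i, (4,)) for i in range(4)][3]
b = frame_noise(42, 3, (4,))
assert np.array_equal(a, b)
```

Deriving the per-frame seed from a single --seed value is what makes interactive streaming runs repeatable end to end.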

Collaborator

@Eigensystem left a comment


plz fix pre-commit errors

@Satvikmatta18 Satvikmatta18 force-pushed the feat/waypoint-1-small branch 4 times, most recently from fb317da to fd47a7f Compare February 27, 2026 07:01
Comment thread fastvideo/pipelines/stages/__init__.py Outdated
@mergify
Contributor

mergify bot commented Mar 28, 2026

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

@mergify mergify bot added the lora and needs-rebase (PR has merge conflicts) labels Mar 28, 2026
@mergify
Contributor

mergify bot commented Mar 30, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

This rule is failing. Waiting for:

  • #approved-reviews-by>=1
  • check-success=fastcheck-passed
  • check-success=full-suite-passed
  • check-success~=pre-commit
  • title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model)\]
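
As a quick local sanity check, the title rule above can be tested with Python's re module; the pattern is copied verbatim from the merge-protection output (this is likely why the PR title was later changed from "feat: …" to "[feat] …"):

```python
import re

# Mergify title rule from the merge-protection output above.
TITLE_RE = re.compile(
    r"(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model)\]"
)

# The bracketed form satisfies the rule; the colon form does not.
assert TITLE_RE.search("[feat] Add Waypoint-1-Small interactive world model support")
assert not TITLE_RE.search("feat: Add Waypoint-1-Small interactive world model support")
```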

@mergify
Contributor

mergify bot commented Mar 30, 2026

Pre-commit checks failed

Hi @Satvikmatta18, the pre-commit checks have failed. To fix them locally:

# Install pre-commit if you haven't already
uv pip install pre-commit
pre-commit install

# Run all checks and auto-fix what's possible
pre-commit run --all-files

Common fixes:

  • yapf: yapf -i <file> (formatting)
  • ruff: ruff check --fix <file> (linting)
  • codespell: codespell --write-changes <file> (spelling)

After fixing, commit and push the changes. The checks will re-run automatically.

For future commits, pre-commit will run automatically on changed files before each commit.

@mergify
Contributor

mergify bot commented Mar 30, 2026

Buildkite CI tests failed

Hi @Satvikmatta18, some Buildkite CI tests have failed. Check the build for details:
View Buildkite build →

Common causes:

  • Test failures: Check the failing step's output for assertion errors or tracebacks
  • Import errors: Make sure new dependencies are added to pyproject.toml
  • GPU memory: Some tests require specific GPU types (L40S, H100 NVL)
  • Kernel build: If you changed fastvideo-kernel/, the build may have failed

If the failure is unrelated to your changes, leave a comment explaining why.


@mergify
Contributor

mergify bot commented Mar 30, 2026

❌ CI tests failed

@Satvikmatta18 — to see what failed:

  1. Scroll to the Checks section below
  2. Find the check marked with ❌ (e.g. buildkite/ci/microscope-transformer-tests)
  3. Click Details to view the full build log

Or view all builds for this branch on Buildkite →

Common causes:

  • Assertion error / test failure — check the failing test's traceback
  • Import error — new dependency missing from pyproject.toml
  • OOM — some tests need specific GPUs (L40S, H100 NVL)

If the failure looks unrelated to your changes, comment why and a maintainer will review.

Satvikmatta18 pushed a commit to Satvikmatta18/FastVideo that referenced this pull request Apr 5, 2026
Satvik Matta and others added 12 commits April 9, 2026 16:05
- Add WaypointWorldModel in fastvideo/models/dits/waypoint_transformer.py
- Add WaypointConfig in fastvideo/configs/models/dits/waypoint_transformer.py
- Add CtrlInput dataclass for controller inputs (mouse, buttons, scroll)
- Add parity test for weight loading validation
- Model architecture: 22 layers, 2560 dim, 40 heads (GQA with 20 KV heads)
- Supports control conditioning via MLPFusion and prompt via cross-attention

Ref: https://huggingface.co/Overworld/Waypoint-1-Small
…lfAttention uses DistributedAttention, CrossAttention uses LocalAttention
- Moved set_forward_context to pipeline level (matching FastVideo pattern)
The HF model google/umt5-xl reports UMT5ForConditionalGeneration as its
architecture, but the registry only had UMT5EncoderModel. This caused
the loader to fall back to TransformersModel which is unsupported.

Made-with: Cursor
The register_configs() signature on main now requires workload_types.
The Waypoint registration was missing it, causing a TypeError on import.

Made-with: Cursor
@Satvikmatta18 Satvikmatta18 force-pushed the feat/waypoint-1-small branch from b8de124 to 5525123 Compare April 9, 2026 23:05
@Satvikmatta18 Satvikmatta18 changed the title feat: Add Waypoint-1-Small interactive world model support [feat] Add Waypoint-1-Small interactive world model support Apr 9, 2026
@mergify mergify bot added the type: feat (New feature or capability) label and removed the needs-rebase (PR has merge conflicts) label Apr 9, 2026
@mergify
Contributor

mergify bot commented Apr 9, 2026

Pre-commit checks failed

Hi @Satvikmatta18, the pre-commit checks have failed. To fix them locally:

# Install pre-commit if you haven't already
uv pip install pre-commit
pre-commit install

# Run all checks and auto-fix what's possible
pre-commit run --all-files

Common fixes:

  • yapf: yapf -i <file> (formatting)
  • ruff: ruff check --fix <file> (linting)
  • codespell: codespell --write-changes <file> (spelling)

After fixing, commit and push the changes. The checks will re-run automatically.

For future commits, pre-commit will run automatically on changed files before each commit.


@Eigensystem
Collaborator

Hi @Satvikmatta18. Please run pre-commit locally and fix the errors. Thanks!


Labels

  • scope: inference (Inference pipeline, serving, CLI)
  • scope: infra (CI, tests, Docker, build)
  • scope: model (Model architecture: DiTs, encoders, VAEs)
  • type: feat (New feature or capability)


3 participants