
[feat] Add Waypoint-1-Small interactive world model support#1058

Open
Satvikmatta18 wants to merge 14 commits into hao-ai-lab:main from Satvikmatta18:feat/waypoint-1-small

Conversation

@Satvikmatta18

Summary

Adds support for Overworld's Waypoint-1-Small, a 2.3B parameter interactive world model for real-time video generation conditioned on text and controller inputs.

Changes

  • Transformer: fastvideo/models/dits/waypoint_transformer.py - Full DiT implementation with causal attention, GQA, and control conditioning
  • Config: fastvideo/configs/models/dits/waypoint_transformer.py - Architecture config
  • Pipeline: fastvideo/pipelines/basic/waypoint/ - Streaming pipeline with CtrlInput support
  • Pipeline Config: fastvideo/configs/pipelines/waypoint.py - Pipeline configuration
  • Sampling: fastvideo/configs/sample/waypoint.py - Default sampling parameters
  • Tests: Parity tests for transformer and pipeline smoke tests
  • Example: examples/inference/basic/basic_waypoint_streaming.py
  • Docs: Pipeline README
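
For reviewers unfamiliar with the control-input format, here is a minimal, hypothetical sketch of what the CtrlInput dataclass and its flattening into a conditioning vector could look like. Field names (mouse, buttons, scroll) follow the PR description, but the exact fields and the encode_ctrl helper are assumptions, not the actual fastvideo API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the CtrlInput dataclass described in this PR;
# field names and encode_ctrl are assumptions, not the real fastvideo API.
@dataclass
class CtrlInput:
    mouse: tuple[float, float] = (0.0, 0.0)         # normalized mouse delta (dx, dy)
    buttons: set[str] = field(default_factory=set)  # pressed keys, e.g. {"W"}
    scroll: float = 0.0                             # scroll-wheel delta

def encode_ctrl(ctrl: CtrlInput, button_vocab: list[str]) -> list[float]:
    """Flatten a CtrlInput into a fixed-size float vector for MLP fusion."""
    onehot = [1.0 if b in ctrl.buttons else 0.0 for b in button_vocab]
    return [ctrl.mouse[0], ctrl.mouse[1], ctrl.scroll] + onehot

vocab = ["W", "A", "S", "D"]
vec = encode_ctrl(CtrlInput(mouse=(0.5, -0.25), buttons={"W", "D"}), vocab)
print(vec)  # [0.5, -0.25, 0.0, 1.0, 0.0, 0.0, 1.0]
```

A fixed-size vector like this is the natural input for the MLP-fusion control conditioning described below.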

Testing

  • All transformer parity tests pass (weight loading + forward pass)
  • Pipeline smoke tests pass

References

@gemini-code-assist
Contributor

Summary of Changes

Hello @Satvikmatta18, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the integration of Overworld's Waypoint-1-Small, an interactive world model, into the FastVideo framework. The primary goal is to enable real-time video generation that responds dynamically to both text prompts and user-controlled inputs like mouse movements and keyboard presses. This significantly expands the framework's capabilities by adding support for highly interactive and controllable video synthesis, moving beyond static text-to-video generation.

Highlights

  • Waypoint-1-Small Model Integration: Added comprehensive support for Overworld's Waypoint-1-Small, a 2.3B parameter interactive world model, enabling real-time video generation conditioned on text and controller inputs.
  • New Transformer Implementation: Introduced WaypointWorldModel with a 22-layer DiT architecture featuring causal attention, Grouped Query Attention (GQA), and dedicated control conditioning via MLP fusion.
  • Streaming Inference Pipeline: Developed a new streaming pipeline (WaypointPipeline) that supports interactive generation, allowing users to provide real-time mouse, keyboard, and scroll inputs to influence video output.
  • Configuration and Example Usage: Included detailed configuration files for the model architecture, pipeline settings, and sampling parameters, along with an interactive example script (basic_waypoint_streaming.py) and comprehensive documentation.
  • Testing and Validation: Implemented parity tests for the transformer to ensure weight loading and forward pass correctness, and smoke tests for the pipeline to validate basic functionality.
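
As a quick illustration of the GQA mechanism highlighted above (many query heads sharing a smaller set of KV heads, with causal masking), here is a toy NumPy sketch; shapes are illustrative and this is not the WaypointWorldModel code:

```python
import numpy as np

# Toy sketch of Grouped Query Attention (GQA) with a causal mask, as
# described in the review summary; not the actual WaypointWorldModel code.
def gqa_attention(q, k, v, n_heads, n_kv_heads):
    # q: (T, n_heads, d); k, v: (T, n_kv_heads, d)
    group = n_heads // n_kv_heads
    k = np.repeat(k, group, axis=1)   # each KV head serves `group` query heads
    v = np.repeat(v, group, axis=1)
    d = q.shape[-1]
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d)
    # causal mask: each position attends only to itself and earlier positions
    T = q.shape[0]
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask[None], -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,khd->qhd", weights, v)

rng = np.random.default_rng(0)
out = gqa_attention(rng.normal(size=(5, 4, 8)),
                    rng.normal(size=(5, 2, 8)),
                    rng.normal(size=(5, 2, 8)),
                    n_heads=4, n_kv_heads=2)
print(out.shape)  # (5, 4, 8)
```

The real model reportedly uses 40 query heads over 20 KV heads, halving the KV-cache footprint relative to full multi-head attention, which matters for the streaming/kv-cache path.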


Changelog
  • examples/inference/basic/basic_waypoint_streaming.py
    • Added a new example script demonstrating basic streaming inference for the Waypoint-1-Small model.
    • Includes interactive keyboard and mouse control handling for real-time input.
  • fastvideo/configs/models/dits/init.py
    • Updated to import and expose the new WaypointConfig.
  • fastvideo/configs/models/dits/waypoint_transformer.py
    • Added a new configuration file defining the WaypointArchConfig for the Waypoint-1-Small transformer.
    • Specifies model architecture details including d_model, n_heads, n_kv_heads (GQA), n_layers, causal attention, and control/prompt conditioning parameters.
  • fastvideo/configs/pipelines/registry.py
    • Modified to import WaypointT2VConfig.
    • Registered WaypointT2VConfig under the model ID 'Overworld/Waypoint-1-Small' and added a detector for 'waypoint' or 'worldengine' IDs.
  • fastvideo/configs/pipelines/waypoint.py
    • Added a new pipeline configuration file for WaypointT2VConfig.
    • Defines specific settings for VAE (DCAE-based), UMT5-XL text encoder postprocessing, precision, and Waypoint-specific parameters like fixed sigma schedule and causal generation.
  • fastvideo/configs/sample/registry.py
    • Modified to import WaypointSamplingParam.
    • Registered WaypointSamplingParam for 'Overworld/Waypoint-1-Small' and added a detector for 'waypoint' or 'worldengine' IDs.
  • fastvideo/configs/sample/waypoint.py
    • Added a new sampling parameter configuration file for WaypointSamplingParam.
    • Specifies default video parameters (360p, 640x360, 60fps) and denoising parameters (4 inference steps, 1.0 guidance scale).
  • fastvideo/models/dits/waypoint_transformer.py
    • Added the core implementation of the WaypointWorldModel transformer.
    • Includes custom building blocks like MLP, AdaLN, CFG, ControllerInputEmbedding, NoiseConditioner, MLPFusion, CondHead, GatedSelfAttention (with GQA and per-head gating), and CrossAttention.
    • The WaypointBlock integrates these components for noise, prompt, and control conditioning.
    • The WaypointWorldModel handles patch embedding, transformer forward pass, and unpatching.
  • fastvideo/pipelines/basic/waypoint/README.md
    • Added documentation for the Waypoint-1-Small pipeline, covering overview, architecture, usage, control input format, configuration, related files, and hardware requirements.
  • fastvideo/pipelines/basic/waypoint/init.py
    • Added an __init__.py file to expose WaypointPipeline.
  • fastvideo/pipelines/basic/waypoint/waypoint_pipeline.py
    • Added the WaypointPipeline class, implementing the streaming inference logic.
    • Includes CtrlInput dataclass for controller inputs and StreamingContext for managing streaming state.
    • Provides streaming_reset, streaming_step, and streaming_clear methods for interactive generation.
  • fastvideo/pipelines/pipeline_registry.py
    • Updated _PIPELINE_NAME_TO_ARCHITECTURE_NAME to include WaypointPipeline.
  • tests/local_tests/pipelines/test_waypoint_pipeline_smoke.py
    • Added smoke tests for the Waypoint pipeline.
    • Verifies transformer loading, forward pass, pipeline import, and config loading.
  • tests/local_tests/transformers/test_waypoint_transformer.py
    • Added parity tests for the Waypoint transformer.
    • Ensures correct weight loading and functional forward pass, comparing against official checkpoint keys.
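The patch-embedding / unpatching round trip mentioned in the WaypointWorldModel changelog entry can be sketched as a toy NumPy example (illustrative shapes only; the real model operates on latent video tensors with learned projections):

```python
import numpy as np

# Toy sketch of the patchify / unpatchify round trip; not the actual
# WaypointWorldModel implementation.
def patchify(x, p):
    # x: (C, H, W) -> (num_patches, C * p * p)
    C, H, W = x.shape
    x = x.reshape(C, H // p, p, W // p, p)
    x = x.transpose(1, 3, 0, 2, 4)                 # (H/p, W/p, C, p, p)
    return x.reshape((H // p) * (W // p), C * p * p)

def unpatchify(tokens, C, H, W, p):
    x = tokens.reshape(H // p, W // p, C, p, p)
    x = x.transpose(2, 0, 3, 1, 4)                 # (C, H/p, p, W/p, p)
    return x.reshape(C, H, W)

frame = np.arange(2 * 8 * 8, dtype=np.float32).reshape(2, 8, 8)
tokens = patchify(frame, p=2)
assert tokens.shape == (16, 8)
assert np.array_equal(unpatchify(tokens, 2, 8, 8, 2), frame)  # lossless round trip
```

In the actual model the flattened patches would additionally pass through a learned linear projection before entering the transformer, and the inverse projection before unpatching.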
Activity
  • Initial implementation of the Waypoint-1-Small model and its associated streaming pipeline.
  • All transformer parity tests have passed, confirming correct weight loading and forward pass behavior.
  • Pipeline smoke tests have passed, validating the basic functionality of the interactive video generation pipeline.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces support for the Waypoint-1-Small interactive world model. The changes are comprehensive, including the model implementation, configurations, a streaming pipeline, tests, documentation, and an example script. The implementation is well-structured. However, I've identified a critical issue where the Rotary Position Embeddings (RoPE) are not implemented, which will significantly impact model performance. I've also noted several areas for improvement regarding code duplication, best practices, and clarity. Please see the detailed comments for suggestions.

Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
Comment thread examples/inference/basic/basic_waypoint_streaming.py Outdated
Comment thread examples/inference/basic/basic_waypoint_streaming.py Outdated
Comment thread examples/inference/basic/basic_waypoint_streaming.py Outdated
Comment thread examples/inference/basic/basic_waypoint_streaming.py Outdated
Comment thread fastvideo/configs/models/dits/waypoint_transformer.py Outdated
Comment thread fastvideo/pipelines/basic/waypoint/waypoint_pipeline.py Outdated
Comment thread examples/inference/basic/basic_waypoint_streaming.py Outdated
Comment thread fastvideo/models/dits/waypoint_transformer.py
@Eigensystem Eigensystem self-requested a review February 10, 2026 07:16
@Eigensystem Eigensystem added the go label Feb 12, 2026
Collaborator

@Eigensystem left a comment


plz fix pre-commit errors

@Eigensystem
Collaborator

Eigensystem commented Feb 12, 2026

Hi @Satvikmatta18, I tried to generate a video but the results were all blurry. Did you check the correctness / compare it with diffusers results?
(screenshot: blurry generated frames)

Comment thread fastvideo/models/dits/waypoint_transformer.py
Collaborator

@Eigensystem left a comment


Hi @Satvikmatta18. Could you please refer to the Google doc to check whether your implementation matches all the requirements at the architecture level? Thanks.
https://docs.google.com/document/d/1h7UOPEOsw9BwnHWGJLcm7FOAv4I1is3HWe2iWln6zyY/edit?tab=t.2w8cxbq3lg5x#heading=h.iaae91tz7zew

Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
Comment thread fastvideo/models/dits/waypoint_transformer.py Outdated
@SolitaryThinker
Collaborator

Could you also address the Gemini comments, and resolve them once addressed (or if they don't make sense)? Thanks!

@Eigensystem
Collaborator

Eigensystem commented Feb 16, 2026

The generated video still looks blurry on my side. Could you write an SSIM test in fastvideo/tests/ssim to check correctness? @Satvikmatta18
(screenshot: blurry generated frames)

@Satvikmatta18
Author

Hi @Eigensystem,
Thanks for the review. I’ve made these changes to address the blurriness:

  • flex_attention – Waypoint now uses flex_attention (when available) for the kv_cache=None path, aligned with WanVideo/MatrixGame.
  • Sigma schedule – Updated to the official schedule [1.0, 0.861, 0.729, 0.321, 0.0].
  • Reproducible noise – Switched to per-frame torch.Generator seeding for consistent results.
  • Seed passing – The --seed flag is now passed into reset() for reproducibility.
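The reproducible-noise fix above can be illustrated with a small sketch. The PR uses per-frame torch.Generator seeding; the same idea is shown here with NumPy generators so the snippet runs anywhere. The sigma schedule is the official one quoted above:

```python
import numpy as np

# Official sigma schedule quoted in the comment above (4 denoising steps).
SIGMAS = [1.0, 0.861, 0.729, 0.321, 0.0]

# Sketch of the reproducibility fix: seed one generator per frame (the PR
# uses torch.Generator; NumPy is used here only for illustration), so the
# noise for frame i is identical regardless of how many frames ran first.
def frame_noise(seed: int, frame_idx: int, shape: tuple) -> np.ndarray:
    gen = np.random.default_rng(seed + frame_idx)  # per-frame generator
    return gen.standard_normal(shape)

# Noise for frame 3 is the same whether we generated frames 0..3 or just 3.
a = [frame_noise(42, i, (4,)) for i in range(4)][3]
b = frame_noise(42, 3, (4,))
assert np.array_equal(a, b)
```

Deriving the per-frame seed from a single --seed value is what makes interactive streaming runs repeatable end to end.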

Collaborator

@Eigensystem left a comment


plz fix pre-commit errors

@Satvikmatta18 Satvikmatta18 force-pushed the feat/waypoint-1-small branch 4 times, most recently from fb317da to fd47a7f Compare February 27, 2026 07:01
Comment thread fastvideo/pipelines/stages/__init__.py Outdated
@mergify
Contributor

mergify bot commented Mar 28, 2026

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

@mergify mergify bot added the lora and needs-rebase (PR has merge conflicts) labels Mar 28, 2026
@mergify
Contributor

mergify bot commented Mar 30, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

This rule is failing. Waiting for:

  • #approved-reviews-by>=1
  • check-success=fastcheck-passed
  • check-success=full-suite-passed
  • check-success~=pre-commit
  • title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model)\]
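
As a quick local sanity check, the title rule above can be tested with Python's re module; the pattern is copied verbatim from the merge-protection output (this is likely why the PR title was later changed from "feat: …" to "[feat] …"):

```python
import re

# Mergify title rule from the merge-protection output above.
TITLE_RE = re.compile(
    r"(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model)\]"
)

# The bracketed form satisfies the rule; the colon form does not.
assert TITLE_RE.search("[feat] Add Waypoint-1-Small interactive world model support")
assert not TITLE_RE.search("feat: Add Waypoint-1-Small interactive world model support")
```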

@mergify
Contributor

mergify bot commented Mar 30, 2026

Pre-commit checks failed

Hi @Satvikmatta18, the pre-commit checks have failed. To fix them locally:

# Install pre-commit if you haven't already
uv pip install pre-commit
pre-commit install

# Run all checks and auto-fix what's possible
pre-commit run --all-files

Common fixes:

  • yapf: yapf -i <file> (formatting)
  • ruff: ruff check --fix <file> (linting)
  • codespell: codespell --write-changes <file> (spelling)

After fixing, commit and push the changes. The checks will re-run automatically.

For future commits, pre-commit will run automatically on changed files before each commit.

@mergify
Contributor

mergify bot commented Mar 30, 2026

Buildkite CI tests failed

Hi @Satvikmatta18, some Buildkite CI tests have failed. Check the build for details:
View Buildkite build →

Common causes:

  • Test failures: Check the failing step's output for assertion errors or tracebacks
  • Import errors: Make sure new dependencies are added to pyproject.toml
  • GPU memory: Some tests require specific GPU types (L40S, H100 NVL)
  • Kernel build: If you changed fastvideo-kernel/, the build may have failed

If the failure is unrelated to your changes, leave a comment explaining why.


@mergify
Contributor

mergify bot commented Mar 30, 2026

❌ CI tests failed

@Satvikmatta18 — to see what failed:

  1. Scroll to the Checks section below
  2. Find the check marked with ❌ (e.g. buildkite/ci/microscope-transformer-tests)
  3. Click Details to view the full build log

Or view all builds for this branch on Buildkite →

Common causes:

  • Assertion error / test failure — check the failing test's traceback
  • Import error — new dependency missing from pyproject.toml
  • OOM — some tests need specific GPUs (L40S, H100 NVL)

If the failure looks unrelated to your changes, comment why and a maintainer will review.

Satvikmatta18 pushed a commit to Satvikmatta18/FastVideo that referenced this pull request Apr 5, 2026
Satvik Matta and others added 12 commits April 9, 2026 16:05
- Add WaypointWorldModel in fastvideo/models/dits/waypoint_transformer.py
- Add WaypointConfig in fastvideo/configs/models/dits/waypoint_transformer.py
- Add CtrlInput dataclass for controller inputs (mouse, buttons, scroll)
- Add parity test for weight loading validation
- Model architecture: 22 layers, 2560 dim, 40 heads (GQA with 20 KV heads)
- Supports control conditioning via MLPFusion and prompt via cross-attention

Ref: https://huggingface.co/Overworld/Waypoint-1-Small
…lfAttention uses DistributedAttention, CrossAttention uses LocalAttention
- Moved set_forward_context to pipeline level (matching FastVideo pattern)
The HF model google/umt5-xl reports UMT5ForConditionalGeneration as its
architecture, but the registry only had UMT5EncoderModel. This caused
the loader to fall back to TransformersModel which is unsupported.

Made-with: Cursor
The register_configs() signature on main now requires workload_types.
The Waypoint registration was missing it, causing a TypeError on import.

Made-with: Cursor
@Satvikmatta18 Satvikmatta18 force-pushed the feat/waypoint-1-small branch from b8de124 to 5525123 Compare April 9, 2026 23:05
@Satvikmatta18 Satvikmatta18 changed the title feat: Add Waypoint-1-Small interactive world model support [feat] Add Waypoint-1-Small interactive world model support Apr 9, 2026
@mergify mergify bot added the type: feat (New feature or capability) label and removed the needs-rebase (PR has merge conflicts) label Apr 9, 2026
@mergify
Contributor

mergify bot commented Apr 9, 2026

Pre-commit checks failed

Hi @Satvikmatta18, the pre-commit checks have failed. To fix them locally:

# Install pre-commit if you haven't already
uv pip install pre-commit
pre-commit install

# Run all checks and auto-fix what's possible
pre-commit run --all-files

Common fixes:

  • yapf: yapf -i <file> (formatting)
  • ruff: ruff check --fix <file> (linting)
  • codespell: codespell --write-changes <file> (spelling)

After fixing, commit and push the changes. The checks will re-run automatically.

For future commits, pre-commit will run automatically on changed files before each commit.


@Eigensystem
Collaborator

Hi @Satvikmatta18. Please run pre-commit locally and fix the errors. Thanks!


Labels

  • scope: inference (Inference pipeline, serving, CLI)
  • scope: infra (CI, tests, Docker, build)
  • scope: model (Model architecture: DiTs, encoders, VAEs)
  • type: feat (New feature or capability)


3 participants