
Conversation

@Feng0w0 (Contributor) commented Jan 21, 2026

PRs have already been submitted to the xfuser and yunchang repositories, so both now support NPU natively.
This PR adds installation instructions for those two libraries on NPU, along with the NPU adaptation code for the USP feature in DiffSynth-Studio.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @Feng0w0, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the DiffSynth-Studio framework by integrating native support for Neural Processing Units (NPUs), specifically for the Unified Sequence Parallel (USP) feature. The changes involve adapting core attention mechanisms to leverage NPU capabilities, optimizing tensor operations for NPU environments, and providing comprehensive documentation and example scripts to facilitate NPU setup, inference, and training for users. This broadens the hardware compatibility and potential performance of the framework on NPU devices.

Highlights

  • NPU Support for USP Feature: Integrated NPU (Neural Processing Unit) support for the Unified Sequence Parallel (USP) feature within the xfuser long-context attention mechanism, enabling NPU-optimized attention types and ring implementation.
  • NPU-Specific Tensor Handling: Modified the pad_freqs function to handle tensor padding operations with NPU compatibility in mind, specifically by moving tensors to CPU for concatenation before returning them to the original device.
  • Comprehensive NPU Documentation: Added extensive documentation in both English and Chinese, detailing the installation of necessary third-party libraries for USP on NPU, how to configure code for NPU inference, and guidelines for NPU training, including environment variables and model-specific parameters.
  • NPU Training Examples: Provided new example shell scripts for NPU training, demonstrating the use of performance optimization environment variables and specific command-line arguments for NPU-enabled model training.
  • Streamlined NPU Installation: Introduced optional dependency groups in pyproject.toml for npu_aarch64 and npu, simplifying the installation process for torch-npu and related packages across different NPU architectures.
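
The last highlight refers to the optional dependency groups (npu and npu_aarch64) shown in the pyproject.toml snippet reviewed further down. As a hedged illustration only, not code from this PR, a quick post-install sanity check on an Ascend host might look like the following; it assumes torch-npu exposes the usual torch.npu namespace after import:

```python
# Hypothetical post-install check for a torch-npu environment (not part of this PR).
import torch
import torch_npu  # importing registers the "npu" device type with PyTorch

if torch.npu.is_available():
    # Allocate a small tensor on the first NPU to confirm the device actually works.
    x = torch.randn(4, 4, device="npu:0")
    print(f"NPU OK: {torch.npu.device_count()} device(s), tensor on {x.device}")
else:
    print("torch-npu is installed, but no NPU device is visible")
```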


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request adds support for the USP (Unified Sequence Parallel) feature on NPU devices. The changes include adapting the code to use NPU-specific attention types and workarounds for tensor operations. Documentation is updated with instructions for NPU setup and usage, and a new training script for NPU is added. The overall approach is sound, but I've found a few issues that need addressing: a potential performance degradation on non-NPU devices, a duplicated configuration in pyproject.toml which will break package installation, and several temporary documentation files that seem to have been committed by mistake.

pyproject.toml (Outdated)
Comment on lines 35 to 45:

```toml
[project.optional-dependencies]
npu_aarch64 = [
    "torch==2.7.1",
    "torch-npu==2.7.1",
    "torchvision==0.22.1"
]
npu = [
    "torch==2.7.1+cpu",
    "torch-npu==2.7.1",
    "torchvision==0.22.1+cpu"
]
```

critical

The [project.optional-dependencies] table is duplicated in this file, which will cause parsing errors and break package installation. Please remove one of the occurrences to ensure the pyproject.toml file is valid. This likely happened due to a merge/rebase error.
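
To make the failure mode concrete, here is a small, hypothetical check (not part of this PR) that parses pyproject.toml with Python's standard-library tomllib; because TOML forbids declaring the same table twice, a duplicated [project.optional-dependencies] table makes the parse raise TOMLDecodeError:

```python
# Hypothetical validation snippet: a duplicated TOML table is rejected by the parser.
import tomllib  # standard library in Python 3.11+

try:
    with open("pyproject.toml", "rb") as f:
        tomllib.load(f)
    print("pyproject.toml parses cleanly")
except tomllib.TOMLDecodeError as exc:
    # A second [project.optional-dependencies] table lands here.
    print(f"Invalid pyproject.toml: {exc}")
```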

Comment on lines 41 to 43:

```python
device='cpu')
original_tensor_device = original_tensor.device
padded_tensor = torch.cat([original_tensor.cpu(), padding_tensor], dim=0).to(device=original_tensor_device)
```

medium

This change forces tensor operations to be performed on the CPU, which might be a necessary workaround for NPU devices. However, this will introduce performance overhead for other devices like CUDA due to data transfers between CPU and GPU. It would be better to apply this logic conditionally, only for NPU devices, to avoid performance regressions on other platforms.
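
A minimal sketch of the conditional routing suggested here, assuming NPU tensors report device.type == "npu" (as torch-npu does via its registered device type); original_tensor and padding_tensor follow the snippet above, and the helper name is illustrative, not from the PR:

```python
import torch

def pad_with_device_workaround(original_tensor: torch.Tensor, padding_tensor: torch.Tensor) -> torch.Tensor:
    """Concatenate padding onto a tensor, detouring through the CPU only on NPU."""
    if original_tensor.device.type == "npu":
        # NPU workaround: concatenate on the CPU, then move the result back.
        result = torch.cat([original_tensor.cpu(), padding_tensor.cpu()], dim=0)
        return result.to(device=original_tensor.device)
    # CUDA/CPU path: concatenate directly on the original device, avoiding extra transfers.
    return torch.cat([original_tensor, padding_tensor.to(original_tensor.device)], dim=0)
```

This keeps the CUDA path free of host round-trips while preserving the workaround where it is actually needed.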

```
@@ -0,0 +1,86 @@
# GPU/NPU 支持
```

medium

This file, along with GPU_support_样例2.md, Setup_NPU_样例1.md, and Setup_NPU_样例2.md, looks like a temporary or example file (样例 means 'example'). These files appear to have been committed by mistake; please consider removing them from the pull request to keep the repository clean.
