Set 5090 not support TMA and fix only audio_server#1237

Merged
hiworldwzj merged 6 commits into main from qwen3_omni_dev on Mar 18, 2026

Conversation

@WANDY666 (Contributor)

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the lightllm framework by improving its handling of hardware-specific features and resource management. It introduces a configurable option to optimize shared memory usage within the audio server by allowing vision-related data initialization to be conditionally disabled. Furthermore, it refines the detection and compatibility logic for Triton's Tensor Memory Accelerator (TMA) feature, ensuring that specific GPU models, like the 5090 series, are correctly identified and handled to prevent potential issues with TMA support.

Highlights

  • Audio Server Shared Memory Control: Introduced a new init_shm_data parameter to the audio server's model initialization, allowing for conditional disabling of vision-related shared memory data based on the disable_vision argument.
  • NVIDIA GPU Detection: Added a new utility function is_nvidia() to accurately identify NVIDIA CUDA-enabled GPUs.
  • Triton TMA Compatibility: Updated the triton_support_tensor_descriptor logic to leverage the new is_nvidia() check and explicitly disable Tensor Memory Accelerator (TMA) support for 5090 series GPUs, even if their compute capability is otherwise sufficient.
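The gating logic described in the highlights above can be sketched as follows. The function name `triton_support_tensor_descriptor` comes from the PR; everything else here is an assumption. In the real code, `is_nvidia()` and CUDA device queries would supply the inputs, but the sketch takes them as parameters so the logic is self-contained, and the sm_90 threshold is a guess at the minimum compute capability:

```python
# Hypothetical sketch of the TMA gating described in this PR. The probe
# results (is_nvidia, compute capability, device name) are passed in as
# parameters; in lightllm they would come from is_nvidia() and torch.cuda.

def triton_support_tensor_descriptor(is_nvidia: bool,
                                     compute_capability: tuple,
                                     device_name: str) -> bool:
    """Return True if Triton's TMA tensor descriptors should be used."""
    if not is_nvidia:
        return False
    # Assumed minimum compute capability for TMA (sm_90+).
    major, _minor = compute_capability
    if major < 9:
        return False
    # Per this PR: 5090-series GPUs report a sufficient compute capability
    # but are explicitly excluded from TMA support.
    if "5090" in device_name:
        return False
    return True
```

With this shape, an H100 (sm_90) would pass the check while an RTX 5090 would be rejected by the explicit name match despite its higher compute capability.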


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces a new is_nvidia utility function and refines the triton_support_tensor_descriptor logic to specifically handle 5090 GPUs, aligning with the title's intent to exclude 5090 from TMA support. It also propagates an init_shm_data flag to the CpuEmbedCacheClient initialization. However, there is a potential semantic mismatch in how the init_shm_data flag is derived from self.args.disable_vision.

"rank_id": rank_id,
"cache_port": self.cache_port,
"data_type": self.args.data_type,
"init_shm_data": self.args.disable_vision,

Severity: medium

The init_shm_data parameter is being set using self.args.disable_vision. While this might functionally work if disable_vision happens to have the correct boolean value, its name suggests a purpose related to vision features, not shared memory data initialization for audio. This could lead to confusion or incorrect behavior if the meaning of disable_vision changes or if it's not always aligned with the intent of init_shm_data. It would be clearer and more robust to introduce a dedicated argument for init_shm_data or rename disable_vision if its scope has expanded.
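One way to act on this suggestion is to give `init_shm_data` its own command-line flag rather than reusing `disable_vision`. This is a hypothetical sketch, not code from the PR; the flag names `--disable_vision` and `--init_shm_data` mirror the identifiers discussed above, but the parser structure is assumed:

```python
# Hypothetical sketch of the reviewer's suggestion: a dedicated argument
# for init_shm_data, decoupled from disable_vision.
import argparse


def build_audio_server_parser() -> argparse.ArgumentParser:
    # Only the two flags relevant to this discussion are shown.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--disable_vision", action="store_true",
        help="Disable vision features.")
    parser.add_argument(
        "--init_shm_data", action="store_true",
        help="Initialize vision-related shared-memory data, "
             "independent of --disable_vision.")
    return parser
```

The `CpuEmbedCacheClient` kwargs could then receive `args.init_shm_data` directly, so the flag's name matches its effect even if the meaning of `disable_vision` later changes.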

@hiworldwzj hiworldwzj merged commit ec39ea2 into main Mar 18, 2026
1 check passed
@hiworldwzj hiworldwzj deleted the qwen3_omni_dev branch March 18, 2026 08:40
@hiworldwzj hiworldwzj restored the qwen3_omni_dev branch March 18, 2026 09:12