Conversation
Important: Review skipped. Auto incremental reviews are disabled on this repository; check the settings in the CodeRabbit UI to re-enable them.
📝 Walkthrough: This change restricts the MoE local-expert amax synchronization to the input quantizer, leaving per-expert weight amaxes unsynchronized.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 2 passed
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 2
🧹 Nitpick comments (1)
modelopt/torch/quantization/plugins/megatron.py (1)
593-618: Core logic change LGTM; minor redundancy in the apply loop.

Restricting synchronization to `name == "input_quantizer"` is correct: input activations must be uniform across all experts to avoid EP-distributed-sync deadlocks, while weight amaxes are intentionally left non-uniform for VLLM/TRTLLM.

One nit: in the apply loop (lines 613–617), `name in amax_dict` and `name == "input_quantizer"` are redundant, since `amax_dict` can only ever hold `"input_quantizer"` as a key. The `name in amax_dict` check implicitly handles the empty-dict (no calibrated amax) case. Consider keeping only the more general and intention-revealing form:

♻️ Optional simplification
```diff
 for expert in self.local_experts:
     for name, module in expert.named_modules():
         if (
             isinstance(module, TensorQuantizer)
             and name in amax_dict
-            and name == "input_quantizer"
         ):
             module.amax = amax_dict[name].detach().clone()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/plugins/megatron.py` around lines 593 - 618, The apply loop redundantly checks both name == "input_quantizer" and name in amax_dict; instead simplify by removing the membership check and safely lookup from amax_dict: inside the second loop over self.local_experts and expert.named_modules() when isinstance(module, TensorQuantizer) and name == "input_quantizer", do amax = amax_dict.get(name) and if amax is not None set module.amax = amax.detach().clone(); reference symbols: amax_dict, self.local_experts, TensorQuantizer, name == "input_quantizer", module.amax.
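For reference, a minimal sketch of the loop shape that prompt describes, assuming the surrounding method context (`self.local_experts` and a previously built `amax_dict`); this is an illustration of the suggestion, not the actual implementation:

```python
# Sketch only: assumes `amax_dict` was filled by an earlier pass over the
# local experts and that `TensorQuantizer` is already imported in this module.
for expert in self.local_experts:
    for name, module in expert.named_modules():
        if isinstance(module, TensorQuantizer) and name == "input_quantizer":
            amax = amax_dict.get(name)  # None when no amax was calibrated
            if amax is not None:
                module.amax = amax.detach().clone()
```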
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/plugins/megatron.py`:
- Around line 578-591: Fix two minor docstring nits in the megatron.py docstring
that describes the sync of input quantizer amax across local experts in a
SequentialMLP: insert a missing space after "amax." so it reads "amax. This",
and change "there are logic" to "there is logic" in the Note paragraph; update
the docstring associated with the sync function in
modelopt/torch/quantization/plugins/megatron.py (the function that syncs input
quantizer amax across local experts in a SequentialMLP) accordingly.
In `@tests/gpu_megatron/torch/quantization/plugins/test_megatron.py`:
- Around line 738-768: Add a pytest parametrize decorator and convert this test
into a MP-spawned helper to run with EP=2: create a helper (e.g.
_test_layer_sync_moe_local_experts_amax_helper(rank, size)) and call it via
spawn_multiprocess_job from the top-level test decorated with
`@pytest.mark.parametrize`("moe_grouped_gemm", [True, False]) and using the
need_2_gpus fixture; inside the helper call initialize_for_megatron with
expert_model_parallel_size=2, obtain the forward closure correctly by calling
forward(model) (not forward()), uncomment and run mtq.quantize(...) to perform
calibration so expert.input_quantizer.amax is set, iterate model.named_modules()
and for each module instance of _MegatronSequentialMLP call
module.layer_sync_moe_local_experts_amax() and then inspect module.local_experts
for matching amax values, remove the stray print(model) and ensure
destroy_model_parallel() is called at the end of the helper.
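A rough sketch of the test structure that prompt asks for; the helper name, model builder, `forward` factory, and the exact `spawn_multiprocess_job` / `initialize_for_megatron` signatures are assumptions modeled on the other tests in this file, not verified API:

```python
# Hypothetical sketch, not the verified test. _gpt_model_provider, forward,
# spawn_multiprocess_job, initialize_for_megatron, destroy_model_parallel and
# _MegatronSequentialMLP are assumed to come from this test module's imports.
from functools import partial

import pytest
import torch

import modelopt.torch.quantization as mtq


def _test_layer_sync_moe_local_experts_amax_helper(moe_grouped_gemm, rank, size):
    # EP=2 so the sync actually crosses expert-parallel ranks (assumed kwarg).
    initialize_for_megatron(expert_model_parallel_size=2, seed=1234)
    model = _gpt_model_provider(moe_grouped_gemm=moe_grouped_gemm)  # hypothetical builder
    calib_fn = forward(model)  # forward(model) returns the calibration closure
    model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, calib_fn)

    for _, module in model.named_modules():
        if isinstance(module, _MegatronSequentialMLP):
            module.layer_sync_moe_local_experts_amax()
            # Assumed local-expert layout: linear_fc1.input_quantizer holds the amax.
            amaxes = [e.linear_fc1.input_quantizer.amax for e in module.local_experts]
            assert all(torch.equal(a, amaxes[0]) for a in amaxes)

    destroy_model_parallel()


@pytest.mark.parametrize("moe_grouped_gemm", [True, False])
def test_layer_sync_moe_local_experts_amax(need_2_gpus, moe_grouped_gemm):
    spawn_multiprocess_job(
        size=2,
        job=partial(_test_layer_sync_moe_local_experts_amax_helper, moe_grouped_gemm),
        backend="nccl",
    )
```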
---
Nitpick comments:
In `@modelopt/torch/quantization/plugins/megatron.py`:
- Around line 593-618: The apply loop redundantly checks both name ==
"input_quantizer" and name in amax_dict; instead simplify by removing the
membership check and safely lookup from amax_dict: inside the second loop over
self.local_experts and expert.named_modules() when isinstance(module,
TensorQuantizer) and name == "input_quantizer", do amax = amax_dict.get(name)
and if amax is not None set module.amax = amax.detach().clone(); reference
symbols: amax_dict, self.local_experts, TensorQuantizer, name ==
"input_quantizer", module.amax.
Codecov Report ❌

Additional details and impacted files

```
@@           Coverage Diff            @@
##             main     #903    +/-   ##
========================================
  Coverage   73.54%   73.54%
========================================
  Files         205      205
  Lines       22000    22001     +1
========================================
+ Hits        16179    16180     +1
  Misses       5821     5821
```

☔ View full report in Codecov by Sentry.
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Force-pushed from a290003 to 28d5686
Can we add an else-condition test as well (i.e., when this is False, the weight_quantizer amax should differ between local_experts)?
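A minimal sketch of the negative check being requested; the attribute path (`linear_fc1.weight_quantizer`) is an assumption about the local-expert layout, not taken from this PR:

```python
# Sketch: after calibration without a weight-amax sync, per-expert weight
# amaxes are expected to differ. `module` is an _MegatronSequentialMLP instance.
weight_amaxes = [
    expert.linear_fc1.weight_quantizer.amax for expert in module.local_experts
]
assert not all(torch.equal(a, weight_amaxes[0]) for a in weight_amaxes)
```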
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
What does this PR do?
Type of change: Bug fix
Overview: in the MoE layer we currently sync both the weight and input quantizers so that all experts share the same weight amaxes and activation amaxes.
VLLM/TRTLLM actually support non-uniform weight amaxes in MoE, so we only need to sync the activation amaxes.
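To make the intended end state concrete, here is a minimal sketch of the post-fix invariant (attribute names such as `linear_fc1`, `input_quantizer`, and `weight_quantizer` are assumed from the Megatron SequentialMLP layout, not taken from this PR):

```python
import torch

def check_expert_amax_invariant(sequential_mlp):
    """Sketch of the expected state after calibration + sync (assumed layout)."""
    experts = sequential_mlp.local_experts
    # Activation (input) amaxes must be identical across local experts.
    input_amaxes = [e.linear_fc1.input_quantizer.amax for e in experts]
    assert all(torch.equal(a, input_amaxes[0]) for a in input_amaxes)
    # Weight amaxes may legitimately differ per expert; VLLM/TRTLLM accept
    # non-uniform weight scales, so no uniformity is enforced here.
```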
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Bug Fixes
Documentation
Tests