
POTQ with QWEN Image Asymmetric #905

Open
AliesTaha wants to merge 4 commits into NVIDIA:main from AliesTaha:at/qwenimageptq

Conversation

@AliesTaha

@AliesTaha AliesTaha commented Feb 18, 2026

What does this PR do?

Type of change: New example
Overview: Add Qwen-Image-2512 asymmetric NVFP4 PTQ support with single-layer hidden output filtering, laying groundwork for QAD on the PTQ checkpoint.

Usage

python quantize.py --model qwen-image-2512 --format nvfp4

Testing

Ran end-to-end asymmetric PTQ calibration with 1024 calibration samples on Qwen-Image-2512; verified quantized checkpoint saves correctly.
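
For reference, a quick sanity check of the saved checkpoint can be done with a short script like the one below; the output path is illustrative and the load arguments are assumptions, since this PR page does not show how the quantized state is serialized.

```python
from pathlib import Path

import torch

# Hypothetical output path; substitute whatever path was passed to quantize.py.
ckpt_path = Path("experiment_run/qwen_image_2512_nvfp4.pt")

assert ckpt_path.is_file(), f"expected a checkpoint file at {ckpt_path}"

# weights_only=False because the saved object may contain more than plain tensors;
# the exact contents depend on how the quantized state is serialized.
state = torch.load(ckpt_path, map_location="cpu", weights_only=False)
if isinstance(state, dict):
    print(f"Checkpoint loaded with {len(state)} top-level entries")
```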

Before your PR is "Ready for review"

Is this change backward compatible?: Yes
Did you write any new necessary tests?: No
Did you add or update any necessary documentation?: No
Did you update Changelog?: No

Additional Information

Building toward QAD (Quantization-Aware Distillation) on the asymmetric PTQ checkpoint. ONNX export is temporarily commented out pending a fix.

Summary by CodeRabbit

  • New Features

    • Added support for a new Qwen image diffusion model with preset inference defaults (1024×1024, guidance settings).
    • Introduced a new FP4 asymmetric quantization configuration and a model-specific quantization filter to improve compatibility and performance.
    • Improved checkpoint saving to better handle direct file paths.
  • Chores

    • Updated ignore rules to exclude experiment run directories.
    • Temporarily disabled ONNX export steps in the export workflow.

@AliesTaha AliesTaha requested a review from a team as a code owner February 18, 2026 20:02
@AliesTaha AliesTaha requested a review from kaix-nv February 18, 2026 20:02
@copy-pr-bot

copy-pr-bot bot commented Feb 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough

Adds NVFP4_ASYMMETRIC_CONFIG, a QWEN_IMAGE_2512 model type with defaults, and a Qwen-specific filter function; adjusts FP4 config selection and checkpoint path handling in the quantize flow; comments out the ONNX export steps; and updates .gitignore to ignore an experiment run directory.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration & Quantize<br>examples/diffusers/quantization/config.py, examples/diffusers/quantization/quantize.py | Introduce NVFP4_ASYMMETRIC_CONFIG and use it in the FP4 selection path for non-flux models. Adjust checkpoint saving: treat .pt targets as file paths and ensure the parent directory exists (a sketch of this path handling follows the sequence diagram below). Comment out the ONNX export steps and the related import. |
| Model registry & defaults<br>examples/diffusers/quantization/models_utils.py | Add ModelType.QWEN_IMAGE_2512, register "Qwen/Qwen-Image-2512" in MODEL_REGISTRY, map it to DiffusionPipeline in MODEL_PIPELINE, and add a MODEL_DEFAULTS entry (backbone "transformer", dataset _SD_PROMPTS_DATASET, inference_extra_args including height/width 1024, guidance_scale 4.0, and a Chinese negative_prompt). |
| Filter utilities<br>examples/diffusers/quantization/utils.py | Add filter_func_qwen_image(name: str) -> bool, which uses a regex to exclude specific non-transformer module names (time_text_embed, img_in, txt_in, norm_out, proj_out) from quantization; a sketch follows this table. |
| Repository config<br>.gitignore | Add examples/diffusers/quantization/experiment_run to the ignored paths. |
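
A minimal sketch of the Qwen-Image filter described in the Filter utilities row, using the module names and regex quoted later in this review; the actual docstring and return convention in utils.py may differ.

```python
import re

# Standalone (non-transformer-block) modules to keep out of quantization; the
# name list comes from the review comments, the surrounding code is assumed.
_QWEN_EXCLUDE_PATTERN = re.compile(r".*(time_text_embed|img_in|txt_in|norm_out|proj_out).*")


def filter_func_qwen_image(name: str) -> bool:
    """Return True for module names that should be excluded from quantization."""
    return _QWEN_EXCLUDE_PATTERN.match(name) is not None
```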

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant QuantizeScript as quantize.py
    participant ModelRegistry as models_utils.py
    participant FilterUtils as utils.py
    participant Filesystem
    User->>QuantizeScript: request quantize(model_type, output_path, options)
    QuantizeScript->>ModelRegistry: resolve model metadata for ModelType.QWEN_IMAGE_2512
    ModelRegistry-->>QuantizeScript: return model_id, pipeline, defaults
    QuantizeScript->>FilterUtils: select filter_func (filter_func_qwen_image)
    FilterUtils-->>QuantizeScript: return filter decision per module
    QuantizeScript->>QuantizeScript: choose quant config (NVFP4_ASYMMETRIC_CONFIG for FP4 path)
    QuantizeScript->>Filesystem: ensure parent dir if output_path ends with .pt
    QuantizeScript->>Filesystem: save quantized checkpoint
```
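
A minimal sketch of the .pt path handling shown in the last two steps of the diagram, assuming a pathlib-based implementation; quantize.py may structure this differently, and the fallback file name is purely illustrative.

```python
from pathlib import Path


def resolve_checkpoint_target(output_path: str) -> Path:
    """Treat a *.pt target as a direct file path and create only its parent directory."""
    target = Path(output_path)
    if target.suffix == ".pt":
        # Direct file path: make sure the parent directory exists before saving.
        target.parent.mkdir(parents=True, exist_ok=True)
        return target
    # Directory-style target: create it and pick an illustrative default file name.
    target.mkdir(parents=True, exist_ok=True)
    return target / "quantized_model.pt"
```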

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'POTQ with QWEN Image Asymmetric' accurately describes the main change: adding post-training quantization (POTQ) support for QWEN Image 2512 with asymmetric FP4 quantization. |
| Docstring coverage | ✅ Passed | Docstring coverage is 85.71%, which is sufficient. The required threshold is 80.00%. |


Signed-off-by: AliesTaha <ali.taha@baseten.co>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/diffusers/quantization/quantize.py (1)

133-137: ⚠️ Potential issue | 🔴 Critical

Breaking change: all non-flux FP4 models now use NVFP4_ASYMMETRIC_CONFIG instead of NVFP4_DEFAULT_CONFIG.

This else branch applies to every non-flux FP4 model (SDXL, SD3, SD3.5, LTX, LTX2, WAN, and the new Qwen-Image). The switch from symmetric to asymmetric quantization changes calibration behavior for all existing models, not just Qwen-Image-2512. If asymmetric is only intended for Qwen-Image, gate it on the model type:

Proposed fix
         elif self.config.format == QuantFormat.FP4:
             if self.model_config.model_type.value.startswith("flux"):
                 quant_config = NVFP4_FP8_MHA_CONFIG
+            elif self.model_config.model_type == ModelType.QWEN_IMAGE_2512:
+                quant_config = NVFP4_ASYMMETRIC_CONFIG
             else:
-                quant_config = NVFP4_ASYMMETRIC_CONFIG
+                quant_config = NVFP4_DEFAULT_CONFIG
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/diffusers/quantization/quantize.py` around lines 133 - 137, The FP4
branch currently assigns NVFP4_ASYMMETRIC_CONFIG for every non-flux model which
changes calibration for many models; change the else branch in the FP4 handling
(the block using self.config.format == QuantFormat.FP4 and
model_config.model_type.value) to only select NVFP4_ASYMMETRIC_CONFIG for the
specific Qwen-Image model identifier (e.g., check model_config.model_type.value
contains or equals the Qwen-Image token such as "qwen-image" or
"Qwen-Image-2512"), otherwise fall back to NVFP4_DEFAULT_CONFIG (or the previous
NVFP4_DEFAULT_CONFIG symbol) for all other non-flux models; ensure you keep
NVFP4_FP8_MHA_CONFIG for flux models as before.
🧹 Nitpick comments (4)
.gitignore (1)

63-64: Consider adding a trailing / to explicitly mark this as a directory pattern.

Without the trailing slash, git will also ignore a regular file named experiment_run at that path. Since the intent is to ignore an output directory, the more idiomatic form is:

 # Ignore experiment run
-examples/diffusers/quantization/experiment_run
+examples/diffusers/quantization/experiment_run/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 63 - 64, The ignore pattern
"examples/diffusers/quantization/experiment_run" should explicitly mark a
directory by appending a trailing slash; update that entry to
"examples/diffusers/quantization/experiment_run/" so Git ignores the directory
only (not a same-named file) and preserves the intended behavior.
examples/diffusers/quantization/quantize.py (2)

36-36: Commented-out import left in the file.

If ONNX export is disabled, consider removing the commented import entirely and adding it back when export is re-enabled, rather than leaving dead code. At minimum, the TODO comment from line 268 should be referenced here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/diffusers/quantization/quantize.py` at line 36, Remove the dead
commented import line referencing generate_fp8_scales and modelopt_export_sd
from onnx_utils.export; either delete the commented import entirely or, if you
must keep it as a reminder, replace it with a short TODO comment that references
the existing TODO at line 268 and explains to re-enable generate_fp8_scales and
modelopt_export_sd when ONNX export is turned back on.

268-282: export_onnx is now a no-op that still performs expensive device transfers.

Both generate_fp8_scales (line 269) and modelopt_export_sd (lines 278-280) are commented out, yet the method still moves the pipeline to CPU, clears CUDA cache, and moves backbone to CUDA (lines 271-273). If ONNX export is non-functional, consider either returning early with a TODO or commenting out the entire body to avoid wasted GPU memory operations.

Proposed fix
     def export_onnx(
         self,
         pipe: DiffusionPipeline,
         backbone: torch.nn.Module,
         model_type: ModelType,
         quant_format: QuantFormat,
     ) -> None:
         if not self.config.onnx_dir:
             return
 
-        self.logger.info(f"Starting ONNX export to {self.config.onnx_dir}")
-
-        if quant_format == QuantFormat.FP8 and self._has_conv_layers(backbone):
-            self.logger.info(
-                "Detected quantizing conv layers in backbone. Generating FP8 scales..."
-            )
-            # TODO: needs a fix, commenting out for now
-            # generate_fp8_scales(backbone)
-        self.logger.info("Preparing models for export...")
-        pipe.to("cpu")
-        torch.cuda.empty_cache()
-        backbone.to("cuda")
-        # Export to ONNX
-        backbone.eval()
-        with torch.no_grad():
-            self.logger.info("Exporting to ONNX...")
-            # modelopt_export_sd(
-            #     backbone, str(self.config.onnx_dir), model_type.value, quant_format.value
-            # )
-
-        self.logger.info("ONNX export completed successfully")
+        # TODO: ONNX export is currently disabled pending a fix for generate_fp8_scales
+        # and modelopt_export_sd. Re-enable once resolved.
+        self.logger.warning("ONNX export is currently disabled. Skipping.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/diffusers/quantization/quantize.py` around lines 268 - 282, The
export_onnx flow is currently a no-op but still does expensive device transfers
(pipe.to("cpu"), torch.cuda.empty_cache(), backbone.to("cuda")); modify
export_onnx to short-circuit early when ONNX export is disabled or the export
calls (generate_fp8_scales and modelopt_export_sd) are commented out — e.g.,
check a flag or self.config.onnx_dir and return immediately with a log message
before moving pipe/backbone between devices; update the export_onnx method to
only perform device transfers and call generate_fp8_scales or modelopt_export_sd
when the export is actually going to run, referencing the existing symbols
export_onnx, generate_fp8_scales, modelopt_export_sd, pipe, and backbone to
locate the code to change.
examples/diffusers/quantization/config.py (1)

67-87: Missing *softmax_quantizer entry compared to NVFP4_DEFAULT_CONFIG.

NVFP4_DEFAULT_CONFIG (line 58-61) includes a *softmax_quantizer with num_bits: (4, 3), but NVFP4_ASYMMETRIC_CONFIG omits it entirely. If softmax quantization is intended for asymmetric FP4 as well, this needs to be added. If it's intentionally excluded, a brief comment explaining why would help future maintainers.
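
If softmax quantization is indeed intended for the asymmetric config, the addition might look like the sketch below; only the key name and num_bits value come from this review, while the surrounding quant_cfg layout is assumed rather than copied from config.py.

```python
# Hypothetical fragment: only "*softmax_quantizer" and num_bits (4, 3) are taken
# from the review above; the dict layout is assumed to mirror NVFP4_DEFAULT_CONFIG.
softmax_entry = {"*softmax_quantizer": {"num_bits": (4, 3)}}

# NVFP4_ASYMMETRIC_CONFIG["quant_cfg"].update(softmax_entry)  # assumed structure
```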

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/diffusers/quantization/config.py` around lines 67 - 87,
NVFP4_ASYMMETRIC_CONFIG is missing the *softmax_quantizer entry present in
NVFP4_DEFAULT_CONFIG; either add a "*softmax_quantizer" key to
NVFP4_ASYMMETRIC_CONFIG with the same settings (e.g., "num_bits": (4, 3) and any
other matching properties) so softmax quantization is enabled for the asymmetric
FP4 config, or if exclusion is intentional, add a one-line comment next to
NVFP4_ASYMMETRIC_CONFIG explaining why *softmax_quantizer is omitted to avoid
confusion for future maintainers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/diffusers/quantization/models_utils.py`:
- Around line 201-210: The negative_prompt string in the
ModelType.QWEN_IMAGE_2512 entry contains Chinese characters and should be
annotated to silence Ruff's ambiguous-unicode-character warning; update the
negative_prompt line inside the ModelType.QWEN_IMAGE_2512 dict (in
models_utils.py) to append the noqa directive (# noqa: RUF001) so the linter
ignores this ambiguous-unicode-character rule for that string.

In `@examples/diffusers/quantization/utils.py`:
- Around line 47-52: The docstring for filter_func_qwen_image incorrectly states
"6 standalone layers" while the regex in filter_func_qwen_image lists only five
alternatives (time_text_embed, img_in, txt_in, norm_out, proj_out); update the
function to make them consistent by either adding the missing sixth layer name
into the regex if there is a known layer to include, or simply change the
docstring to "5 standalone layers" to match the current pattern; reference
function filter_func_qwen_image and the regex pattern
r".*(time_text_embed|img_in|txt_in|norm_out|proj_out).*" when making the edit.

---

Outside diff comments:
In `@examples/diffusers/quantization/quantize.py`:
- Around line 133-137: The FP4 branch currently assigns NVFP4_ASYMMETRIC_CONFIG
for every non-flux model which changes calibration for many models; change the
else branch in the FP4 handling (the block using self.config.format ==
QuantFormat.FP4 and model_config.model_type.value) to only select
NVFP4_ASYMMETRIC_CONFIG for the specific Qwen-Image model identifier (e.g.,
check model_config.model_type.value contains or equals the Qwen-Image token such
as "qwen-image" or "Qwen-Image-2512"), otherwise fall back to
NVFP4_DEFAULT_CONFIG (or the previous NVFP4_DEFAULT_CONFIG symbol) for all other
non-flux models; ensure you keep NVFP4_FP8_MHA_CONFIG for flux models as before.

---

Nitpick comments:
In @.gitignore:
- Around line 63-64: The ignore pattern
"examples/diffusers/quantization/experiment_run" should explicitly mark a
directory by appending a trailing slash; update that entry to
"examples/diffusers/quantization/experiment_run/" so Git ignores the directory
only (not a same-named file) and preserves the intended behavior.

In `@examples/diffusers/quantization/config.py`:
- Around line 67-87: NVFP4_ASYMMETRIC_CONFIG is missing the *softmax_quantizer
entry present in NVFP4_DEFAULT_CONFIG; either add a "*softmax_quantizer" key to
NVFP4_ASYMMETRIC_CONFIG with the same settings (e.g., "num_bits": (4, 3) and any
other matching properties) so softmax quantization is enabled for the asymmetric
FP4 config, or if exclusion is intentional, add a one-line comment next to
NVFP4_ASYMMETRIC_CONFIG explaining why *softmax_quantizer is omitted to avoid
confusion for future maintainers.

In `@examples/diffusers/quantization/quantize.py`:
- Line 36: Remove the dead commented import line referencing generate_fp8_scales
and modelopt_export_sd from onnx_utils.export; either delete the commented
import entirely or, if you must keep it as a reminder, replace it with a short
TODO comment that references the existing TODO at line 268 and explains to
re-enable generate_fp8_scales and modelopt_export_sd when ONNX export is turned
back on.
- Around line 268-282: The export_onnx flow is currently a no-op but still does
expensive device transfers (pipe.to("cpu"), torch.cuda.empty_cache(),
backbone.to("cuda")); modify export_onnx to short-circuit early when ONNX export
is disabled or the export calls (generate_fp8_scales and modelopt_export_sd) are
commented out — e.g., check a flag or self.config.onnx_dir and return
immediately with a log message before moving pipe/backbone between devices;
update the export_onnx method to only perform device transfers and call
generate_fp8_scales or modelopt_export_sd when the export is actually going to
run, referencing the existing symbols export_onnx, generate_fp8_scales,
modelopt_export_sd, pipe, and backbone to locate the code to change.

AliesTaha and others added 2 commits February 18, 2026 12:08
Signed-off-by: AliesTaha <ali.taha@baseten.co>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: AliesTaha <ali.taha@baseten.co>
Co-authored-by: Cursor <cursoragent@cursor.com>
@AliesTaha
Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
examples/diffusers/quantization/models_utils.py (1)

201-210: Optional: Move QWEN_IMAGE_2512 entry to the end of MODEL_DEFAULTS for consistency.

In every other mapping (ModelType enum, filter_func_map, MODEL_REGISTRY, MODEL_PIPELINE), QWEN_IMAGE_2512 appears after WAN22_T2V_5b. The MODEL_DEFAULTS entry is the only one where it's inserted between the two WAN entries, splitting the WAN family. No functional impact, but the inconsistency makes it harder to scan across dicts.

♻️ Proposed fix
-    ModelType.WAN22_T2V_14b: {
+    ModelType.QWEN_IMAGE_2512: {
+        "backbone": "transformer",
+        "dataset": _SD_PROMPTS_DATASET,
+        "inference_extra_args": {
+            "height": 1024,
+            "width": 1024,
+            "guidance_scale": 4.0,
+            "negative_prompt": "低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。",  # noqa: RUF001
+        },
+    },
+    ModelType.WAN22_T2V_14b: {
         **_WAN_BASE_CONFIG,
         ...
     },
-    ModelType.QWEN_IMAGE_2512: {
-        "backbone": "transformer",
-        "dataset": _SD_PROMPTS_DATASET,
-        "inference_extra_args": {
-            "height": 1024,
-            "width": 1024,
-            "guidance_scale": 4.0,
-            "negative_prompt": "低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。",  # noqa: RUF001
-        },
-    },
     ModelType.WAN22_T2V_5b: {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/diffusers/quantization/models_utils.py` around lines 201 - 210, The
MODEL_DEFAULTS mapping has ModelType.QWEN_IMAGE_2512 placed between the
WAN22_T2V entries which breaks the ordering consistency with other mappings;
move the ModelType.QWEN_IMAGE_2512 entry in MODEL_DEFAULTS so it appears after
the WAN22_T2V_5b entry (i.e., place QWEN_IMAGE_2512 at the end of the
MODEL_DEFAULTS dict), preserving the existing entry content (backbone, dataset,
inference_extra_args) and comments so all dicts share the same ordering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@examples/diffusers/quantization/models_utils.py`:
- Around line 201-210: The MODEL_DEFAULTS mapping has ModelType.QWEN_IMAGE_2512
placed between the WAN22_T2V entries which breaks the ordering consistency with
other mappings; move the ModelType.QWEN_IMAGE_2512 entry in MODEL_DEFAULTS so it
appears after the WAN22_T2V_5b entry (i.e., place QWEN_IMAGE_2512 at the end of
the MODEL_DEFAULTS dict), preserving the existing entry content (backbone,
dataset, inference_extra_args) and comments so all dicts share the same
ordering.

Signed-off-by: AliesTaha <ali.taha@baseten.co>
Co-authored-by: Cursor <cursoragent@cursor.com>
@AliesTaha
Author

@coderabbitai check again

@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

@AliesTaha: I'll review the changes again for you.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.
