Fix DeepSpeed import crash on runtime-only CUDA and improve NVFP4 uncalibrated weight error #896
debo3 wants to merge 1 commit into NVIDIA:main from
Conversation
📝 Walkthrough

This PR expands error handling in quantization utilities: the DeepSpeed compatibility function now catches additional exception types (`FileNotFoundError`, `RuntimeError`), and the NVFP4 tensor weight-scaling computation replaces an assertion with explicit validation and error messaging.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes. Pre-merge checks: ✅ Passed checks (4 passed).
🧹 Nitpick comments (1)
modelopt/torch/quantization/plugins/transformers.py (1)
29-34: `RuntimeError` is broad and may silently mask real DeepSpeed initialization issues.

`FileNotFoundError` for missing `nvcc` is well-justified. However, catching `RuntimeError` is very broad — it could silently swallow legitimate DeepSpeed configuration errors (e.g., version mismatches, NCCL failures) and leave users without ZeRO-3 compatibility when they actually need it, with no indication of the problem. Consider either:

- Logging a warning when `RuntimeError` is caught, so users have visibility.
- Narrowing by inspecting the exception message (e.g., checking for `nvcc` or `cuda` substrings) before suppressing.

Option 1: Add a warning log
```diff
+import logging
+
+logger = logging.getLogger(__name__)
+
 def make_deepspeed_compatible(model: nn.Module):
     """Make the model compatible with DeepSpeed."""
     try:
         from deepspeed.runtime.zero.parameter_offload import ZeROOrderedDict
-    except (ImportError, FileNotFoundError, RuntimeError):
+    except ImportError:
+        return
+    except (FileNotFoundError, RuntimeError) as e:
+        logger.debug("DeepSpeed import failed, skipping compatibility check: %s", e)
         return
```
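For reference, a minimal sketch of the reviewer's second option (suppressing only when the message points at a missing nvcc/CUDA toolchain); the exact substrings, logger name, and re-raise policy here are illustrative assumptions, not part of the suggested diff:

```python
import logging

logger = logging.getLogger(__name__)


def make_deepspeed_compatible(model):
    """Sketch of the message-inspection variant; the real patching body is elided."""
    try:
        from deepspeed.runtime.zero.parameter_offload import ZeROOrderedDict  # noqa: F401
    except ImportError:
        return  # DeepSpeed is not installed; nothing to patch.
    except (FileNotFoundError, RuntimeError) as e:
        # Suppress only failures that look like a missing CUDA toolchain
        # (no nvcc on runtime-only CUDA installs); re-raise anything else
        # so real DeepSpeed initialization errors still surface.
        if any(token in str(e).lower() for token in ("nvcc", "cuda")):
            logger.warning("Skipping DeepSpeed compatibility patch: %s", e)
            return
        raise
    # ... the ZeRO-3 compatibility patching would follow here ...
```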
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/plugins/transformers.py` around lines 29-34: the current except clause "except (ImportError, FileNotFoundError, RuntimeError):" can silently swallow real DeepSpeed initialization errors; update it to capture the exception object (e.g., "except (ImportError, FileNotFoundError, RuntimeError) as e:") and then handle RuntimeError specially by either logging a warning with the exception details (processLogger.warn / warning) or by inspecting e.args[0] / str(e) and only suppressing when the message indicates missing nvcc/cuda (check for substrings like "nvcc" or "cuda"), otherwise re-raise the RuntimeError so real failures surface; keep FileNotFoundError and ImportError handling as-is but include the error in logs for diagnostics.
What does this PR do?
Type of change: Bug fix
Overview: Fixes two crashes that block NVFP4 quantization of large models (>1TB) on production GPU infrastructure.
Bug 1 — DeepSpeed import crashes on runtime-only CUDA systems:
During `mtq.quantize()`, `make_deepspeed_compatible` imports DeepSpeed to check for ZeRO-3 compatibility. DeepSpeed's import chain calls `nvcc --version` to check CUDA compiler compatibility. On runtime-only CUDA installations (NGC containers, cloud GPU instances without the CUDA toolkit), this raises `FileNotFoundError`. The existing `except ImportError` doesn't catch it, so quantization crashes before calibration even starts — even though the user isn't using DeepSpeed at all.

Fix: Broaden the exception handler to also catch `FileNotFoundError` and `RuntimeError`.
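A minimal sketch of the broadened import guard; the surrounding compatibility-patching body is elided here, and only the except tuple reflects the change described above:

```python
def make_deepspeed_compatible(model):
    """Make the model compatible with DeepSpeed ZeRO-3 (body elided in this sketch)."""
    try:
        from deepspeed.runtime.zero.parameter_offload import ZeROOrderedDict  # noqa: F401
    except (ImportError, FileNotFoundError, RuntimeError):
        # ImportError: DeepSpeed is not installed.
        # FileNotFoundError/RuntimeError: DeepSpeed's import chain shells out to
        # `nvcc --version`, which fails on runtime-only CUDA installs (NGC
        # containers, cloud instances without the CUDA toolkit).
        return
    # ... ZeRO-3 compatibility patching continues here ...
```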
Bug 2 — Opaque assertion when weight quantizers lack `_amax`:

`NVFP4QTensor.get_weights_scaling_factor_2_from_quantizer()` uses `assert hasattr(weight_quantizer, "_amax")`, which produces an opaque `AssertionError` with no guidance on what went wrong. This is hit when accelerate's `device_map="auto"` offloads layers to disk on large models — the quantizers are inserted but some may not accumulate `_amax` during calibration. The user loses hours of calibration time to an error that doesn't explain the cause or fix.

Fix: Replace the bare assert with a `ValueError` that explains why `_amax` is missing (disk offloading, insufficient `calib_size`) and points to `_ensure_weight_quantizer_calibrated()` as the resolution.
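A rough sketch of the kind of validation described above, written as a standalone function for brevity (the real method is on `NVFP4QTensor`, and the exact error wording in the PR may differ):

```python
def get_weights_scaling_factor_2_from_quantizer(weight_quantizer):
    """Sketch: validate calibration state before computing the per-tensor scale."""
    if not hasattr(weight_quantizer, "_amax"):
        raise ValueError(
            "weight_quantizer has no '_amax'; the quantizer was not calibrated. "
            "This can happen when accelerate's device_map='auto' offloads layers "
            "to disk or when calib_size is too small. Run "
            "_ensure_weight_quantizer_calibrated() on the module before export."
        )
    # ... scaling-factor computation using weight_quantizer._amax would follow ...
```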
Usage
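A hedged sketch of the quantize-and-export flow these fixes were exercised with, assuming the standard ModelOpt APIs referenced in this PR (`mtq.quantize`, `NVFP4_DEFAULT_CFG`, `device_map="auto"`, `export_hf_checkpoint`); the model identifier, calibration data, and export directory are placeholders:

```python
# Hedged sketch — model name, calibration texts, and export path are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

model_id = "path/to/finetuned-model"  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

calib_texts = ["Hello world"] * 8  # placeholder calibration data


def forward_loop(model):
    # Feed calibration samples through the model so quantizers collect statistics.
    for text in calib_texts:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        model(**inputs)


# Bug 1 surfaced here on runtime-only CUDA systems (FileNotFoundError from the
# DeepSpeed import chain); after the fix, quantization proceeds normally.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)

# Bug 2 surfaced here as an opaque AssertionError when some weight quantizers
# had no _amax; after the fix, a descriptive ValueError explains the cause.
export_hf_checkpoint(model, export_dir="quantized-nvfp4")  # placeholder path
```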
Testing
Tested on:
- Runtime-only CUDA (no `nvcc` installed)
- `NVFP4_DEFAULT_CFG`, `device_map="auto"`, 1024 calibration samples

Before fix: `FileNotFoundError` at `mtq.quantize()` (Bug 1), `AssertionError` at `export_hf_checkpoint()` (Bug 2)

After fix: Both operations complete successfully
Before your PR is "Ready for review"
Additional Information
Related: PR #785 (Fix a nvfp4 weight amax attribute issue during export) added `_ensure_weight_quantizer_calibrated()`, which addresses the `_amax` issue at the `quant_utils.py` call site. This PR adds the safety net at the `nvfp4_tensor.py` level.

Encountered while quantizing fine-tuned DeepSeek V3 671B models on NVIDIA B200 Blackwell GPUs.