[pyTorch] Replace the make_empty implementation to use C++ implementation #2666
ptrendx wants to merge 5 commits into NVIDIA:main
Conversation
/te-ci L1 pytorch
/te-ci L1 pytorch
| """Construct quantized tensor with uninitialized data""" | ||
| raise NotImplementedError( | ||
| f"{self.__class__.__name__} class does not implement make_empty function, " | ||
| "required for construction of unintialized quantized tensor" |
This clear NotImplementedError is beneficial for custom quantizers that do not override make_empty().
Now, if a custom quantizer does not implement make_empty(), it will fail at the C++ convert_quantizer, because there is no registered C++ converter for it. A C++ failure with NVTE_ERROR("Unexpected type for quantizer") is not as clear as a NotImplementedError.
What about making the C++ error clearer, or better yet, adding a check in the base Quantizer.make_empty:
def make_empty(...):
    if getattr(self, "custom", False):
        raise NotImplementedError(
            f"{self.__class__.__name__} does not implement make_empty"
        )
    # ... existing C++ path ...
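For illustration, here is a fuller sketch of how that guard could combine with the new C++ path. The signature of tex.create_empty_quantized_tensor is not spelled out in this PR, so the argument list below is an assumption, and the `custom` attribute is the reviewer's hypothetical marker for out-of-tree quantizers:

from typing import Iterable, Optional

import torch
import transformer_engine_torch as tex  # the C++ extension module

class Quantizer:
    def make_empty(
        self,
        shape: Iterable[int],
        *,
        dtype: torch.dtype = torch.float32,
        device: Optional[torch.device] = None,
        requires_grad: bool = False,
    ):
        """Construct a quantized tensor with uninitialized data."""
        # Fail early in Python for quantizers that have no registered
        # C++ converter; this is clearer than the C++ side's
        # NVTE_ERROR("Unexpected type for quantizer").
        if getattr(self, "custom", False):
            raise NotImplementedError(
                f"{self.__class__.__name__} does not implement make_empty"
            )
        # Assumed argument order; the actual binding may differ.
        out = tex.create_empty_quantized_tensor(self, shape, dtype, device)
        # Per the flowchart below, requires_grad is applied on the Python side.
        if requires_grad:
            out.requires_grad_(True)
        return out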
/te-ci pytorch L1
Greptile Summary
Unified quantized tensor creation by migrating the Python make_empty implementations onto the C++ create_tensor path.
Confidence Score: 5/5
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Python: quantizer.make_empty] --> B[tex.create_empty_quantized_tensor]
B --> C[C++: create_empty_quantized_tensor]
C --> D[convert_quantizer]
C --> E[GetTransformerEngineDType]
D --> F[quantizer_cpp->create_tensor]
E --> F
F --> G{Quantizer Type}
G --> H[Float8Quantizer::create_tensor]
G --> I[Float8BlockQuantizer::create_tensor]
G --> J[MXFP8Quantizer::create_tensor]
G --> K[NVFP4Quantizer::create_tensor]
G --> L[NoneQuantizer::create_tensor]
H --> M[Create Float8Tensor with device/pin_memory]
I --> N[Create Float8BlockwiseQTensor with device/pin_memory]
J --> O[Create MXFP8Tensor with device/pin_memory]
K --> P[Create NVFP4Tensor with device/pin_memory]
L --> Q[Create unquantized Tensor with device/pin_memory]
M --> R[Return QuantizedTensor]
N --> R
O --> R
P --> R
Q --> R
R --> S[Python: Apply requires_grad if needed]
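Conceptually, the convert_quantizer step in the flowchart above is a type switch over the known quantizer classes. A minimal, self-contained Python analogue of that dispatch (all names here are stand-ins, not the actual C++ code):

class NoneQuantizer:
    """Stand-in for the no-quantization path."""

class Float8Quantizer:
    """Stand-in for the per-tensor FP8 quantizer."""

def convert_quantizer(quantizer):
    # The real C++ code maps a Python quantizer onto its registered C++
    # counterpart; an unknown type triggers
    # NVTE_ERROR("Unexpected type for quantizer").
    registered = (NoneQuantizer, Float8Quantizer)
    for cls in registered:
        if isinstance(quantizer, cls):
            return quantizer  # the real code returns the C++ quantizer object
    raise TypeError(f"Unexpected type for quantizer: {type(quantizer).__name__}")

print(type(convert_quantizer(Float8Quantizer())).__name__)  # Float8Quantizer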
Last reviewed commit: 9cad6d0
Description
This PR unifies the creation of QuantizedTensor objects by using the C++ implementation of create_tensor.
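As a usage sketch of the unified path (the Float8Quantizer constructor arguments follow my reading of the library and are not spelled out in this PR):

import torch
import transformer_engine_torch as tex
from transformer_engine.pytorch.tensor.float8_tensor import Float8Quantizer

# Assumed constructor arguments for Float8Quantizer.
quantizer = Float8Quantizer(
    scale=torch.ones(1, device="cuda"),
    amax=torch.zeros(1, device="cuda"),
    fp8_dtype=tex.DType.kFloat8E4M3,
)

# make_empty now routes through the C++ create_empty_quantized_tensor
# and returns a Float8Tensor with uninitialized data.
a = quantizer.make_empty((256, 128), dtype=torch.bfloat16, device="cuda")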