My Environment
- GPU: NVIDIA RTX 5060 Ti (Blackwell architecture)
- OS: Ubuntu 24.04
- PyTorch: 2.10
- CUDA: 13.0
- Model: Flux.2 Klein 9B
- Quantization config: batch size 128, INT8
Issue Description
After quantizing Flux.2 Klein 9B to INT8 with the convert_to_quant tools, I observed that inference is not as fast as expected (I expected roughly a 2x speedup compared to BF16).
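For context, this is roughly how I measured the latency numbers above. A minimal, self-contained timing harness (the `run_bf16` / `run_int8` callables below are illustrative stand-ins for the actual pipeline calls, not part of convert_to_quant):

```python
import time
import statistics


def benchmark(fn, warmup=3, iters=10):
    """Return the median wall-clock latency (seconds) of calling fn().

    Warmup iterations are discarded first; this matters on GPU, where the
    first calls can include kernel compilation and memory-allocator setup.
    For CUDA workloads, fn should synchronize internally (e.g. call
    torch.cuda.synchronize() before returning) so the timing is accurate.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return statistics.median(times)


# Usage sketch (illustrative names):
#   bf16_latency = benchmark(run_bf16)
#   int8_latency = benchmark(run_int8)
#   print(f"speedup: {bf16_latency / int8_latency:.2f}x")
```

The speedup I report is the ratio of the BF16 median latency to the INT8 median latency over identical inputs.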
