
inference discrepancy between onnxruntime and trt8.6 (trt10.11 too) #869

@zzqiuzz

Description



Describe the bug

  • I used TensorRT 8.6.1 to convert an existing ONNX model with QDQ nodes (exported with the Model Optimizer toolkit) into an engine for an Orin-equipped platform. However, the engine's inference results are worse than expected and show a large discrepancy from the original QDQ ONNX model. To check whether the problem is platform-specific, I repeated the conversion with TensorRT 8.6.1 and TensorRT 10.11 on my local x86-64 workstation and saw the same behavior. I computed the cosine similarity between the outputs of the QDQ ONNX model and the engines generated by trt8.6 and trt10.11 respectively.
[Screenshots: cosine similarity of qdq onnx vs engine, for trt10.11 and for trt8.6.1]
As depicted above, there is indeed a large discrepancy.
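For reference, the cosine-similarity check described above can be sketched as follows. This is a minimal sketch: the placeholder arrays stand in for real outputs, which in practice would come from an onnxruntime `InferenceSession` on the QDQ ONNX model and from the built TensorRT engine respectively.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened output tensors."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder outputs: in practice `onnx_out` comes from onnxruntime on
# quant0206.onnx and `trt_out` from the generated TensorRT engine.
onnx_out = np.array([0.1, 0.9, -0.3, 0.5])
trt_out = np.array([0.12, 0.85, -0.35, 0.55])
print(f"cosine similarity: {cosine_similarity(onnx_out, trt_out):.4f}")
```

A value near 1.0 indicates matching outputs; the screenshots above show values well below that.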

Steps/Code to reproduce bug

  • convert command:
    trtexec --onnx=quant0206.onnx --saveEngine=quant_0206.engine --dumpProfile=true --best --verbose=true

Expected behavior

Who can help?

  • I would really appreciate it if anyone could help solve this problem, or offer advice on how to debug it. My guess is that the discrepancy is introduced during the ONNX-to-TensorRT conversion, where some operators may lose precision.
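One way to test the guess that specific operators lose precision is to compare intermediate outputs layer by layer and rank layers by cosine similarity. The sketch below uses placeholder arrays and hypothetical layer names; in practice the two dicts would hold per-layer outputs dumped from onnxruntime and from the TensorRT engine (e.g. via Polygraphy).

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def worst_layers(ref_outputs, trt_outputs, top_k=3):
    """Rank layer outputs shared by both backends, worst similarity first."""
    scores = {
        name: cosine_similarity(ref_outputs[name], trt_outputs[name])
        for name in ref_outputs.keys() & trt_outputs.keys()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])[:top_k]

# Placeholder per-layer outputs (layer names and noise are hypothetical):
# later layers get progressively larger perturbations, mimicking accumulating
# precision loss in the converted engine.
rng = np.random.default_rng(0)
ref = {f"layer_{i}": rng.normal(size=64) for i in range(5)}
trt = {k: v + rng.normal(scale=0.01 * (i + 1), size=64)
       for i, (k, v) in enumerate(ref.items())}

for name, score in worst_layers(ref, trt):
    print(f"{name}: cosine similarity = {score:.4f}")
```

The first layer whose similarity drops sharply is a good candidate for the operator where precision is lost.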

System information

  • Container used (if applicable): nvcr.io/nvidia/tensorrt-llm/release:1.0.0
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 20.04
  • CPU architecture (x86_64, aarch64): x86_64
  • GPU name (e.g. H100, A100, L40S): GT3060
  • GPU memory size: 12G
  • Number of GPUs: 1
  • Library versions (if applicable):
    • Python: 3.9
    • ModelOpt version or commit hash: 0.40.0
    • CUDA: 11.4 (nvcc release 11.4, V11.4.48)
    • PyTorch: 2.0
    • Transformers: not used
    • TensorRT-LLM: not used
    • ONNXRuntime: 1.19.2-gpu
    • TensorRT: 8.6.1 and 10.11 (both show the issue)
  • Any other details that may help:

Metadata


Labels

bug (Something isn't working)
