Hugging Face Model integration in SuperBench #803
Aishwarya-Tonpe wants to merge 1 commit into main
Conversation
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds HuggingFace Hub as a first-class model source across SuperBench training benchmarks and ORT/TensorRT inference micro-benchmarks, enabling users to benchmark arbitrary HF models via CLI flags (including gated models via HF_TOKEN).
Changes:
- Introduces `ModelSourceConfig` and `HuggingFaceModelLoader` for unified HF model configuration/loading and memory-fit checks.
- Extends PyTorch model benchmarks to optionally load HF backbones and wrap them with task-specific heads.
- Adds HF→ONNX export support and integrates HF flows into ORT and TensorRT inference micro-benchmarks, plus new tests and examples.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/benchmarks/micro_benchmarks/test_model_source_config.py | Adds unit tests for ModelSourceConfig validation/defaulting. |
| tests/benchmarks/micro_benchmarks/test_huggingface_loader.py | Adds unit tests for HF loader dtype handling, load flow, and size estimation. |
| tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py | Adds integration tests that download real HF models and validate basic forward pass. |
| superbench/benchmarks/model_benchmarks/pytorch_mixtral_impl.py | Adds HF config customization + wrapper and HF-loading branch for Mixtral benchmark. |
| superbench/benchmarks/model_benchmarks/pytorch_lstm.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_llama.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_gpt2.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_cnn.py | Adds HF-loading path + wrapper for HF vision backbones, keeps in-house torchvision path. |
| superbench/benchmarks/model_benchmarks/pytorch_bert.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_base.py | Adds shared HF model loading flow, memory estimation, and CLI args for model source/identifier. |
| superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py | Adds HF model preprocessing: config-only memory check, HF load, ONNX export, TRT build command. |
| superbench/benchmarks/micro_benchmarks/ort_inference_performance.py | Adds HF preprocessing (config memory check, HF load, ONNX export/quantize) + dynamic input handling. |
| superbench/benchmarks/micro_benchmarks/model_source_config.py | New dataclass encapsulating model source, identifier, dtype, token, and loader kwargs. |
| superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py | New loader for HF Hub with tokenizer support, size/memory estimation utilities, and pre-checks. |
| superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py | Adds HF model ONNX export with vision/NLP detection, dynamic axes, and optional external data output. |
| examples/benchmarks/tensorrt_inference_performance.py | Updates example script to show in-house vs HF usage via CLI. |
| examples/benchmarks/pytorch_huggingface_models.py | New example demonstrating HF-backed training benchmarks, incl. distributed option. |
| examples/benchmarks/ort_inference_performance.py | Updates ORT example script to show in-house vs HF usage via CLI. |
…e benchmarks

- Add HuggingFaceModelLoader for downloading and caching models from HF Hub
- Support both NLP (AutoModelForCausalLM) and vision (AutoModelForImageClassification) models
- Add model_source and model_identifier parameters to TensorRT/ORT benchmarks
- Add ONNX export pipeline for HuggingFace models with dynamic axes
- Derive vision input shapes from ONNX graph dims with HF config fallback
- Filter ONNX initializers from graph.input for correct NLP input handling
- Add PyTorch 2.8+ compatibility (external_data vs use_external_data_format)
- Add example script, unit tests, and config schema updates
- Support HF_TOKEN env var for gated model access
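The PyTorch 2.8+ compatibility point in the commit message (handling the rename of `use_external_data_format` to `external_data` in `torch.onnx.export`) can be sketched by probing the export function's signature. This is an illustrative helper, not the PR's actual code; `onnx_export_kwargs` is a hypothetical name, and the export function is passed in so the sketch stays framework-agnostic.

```python
import inspect


def onnx_export_kwargs(export_fn, save_external_data):
    """Pick whichever external-data keyword the given export function accepts.

    Newer torch.onnx.export versions take `external_data`; older ones take
    `use_external_data_format`. Probing the signature keeps both working.
    """
    params = inspect.signature(export_fn).parameters
    if 'external_data' in params:
        return {'external_data': save_external_data}
    if 'use_external_data_format' in params:
        return {'use_external_data_format': save_external_data}
    return {}
```

The returned dict can be splatted into the export call, e.g. `export_fn(model, args, path, **onnx_export_kwargs(export_fn, True))`.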
7094628 to
2ca3e68
Compare
Adds support for loading and benchmarking models from HuggingFace Hub across the ORT and TensorRT inference micro-benchmarks. Users can run any compatible HF-hosted model through the existing benchmark harness using `--model_source huggingface --model_identifier <org/model>`.
SuperBench previously supported only in-house model definitions with hardcoded architectures, so adding new models required code changes. This PR allows benchmarking any compatible HuggingFace model with a CLI flag change, including gated models via HF_TOKEN.
Key Changes
New modules:
HuggingFaceModelLoader — downloads, caches, and loads models from HF Hub. Estimates the parameter count from the model config alone (a few-KB download) and checks available GPU memory before fetching full weights, avoiding failed multi-GB downloads.
ModelSourceConfig — dataclass for model source configuration (in-house / huggingface), dtype, revision, auth token, and device mapping.
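The two modules above can be sketched as follows. This is a minimal approximation of the idea, not the PR's actual code: the field names and the `estimate_model_bytes` helper are assumptions, and the real loader derives the parameter count from the HF config rather than taking it as an argument.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSourceConfig:
    """Sketch of a model-source config; field names are illustrative."""
    model_source: str = 'in-house'      # 'in-house' or 'huggingface'
    model_identifier: str = ''          # e.g. 'bert-base-uncased'
    dtype: str = 'float16'
    revision: str = 'main'
    auth_token: str = ''                # populated from HF_TOKEN for gated models
    loader_kwargs: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.model_source not in ('in-house', 'huggingface'):
            raise ValueError(f'unknown model_source: {self.model_source}')


# Bytes per parameter for common dtypes, used by the config-only memory
# pre-check before any multi-GB weight download starts.
_DTYPE_BYTES = {'float32': 4, 'float16': 2, 'bfloat16': 2, 'int8': 1}


def estimate_model_bytes(num_params, dtype='float16'):
    """Rough lower bound on weight memory: parameter count x dtype width."""
    return num_params * _DTYPE_BYTES[dtype]
```

For example, a 7B-parameter model in float16 needs at least ~14 GB for weights alone, so the pre-check can reject it on a smaller GPU without downloading anything.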
Micro-benchmarks (inference):
ORT inference — downloads the HF model → exports to ONNX → runs ORT inference. Handles both vision (pixel_values) and NLP (input_ids) inputs automatically.
TensorRT inference — same flow: download → ONNX export → trtexec engine build → inference. Includes dynamic input shape detection from the exported ONNX graph.
ONNX exporter — new export_huggingface_model() method with vision/NLP auto-detection, dynamic axes, and external data support for large models (>2GB).
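Two details of this flow lend themselves to a short sketch: ONNX lists weight initializers alongside real runtime inputs in `graph.input`, so they must be filtered out, and the remaining input names are enough to tell vision models from NLP models. The helpers below are a pure-Python approximation of that logic (the real code operates on an `onnx.ModelProto`; these function names are illustrative).

```python
def runtime_inputs(graph_input_names, initializer_names):
    """Drop weight initializers from graph.input, keeping only true runtime
    inputs such as input_ids or pixel_values."""
    init = set(initializer_names)
    return [n for n in graph_input_names if n not in init]


def detect_model_kind(input_names):
    """Classify the exported model by its runtime input names."""
    if 'pixel_values' in input_names:
        return 'vision'
    if 'input_ids' in input_names:
        return 'nlp'
    return 'unknown'
```

The benchmark can then feed random image tensors to a 'vision' model and token-id tensors to an 'nlp' model without the user specifying the input layout.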
Testing
Usage
Training benchmark
ORT inference

```shell
python examples/benchmarks/ort_inference_performance.py \
    --model_source huggingface --model_identifier bert-base-uncased
```
TensorRT inference

```shell
python examples/benchmarks/tensorrt_inference_performance.py \
    --model_source huggingface --model_identifier microsoft/resnet-50
```
Gated models

```shell
export HF_TOKEN=hf_xxxxx
```
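Inside the loader, token resolution for gated models might look like the sketch below: an explicitly configured token wins, otherwise the HF_TOKEN environment variable is used, and public models need no token at all. The helper name is illustrative, not taken from the PR.

```python
import os


def resolve_hf_token(explicit_token=None):
    """Prefer an explicitly passed token; otherwise fall back to HF_TOKEN.
    Returns None when no token is available (public, non-gated models)."""
    return explicit_token or os.environ.get('HF_TOKEN') or None
```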