Hugging Face Model integration in SuperBench #803
Aishwarya-Tonpe wants to merge 1 commit into main
Conversation
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds HuggingFace Hub as a first-class model source across SuperBench training benchmarks and ORT/TensorRT inference micro-benchmarks, enabling users to benchmark arbitrary HF models via CLI flags (including gated models via HF_TOKEN).
Changes:
- Introduces `ModelSourceConfig` and `HuggingFaceModelLoader` for unified HF model configuration/loading and memory-fit checks.
- Extends PyTorch model benchmarks to optionally load HF backbones and wrap them with task-specific heads.
- Adds HF→ONNX export support and integrates HF flows into ORT and TensorRT inference micro-benchmarks, plus new tests and examples.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/benchmarks/micro_benchmarks/test_model_source_config.py | Adds unit tests for ModelSourceConfig validation/defaulting. |
| tests/benchmarks/micro_benchmarks/test_huggingface_loader.py | Adds unit tests for HF loader dtype handling, load flow, and size estimation. |
| tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py | Adds integration tests that download real HF models and validate basic forward pass. |
| superbench/benchmarks/model_benchmarks/pytorch_mixtral_impl.py | Adds HF config customization + wrapper and HF-loading branch for Mixtral benchmark. |
| superbench/benchmarks/model_benchmarks/pytorch_lstm.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_llama.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_gpt2.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_cnn.py | Adds HF-loading path + wrapper for HF vision backbones, keeps in-house torchvision path. |
| superbench/benchmarks/model_benchmarks/pytorch_bert.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_base.py | Adds shared HF model loading flow, memory estimation, and CLI args for model source/identifier. |
| superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py | Adds HF model preprocessing: config-only memory check, HF load, ONNX export, TRT build command. |
| superbench/benchmarks/micro_benchmarks/ort_inference_performance.py | Adds HF preprocessing (config memory check, HF load, ONNX export/quantize) + dynamic input handling. |
| superbench/benchmarks/micro_benchmarks/model_source_config.py | New dataclass encapsulating model source, identifier, dtype, token, and loader kwargs. |
| superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py | New loader for HF Hub with tokenizer support, size/memory estimation utilities, and pre-checks. |
| superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py | Adds HF model ONNX export with vision/NLP detection, dynamic axes, and optional external data output. |
| examples/benchmarks/tensorrt_inference_performance.py | Updates example script to show in-house vs HF usage via CLI. |
| examples/benchmarks/pytorch_huggingface_models.py | New example demonstrating HF-backed training benchmarks, incl. distributed option. |
| examples/benchmarks/ort_inference_performance.py | Updates ORT example script to show in-house vs HF usage via CLI. |
…e benchmarks

- Add HuggingFaceModelLoader for downloading and caching models from HF Hub
- Support both NLP (AutoModelForCausalLM) and vision (AutoModelForImageClassification) models
- Add model_source and model_identifier parameters to TensorRT/ORT benchmarks
- Add ONNX export pipeline for HuggingFace models with dynamic axes
- Derive vision input shapes from ONNX graph dims with HF config fallback
- Filter ONNX initializers from graph.input for correct NLP input handling
- Add PyTorch 2.8+ compatibility (external_data vs use_external_data_format)
- Add example script, unit tests, and config schema updates
- Support HF_TOKEN env var for gated model access
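The PyTorch 2.8+ compatibility point in the commit message (handling the rename of `use_external_data_format` to `external_data` in `torch.onnx.export`) can be sketched by probing the export function's signature. This is an illustrative helper, not the PR's actual code; `onnx_export_kwargs` is a hypothetical name, and the export function is passed in so the sketch stays framework-agnostic.

```python
import inspect


def onnx_export_kwargs(export_fn, save_external_data):
    """Pick whichever external-data keyword the given export function accepts.

    Newer torch.onnx.export versions take `external_data`; older ones take
    `use_external_data_format`. Probing the signature keeps both working.
    """
    params = inspect.signature(export_fn).parameters
    if 'external_data' in params:
        return {'external_data': save_external_data}
    if 'use_external_data_format' in params:
        return {'use_external_data_format': save_external_data}
    return {}
```

The returned dict can be splatted into the export call, e.g. `export_fn(model, args, path, **onnx_export_kwargs(export_fn, True))`.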
7094628 to
2ca3e68
Compare
Adds support for loading and benchmarking models from HuggingFace Hub across the ORT and TensorRT inference micro-benchmarks. Users can run any compatible HF-hosted model through the existing benchmark harness using `--model_source huggingface --model_identifier <org/model>`.
SuperBench previously supported only in-house model definitions with hardcoded architectures, so adding new models required code changes. This PR allows benchmarking any compatible HuggingFace model with a CLI flag change, including gated models via HF_TOKEN.
Key Changes
New modules:
HuggingFaceModelLoader — downloads, caches, and loads models from HF Hub. Estimates the parameter count from the model config alone (a few-KB download) and checks available GPU memory before fetching full weights, avoiding failed multi-GB downloads.
ModelSourceConfig — dataclass for model source configuration (in-house / huggingface), dtype, revision, auth token, and device mapping.
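The two modules above can be sketched as follows. This is a minimal approximation of the idea, not the PR's actual code: the field names and the `estimate_model_bytes` helper are assumptions, and the real loader derives the parameter count from the HF config rather than taking it as an argument.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSourceConfig:
    """Sketch of a model-source config; field names are illustrative."""
    model_source: str = 'in-house'      # 'in-house' or 'huggingface'
    model_identifier: str = ''          # e.g. 'bert-base-uncased'
    dtype: str = 'float16'
    revision: str = 'main'
    auth_token: str = ''                # populated from HF_TOKEN for gated models
    loader_kwargs: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.model_source not in ('in-house', 'huggingface'):
            raise ValueError(f'unknown model_source: {self.model_source}')


# Bytes per parameter for common dtypes, used by the config-only memory
# pre-check before any multi-GB weight download starts.
_DTYPE_BYTES = {'float32': 4, 'float16': 2, 'bfloat16': 2, 'int8': 1}


def estimate_model_bytes(num_params, dtype='float16'):
    """Rough lower bound on weight memory: parameter count x dtype width."""
    return num_params * _DTYPE_BYTES[dtype]
```

For example, a 7B-parameter model in float16 needs at least ~14 GB for weights alone, so the pre-check can reject it on a smaller GPU without downloading anything.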
Micro-benchmarks (inference):
ORT inference — downloads the HF model → exports to ONNX → runs ORT inference. Handles both vision (pixel_values) and NLP (input_ids) inputs automatically.
TensorRT inference — same flow: download → ONNX export → trtexec engine build → inference. Includes dynamic input shape detection from the exported ONNX graph.
ONNX exporter — new export_huggingface_model() method with vision/NLP auto-detection, dynamic axes, and external data support for large models (>2GB).
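Two details of this flow lend themselves to a short sketch: ONNX lists weight initializers alongside real runtime inputs in `graph.input`, so they must be filtered out, and the remaining input names are enough to tell vision models from NLP models. The helpers below are a pure-Python approximation of that logic (the real code operates on an `onnx.ModelProto`; these function names are illustrative).

```python
def runtime_inputs(graph_input_names, initializer_names):
    """Drop weight initializers from graph.input, keeping only true runtime
    inputs such as input_ids or pixel_values."""
    init = set(initializer_names)
    return [n for n in graph_input_names if n not in init]


def detect_model_kind(input_names):
    """Classify the exported model by its runtime input names."""
    if 'pixel_values' in input_names:
        return 'vision'
    if 'input_ids' in input_names:
        return 'nlp'
    return 'unknown'
```

The benchmark can then feed random image tensors to a 'vision' model and token-id tensors to an 'nlp' model without the user specifying the input layout.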
Testing
Usage
Training benchmark
ORT inference

```shell
python examples/benchmarks/ort_inference_performance.py \
    --model_source huggingface --model_identifier bert-base-uncased
```
TensorRT inference

```shell
python examples/benchmarks/tensorrt_inference_performance.py \
    --model_source huggingface --model_identifier microsoft/resnet-50
```
Gated models

```shell
export HF_TOKEN=hf_xxxxx
```
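Inside the loader, token resolution for gated models might look like the sketch below: an explicitly configured token wins, otherwise the HF_TOKEN environment variable is used, and public models need no token at all. The helper name is illustrative, not taken from the PR.

```python
import os


def resolve_hf_token(explicit_token=None):
    """Prefer an explicitly passed token; otherwise fall back to HF_TOKEN.
    Returns None when no token is available (public, non-gated models)."""
    return explicit_token or os.environ.get('HF_TOKEN') or None
```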