# Qwen2.5-0.5B-Instruct Benchmark Example

This example benchmarks `Qwen/Qwen2.5-0.5B-Instruct` against either a vLLM or
SGLang server. It is intended as a small-GPU example that works on typical
8-16 GB cards.

## Requirements

- Python 3.12+
- Docker with NVIDIA GPU support
- NVIDIA GPU with at least 8 GB VRAM

## Fastest Path

From the repo root:

```bash
# Offline benchmark with vLLM
bash examples/08_Qwen2.5-0.5B_Example/run_benchmark.sh vllm offline

# Offline benchmark with SGLang
bash examples/08_Qwen2.5-0.5B_Example/run_benchmark.sh sglang offline

# Online concurrency sweep
bash examples/08_Qwen2.5-0.5B_Example/run_benchmark.sh vllm online
```

The script prepares the dataset, starts or reuses a container, waits for the
server, and runs the benchmark.

## Manual Flow

If you do not want to use `run_benchmark.sh`, the minimal manual flow is:

```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[test]"

python examples/08_Qwen2.5-0.5B_Example/prepare_dataset.py
```

Start one server:

```bash
# vLLM
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e PYTORCH_ALLOC_CONF=expandable_segments:True \
  -p 8000:8000 \
  --ipc=host \
  --name vllm-qwen \
  -d \
  vllm/vllm-openai:latest \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --gpu-memory-utilization 0.85

# SGLang
docker run --runtime nvidia --gpus all --net host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --ipc=host \
  --name sglang-qwen \
  -d \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path Qwen/Qwen2.5-0.5B-Instruct \
    --host 0.0.0.0 \
    --port 30000 \
    --mem-fraction-static 0.9 \
    --attention-backend flashinfer
```
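
Before benchmarking, it helps to confirm a server is actually ready. The helper below is a sketch, not part of this repo; it assumes both servers expose a `/health` endpoint on their default ports (8000 for vLLM, 30000 for SGLang with `--net host`).

```bash
# wait_for_server is a hypothetical helper (not part of this repo):
# poll a URL until it returns HTTP 200 or the retry budget runs out.
wait_for_server() {
  local url="$1" retries="${2:-60}" i
  for i in $(seq "$retries"); do
    if curl --silent --fail --output /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep 1
  done
  echo "not ready after ${retries} attempts: $url" >&2
  return 1
}

# Assumed endpoints; adjust if your setup differs:
#   wait_for_server http://localhost:8000/health    # vLLM
#   wait_for_server http://localhost:30000/health   # SGLang
```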

Run one benchmark:

```bash
# vLLM offline
inference-endpoint benchmark from-config \
  -c examples/08_Qwen2.5-0.5B_Example/offline_qwen_benchmark.yaml

# SGLang offline
inference-endpoint benchmark from-config \
  -c examples/08_Qwen2.5-0.5B_Example/sglang_offline_qwen_benchmark.yaml

# vLLM online sweep
python scripts/concurrency_sweep/run.py \
  --config examples/08_Qwen2.5-0.5B_Example/online_qwen_benchmark.yaml

# SGLang online sweep
python scripts/concurrency_sweep/run.py \
  --config examples/08_Qwen2.5-0.5B_Example/sglang_online_qwen_benchmark.yaml
```

## Files

- `offline_qwen_benchmark.yaml`: vLLM offline benchmark
- `online_qwen_benchmark.yaml`: vLLM online concurrency sweep
- `sglang_offline_qwen_benchmark.yaml`: SGLang offline benchmark
- `sglang_online_qwen_benchmark.yaml`: SGLang online concurrency sweep
- `prepare_dataset.py`: converts `tests/datasets/dummy_1k.pkl` into the example dataset
- `run_benchmark.sh`: wrapper that automates dataset prep, container startup, and benchmark execution

## Results

- vLLM offline: `results/qwen_offline_benchmark/`
- vLLM online: `results/qwen_online_benchmark/concurrency_sweep/`
- SGLang offline: `results/qwen_sglang_offline_benchmark/`
- SGLang online: `results/qwen_sglang_online_benchmark/concurrency_sweep/`

To summarize an online sweep:

```bash
python scripts/concurrency_sweep/summarize.py \
  results/qwen_online_benchmark/concurrency_sweep/
```

## Notes

- The online sweep defaults to concurrency levels `1 2 4 8 16 32 64 128 256 512 1024`.
- Use `scripts/concurrency_sweep/run.py --concurrency ... --duration-ms ...` to shorten or customize the sweep.
- If vLLM runs out of memory at higher concurrency, lower `--gpu-memory-utilization`; the SGLang equivalent is `--mem-fraction-static`.
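
As a concrete example of customizing the sweep, the invocation below runs three concurrency levels for 30 seconds each; the values are illustrative, not tuned recommendations:

```bash
# Shorter sweep: three concurrency levels instead of the full default ladder
python scripts/concurrency_sweep/run.py \
  --config examples/08_Qwen2.5-0.5B_Example/online_qwen_benchmark.yaml \
  --concurrency 1 4 16 \
  --duration-ms 30000
```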