
Contributing to MLPerf Inference Endpoints

Welcome! We're glad you're interested in contributing. This project is part of MLCommons and aims to build a high-performance benchmarking tool for LLM inference endpoints targeting 50k+ QPS.

Table of Contents

  • Development Setup
  • Code Style and Conventions
  • Testing
  • Submitting Changes
  • Issue Guidelines
  • MLCommons CLA
  • Questions?

Development Setup

Prerequisites

  • Python 3.12 or later
  • Git
  • A Unix-like OS (Linux or macOS)

Getting Started

# Fork and clone
git clone https://github.com/<your-username>/endpoints.git
cd endpoints

# Create virtual environment
python3.12 -m venv venv
source venv/bin/activate

# Install with dev and test extras
pip install -e ".[dev,test]"

# Install pre-commit hooks
pre-commit install

# Verify your setup
pytest -m unit -x --timeout=60

Local Testing with Echo Server

# Start a local echo server
python -m inference_endpoint.testing.echo_server --port 8765

# Run a quick probe
inference-endpoint probe --endpoints http://localhost:8765 --model test-model

Code Style and Conventions

Formatting and Linting

We use ruff for formatting and linting, and mypy for type checking. Pre-commit hooks enforce these automatically.

# Run all checks manually
pre-commit run --all-files

Key Conventions

  • Line length: 88 characters
  • Quotes: Double quotes
  • License headers: Required on all Python files (auto-added by pre-commit)
  • Commit messages: Conventional Commits (feat:, fix:, docs:, test:, chore:, perf:)
  • Comments: Only where the why isn't obvious from the code. No over-documenting.

Serialization

  • Hot-path data (Query, QueryResult, StreamChunk): msgspec.Struct — encode/decode with msgspec.json, not stdlib json
  • Configuration: pydantic.BaseModel for validation
  • Do not use dataclass where neighboring types use msgspec

Performance-Sensitive Code

Code in load_generator/, endpoint_client/worker.py, and async_utils/transport/ is latency-critical. In these paths:

  • No match statements — use dict dispatch
  • Minimize async suspends
  • No pydantic validation or excessive logging
  • Use msgspec over json/pydantic for serialization
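To illustrate the dict-dispatch rule (handler and event names below are hypothetical, not from the real codebase), a module-level table built once at import time replaces a per-call `match` statement:

```python
# Each handler is a plain function; the dispatch table maps an event kind
# to its handler so the hot loop pays only one dict lookup per event.

def _on_chunk(payload):
    return ("chunk", payload)

def _on_done(payload):
    return ("done", payload)

def _on_error(payload):
    return ("error", payload)

# Built once at import time, not inside the hot loop.
_DISPATCH = {
    "chunk": _on_chunk,
    "done": _on_done,
    "error": _on_error,
}

def handle_event(kind, payload):
    # Unknown kinds raise KeyError immediately, which surfaces bugs
    # faster than a silent fall-through branch would.
    return _DISPATCH[kind](payload)
```

A `match` statement re-evaluates its patterns on every call, while the dict lookup is a single hash probe, which is why this pattern is preferred in the latency-critical paths.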

Testing

Running Tests

# All tests (excludes slow/performance)
pytest

# Unit tests only
pytest -m unit

# Integration tests
pytest -m integration

# Single file
pytest -xvs tests/unit/path/to/test_file.py

# With coverage
pytest --cov=src --cov-report=html

Test Markers

Every test function must have a marker:

@pytest.mark.unit
@pytest.mark.asyncio  # strict mode is configured globally in pyproject.toml
async def test_something():
    ...

Available markers: unit, integration, slow, performance, run_explicitly

Coverage

Target >90% coverage for all new code. Use existing fixtures from tests/conftest.py (e.g., mock_http_echo_server, mock_http_oracle_server, dummy_dataset) rather than mocking.

Submitting Changes

Branch Naming

feat/short-description
fix/short-description
docs/short-description

Pull Request Process

  1. Create a focused PR — one logical change per PR
  2. Fill out the PR template — describe what, why, and how to test
  3. Ensure CI passes: run pre-commit run --all-files and pytest -m unit locally before pushing
  4. Link related issues — use Closes #123 or Relates to #123
  5. Expect review within 2-3 business days — reviewers are auto-assigned based on changed files

What We Look For in Reviews

  • Does it follow existing patterns in the codebase?
  • Are tests included and meaningful (not mock-heavy)?
  • Is it focused — no unrelated refactoring or over-engineering?
  • Does it avoid adding unnecessary dependencies?

After Review

  • Address feedback with new commits (don't force-push during review)
  • Once approved, a maintainer will merge

Issue Guidelines

Before Filing

  1. Search existing issues for duplicates
  2. Use the appropriate issue template
  3. Provide enough detail to reproduce or understand the request

Issue Lifecycle

New issues are auto-added to our project board and flow through: Inbox → Triage → Ready → In Progress → In Review → Done

Priority Levels

Priority      Meaning
ShowStopper   Drop everything; critical blocker
P0            Blocks release or users
P1            Must address this cycle
P2            Address within the quarter
P3            Backlog; nice to have

MLCommons CLA

All contributors must sign the MLCommons Contributor License Agreement. A CLA bot will check your PR automatically.

To sign up:

  1. Visit the MLCommons Subscription form
  2. Submit your GitHub username
  3. The CLA bot will verify on your next PR

Pull requests from non-members are welcome — you'll be prompted to sign the CLA during the PR process.

Questions?

File an issue. We aim to respond within a few business days.