
Contributing to MLPerf Inference Endpoints

Welcome! We're glad you're interested in contributing. This project is part of MLCommons and aims to build a high-performance benchmarking tool for LLM inference endpoints targeting 50k+ QPS.

Table of Contents

  • Development Setup
  • Code Style and Conventions
  • Testing
  • Submitting Changes
  • Issue Guidelines
  • MLCommons CLA
  • Questions?

Development Setup

Prerequisites

  • Python 3.12 or later
  • Git
  • A Unix-like OS (Linux or macOS)

Getting Started

# Fork and clone
git clone https://github.com/<your-username>/endpoints.git
cd endpoints

# Create virtual environment
python3.12 -m venv venv
source venv/bin/activate

# Install with dev and test extras
pip install -e ".[dev,test]"

# Install pre-commit hooks
pre-commit install

# Verify your setup
pytest -m unit -x --timeout=60

Local Testing with Echo Server

# Start a local echo server
python -m inference_endpoint.testing.echo_server --port 8765

# Run a quick probe
inference-endpoint probe --endpoints http://localhost:8765 --model test-model

Code Style and Conventions

Formatting and Linting

We use ruff for formatting and linting, and mypy for type checking. Pre-commit hooks enforce these automatically.

# Run all checks manually
pre-commit run --all-files

Key Conventions

  • Line length: 88 characters
  • Quotes: Double quotes
  • License headers: Required on all Python files (auto-added by pre-commit)
  • Commit messages: Conventional Commits (feat:, fix:, docs:, test:, chore:, perf:)
  • Comments: Only where the why isn't obvious from the code. No over-documenting.

Serialization

  • Hot-path data (Query, QueryResult, StreamChunk): msgspec.Struct — encode/decode with msgspec.json, not stdlib json
  • Configuration: pydantic.BaseModel for validation
  • Do not use dataclass where neighboring types use msgspec

Performance-Sensitive Code

Code in load_generator/, endpoint_client/worker.py, and async_utils/transport/ is latency-critical. In these paths:

  • No match statements — use dict dispatch
  • Minimize async suspends
  • No pydantic validation or excessive logging
  • Use msgspec over json/pydantic for serialization
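To illustrate the dict-dispatch rule (handler and event names below are hypothetical, not from the real codebase), a module-level table built once at import time replaces a per-call `match` statement:

```python
# Each handler is a plain function; the dispatch table maps an event kind
# to its handler so the hot loop pays only one dict lookup per event.

def _on_chunk(payload):
    return ("chunk", payload)

def _on_done(payload):
    return ("done", payload)

def _on_error(payload):
    return ("error", payload)

# Built once at import time, not inside the hot loop.
_DISPATCH = {
    "chunk": _on_chunk,
    "done": _on_done,
    "error": _on_error,
}

def handle_event(kind, payload):
    # Unknown kinds raise KeyError immediately, which surfaces bugs
    # faster than a silent fall-through branch would.
    return _DISPATCH[kind](payload)
```

A `match` statement re-evaluates its patterns on every call, while the dict lookup is a single hash probe, which is why this pattern is preferred in the latency-critical paths.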

Testing

Running Tests

# All tests (excludes slow/performance)
pytest

# Unit tests only
pytest -m unit

# Integration tests
pytest -m integration

# Single file
pytest -xvs tests/unit/path/to/test_file.py

# With coverage
pytest --cov=src --cov-report=html

Test Markers

Every test function must have a marker:

@pytest.mark.unit
@pytest.mark.asyncio  # strict mode is configured globally in pyproject.toml
async def test_something():
    ...

Available markers: unit, integration, slow, performance, run_explicitly

Coverage

Target >90% coverage for all new code. Use existing fixtures from tests/conftest.py (e.g., mock_http_echo_server, mock_http_oracle_server, dummy_dataset) rather than mocking.

Submitting Changes

Branch Naming

feat/short-description
fix/short-description
docs/short-description

Pull Request Process

  1. Create a focused PR — one logical change per PR
  2. Fill out the PR template — describe what, why, and how to test
  3. Ensure CI passes: run pre-commit run --all-files and pytest -m unit locally before pushing
  4. Link related issues — use Closes #123 or Relates to #123
  5. Expect review within 2-3 business days — reviewers are auto-assigned based on changed files

What We Look For in Reviews

  • Does it follow existing patterns in the codebase?
  • Are tests included and meaningful (not mock-heavy)?
  • Is it focused — no unrelated refactoring or over-engineering?
  • Does it avoid adding unnecessary dependencies?

After Review

  • Address feedback with new commits (don't force-push during review)
  • Once approved, a maintainer will merge

Issue Guidelines

Before Filing

  1. Search existing issues for duplicates
  2. Use the appropriate issue template
  3. Provide enough detail to reproduce or understand the request

Issue Lifecycle

New issues are auto-added to our project board and flow through: Inbox → Triage → Ready → In Progress → In Review → Done

Priority Levels

Priority      Meaning
ShowStopper   Drop everything; critical blocker
P0            Blocks release or users
P1            Must address this cycle
P2            Address within the quarter
P3            Backlog; nice to have

MLCommons CLA

All contributors must sign the MLCommons Contributor License Agreement. A CLA bot will check your PR automatically.

To sign up:

  1. Visit the MLCommons Subscription form
  2. Submit your GitHub username
  3. The CLA bot will verify on your next PR

Pull requests from non-members are welcome — you'll be prompted to sign the CLA during the PR process.

Questions?

File an issue. We aim to respond within a few business days.