Welcome! We're glad you're interested in contributing. This project is part of MLCommons and aims to build a high-performance benchmarking tool for LLM inference endpoints targeting 50k+ QPS.
- Ways to Contribute
- Development Setup
- Code Style and Conventions
- Testing
- Submitting Changes
- Issue Guidelines
- MLCommons CLA
## Ways to Contribute

- Report bugs — use the Bug Report template
- Request features — use the Feature Request template
- Report performance issues — use the Performance Issue template
- Request dataset support — use the Dataset Integration template
- Improve documentation — fix typos, clarify guides, add examples
- Pick up an issue — look for the `good first issue` or `help wanted` labels
- Review PRs — thoughtful reviews are as valuable as code
## Development Setup

Prerequisites:

- Python 3.12+ (3.12 recommended)
- Git
- A Unix-like OS (Linux or macOS)
```bash
# Fork and clone
git clone https://github.com/<your-username>/endpoints.git
cd endpoints

# Create virtual environment
python3.12 -m venv venv
source venv/bin/activate

# Install with dev and test extras
pip install -e ".[dev,test]"

# Install pre-commit hooks
pre-commit install

# Verify your setup
pytest -m unit -x --timeout=60
```

Then smoke-test against a local echo server:

```bash
# Start a local echo server
python -m inference_endpoint.testing.echo_server --port 8765

# Run a quick probe
inference-endpoint probe --endpoints http://localhost:8765 --model test-model
```

## Code Style and Conventions

We use `ruff` for formatting and linting, and `mypy` for type checking. Pre-commit hooks enforce these automatically.
```bash
# Run all checks manually
pre-commit run --all-files
```

- Line length: 88 characters
- Quotes: Double quotes
- License headers: Required on all Python files (auto-added by pre-commit)
- Commit messages: Conventional Commits — `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:`
- Comments: Only where the why isn't obvious from the code. No over-documenting.
- Hot-path data (`Query`, `QueryResult`, `StreamChunk`): `msgspec.Struct` — encode/decode with `msgspec.json`, not stdlib `json`
- Configuration: `pydantic.BaseModel` for validation
- Do not use `dataclass` where neighboring types use `msgspec`
Code in `load_generator/`, `endpoint_client/worker.py`, and `async_utils/transport/`
is latency-critical. In these paths:

- No `match` statements — use dict dispatch
- Minimize async suspends
- No pydantic validation or excessive logging
- Use `msgspec` over `json`/pydantic for serialization
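The dict-dispatch rule can be sketched as follows (the handler and event names here are hypothetical):

```python
# Instead of a match statement on the event kind, build the dispatch
# table once and do a single dict lookup per event.

def _on_chunk(payload: str) -> str:
    return f"chunk:{payload}"

def _on_done(payload: str) -> str:
    return f"done:{payload}"

_DISPATCH = {
    "chunk": _on_chunk,
    "done": _on_done,
}

def handle(kind: str, payload: str) -> str:
    return _DISPATCH[kind](payload)

result = handle("chunk", "abc")  # → "chunk:abc"
```

Building the table at import time turns per-event dispatch into one hash lookup instead of `match`'s sequential pattern tests.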
## Testing

```bash
# All tests (excludes slow/performance)
pytest

# Unit tests only
pytest -m unit

# Integration tests
pytest -m integration

# Single file
pytest -xvs tests/unit/path/to/test_file.py

# With coverage
pytest --cov=src --cov-report=html
```

Every test function must have a marker:
```python
@pytest.mark.unit
@pytest.mark.asyncio  # strict mode is configured globally in pyproject.toml
async def test_something():
    ...
```

Available markers: `unit`, `integration`, `slow`, `performance`, `run_explicitly`
Target >90% coverage for all new code. Use existing fixtures from
`tests/conftest.py` (e.g., `mock_http_echo_server`, `mock_http_oracle_server`,
`dummy_dataset`) rather than mocking.
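A minimal illustration of the marker-plus-fixture pattern (this local `dummy_dataset` stands in for the shared fixture of the same name in `tests/conftest.py`; its shape is assumed):

```python
import pytest

_SAMPLE = [{"prompt": "hello"}, {"prompt": "world"}]

@pytest.fixture
def dummy_dataset():
    # Stand-in for the shared conftest.py fixture; the real one may differ.
    return list(_SAMPLE)

@pytest.mark.unit
def test_dataset_prompts(dummy_dataset):
    prompts = [row["prompt"] for row in dummy_dataset]
    assert prompts == ["hello", "world"]
```

Taking the fixture as an argument keeps the test free of ad-hoc mocks, and the `unit` marker lets `pytest -m unit` pick it up.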
## Submitting Changes

Use a descriptive branch name:

```
feat/short-description
fix/short-description
docs/short-description
```
- Create a focused PR — one logical change per PR
- Fill out the PR template — describe what, why, and how to test
- Ensure CI passes — run `pre-commit run --all-files` and `pytest -m unit` locally before pushing
- Link related issues — use `Closes #123` or `Relates to #123`
- Expect review within 2-3 business days — reviewers are auto-assigned based on changed files
Reviewers will look for:

- Does it follow existing patterns in the codebase?
- Are tests included and meaningful (not mock-heavy)?
- Is it focused — no unrelated refactoring or over-engineering?
- Does it avoid adding unnecessary dependencies?
- Address feedback with new commits (don't force-push during review)
- Once approved, a maintainer will merge
## Issue Guidelines

- Search existing issues for duplicates
- Use the appropriate issue template
- Provide enough detail to reproduce or understand the request
New issues are auto-added to our project board and flow through: Inbox → Triage → Ready → In Progress → In Review → Done
| Priority | Meaning |
|---|---|
| ShowStopper | Drop everything — critical blocker |
| P0 | Blocks release or users |
| P1 | Must address this cycle |
| P2 | Address within quarter |
| P3 | Backlog, nice to have |
## MLCommons CLA

All contributors must sign the MLCommons Contributor License Agreement. A CLA bot will check your PR automatically.
To sign up:
- Visit the MLCommons Subscription form
- Submit your GitHub username
- The CLA bot will verify on your next PR
Pull requests from non-members are welcome — you'll be prompted to sign the CLA during the PR process.
Questions? File an issue — we aim to respond within a few business days.