From 060e9f612db01bab87bdeb618242851f258fdc44 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 13:39:09 -0700 Subject: [PATCH 01/14] docs: add project management design spec for labels, board, templates, and CONTRIBUTING.md Co-Authored-By: Claude Opus 4.6 (1M context) --- .../2026-04-07-project-management-design.md | 586 ++++++++++++++++++ 1 file changed, 586 insertions(+) create mode 100644 docs/superpowers/specs/2026-04-07-project-management-design.md diff --git a/docs/superpowers/specs/2026-04-07-project-management-design.md b/docs/superpowers/specs/2026-04-07-project-management-design.md new file mode 100644 index 00000000..b2d37156 --- /dev/null +++ b/docs/superpowers/specs/2026-04-07-project-management-design.md @@ -0,0 +1,586 @@ +# Project Management Design: Labels, Board, Templates, and CONTRIBUTING.md + +**Date:** 2026-04-07 +**Author:** Zhihan Jiang (nvzhihanj) +**Status:** Draft + +## Context + +The mlcommons/endpoints repository has 57 open issues with inconsistent labeling, +no issue templates, a minimal CONTRIBUTING.md, and no active project board. The +project has 3-4 core contributors (NVIDIA) and growing community participation +(Intel, MLCommons, external). The goal is to establish project management +infrastructure that serves the **broader MLCommons community** as the primary +audience — making it easy for external contributors to self-serve, pick up issues, +and understand the project roadmap. + +### Research Basis + +This design is informed by analysis of label taxonomies and project management +practices from: Kubernetes, PyTorch, vLLM, Ray, SGLang, MLCommons/inference, +and guidance from opensource.guide, GitHub Docs, CNCF, and Linux Foundation. + +### Phased Approach + +- **Phase 1 (now):** Labels, board, templates, CONTRIBUTING.md, issue migration +- **Phase 2 (when issue volume > 100 or contributors > 10):** Size/effort labels, + stale bot automation, iteration/sprint fields, disable blank issues + +--- + +## 1. 
Label Taxonomy (~28 labels) + +### Design Principles + +- **Prefixed naming** (`type:`, `priority:`, `area:`, `status:`) for filterability + and visual grouping — inspired by Ray and PyTorch +- **Coarse area labels** (7) grouping related modules — start coarse, split later +- **Severity-gradient colors** for priority — hotter = more urgent +- **Single color family** per label category for visual coherence + +### Type Labels + +| Label | Color | Description | +|-------|-------|-------------| +| `type: bug` | `#d73a4a` | Something isn't working | +| `type: feature` | `#a2eeef` | New feature or capability | +| `type: enhancement` | `#bfd4f2` | Improvement to existing functionality | +| `type: performance` | `#3ddd26` | Performance regression or improvement | +| `type: documentation` | `#0075ca` | Documentation only | +| `type: question` | `#d876e3` | Usage question or clarification | +| `type: RFC` | `#76fde7` | Request for comments / design proposal | +| `type: chore` | `#ededed` | Maintenance, deps, CI, tooling | + +### Priority Labels + +| Label | Color | Description | +|-------|-------|-------------| +| `priority: ShowStopper` | `#000000` | Drop everything — critical blocker, all hands on deck | +| `priority: P0` | `#b60205` | Critical — blocks release or users | +| `priority: P1` | `#d93f0b` | High — must address this cycle | +| `priority: P2` | `#fbca04` | Medium — address within quarter | +| `priority: P3` | `#0e8a16` | Low — backlog, nice to have | + +### Area Labels + +| Label | Color | Description | +|-------|-------|-------------| +| `area: core-engine` | `#c5def5` | Load generator, scheduler, async utils | +| `area: client` | `#c5def5` | Endpoint client, HTTP, transport, ZMQ | +| `area: metrics` | `#c5def5` | Event recorder, metrics reporter, reporting | +| `area: dataset` | `#c5def5` | Dataset manager, formats, predefined datasets | +| `area: config-cli` | `#c5def5` | Config schema, CLI commands, YAML | +| `area: evaluation` | `#c5def5` | Accuracy 
evaluation, scoring, extractors | +| `area: adapters` | `#c5def5` | OpenAI, SGLang protocol adapters | + +### Status Labels + +| Label | Color | Description | +|-------|-------|-------------| +| `status: needs-triage` | `#e99695` | New issue, awaiting review | +| `status: needs-info` | `#f9d0c4` | Awaiting more details from reporter | +| `status: blocked` | `#b60205` | Blocked on external dependency or decision | + +### Community Labels (keep existing) + +| Label | Color | Description | +|-------|-------|-------------| +| `good first issue` | `#7057ff` | Good for newcomers | +| `help wanted` | `#008672` | Extra attention needed | + +### Other (keep existing) + +| Label | Color | Description | +|-------|-------|-------------| +| `mlcommons` | `#e0703c` | MLCommons ruleset/submission integration | +| `dependencies` | `#9083cd` | Dependency updates | +| `security` | `#b60205` | Security vulnerability or hardening | +| `duplicate` | `#cfd3d7` | Duplicate issue | +| `invalid` | `#e4e669` | Not valid | +| `wontfix` | `#ffffff` | Will not be worked on | + +### Labels to Remove + +These are replaced by the prefixed equivalents above: + +| Old Label | Replaced By | +|-----------|-------------| +| `bug` | `type: bug` | +| `feature` | `type: feature` | +| `enhancement` | `type: enhancement` | +| `documentation` | `type: documentation` | +| `performance` | `type: performance` | +| `question` | `type: question` | +| `P0` | `priority: P0` | +| `P1` | `priority: P1` | +| `P2` | `priority: P2` | +| `ShowStopper` | `priority: ShowStopper` | +| `testing` | `type: chore` (context-dependent) | +| `accuracy` | `area: evaluation` | +| `dataset` | `area: dataset` | +| `Roadmap` | `type: RFC` | +| `blocked` | `status: blocked` | +| `rules` | `mlcommons` | +| `MLCommons` | `mlcommons` (lowercase) | + +--- + +## 2. 
Project Board #57 Structure + +### Status Columns + +``` +Inbox → Triage → Ready → In Progress → In Review → Done +``` + +| Column | Purpose | Entry Criteria | +|--------|---------|----------------| +| **Inbox** | New issues land here automatically | Auto-added when issue opened | +| **Triage** | Being evaluated for priority/area/assignee | Someone picked it up to review | +| **Ready** | Triaged, prioritized, ready to work on | Has priority + area labels | +| **In Progress** | Actively being worked on | Assigned, PR may be in flight | +| **In Review** | PR submitted, awaiting review | Linked PR exists | +| **Done** | Merged/resolved/closed | Auto-set when issue closed | + +### Custom Fields + +| Field | Type | Values | +|-------|------|--------| +| Priority | Single select | ShowStopper, P0, P1, P2, P3 | +| Area | Single select | core-engine, client, metrics, dataset, config-cli, evaluation, adapters, mlcommons | +| Target Release | Single select | v0.5.0, v1.0.0 (add as needed) | + +### Views (4) + +**1. Kanban (default)** +- Layout: Board +- Columns: Status field +- Group by: Priority (ShowStopper at top → P3 at bottom) +- Filter: status ≠ Done + +**2. Priority Table** +- Layout: Table +- Sort: Priority ascending (ShowStopper first), then updated date descending +- Columns: Title, Priority, Area, Status, Assignee, Target Release +- Filter: status ≠ Done + +**3. By Assignee** +- Layout: Table +- Group by: Assignee +- Sort: Priority ascending within each group +- Columns: Title, Priority, Area, Status +- Filter: status ≠ Done + +**4. 
Stale Issues** +- Layout: Table +- Sort: Updated date ascending (oldest first) +- Columns: Title, Priority, Area, Status, Assignee, Last Updated +- Filter: status ≠ Done AND last updated more than 30 days ago + +### Automations + +| Trigger | Action | +|---------|--------| +| Issue added to project | Set status → Inbox | +| Issue closed | Set status → Done | +| PR merged closing issue | Set status → Done | +| Item in Done 14+ days | Auto-archive | + +--- + +## 3. Issue Templates + +### Files + +- `.github/ISSUE_TEMPLATE/100-bug-report.yml` — Bug Report +- `.github/ISSUE_TEMPLATE/200-feature-request.yml` — Feature Request +- `.github/ISSUE_TEMPLATE/300-performance.yml` — Performance Issue +- `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` — Dataset Integration +- `.github/ISSUE_TEMPLATE/config.yml` — Template chooser config + +### 100-bug-report.yml + +```yaml +name: Bug Report +description: Report a bug or unexpected behavior +title: "[Bug]: " +labels: ["type: bug", "status: needs-triage"] +body: + - type: textarea + id: description + attributes: + label: Bug Description + description: What happened vs. what you expected + placeholder: "When I run X, I expected Y but got Z" + validations: + required: true + - type: textarea + id: reproduction + attributes: + label: Steps to Reproduce + value: | + 1. + 2. + 3. 
+ validations: + required: true + - type: textarea + id: environment + attributes: + label: Environment + description: OS, Python version, package version + placeholder: "OS: Ubuntu 22.04, Python 3.12, inference-endpoint v0.1.0" + validations: + required: true + - type: textarea + id: logs + attributes: + label: Relevant Logs + render: shell + - type: checkboxes + id: checklist + attributes: + label: Before submitting + options: + - label: I searched existing issues and found no duplicates + required: true +``` + +### 200-feature-request.yml + +```yaml +name: Feature Request +description: Suggest a new feature or enhancement +title: "[Feature]: " +labels: ["type: feature", "status: needs-triage"] +body: + - type: textarea + id: motivation + attributes: + label: Motivation + description: What problem does this solve? Why do you need it? + validations: + required: true + - type: textarea + id: proposal + attributes: + label: Proposed Solution + description: How should this work? Include API sketches if relevant. + validations: + required: true + - type: textarea + id: alternatives + attributes: + label: Alternatives Considered + - type: textarea + id: context + attributes: + label: Additional Context +``` + +### 300-performance.yml + +```yaml +name: Performance Issue +description: Report a performance regression or improvement opportunity +title: "[Perf]: " +labels: ["type: performance", "status: needs-triage"] +body: + - type: textarea + id: description + attributes: + label: Description + description: What performance issue did you observe? + placeholder: "QPS dropped from X to Y after upgrading to version Z" + validations: + required: true + - type: textarea + id: benchmark + attributes: + label: Benchmark Command + description: The exact command you ran + render: shell + validations: + required: true + - type: textarea + id: results + attributes: + label: Results + description: Expected vs actual numbers (QPS, latency, TTFT, TPOT, etc.) 
+ placeholder: | + Expected: ~5000 QPS, p99 latency < 200ms + Actual: ~2000 QPS, p99 latency 800ms + validations: + required: true + - type: textarea + id: environment + attributes: + label: Environment + description: Hardware, OS, Python version, endpoint server details + placeholder: | + Hardware: 8x A100 80GB + OS: Ubuntu 22.04 + Python: 3.12 + Server: vLLM 0.6.0, Llama-3-70B + Workers: 4 + validations: + required: true + - type: textarea + id: profiling + attributes: + label: Profiling Data (optional) + description: Any profiling output, flame graphs, or bottleneck analysis + render: shell + - type: checkboxes + id: checklist + attributes: + label: Before submitting + options: + - label: I searched existing issues and found no duplicates + required: true + - label: I ran with default settings before tuning + required: false +``` + +### 400-dataset-integration.yml + +```yaml +name: Dataset Integration +description: Request support for a new dataset or evaluation benchmark +title: "[Dataset]: " +labels: ["type: feature", "area: dataset", "status: needs-triage"] +body: + - type: textarea + id: dataset + attributes: + label: Dataset Information + description: Name, URL, and brief description + placeholder: | + Name: MATH-500 + URL: https://huggingface.co/datasets/... + Description: 500 competition math problems for testing reasoning + validations: + required: true + - type: dropdown + id: format + attributes: + label: Dataset Format + options: + - JSONL + - HuggingFace Dataset + - CSV + - JSON + - Parquet + - Other + validations: + required: true + - type: textarea + id: evaluation + attributes: + label: Evaluation Method + description: How should responses be scored? 
+ placeholder: "Exact match after extracting boxed answer, or pass@1 for code" + validations: + required: true + - type: textarea + id: samples + attributes: + label: Scale + description: Number of samples, expected prompt/response lengths + placeholder: "500 samples, avg prompt ~200 tokens, avg response ~500 tokens" + - type: textarea + id: context + attributes: + label: Additional Context + description: Related benchmarks, papers, or prior art +``` + +### config.yml + +```yaml +blank_issues_enabled: true +contact_links: + - name: Questions & Discussion + url: https://github.com/mlcommons/endpoints/discussions + about: Ask questions and discuss ideas before filing an issue +``` + +--- + +## 4. CONTRIBUTING.md + +Replace the existing minimal CONTRIBUTING.md with an expanded version (~250 lines) +covering: + +1. **Ways to Contribute** — links to all 4 issue templates, plus docs, PR reviews, + `good first issue` and `help wanted` labels +2. **Development Setup** — prerequisites, fork/clone, venv, `pip install -e ".[dev,test]"`, + pre-commit install, local echo server testing +3. **Code Style and Conventions** — ruff, mypy, line length 88, double quotes, + conventional commits, license headers, serialization conventions + (msgspec vs pydantic), performance-sensitive code guidelines +4. **Testing** — pytest commands, markers (`unit`, `integration`, `slow`, + `performance`), `@pytest.mark.asyncio(mode="strict")`, >90% coverage target, + use real fixtures over mocks +5. **Submitting Changes** — branch naming (`feat/`, `fix/`, `docs/`), PR template, + CI checks, review expectations (2-3 business days), review criteria +6. **Issue Guidelines** — search first, use templates, issue lifecycle + (Inbox → Triage → Ready → In Progress → In Review → Done), priority levels table +7. **MLCommons CLA** — existing CLA requirements preserved + +--- + +## 5. 
Issue Migration Plan + +### Duplicate Resolution + +Close duplicates with a comment explaining the closure and linking to the primary +issue. Copy any unique context from the duplicate into a comment on the primary +issue so no information is lost. + +| Close | Primary | Reason | +|-------|---------|--------| +| #205 "fully async benchmark" | #255 "Make Loadgen Async" | Same goal, #255 is cleaner | +| #170 "warmup with random dataset" | #86 "Warmup runs" | Subset of #86 | +| #226 "Initial multi-turn enabling" | #232 "multi-turn implementation" | Same feature | +| #29 "submission checker for 6.0" | #79 "submission checker compat mode" | #29 is version-specific, superseded | +| #207 "speedup tokenizer report" | #208 "optimize report generation" | #207 is a specific approach to #208 | +| #83 "Q1 Roadmap" | #223 "Phase 2 Roadmap" | Superseded | + +**Evaluation:** #73 "random dataset support" — keep if random dataset has value +beyond warmup use case; otherwise close as duplicate of #86. + +### Label Reassignment + +All 57 open issues are reassigned from old labels to the new prefixed taxonomy. +Full mapping follows, organized by priority tier. 
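Mechanically, each row in the mapping corresponds to a single REST call: the "set labels for an issue" endpoint (`PUT /repos/{owner}/{repo}/issues/{number}/labels`) replaces the issue's entire label set, so legacy labels drop off in the same operation. A minimal sketch, using issue #86 and its target labels from the mapping below; the `DRY_RUN` guard is illustrative, not part of the plan:

```shell
# Build the JSON payload for one issue's new label set, then (optionally) apply it.
# PUT replaces all existing labels, so old labels are removed in the same call.
REPO="mlcommons/endpoints"
ISSUE=86
PAYLOAD=$(python3 -c 'import json, sys; print(json.dumps({"labels": sys.argv[1:]}))' \
  "priority: P0" "type: feature" "area: core-engine")
echo "$PAYLOAD"

if [ "${DRY_RUN:-1}" -eq 0 ]; then   # set DRY_RUN=0 to actually relabel
  TOKEN=$(gh auth token 2>&1)
  curl -s -X PUT \
    -H "Authorization: token $TOKEN" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/$REPO/issues/$ISSUE/labels" \
    -d "$PAYLOAD"
fi
```

With `DRY_RUN` left at its default the script only prints the payload; setting `DRY_RUN=0` performs the live call.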
+ +#### ShowStopper + +| # | Title | Labels | +|---|-------|--------| +| 84 | Pareto clarification | `priority: ShowStopper`, `area: config-cli`, `mlcommons` | +| 8 | Parity with MLPerf LoadGen | `priority: ShowStopper`, `type: performance`, `area: core-engine` | +| 4 | Accuracy evaluation for LLMs | `priority: ShowStopper`, `type: feature`, `area: evaluation` | + +#### P0 + +| # | Title | Labels | +|---|-------|--------| +| 86 | Warmup runs | `priority: P0`, `type: feature`, `area: core-engine` | +| 183 | Pub/Sub event recorder | `priority: P0`, `type: feature`, `area: metrics` | +| 138 | CI stress test upper bound | `priority: P0`, `type: chore`, `area: core-engine` | +| 6 | Final report structure | `priority: P0`, `type: feature`, `area: metrics` | +| 5 | Submission ruleset + config | `priority: P0`, `type: feature`, `area: config-cli`, `mlcommons` | + +#### P1 + +| # | Title | Labels | +|---|-------|--------| +| 9 | Roofline analysis | `priority: P1`, `type: performance`, `area: core-engine` | +| 255 | Make Loadgen Async | `priority: P1`, `type: feature`, `area: core-engine` | +| 269 | Low concurrency timeouts | `priority: P1`, `type: bug`, `area: client` | +| 237 | CLI fix --load-pattern + --target-qps | `priority: P1`, `type: bug`, `area: config-cli` | +| 219 | target_qps hardcoded in Offline | `priority: P1`, `type: bug`, `area: config-cli` | +| 221 | RuntimeSettings non-reproducible | `priority: P1`, `type: bug`, `area: config-cli` | +| 202 | max_throughput connection timeouts | `priority: P1`, `type: bug`, `area: client` | +| 199 | Perf discrepancy submission vs perf config | `priority: P1`, `type: bug`, `area: config-cli` | +| 217 | BURST and STEP load patterns | `priority: P1`, `type: feature`, `area: core-engine` | +| 222 | KVStore/ServiceLauncher lack tests | `priority: P1`, `type: chore`, `area: core-engine` | +| 220 | SGLang adapter tests skipped | `priority: P1`, `type: chore`, `area: adapters` | +| 182 | Text vs token perf on TRTLLM | `priority: 
P1`, `type: performance`, `area: metrics` | +| 179 | Humanity's Last Exam | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | +| 178 | Healthbench integration | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | +| 177 | MATH500 dataset | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | +| 176 | MMLU/MMLU-Pro | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | +| 173 | Investigate mlcr failures | `priority: P1`, `type: bug`, `mlcommons` | +| 113 | DeepSeek | `priority: P1`, `type: feature` | +| 210 | Wan2.2-T2V support | `priority: P1`, `type: feature` | +| 10 | System bottleneck tests | `priority: P1`, `type: performance`, `area: core-engine` | +| 7 | Runtime visualization | `priority: P1`, `type: feature`, `area: metrics` | + +#### P2 + +| # | Title | Labels | +|---|-------|--------| +| 268 | Phase 2 model selection | `priority: P2`, `type: feature` | +| 254 | Handling failed requests | `priority: P2`, `type: feature`, `area: client` | +| 232 | Multi-turn implementation | `priority: P2`, `type: feature`, `area: dataset` | +| 224 | Multiple perf configs | `priority: P2`, `type: feature`, `area: config-cli` | +| 208 | Optimize report generation | `priority: P2`, `type: performance`, `area: metrics` | +| 158 | SGLang adapter + OpenAI compat | `priority: P2`, `type: feature`, `area: adapters` | +| 125 | Multi-concurrency scans | `priority: P2`, `type: feature`, `area: core-engine` | +| 115 | Clarify default metric | `priority: P2`, `type: enhancement`, `area: config-cli` | +| 79 | Submission checker compat mode | `priority: P2`, `type: feature`, `mlcommons` | +| 73 | Random dataset support | `priority: P2`, `type: feature`, `area: dataset` | +| 68 | Official model name mapping | `priority: P2`, `type: feature`, `area: config-cli`, `mlcommons` | +| 58 | Config-template mapping | `priority: P2`, `type: feature`, `area: config-cli`, `mlcommons` | +| 213 | PostGres dup element | 
`priority: P2`, `type: bug`, `mlcommons` | +| 133 | llama.cpp incompatibility | `priority: P2`, `type: bug`, `area: client` | +| 174 | Better error logging mlcr | `priority: P2`, `type: enhancement`, `mlcommons` | +| 229 | Endpoints test environment | `priority: P2`, `type: chore` | +| 228 | Endpoints Vision document | `priority: P2`, `type: documentation` | +| 227 | DB and Object Store elements | `priority: P2`, `type: feature` | +| 212 | UBI Storage layer | `priority: P2`, `type: feature` | + +#### P3 + +| # | Title | Labels | +|---|-------|--------| +| 99 | Local mode errors | `priority: P3`, `type: bug`, `good first issue` | +| 50 | LlaMa3-405b support | `priority: P3`, `type: feature` | +| 204 | Documentation cleanup | `priority: P3`, `type: documentation` | +| 190 | Skills, design docs, tooling | `priority: P3`, `type: chore` | +| 181 | Sweep qwen scripts | `priority: P3`, `type: feature` | + +#### Other (no priority) + +| # | Title | Labels | +|---|-------|--------| +| 223 | Phase 2 Roadmap | `type: RFC` | +| 267 | Bump transformers | `type: chore`, `dependencies`, `security` | + +### Q2 Board Population + +**Add to board #57 (~40 issues):** All ShowStopper, P0, P1, and P2 issues. +Initial status: **Triage** (existing issues need priority confirmation from team). + +**Not on Q2 board (~5 issues):** P3 issues (#99, #50, #204, #190, #181) and +dependabot (#267). + +### Milestones + +Create milestones as releases are planned: +- `v0.5.0` — first milestone, assign issues as release scope is defined +- `v1.0.0` — future + +--- + +## 6. Phase 2 (Future) + +Trigger when issue volume > 100 or contributors > 10: + +- Add `size: S`, `size: M`, `size: L`, `size: XL` effort labels +- Disable blank issues in `config.yml` +- Add stale bot (apply `status: stale` after 90 days, close after 30 more) +- Add iteration/sprint fields to board if team adopts time-boxed cycles +- Split coarse area labels if any accumulates > 20 issues + +--- + +## 7. 
Migration Procedure + +Order of operations for the migration: + +1. **Create new labels** — all `type:`, `priority:`, `area:`, `status:` labels +2. **Relabel existing issues** — apply new labels per the mapping above +3. **Remove old labels from issues** — strip legacy labels +4. **Close duplicates** — comment with explanation + link to primary, copy unique + context to primary issue +5. **Delete old labels** — remove legacy labels from the repository +6. **Add issues to board #57** — all ShowStopper through P2 +7. **Set board status** — all migrated issues start in Triage +8. **Configure board automations** — auto-add, auto-done, auto-archive +9. **Create issue templates** — add all 4 YAML templates + config.yml +10. **Update CONTRIBUTING.md** — replace with expanded version +11. **Commit and push** — templates + CONTRIBUTING.md in a single PR From d76e0100eaddd5818b0eafee355999b01afb0137 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 14:44:21 -0700 Subject: [PATCH 02/14] docs: update design spec with priority corrections, PR linkages, and dedup cleanup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Priority changes: #217→P2, #178→P2, #179→P2, #173→P2, #268→P1, #232→P0, #9→P1 Added open PR to issue linkage table. Co-Authored-By: Claude Opus 4.6 (1M context) --- .../2026-04-07-project-management-design.md | 33 +++++++++++++++---- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/docs/superpowers/specs/2026-04-07-project-management-design.md b/docs/superpowers/specs/2026-04-07-project-management-design.md index b2d37156..43e5d446 100644 --- a/docs/superpowers/specs/2026-04-07-project-management-design.md +++ b/docs/superpowers/specs/2026-04-07-project-management-design.md @@ -468,6 +468,7 @@ Full mapping follows, organized by priority tier. 
| # | Title | Labels | |---|-------|--------| | 86 | Warmup runs | `priority: P0`, `type: feature`, `area: core-engine` | +| 232 | Multi-turn implementation | `priority: P0`, `type: feature`, `area: dataset` | | 183 | Pub/Sub event recorder | `priority: P0`, `type: feature`, `area: metrics` | | 138 | CI stress test upper bound | `priority: P0`, `type: chore`, `area: core-engine` | | 6 | Final report structure | `priority: P0`, `type: feature`, `area: metrics` | @@ -485,17 +486,14 @@ Full mapping follows, organized by priority tier. | 221 | RuntimeSettings non-reproducible | `priority: P1`, `type: bug`, `area: config-cli` | | 202 | max_throughput connection timeouts | `priority: P1`, `type: bug`, `area: client` | | 199 | Perf discrepancy submission vs perf config | `priority: P1`, `type: bug`, `area: config-cli` | -| 217 | BURST and STEP load patterns | `priority: P1`, `type: feature`, `area: core-engine` | | 222 | KVStore/ServiceLauncher lack tests | `priority: P1`, `type: chore`, `area: core-engine` | | 220 | SGLang adapter tests skipped | `priority: P1`, `type: chore`, `area: adapters` | | 182 | Text vs token perf on TRTLLM | `priority: P1`, `type: performance`, `area: metrics` | -| 179 | Humanity's Last Exam | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | -| 178 | Healthbench integration | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | | 177 | MATH500 dataset | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | | 176 | MMLU/MMLU-Pro | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | -| 173 | Investigate mlcr failures | `priority: P1`, `type: bug`, `mlcommons` | | 113 | DeepSeek | `priority: P1`, `type: feature` | | 210 | Wan2.2-T2V support | `priority: P1`, `type: feature` | +| 268 | Phase 2 model selection | `priority: P1`, `type: feature` | | 10 | System bottleneck tests | `priority: P1`, `type: performance`, `area: core-engine` | | 7 | Runtime visualization | `priority: 
P1`, `type: feature`, `area: metrics` | @@ -503,9 +501,11 @@ Full mapping follows, organized by priority tier. | # | Title | Labels | |---|-------|--------| -| 268 | Phase 2 model selection | `priority: P2`, `type: feature` | | 254 | Handling failed requests | `priority: P2`, `type: feature`, `area: client` | -| 232 | Multi-turn implementation | `priority: P2`, `type: feature`, `area: dataset` | +| 217 | BURST and STEP load patterns | `priority: P2`, `type: feature`, `area: core-engine` | +| 179 | Humanity's Last Exam | `priority: P2`, `type: feature`, `area: evaluation`, `area: dataset` | +| 178 | Healthbench integration | `priority: P2`, `type: feature`, `area: evaluation`, `area: dataset` | +| 173 | Investigate mlcr failures | `priority: P2`, `type: bug`, `mlcommons` | | 224 | Multiple perf configs | `priority: P2`, `type: feature`, `area: config-cli` | | 208 | Optimize report generation | `priority: P2`, `type: performance`, `area: metrics` | | 158 | SGLang adapter + OpenAI compat | `priority: P2`, `type: feature`, `area: adapters` | @@ -583,4 +583,23 @@ Order of operations for the migration: 8. **Configure board automations** — auto-add, auto-done, auto-archive 9. **Create issue templates** — add all 4 YAML templates + config.yml 10. **Update CONTRIBUTING.md** — replace with expanded version -11. **Commit and push** — templates + CONTRIBUTING.md in a single PR +11. **Link open PRs to issues** — add "Relates to #N" comments where applicable +12. 
**Commit and push** — templates + CONTRIBUTING.md in a single PR + +### Open PR → Issue Linkages + +| PR | Linked Issue | Relationship | +|----|-------------|--------------| +| #255 Make Loadgen Async | #255 (same) | PR is the issue | +| #237 CLI fix --load-pattern + --target-qps | #237 (same) | PR is the issue | +| #226 Initial multi-turn enabling | #232 multi-turn implementation | PR implements #232; #226 issue closed as dup | +| #207 Speedup tokenizer report | #208 optimize report generation | PR implements #208; #207 issue closed as dup | +| #205 Fully async benchmark | #255 Make Loadgen Async | Duplicate PR; #205 issue closed as dup | +| #204 Documentation cleanup | #204 (same) | PR is the issue | +| #190 Skills, design docs, tooling | #190 (same) | PR is the issue | +| #181 Sweep qwen scripts | #181 (same) | PR is the issue | +| #170 Warmup with random dataset | #86 Warmup runs | PR implements #86; #170 issue closed as dup | +| #158 SGLang adapter + OpenAI compat | #158 (same) | PR is the issue | +| #125 Multi-concurrency scans | #125 (same) | PR is the issue | +| #79 Submission checker compat | #79 (same) + #29 (superseded) | PR is the issue | +| #267 Bump transformers | #267 (dependabot) | PR is the issue | From cdf2ed329f3b044719e21ca7586b6ac389494ec0 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 14:51:51 -0700 Subject: [PATCH 03/14] docs: add project management implementation plan 13-task plan covering labels, board, templates, CONTRIBUTING.md, issue migration, duplicate closure, PR linkages, and board automation setup. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .../plans/2026-04-07-project-management.md | 1092 +++++++++++++++++ 1 file changed, 1092 insertions(+) create mode 100644 docs/superpowers/plans/2026-04-07-project-management.md diff --git a/docs/superpowers/plans/2026-04-07-project-management.md b/docs/superpowers/plans/2026-04-07-project-management.md new file mode 100644 index 00000000..5dff6134 --- /dev/null +++ b/docs/superpowers/plans/2026-04-07-project-management.md @@ -0,0 +1,1092 @@ +# Project Management Infrastructure Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Set up labels, project board, issue templates, CONTRIBUTING.md, and migrate all 57 open issues for the mlcommons/endpoints GitHub repository. + +**Architecture:** All GitHub API interactions use `curl` with auth token (the `gh` CLI has TLS certificate issues in this environment). Board configuration uses the GitHub GraphQL API for Projects V2. File changes (templates, CONTRIBUTING.md) are committed locally and pushed as a PR. + +**Tech Stack:** GitHub REST API, GitHub GraphQL API, curl, bash, git + +**IMPORTANT — API access pattern:** The `gh` CLI cannot make API calls due to TLS errors. Every API call must use this pattern: +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" "https://api.github.com/..." +``` +For GraphQL: +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{"query":"..."}' +``` + +**IMPORTANT — Label names with colons:** GitHub label names containing spaces and colons must be URL-encoded in REST API paths. For example, `type: bug` becomes `type%3A%20bug` in URLs. 
When creating labels via POST body (JSON), use the literal name. + +--- + +## File Structure + +No new source code files. Changes are: + +- **Create:** `.github/ISSUE_TEMPLATE/100-bug-report.yml` +- **Create:** `.github/ISSUE_TEMPLATE/200-feature-request.yml` +- **Create:** `.github/ISSUE_TEMPLATE/300-performance.yml` +- **Create:** `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` +- **Create:** `.github/ISSUE_TEMPLATE/config.yml` +- **Modify:** `CONTRIBUTING.md` (full rewrite) + +All other changes are GitHub API operations (labels, board, issues) — no local files. + +--- + +### Task 1: Create New Labels + +Create all 23 new labels on the repository via the REST API. Existing labels that are being kept (`good first issue`, `help wanted`, `dependencies`, `security`, `duplicate`, `invalid`, `wontfix`) are untouched. The `mlcommons` label needs to be created fresh (the old `MLCommons` with capital M will be removed later). + +**Files:** None (API only) + +- [ ] **Step 1: Create all type labels** + +Run this script. 
It creates 8 type labels:
+
+```bash
+TOKEN=$(gh auth token 2>&1)
+REPO="mlcommons/endpoints"
+
+for label_json in \
+  '{"name":"type: bug","color":"d73a4a","description":"Something isn'\''t working"}' \
+  '{"name":"type: feature","color":"a2eeef","description":"New feature or capability"}' \
+  '{"name":"type: enhancement","color":"bfd4f2","description":"Improvement to existing functionality"}' \
+  '{"name":"type: performance","color":"3ddd26","description":"Performance regression or improvement"}' \
+  '{"name":"type: documentation","color":"0075ca","description":"Documentation only"}' \
+  '{"name":"type: question","color":"d876e3","description":"Usage question or clarification"}' \
+  '{"name":"type: RFC","color":"76fde7","description":"Request for comments / design proposal"}' \
+  '{"name":"type: chore","color":"ededed","description":"Maintenance, deps, CI, tooling"}'; do
+  echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')"
+  curl -s -X POST \
+    -H "Authorization: token $TOKEN" \
+    -H "Accept: application/vnd.github+json" \
+    "https://api.github.com/repos/$REPO/labels" \
+    -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(" -> " + str(d.get("name", d.get("message", "error"))))'
+done
+```
+
+Expected: 8 lines showing each label name created successfully.
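The URL-encoding rule from the access-pattern notes can be sanity-checked offline; later cleanup steps that DELETE or PATCH these labels will need the encoded name in the request path. A small sketch, where the `encode_label` helper is illustrative rather than part of the plan:

```shell
# URL-encode a label name for use in a REST API path segment.
# safe="" forces the colon and the space to be percent-encoded too.
encode_label() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$1"
}

encode_label "type: bug"      # prints type%3A%20bug
encode_label "priority: P0"   # prints priority%3A%20P0
```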
+ +- [ ] **Step 2: Create all priority labels** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +for label_json in \ + '{"name":"priority: ShowStopper","color":"000000","description":"Drop everything — critical blocker, all hands on deck"}' \ + '{"name":"priority: P0","color":"b60205","description":"Critical — blocks release or users"}' \ + '{"name":"priority: P1","color":"d93f0b","description":"High — must address this cycle"}' \ + '{"name":"priority: P2","color":"fbca04","description":"Medium — address within quarter"}' \ + '{"name":"priority: P3","color":"0e8a16","description":"Low — backlog, nice to have"}'; do + echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')" + curl -s -X POST \ + -H "Authorization: token $TOKEN" \ + -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/labels" \ + -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(f" -> {d.get(\"name\", d.get(\"message\", \"error\"))}")' +done +``` + +Expected: 5 labels created. 
+ +- [ ] **Step 3: Create all area labels** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +for label_json in \ + '{"name":"area: core-engine","color":"c5def5","description":"Load generator, scheduler, async utils"}' \ + '{"name":"area: client","color":"c5def5","description":"Endpoint client, HTTP, transport, ZMQ"}' \ + '{"name":"area: metrics","color":"c5def5","description":"Event recorder, metrics reporter, reporting"}' \ + '{"name":"area: dataset","color":"c5def5","description":"Dataset manager, formats, predefined datasets"}' \ + '{"name":"area: config-cli","color":"c5def5","description":"Config schema, CLI commands, YAML"}' \ + '{"name":"area: evaluation","color":"c5def5","description":"Accuracy evaluation, scoring, extractors"}' \ + '{"name":"area: adapters","color":"c5def5","description":"OpenAI, SGLang protocol adapters"}'; do + echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')" + curl -s -X POST \ + -H "Authorization: token $TOKEN" \ + -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/labels" \ + -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(f" -> {d.get(\"name\", d.get(\"message\", \"error\"))}")' +done +``` + +Expected: 7 labels created. 
+ +- [ ] **Step 4: Create status labels and mlcommons label** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +for label_json in \ + '{"name":"status: needs-triage","color":"e99695","description":"New issue, awaiting review"}' \ + '{"name":"status: needs-info","color":"f9d0c4","description":"Awaiting more details from reporter"}' \ + '{"name":"status: blocked","color":"b60205","description":"Blocked on external dependency or decision"}' \ + '{"name":"mlcommons","color":"e0703c","description":"MLCommons ruleset/submission integration"}'; do + echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')" + curl -s -X POST \ + -H "Authorization: token $TOKEN" \ + -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/labels" \ + -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(f" -> {d.get(\"name\", d.get(\"message\", \"error\"))}")' +done +``` + +Expected: 4 labels created (mlcommons may say "already_exists" if the old `MLCommons` case-insensitively matches — if so, update it in a later step). + +- [ ] **Step 5: Verify all new labels exist** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" \ + "https://api.github.com/repos/mlcommons/endpoints/labels?per_page=100" | \ + python3 -c " +import sys, json +labels = json.load(sys.stdin) +names = sorted([l['name'] for l in labels]) +print(f'Total labels: {len(names)}') +for n in names: + print(f' {n}') +" +``` + +Expected: All new `type:`, `priority:`, `area:`, `status:` labels present alongside existing labels. + +--- + +### Task 2: Relabel All Open Issues + +Apply new labels and remove old labels for every open issue, following the spec's mapping exactly. This is done in batches by priority tier. + +**Files:** None (API only) + +**IMPORTANT:** The GitHub `PUT /repos/{owner}/{repo}/issues/{number}/labels` endpoint **replaces** all labels on an issue. 
So each call must include the complete set of new labels for that issue. + +- [ ] **Step 1: Relabel ShowStopper issues** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# #84 - Pareto clarification +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/84/labels" \ + -d '{"labels":["priority: ShowStopper","area: config-cli","mlcommons"]}' | python3 -c 'import sys,json; print(f"#84: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #8 - Parity with MLPerf LoadGen +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/8/labels" \ + -d '{"labels":["priority: ShowStopper","type: performance","area: core-engine"]}' | python3 -c 'import sys,json; print(f"#8: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #4 - Accuracy evaluation for LLMs +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/4/labels" \ + -d '{"labels":["priority: ShowStopper","type: feature","area: evaluation"]}' | python3 -c 'import sys,json; print(f"#4: {[l[\"name\"] for l in json.load(sys.stdin)]}")' +``` + +Expected: Each issue prints its new label set. 
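Because PUT replaces the whole set, any legacy label an issue should keep (for example `good first issue` on #99 or `dependencies` on #267) has to be restated in the payload. A small offline sketch of that rule; `KEEP` mirrors the kept-labels list from Task 1, and `replacement_labels` is a hypothetical helper, not part of any script above:

```python
import json

# Legacy labels this plan keeps (from Task 1); everything else is dropped
# when an issue is relabeled.
KEEP = {"good first issue", "help wanted", "dependencies", "security",
        "duplicate", "invalid", "wontfix"}

def replacement_labels(current, new):
    """Full label set for a PUT call: the new labels, plus any kept legacy
    labels already on the issue. All other existing labels are dropped,
    because PUT replaces the entire set."""
    kept = [label for label in current if label in KEEP]
    return json.dumps({"labels": new + kept})

# e.g. issue #99, currently labelled ["bug", "good first issue"]:
print(replacement_labels(["bug", "good first issue"],
                         ["priority: P3", "type: bug"]))
# -> {"labels": ["priority: P3", "type: bug", "good first issue"]}
```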
+ +- [ ] **Step 2: Relabel P0 issues** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# #86 - Warmup runs +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/86/labels" \ + -d '{"labels":["priority: P0","type: feature","area: core-engine"]}' | python3 -c 'import sys,json; print(f"#86: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #232 - Multi-turn implementation +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/232/labels" \ + -d '{"labels":["priority: P0","type: feature","area: dataset"]}' | python3 -c 'import sys,json; print(f"#232: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #183 - Pub/Sub event recorder +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/183/labels" \ + -d '{"labels":["priority: P0","type: feature","area: metrics"]}' | python3 -c 'import sys,json; print(f"#183: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #138 - CI stress test upper bound +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/138/labels" \ + -d '{"labels":["priority: P0","type: chore","area: core-engine"]}' | python3 -c 'import sys,json; print(f"#138: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #6 - Final report structure +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/6/labels" \ + -d '{"labels":["priority: P0","type: feature","area: metrics"]}' | python3 -c 'import sys,json; print(f"#6: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #5 - Submission ruleset + config +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + 
"https://api.github.com/repos/$REPO/issues/5/labels" \ + -d '{"labels":["priority: P0","type: feature","area: config-cli","mlcommons"]}' | python3 -c 'import sys,json; print(f"#5: {[l[\"name\"] for l in json.load(sys.stdin)]}")' +``` + +Expected: 6 issues relabeled. + +- [ ] **Step 3: Relabel P1 issues** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +declare -A P1_LABELS +P1_LABELS[9]='["priority: P1","type: performance","area: core-engine"]' +P1_LABELS[255]='["priority: P1","type: feature","area: core-engine"]' +P1_LABELS[269]='["priority: P1","type: bug","area: client"]' +P1_LABELS[237]='["priority: P1","type: bug","area: config-cli"]' +P1_LABELS[219]='["priority: P1","type: bug","area: config-cli"]' +P1_LABELS[221]='["priority: P1","type: bug","area: config-cli"]' +P1_LABELS[202]='["priority: P1","type: bug","area: client"]' +P1_LABELS[199]='["priority: P1","type: bug","area: config-cli"]' +P1_LABELS[222]='["priority: P1","type: chore","area: core-engine"]' +P1_LABELS[220]='["priority: P1","type: chore","area: adapters"]' +P1_LABELS[182]='["priority: P1","type: performance","area: metrics"]' +P1_LABELS[177]='["priority: P1","type: feature","area: evaluation","area: dataset"]' +P1_LABELS[176]='["priority: P1","type: feature","area: evaluation","area: dataset"]' +P1_LABELS[113]='["priority: P1","type: feature"]' +P1_LABELS[210]='["priority: P1","type: feature"]' +P1_LABELS[268]='["priority: P1","type: feature"]' +P1_LABELS[10]='["priority: P1","type: performance","area: core-engine"]' +P1_LABELS[7]='["priority: P1","type: feature","area: metrics"]' + +for issue in "${!P1_LABELS[@]}"; do + curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/$issue/labels" \ + -d "{\"labels\":${P1_LABELS[$issue]}}" | python3 -c "import sys,json; print(f'#$issue: {[l[\"name\"] for l in json.load(sys.stdin)]}')" +done +``` + +Expected: 18 issues relabeled. 
+ +- [ ] **Step 4: Relabel P2 issues** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +declare -A P2_LABELS +P2_LABELS[254]='["priority: P2","type: feature","area: client"]' +P2_LABELS[217]='["priority: P2","type: feature","area: core-engine"]' +P2_LABELS[179]='["priority: P2","type: feature","area: evaluation","area: dataset"]' +P2_LABELS[178]='["priority: P2","type: feature","area: evaluation","area: dataset"]' +P2_LABELS[173]='["priority: P2","type: bug","mlcommons"]' +P2_LABELS[224]='["priority: P2","type: feature","area: config-cli"]' +P2_LABELS[208]='["priority: P2","type: performance","area: metrics"]' +P2_LABELS[158]='["priority: P2","type: feature","area: adapters"]' +P2_LABELS[125]='["priority: P2","type: feature","area: core-engine"]' +P2_LABELS[115]='["priority: P2","type: enhancement","area: config-cli"]' +P2_LABELS[79]='["priority: P2","type: feature","mlcommons"]' +P2_LABELS[73]='["priority: P2","type: feature","area: dataset"]' +P2_LABELS[68]='["priority: P2","type: feature","area: config-cli","mlcommons"]' +P2_LABELS[58]='["priority: P2","type: feature","area: config-cli","mlcommons"]' +P2_LABELS[213]='["priority: P2","type: bug","mlcommons"]' +P2_LABELS[133]='["priority: P2","type: bug","area: client"]' +P2_LABELS[174]='["priority: P2","type: enhancement","mlcommons"]' +P2_LABELS[229]='["priority: P2","type: chore"]' +P2_LABELS[228]='["priority: P2","type: documentation"]' +P2_LABELS[227]='["priority: P2","type: feature"]' +P2_LABELS[212]='["priority: P2","type: feature"]' + +for issue in "${!P2_LABELS[@]}"; do + curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/$issue/labels" \ + -d "{\"labels\":${P2_LABELS[$issue]}}" | python3 -c "import sys,json; print(f'#$issue: {[l[\"name\"] for l in json.load(sys.stdin)]}')" +done +``` + +Expected: 21 issues relabeled. 
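Hand-written mappings like the ones above are easy to get subtly wrong; a hedged offline sketch of a sanity check over the mapping (only a few representative entries are reproduced here; the invariant checked is exactly one `priority:` label per issue, and at most one `type:` label):

```python
# A few representative entries from the relabel steps above; the full mapping
# would list every open issue.
MAPPING = {
    84: ["priority: ShowStopper", "area: config-cli", "mlcommons"],
    86: ["priority: P0", "type: feature", "area: core-engine"],
    177: ["priority: P1", "type: feature", "area: evaluation", "area: dataset"],
    99: ["priority: P3", "type: bug", "good first issue"],
}

def check(mapping):
    """Return a list of human-readable problems; empty means the mapping holds."""
    problems = []
    for issue, labels in sorted(mapping.items()):
        n_priority = sum(label.startswith("priority: ") for label in labels)
        n_type = sum(label.startswith("type: ") for label in labels)
        if n_priority != 1:
            problems.append(f"#{issue}: expected 1 priority label, got {n_priority}")
        if n_type > 1:
            problems.append(f"#{issue}: more than one type label")
    return problems

print(check(MAPPING))  # -> []
```

Running this over the full mapping before firing the PUT calls catches copy-paste slips (a missing priority, doubled types) while they are still cheap to fix.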
+ +- [ ] **Step 5: Relabel P3 and other issues** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# P3 issues +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/99/labels" \ + -d '{"labels":["priority: P3","type: bug","good first issue"]}' | python3 -c 'import sys,json; print(f"#99: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/50/labels" \ + -d '{"labels":["priority: P3","type: feature"]}' | python3 -c 'import sys,json; print(f"#50: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/204/labels" \ + -d '{"labels":["priority: P3","type: documentation"]}' | python3 -c 'import sys,json; print(f"#204: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/190/labels" \ + -d '{"labels":["priority: P3","type: chore"]}' | python3 -c 'import sys,json; print(f"#190: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/181/labels" \ + -d '{"labels":["priority: P3","type: feature"]}' | python3 -c 'import sys,json; print(f"#181: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# Other (no priority) +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/223/labels" \ + -d '{"labels":["type: RFC"]}' | python3 -c 'import sys,json; print(f"#223: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: 
application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/267/labels" \ + -d '{"labels":["type: chore","dependencies","security"]}' | python3 -c 'import sys,json; print(f"#267: {[l[\"name\"] for l in json.load(sys.stdin)]}")' +``` + +Expected: 7 issues relabeled. + +- [ ] **Step 6: Verify relabeling — spot check 5 issues** + +```bash +TOKEN=$(gh auth token 2>&1) +for issue in 84 232 269 208 99; do + curl -s -H "Authorization: token $TOKEN" \ + "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \ + python3 -c "import sys,json; d=json.load(sys.stdin); print(f'#{d[\"number\"]} {d[\"title\"]}: {[l[\"name\"] for l in d[\"labels\"]]}')" +done +``` + +Expected: Each issue shows only its new prefixed labels. + +--- + +### Task 3: Close Duplicate Issues + +For each duplicate, first read its body to preserve unique context, then comment on the primary issue with that context, then close the duplicate with an explanation. + +**Files:** None (API only) + +- [ ] **Step 1: Close #205 as duplicate of #255 (async benchmark)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# Get #205 body for context preservation +BODY_205=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/205" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")') + +# Comment on primary #255 with context from #205 +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/255/comments" \ + -d "$(python3 -c " +import json +body = '''Context preserved from duplicate #205 (fully async benchmark): + +$BODY_205''' +print(json.dumps({'body': body})) +")" | python3 -c 'import sys,json; print(f"Commented on #255: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +# Comment on #205 explaining closure +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + 
"https://api.github.com/repos/$REPO/issues/205/comments" \ + -d '{"body":"Closing as duplicate of #255 (Make Loadgen Async). Both issues target the same goal of making the benchmark fully async. Unique context from this issue has been copied to #255."}' | python3 -c 'import sys,json; print(f"Commented on #205: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +# Close #205 +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/205" \ + -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print(f"#205 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +Expected: #205 closed, context preserved on #255. + +- [ ] **Step 2: Close #170 as duplicate of #86 (warmup)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +BODY_170=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/170" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")') + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/86/comments" \ + -d "$(python3 -c " +import json +body = '''Context preserved from duplicate #170 (warmup with random dataset): + +$BODY_170''' +print(json.dumps({'body': body})) +")" | python3 -c 'import sys,json; print(f"Commented on #86: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/170/comments" \ + -d '{"body":"Closing as duplicate of #86 (Warmup runs). This issue describes a specific warmup implementation approach (random dataset) which is a subset of #86. 
Unique context has been copied to #86."}' | python3 -c 'import sys,json; print(f"Commented on #170: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/170" \ + -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print(f"#170 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +- [ ] **Step 3: Close #226 as duplicate of #232 (multi-turn)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +BODY_226=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/226" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")') + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/232/comments" \ + -d "$(python3 -c " +import json +body = '''Context preserved from duplicate #226 (Initial multi-turn enabling): + +$BODY_226''' +print(json.dumps({'body': body})) +")" | python3 -c 'import sys,json; print(f"Commented on #232: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/226/comments" \ + -d '{"body":"Closing as duplicate of #232 (multi-turn implementation). Both track the same multi-turn feature. 
Unique context has been copied to #232."}' | python3 -c 'import sys,json; print(f"Commented on #226: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/226" \ + -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print(f"#226 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +- [ ] **Step 4: Close #29 as superseded by #79 (submission checker)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +BODY_29=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/29" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")') + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/79/comments" \ + -d "$(python3 -c " +import json +body = '''Context preserved from superseded #29 (submission checker for 6.0): + +$BODY_29''' +print(json.dumps({'body': body})) +")" | python3 -c 'import sys,json; print(f"Commented on #79: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/29/comments" \ + -d '{"body":"Closing as superseded by #79 (submission checker compatibility mode). #29 was version-specific (6.0) while #79 covers the general compatibility feature. 
Context has been preserved on #79."}' | python3 -c 'import sys,json; print(f"Commented on #29: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/29" \ + -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print(f"#29 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +- [ ] **Step 5: Close #207 as duplicate of #208 (report generation)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +BODY_207=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/207" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")') + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/208/comments" \ + -d "$(python3 -c " +import json +body = '''Context preserved from duplicate #207 (speedup tokenizer report generation): + +$BODY_207''' +print(json.dumps({'body': body})) +")" | python3 -c 'import sys,json; print(f"Commented on #208: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/207/comments" \ + -d '{"body":"Closing as duplicate of #208 (optimize report generation). #207 describes a specific approach (parallel tokenization) to #208'\''s broader goal. 
Context has been preserved on #208."}' | python3 -c 'import sys,json; print(f"Commented on #207: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/207" \ + -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print(f"#207 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +- [ ] **Step 6: Close #83 as superseded by #223 (roadmap)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/83/comments" \ + -d '{"body":"Closing as superseded by #223 (Phase 2 Roadmap). The Q1 roadmap is complete and Phase 2 planning has taken over."}' | python3 -c 'import sys,json; print(f"Commented on #83: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/83" \ + -d '{"state":"closed","state_reason":"completed"}' | python3 -c 'import sys,json; print(f"#83 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +--- + +### Task 4: Delete Legacy Labels + +Remove old labels that have been replaced. Only delete after all issues have been relabeled (Task 2 complete). + +**Files:** None (API only) + +- [ ] **Step 1: Delete all legacy labels** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# URL-encode label names: spaces→%20, colons are fine in DELETE paths +for label in "bug" "feature" "enhancement" "documentation" "performance" "question" \ + "P0" "P1" "P2" "ShowStopper" "testing" "accuracy" "dataset" "Roadmap" "blocked" \ + "rules" "MLCommons"; do + encoded=$(python3 -c "import urllib.parse; print(urllib.parse.quote('$label'))") + echo -n "Deleting '$label'... 
" + STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X DELETE \ + -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/labels/$encoded") + if [ "$STATUS" = "204" ]; then echo "deleted"; elif [ "$STATUS" = "404" ]; then echo "not found (already gone)"; else echo "status $STATUS"; fi +done +``` + +Expected: Each label prints "deleted" or "not found". No errors. + +- [ ] **Step 2: Verify final label set** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" \ + "https://api.github.com/repos/mlcommons/endpoints/labels?per_page=100" | \ + python3 -c " +import sys, json +labels = json.load(sys.stdin) +names = sorted([l['name'] for l in labels]) +print(f'Total labels: {len(names)}') +for n in names: + print(f' {n}') +" +``` + +Expected: Only new prefixed labels plus kept labels (`good first issue`, `help wanted`, `mlcommons`, `dependencies`, `security`, `duplicate`, `invalid`, `wontfix`). No old labels remain. + +--- + +### Task 5: Configure Project Board #57 + +Set up the board with status field options, custom fields, and 4 views using the GraphQL API. + +**Files:** None (API only) + +**NOTE:** The board already exists with ID `PVT_kwDOBAnwDc4BTQvY`. We need to configure its fields and views. + +- [ ] **Step 1: Get the board's field IDs** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { fields(first: 20) { nodes { ... on ProjectV2Field { id name } ... on ProjectV2SingleSelectField { id name options { id name } } ... on ProjectV2IterationField { id name } } } } } }"}' | python3 -m json.tool +``` + +Expected: JSON listing all existing fields with their IDs. Look for the "Status" field and its current options. Record the Status field ID for next steps. 
+ +- [ ] **Step 2: Update the Status field with 6 options** + +Using the Status field ID from Step 1, update its options. The GraphQL mutation is `updateProjectV2Field`. First, clear existing options and set the 6 new ones. + +**Note:** You must adapt the field ID from Step 1's output. Replace `STATUS_FIELD_ID` below with the actual ID. + +```bash +TOKEN=$(gh auth token 2>&1) + +# Get current status field ID (adapt if needed) +STATUS_FIELD_ID="" + +# Update status field options using the updateProjectV2SingleSelectField mutation +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{ + "query": "mutation { updateProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", fieldId: \"'"$STATUS_FIELD_ID"'\", singleSelectOptions: [{name: \"Inbox\", color: GRAY}, {name: \"Triage\", color: YELLOW}, {name: \"Ready\", color: BLUE}, {name: \"In Progress\", color: ORANGE}, {name: \"In Review\", color: PURPLE}, {name: \"Done\", color: GREEN}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id options { id name } } } }" + }' | python3 -m json.tool +``` + +Expected: Returns the updated Status field with 6 options. + +- [ ] **Step 3: Create Priority custom field** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{ + "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Priority\", singleSelectOptions: [{name: \"ShowStopper\", color: RED}, {name: \"P0\", color: RED}, {name: \"P1\", color: ORANGE}, {name: \"P2\", color: YELLOW}, {name: \"P3\", color: GREEN}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }" + }' | python3 -m json.tool +``` + +Expected: Priority field created with 5 options. 
+ +- [ ] **Step 4: Create Area custom field** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{ + "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Area\", singleSelectOptions: [{name: \"core-engine\", color: BLUE}, {name: \"client\", color: BLUE}, {name: \"metrics\", color: BLUE}, {name: \"dataset\", color: BLUE}, {name: \"config-cli\", color: BLUE}, {name: \"evaluation\", color: BLUE}, {name: \"adapters\", color: BLUE}, {name: \"mlcommons\", color: PURPLE}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }" + }' | python3 -m json.tool +``` + +Expected: Area field created with 8 options. + +- [ ] **Step 5: Create Target Release custom field** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{ + "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Target Release\", singleSelectOptions: [{name: \"v0.5.0\", color: GRAY}, {name: \"v1.0.0\", color: GRAY}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }" + }' | python3 -m json.tool +``` + +Expected: Target Release field created. + +- [ ] **Step 6: Verify all fields exist** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { fields(first: 20) { nodes { ... on ProjectV2Field { id name } ... 
on ProjectV2SingleSelectField { id name options { id name } } } } } } }"}' | python3 -m json.tool +``` + +Expected: Status (6 options), Priority (5 options), Area (8 options), Target Release (2 options) all present. + +--- + +### Task 6: Add Issues to Board #57 + +Add all ShowStopper through P2 issues (~40 after dedup) to the project board and set their status to Triage. + +**Files:** None (API only) + +- [ ] **Step 1: Get issue node IDs for all Q2 issues** + +We need the GraphQL node IDs for each issue to add them to the project. Batch-fetch them: + +```bash +TOKEN=$(gh auth token 2>&1) + +# All issue numbers to add to board (ShowStopper + P0 + P1 + P2) +ISSUES="84 8 4 86 232 183 138 6 5 9 255 269 237 219 221 202 199 222 220 182 177 176 113 210 268 10 7 254 217 179 178 173 224 208 158 125 115 79 73 68 58 213 133 174 229 228 227 212" + +for issue in $ISSUES; do + NODE_ID=$(curl -s -H "Authorization: token $TOKEN" \ + "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \ + python3 -c 'import sys,json; print(json.load(sys.stdin)["node_id"])') + echo "$issue $NODE_ID" +done +``` + +Expected: A list of issue numbers and their node IDs. Save this output — you'll need it for Step 2. + +- [ ] **Step 2: Add each issue to the project** + +For each issue, use the `addProjectV2ItemById` mutation. Process in batches to avoid rate limiting: + +```bash +TOKEN=$(gh auth token 2>&1) +PROJECT_ID="PVT_kwDOBAnwDc4BTQvY" + +# Use the node IDs from Step 1. 
Example for one issue: +# curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ +# -d '{"query":"mutation { addProjectV2ItemById(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", contentId: \"NODE_ID_HERE\"}) { item { id } } }"}' + +# Batch all issues: +ISSUES="84 8 4 86 232 183 138 6 5 9 255 269 237 219 221 202 199 222 220 182 177 176 113 210 268 10 7 254 217 179 178 173 224 208 158 125 115 79 73 68 58 213 133 174 229 228 227 212" + +for issue in $ISSUES; do + NODE_ID=$(curl -s -H "Authorization: token $TOKEN" \ + "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \ + python3 -c 'import sys,json; print(json.load(sys.stdin)["node_id"])') + + ITEM_ID=$(curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d "{\"query\":\"mutation { addProjectV2ItemById(input: {projectId: \\\"$PROJECT_ID\\\", contentId: \\\"$NODE_ID\\\"}) { item { id } } }\"}" | \ + python3 -c 'import sys,json; print(json.load(sys.stdin)["data"]["addProjectV2ItemById"]["item"]["id"])') + + echo "#$issue added: $ITEM_ID" + sleep 0.5 # Rate limit courtesy +done +``` + +Expected: Each issue prints its project item ID. All ~47 issues added. + +- [ ] **Step 3: Set all items to Triage status** + +After adding items, set their Status field to "Triage". You need the Status field ID and the "Triage" option ID from Task 5 Step 1/2. 
+ +```bash +TOKEN=$(gh auth token 2>&1) +PROJECT_ID="PVT_kwDOBAnwDc4BTQvY" +STATUS_FIELD_ID="" +TRIAGE_OPTION_ID="" + +# For each item added in Step 2, set status to Triage +# Use the item IDs printed in Step 2 +for ITEM_ID in ; do + curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d "{\"query\":\"mutation { updateProjectV2ItemFieldValue(input: {projectId: \\\"$PROJECT_ID\\\", itemId: \\\"$ITEM_ID\\\", fieldId: \\\"$STATUS_FIELD_ID\\\", value: {singleSelectOptionId: \\\"$TRIAGE_OPTION_ID\\\"}}) { projectV2Item { id } } }\"}" | \ + python3 -c 'import sys,json; d=json.load(sys.stdin); print(f"Set triage: {d}")' + sleep 0.3 +done +``` + +Expected: All items set to Triage status. + +- [ ] **Step 4: Verify board population** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { items(first: 100) { totalCount nodes { content { ... on Issue { number title } } } } } } }"}' | \ + python3 -c " +import sys, json +data = json.load(sys.stdin) +items = data['data']['node']['items'] +print(f'Total items on board: {items[\"totalCount\"]}') +for item in items['nodes']: + c = item['content'] + print(f' #{c[\"number\"]} {c[\"title\"]}') +" +``` + +Expected: ~47 issues listed on the board. + +--- + +### Task 7: Create Board Views + +Create the 4 views on the project board. The default view already exists (rename to Kanban); create 3 additional views. + +**Files:** None (API only) + +- [ ] **Step 1: List existing views** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { views(first: 10) { nodes { id name number layout } } } } }"}' | python3 -m json.tool +``` + +Expected: At least one default view. Record its ID. 
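Rather than eyeballing the `json.tool` dump, the ID can be captured directly. A sketch, assuming the project's only view so far is the default one (so `views(first: 1)` returns it), with fallbacks so the snippet degrades to an empty value when run offline:

```shell
TOKEN=$(gh auth token 2>&1 || true)

# Grab the first (default) view's ID for use in the next step
DEFAULT_VIEW_ID=$(curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { views(first: 1) { nodes { id } } } } }"}' | \
  python3 -c 'import sys,json; print(json.load(sys.stdin)["data"]["node"]["views"]["nodes"][0]["id"])' 2>/dev/null) \
  || DEFAULT_VIEW_ID=""
echo "DEFAULT_VIEW_ID=$DEFAULT_VIEW_ID"
```

If more than one view already exists, match on `name`/`number` in the Step 1 output instead of taking the first node blindly.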
+ +- [ ] **Step 2: Update default view to Kanban board layout** + +```bash +TOKEN=$(gh auth token 2>&1) +DEFAULT_VIEW_ID="" + +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d "{\"query\":\"mutation { updateProjectV2View(input: {projectId: \\\"PVT_kwDOBAnwDc4BTQvY\\\", viewId: \\\"$DEFAULT_VIEW_ID\\\", name: \\\"Kanban\\\", layout: BOARD_LAYOUT}) { projectV2View { id name layout } } }\"}" | python3 -m json.tool +``` + +Expected: Default view renamed to "Kanban" with BOARD_LAYOUT. + +- [ ] **Step 3: Create Priority Table view** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"Priority Table\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool +``` + +Expected: New "Priority Table" view created with TABLE_LAYOUT. + +- [ ] **Step 4: Create By Assignee view** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"By Assignee\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool +``` + +Expected: New "By Assignee" view created. + +- [ ] **Step 5: Create Stale Issues view** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"Stale Issues\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool +``` + +Expected: New "Stale Issues" view created. 
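The hand-escaped `\\\"` sequences in these mutation payloads are easy to get wrong. As an alternative, `json.dumps` can build the request body; a sketch for the Step 3 mutation (the `curl` line is left commented out so the snippet has no side effects):

```shell
# Build the createProjectV2View payload programmatically instead of hand-escaping quotes
PAYLOAD=$(python3 -c '
import json
name = "Priority Table"
query = ("mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", "
         f"name: \"{name}\", layout: TABLE_LAYOUT}}) {{ projectV2View {{ id name layout }} }} }}")
print(json.dumps({"query": query}))
')
echo "$PAYLOAD"
# TOKEN=$(gh auth token 2>&1)
# curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql -d "$PAYLOAD"
```

The same pattern applies to every mutation in Tasks 6 and 7.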
+ +- [ ] **Step 6: Verify all 4 views exist** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { views(first: 10) { nodes { id name number layout } } } } }"}' | python3 -m json.tool +``` + +Expected: 4 views — Kanban (BOARD_LAYOUT), Priority Table (TABLE_LAYOUT), By Assignee (TABLE_LAYOUT), Stale Issues (TABLE_LAYOUT). + +**NOTE:** View-level sorting, grouping, and filtering must be configured manually in the GitHub web UI after views are created. The GraphQL API supports creating views and setting layout, but fine-grained sort/group/filter configuration is not fully exposed via API. After this task, open https://github.com/orgs/mlcommons/projects/57 and configure: +- Kanban: Group by Priority +- Priority Table: Sort by Priority field ascending +- By Assignee: Group by Assignee +- Stale Issues: Sort by Updated ascending, filter to items not updated in 30+ days + +--- + +### Task 8: Create Issue Templates + +Write the 4 YAML issue form templates and the config file to the local repo. + +**Files:** +- Create: `.github/ISSUE_TEMPLATE/100-bug-report.yml` +- Create: `.github/ISSUE_TEMPLATE/200-feature-request.yml` +- Create: `.github/ISSUE_TEMPLATE/300-performance.yml` +- Create: `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` +- Create: `.github/ISSUE_TEMPLATE/config.yml` + +- [ ] **Step 1: Create the ISSUE_TEMPLATE directory** + +```bash +mkdir -p .github/ISSUE_TEMPLATE +``` + +- [ ] **Step 2: Write 100-bug-report.yml** + +Write to `.github/ISSUE_TEMPLATE/100-bug-report.yml` with the exact content from the design spec Section 3, `100-bug-report.yml`. + +- [ ] **Step 3: Write 200-feature-request.yml** + +Write to `.github/ISSUE_TEMPLATE/200-feature-request.yml` with the exact content from the design spec Section 3, `200-feature-request.yml`. 
+ +- [ ] **Step 4: Write 300-performance.yml** + +Write to `.github/ISSUE_TEMPLATE/300-performance.yml` with the exact content from the design spec Section 3, `300-performance.yml`. + +- [ ] **Step 5: Write 400-dataset-integration.yml** + +Write to `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` with the exact content from the design spec Section 3, `400-dataset-integration.yml`. + +- [ ] **Step 6: Write config.yml** + +Write to `.github/ISSUE_TEMPLATE/config.yml`: + +```yaml +blank_issues_enabled: true +contact_links: + - name: Questions & Discussion + url: https://github.com/mlcommons/endpoints/discussions + about: Ask questions and discuss ideas before filing an issue +``` + +- [ ] **Step 7: Verify all template files exist** + +```bash +ls -la .github/ISSUE_TEMPLATE/ +``` + +Expected: 5 files — `100-bug-report.yml`, `200-feature-request.yml`, `300-performance.yml`, `400-dataset-integration.yml`, `config.yml`. + +- [ ] **Step 8: Commit issue templates** + +```bash +git add .github/ISSUE_TEMPLATE/ +git commit -m "chore: add issue templates (bug, feature, performance, dataset) + +Co-Authored-By: Claude Opus 4.6 (1M context) " +``` + +--- + +### Task 9: Update CONTRIBUTING.md + +Replace the existing 10-line CONTRIBUTING.md with the expanded ~250-line version. + +**Files:** +- Modify: `CONTRIBUTING.md` (full rewrite) + +- [ ] **Step 1: Write the new CONTRIBUTING.md** + +Write the full CONTRIBUTING.md content as designed in Section 4 of the spec. The full text was presented during brainstorming and approved. It includes these sections: + +1. Welcome and Table of Contents +2. Ways to Contribute (links to all 4 issue templates) +3. Development Setup (prerequisites, fork/clone, venv, pip install, pre-commit, echo server) +4. Code Style and Conventions (ruff, mypy, line length 88, conventional commits, serialization, performance-sensitive code) +5. Testing (pytest commands, markers, async mode, coverage, fixtures) +6. 
Submitting Changes (branch naming, PR process, review criteria) +7. Issue Guidelines (templates, lifecycle, priority levels table) +8. MLCommons CLA (existing CLA requirements preserved) +9. Questions section + +- [ ] **Step 2: Commit CONTRIBUTING.md** + +```bash +git add CONTRIBUTING.md +git commit -m "docs: expand CONTRIBUTING.md with development guide, testing, and issue guidelines + +Co-Authored-By: Claude Opus 4.6 (1M context) " +``` + +--- + +### Task 10: Link Open PRs to Issues + +Add comments on open PRs that implement issues different from their own number, creating explicit linkage. + +**Files:** None (API only) + +- [ ] **Step 1: Link PRs to their corresponding issues** + +Only PRs where the PR number differs from the issue it implements need explicit linking: + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# PR #226 implements issue #232 (multi-turn) +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/226/comments" \ + -d '{"body":"Relates to #232 (multi-turn implementation). This PR provides the initial multi-turn enabling work tracked by #232."}' | python3 -c 'import sys,json; print(f"PR #226 linked to #232: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +# PR #207 implements issue #208 (report generation optimization) +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/207/comments" \ + -d '{"body":"Relates to #208 (optimize report generation). 
This PR implements parallel tokenization as one approach to #208."}' | python3 -c 'import sys,json; print(f"PR #207 linked to #208: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +# PR #170 implements issue #86 (warmup runs) +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/170/comments" \ + -d '{"body":"Relates to #86 (Warmup runs). This PR implements warmup with random dataset as part of #86."}' | python3 -c 'import sys,json; print(f"PR #170 linked to #86: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +# PR #205 relates to issue #255 (Make Loadgen Async) +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/205/comments" \ + -d '{"body":"Relates to #255 (Make Loadgen Async). Both this PR and #255 target the same async benchmark goal."}' | python3 -c 'import sys,json; print(f"PR #205 linked to #255: {json.load(sys.stdin).get(\"id\",\"error\")}")' +``` + +Expected: 4 comments posted linking PRs to their primary issues. + +--- + +### Task 11: Push and Create PR + +Push the local commits (issue templates + CONTRIBUTING.md) as a PR to the repository. + +**Files:** None (git operations) + +- [ ] **Step 1: Create a feature branch** + +```bash +git checkout -b chore/project-management-setup +``` + +- [ ] **Step 2: Cherry-pick the commits onto the branch** + +If you committed on main, reset main and cherry-pick onto the new branch. Otherwise if you're already on the branch, skip this. 
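One concrete sequence for that case, demonstrated in a throwaway repo so it is safe to dry-run. The `origin/main` ref and the two commits are simulated in the setup block; in the real checkout you would run only the three commands after it. Branching at the tip and rewinding `main` achieves the same result as an explicit cherry-pick:

```shell
# --- setup: throwaway repo standing in for a clone with two local-only commits on main
cd "$(mktemp -d)"
git init -q -b main
git config user.email "you@example.com" && git config user.name "you"
echo base > f.txt && git add f.txt && git commit -qm "base"
git update-ref refs/remotes/origin/main HEAD        # stand-in for the real origin/main
echo t > t.txt && git add t.txt && git commit -qm "chore: add issue templates"
echo c > c.txt && git add c.txt && git commit -qm "docs: expand CONTRIBUTING.md"

# --- recovery: the branch keeps both commits, main is rewound to the remote state
git checkout -q -b chore/project-management-setup   # new branch at the current tip
git branch -f main origin/main                      # rewind local main without checking it out
git log --oneline origin/main..HEAD                 # should list exactly the two commits
```

Nothing is lost if a step goes wrong: the commits stay reachable from the feature branch, and `git reflog` can recover the old `main` tip.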
+ +- [ ] **Step 3: Push to remote** + +```bash +git push -u origin chore/project-management-setup +``` + +- [ ] **Step 4: Create the PR** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/mlcommons/endpoints/pulls" \ + -d '{ + "title": "chore: add issue templates, expand CONTRIBUTING.md, and project management setup", + "body": "## Summary\n\n- Add 4 YAML issue form templates (bug report, feature request, performance issue, dataset integration)\n- Expand CONTRIBUTING.md with development setup, code style, testing, PR process, and issue guidelines\n- Part of the project management infrastructure setup (labels, board, and issue migration done via API)\n\n## Related\n\nDesign spec: docs/superpowers/specs/2026-04-07-project-management-design.md\n\n## Test plan\n\n- [ ] Verify issue templates render correctly on GitHub (New Issue page)\n- [ ] Verify CONTRIBUTING.md renders correctly\n- [ ] Verify all links in CONTRIBUTING.md work\n\n🤖 Generated with [Claude Code](https://claude.com/claude-code)", + "head": "chore/project-management-setup", + "base": "main" + }' | python3 -c 'import sys,json; d=json.load(sys.stdin); print(f"PR created: {d.get(\"html_url\", d.get(\"message\", \"error\"))}")' +``` + +Expected: PR URL printed. + +--- + +### Task 12: Enable Board Automations + +Configure the built-in automations on project board #57 via the GitHub web UI. + +**Files:** None (manual UI configuration) + +**NOTE:** GitHub Projects V2 built-in automations (auto-add, auto-archive, auto-set status on close) are not configurable via the GraphQL API. They must be enabled manually. 
+ +- [ ] **Step 1: Open project settings** + +Navigate to: https://github.com/orgs/mlcommons/projects/57/settings + +- [ ] **Step 2: Enable "Auto-add" workflow** + +Under Workflows → Auto-add to project: +- Enable the workflow +- Filter: `is:issue is:open repo:mlcommons/endpoints` +- This ensures all new issues are automatically added to the board with Inbox status + +- [ ] **Step 3: Enable "Item closed" workflow** + +Under Workflows → Item closed: +- Enable the workflow +- Set status to: Done + +- [ ] **Step 4: Enable "Pull request merged" workflow** + +Under Workflows → Pull request merged: +- Enable the workflow +- Set status to: Done + +- [ ] **Step 5: Enable "Auto-archive items"** + +Under Workflows → Auto-archive items: +- Enable the workflow +- Archive items that have been Done for 14 days + +--- + +### Task 13: Configure Board Views in UI + +Fine-tune the sort, group, and filter settings for each view in the GitHub web UI. + +**Files:** None (manual UI configuration) + +- [ ] **Step 1: Configure Kanban view** + +Open: https://github.com/orgs/mlcommons/projects/57/views/1 +- Set layout to Board (should already be set) +- Column field: Status +- Group by: Priority (ShowStopper at top) +- Filter: `status:Inbox,Triage,Ready,"In Progress","In Review"` + +- [ ] **Step 2: Configure Priority Table view** + +Open the Priority Table view +- Sort by: Priority ascending (ShowStopper first) +- Show columns: Title, Priority, Area, Status, Assignee, Target Release +- Filter: exclude Done items + +- [ ] **Step 3: Configure By Assignee view** + +Open the By Assignee view +- Group by: Assignee +- Sort by: Priority ascending within each group +- Show columns: Title, Priority, Area, Status + +- [ ] **Step 4: Configure Stale Issues view** + +Open the Stale Issues view +- Sort by: Updated date ascending (oldest first) +- Show columns: Title, Priority, Area, Status, Assignee, Updated +- Filter: exclude Done, show only items not updated in 30+ days From 
b939e9c20c553bdf280c40656afb697324c2c590 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 15:08:39 -0700 Subject: [PATCH 04/14] chore: add issue templates (bug, feature, performance, dataset) Four YAML-based issue form templates: - 100-bug-report.yml: structured bug reporting - 200-feature-request.yml: feature proposals with motivation - 300-performance.yml: performance regressions with benchmark data - 400-dataset-integration.yml: new dataset/eval requests - config.yml: template chooser with Discussions link Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/ISSUE_TEMPLATE/100-bug-report.yml | 43 ++++++++++++++ .../ISSUE_TEMPLATE/200-feature-request.yml | 27 +++++++++ .github/ISSUE_TEMPLATE/300-performance.yml | 59 +++++++++++++++++++ .../400-dataset-integration.yml | 48 +++++++++++++++ .github/ISSUE_TEMPLATE/config.yml | 5 ++ 5 files changed, 182 insertions(+) create mode 100644 .github/ISSUE_TEMPLATE/100-bug-report.yml create mode 100644 .github/ISSUE_TEMPLATE/200-feature-request.yml create mode 100644 .github/ISSUE_TEMPLATE/300-performance.yml create mode 100644 .github/ISSUE_TEMPLATE/400-dataset-integration.yml create mode 100644 .github/ISSUE_TEMPLATE/config.yml diff --git a/.github/ISSUE_TEMPLATE/100-bug-report.yml b/.github/ISSUE_TEMPLATE/100-bug-report.yml new file mode 100644 index 00000000..4cf5b586 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/100-bug-report.yml @@ -0,0 +1,43 @@ +name: Bug Report +description: Report a bug or unexpected behavior +title: "[Bug]: " +labels: ["type: bug", "status: needs-triage"] +body: + - type: textarea + id: description + attributes: + label: Bug Description + description: What happened vs. what you expected + placeholder: "When I run X, I expected Y but got Z" + validations: + required: true + - type: textarea + id: reproduction + attributes: + label: Steps to Reproduce + value: | + 1. + 2. + 3. 
+ validations: + required: true + - type: textarea + id: environment + attributes: + label: Environment + description: OS, Python version, package version + placeholder: "OS: Ubuntu 22.04, Python 3.12, inference-endpoint v0.1.0" + validations: + required: true + - type: textarea + id: logs + attributes: + label: Relevant Logs + render: shell + - type: checkboxes + id: checklist + attributes: + label: Before submitting + options: + - label: I searched existing issues and found no duplicates + required: true diff --git a/.github/ISSUE_TEMPLATE/200-feature-request.yml b/.github/ISSUE_TEMPLATE/200-feature-request.yml new file mode 100644 index 00000000..3aa7de25 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/200-feature-request.yml @@ -0,0 +1,27 @@ +name: Feature Request +description: Suggest a new feature or enhancement +title: "[Feature]: " +labels: ["type: feature", "status: needs-triage"] +body: + - type: textarea + id: motivation + attributes: + label: Motivation + description: What problem does this solve? Why do you need it? + validations: + required: true + - type: textarea + id: proposal + attributes: + label: Proposed Solution + description: How should this work? Include API sketches if relevant. + validations: + required: true + - type: textarea + id: alternatives + attributes: + label: Alternatives Considered + - type: textarea + id: context + attributes: + label: Additional Context diff --git a/.github/ISSUE_TEMPLATE/300-performance.yml b/.github/ISSUE_TEMPLATE/300-performance.yml new file mode 100644 index 00000000..d2aa9007 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/300-performance.yml @@ -0,0 +1,59 @@ +name: Performance Issue +description: Report a performance regression or improvement opportunity +title: "[Perf]: " +labels: ["type: performance", "status: needs-triage"] +body: + - type: textarea + id: description + attributes: + label: Description + description: What performance issue did you observe? 
+ placeholder: "QPS dropped from X to Y after upgrading to version Z" + validations: + required: true + - type: textarea + id: benchmark + attributes: + label: Benchmark Command + description: The exact command you ran + render: shell + validations: + required: true + - type: textarea + id: results + attributes: + label: Results + description: Expected vs actual numbers (QPS, latency, TTFT, TPOT, etc.) + placeholder: | + Expected: ~5000 QPS, p99 latency < 200ms + Actual: ~2000 QPS, p99 latency 800ms + validations: + required: true + - type: textarea + id: environment + attributes: + label: Environment + description: Hardware, OS, Python version, endpoint server details + placeholder: | + Hardware: 8x A100 80GB + OS: Ubuntu 22.04 + Python: 3.12 + Server: vLLM 0.6.0, Llama-3-70B + Workers: 4 + validations: + required: true + - type: textarea + id: profiling + attributes: + label: Profiling Data (optional) + description: Any profiling output, flame graphs, or bottleneck analysis + render: shell + - type: checkboxes + id: checklist + attributes: + label: Before submitting + options: + - label: I searched existing issues and found no duplicates + required: true + - label: I ran with default settings before tuning + required: false diff --git a/.github/ISSUE_TEMPLATE/400-dataset-integration.yml b/.github/ISSUE_TEMPLATE/400-dataset-integration.yml new file mode 100644 index 00000000..67c6673f --- /dev/null +++ b/.github/ISSUE_TEMPLATE/400-dataset-integration.yml @@ -0,0 +1,48 @@ +name: Dataset Integration +description: Request support for a new dataset or evaluation benchmark +title: "[Dataset]: " +labels: ["type: feature", "area: dataset", "status: needs-triage"] +body: + - type: textarea + id: dataset + attributes: + label: Dataset Information + description: Name, URL, and brief description + placeholder: | + Name: MATH-500 + URL: https://huggingface.co/datasets/... 
+ Description: 500 competition math problems for testing reasoning + validations: + required: true + - type: dropdown + id: format + attributes: + label: Dataset Format + options: + - JSONL + - HuggingFace Dataset + - CSV + - JSON + - Parquet + - Other + validations: + required: true + - type: textarea + id: evaluation + attributes: + label: Evaluation Method + description: How should responses be scored? + placeholder: "Exact match after extracting boxed answer, or pass@1 for code" + validations: + required: true + - type: textarea + id: samples + attributes: + label: Scale + description: Number of samples, expected prompt/response lengths + placeholder: "500 samples, avg prompt ~200 tokens, avg response ~500 tokens" + - type: textarea + id: context + attributes: + label: Additional Context + description: Related benchmarks, papers, or prior art diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 00000000..4ac37a65 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1,5 @@ +blank_issues_enabled: true +contact_links: + - name: Questions & Discussion + url: https://github.com/mlcommons/endpoints/discussions + about: Ask questions and discuss ideas before filing an issue From 202dbc2e02ae9e0747102166baa926f0792a1a99 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 15:08:45 -0700 Subject: [PATCH 05/14] docs: expand CONTRIBUTING.md with development guide, testing, and issue guidelines Replace minimal 10-line CONTRIBUTING.md with comprehensive guide covering: - Ways to contribute with links to issue templates - Development setup (venv, pip install, pre-commit, echo server) - Code style (ruff, mypy, conventional commits, serialization) - Testing (pytest markers, async mode, coverage, fixtures) - PR process and review expectations - Issue lifecycle and priority levels - MLCommons CLA requirements Co-Authored-By: Claude Opus 4.6 (1M context) --- CONTRIBUTING.md | 214 
++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 208 insertions(+), 6 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8de1bbe9..8b264dcc 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,11 +1,213 @@ -## Contributing +# Contributing to MLPerf Inference Endpoints -The best way to contribute to the MLCommons is to get involved with one of our many project communities. You can find more information about getting involved with MLCommons [here](https://mlcommons.org/community/). +Welcome! We're glad you're interested in contributing. This project is part of +[MLCommons](https://mlcommons.org/) and aims to build a high-performance +benchmarking tool for LLM inference endpoints targeting 50k+ QPS. -Generally we encourage people to become MLCommons members if they wish to contribute to MLCommons projects, but outside pull requests are very welcome too. +## Table of Contents -Regardless of whether you are a member, your organization (or you as an individual contributor) needs to sign the MLCommons Contributor License Agreement (CLA). Please submit your GitHub username to the [MLCommons Subscription form](https://mlcommons.org/community/subscribe/) to start that process. +- [Ways to Contribute](#ways-to-contribute) +- [Development Setup](#development-setup) +- [Code Style and Conventions](#code-style-and-conventions) +- [Testing](#testing) +- [Submitting Changes](#submitting-changes) +- [Issue Guidelines](#issue-guidelines) +- [MLCommons CLA](#mlcommons-cla) -MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and issue a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that cla-bot and other checks pass for your pull requests. +## Ways to Contribute -For project-specific development standards (code style, test requirements, pre-commit hooks, commit format), see the [Development Guide](docs/DEVELOPMENT.md). 
+- **Report bugs** — use the [Bug Report](https://github.com/mlcommons/endpoints/issues/new?template=100-bug-report.yml) template +- **Request features** — use the [Feature Request](https://github.com/mlcommons/endpoints/issues/new?template=200-feature-request.yml) template +- **Report performance issues** — use the [Performance Issue](https://github.com/mlcommons/endpoints/issues/new?template=300-performance.yml) template +- **Request dataset support** — use the [Dataset Integration](https://github.com/mlcommons/endpoints/issues/new?template=400-dataset-integration.yml) template +- **Improve documentation** — fix typos, clarify guides, add examples +- **Pick up an issue** — look for [`good first issue`](https://github.com/mlcommons/endpoints/labels/good%20first%20issue) or [`help wanted`](https://github.com/mlcommons/endpoints/labels/help%20wanted) +- **Review PRs** — thoughtful reviews are as valuable as code + +## Development Setup + +### Prerequisites + +- Python 3.12+ (3.12 recommended) +- Git +- A Unix-like OS (Linux or macOS) + +### Getting Started + +```bash +# Fork and clone +git clone https://github.com//endpoints.git +cd endpoints + +# Create virtual environment +python3.12 -m venv venv +source venv/bin/activate + +# Install with dev and test extras +pip install -e ".[dev,test]" + +# Install pre-commit hooks +pre-commit install + +# Verify your setup +pytest -m unit -x --timeout=60 +``` + +### Local Testing with Echo Server + +```bash +# Start a local echo server +python -m inference_endpoint.testing.echo_server --port 8765 + +# Run a quick probe +inference-endpoint probe --endpoints http://localhost:8765 --model test-model +``` + +## Code Style and Conventions + +### Formatting and Linting + +We use [ruff](https://docs.astral.sh/ruff/) for formatting and linting, and +[mypy](https://mypy-lang.org/) for type checking. Pre-commit hooks enforce +these automatically. 
+ +```bash +# Run all checks manually +pre-commit run --all-files +``` + +### Key Conventions + +- **Line length:** 88 characters +- **Quotes:** Double quotes +- **License headers:** Required on all Python files (auto-added by pre-commit) +- **Commit messages:** [Conventional commits](https://www.conventionalcommits.org/) — `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:` +- **Comments:** Only where the *why* isn't obvious from the code. No over-documenting. + +### Serialization + +- **Hot-path data** (Query, QueryResult, StreamChunk): `msgspec.Struct` — encode/decode with `msgspec.json`, not stdlib json +- **Configuration**: `pydantic.BaseModel` for validation +- **Do not** use `dataclass` where neighboring types use `msgspec` + +### Performance-Sensitive Code + +Code in `load_generator/`, `endpoint_client/worker.py`, and `async_utils/transport/` +is latency-critical. In these paths: + +- No `match` statements — use dict dispatch +- Minimize async suspends +- No pydantic validation or excessive logging +- Use `msgspec` over `json`/`pydantic` for serialization + +## Testing + +### Running Tests + +```bash +# All tests (excludes slow/performance) +pytest + +# Unit tests only +pytest -m unit + +# Integration tests +pytest -m integration + +# Single file +pytest -xvs tests/unit/path/to/test_file.py + +# With coverage +pytest --cov=src --cov-report=html +``` + +### Test Markers + +Every test function **must** have a marker: + +```python +@pytest.mark.unit +@pytest.mark.asyncio(mode="strict") # for async tests — must use strict mode +async def test_something(): + ... +``` + +Available markers: `unit`, `integration`, `slow`, `performance`, `run_explicitly` + +### Coverage + +Target **>90% coverage** for all new code. Use existing fixtures from +`tests/conftest.py` (e.g., `mock_http_echo_server`, `mock_http_oracle_server`, +`dummy_dataset`) rather than mocking. 
+ +## Submitting Changes + +### Branch Naming + +``` +feat/short-description +fix/short-description +docs/short-description +``` + +### Pull Request Process + +1. **Create a focused PR** — one logical change per PR +2. **Fill out the PR template** — describe what, why, and how to test +3. **Ensure CI passes** — `pre-commit run --all-files` and `pytest -m unit` locally before pushing +4. **Link related issues** — use `Closes #123` or `Relates to #123` +5. **Expect review within 2-3 business days** — reviewers are auto-assigned based on changed files + +### What We Look For in Reviews + +- Does it follow existing patterns in the codebase? +- Are tests included and meaningful (not mock-heavy)? +- Is it focused — no unrelated refactoring or over-engineering? +- Does it avoid adding unnecessary dependencies? + +### After Review + +- Address feedback with new commits (don't force-push during review) +- Once approved, a maintainer will merge + +## Issue Guidelines + +### Before Filing + +1. Search [existing issues](https://github.com/mlcommons/endpoints/issues) for duplicates +2. Use the appropriate issue template +3. Provide enough detail to reproduce or understand the request + +### Issue Lifecycle + +New issues are auto-added to our [project board](https://github.com/orgs/mlcommons/projects/57) +and flow through: **Inbox → Triage → Ready → In Progress → In Review → Done** + +### Priority Levels + +| Priority | Meaning | +|----------|---------| +| **ShowStopper** | Drop everything — critical blocker | +| **P0** | Blocks release or users | +| **P1** | Must address this cycle | +| **P2** | Address within quarter | +| **P3** | Backlog, nice to have | + +## MLCommons CLA + +All contributors must sign the +[MLCommons Contributor License Agreement](https://mlcommons.org/membership/membership-overview/). +A CLA bot will check your PR automatically. + +To sign up: +1. Visit the [MLCommons Subscription form](https://mlcommons.org/membership/membership-overview/) +2. 
Submit your GitHub username +3. The CLA bot will verify on your next PR + +Pull requests from non-members are welcome — you'll be prompted to sign the CLA +during the PR process. + +## Questions? + +Open a [Discussion](https://github.com/mlcommons/endpoints/discussions) or +file an issue. We aim to respond within a few business days. From 971650038c6bc15c33c798ff1cfe070d1c1ac53e Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 15:55:51 -0700 Subject: [PATCH 06/14] feat: add GitHub Action to sync labels to project board fields One-way sync: when priority: or area: labels change on an issue, the corresponding board custom field is automatically updated. Labels are the single source of truth. Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/workflows/sync-labels-to-board.yml | 150 +++++++++++++++++++++ 1 file changed, 150 insertions(+) create mode 100644 .github/workflows/sync-labels-to-board.yml diff --git a/.github/workflows/sync-labels-to-board.yml b/.github/workflows/sync-labels-to-board.yml new file mode 100644 index 00000000..8a3eaf83 --- /dev/null +++ b/.github/workflows/sync-labels-to-board.yml @@ -0,0 +1,150 @@ +name: Sync Labels to Project Board + +on: + issues: + types: [labeled, unlabeled] + +env: + PROJECT_ID: "PVT_kwDOBAnwDc4BTQvY" + # These IDs are populated from the board's GraphQL field configuration. + # To find them: query the board fields via GraphQL and extract option IDs. 
+ PRIORITY_FIELD_ID: "PVTSSF_lADOBAnwDc4BTQvYzhBKk68" + AREA_FIELD_ID: "PVTSSF_lADOBAnwDc4BTQvYzhBKk7A" + +jobs: + sync-labels: + runs-on: ubuntu-latest + steps: + - name: Sync priority and area labels to board fields + uses: actions/github-script@v7 + with: + script: | + const issue = context.payload.issue; + const labels = issue.labels.map(l => l.name); + + // --- Field and option ID mappings --- + // Priority field + const PRIORITY_FIELD_ID = process.env.PRIORITY_FIELD_ID; + const PRIORITY_MAP = { + 'priority: ShowStopper': process.env.SHOWSTOPPER_OPTION_ID, + 'priority: P0': process.env.P0_OPTION_ID, + 'priority: P1': process.env.P1_OPTION_ID, + 'priority: P2': process.env.P2_OPTION_ID, + 'priority: P3': process.env.P3_OPTION_ID, + }; + + // Area field + const AREA_FIELD_ID = process.env.AREA_FIELD_ID; + const AREA_MAP = { + 'area: core-engine': process.env.CORE_ENGINE_OPTION_ID, + 'area: client': process.env.CLIENT_OPTION_ID, + 'area: metrics': process.env.METRICS_OPTION_ID, + 'area: dataset': process.env.DATASET_OPTION_ID, + 'area: config-cli': process.env.CONFIG_CLI_OPTION_ID, + 'area: evaluation': process.env.EVALUATION_OPTION_ID, + 'area: adapters': process.env.ADAPTERS_OPTION_ID, + 'area: mlcommons': process.env.MLCOMMONS_OPTION_ID, + }; + + const PROJECT_ID = process.env.PROJECT_ID; + + // Find the board item for this issue + const findItemQuery = ` + query($projectId: ID!, $cursor: String) { + node(id: $projectId) { + ... on ProjectV2 { + items(first: 100, after: $cursor) { + nodes { + id + content { + ... 
on Issue { number } + } + } + pageInfo { hasNextPage endCursor } + } + } + } + } + `; + + let itemId = null; + let cursor = null; + while (!itemId) { + const result = await github.graphql(findItemQuery, { + projectId: PROJECT_ID, + cursor: cursor, + }); + const items = result.node.items; + const match = items.nodes.find( + n => n.content && n.content.number === issue.number + ); + if (match) { + itemId = match.id; + break; + } + if (!items.pageInfo.hasNextPage) break; + cursor = items.pageInfo.endCursor; + } + + if (!itemId) { + core.info(`Issue #${issue.number} not found on board, skipping.`); + return; + } + + // Helper to update a single-select field + async function setField(fieldId, optionId) { + if (!optionId) { + // Clear the field + await github.graphql(` + mutation($projectId: ID!, $itemId: ID!, $fieldId: ID!) { + clearProjectV2ItemFieldValue(input: { + projectId: $projectId, itemId: $itemId, fieldId: $fieldId + }) { projectV2Item { id } } + } + `, { projectId: PROJECT_ID, itemId, fieldId }); + } else { + await github.graphql(` + mutation($projectId: ID!, $itemId: ID!, $fieldId: ID!, $optionId: String!) { + updateProjectV2ItemFieldValue(input: { + projectId: $projectId, itemId: $itemId, fieldId: $fieldId, + value: { singleSelectOptionId: $optionId } + }) { projectV2Item { id } } + } + `, { projectId: PROJECT_ID, itemId, fieldId, optionId }); + } + } + + // Sync priority: find the highest-priority label on the issue + const priorityOrder = [ + 'priority: ShowStopper', + 'priority: P0', + 'priority: P1', + 'priority: P2', + 'priority: P3', + ]; + const activePriority = priorityOrder.find(p => labels.includes(p)); + const priorityOptionId = activePriority ? PRIORITY_MAP[activePriority] : null; + await setField(PRIORITY_FIELD_ID, priorityOptionId); + core.info(`Priority set to: ${activePriority || '(cleared)'}`); + + // Sync area: use the first area label found + const activeArea = labels.find(l => l.startsWith('area: ')); + const areaOptionId = activeArea ? 
AREA_MAP[activeArea] : null; + await setField(AREA_FIELD_ID, areaOptionId); + core.info(`Area set to: ${activeArea || '(cleared)'}`); + env: + PRIORITY_FIELD_ID: ${{ env.PRIORITY_FIELD_ID }} + AREA_FIELD_ID: ${{ env.AREA_FIELD_ID }} + SHOWSTOPPER_OPTION_ID: "26ab336c" + P0_OPTION_ID: "d3612dd9" + P1_OPTION_ID: "7ff45c96" + P2_OPTION_ID: "e41b2ee9" + P3_OPTION_ID: "d4d24170" + CORE_ENGINE_OPTION_ID: "db5c9511" + CLIENT_OPTION_ID: "ffeff676" + METRICS_OPTION_ID: "04637e5a" + DATASET_OPTION_ID: "b493fd0d" + CONFIG_CLI_OPTION_ID: "ae1f5588" + EVALUATION_OPTION_ID: "96e592b6" + ADAPTERS_OPTION_ID: "6c615274" + MLCOMMONS_OPTION_ID: "d5eff045" From 542466d8f5a9aab8c5850132f47b464d69068ae8 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Wed, 8 Apr 2026 09:58:16 -0700 Subject: [PATCH 07/14] chore: clean up repo structure and overhaul README MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Remove .cursor/rules/ (migrated to CLAUDE.md/AGENTS.md) - Remove docs/superpowers/ plans and specs (local-only artifacts) - Add .cursor/ and docs/superpowers/ to .gitignore - Overhaul README.md: remove emojis, remove inline contributor list (use git log/ATTRIBUTION instead), align architecture section with AGENTS.md, add badges, streamline to match OSS best practices - Contributors section removed — credit lives in git history and ATTRIBUTION file Co-Authored-By: Claude Opus 4.6 (1M context) --- .cursor/rules/endpoint-rules.mdc | 118 -- .cursor/rules/msgspec-patterns.mdc | 534 -------- .cursor/rules/python-antipatterns.mdc | 658 ---------- .gitignore | 9 +- README.md | 214 +--- .../plans/2026-04-07-project-management.md | 1092 ----------------- .../2026-04-07-project-management-design.md | 605 --------- 7 files changed, 71 insertions(+), 3159 deletions(-) delete mode 100644 .cursor/rules/endpoint-rules.mdc delete mode 100644 .cursor/rules/msgspec-patterns.mdc delete mode 100644 .cursor/rules/python-antipatterns.mdc delete mode 100644 
docs/superpowers/plans/2026-04-07-project-management.md
 delete mode 100644 docs/superpowers/specs/2026-04-07-project-management-design.md

diff --git a/.cursor/rules/endpoint-rules.mdc b/.cursor/rules/endpoint-rules.mdc
deleted file mode 100644
index aff2d460..00000000
--- a/.cursor/rules/endpoint-rules.mdc
+++ /dev/null
@@ -1,118 +0,0 @@
----
-description:
-globs:
-alwaysApply: true
----
-# Cursor Rules for Python Project Development
-
-## Core Development Principles
-
-### 1. Planning-First Development
-- **Strict Separation**: Implementation MUST NOT begin until planning for the current step is complete
-- All architectural decisions, component interfaces, and implementation approaches must be documented before coding
-- Each development cycle follows: Plan → Review Plan → Implement → Update Documentation
-
-### 2. Testing Requirements
-- **Mandatory Unit Tests**: Every new component that requires testing MUST have corresponding unit tests
-- **Pre-commit Validation**: All unit tests and pre-commit checks MUST pass before pushing to the main repository
-- **No Exceptions**: Failed tests or checks block all commits until resolved
-
-### 3. Scratchpad Documentation System
-All planning and tracking must be maintained in the `.cursor_artifacts/` directory. 
- -#### Required Files: -- `.cursor_artifacts/hierarchy.md` - Project folder structure, module organization, and architectural overview -- `.cursor_artifacts/progress.md` - Current status, completed tasks, next steps, and milestone tracking -- `.cursor_artifacts/learning.md` - Technical insights, lessons learned, design decisions, and gotchas -- `.cursor_artifacts/design.md` - System design, component interfaces, data models, and API specifications -- `.cursor_artifacts/testing-strategy.md` - Test plans, coverage requirements, and testing approaches -- `.cursor_artifacts/deployment.md` - Deployment procedures, environment configs, and release notes -- `.cursor_artifacts/refactoring-log.md` - Planned and completed refactoring activities with justifications, keep empty if there's no major refactoring - -#### File Management: -- **Size Limit**: Each scratchpad file MUST NOT exceed 1000 lines -- **Regular Maintenance**: Split large files into focused sub-documents when approaching limit -- **Consistent Updates**: Update relevant scratchpad files after each implementation phase - -### 4. Commit and Review Standards -- **Post-Implementation Updates**: Always update `.cursor_artifacts/` scratchpad files after each implementation -- **Small, Focused Changes**: Keep commits and reviews reasonably sized for effective review -- **Clear Commit Messages**: Use conventional commit format with clear descriptions -- **Documentation Sync**: Ensure documentation reflects current implementation state - -### 5. Python Best Practices -- Follow PEP 8 style guidelines and modern Python idioms -- Use type hints for all function signatures and complex variables -- Implement proper error handling with specific exception types -- Apply SOLID principles and clean code practices -- Use dataclasses, context managers, and pathlib where appropriate -- Follow async/await patterns for asynchronous code -- Implement proper logging instead of print statements - -### 6. 
Change Control and Approval -#### Automatic Approval (Small Changes): -- Bug fixes within existing functionality -- Adding unit tests -- Documentation updates -- Minor refactoring within single functions/methods -- Code formatting and style improvements - -#### User Approval Required (Significant Changes): -- **Major Refactoring**: Restructuring classes, modules, or architectural changes -- **API Changes**: Modifying public interfaces or breaking changes -- **Large Deletions**: Removing significant portions of existing code, documentation, or scratchpad content -- **New Dependencies**: Adding external libraries or changing build requirements -- **Database Schema Changes**: Migrations or structural data changes - -#### Approval Process: -1. Document proposed changes in appropriate `.cursor_artifacts/` file -2. Clearly outline impact, benefits, and risks -3. Request explicit user approval before implementation -4. Provide rollback plan for significant changes - -### 7. Comprehensive Testing Strategy -- **Test Coverage**: Aim for >90% code coverage for business logic -- **Test Types**: Unit tests, integration tests, and end-to-end tests as appropriate -- **Edge Cases**: Test boundary conditions, error scenarios, and edge cases -- **Test Documentation**: Clear test descriptions explaining what is being tested and why -- **Mock Strategy**: Use appropriate mocking for external dependencies -- **Performance Tests**: Include performance benchmarks for critical paths -- **Test Data**: Use factories or fixtures for consistent test data setup - -### 8. 
Additional Development Standards
-
-#### Code Quality:
-- Use static analysis tools (pylint, mypy, black, isort)
-- Implement pre-commit hooks for automated quality checks
-- Regular code reviews focusing on maintainability and performance
-- Document complex algorithms and business logic
-
-#### Version Control:
-- Use feature branches for all development work
-- Squash commits when merging to maintain a clean history
-- Tag releases with semantic versioning
-- Maintain a changelog with user-facing changes
-
-#### Security and Performance:
-- Validate all user inputs and sanitize outputs
-- Use secure coding practices (no hardcoded secrets, proper authentication)
-- Profile performance-critical code sections
-- Monitor and log security-relevant events
-
-#### Dependencies and Environment:
-- Pin dependency versions in requirements files
-- Use virtual environments for all development work
-- Document environment setup and deployment procedures
-- Regular dependency updates with testing
-
-## Enforcement
-These rules are mandatory for all development work. Violations should be caught in pre-commit hooks, code review, or the CI/CD pipeline. Any rule exceptions require explicit documentation and user approval.
-
-## Other user-defined rules
-- Always double-check the validity of the output; never hallucinate or lie about things that you don't know about.
-- Avoid refactoring the whole project, and always ask for permission before doing a major refactor.
-- Look for clues and never be lazy about validating the facts.
-- Be diligent in checking whether a component has already been implemented and can be reused. Avoid re-implementing wheels for parts that have already been built in the project. Think twice about whether the reused component actually fits the logic. If necessary, always use a single source of truth in the code repo (e.g. VERSION) instead of hardcoding it everywhere in the code
-- If the logic is incomplete in the code, add a comment about it. 
Don't just assume the user will dig and find it out.
-- Follow the best practices of whatever language you are writing in. For example, in Python, don't add a lazy import unless it has been carefully considered.
-- When running pytest, make sure you pipe the output either to the command line or to a file, so you don't need to run it repeatedly to grep for a failed test.
diff --git a/.cursor/rules/msgspec-patterns.mdc b/.cursor/rules/msgspec-patterns.mdc
deleted file mode 100644
index fa637ea9..00000000
--- a/.cursor/rules/msgspec-patterns.mdc
+++ /dev/null
@@ -1,534 +0,0 @@
----
-description: python performance critical code ; python msgspec usage guide
-alwaysApply: false
----
-## 2. Use Structs for Structured Data
-
-**Rule:** Always prefer `msgspec.Struct` over `dict`, `dataclasses`, or `attrs` for structured data with a known schema.
-
-**Why:** Structs are 5-60x faster for common operations and are optimized for encoding/decoding.
-
-```python
-# BAD: Using dict or dataclass
-from dataclasses import dataclass
-
-@dataclass
-class UserBad:
-    name: str
-    email: str
-    age: int
-
-# GOOD: Using msgspec.Struct
-import msgspec
-
-class User(msgspec.Struct):
-    name: str
-    email: str
-    age: int
-
-# Usage
-user = User(name="alice", email="alice@example.com", age=30)
-data = msgspec.json.encode(user)
-decoded = msgspec.json.decode(data, type=User)
-```
-
----
-
-## 3. Omit Default Values
-
-**Rule:** Set `omit_defaults=True` on Struct definitions when default values are known on both encoding and decoding ends.
-
-**Why:** Reduces encoded message size and improves both encoding and decoding performance.
-
-```python
-# BAD: Encoding all fields including defaults
-class ConfigBad(msgspec.Struct):
-    host: str = "localhost"
-    port: int = 8080
-    debug: bool = False
-    timeout: int = 30
-
-# GOOD: Omit default values
-class Config(msgspec.Struct, omit_defaults=True):
-    host: str = "localhost"
-    port: int = 8080
-    debug: bool = False
-    timeout: int = 30
-
-# Only non-default values are encoded
-config = Config(host="production.example.com")
-data = msgspec.json.encode(config)
-# Result: b'{"host":"production.example.com"}' instead of the full object
-```
-
----
-
-## 4. Avoid Decoding Unused Fields
-
-**Rule:** Define smaller "view" Struct types that only contain the fields you actually need.
-
-**Why:** msgspec skips decoding fields not defined in your Struct, reducing allocations and CPU time.
-
-```python
-# BAD: Decoding the entire large object when you only need a few fields
-class FullTweet(msgspec.Struct):
-    id: int
-    id_str: str
-    full_text: str
-    user: dict
-    entities: dict
-    extended_entities: dict
-    retweet_count: int
-    favorite_count: int
-    # ... many more fields
-
-# GOOD: Define minimal structs for your use case
-class User(msgspec.Struct):
-    name: str
-
-class TweetView(msgspec.Struct):
-    user: User
-    full_text: str
-    favorite_count: int
-
-# Only these 3 fields are decoded, the rest is skipped
-tweet = msgspec.json.decode(large_json_response, type=TweetView)
-print(tweet.user.name)  # Access only what you need
-```
-
----
-
-## 5. Use encode_into for Buffer Reuse
-
-**Rule:** Benchmark `Encoder.encode_into()` with a pre-allocated `bytearray` against `encode()` in hot loops, and prefer it where it wins.
-
-**Why:** Avoids allocating a new `bytes` object for each encode operation.
-
-```python
-# BAD: New bytes object allocated for each message
-def send_messages_bad(socket, msgs):
-    encoder = msgspec.msgpack.Encoder()
-    for msg in msgs:
-        data = encoder.encode(msg)  # New bytes object each time
-        socket.sendall(data)
-
-# POSSIBLY GOOD — ALWAYS MEASURE: Reuse a buffer
-def send_messages_good(socket, msgs):
-    encoder = msgspec.msgpack.Encoder()
-    buffer = bytearray(1024)  # Pre-allocate once
-
-    for msg in msgs:
-        encoder.encode_into(msg, buffer)  # Reuse the buffer (returns None; buffer resized to fit)
-        socket.sendall(buffer)  # The buffer now holds exactly the encoded message
-```
-
----
-
-## 6. Line-Delimited JSON (NDJSON)
-
-**Rule:** Benchmark `encode_into()` with `buffer.extend()` for line-delimited JSON to avoid copies.
-
-**Why:** Avoids unnecessary copying when appending newlines to JSON messages.
-
-```python
-# BAD: Unnecessary copy with bytes concatenation
-def write_ndjson_bad(file, messages):
-    for msg in messages:
-        json_msg = msgspec.json.encode(msg)
-        full_payload = json_msg + b'\n'  # Creates a copy
-        file.write(full_payload)
-
-# POSSIBLY GOOD — ALWAYS MEASURE: Avoid the copy with encode_into
-def write_ndjson_good(file, messages):
-    encoder = msgspec.json.Encoder()
-    buffer = bytearray(64)  # Pre-allocate with a reasonable size
-
-    for msg in messages:
-        encoder.encode_into(msg, buffer)  # Overwrites the buffer in place
-        buffer.extend(b"\n")  # Append the delimiter without copying the message
-        file.write(buffer)
-```
-
----
-
-## 7. Length-Prefix Framing
-
-**Rule:** Use `encode_into()` with an offset for length-prefix framing.
-
-**Why:** Efficiently prepends the message length without extra copies. 
-
-```python
-import msgspec
-
-def send_length_prefixed(socket, msg):
-    encoder = msgspec.msgpack.Encoder()
-    buffer = bytearray(64)
-
-    # Encode into the buffer, leaving 4 bytes at the front for the length prefix
-    encoder.encode_into(msg, buffer, 4)
-    n = len(buffer) - 4  # encode_into returns None; the buffer is resized to fit
-
-    # Write the message length as a 4-byte big-endian integer at the start
-    buffer[:4] = n.to_bytes(4, "big")
-
-    socket.sendall(buffer)
-
-async def prefixed_send(stream, buffer: bytes) -> None:
-    """Write a length-prefixed buffer to an async stream"""
-    prefix = len(buffer).to_bytes(4, "big")
-    stream.write(prefix)
-    stream.write(buffer)
-    await stream.drain()
-
-async def prefixed_recv(stream) -> bytes:
-    """Read a length-prefixed buffer from an async stream"""
-    prefix = await stream.readexactly(4)
-    n = int.from_bytes(prefix, "big")
-    return await stream.readexactly(n)
-```
-
----
-
-## 8. Use MessagePack Instead of JSON
-
-**Rule:** Consider using `msgspec.msgpack` instead of `msgspec.json` for internal APIs.
-
-**Why:** MessagePack is a more compact binary format and can be more performant than JSON.
-
-```python
-import msgspec
-
-class Event(msgspec.Struct):
-    type: str
-    data: dict
-    timestamp: float
-
-# Use MessagePack for internal service communication
-encoder = msgspec.msgpack.Encoder()
-decoder = msgspec.msgpack.Decoder(Event)
-
-event = Event(type="user_login", data={"user_id": 123}, timestamp=1703424000.0)
-packed = encoder.encode(event)  # More compact than JSON
-decoded = decoder.decode(packed)
-```
-
----
-
-## 9. Use gc=False for Long-Lived Objects
-
-**Rule:** Set `gc=False` on Struct types that will never participate in reference cycles and are long-lived.
-
-**Why:** Reduces garbage collector overhead and pause times by up to 75x.
-
-### What is gc=False?
-
-The `gc=False` option tells Python's garbage collector to never track instances of that Struct type.
-By default, Python's cyclic garbage collector tracks objects that could potentially participate in reference cycles. 
-When you set `gc=False`, you're telling msgspec: "I guarantee these objects will never be part of a reference cycle, so don't bother tracking them." - -### Performance Impact - -Key takeaways: -- `gc=False` reduces GC pause time by 75x compared to standard classes -- `gc=False` saves 16 bytes per instance (no GC header needed) -- Regular msgspec structs are already 6x faster for GC than standard classes - -### When to Use gc=False - -Use `gc=False` when: -- You're allocating a large number of Struct objects at once (e.g., decoding a large JSON response with thousands of items) -- You have long-lived Struct objects in memory (e.g., a large cache of data objects) -- Your Struct only contains scalar/primitive values (ints, floats, strings, bools, bytes) -- You are 100% certain the Struct will NEVER participate in a reference cycle - -DO NOT use `gc=False` when: -- Your Struct contains references to itself or other Structs (potential cycles) -- Your Struct is part of a parent-child relationship where parent references child and child references parent -- You're unsure whether cycles could occur - -ALWAYS MEASURE performance impact. - -### Decision Tree: Should I Use gc=False? - -``` -Should I use gc=False? -| -+-- Does your Struct only contain scalar types (int, float, str, bool, bytes)? -| +-- YES --> SAFE to use gc=False -| -+-- Does your Struct contain lists/dicts but YOU control what goes in them? -| +-- Will you EVER put the struct itself (or a parent) into those containers? -| +-- NO --> Probably safe, but test carefully -| +-- YES/MAYBE --> Do NOT use gc=False -| -+-- Does your Struct have a reference to another Struct of the same type? -| +-- YES --> Do NOT use gc=False (e.g., tree nodes, linked lists) -| -+-- Is your Struct part of a parent-child bidirectional relationship? 
-|   +-- YES --> Do NOT use gc=False
-|
-+-- When in doubt --> Do NOT use gc=False
-```
-
-### Examples
-
-```python
-# SAFE: Simple data objects with only scalar values
-class Point(msgspec.Struct, gc=False):
-    x: float
-    y: float
-    z: float
-
-class LogEntry(msgspec.Struct, gc=False):
-    timestamp: float
-    level: str
-    message: str
-    source: str
-
-class CacheEntry(msgspec.Struct, gc=False):
-    key: str
-    value: str
-    ttl: int
-    created_at: float
-
-# SAFE: Structs containing only tuples of scalars
-class Package(msgspec.Struct, gc=False):
-    name: str
-    version: str
-    depends: tuple[str, ...]  # immutable tuple of strings
-    size: int
-
-# UNSAFE: Self-referential structures - DO NOT use gc=False
-class TreeNode(msgspec.Struct):  # NO gc=False here!
-    value: int
-    children: list["TreeNode"]
-    parent: "TreeNode | None" = None
-```
-
-### Real-World Example: Decoding Large JSON
-
-```python
-import msgspec
-from typing import Union
-
-# When decoding large JSON files (like package repositories),
-# gc=False significantly improves performance
-class Package(msgspec.Struct, gc=False):
-    build: str
-    build_number: int
-    depends: tuple[str, ...]  # Use tuple, not list - immutable
-    md5: str
-    name: str
-    sha256: str
-    subdir: str
-    version: str
-    license: str = ""
-    noarch: Union[str, bool, None] = None
-    size: int = 0
-    timestamp: int = 0
-
-class RepoData(msgspec.Struct, gc=False):
-    repodata_version: int
-    info: dict
-    packages: dict[str, Package]
-    removed: tuple[str, ...]  # Use tuple, not list
-
-# Create a typed decoder for maximum performance
-decoder = msgspec.json.Decoder(RepoData)
-
-def load_repo_data(path: str) -> RepoData:
-    with open(path, "rb") as f:
-        return decoder.decode(f.read())
-```
-
-## 10. Use array_like=True for Maximum Performance
-
-**Rule:** Set `array_like=True` when both ends know the field schema and you need maximum performance.
-
-**Why:** Encodes structs as arrays instead of objects, removing field names from the message. 
-
-```python
-# Standard encoding includes field names
-class PointStandard(msgspec.Struct):
-    x: float
-    y: float
-    z: float
-
-# Encodes as: b'{"x":1.0,"y":2.0,"z":3.0}'
-
-# Array-like encoding removes field names
-class Point(msgspec.Struct, array_like=True):
-    x: float
-    y: float
-    z: float
-
-point = Point(1.0, 2.0, 3.0)
-data = msgspec.json.encode(point)
-# Result: b'[1.0,2.0,3.0]' - smaller and faster
-
-decoded = msgspec.json.decode(data, type=Point)
-# Works correctly: Point(x=1.0, y=2.0, z=3.0)
-```
-
----
-
-## 11. Tagged Unions for Polymorphic Types
-
-**Rule:** Use `tag=True` on Struct types when handling multiple message types in a single union.
-
-**Why:** Enables efficient discrimination between types during decoding.
-
-```python
-import msgspec
-
-# Define request types with tagging
-class GetRequest(msgspec.Struct, tag=True):
-    key: str
-
-class PutRequest(msgspec.Struct, tag=True):
-    key: str
-    value: str
-
-class DeleteRequest(msgspec.Struct, tag=True):
-    key: str
-
-class ListRequest(msgspec.Struct, tag=True):
-    prefix: str = ""
-
-# Union type for all requests
-Request = GetRequest | PutRequest | DeleteRequest | ListRequest
-
-# Single decoder handles all types
-decoder = msgspec.msgpack.Decoder(Request)
-
-# Decoding automatically determines the correct type
-data = msgspec.msgpack.encode(PutRequest(key="foo", value="bar"))
-request = decoder.decode(data)
-
-match request:
-    case GetRequest(key):
-        print(f"Get: {key}")
-    case PutRequest(key, value):
-        print(f"Put: {key}={value}")
-    case DeleteRequest(key):
-        print(f"Delete: {key}")
-    case ListRequest(prefix):
-        print(f"List: {prefix}")
-```
-
----
-
-## 12. Use Struct Configuration Options
-
-**Rule:** Combine Struct options for cleaner, more robust code.
-
-```python
-import msgspec
-
-class Base(
-    msgspec.Struct,
-    omit_defaults=True,          # Don't encode default values
-    forbid_unknown_fields=True,  # Error on unknown fields (good for config files)
-    rename="kebab",              # Use kebab-case in JSON (my_field -> my-field)
-):
-    """Base class with common configuration."""
-    pass
-
-class ServerConfig(Base):
-    host: str = "localhost"
-    port: int = 8080
-    max_connections: int = 100
-    enable_ssl: bool = False
-
-# Decodes kebab-case JSON: {"host": "prod", "max-connections": 500}
-config = msgspec.json.decode(
-    b'{"host":"prod","max-connections":500}',
-    type=ServerConfig
-)
-# Result: ServerConfig(host='prod', port=8080, max_connections=500, enable_ssl=False)
-```
-
----
-
-## 13. TOML Configuration Files
-
-**Rule:** Use msgspec for parsing pyproject.toml and other TOML config files with validation.
-
-```python
-import msgspec
-from typing import Any
-
-class BuildSystem(msgspec.Struct, omit_defaults=True, rename="kebab"):
-    requires: list[str] = []
-    build_backend: str | None = None
-
-class Project(msgspec.Struct, omit_defaults=True, rename="kebab"):
-    name: str | None = None
-    version: str | None = None
-    description: str | None = None
-    requires_python: str | None = None
-    dependencies: list[str] = []
-
-class PyProject(msgspec.Struct, omit_defaults=True, rename="kebab"):
-    build_system: BuildSystem | None = None
-    project: Project | None = None
-    tool: dict[str, dict[str, Any]] = {}
-
-def load_pyproject(path: str) -> PyProject:
-    with open(path, "rb") as f:
-        return msgspec.toml.decode(f.read(), type=PyProject)
-```
-
-## Common Patterns
-
-### API Response Handler
-
-```python
-import msgspec
-from typing import TypeVar, Generic
-
-T = TypeVar('T')
-
-class APIResponse(msgspec.Struct, Generic[T], omit_defaults=True):
-    data: T | None = None
-    error: str | None = None
-    status: int = 200
-
-class User(msgspec.Struct):
-    id: int
-    name: str
-    email: str
-
-# Create a typed decoder for the specific response type
-user_response_decoder = msgspec.json.Decoder(APIResponse[User])
-
-def parse_user_response(raw: bytes) -> APIResponse[User]:
-    return user_response_decoder.decode(raw)
-```
-
-## Struct Configuration Options Summary
-
-| Option | Description | Default |
-|--------|-------------|---------|
-| `omit_defaults` | Omit fields with default values when encoding | `False` |
-| `forbid_unknown_fields` | Error on unknown fields when decoding | `False` |
-| `frozen` | Make instances immutable and hashable | `False` |
-| `order` | Generate ordering methods (`__lt__`, etc.) | `False` |
-| `eq` | Generate equality methods | `True` |
-| `kw_only` | Make all fields keyword-only | `False` |
-| `tag` | Enable tagged union support | `None` |
-| `tag_field` | Field name for the tag | `"type"` |
-| `rename` | Rename fields for encoding/decoding | `None` |
-| `array_like` | Encode/decode as arrays instead of objects | `False` |
-| `gc` | Enable garbage collector tracking | `True` |
-| `weakref` | Enable weak reference support | `False` |
-| `dict` | Add `__dict__` attribute | `False` |
-| `cache_hash` | Cache the hash value | `False` |
-
----
-
-## References
-
-- Official Documentation: https://jcristharif.com/msgspec/
-- Performance Tips: https://jcristharif.com/msgspec/perf-tips.html
-- Structs Documentation: https://jcristharif.com/msgspec/structs.html
-- GC Configuration: https://jcristharif.com/msgspec/structs.html#struct-gc
diff --git a/.cursor/rules/python-antipatterns.mdc b/.cursor/rules/python-antipatterns.mdc
deleted file mode 100644
index ece51ff2..00000000
--- a/.cursor/rules/python-antipatterns.mdc
+++ /dev/null
@@ -1,658 +0,0 @@
----
-globs: **/*.py
-alwaysApply: false
----
-
-Try to avoid these performance antipatterns in the Python code you write:
-
-***
-
-### 1. 
**Match statements (sequence)** -- **Slow** -```python -def sequence_match_logical(): - seq = ["🐸", "🐛", "🦋", "🪲"] - frogs = 0 - for _ in range(100_000): - if isinstance(seq, Sequence) and len(seq) > 0 and seq[0] == "🐸": - frogs += 1 -``` -- **Fast** -```python -def sequence_match_statement(): - seq = ["🐸", "🐛", "🦋", "🪲"] - frogs = 0 - for _ in range(100_000): - match seq: - case ["🐸", *_]: frogs += 1 -``` - -*** - -### 2. **Match statements (literal)** -- **Slow** -```python -def literal_match_logical(): - seq = ["🐊", "🐛", "🐈", "🦋", "🪲", "🐳"] - butterflies, caterpillars, beetles = 0, 0, 0 - for _ in range(100_000): - for x in seq: - if x == "🦋": - butterflies += 1 - elif x == "🐛": - caterpillars += 1 - elif x == "🪲": - beetles += 1 -``` -- **Fast** -```python -def literal_match_statement(): - seq = ["🐊", "🐛", "🐈", "🦋", "🪲", "🐳"] - butterflies, caterpillars, beetles = 0, 0, 0 - for _ in range(100_000): - for x in seq: - match x: - case "🦋": butterflies += 1 - case "🐛": caterpillars += 1 - case "🪲": beetles += 1 -``` - -*** - -### 3. **Match statements (mapping)** -- **Slow** -```python -def mapping_match_logical(): - boats = [ - {"🐓": 1}, {"🦊": 1, "🌽": 1}, - {"🐓": 1, "🌽": 1}, {"🐓": 1, "🦊": 1}, - ] - problems = valid_boats = 0 - for _ in range(100_000): - for boat in boats: - if isinstance(boat, Mapping): - if "🐓" in boat and "🌽" in boat: - problems += 1 - elif "🐓" in boat and "🦊" in boat: - problems += 1 - else: - valid_boats += 1 -``` -- **Fast** -```python -def mapping_match_statement(): - boats = [ - {"🐓": 1}, {"🦊": 1, "🌽": 1}, - {"🐓": 1, "🌽": 1}, {"🐓": 1, "🦊": 1}, - ] - problems = valid_boats = 0 - for _ in range(100_000): - for boat in boats: - match boat: - case {"🐓": _, "🌽": _}: problems += 1 - case {"🐓": _, "🦊": _}: problems += 1 - case _: valid_boats += 1 -``` - -*** - -### 4. 
**Match statements (classes)** -- **Slow** -```python -def bench_class_matching_logical(): - drivers = [ - Driver(name="Max Verstappen", team="Red Bull"), - Driver(name="Sergio Perez", team="Red Bull"), - Driver(name="Charles Leclerc", team="Ferrari"), - Driver(name="Lewis Hamilton", team="Mercedes"), - ] - for _ in range(100_000): - for driver in drivers: - if not isinstance(driver, Driver): - desc = "Invalid request" - elif driver.name == "Max Verstappen": - desc = "Max Verstappen, the current world #1" - elif driver.team == "Ferrari": - desc = f"{driver.name}, a Ferrari driver!! 🐎" - else: - desc = f"{driver.name}, a {driver.team} driver." -``` -- **Fast** -```python -def bench_class_matching_statement(): - drivers = [ - Driver(name="Max Verstappen", team="Red Bull"), - Driver(name="Sergio Perez", team="Red Bull"), - Driver(name="Charles Leclerc", team="Ferrari"), - Driver(name="Lewis Hamilton", team="Mercedes"), - ] - for _ in range(100_000): - for driver in drivers: - match driver: - case Driver(name="Max Verstappen"): desc = "Max Verstappen, the current world #1" - case Driver(name=name, team="Ferrari"): desc = f"{name}, a Ferrari driver!! 🐎" - case Driver(name=name, team=team): desc = f"{name}, a {team} driver." - case _: desc = "Invalid request" -``` - -*** - -### 5. **Inline globals in loop** -- **Slow** -```python -def global_constant_in_loop(): - total = MY_GLOBAL_CONSTANT_A - for i in range(10_000): - total += i * MY_GLOBAL_CONSTANT_C -``` -- **Fast** -```python -def local_constant_in_loop(): - total = 3.14 - for i in range(10_000): - total += i * 1234 -``` - -*** - -### 6. 
**GC with higher threshold** -- **Slow** -```python -def load_with_gc(): - t1, t2, t3 = gc.get_threshold() - gc.set_threshold(1000, 20, 20) - for _ in range(100_000): - _cyclic_references() - gc.set_threshold(t1, t2, t3) -``` -- **Fast** -```python -def load_gc_at_end(): - t1, t2, t3 = gc.get_threshold() - gc.set_threshold(10, 10, 10) - for _ in range(100_000): - _cyclic_references() - gc.set_threshold(t1, t2, t3) -``` - -*** - -### 7. **Importing specific name instead of namespace** -- **Slow** -```python -def dotted_import(): - for _ in range(100_000): - return os.path.exists('/') -``` -- **Fast** -```python -def direct_import(): - for _ in range(100_000): - return exists('/') -``` - -*** - -### 8. **Refactoring Try..except outside a loop** -- **Slow** -```python -def try_in_loop(): - items = {'a': 1} - for _ in range(100_000): - try: - _ = items['a'] - except Exception: - pass -``` -- **Fast** -```python -def try_outside_loop(): - items = {'a': 1} - try: - for _ in range(100_000): - _ = items['a'] - except Exception: - pass -``` - -*** - -### 9. **Class instead of dataclass** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 10. **Namedtuple instead of dataclass** -- **Slow** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 11. 
**class instead of namedtuple** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 12. **namedtuple class instead of namedtuple** -- **Slow** -```python -def attributes_in_namedtuple_type(): - class Pet(typing.NamedTuple): - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 13. **dict instead of class** -- **Slow** -```python -def attributes_in_dict(): - for _ in range(100_000): - dog = {"legs": 4, "noise": "woof"} - str(dog) -``` -- **Fast** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 14. **class with slots** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_class_with_slots(): - class Pet: - legs: int - noise: str - __slots__ = 'legs', 'noise' - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 15. 
**dataclass with slots** -- **Slow** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass_with_slots(): - @dataclass(slots=True) - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 16. **Using a list comprehension to filter another list** -- **Slow** -```python -def filter_list_as_loop(): - result = [] - inputs = range(100_000) - for i in inputs: - if i % 2: - result.append(i) -``` -- **Fast** -```python -def filter_list_as_comprehension(): - inputs = range(100_000) - result = [i for i in inputs if i % 2] -``` - -*** - -### 17. **Join list comprehension instead of generator expression** -- **Slow** -```python -def join_list_comprehension(): - words = ['data', 'type', 'is', 'so', 'long', 'now'] - for x in range(100_000): - ''.join([ele.title() for ele in words]) -``` -- **Fast** -```python -def join_generator_expression(): - words = ['data', 'type', 'is', 'so', 'long', 'now'] - for x in range(100_000): - ''.join(ele.title() for ele in words) -``` - -*** - -### 18. **Using fullmatch instead of anchors** -- **Slow** -```python -def regex_with_anchors(): - SNAKE_CASE_RE = re.compile(r'^([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)$') - tests = ['data_type', 'data_type_', '_dataType', 'dataType', 'data type'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.match(test_str) -``` -- **Fast** -```python -def regex_with_fullmatch(): - SNAKE_CASE_RE = re.compile(r'([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)') - tests = ['data_type', 'data_type_', '_dataType', 'dataType', 'data type'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.fullmatch(test_str) -``` - -*** - -### 19. 
**Using a-zA-Z instead of IGNORECASE** -- **Slow** -```python -def regex_with_capitalrange(): - SNAKE_CASE_RE = re.compile(r'([a-zA-Z]+\d*_[a-zA-Z\d_]*|_+[a-zA-Z\d]+[a-zA-Z\d_]*)') - tests = ['data_type', 'data_type_URL', '_DataType', 'DataTypeURL', 'Data Type URL'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.fullmatch(test_str) -``` -- **Fast** -```python -def regex_with_ignorecase(): - SNAKE_CASE_RE = re.compile(r'([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)', re.IGNORECASE) - tests = ['data_type', 'data_type_URL', '_DataType', 'DataTypeURL', 'Data Type URL'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.fullmatch(test_str) -``` - -*** - -### 20. **Kwargs for known keyword args** -- **Slow** -```python -def keyword_call(): - func_with_kwargs(a=1, b=2, c=3) -``` -- **Fast** -```python -def positional_call(): - func_with_named_args(a=1, b=2, c=3) -``` - -*** - -### 21. **Tiny Functions** -- **Slow** -```python -def use_tiny_func(): - x = 1 - for n in range(100_000): - add(x, n) - add(n, x) -``` -- **Fast** -```python -def inline_tiny_func(): - x = 1 - for n in range(100_000): - x + n - n + x -``` - -*** - -### 22. **Slicing with memoryview instead of bytes** -- **Slow** -```python -def bytes_slice(): - word = b'A' * 1000 - for i in range(1000): - n = word[0:i] -``` -- **Fast** -```python -def memoryview_slice(): - word = memoryview(b'A' * 1000) - for i in range(1000): - n = word[0:i] -``` - -*** - -### 23. **Loop invariant Code Motion** -- **Slow** -```python -def before(): - x = (1, 2, 3, 4) - i = 6 - for j in range(100_000): - len(x) * i + j -``` -- **Fast** -```python -def after(): - x = (1, 2, 3, 4) - i = 6 - x_i = len(x) * i - for j in range(100_000): - x_i + j -``` - -*** - -### 24. 
**Copy slice to Local** -- **Slow** -```python -def slice_as_local(): - x = list(range(100_000)) - y = list(range(100_000)) - for n in range(100_000): - x[n] + y[n] - x[n] + y[n] - x[n] + y[n] - x[n] + y[n] - x[n] + y[n] -``` -- **Fast** -```python -def slice_copy_to_fast(): - x = list(range(100_000)) - y = list(range(100_000)) - for n in range(100_000): - i = x[n] - j = y[n] - i + j - i + j - i + j - i + j - i + j -``` - -*** - -### 25. **Copy name to Local** -- **Slow** -```python -def as_local(): - for _ in range(100_000): - x + y - x + y - x + y - x + y - x + y -``` -- **Fast** -```python -def copy_name_to_fast(): - i = x - j = y - for _ in range(100_000): - i + j - i + j - i + j - i + j - i + j -``` - -*** - -### 26. **Copy dict item to Local** -- **Slow** -```python -def dont_copy_dict_key_to_fast(): - for _ in range(100_000): - d["x"] + d["y"] - d["x"] + d["y"] - d["x"] + d["y"] - d["x"] + d["y"] - d["x"] + d["y"] -``` -- **Fast** -```python -def copy_dict_key_to_fast(): - i = d["x"] - j = d["y"] - for _ in range(100_000): - i + j - i + j - i + j - i + j - i + j -``` - -*** - -### 27. 
**Copy class attr to Local**
-- **Slow**
-```python
-def dont_copy_attr_to_fast():
-    for _ in range(100_000):
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-```
-- **Fast**
-```python
-def copy_attr_to_fast():
-    i = foo.x
-    j = foo.y
-    for _ in range(100_000):
-        i + j
-        i + j
-        i + j
-        i + j
-        i + j
-```
-
-***
-
-Each case above pairs a slow (anti-pattern) snippet with its fast (optimized) counterpart, in the order the benchmarks are defined.
-
-Source: [tonybaloney/anti-patterns](https://github.com/tonybaloney/anti-patterns/blob/master/README.md)
diff --git a/.gitignore b/.gitignore
index 8dc22a68..6681801b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -189,10 +189,7 @@ outputs/
 # Example vLLM virtualenv
 examples/03_BenchmarkComparison/vllm_venv/
 
-# Agent artifacts (local development only)
+# AI tool artifacts (local development only)
 .cursor_artifacts/
-.claude/agent-memory/
-
-# User-specific local rules (local Docker dev); do not commit
-.cursor/rules/local-docker-dev.mdc
-CLAUDE.local.md
+.cursor/
+docs/superpowers/
diff --git a/README.md b/README.md
index 9af4eb85..2a1a178f 100644
--- a/README.md
+++ b/README.md
@@ -1,209 +1,131 @@
-# MLPerf® Inference Endpoint Benchmarking System
+# MLPerf Inference Endpoint Benchmarking System
 
-A high-performance benchmarking tool for LLM endpoints.
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
+[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/)
+[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen.svg)](https://pre-commit.com/)
 
-## Quick Start
+A high-performance benchmarking tool for LLM inference endpoints, targeting 50k+ QPS. Part of [MLCommons](https://mlcommons.org/).
 
-### Installation
+## Quick Start
 
-**Requirements**: Python 3.12+ (Python 3.12 is recommended for optimal performance. GIL-less mode in higher Python versions is not yet supported.)
+**Requirements:** Python 3.12+ (3.12 recommended) ```bash -# Clone the repository -# Note: This repo will be migrated to https://github.com/mlcommons/endpoints git clone https://github.com/mlcommons/endpoints.git cd endpoints - -# Create virtual environment -python3.12 -m venv venv -source venv/bin/activate - -# As a user +python3.12 -m venv venv && source venv/bin/activate pip install . - -# As a developer (with development and test extras) -pip install -e ".[dev,test]" -pre-commit install ``` -### Basic Usage - ```bash -# Show help -inference-endpoint --help - -# Show system information -inference-endpoint -v info - # Test endpoint connectivity inference-endpoint probe \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B -# Run offline benchmark (max throughput - uses all dataset samples) +# Run offline benchmark (max throughput) inference-endpoint benchmark offline \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B \ --dataset tests/datasets/dummy_1k.jsonl -# Run online benchmark (sustained QPS - requires --target-qps, --load-pattern) +# Run online benchmark (sustained QPS) inference-endpoint benchmark online \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B \ --dataset tests/datasets/dummy_1k.jsonl \ --load-pattern poisson \ --target-qps 100 - -# With explicit sample count -inference-endpoint benchmark offline \ - --endpoints http://your-endpoint:8000 \ - --model Qwen/Qwen3-8B \ - --dataset tests/datasets/dummy_1k.jsonl \ - --num-samples 5000 ``` -### Running Locally +### Local Testing ```bash -# Start local echo server -python3 -m inference_endpoint.testing.echo_server --port 8765 & - -# Test with dummy dataset (included in repo) +# Start local echo server and run a benchmark against it +python -m inference_endpoint.testing.echo_server --port 8765 & inference-endpoint benchmark offline \ --endpoints http://localhost:8765 \ - --model Qwen/Qwen3-8B \ + --model test-model \ --dataset tests/datasets/dummy_1k.jsonl - -# Stop 
echo server pkill -f echo_server ``` -See [Local Testing Guide](docs/LOCAL_TESTING.md) for detailed instructions. - -### Running Tests and Examples - -```bash -# Install test dependencies -pip install ".[test]" - -# Run tests (excluding performance and explicit-run tests) -pytest -m "not performance and not run_explicitly" - -# Run examples: follow instructions in examples/*/README.md -``` +See [Local Testing Guide](docs/LOCAL_TESTING.md) for more details. -## 📚 Documentation - -- [AGENTS.md](AGENTS.md) - Architecture, conventions, and AI agent guidelines -- [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) - Command-line interface guide -- [Local Testing Guide](docs/LOCAL_TESTING.md) - Test with echo server -- [Development Guide](docs/DEVELOPMENT.md) - How to contribute and develop -- [Performance Architecture](docs/PERF_ARCHITECTURE.md) - Hot-path design and tuning -- [Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) - CPU affinity and client tuning -- [GitHub Setup Guide](docs/GITHUB_SETUP.md) - GitHub authentication and setup - -### Component Design Specs - -Each top-level component under `src/inference_endpoint/` has a corresponding spec: - -| Component | Spec | -| ----------------- | ---------------------------------------------------------------- | -| Core types | [docs/core/DESIGN.md](docs/core/DESIGN.md) | -| Load generator | [docs/load_generator/DESIGN.md](docs/load_generator/DESIGN.md) | -| Endpoint client | [docs/endpoint_client/DESIGN.md](docs/endpoint_client/DESIGN.md) | -| Metrics | [docs/metrics/DESIGN.md](docs/metrics/DESIGN.md) | -| Config | [docs/config/DESIGN.md](docs/config/DESIGN.md) | -| Async utils | [docs/async_utils/DESIGN.md](docs/async_utils/DESIGN.md) | -| Dataset manager | [docs/dataset_manager/DESIGN.md](docs/dataset_manager/DESIGN.md) | -| Commands (CLI) | [docs/commands/DESIGN.md](docs/commands/DESIGN.md) | -| OpenAI adapter | [docs/openai/DESIGN.md](docs/openai/DESIGN.md) | -| SGLang adapter | 
[docs/sglang/DESIGN.md](docs/sglang/DESIGN.md) | -| Evaluation | [docs/evaluation/DESIGN.md](docs/evaluation/DESIGN.md) | -| Testing utilities | [docs/testing/DESIGN.md](docs/testing/DESIGN.md) | -| Profiling | [docs/profiling/DESIGN.md](docs/profiling/DESIGN.md) | -| Plugins | [docs/plugins/DESIGN.md](docs/plugins/DESIGN.md) | -| Utils | [docs/utils/DESIGN.md](docs/utils/DESIGN.md) | - -## 🎯 Architecture - -The system follows a modular, event-driven architecture: +## Architecture ``` -Dataset Manager ──► Load Generator ──► Endpoint Client ──► External Endpoint - │ - Metrics Collector - (event logging + reporting) +Dataset Manager ──> Load Generator ──> Endpoint Client ──> External Endpoint + | + Metrics Collector (EventRecorder + MetricsReporter) ``` -- **Dataset Manager**: Loads benchmark datasets and applies transform pipelines -- **Load Generator**: Central orchestrator — controls timing (scheduler), issues queries, and emits sample events -- **Endpoint Client**: Multi-process HTTP worker pool communicating over ZMQ IPC -- **Metrics Collector**: Receives sample events from Load Generator; writes to SQLite (EventRecorder), aggregates after the run (MetricsReporter) - -## Accuracy Evaluation - -You can run accuracy evaluation with Pass@1 scoring by specifying accuracy datasets in the benchmark -configuration. 
Currently, Inference Endpoints provides the following pre-defined accuracy benchmarks: +| Component | Purpose | +|-----------|---------| +| **Load Generator** | Central orchestrator: `BenchmarkSession` owns lifecycle, `Scheduler` controls timing | +| **Endpoint Client** | Multi-process HTTP workers communicating via ZMQ IPC | +| **Dataset Manager** | Loads JSONL, HuggingFace, CSV, JSON, Parquet datasets | +| **Metrics** | SQLite-backed event recording, aggregation (QPS, latency, TTFT, TPOT) | +| **Config** | Pydantic-based YAML schema, CLI auto-generated via cyclopts | -- GPQA (default: GPQA Diamond) -- AIME (default: AIME 2025) -- LiveCodeBench (default: lite, release_v6) +### Benchmark Modes -However, LiveCodeBench will not work out-of-the-box and requires some additional setup. See the -[LiveCodeBench](src/inference_endpoint/evaluation/livecodebench/README.md) documentation for -details and explanations. +- **Offline** (`max_throughput`): Burst all queries at once for peak throughput measurement +- **Online** (`poisson`): Fixed QPS with Poisson arrival distribution for latency profiling +- **Concurrency**: Fixed concurrent request count -## 🚧 Pending Features +### Performance Design -The following features are planned for future releases: +The hot path is optimized for minimal overhead: -- [ ] **Submission Ruleset Integration** - Full MLPerf submission workflow support -- [ ] **Documentation Generation and Hosting** - Sphinx-based API documentation with GitHub Pages +- Multi-process workers with ZMQ IPC (not threads) +- `uvloop` + `eager_task_factory` for async performance +- `msgspec` for zero-copy serialization on the data path +- Custom HTTP connection pooling with `httptools` parser +- CPU affinity support for performance tuning -## 🤝 Contributing - -We welcome contributions! 
Please see our [Development Guide](docs/DEVELOPMENT.md) for details on: - -- Setting up your development environment -- Code style and quality standards -- Testing requirements -- Pull request process +## Accuracy Evaluation -## 🙏 Acknowledgements +Run accuracy evaluation with Pass@1 scoring using pre-defined benchmarks: -This project draws inspiration from and learns from the following excellent projects: +- **GPQA** (default: GPQA Diamond) +- **AIME** (default: AIME 2025) +- **LiveCodeBench** (default: lite, release_v6) — requires [additional setup](src/inference_endpoint/dataset_manager/predefined/livecodebench/README.md) -- [MLCommons Inference](https://github.com/mlcommons/inference) - MLPerf Inference benchmark suite -- [AIPerf](https://github.com/ai-dynamo/aiperf) - AI model performance profiling framework -- [SGLang GenAI-Bench](https://github.com/sgl-project/genai-bench) - Token-level performance evaluation tool -- [vLLM Benchmarks](https://github.com/vllm-project/vllm/tree/main/benchmarks) - Performance benchmarking tools for vLLM -- [InferenceMAX](https://github.com/InferenceMAX/InferenceMAX) - LLM inference optimization toolkit +## Documentation -We are grateful to these communities for their contributions to LLM benchmarking and performance analysis. 
+| Guide | Description | +|-------|-------------| +| [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) | Command-line interface guide | +| [CLI Design](docs/CLI_DESIGN.md) | CLI architecture and design decisions | +| [Local Testing](docs/LOCAL_TESTING.md) | Test with the echo server | +| [Client Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) | Endpoint client optimization | +| [Performance Architecture](docs/PERF_ARCHITECTURE.md) | Performance architecture deep dive | +| [Development Guide](docs/DEVELOPMENT.md) | Development setup and workflow | +| [CONTRIBUTING.md](CONTRIBUTING.md) | How to contribute | -## 📄 License +## Contributing -This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE.md) file for -details. +We welcome contributions from the community. See [CONTRIBUTING.md](CONTRIBUTING.md) for: -## 🔗 Links +- Development setup and prerequisites +- Code style (ruff, mypy, conventional commits) +- Testing requirements (>90% coverage, pytest markers) +- Pull request process and review expectations -- [MLCommons](https://mlcommons.org/) - Machine Learning Performance Standards -- [Project Repository](https://github.com/mlcommons/endpoints) -- [MLPerf Inference](https://mlcommons.org/benchmarks/inference/) +Issues are tracked on our [project board](https://github.com/orgs/mlcommons/projects/57). Look for [`good first issue`](https://github.com/mlcommons/endpoints/labels/good%20first%20issue) or [`help wanted`](https://github.com/mlcommons/endpoints/labels/help%20wanted) to get started. -## 👥 Contributors +All contributors must sign the [MLCommons CLA](https://mlcommons.org/membership/membership-overview/). -Credits to core contributors of the project: +## Acknowledgements -- MLCommons Committee -- NVIDIA: Zhihan Jiang, Rashid Kaleem, Viraat Chandra, Alice Cheng -- ... +This project draws inspiration from: -See [ATTRIBUTION](ATTRIBUTION) for detailed attribution information. 
+- [MLCommons Inference](https://github.com/mlcommons/inference) — MLPerf Inference benchmark suite +- [AIPerf](https://github.com/ai-dynamo/aiperf) — AI model performance profiling +- [SGLang GenAI-Bench](https://github.com/sgl-project/genai-bench) — Token-level performance evaluation +- [vLLM Benchmarks](https://github.com/vllm-project/vllm/tree/main/benchmarks) — Performance benchmarking for vLLM -## 📞 Support +## License -- **Issues**: [GitHub Issues](https://github.com/mlcommons/endpoints/issues) -- **Discussions**: [GitHub Discussions](https://github.com/mlcommons/endpoints/discussions) -- **Documentation**: See [docs/](docs/) directory for guides +Apache License 2.0 — see [LICENSE](LICENSE) for details. diff --git a/docs/superpowers/plans/2026-04-07-project-management.md b/docs/superpowers/plans/2026-04-07-project-management.md deleted file mode 100644 index 5dff6134..00000000 --- a/docs/superpowers/plans/2026-04-07-project-management.md +++ /dev/null @@ -1,1092 +0,0 @@ -# Project Management Infrastructure Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Set up labels, project board, issue templates, CONTRIBUTING.md, and migrate all 57 open issues for the mlcommons/endpoints GitHub repository. - -**Architecture:** All GitHub API interactions use `curl` with auth token (the `gh` CLI has TLS certificate issues in this environment). Board configuration uses the GitHub GraphQL API for Projects V2. File changes (templates, CONTRIBUTING.md) are committed locally and pushed as a PR. - -**Tech Stack:** GitHub REST API, GitHub GraphQL API, curl, bash, git - -**IMPORTANT — API access pattern:** The `gh` CLI cannot make API calls due to TLS errors. 
Every API call must use this pattern: -```bash -TOKEN=$(gh auth token 2>&1) -curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" "https://api.github.com/..." -``` -For GraphQL: -```bash -TOKEN=$(gh auth token 2>&1) -curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - -X POST https://api.github.com/graphql \ - -d '{"query":"..."}' -``` - -**IMPORTANT — Label names with colons:** GitHub label names containing spaces and colons must be URL-encoded in REST API paths. For example, `type: bug` becomes `type%3A%20bug` in URLs. When creating labels via POST body (JSON), use the literal name. - ---- - -## File Structure - -No new source code files. Changes are: - -- **Create:** `.github/ISSUE_TEMPLATE/100-bug-report.yml` -- **Create:** `.github/ISSUE_TEMPLATE/200-feature-request.yml` -- **Create:** `.github/ISSUE_TEMPLATE/300-performance.yml` -- **Create:** `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` -- **Create:** `.github/ISSUE_TEMPLATE/config.yml` -- **Modify:** `CONTRIBUTING.md` (full rewrite) - -All other changes are GitHub API operations (labels, board, issues) — no local files. - ---- - -### Task 1: Create New Labels - -Create all 23 new labels on the repository via the REST API. Existing labels that are being kept (`good first issue`, `help wanted`, `dependencies`, `security`, `duplicate`, `invalid`, `wontfix`) are untouched. The `mlcommons` label needs to be created fresh (the old `MLCommons` with capital M will be removed later). - -**Files:** None (API only) - -- [ ] **Step 1: Create all type labels** - -Run this script. 
It creates 8 type labels:
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-for label_json in \
-  '{"name":"type: bug","color":"d73a4a","description":"Something isn'\''t working"}' \
-  '{"name":"type: feature","color":"a2eeef","description":"New feature or capability"}' \
-  '{"name":"type: enhancement","color":"bfd4f2","description":"Improvement to existing functionality"}' \
-  '{"name":"type: performance","color":"3ddd26","description":"Performance regression or improvement"}' \
-  '{"name":"type: documentation","color":"0075ca","description":"Documentation only"}' \
-  '{"name":"type: question","color":"d876e3","description":"Usage question or clarification"}' \
-  '{"name":"type: RFC","color":"76fde7","description":"Request for comments / design proposal"}' \
-  '{"name":"type: chore","color":"ededed","description":"Maintenance, deps, CI, tooling"}'; do
-  echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')"
-  curl -s -X POST \
-    -H "Authorization: token $TOKEN" \
-    -H "Accept: application/vnd.github+json" \
-    "https://api.github.com/repos/$REPO/labels" \
-    -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(" ->", d.get("name", d.get("message", "error")))'
-done
-```
-
-Expected: 8 lines showing each label name created successfully.
- [ ] **Step 2: Create all priority labels**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-for label_json in \
-  '{"name":"priority: ShowStopper","color":"000000","description":"Drop everything — critical blocker, all hands on deck"}' \
-  '{"name":"priority: P0","color":"b60205","description":"Critical — blocks release or users"}' \
-  '{"name":"priority: P1","color":"d93f0b","description":"High — must address this cycle"}' \
-  '{"name":"priority: P2","color":"fbca04","description":"Medium — address within quarter"}' \
-  '{"name":"priority: P3","color":"0e8a16","description":"Low — backlog, nice to have"}'; do
-  echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')"
-  curl -s -X POST \
-    -H "Authorization: token $TOKEN" \
-    -H "Accept: application/vnd.github+json" \
-    "https://api.github.com/repos/$REPO/labels" \
-    -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(" ->", d.get("name", d.get("message", "error")))'
-done
-```
-
-Expected: 5 labels created.
- [ ] **Step 3: Create all area labels**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-for label_json in \
-  '{"name":"area: core-engine","color":"c5def5","description":"Load generator, scheduler, async utils"}' \
-  '{"name":"area: client","color":"c5def5","description":"Endpoint client, HTTP, transport, ZMQ"}' \
-  '{"name":"area: metrics","color":"c5def5","description":"Event recorder, metrics reporter, reporting"}' \
-  '{"name":"area: dataset","color":"c5def5","description":"Dataset manager, formats, predefined datasets"}' \
-  '{"name":"area: config-cli","color":"c5def5","description":"Config schema, CLI commands, YAML"}' \
-  '{"name":"area: evaluation","color":"c5def5","description":"Accuracy evaluation, scoring, extractors"}' \
-  '{"name":"area: adapters","color":"c5def5","description":"OpenAI, SGLang protocol adapters"}'; do
-  echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')"
-  curl -s -X POST \
-    -H "Authorization: token $TOKEN" \
-    -H "Accept: application/vnd.github+json" \
-    "https://api.github.com/repos/$REPO/labels" \
-    -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(" ->", d.get("name", d.get("message", "error")))'
-done
-```
-
-Expected: 7 labels created.
- [ ] **Step 4: Create status labels and mlcommons label**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-for label_json in \
-  '{"name":"status: needs-triage","color":"e99695","description":"New issue, awaiting review"}' \
-  '{"name":"status: needs-info","color":"f9d0c4","description":"Awaiting more details from reporter"}' \
-  '{"name":"status: blocked","color":"b60205","description":"Blocked on external dependency or decision"}' \
-  '{"name":"mlcommons","color":"e0703c","description":"MLCommons ruleset/submission integration"}'; do
-  echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')"
-  curl -s -X POST \
-    -H "Authorization: token $TOKEN" \
-    -H "Accept: application/vnd.github+json" \
-    "https://api.github.com/repos/$REPO/labels" \
-    -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(" ->", d.get("name", d.get("message", "error")))'
-done
-```
-
-Expected: 4 labels created (mlcommons may say "already_exists" if the old `MLCommons` case-insensitively matches — if so, update it in a later step).
-
-- [ ] **Step 5: Verify all new labels exist**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-curl -s -H "Authorization: token $TOKEN" \
-  "https://api.github.com/repos/mlcommons/endpoints/labels?per_page=100" | \
-  python3 -c "
-import sys, json
-labels = json.load(sys.stdin)
-names = sorted([l['name'] for l in labels])
-print(f'Total labels: {len(names)}')
-for n in names:
-    print(f'  {n}')
-"
-```
-
-Expected: All new `type:`, `priority:`, `area:`, `status:` labels present alongside existing labels.
-
----
-
-### Task 2: Relabel All Open Issues
-
-Apply new labels and remove old labels for every open issue, following the spec's mapping exactly. This is done in batches by priority tier.
-
-**Files:** None (API only)
-
-**IMPORTANT:** The GitHub `PUT /repos/{owner}/{repo}/issues/{number}/labels` endpoint **replaces** all labels on an issue.
So each call must include the complete set of new labels for that issue.
-
-- [ ] **Step 1: Relabel ShowStopper issues**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-# #84 - Pareto clarification
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/84/labels" \
-  -d '{"labels":["priority: ShowStopper","area: config-cli","mlcommons"]}' | python3 -c 'import sys,json; print("#84:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #8 - Parity with MLPerf LoadGen
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/8/labels" \
-  -d '{"labels":["priority: ShowStopper","type: performance","area: core-engine"]}' | python3 -c 'import sys,json; print("#8:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #4 - Accuracy evaluation for LLMs
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/4/labels" \
-  -d '{"labels":["priority: ShowStopper","type: feature","area: evaluation"]}' | python3 -c 'import sys,json; print("#4:", [l["name"] for l in json.load(sys.stdin)])'
-```
-
-Expected: Each issue prints its new label set.
- [ ] **Step 2: Relabel P0 issues**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-# #86 - Warmup runs
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/86/labels" \
-  -d '{"labels":["priority: P0","type: feature","area: core-engine"]}' | python3 -c 'import sys,json; print("#86:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #232 - Multi-turn implementation
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/232/labels" \
-  -d '{"labels":["priority: P0","type: feature","area: dataset"]}' | python3 -c 'import sys,json; print("#232:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #183 - Pub/Sub event recorder
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/183/labels" \
-  -d '{"labels":["priority: P0","type: feature","area: metrics"]}' | python3 -c 'import sys,json; print("#183:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #138 - CI stress test upper bound
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/138/labels" \
-  -d '{"labels":["priority: P0","type: chore","area: core-engine"]}' | python3 -c 'import sys,json; print("#138:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #6 - Final report structure
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/6/labels" \
-  -d '{"labels":["priority: P0","type: feature","area: metrics"]}' | python3 -c 'import sys,json; print("#6:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #5 - Submission ruleset + config
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/5/labels" \
-  -d '{"labels":["priority: P0","type: feature","area: config-cli","mlcommons"]}' | python3 -c 'import sys,json; print("#5:", [l["name"] for l in json.load(sys.stdin)])'
-```
-
-Expected: 6 issues relabeled.
-
-- [ ] **Step 3: Relabel P1 issues**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-declare -A P1_LABELS
-P1_LABELS[9]='["priority: P1","type: performance","area: core-engine"]'
-P1_LABELS[255]='["priority: P1","type: feature","area: core-engine"]'
-P1_LABELS[269]='["priority: P1","type: bug","area: client"]'
-P1_LABELS[237]='["priority: P1","type: bug","area: config-cli"]'
-P1_LABELS[219]='["priority: P1","type: bug","area: config-cli"]'
-P1_LABELS[221]='["priority: P1","type: bug","area: config-cli"]'
-P1_LABELS[202]='["priority: P1","type: bug","area: client"]'
-P1_LABELS[199]='["priority: P1","type: bug","area: config-cli"]'
-P1_LABELS[222]='["priority: P1","type: chore","area: core-engine"]'
-P1_LABELS[220]='["priority: P1","type: chore","area: adapters"]'
-P1_LABELS[182]='["priority: P1","type: performance","area: metrics"]'
-P1_LABELS[177]='["priority: P1","type: feature","area: evaluation","area: dataset"]'
-P1_LABELS[176]='["priority: P1","type: feature","area: evaluation","area: dataset"]'
-P1_LABELS[113]='["priority: P1","type: feature"]'
-P1_LABELS[210]='["priority: P1","type: feature"]'
-P1_LABELS[268]='["priority: P1","type: feature"]'
-P1_LABELS[10]='["priority: P1","type: performance","area: core-engine"]'
-P1_LABELS[7]='["priority: P1","type: feature","area: metrics"]'
-
-for issue in "${!P1_LABELS[@]}"; do
-  curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-    "https://api.github.com/repos/$REPO/issues/$issue/labels" \
-    -d "{\"labels\":${P1_LABELS[$issue]}}" | python3 -c "import sys,json; print(f'#$issue: {[l[\"name\"] for l in json.load(sys.stdin)]}')"
-done
-```
-
-Expected: 18 issues relabeled.

- [ ] **Step 4: Relabel P2 issues**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

declare -A P2_LABELS
P2_LABELS[254]='["priority: P2","type: feature","area: client"]'
P2_LABELS[217]='["priority: P2","type: feature","area: core-engine"]'
P2_LABELS[179]='["priority: P2","type: feature","area: evaluation","area: dataset"]'
P2_LABELS[178]='["priority: P2","type: feature","area: evaluation","area: dataset"]'
P2_LABELS[173]='["priority: P2","type: bug","mlcommons"]'
P2_LABELS[224]='["priority: P2","type: feature","area: config-cli"]'
P2_LABELS[208]='["priority: P2","type: performance","area: metrics"]'
P2_LABELS[158]='["priority: P2","type: feature","area: adapters"]'
P2_LABELS[125]='["priority: P2","type: feature","area: core-engine"]'
P2_LABELS[115]='["priority: P2","type: enhancement","area: config-cli"]'
P2_LABELS[79]='["priority: P2","type: feature","mlcommons"]'
P2_LABELS[73]='["priority: P2","type: feature","area: dataset"]'
P2_LABELS[68]='["priority: P2","type: feature","area: config-cli","mlcommons"]'
P2_LABELS[58]='["priority: P2","type: feature","area: config-cli","mlcommons"]'
P2_LABELS[213]='["priority: P2","type: bug","mlcommons"]'
P2_LABELS[133]='["priority: P2","type: bug","area: client"]'
P2_LABELS[174]='["priority: P2","type: enhancement","mlcommons"]'
P2_LABELS[229]='["priority: P2","type: chore"]'
P2_LABELS[228]='["priority: P2","type: documentation"]'
P2_LABELS[227]='["priority: P2","type: feature"]'
P2_LABELS[212]='["priority: P2","type: feature"]'

for issue in "${!P2_LABELS[@]}"; do
  curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/$REPO/issues/$issue/labels" \
    -d "{\"labels\":${P2_LABELS[$issue]}}" | python3 -c "import sys,json; print('#$issue:', [l['name'] for l in json.load(sys.stdin)])"
done
```

Expected: 21 issues relabeled.

- [ ] **Step 5: Relabel P3 and other issues**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

# P3 issues
curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/99/labels" \
  -d '{"labels":["priority: P3","type: bug","good first issue"]}' | python3 -c 'import sys,json; print("#99:", [l["name"] for l in json.load(sys.stdin)])'

curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/50/labels" \
  -d '{"labels":["priority: P3","type: feature"]}' | python3 -c 'import sys,json; print("#50:", [l["name"] for l in json.load(sys.stdin)])'

curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/204/labels" \
  -d '{"labels":["priority: P3","type: documentation"]}' | python3 -c 'import sys,json; print("#204:", [l["name"] for l in json.load(sys.stdin)])'

curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/190/labels" \
  -d '{"labels":["priority: P3","type: chore"]}' | python3 -c 'import sys,json; print("#190:", [l["name"] for l in json.load(sys.stdin)])'

curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/181/labels" \
  -d '{"labels":["priority: P3","type: feature"]}' | python3 -c 'import sys,json; print("#181:", [l["name"] for l in json.load(sys.stdin)])'

# Other (no priority)
curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/223/labels" \
  -d '{"labels":["type: RFC"]}' | python3 -c 'import sys,json; print("#223:", [l["name"] for l in json.load(sys.stdin)])'

curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/267/labels" \
  -d '{"labels":["type: chore","dependencies","security"]}' | python3 -c 'import sys,json; print("#267:", [l["name"] for l in json.load(sys.stdin)])'
```

Expected: 7 issues relabeled.

- [ ] **Step 6: Verify relabeling — spot check 5 issues**

```bash
TOKEN=$(gh auth token)
for issue in 84 232 269 208 99; do
  curl -s -H "Authorization: token $TOKEN" \
    "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \
    python3 -c 'import sys,json; d=json.load(sys.stdin); print("#{} {}: {}".format(d["number"], d["title"], [l["name"] for l in d["labels"]]))'
done
```

Expected: Each issue shows only its new prefixed labels.

---

### Task 3: Close Duplicate Issues

For each duplicate, first read its body to preserve unique context, then comment on the primary issue with that context, then close the duplicate with an explanation.

**Files:** None (API only)

- [ ] **Step 1: Close #205 as duplicate of #255 (async benchmark)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

# Get #205 body for context preservation
BODY_205=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/205" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")')

# Comment on primary #255 with context from #205.
# Build the JSON payload in Python, passing the body via the environment so
# quotes and backslashes in the issue body cannot break the payload.
PAYLOAD=$(BODY="$BODY_205" python3 -c 'import json,os; print(json.dumps({"body": "Context preserved from duplicate #205 (fully async benchmark):\n\n" + os.environ["BODY"]}))')
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/255/comments" \
  -d "$PAYLOAD" | python3 -c 'import sys,json; print("Commented on #255:", json.load(sys.stdin).get("id","error"))'

# Comment on #205 explaining closure
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/205/comments" \
  -d '{"body":"Closing as duplicate of #255 (Make Loadgen Async). Both issues target the same goal of making the benchmark fully async. Unique context from this issue has been copied to #255."}' | python3 -c 'import sys,json; print("Commented on #205:", json.load(sys.stdin).get("id","error"))'

# Close #205
curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/205" \
  -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print("#205 state:", json.load(sys.stdin).get("state","error"))'
```

Expected: #205 closed, context preserved on #255.

- [ ] **Step 2: Close #170 as duplicate of #86 (warmup)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

BODY_170=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/170" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")')

PAYLOAD=$(BODY="$BODY_170" python3 -c 'import json,os; print(json.dumps({"body": "Context preserved from duplicate #170 (warmup with random dataset):\n\n" + os.environ["BODY"]}))')
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/86/comments" \
  -d "$PAYLOAD" | python3 -c 'import sys,json; print("Commented on #86:", json.load(sys.stdin).get("id","error"))'

curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/170/comments" \
  -d '{"body":"Closing as duplicate of #86 (Warmup runs). This issue describes a specific warmup implementation approach (random dataset) which is a subset of #86. Unique context has been copied to #86."}' | python3 -c 'import sys,json; print("Commented on #170:", json.load(sys.stdin).get("id","error"))'

curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/170" \
  -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print("#170 state:", json.load(sys.stdin).get("state","error"))'
```

- [ ] **Step 3: Close #226 as duplicate of #232 (multi-turn)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

BODY_226=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/226" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")')

PAYLOAD=$(BODY="$BODY_226" python3 -c 'import json,os; print(json.dumps({"body": "Context preserved from duplicate #226 (Initial multi-turn enabling):\n\n" + os.environ["BODY"]}))')
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/232/comments" \
  -d "$PAYLOAD" | python3 -c 'import sys,json; print("Commented on #232:", json.load(sys.stdin).get("id","error"))'

curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/226/comments" \
  -d '{"body":"Closing as duplicate of #232 (multi-turn implementation). Both track the same multi-turn feature. Unique context has been copied to #232."}' | python3 -c 'import sys,json; print("Commented on #226:", json.load(sys.stdin).get("id","error"))'

curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/226" \
  -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print("#226 state:", json.load(sys.stdin).get("state","error"))'
```

- [ ] **Step 4: Close #29 as superseded by #79 (submission checker)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

BODY_29=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/29" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")')

PAYLOAD=$(BODY="$BODY_29" python3 -c 'import json,os; print(json.dumps({"body": "Context preserved from superseded #29 (submission checker for 6.0):\n\n" + os.environ["BODY"]}))')
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/79/comments" \
  -d "$PAYLOAD" | python3 -c 'import sys,json; print("Commented on #79:", json.load(sys.stdin).get("id","error"))'

curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/29/comments" \
  -d '{"body":"Closing as superseded by #79 (submission checker compatibility mode). #29 was version-specific (6.0) while #79 covers the general compatibility feature. Context has been preserved on #79."}' | python3 -c 'import sys,json; print("Commented on #29:", json.load(sys.stdin).get("id","error"))'

curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/29" \
  -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print("#29 state:", json.load(sys.stdin).get("state","error"))'
```

- [ ] **Step 5: Close #207 as duplicate of #208 (report generation)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

BODY_207=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/207" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")')

PAYLOAD=$(BODY="$BODY_207" python3 -c 'import json,os; print(json.dumps({"body": "Context preserved from duplicate #207 (speedup tokenizer report generation):\n\n" + os.environ["BODY"]}))')
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/208/comments" \
  -d "$PAYLOAD" | python3 -c 'import sys,json; print("Commented on #208:", json.load(sys.stdin).get("id","error"))'

curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/207/comments" \
  -d '{"body":"Closing as duplicate of #208 (optimize report generation). #207 describes a specific approach (parallel tokenization) to #208'\''s broader goal. Context has been preserved on #208."}' | python3 -c 'import sys,json; print("Commented on #207:", json.load(sys.stdin).get("id","error"))'

curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/207" \
  -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print("#207 state:", json.load(sys.stdin).get("state","error"))'
```

- [ ] **Step 6: Close #83 as superseded by #223 (roadmap)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/83/comments" \
  -d '{"body":"Closing as superseded by #223 (Phase 2 Roadmap). The Q1 roadmap is complete and Phase 2 planning has taken over."}' | python3 -c 'import sys,json; print("Commented on #83:", json.load(sys.stdin).get("id","error"))'

curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/83" \
  -d '{"state":"closed","state_reason":"completed"}' | python3 -c 'import sys,json; print("#83 state:", json.load(sys.stdin).get("state","error"))'
```

---

### Task 4: Delete Legacy Labels

Remove old labels that have been replaced. Only delete after all issues have been relabeled (Task 2 complete).

**Files:** None (API only)

- [ ] **Step 1: Delete all legacy labels**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

# URL-encode label names: spaces become %20; colons are fine in DELETE paths
for label in "bug" "feature" "enhancement" "documentation" "performance" "question" \
    "P0" "P1" "P2" "ShowStopper" "testing" "accuracy" "dataset" "Roadmap" "blocked" \
    "rules" "MLCommons"; do
  encoded=$(python3 -c 'import sys,urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$label")
  echo -n "Deleting '$label'... "
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X DELETE \
    -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/$REPO/labels/$encoded")
  if [ "$STATUS" = "204" ]; then echo "deleted"; elif [ "$STATUS" = "404" ]; then echo "not found (already gone)"; else echo "status $STATUS"; fi
done
```

Expected: Each label prints "deleted" or "not found". No errors.

- [ ] **Step 2: Verify final label set**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" \
  "https://api.github.com/repos/mlcommons/endpoints/labels?per_page=100" | \
  python3 -c "
import sys, json
labels = json.load(sys.stdin)
names = sorted([l['name'] for l in labels])
print('Total labels:', len(names))
for n in names:
    print(' ', n)
"
```

Expected: Only new prefixed labels plus kept labels (`good first issue`, `help wanted`, `mlcommons`, `dependencies`, `security`, `duplicate`, `invalid`, `wontfix`). No old labels remain.

---

### Task 5: Configure Project Board #57

Set up the board with status field options, custom fields, and 4 views using the GraphQL API.

**Files:** None (API only)

**NOTE:** The board already exists with ID `PVT_kwDOBAnwDc4BTQvY`. We need to configure its fields and views.

- [ ] **Step 1: Get the board's field IDs**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { fields(first: 20) { nodes { ... on ProjectV2Field { id name } ... on ProjectV2SingleSelectField { id name options { id name } } ... on ProjectV2IterationField { id name } } } } } }"}' | python3 -m json.tool
```

Expected: JSON listing all existing fields with their IDs. Look for the "Status" field and its current options. Record the Status field ID for the next steps.
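Rather than eyeballing the raw JSON, a small filter can pull the IDs out of that response. The helper below is a sketch, not part of the spec — `find_single_select` and the sample IDs are invented for illustration — but it works on the shape the Step 1 query returns:

```python
def find_single_select(payload, field_name):
    """Return (field_id, {option_name: option_id}) for a single-select field
    in a response shaped like the Step 1 GraphQL query output."""
    for node in payload["data"]["node"]["fields"]["nodes"]:
        # Inline fragments that matched nothing come back as empty objects, so use .get()
        if node.get("name") == field_name:
            return node["id"], {o["name"]: o["id"] for o in node.get("options", [])}
    raise KeyError(f"no single-select field named {field_name!r}")


# Demo against a made-up response fragment (real single-select field IDs look like PVTSSF_...):
sample = {"data": {"node": {"fields": {"nodes": [
    {},  # a field type that matched no inline fragment
    {"id": "PVTF_title", "name": "Title"},
    {"id": "PVTSSF_status", "name": "Status",
     "options": [{"id": "opt_inbox", "name": "Inbox"},
                 {"id": "opt_triage", "name": "Triage"}]},
]}}}}

field_id, options = find_single_select(sample, "Status")
print(field_id)            # PVTSSF_status
print(options["Triage"])   # opt_triage
```

Running the real Step 1 output through `json.load` and this function yields the Status field ID and the Triage option ID needed later in Task 6.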

- [ ] **Step 2: Update the Status field with 6 options**

Using the Status field ID from Step 1, update its options with the `updateProjectV2Field` mutation. The mutation replaces the existing option list wholesale with the 6 new options.

**Note:** You must adapt the field ID from Step 1's output. Replace `STATUS_FIELD_ID` below with the actual ID.

```bash
TOKEN=$(gh auth token)

# Paste the Status field ID from Step 1
STATUS_FIELD_ID=""

# Replace the Status field's options via the updateProjectV2Field mutation
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{
    "query": "mutation { updateProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", fieldId: \"'"$STATUS_FIELD_ID"'\", singleSelectOptions: [{name: \"Inbox\", color: GRAY}, {name: \"Triage\", color: YELLOW}, {name: \"Ready\", color: BLUE}, {name: \"In Progress\", color: ORANGE}, {name: \"In Review\", color: PURPLE}, {name: \"Done\", color: GREEN}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id options { id name } } } }"
  }' | python3 -m json.tool
```

Expected: Returns the updated Status field with 6 options.

- [ ] **Step 3: Create Priority custom field**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{
    "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Priority\", singleSelectOptions: [{name: \"ShowStopper\", color: RED}, {name: \"P0\", color: RED}, {name: \"P1\", color: ORANGE}, {name: \"P2\", color: YELLOW}, {name: \"P3\", color: GREEN}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }"
  }' | python3 -m json.tool
```

Expected: Priority field created with 5 options.

- [ ] **Step 4: Create Area custom field**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{
    "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Area\", singleSelectOptions: [{name: \"core-engine\", color: BLUE}, {name: \"client\", color: BLUE}, {name: \"metrics\", color: BLUE}, {name: \"dataset\", color: BLUE}, {name: \"config-cli\", color: BLUE}, {name: \"evaluation\", color: BLUE}, {name: \"adapters\", color: BLUE}, {name: \"mlcommons\", color: PURPLE}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }"
  }' | python3 -m json.tool
```

Expected: Area field created with 8 options.

- [ ] **Step 5: Create Target Release custom field**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{
    "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Target Release\", singleSelectOptions: [{name: \"v0.5.0\", color: GRAY}, {name: \"v1.0.0\", color: GRAY}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }"
  }' | python3 -m json.tool
```

Expected: Target Release field created.

- [ ] **Step 6: Verify all fields exist**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { fields(first: 20) { nodes { ... on ProjectV2Field { id name } ... on ProjectV2SingleSelectField { id name options { id name } } } } } } }"}' | python3 -m json.tool
```

Expected: Status (6 options), Priority (5 options), Area (8 options), Target Release (2 options) all present.

---

### Task 6: Add Issues to Board #57

Add all ShowStopper through P2 issues (48 after closing duplicates) to the project board and set their status to Triage.

**Files:** None (API only)

- [ ] **Step 1: Get issue node IDs for all Q2 issues**

We need the GraphQL node IDs for each issue to add them to the project. Batch-fetch them:

```bash
TOKEN=$(gh auth token)

# All issue numbers to add to the board (ShowStopper + P0 + P1 + P2)
ISSUES="84 8 4 86 232 183 138 6 5 9 255 269 237 219 221 202 199 222 220 182 177 176 113 210 268 10 7 254 217 179 178 173 224 208 158 125 115 79 73 68 58 213 133 174 229 228 227 212"

for issue in $ISSUES; do
  NODE_ID=$(curl -s -H "Authorization: token $TOKEN" \
    "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \
    python3 -c 'import sys,json; print(json.load(sys.stdin)["node_id"])')
  echo "$issue $NODE_ID"
done
```

Expected: A list of issue numbers and their node IDs. Save this output — you'll need it for Step 2.

- [ ] **Step 2: Add each issue to the project**

For each issue, use the `addProjectV2ItemById` mutation. Process in batches to avoid rate limiting:

```bash
TOKEN=$(gh auth token)
PROJECT_ID="PVT_kwDOBAnwDc4BTQvY"

# Use the node IDs from Step 1. Example for one issue:
# curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
#   -d '{"query":"mutation { addProjectV2ItemById(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", contentId: \"NODE_ID_HERE\"}) { item { id } } }"}'

# Batch all issues:
ISSUES="84 8 4 86 232 183 138 6 5 9 255 269 237 219 221 202 199 222 220 182 177 176 113 210 268 10 7 254 217 179 178 173 224 208 158 125 115 79 73 68 58 213 133 174 229 228 227 212"

for issue in $ISSUES; do
  NODE_ID=$(curl -s -H "Authorization: token $TOKEN" \
    "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \
    python3 -c 'import sys,json; print(json.load(sys.stdin)["node_id"])')

  ITEM_ID=$(curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
    -d "{\"query\":\"mutation { addProjectV2ItemById(input: {projectId: \\\"$PROJECT_ID\\\", contentId: \\\"$NODE_ID\\\"}) { item { id } } }\"}" | \
    python3 -c 'import sys,json; print(json.load(sys.stdin)["data"]["addProjectV2ItemById"]["item"]["id"])')

  echo "#$issue added: $ITEM_ID"
  sleep 0.5  # Rate limit courtesy
done
```

Expected: Each issue prints its project item ID. All 48 issues added.

- [ ] **Step 3: Set all items to Triage status**

After adding items, set their Status field to "Triage". You need the Status field ID and the "Triage" option ID from Task 5 Step 1/2.

```bash
TOKEN=$(gh auth token)
PROJECT_ID="PVT_kwDOBAnwDc4BTQvY"
STATUS_FIELD_ID=""
TRIAGE_OPTION_ID=""

# For each item added in Step 2, set status to Triage.
# Paste the item IDs printed in Step 2 into ITEM_IDS:
ITEM_IDS=""

for ITEM_ID in $ITEM_IDS; do
  curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
    -d "{\"query\":\"mutation { updateProjectV2ItemFieldValue(input: {projectId: \\\"$PROJECT_ID\\\", itemId: \\\"$ITEM_ID\\\", fieldId: \\\"$STATUS_FIELD_ID\\\", value: {singleSelectOptionId: \\\"$TRIAGE_OPTION_ID\\\"}}) { projectV2Item { id } } }\"}" | \
    python3 -c 'import sys,json; d=json.load(sys.stdin); print("Set triage:", d)'
  sleep 0.3
done
```

Expected: All items set to Triage status.

- [ ] **Step 4: Verify board population**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { items(first: 100) { totalCount nodes { content { ... on Issue { number title } } } } } } }"}' | \
  python3 -c "
import sys, json
data = json.load(sys.stdin)
items = data['data']['node']['items']
print('Total items on board:', items['totalCount'])
for item in items['nodes']:
    c = item['content']
    print('  #{} {}'.format(c['number'], c['title']))
"
```

Expected: 48 issues listed on the board.

---

### Task 7: Create Board Views

Create the 4 views on the project board. The default view already exists (rename it to Kanban); create 3 additional views.

**Files:** None (API only)

- [ ] **Step 1: List existing views**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { views(first: 10) { nodes { id name number layout } } } } }"}' | python3 -m json.tool
```

Expected: At least one default view. Record its ID.

- [ ] **Step 2: Update default view to Kanban board layout**

```bash
TOKEN=$(gh auth token)
# Paste the default view ID from Step 1
DEFAULT_VIEW_ID=""

curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d "{\"query\":\"mutation { updateProjectV2View(input: {projectId: \\\"PVT_kwDOBAnwDc4BTQvY\\\", viewId: \\\"$DEFAULT_VIEW_ID\\\", name: \\\"Kanban\\\", layout: BOARD_LAYOUT}) { projectV2View { id name layout } } }\"}" | python3 -m json.tool
```

Expected: Default view renamed to "Kanban" with BOARD_LAYOUT.

- [ ] **Step 3: Create Priority Table view**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"Priority Table\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool
```

Expected: New "Priority Table" view created with TABLE_LAYOUT.

- [ ] **Step 4: Create By Assignee view**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"By Assignee\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool
```

Expected: New "By Assignee" view created.

- [ ] **Step 5: Create Stale Issues view**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"Stale Issues\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool
```

Expected: New "Stale Issues" view created.

- [ ] **Step 6: Verify all 4 views exist**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { views(first: 10) { nodes { id name number layout } } } } }"}' | python3 -m json.tool
```

Expected: 4 views — Kanban (BOARD_LAYOUT), Priority Table (TABLE_LAYOUT), By Assignee (TABLE_LAYOUT), Stale Issues (TABLE_LAYOUT).

**NOTE:** View-level sorting, grouping, and filtering must be configured manually in the GitHub web UI after the views are created. The GraphQL API supports creating views and setting layout, but fine-grained sort/group/filter configuration is not fully exposed via the API. After this task, open https://github.com/orgs/mlcommons/projects/57 and configure:

- Kanban: Group by Priority
- Priority Table: Sort by Priority field ascending
- By Assignee: Group by Assignee
- Stale Issues: Sort by Updated ascending, filter to items not updated in 30+ days

---

### Task 8: Create Issue Templates

Write the 4 YAML issue form templates and the config file to the local repo.

**Files:**
- Create: `.github/ISSUE_TEMPLATE/100-bug-report.yml`
- Create: `.github/ISSUE_TEMPLATE/200-feature-request.yml`
- Create: `.github/ISSUE_TEMPLATE/300-performance.yml`
- Create: `.github/ISSUE_TEMPLATE/400-dataset-integration.yml`
- Create: `.github/ISSUE_TEMPLATE/config.yml`

- [ ] **Step 1: Create the ISSUE_TEMPLATE directory**

```bash
mkdir -p .github/ISSUE_TEMPLATE
```

- [ ] **Step 2: Write 100-bug-report.yml**

Write to `.github/ISSUE_TEMPLATE/100-bug-report.yml` with the exact content from the design spec Section 3, `100-bug-report.yml`.

- [ ] **Step 3: Write 200-feature-request.yml**

Write to `.github/ISSUE_TEMPLATE/200-feature-request.yml` with the exact content from the design spec Section 3, `200-feature-request.yml`.
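The template bodies are copied verbatim from Section 3 of the spec and are not reproduced here. For orientation only, a GitHub issue form is a YAML file of this general shape — the fields below are an illustrative sketch, not the actual spec content:

```yaml
name: Bug Report
description: Something isn't working
labels: ["type: bug"]
body:
  - type: textarea
    id: description
    attributes:
      label: What happened?
      description: Include the command you ran and the full error output.
    validations:
      required: true
  - type: input
    id: version
    attributes:
      label: Version / commit
    validations:
      required: false
```

The numeric filename prefixes (`100-`, `200-`, ...) exist to control ordering: GitHub lists templates on the New Issue chooser page alphabetically by filename.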
- -- [ ] **Step 4: Write 300-performance.yml** - -Write to `.github/ISSUE_TEMPLATE/300-performance.yml` with the exact content from the design spec Section 3, `300-performance.yml`. - -- [ ] **Step 5: Write 400-dataset-integration.yml** - -Write to `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` with the exact content from the design spec Section 3, `400-dataset-integration.yml`. - -- [ ] **Step 6: Write config.yml** - -Write to `.github/ISSUE_TEMPLATE/config.yml`: - -```yaml -blank_issues_enabled: true -contact_links: - - name: Questions & Discussion - url: https://github.com/mlcommons/endpoints/discussions - about: Ask questions and discuss ideas before filing an issue -``` - -- [ ] **Step 7: Verify all template files exist** - -```bash -ls -la .github/ISSUE_TEMPLATE/ -``` - -Expected: 5 files — `100-bug-report.yml`, `200-feature-request.yml`, `300-performance.yml`, `400-dataset-integration.yml`, `config.yml`. - -- [ ] **Step 8: Commit issue templates** - -```bash -git add .github/ISSUE_TEMPLATE/ -git commit -m "chore: add issue templates (bug, feature, performance, dataset) - -Co-Authored-By: Claude Opus 4.6 (1M context) " -``` - ---- - -### Task 9: Update CONTRIBUTING.md - -Replace the existing 10-line CONTRIBUTING.md with the expanded ~250-line version. - -**Files:** -- Modify: `CONTRIBUTING.md` (full rewrite) - -- [ ] **Step 1: Write the new CONTRIBUTING.md** - -Write the full CONTRIBUTING.md content as designed in Section 4 of the spec. The full text was presented during brainstorming and approved. It includes these sections: - -1. Welcome and Table of Contents -2. Ways to Contribute (links to all 4 issue templates) -3. Development Setup (prerequisites, fork/clone, venv, pip install, pre-commit, echo server) -4. Code Style and Conventions (ruff, mypy, line length 88, conventional commits, serialization, performance-sensitive code) -5. Testing (pytest commands, markers, async mode, coverage, fixtures) -6. 
Submitting Changes (branch naming, PR process, review criteria) -7. Issue Guidelines (templates, lifecycle, priority levels table) -8. MLCommons CLA (existing CLA requirements preserved) -9. Questions section - -- [ ] **Step 2: Commit CONTRIBUTING.md** - -```bash -git add CONTRIBUTING.md -git commit -m "docs: expand CONTRIBUTING.md with development guide, testing, and issue guidelines - -Co-Authored-By: Claude Opus 4.6 (1M context) " -``` - ---- - -### Task 10: Link Open PRs to Issues - -Add comments on open PRs that implement issues different from their own number, creating explicit linkage. - -**Files:** None (API only) - -- [ ] **Step 1: Link PRs to their corresponding issues** - -Only PRs where the PR number differs from the issue it implements need explicit linking: - -```bash -TOKEN=$(gh auth token 2>&1) -REPO="mlcommons/endpoints" - -# PR #226 implements issue #232 (multi-turn) -curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - "https://api.github.com/repos/$REPO/issues/226/comments" \ - -d '{"body":"Relates to #232 (multi-turn implementation). This PR provides the initial multi-turn enabling work tracked by #232."}' | python3 -c 'import sys,json; print(f"PR #226 linked to #232: {json.load(sys.stdin).get(\"id\",\"error\")}")' - -# PR #207 implements issue #208 (report generation optimization) -curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - "https://api.github.com/repos/$REPO/issues/207/comments" \ - -d '{"body":"Relates to #208 (optimize report generation). 
This PR implements parallel tokenization as one approach to #208."}' | python3 -c 'import sys,json; print(f"PR #207 linked to #208: {json.load(sys.stdin).get(\"id\",\"error\")}")' - -# PR #170 implements issue #86 (warmup runs) -curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - "https://api.github.com/repos/$REPO/issues/170/comments" \ - -d '{"body":"Relates to #86 (Warmup runs). This PR implements warmup with random dataset as part of #86."}' | python3 -c 'import sys,json; print(f"PR #170 linked to #86: {json.load(sys.stdin).get(\"id\",\"error\")}")' - -# PR #205 relates to issue #255 (Make Loadgen Async) -curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - "https://api.github.com/repos/$REPO/issues/205/comments" \ - -d '{"body":"Relates to #255 (Make Loadgen Async). Both this PR and #255 target the same async benchmark goal."}' | python3 -c 'import sys,json; print(f"PR #205 linked to #255: {json.load(sys.stdin).get(\"id\",\"error\")}")' -``` - -Expected: 4 comments posted linking PRs to their primary issues. - ---- - -### Task 11: Push and Create PR - -Push the local commits (issue templates + CONTRIBUTING.md) as a PR to the repository. - -**Files:** None (git operations) - -- [ ] **Step 1: Create a feature branch** - -```bash -git checkout -b chore/project-management-setup -``` - -- [ ] **Step 2: Cherry-pick the commits onto the branch** - -If you committed on main, reset main and cherry-pick onto the new branch. Otherwise if you're already on the branch, skip this. 
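If the two commits (templates + CONTRIBUTING.md) were made on `main` before the branch existed, Steps 1–2 can be collapsed into one small helper. This is a sketch, not part of the plan: `recover_onto_branch` is a hypothetical name, and it assumes `origin/main` does not yet contain the local commits.

```shell
# Sketch of the Step 2 recovery path: keep the commits on the new branch,
# then rewind local main to match the remote.
recover_onto_branch() {
  branch="${1:-chore/project-management-setup}"
  # Create the branch at the current tip — it keeps the local commits.
  git checkout -b "$branch"
  # Move local main back to the remote tip without touching the worktree.
  git branch --force main origin/main
}
```

Afterwards `main` again matches `origin/main`, the feature branch carries the commits, and Step 3's push proceeds unchanged.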
- -- [ ] **Step 3: Push to remote** - -```bash -git push -u origin chore/project-management-setup -``` - -- [ ] **Step 4: Create the PR** - -```bash -TOKEN=$(gh auth token 2>&1) -curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - "https://api.github.com/repos/mlcommons/endpoints/pulls" \ - -d '{ - "title": "chore: add issue templates, expand CONTRIBUTING.md, and project management setup", - "body": "## Summary\n\n- Add 4 YAML issue form templates (bug report, feature request, performance issue, dataset integration)\n- Expand CONTRIBUTING.md with development setup, code style, testing, PR process, and issue guidelines\n- Part of the project management infrastructure setup (labels, board, and issue migration done via API)\n\n## Related\n\nDesign spec: docs/superpowers/specs/2026-04-07-project-management-design.md\n\n## Test plan\n\n- [ ] Verify issue templates render correctly on GitHub (New Issue page)\n- [ ] Verify CONTRIBUTING.md renders correctly\n- [ ] Verify all links in CONTRIBUTING.md work\n\n🤖 Generated with [Claude Code](https://claude.com/claude-code)", - "head": "chore/project-management-setup", - "base": "main" - }' | python3 -c 'import sys,json; d=json.load(sys.stdin); print(f"PR created: {d.get(\"html_url\", d.get(\"message\", \"error\"))}")' -``` - -Expected: PR URL printed. - ---- - -### Task 12: Enable Board Automations - -Configure the built-in automations on project board #57 via the GitHub web UI. - -**Files:** None (manual UI configuration) - -**NOTE:** GitHub Projects V2 built-in automations (auto-add, auto-archive, auto-set status on close) are not configurable via the GraphQL API. They must be enabled manually. 
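Although the workflows must be enabled by hand, their state can still be read back afterwards (e.g. via `gh api graphql` on the project's `workflows` connection, assuming the current ProjectV2 GraphQL schema exposes it) and compared against the expected set. A minimal sketch of that comparison — `missing_workflows` is a hypothetical helper, and the workflow names mirror the steps of this task:

```python
# Sketch: check manually enabled board automations against the expected
# set. `fetched` mirrors the `projectV2.workflows.nodes` shape returned
# by GitHub's GraphQL API (assumed schema).
EXPECTED_ENABLED = {
    "Auto-add to project",
    "Item closed",
    "Pull request merged",
    "Auto-archive items",
}


def missing_workflows(fetched):
    """Return expected workflow names that are absent or still disabled."""
    enabled = {w["name"] for w in fetched if w.get("enabled")}
    return sorted(EXPECTED_ENABLED - enabled)
```

An empty return means all four automations from this task are live; anything else lists what still needs to be flipped on in the UI.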
- -- [ ] **Step 1: Open project settings** - -Navigate to: https://github.com/orgs/mlcommons/projects/57/settings - -- [ ] **Step 2: Enable "Auto-add" workflow** - -Under Workflows → Auto-add to project: -- Enable the workflow -- Filter: `is:issue is:open repo:mlcommons/endpoints` -- This ensures all new issues are automatically added to the board with Inbox status - -- [ ] **Step 3: Enable "Item closed" workflow** - -Under Workflows → Item closed: -- Enable the workflow -- Set status to: Done - -- [ ] **Step 4: Enable "Pull request merged" workflow** - -Under Workflows → Pull request merged: -- Enable the workflow -- Set status to: Done - -- [ ] **Step 5: Enable "Auto-archive items"** - -Under Workflows → Auto-archive items: -- Enable the workflow -- Archive items that have been Done for 14 days - ---- - -### Task 13: Configure Board Views in UI - -Fine-tune the sort, group, and filter settings for each view in the GitHub web UI. - -**Files:** None (manual UI configuration) - -- [ ] **Step 1: Configure Kanban view** - -Open: https://github.com/orgs/mlcommons/projects/57/views/1 -- Set layout to Board (should already be set) -- Column field: Status -- Group by: Priority (ShowStopper at top) -- Filter: `status:Inbox,Triage,Ready,"In Progress","In Review"` - -- [ ] **Step 2: Configure Priority Table view** - -Open the Priority Table view -- Sort by: Priority ascending (ShowStopper first) -- Show columns: Title, Priority, Area, Status, Assignee, Target Release -- Filter: exclude Done items - -- [ ] **Step 3: Configure By Assignee view** - -Open the By Assignee view -- Group by: Assignee -- Sort by: Priority ascending within each group -- Show columns: Title, Priority, Area, Status - -- [ ] **Step 4: Configure Stale Issues view** - -Open the Stale Issues view -- Sort by: Updated date ascending (oldest first) -- Show columns: Title, Priority, Area, Status, Assignee, Updated -- Filter: exclude Done, show only items not updated in 30+ days diff --git 
a/docs/superpowers/specs/2026-04-07-project-management-design.md b/docs/superpowers/specs/2026-04-07-project-management-design.md deleted file mode 100644 index 43e5d446..00000000 --- a/docs/superpowers/specs/2026-04-07-project-management-design.md +++ /dev/null @@ -1,605 +0,0 @@ -# Project Management Design: Labels, Board, Templates, and CONTRIBUTING.md - -**Date:** 2026-04-07 -**Author:** Zhihan Jiang (nvzhihanj) -**Status:** Draft - -## Context - -The mlcommons/endpoints repository has 57 open issues with inconsistent labeling, -no issue templates, a minimal CONTRIBUTING.md, and no active project board. The -project has 3-4 core contributors (NVIDIA) and growing community participation -(Intel, MLCommons, external). The goal is to establish project management -infrastructure that serves the **broader MLCommons community** as the primary -audience — making it easy for external contributors to self-serve, pick up issues, -and understand the project roadmap. - -### Research Basis - -This design is informed by analysis of label taxonomies and project management -practices from: Kubernetes, PyTorch, vLLM, Ray, SGLang, MLCommons/inference, -and guidance from opensource.guide, GitHub Docs, CNCF, and Linux Foundation. - -### Phased Approach - -- **Phase 1 (now):** Labels, board, templates, CONTRIBUTING.md, issue migration -- **Phase 2 (when issue volume > 100 or contributors > 10):** Size/effort labels, - stale bot automation, iteration/sprint fields, disable blank issues - ---- - -## 1. 
Label Taxonomy (~28 labels) - -### Design Principles - -- **Prefixed naming** (`type:`, `priority:`, `area:`, `status:`) for filterability - and visual grouping — inspired by Ray and PyTorch -- **Coarse area labels** (7) grouping related modules — start coarse, split later -- **Severity-gradient colors** for priority — hotter = more urgent -- **Single color family** per label category for visual coherence - -### Type Labels - -| Label | Color | Description | -|-------|-------|-------------| -| `type: bug` | `#d73a4a` | Something isn't working | -| `type: feature` | `#a2eeef` | New feature or capability | -| `type: enhancement` | `#bfd4f2` | Improvement to existing functionality | -| `type: performance` | `#3ddd26` | Performance regression or improvement | -| `type: documentation` | `#0075ca` | Documentation only | -| `type: question` | `#d876e3` | Usage question or clarification | -| `type: RFC` | `#76fde7` | Request for comments / design proposal | -| `type: chore` | `#ededed` | Maintenance, deps, CI, tooling | - -### Priority Labels - -| Label | Color | Description | -|-------|-------|-------------| -| `priority: ShowStopper` | `#000000` | Drop everything — critical blocker, all hands on deck | -| `priority: P0` | `#b60205` | Critical — blocks release or users | -| `priority: P1` | `#d93f0b` | High — must address this cycle | -| `priority: P2` | `#fbca04` | Medium — address within quarter | -| `priority: P3` | `#0e8a16` | Low — backlog, nice to have | - -### Area Labels - -| Label | Color | Description | -|-------|-------|-------------| -| `area: core-engine` | `#c5def5` | Load generator, scheduler, async utils | -| `area: client` | `#c5def5` | Endpoint client, HTTP, transport, ZMQ | -| `area: metrics` | `#c5def5` | Event recorder, metrics reporter, reporting | -| `area: dataset` | `#c5def5` | Dataset manager, formats, predefined datasets | -| `area: config-cli` | `#c5def5` | Config schema, CLI commands, YAML | -| `area: evaluation` | `#c5def5` | Accuracy 
evaluation, scoring, extractors | -| `area: adapters` | `#c5def5` | OpenAI, SGLang protocol adapters | - -### Status Labels - -| Label | Color | Description | -|-------|-------|-------------| -| `status: needs-triage` | `#e99695` | New issue, awaiting review | -| `status: needs-info` | `#f9d0c4` | Awaiting more details from reporter | -| `status: blocked` | `#b60205` | Blocked on external dependency or decision | - -### Community Labels (keep existing) - -| Label | Color | Description | -|-------|-------|-------------| -| `good first issue` | `#7057ff` | Good for newcomers | -| `help wanted` | `#008672` | Extra attention needed | - -### Other (keep existing) - -| Label | Color | Description | -|-------|-------|-------------| -| `mlcommons` | `#e0703c` | MLCommons ruleset/submission integration | -| `dependencies` | `#9083cd` | Dependency updates | -| `security` | `#b60205` | Security vulnerability or hardening | -| `duplicate` | `#cfd3d7` | Duplicate issue | -| `invalid` | `#e4e669` | Not valid | -| `wontfix` | `#ffffff` | Will not be worked on | - -### Labels to Remove - -These are replaced by the prefixed equivalents above: - -| Old Label | Replaced By | -|-----------|-------------| -| `bug` | `type: bug` | -| `feature` | `type: feature` | -| `enhancement` | `type: enhancement` | -| `documentation` | `type: documentation` | -| `performance` | `type: performance` | -| `question` | `type: question` | -| `P0` | `priority: P0` | -| `P1` | `priority: P1` | -| `P2` | `priority: P2` | -| `ShowStopper` | `priority: ShowStopper` | -| `testing` | `type: chore` (context-dependent) | -| `accuracy` | `area: evaluation` | -| `dataset` | `area: dataset` | -| `Roadmap` | `type: RFC` | -| `blocked` | `status: blocked` | -| `rules` | `mlcommons` | -| `MLCommons` | `mlcommons` (lowercase) | - ---- - -## 2. 
Project Board #57 Structure - -### Status Columns - -``` -Inbox → Triage → Ready → In Progress → In Review → Done -``` - -| Column | Purpose | Entry Criteria | -|--------|---------|----------------| -| **Inbox** | New issues land here automatically | Auto-added when issue opened | -| **Triage** | Being evaluated for priority/area/assignee | Someone picked it up to review | -| **Ready** | Triaged, prioritized, ready to work on | Has priority + area labels | -| **In Progress** | Actively being worked on | Assigned, PR may be in flight | -| **In Review** | PR submitted, awaiting review | Linked PR exists | -| **Done** | Merged/resolved/closed | Auto-set when issue closed | - -### Custom Fields - -| Field | Type | Values | -|-------|------|--------| -| Priority | Single select | ShowStopper, P0, P1, P2, P3 | -| Area | Single select | core-engine, client, metrics, dataset, config-cli, evaluation, adapters, mlcommons | -| Target Release | Single select | v0.5.0, v1.0.0 (add as needed) | - -### Views (4) - -**1. Kanban (default)** -- Layout: Board -- Columns: Status field -- Group by: Priority (ShowStopper at top → P3 at bottom) -- Filter: status ≠ Done - -**2. Priority Table** -- Layout: Table -- Sort: Priority ascending (ShowStopper first), then updated date descending -- Columns: Title, Priority, Area, Status, Assignee, Target Release -- Filter: status ≠ Done - -**3. By Assignee** -- Layout: Table -- Group by: Assignee -- Sort: Priority ascending within each group -- Columns: Title, Priority, Area, Status -- Filter: status ≠ Done - -**4. 
Stale Issues** -- Layout: Table -- Sort: Updated date ascending (oldest first) -- Columns: Title, Priority, Area, Status, Assignee, Last Updated -- Filter: status ≠ Done AND last updated more than 30 days ago - -### Automations - -| Trigger | Action | -|---------|--------| -| Issue added to project | Set status → Inbox | -| Issue closed | Set status → Done | -| PR merged closing issue | Set status → Done | -| Item in Done 14+ days | Auto-archive | - ---- - -## 3. Issue Templates - -### Files - -- `.github/ISSUE_TEMPLATE/100-bug-report.yml` — Bug Report -- `.github/ISSUE_TEMPLATE/200-feature-request.yml` — Feature Request -- `.github/ISSUE_TEMPLATE/300-performance.yml` — Performance Issue -- `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` — Dataset Integration -- `.github/ISSUE_TEMPLATE/config.yml` — Template chooser config - -### 100-bug-report.yml - -```yaml -name: Bug Report -description: Report a bug or unexpected behavior -title: "[Bug]: " -labels: ["type: bug", "status: needs-triage"] -body: - - type: textarea - id: description - attributes: - label: Bug Description - description: What happened vs. what you expected - placeholder: "When I run X, I expected Y but got Z" - validations: - required: true - - type: textarea - id: reproduction - attributes: - label: Steps to Reproduce - value: | - 1. - 2. - 3. 
- validations: - required: true - - type: textarea - id: environment - attributes: - label: Environment - description: OS, Python version, package version - placeholder: "OS: Ubuntu 22.04, Python 3.12, inference-endpoint v0.1.0" - validations: - required: true - - type: textarea - id: logs - attributes: - label: Relevant Logs - render: shell - - type: checkboxes - id: checklist - attributes: - label: Before submitting - options: - - label: I searched existing issues and found no duplicates - required: true -``` - -### 200-feature-request.yml - -```yaml -name: Feature Request -description: Suggest a new feature or enhancement -title: "[Feature]: " -labels: ["type: feature", "status: needs-triage"] -body: - - type: textarea - id: motivation - attributes: - label: Motivation - description: What problem does this solve? Why do you need it? - validations: - required: true - - type: textarea - id: proposal - attributes: - label: Proposed Solution - description: How should this work? Include API sketches if relevant. - validations: - required: true - - type: textarea - id: alternatives - attributes: - label: Alternatives Considered - - type: textarea - id: context - attributes: - label: Additional Context -``` - -### 300-performance.yml - -```yaml -name: Performance Issue -description: Report a performance regression or improvement opportunity -title: "[Perf]: " -labels: ["type: performance", "status: needs-triage"] -body: - - type: textarea - id: description - attributes: - label: Description - description: What performance issue did you observe? - placeholder: "QPS dropped from X to Y after upgrading to version Z" - validations: - required: true - - type: textarea - id: benchmark - attributes: - label: Benchmark Command - description: The exact command you ran - render: shell - validations: - required: true - - type: textarea - id: results - attributes: - label: Results - description: Expected vs actual numbers (QPS, latency, TTFT, TPOT, etc.) 
- placeholder: | - Expected: ~5000 QPS, p99 latency < 200ms - Actual: ~2000 QPS, p99 latency 800ms - validations: - required: true - - type: textarea - id: environment - attributes: - label: Environment - description: Hardware, OS, Python version, endpoint server details - placeholder: | - Hardware: 8x A100 80GB - OS: Ubuntu 22.04 - Python: 3.12 - Server: vLLM 0.6.0, Llama-3-70B - Workers: 4 - validations: - required: true - - type: textarea - id: profiling - attributes: - label: Profiling Data (optional) - description: Any profiling output, flame graphs, or bottleneck analysis - render: shell - - type: checkboxes - id: checklist - attributes: - label: Before submitting - options: - - label: I searched existing issues and found no duplicates - required: true - - label: I ran with default settings before tuning - required: false -``` - -### 400-dataset-integration.yml - -```yaml -name: Dataset Integration -description: Request support for a new dataset or evaluation benchmark -title: "[Dataset]: " -labels: ["type: feature", "area: dataset", "status: needs-triage"] -body: - - type: textarea - id: dataset - attributes: - label: Dataset Information - description: Name, URL, and brief description - placeholder: | - Name: MATH-500 - URL: https://huggingface.co/datasets/... - Description: 500 competition math problems for testing reasoning - validations: - required: true - - type: dropdown - id: format - attributes: - label: Dataset Format - options: - - JSONL - - HuggingFace Dataset - - CSV - - JSON - - Parquet - - Other - validations: - required: true - - type: textarea - id: evaluation - attributes: - label: Evaluation Method - description: How should responses be scored? 
- placeholder: "Exact match after extracting boxed answer, or pass@1 for code" - validations: - required: true - - type: textarea - id: samples - attributes: - label: Scale - description: Number of samples, expected prompt/response lengths - placeholder: "500 samples, avg prompt ~200 tokens, avg response ~500 tokens" - - type: textarea - id: context - attributes: - label: Additional Context - description: Related benchmarks, papers, or prior art -``` - -### config.yml - -```yaml -blank_issues_enabled: true -contact_links: - - name: Questions & Discussion - url: https://github.com/mlcommons/endpoints/discussions - about: Ask questions and discuss ideas before filing an issue -``` - ---- - -## 4. CONTRIBUTING.md - -Replace the existing minimal CONTRIBUTING.md with an expanded version (~250 lines) -covering: - -1. **Ways to Contribute** — links to all 4 issue templates, plus docs, PR reviews, - `good first issue` and `help wanted` labels -2. **Development Setup** — prerequisites, fork/clone, venv, `pip install -e ".[dev,test]"`, - pre-commit install, local echo server testing -3. **Code Style and Conventions** — ruff, mypy, line length 88, double quotes, - conventional commits, license headers, serialization conventions - (msgspec vs pydantic), performance-sensitive code guidelines -4. **Testing** — pytest commands, markers (`unit`, `integration`, `slow`, - `performance`), `@pytest.mark.asyncio(mode="strict")`, >90% coverage target, - use real fixtures over mocks -5. **Submitting Changes** — branch naming (`feat/`, `fix/`, `docs/`), PR template, - CI checks, review expectations (2-3 business days), review criteria -6. **Issue Guidelines** — search first, use templates, issue lifecycle - (Inbox → Triage → Ready → In Progress → In Review → Done), priority levels table -7. **MLCommons CLA** — existing CLA requirements preserved - ---- - -## 5. 
Issue Migration Plan - -### Duplicate Resolution - -Close duplicates with a comment explaining the closure and linking to the primary -issue. Copy any unique context from the duplicate into a comment on the primary -issue so no information is lost. - -| Close | Primary | Reason | -|-------|---------|--------| -| #205 "fully async benchmark" | #255 "Make Loadgen Async" | Same goal, #255 is cleaner | -| #170 "warmup with random dataset" | #86 "Warmup runs" | Subset of #86 | -| #226 "Initial multi-turn enabling" | #232 "multi-turn implementation" | Same feature | -| #29 "submission checker for 6.0" | #79 "submission checker compat mode" | #29 is version-specific, superseded | -| #207 "speedup tokenizer report" | #208 "optimize report generation" | #207 is a specific approach to #208 | -| #83 "Q1 Roadmap" | #223 "Phase 2 Roadmap" | Superseded | - -**Evaluation:** #73 "random dataset support" — keep if random dataset has value -beyond warmup use case; otherwise close as duplicate of #86. - -### Label Reassignment - -All 57 open issues are reassigned from old labels to the new prefixed taxonomy. -Full mapping follows, organized by priority tier. 
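Applied mechanically, the relabeling reduces to a lookup from legacy label to prefixed replacement. A hedged sketch — `OLD_TO_NEW` shows only a subset of the full old→new mapping from this spec, and `migrate` is a hypothetical helper whose output would feed the Issues API calls:

```python
# Sketch: derive the add/remove label sets for one issue from its current
# labels. Only part of the old -> new mapping from this spec is shown.
OLD_TO_NEW = {
    "bug": "type: bug",
    "feature": "type: feature",
    "enhancement": "type: enhancement",
    "documentation": "type: documentation",
    "performance": "type: performance",
    "question": "type: question",
    "P0": "priority: P0",
    "P1": "priority: P1",
    "P2": "priority: P2",
    "ShowStopper": "priority: ShowStopper",
    "accuracy": "area: evaluation",
    "dataset": "area: dataset",
    "blocked": "status: blocked",
}


def migrate(labels):
    """Return (to_add, to_remove) for one issue's current label list."""
    to_remove = [label for label in labels if label in OLD_TO_NEW]
    to_add = sorted({OLD_TO_NEW[label] for label in to_remove})
    return to_add, to_remove
```

Labels outside the mapping (e.g. `good first issue`, `help wanted`) pass through untouched, matching the "keep existing" tables above.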
- -#### ShowStopper - -| # | Title | Labels | -|---|-------|--------| -| 84 | Pareto clarification | `priority: ShowStopper`, `area: config-cli`, `mlcommons` | -| 8 | Parity with MLPerf LoadGen | `priority: ShowStopper`, `type: performance`, `area: core-engine` | -| 4 | Accuracy evaluation for LLMs | `priority: ShowStopper`, `type: feature`, `area: evaluation` | - -#### P0 - -| # | Title | Labels | -|---|-------|--------| -| 86 | Warmup runs | `priority: P0`, `type: feature`, `area: core-engine` | -| 232 | Multi-turn implementation | `priority: P0`, `type: feature`, `area: dataset` | -| 183 | Pub/Sub event recorder | `priority: P0`, `type: feature`, `area: metrics` | -| 138 | CI stress test upper bound | `priority: P0`, `type: chore`, `area: core-engine` | -| 6 | Final report structure | `priority: P0`, `type: feature`, `area: metrics` | -| 5 | Submission ruleset + config | `priority: P0`, `type: feature`, `area: config-cli`, `mlcommons` | - -#### P1 - -| # | Title | Labels | -|---|-------|--------| -| 9 | Roofline analysis | `priority: P1`, `type: performance`, `area: core-engine` | -| 255 | Make Loadgen Async | `priority: P1`, `type: feature`, `area: core-engine` | -| 269 | Low concurrency timeouts | `priority: P1`, `type: bug`, `area: client` | -| 237 | CLI fix --load-pattern + --target-qps | `priority: P1`, `type: bug`, `area: config-cli` | -| 219 | target_qps hardcoded in Offline | `priority: P1`, `type: bug`, `area: config-cli` | -| 221 | RuntimeSettings non-reproducible | `priority: P1`, `type: bug`, `area: config-cli` | -| 202 | max_throughput connection timeouts | `priority: P1`, `type: bug`, `area: client` | -| 199 | Perf discrepancy submission vs perf config | `priority: P1`, `type: bug`, `area: config-cli` | -| 222 | KVStore/ServiceLauncher lack tests | `priority: P1`, `type: chore`, `area: core-engine` | -| 220 | SGLang adapter tests skipped | `priority: P1`, `type: chore`, `area: adapters` | -| 182 | Text vs token perf on TRTLLM | `priority: P1`, 
`type: performance`, `area: metrics` | -| 177 | MATH500 dataset | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | -| 176 | MMLU/MMLU-Pro | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | -| 113 | DeepSeek | `priority: P1`, `type: feature` | -| 210 | Wan2.2-T2V support | `priority: P1`, `type: feature` | -| 268 | Phase 2 model selection | `priority: P1`, `type: feature` | -| 10 | System bottleneck tests | `priority: P1`, `type: performance`, `area: core-engine` | -| 7 | Runtime visualization | `priority: P1`, `type: feature`, `area: metrics` | - -#### P2 - -| # | Title | Labels | -|---|-------|--------| -| 254 | Handling failed requests | `priority: P2`, `type: feature`, `area: client` | -| 217 | BURST and STEP load patterns | `priority: P2`, `type: feature`, `area: core-engine` | -| 179 | Humanity's Last Exam | `priority: P2`, `type: feature`, `area: evaluation`, `area: dataset` | -| 178 | Healthbench integration | `priority: P2`, `type: feature`, `area: evaluation`, `area: dataset` | -| 173 | Investigate mlcr failures | `priority: P2`, `type: bug`, `mlcommons` | -| 224 | Multiple perf configs | `priority: P2`, `type: feature`, `area: config-cli` | -| 208 | Optimize report generation | `priority: P2`, `type: performance`, `area: metrics` | -| 158 | SGLang adapter + OpenAI compat | `priority: P2`, `type: feature`, `area: adapters` | -| 125 | Multi-concurrency scans | `priority: P2`, `type: feature`, `area: core-engine` | -| 115 | Clarify default metric | `priority: P2`, `type: enhancement`, `area: config-cli` | -| 79 | Submission checker compat mode | `priority: P2`, `type: feature`, `mlcommons` | -| 73 | Random dataset support | `priority: P2`, `type: feature`, `area: dataset` | -| 68 | Official model name mapping | `priority: P2`, `type: feature`, `area: config-cli`, `mlcommons` | -| 58 | Config-template mapping | `priority: P2`, `type: feature`, `area: config-cli`, `mlcommons` | -| 213 | PostGres dup element | 
`priority: P2`, `type: bug`, `mlcommons` | -| 133 | llama.cpp incompatibility | `priority: P2`, `type: bug`, `area: client` | -| 174 | Better error logging mlcr | `priority: P2`, `type: enhancement`, `mlcommons` | -| 229 | Endpoints test environment | `priority: P2`, `type: chore` | -| 228 | Endpoints Vision document | `priority: P2`, `type: documentation` | -| 227 | DB and Object Store elements | `priority: P2`, `type: feature` | -| 212 | UBI Storage layer | `priority: P2`, `type: feature` | - -#### P3 - -| # | Title | Labels | -|---|-------|--------| -| 99 | Local mode errors | `priority: P3`, `type: bug`, `good first issue` | -| 50 | LlaMa3-405b support | `priority: P3`, `type: feature` | -| 204 | Documentation cleanup | `priority: P3`, `type: documentation` | -| 190 | Skills, design docs, tooling | `priority: P3`, `type: chore` | -| 181 | Sweep qwen scripts | `priority: P3`, `type: feature` | - -#### Other (no priority) - -| # | Title | Labels | -|---|-------|--------| -| 223 | Phase 2 Roadmap | `type: RFC` | -| 267 | Bump transformers | `type: chore`, `dependencies`, `security` | - -### Q2 Board Population - -**Add to board #57 (~40 issues):** All ShowStopper, P0, P1, and P2 issues. -Initial status: **Triage** (existing issues need priority confirmation from team). - -**Not on Q2 board (~5 issues):** P3 issues (#99, #50, #204, #190, #181) and -dependabot (#267). - -### Milestones - -Create milestones as releases are planned: -- `v0.5.0` — first milestone, assign issues as release scope is defined -- `v1.0.0` — future - ---- - -## 6. Phase 2 (Future) - -Trigger when issue volume > 100 or contributors > 10: - -- Add `size: S`, `size: M`, `size: L`, `size: XL` effort labels -- Disable blank issues in `config.yml` -- Add stale bot (apply `status: stale` after 90 days, close after 30 more) -- Add iteration/sprint fields to board if team adopts time-boxed cycles -- Split coarse area labels if any accumulates > 20 issues - ---- - -## 7. 
Migration Procedure - -Order of operations for the migration: - -1. **Create new labels** — all `type:`, `priority:`, `area:`, `status:` labels -2. **Relabel existing issues** — apply new labels per the mapping above -3. **Remove old labels from issues** — strip legacy labels -4. **Close duplicates** — comment with explanation + link to primary, copy unique - context to primary issue -5. **Delete old labels** — remove legacy labels from the repository -6. **Add issues to board #57** — all ShowStopper through P2 -7. **Set board status** — all migrated issues start in Triage -8. **Configure board automations** — auto-add, auto-done, auto-archive -9. **Create issue templates** — add all 4 YAML templates + config.yml -10. **Update CONTRIBUTING.md** — replace with expanded version -11. **Link open PRs to issues** — add "Relates to #N" comments where applicable -12. **Commit and push** — templates + CONTRIBUTING.md in a single PR - -### Open PR → Issue Linkages - -| PR | Linked Issue | Relationship | -|----|-------------|--------------| -| #255 Make Loadgen Async | #255 (same) | PR is the issue | -| #237 CLI fix --load-pattern + --target-qps | #237 (same) | PR is the issue | -| #226 Initial multi-turn enabling | #232 multi-turn implementation | PR implements #232; #226 issue closed as dup | -| #207 Speedup tokenizer report | #208 optimize report generation | PR implements #208; #207 issue closed as dup | -| #205 Fully async benchmark | #255 Make Loadgen Async | Duplicate PR; #205 issue closed as dup | -| #204 Documentation cleanup | #204 (same) | PR is the issue | -| #190 Skills, design docs, tooling | #190 (same) | PR is the issue | -| #181 Sweep qwen scripts | #181 (same) | PR is the issue | -| #170 Warmup with random dataset | #86 Warmup runs | PR implements #86; #170 issue closed as dup | -| #158 SGLang adapter + OpenAI compat | #158 (same) | PR is the issue | -| #125 Multi-concurrency scans | #125 (same) | PR is the issue | -| #79 Submission checker compat | #79 
(same) + #29 (superseded) | PR is the issue | -| #267 Bump transformers | #267 (dependabot) | PR is the issue | From b1ab1c7ba1abeb976ea129480fdd09c96d4e409d Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Wed, 8 Apr 2026 09:59:56 -0700 Subject: [PATCH 08/14] style: apply prettier formatting to README and CONTRIBUTING Co-Authored-By: Claude Opus 4.6 (1M context) --- CONTRIBUTING.md | 15 ++++++++------- README.md | 32 ++++++++++++++++---------------- 2 files changed, 24 insertions(+), 23 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8b264dcc..bd346de2 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -82,7 +82,7 @@ pre-commit run --all-files - **Quotes:** Double quotes - **License headers:** Required on all Python files (auto-added by pre-commit) - **Commit messages:** [Conventional commits](https://www.conventionalcommits.org/) — `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:` -- **Comments:** Only where the *why* isn't obvious from the code. No over-documenting. +- **Comments:** Only where the _why_ isn't obvious from the code. No over-documenting. ### Serialization @@ -185,13 +185,13 @@ and flow through: **Inbox → Triage → Ready → In Progress → In Review → ### Priority Levels -| Priority | Meaning | -|----------|---------| +| Priority | Meaning | +| --------------- | ---------------------------------- | | **ShowStopper** | Drop everything — critical blocker | -| **P0** | Blocks release or users | -| **P1** | Must address this cycle | -| **P2** | Address within quarter | -| **P3** | Backlog, nice to have | +| **P0** | Blocks release or users | +| **P1** | Must address this cycle | +| **P2** | Address within quarter | +| **P3** | Backlog, nice to have | ## MLCommons CLA @@ -200,6 +200,7 @@ All contributors must sign the A CLA bot will check your PR automatically. To sign up: + 1. Visit the [MLCommons Subscription form](https://mlcommons.org/membership/membership-overview/) 2. Submit your GitHub username 3. 
The CLA bot will verify on your next PR diff --git a/README.md b/README.md index 2a1a178f..b81cf8ba 100644 --- a/README.md +++ b/README.md @@ -60,13 +60,13 @@ Dataset Manager ──> Load Generator ──> Endpoint Client ──> External Metrics Collector (EventRecorder + MetricsReporter) ``` -| Component | Purpose | -|-----------|---------| -| **Load Generator** | Central orchestrator: `BenchmarkSession` owns lifecycle, `Scheduler` controls timing | -| **Endpoint Client** | Multi-process HTTP workers communicating via ZMQ IPC | -| **Dataset Manager** | Loads JSONL, HuggingFace, CSV, JSON, Parquet datasets | -| **Metrics** | SQLite-backed event recording, aggregation (QPS, latency, TTFT, TPOT) | -| **Config** | Pydantic-based YAML schema, CLI auto-generated via cyclopts | +| Component | Purpose | +| ------------------- | ------------------------------------------------------------------------------------ | +| **Load Generator** | Central orchestrator: `BenchmarkSession` owns lifecycle, `Scheduler` controls timing | +| **Endpoint Client** | Multi-process HTTP workers communicating via ZMQ IPC | +| **Dataset Manager** | Loads JSONL, HuggingFace, CSV, JSON, Parquet datasets | +| **Metrics** | SQLite-backed event recording, aggregation (QPS, latency, TTFT, TPOT) | +| **Config** | Pydantic-based YAML schema, CLI auto-generated via cyclopts | ### Benchmark Modes @@ -94,15 +94,15 @@ Run accuracy evaluation with Pass@1 scoring using pre-defined benchmarks: ## Documentation -| Guide | Description | -|-------|-------------| -| [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) | Command-line interface guide | -| [CLI Design](docs/CLI_DESIGN.md) | CLI architecture and design decisions | -| [Local Testing](docs/LOCAL_TESTING.md) | Test with the echo server | -| [Client Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) | Endpoint client optimization | -| [Performance Architecture](docs/PERF_ARCHITECTURE.md) | Performance architecture deep dive | -| [Development 
Guide](docs/DEVELOPMENT.md) | Development setup and workflow | -| [CONTRIBUTING.md](CONTRIBUTING.md) | How to contribute | +| Guide | Description | +| -------------------------------------------------------------- | ------------------------------------- | +| [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) | Command-line interface guide | +| [CLI Design](docs/CLI_DESIGN.md) | CLI architecture and design decisions | +| [Local Testing](docs/LOCAL_TESTING.md) | Test with the echo server | +| [Client Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) | Endpoint client optimization | +| [Performance Architecture](docs/PERF_ARCHITECTURE.md) | Performance architecture deep dive | +| [Development Guide](docs/DEVELOPMENT.md) | Development setup and workflow | +| [CONTRIBUTING.md](CONTRIBUTING.md) | How to contribute | ## Contributing From b5961aace99c747cb23b308fd0f57876a69ce771 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Wed, 8 Apr 2026 10:01:00 -0700 Subject: [PATCH 09/14] fix: remove invalid mode='strict' from @pytest.mark.asyncio examples MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Strict asyncio mode is configured globally in pyproject.toml via asyncio_mode = "strict". The marker does not accept a mode argument — passing it causes errors in recent pytest-asyncio versions. Fixed in: CONTRIBUTING.md, AGENTS.md, docs/DEVELOPMENT.md Co-Authored-By: Claude Opus 4.6 (1M context) --- AGENTS.md | 4 +- CONTRIBUTING.md | 2 +- docs/DEVELOPMENT.md | 164 +++++++++++++++++++++----------------------- 3 files changed, 82 insertions(+), 88 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 52a3dbb5..eb0349ca 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -240,7 +240,7 @@ See [Development Guide](docs/DEVELOPMENT.md) for full setup and workflow details @pytest.mark.run_explicitly # Only run when explicitly selected ``` -**Async tests**: Use `@pytest.mark.asyncio(mode="strict")` — the project uses strict asyncio mode. 
+**Async tests**: Use `@pytest.mark.asyncio` — strict mode is configured globally in `pyproject.toml` (`asyncio_mode = "strict"`). Do NOT pass `mode="strict"` to the marker — it's not a valid argument. **Key fixtures** (defined in `tests/conftest.py`): @@ -342,7 +342,7 @@ Known failure modes when AI tools generate code for this project. Reference thes - **Generating mock-heavy tests for integration scenarios**: This project has real echo/oracle server fixtures. AI tends to mock HTTP calls even when `mock_http_echo_server` or `mock_http_oracle_server` fixtures exist and should be used. - **Missing test markers**: Every test function needs `@pytest.mark.unit`, `@pytest.mark.integration`, or another marker. AI-generated tests almost always omit markers, which breaks CI filtering. -- **Wrong asyncio mode**: Tests must use `@pytest.mark.asyncio(mode="strict")` — AI often writes bare `@pytest.mark.asyncio` or forgets it entirely, causing silent test skips or failures. +- **Wrong asyncio marker**: Tests must use bare `@pytest.mark.asyncio` — strict mode is configured globally in `pyproject.toml`. Do NOT pass `mode="strict"` to the marker (it's not a valid argument and will cause errors). AI sometimes hallucinates this parameter. - **Fabricating fixture names**: AI may invent fixtures that don't exist in `conftest.py`. Always check that referenced fixtures actually exist before using them. ### Code Style & Repo Conventions diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index bd346de2..0cb0c164 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -127,7 +127,7 @@ Every test function **must** have a marker: ```python @pytest.mark.unit -@pytest.mark.asyncio(mode="strict") # for async tests — must use strict mode +@pytest.mark.asyncio # strict mode is configured globally in pyproject.toml async def test_something(): ... 
``` diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md index af32da1d..4c95246f 100644 --- a/docs/DEVELOPMENT.md +++ b/docs/DEVELOPMENT.md @@ -2,7 +2,7 @@ This guide provides everything you need to contribute to the MLPerf Inference Endpoint Benchmarking System. -## Getting Started +## 🚀 Getting Started ### Prerequisites @@ -14,48 +14,40 @@ This guide provides everything you need to contribute to the MLPerf Inference En ### Development Environment Setup ```bash -# 1. Fork https://github.com/mlcommons/endpoints on GitHub, then clone your fork -git clone https://github.com/YOUR_USERNAME/endpoints.git -cd endpoints +# 1. Clone the repository +git clone https://github.com/mlperf/inference-endpoint.git +cd inference-endpoint -# 2. Add the upstream repo as a remote -git remote add upstream https://github.com/mlcommons/endpoints.git - -# 3. Create virtual environment (Python 3.12+ required) +# 2. Create virtual environment (Python 3.12+ required) python3.12 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate -# 4. Install development dependencies +# 3. Install development dependencies pip install -e ".[dev,test]" -# 5. Install pre-commit hooks +# 4. Install pre-commit hooks pre-commit install -# 6. Verify installation +# 5. 
Verify installation inference-endpoint --version pytest --version ``` -## Project Structure +## 🏗️ Project Structure ``` -endpoints/ +inference-endpoint/ ├── src/inference_endpoint/ # Main package source -│ ├── main.py # Entry point and CLI app -│ ├── exceptions.py # Project-wide exception types -│ ├── async_utils/ # Event loop, ZMQ transport, pub/sub +│ ├── cli.py # Command-line interface │ ├── commands/ # CLI command implementations │ ├── config/ # Configuration and schema management │ ├── core/ # Core types and orchestration │ ├── dataset_manager/ # Dataset handling and loading │ ├── endpoint_client/ # HTTP/ZMQ endpoint communication -│ ├── evaluation/ # Accuracy evaluation and scoring │ ├── load_generator/ # Load generation and scheduling │ ├── metrics/ # Performance measurement and reporting │ ├── openai/ # OpenAI API compatibility -│ ├── plugins/ # Plugin system │ ├── profiling/ # Performance profiling tools -│ ├── sglang/ # SGLang API adapter │ ├── testing/ # Test utilities (echo server, etc.) │ └── utils/ # Common utilities ├── tests/ # Test suite @@ -68,7 +60,7 @@ endpoints/ └── scripts/ # Utility scripts ``` -## Testing +## 🧪 Testing ### Running Tests @@ -111,36 +103,24 @@ import pytest from inference_endpoint.core.types import Query class TestQuery: - @pytest.mark.unit def test_query_creation(self): """Test creating a basic query.""" - query = Query(data={"prompt": "Test", "model": "test-model"}) - assert query.data["prompt"] == "Test" - assert query.data["model"] == "test-model" + query = Query(prompt="Test", model="test-model") + assert query.prompt == "Test" + assert query.model == "test-model" - @pytest.mark.unit - @pytest.mark.asyncio(mode="strict") + @pytest.mark.asyncio async def test_async_operation(self): """Test async operations.""" # Your async test here pass ``` -## Code Quality +## 📝 Code Quality ### Pre-commit Hooks -The project uses pre-commit hooks to ensure code quality. 
- -Hooks that run automatically on commit: - -- trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements -- `ruff` (lint + autofix) and `ruff-format` -- `mypy` type checking -- `prettier` for YAML/JSON/Markdown -- License header enforcement (Apache 2.0 SPDX header required on all Python files, added by `scripts/add_license_header.py`) - -**Always run `pre-commit run --all-files` before committing.** +The project uses pre-commit hooks to ensure code quality: ```bash # Install hooks (done during setup) @@ -151,12 +131,13 @@ pre-commit run # Run all hooks on all files pre-commit run --all-files + +# Skip hooks (use sparingly) +git commit --no-verify ``` ### Code Formatting -Configuration: `ruff` (line-length 88, target Python 3.12), `ruff-format` (double quotes, space indent). - ```bash # Format code with ruff ruff format src/ tests/ @@ -178,17 +159,12 @@ mypy src/ pre-commit run --all-files ``` -## Development Workflow +## 🔧 Development Workflow ### 1. Feature Development ```bash -# Sync your fork with upstream before starting -git fetch upstream -git checkout main -git merge upstream/main - -# Create a feature branch on your fork +# Create feature branch git checkout -b feature/your-feature-name # Make changes and test @@ -199,7 +175,7 @@ pre-commit run --all-files git add . 
git commit -m "feat: add your feature description" -# Push to your fork and open a PR against mlcommons/endpoints +# Push and create PR git push origin feature/your-feature-name ``` @@ -221,15 +197,42 @@ When developing a new component: - **Performance Tests**: Ensure no performance regressions - **Documentation**: Update docs for new features -## Documentation +## 📚 Documentation ### Writing Documentation -- **Code Comments**: Add comments only where the _why_ is not obvious from the code; avoid restating what the code does +- **Code Comments**: Use docstrings for all public APIs - **README Updates**: Update README.md for user-facing changes +- **API Documentation**: Document new interfaces and changes - **Examples**: Provide usage examples for new features -## Performance Considerations +### Documentation Standards + +```python +def process_query(query: Query) -> QueryResult: + """ + Process a query and return the result. + + Args: + query: The query to process + + Returns: + QueryResult containing the processed response + + Raises: + QueryError: If the query cannot be processed + + Example: + >>> query = Query(prompt="Hello") + >>> result = process_query(query) + >>> print(result.content) + 'Hello there!' + """ + # Implementation here + pass +``` + +## 🚀 Performance Considerations ### Development Guidelines @@ -251,7 +254,7 @@ pytest --benchmark-only pytest --benchmark-compare ``` -## Debugging +## 🔍 Debugging ### Common Issues @@ -273,22 +276,7 @@ pytest -s -v python -m pdb -m pytest test_file.py ``` -## YAML Config Templates - -Config templates in `src/inference_endpoint/config/templates/` are auto-generated from schema defaults. When you change `config/schema.py`, regenerate them: - -```bash -python scripts/regenerate_templates.py -``` - -The pre-commit hook auto-regenerates templates when `schema.py`, `config.py`, or `regenerate_templates.py` change. CI validates templates are up to date via `--check` mode. 
- -Two variants are generated per mode (offline, online, concurrency): - -- `_template.yaml` — minimal: only required fields + placeholders -- `_template_full.yaml` — all fields with schema defaults + inline `# options:` comments - -## Package Management +## 📦 Package Management ### Adding Dependencies @@ -303,7 +291,7 @@ Install after updating: pip install -e ".[dev,test]" ``` -## Troubleshooting +## 🚨 Troubleshooting ### Common Problems @@ -338,20 +326,17 @@ python -c "import sys; print(sys.path)" export PYTHONPATH="${PYTHONPATH}:$(pwd)/src" ``` -## Contributing Guidelines +## 🤝 Contributing Guidelines ### Pull Request Process -1. **Fork** `mlcommons/endpoints` on GitHub -2. **Clone your fork** and add `upstream` as a remote (see [Development Environment Setup](#development-environment-setup)) -3. **Sync with upstream** (`git fetch upstream && git merge upstream/main`) before starting work -4. **Create a feature branch** on your fork (`git checkout -b feature/your-feature-name`) -5. **Make your changes** following the coding standards -6. **Add tests** for new functionality -7. **Update documentation** as needed -8. **Run all checks** locally: `pytest` and `pre-commit run --all-files` -9. **Push to your fork** and open a PR against `mlcommons/endpoints:main` -10. **Address review comments** promptly +1. **Fork the repository** and create a feature branch +2. **Make your changes** following the coding standards +3. **Add tests** for new functionality +4. **Update documentation** as needed +5. **Run all checks** locally before submitting +6. **Create a PR** with clear description and tests +7. **Address review comments** promptly ### Commit Message Format @@ -366,8 +351,6 @@ docs(readme): update installation instructions test(loadgen): add performance benchmarks ``` -Allowed types: `feat`, `fix`, `docs`, `test`, `chore`, `refactor`, `perf`, `ci`. 
- ### Code Review Checklist - [ ] Code follows style guidelines @@ -377,9 +360,20 @@ Allowed types: `feat`, `fix`, `docs`, `test`, `chore`, `refactor`, `perf`, `ci`. - [ ] Security implications are reviewed - [ ] Error handling is appropriate -## Getting Help +## 📞 Getting Help -- **Issues**: [GitHub Issues](https://github.com/mlcommons/endpoints/issues) -- **Discussions**: [GitHub Discussions](https://github.com/mlcommons/endpoints/discussions) +- **Issues**: [GitHub Issues](https://github.com/mlperf/inference-endpoint/issues) +- **Discussions**: [GitHub Discussions](https://github.com/mlperf/inference-endpoint/discussions) - **Documentation**: Check this guide and project docs - **Team**: Reach out to the development team + +## 🎯 Next Steps + +1. **Set up your environment** using this guide +2. **Explore the codebase** to understand the architecture +3. **Pick a component** to work on from the project board +4. **Start with tests** to understand the expected behavior +5. **Implement incrementally** with regular testing +6. **Ask questions** when you need help + +Happy coding! 🚀 From cc3af95b2fa9518830686f5e4ab45f713377008d Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Wed, 8 Apr 2026 10:06:31 -0700 Subject: [PATCH 10/14] docs: remove CLA line from README Contributing section MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CLA details are already in CONTRIBUTING.md — no need to duplicate in README. Co-Authored-By: Claude Opus 4.6 (1M context) --- README.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/README.md b/README.md index b81cf8ba..a14ed18b 100644 --- a/README.md +++ b/README.md @@ -115,8 +115,6 @@ We welcome contributions from the community. See [CONTRIBUTING.md](CONTRIBUTING. Issues are tracked on our [project board](https://github.com/orgs/mlcommons/projects/57). 
Look for [`good first issue`](https://github.com/mlcommons/endpoints/labels/good%20first%20issue) or [`help wanted`](https://github.com/mlcommons/endpoints/labels/help%20wanted) to get started. -All contributors must sign the [MLCommons CLA](https://mlcommons.org/membership/membership-overview/). - ## Acknowledgements This project draws inspiration from: From 22c646ec70903f01b300b995334b99b3d4fb689b Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Wed, 8 Apr 2026 10:15:43 -0700 Subject: [PATCH 11/14] docs: strengthen pre-commit requirement in AGENTS.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Make it explicit that pre-commit must run before every commit, no exceptions. Hooks may modify files — stage changes and commit once. Co-Authored-By: Claude Opus 4.6 (1M context) --- AGENTS.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index eb0349ca..6fec5395 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -21,7 +21,7 @@ pytest -m integration # Integration tests only pytest --cov=src --cov-report=html # With coverage pytest -xvs tests/unit/path/to/test_file.py # Single test file -# Code quality (run before commits) +# Code quality — MUST run before every commit, no exceptions pre-commit run --all-files # Local testing with echo server @@ -215,7 +215,7 @@ All of these run automatically on commit: - License header enforcement - `regenerate-templates`: auto-regenerates YAML config templates from schema defaults when `schema.py`, `config.py`, or `regenerate_templates.py` change -**Always run `pre-commit run --all-files` before committing.** +**IMPORTANT: Always run `pre-commit run --all-files` before every commit.** Hooks may modify files (prettier, ruff-format, license headers). If files are modified, stage the changes and commit once. Never commit without running pre-commit first. See [Development Guide](docs/DEVELOPMENT.md) for full setup and workflow details. 
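The commit-message convention repeated throughout these patches (conventional commits: `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:`, with `refactor:` and `ci:` also listed as allowed in docs/DEVELOPMENT.md) can be checked mechanically. A minimal illustrative sketch; the helper name and regex are assumptions, not project code:

```python
import re

# Types taken from the docs in this patch series; the regex itself is an
# illustrative assumption, not part of the repository.
_CONVENTIONAL = re.compile(
    r"^(feat|fix|docs|test|chore|refactor|perf|ci)"  # allowed type
    r"(\([a-z0-9_-]+\))?"                            # optional scope, e.g. (core)
    r": \S.*$"                                       # colon, space, non-empty subject
)

def is_conventional(subject: str) -> bool:
    """Return True if a commit subject line follows the conventional format."""
    return _CONVENTIONAL.match(subject) is not None
```

For example, `feat(core): add query lifecycle management` passes, while `update stuff` does not.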
From d75f3c60c1cac390b9e2001d9128570f5d8665a8 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Mon, 13 Apr 2026 15:34:46 -0700 Subject: [PATCH 12/14] fix: remove Discussions references (feature not enabled) Remove Discussions link from issue template config.yml and CONTRIBUTING.md since GitHub Discussions is not enabled on this repo. Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/ISSUE_TEMPLATE/config.yml | 4 ---- CONTRIBUTING.md | 3 +-- 2 files changed, 1 insertion(+), 6 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml index 4ac37a65..0086358d 100644 --- a/.github/ISSUE_TEMPLATE/config.yml +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -1,5 +1 @@ blank_issues_enabled: true -contact_links: - - name: Questions & Discussion - url: https://github.com/mlcommons/endpoints/discussions - about: Ask questions and discuss ideas before filing an issue diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 0cb0c164..db06a18c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -210,5 +210,4 @@ during the PR process. ## Questions? -Open a [Discussion](https://github.com/mlcommons/endpoints/discussions) or -file an issue. We aim to respond within a few business days. +File an [issue](https://github.com/mlcommons/endpoints/issues). We aim to respond within a few business days. 
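The DEVELOPMENT.md overhaul in the next patch codifies "dict dispatch instead of `match` statements" for latency-critical paths. A generic sketch of the pattern; the handler names and event kinds here are purely illustrative, not taken from the project:

```python
from typing import Any, Callable

def _on_start(payload: dict[str, Any]) -> str:
    return f"start:{payload['id']}"

def _on_token(payload: dict[str, Any]) -> str:
    return f"token:{payload['text']}"

# Built once at import time; each dispatch is then a single dict lookup,
# avoiding per-call `match` pattern evaluation inside a hot loop.
_DISPATCH: dict[str, Callable[[dict[str, Any]], str]] = {
    "start": _on_start,
    "token": _on_token,
}

def handle_event(kind: str, payload: dict[str, Any]) -> str:
    handler = _DISPATCH.get(kind)
    if handler is None:
        raise ValueError(f"unknown event kind: {kind}")
    return handler(payload)
```

The table stays fixed while handlers can be added without touching the dispatch site, which is why this style is often preferred over `match` in hot paths.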
From 1975e9ae4b5f34cf402d2a85234343a112815895 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Mon, 13 Apr 2026 15:42:04 -0700 Subject: [PATCH 13/14] =?UTF-8?q?docs:=20overhaul=20DEVELOPMENT.md=20?= =?UTF-8?q?=E2=80=94=20fix=20stale=20URLs,=20add=20fork=20workflow,=20alig?= =?UTF-8?q?n=20with=20AGENTS.md?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix repo URL: mlperf/inference-endpoint → mlcommons/endpoints - Add proper fork workflow (fork → clone → add upstream → branch → PR) - Update project structure to match current codebase (add evaluation, sglang, plugins, async_utils; fix entry point main.py not cli.py) - Remove emoji headers for consistency - Fix test example: add required markers, correct asyncio usage - Remove "skip hooks" advice (contradicts project policy) - Remove verbose docstring example (contradicts minimal-comments policy) - Remove Discussions references (feature not enabled) - Add YAML config templates section - Add performance considerations aligned with AGENTS.md - Add key test fixtures section Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/DEVELOPMENT.md | 378 ++++++++++++++------------------------------ 1 file changed, 122 insertions(+), 256 deletions(-) diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md index 4c95246f..e4e2d3de 100644 --- a/docs/DEVELOPMENT.md +++ b/docs/DEVELOPMENT.md @@ -1,282 +1,206 @@ # Development Guide -This guide provides everything you need to contribute to the MLPerf Inference Endpoint Benchmarking System. +This guide covers the development setup and workflow for the MLPerf Inference Endpoint Benchmarking System. For contribution guidelines, see [CONTRIBUTING.md](../CONTRIBUTING.md). 
-## 🚀 Getting Started +## Getting Started ### Prerequisites -- **Python**: 3.12+ (Python 3.12 is recommended for optimal performance) +- **Python**: 3.12+ (3.12 recommended) - **Git**: Latest version -- **Virtual Environment**: Python venv or conda -- **IDE**: VS Code, PyCharm, or your preferred editor +- **OS**: Linux or macOS (Windows is not supported) ### Development Environment Setup ```bash -# 1. Clone the repository -git clone https://github.com/mlperf/inference-endpoint.git -cd inference-endpoint +# 1. Fork https://github.com/mlcommons/endpoints on GitHub, then clone your fork +git clone https://github.com/YOUR_USERNAME/endpoints.git +cd endpoints -# 2. Create virtual environment (Python 3.12+ required) +# 2. Add the upstream repo as a remote +git remote add upstream https://github.com/mlcommons/endpoints.git + +# 3. Create virtual environment (Python 3.12+ required) python3.12 -m venv venv -source venv/bin/activate # On Windows: venv\Scripts\activate +source venv/bin/activate -# 3. Install development dependencies +# 4. Install development dependencies pip install -e ".[dev,test]" -# 4. Install pre-commit hooks +# 5. Install pre-commit hooks pre-commit install -# 5. Verify installation +# 6. 
Verify installation inference-endpoint --version pytest --version ``` -## 🏗️ Project Structure +## Project Structure ``` -inference-endpoint/ +endpoints/ ├── src/inference_endpoint/ # Main package source -│ ├── cli.py # Command-line interface +│ ├── main.py # Entry point and CLI app +│ ├── exceptions.py # Project-wide exception types +│ ├── async_utils/ # Event loop, ZMQ transport, pub/sub │ ├── commands/ # CLI command implementations │ ├── config/ # Configuration and schema management │ ├── core/ # Core types and orchestration │ ├── dataset_manager/ # Dataset handling and loading │ ├── endpoint_client/ # HTTP/ZMQ endpoint communication +│ ├── evaluation/ # Accuracy evaluation and scoring │ ├── load_generator/ # Load generation and scheduling │ ├── metrics/ # Performance measurement and reporting │ ├── openai/ # OpenAI API compatibility +│ ├── plugins/ # Plugin system │ ├── profiling/ # Performance profiling tools +│ ├── sglang/ # SGLang API adapter │ ├── testing/ # Test utilities (echo server, etc.) 
│ └── utils/ # Common utilities ├── tests/ # Test suite │ ├── unit/ # Unit tests │ ├── integration/ # Integration tests -│ ├── performance/ # Performance tests -│ └── datasets/ # Test datasets +│ ├── performance/ # Performance benchmarks +│ └── datasets/ # Test data (dummy_1k.jsonl, squad_pruned/) ├── docs/ # Documentation ├── examples/ # Usage examples └── scripts/ # Utility scripts ``` -## 🧪 Testing +## Testing ### Running Tests ```bash -# Run all tests +# All tests (excludes slow/performance) pytest -# Run with coverage -pytest --cov=src --cov-report=html - -# Run specific test categories -pytest -m unit # Unit tests only -pytest -m integration # Integration tests only -pytest -m performance # Performance tests only (no timeout) +# Unit tests only +pytest -m unit -# Run tests in parallel -pytest -n auto +# Integration tests +pytest -m integration -# Run tests with verbose output -pytest -v +# Single file with verbose output +pytest -xvs tests/unit/path/to/test_file.py -# Run specific test file -pytest tests/unit/test_core_types.py - -# Run with output to file (recommended) -pytest -v 2>&1 | tee test_results.log +# With coverage +pytest --cov=src --cov-report=html ``` -### Test Structure +### Test Markers -- **Unit Tests** (`tests/unit/`): Test individual components in isolation -- **Integration Tests** (`tests/integration/`): Test component interactions with real servers -- **Performance Tests** (`tests/performance/`): Test performance characteristics (marked with @pytest.mark.performance, no timeout) -- **Test Datasets** (`tests/datasets/`): Sample datasets for testing (dummy_1k.jsonl, squad_pruned/) - -### Writing Tests +Every test function **must** have a marker: ```python import pytest -from inference_endpoint.core.types import Query - -class TestQuery: - def test_query_creation(self): - """Test creating a basic query.""" - query = Query(prompt="Test", model="test-model") - assert query.prompt == "Test" - assert query.model == "test-model" - - 
@pytest.mark.asyncio - async def test_async_operation(self): - """Test async operations.""" - # Your async test here - pass -``` -## 📝 Code Quality +@pytest.mark.unit +def test_something(): + ... -### Pre-commit Hooks +@pytest.mark.unit +@pytest.mark.asyncio # strict mode is configured globally in pyproject.toml +async def test_async_something(): + ... +``` -The project uses pre-commit hooks to ensure code quality: +Available markers: `unit`, `integration`, `slow`, `performance`, `run_explicitly` -```bash -# Install hooks (done during setup) -pre-commit install +### Key Fixtures -# Run all hooks on staged files -pre-commit run +Defined in `tests/conftest.py` — use these instead of mocking: -# Run all hooks on all files -pre-commit run --all-files +- `mock_http_echo_server` — real HTTP echo server on dynamic port +- `mock_http_oracle_server` — dataset-driven response server +- `dummy_dataset` — in-memory test dataset +- `events_db` — pre-populated SQLite events database -# Skip hooks (use sparingly) -git commit --no-verify -``` +### Coverage -### Code Formatting +Target **>90% coverage** for all new code. -```bash -# Format code with ruff -ruff format src/ tests/ +## Code Quality -# Check formatting without changing files -ruff format --check src/ tests/ -``` +### Pre-commit Hooks -### Linting +All of these run automatically on commit: -```bash -# Run ruff linter -ruff check src/ tests/ +- trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements +- `ruff` (lint + autofix) and `ruff-format` +- `mypy` type checking +- `prettier` for YAML/JSON/Markdown +- License header enforcement +- YAML template validation and regeneration -# Run mypy for type checking -mypy src/ +**IMPORTANT: Always run `pre-commit run --all-files` before every commit.** Hooks may modify files. If files are modified, stage the changes and commit once. 
-# Run all quality checks +```bash +# Run all hooks pre-commit run --all-files + +# Install hooks (done during setup) +pre-commit install ``` -## 🔧 Development Workflow +### Code Style + +- **Formatter/Linter**: `ruff` (line-length 88, target Python 3.12) +- **Type checking**: `mypy` +- **Formatting**: `ruff-format` (double quotes, space indent) +- **License headers**: Required on all Python files (auto-added by pre-commit) +- **Commit messages**: [Conventional commits](https://www.conventionalcommits.org/) — `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:` +- **Comments**: Only where the _why_ isn't obvious from the code -### 1. Feature Development +## Development Workflow + +### Feature Development ```bash -# Create feature branch -git checkout -b feature/your-feature-name +# Sync your fork with upstream before starting +git fetch upstream +git checkout main +git merge upstream/main + +# Create a feature branch on your fork +git checkout -b feat/your-feature-name # Make changes and test pytest pre-commit run --all-files # Commit changes -git add . +git add git commit -m "feat: add your feature description" -# Push and create PR -git push origin feature/your-feature-name +# Push to your fork and open a PR against mlcommons/endpoints +git push origin feat/your-feature-name ``` -### 2. Component Development - -When developing a new component: - -1. **Create the component directory** in `src/inference_endpoint/` -2. **Add `__init__.py`** with component description -3. **Implement the component** following the established patterns -4. **Add tests** in the corresponding `tests/unit/` directory -5. **Update main package** `__init__.py` if needed -6. **Add dependencies** to `pyproject.toml` under `[project.dependencies]` or `[project.optional-dependencies]` +### Branch Naming -### 3. 
Testing Strategy - -- **Unit Tests**: >90% coverage required -- **Integration Tests**: Test component interactions -- **Performance Tests**: Ensure no performance regressions -- **Documentation**: Update docs for new features - -## 📚 Documentation - -### Writing Documentation - -- **Code Comments**: Use docstrings for all public APIs -- **README Updates**: Update README.md for user-facing changes -- **API Documentation**: Document new interfaces and changes -- **Examples**: Provide usage examples for new features - -### Documentation Standards - -```python -def process_query(query: Query) -> QueryResult: - """ - Process a query and return the result. - - Args: - query: The query to process - - Returns: - QueryResult containing the processed response - - Raises: - QueryError: If the query cannot be processed - - Example: - >>> query = Query(prompt="Hello") - >>> result = process_query(query) - >>> print(result.content) - 'Hello there!' - """ - # Implementation here - pass +``` +feat/short-description +fix/short-description +docs/short-description ``` -## 🚀 Performance Considerations - -### Development Guidelines - -- **Async First**: Use async/await for I/O operations -- **Memory Efficiency**: Minimize object creation in hot paths -- **Profiling**: Use pytest-benchmark for performance testing -- **Monitoring**: Add performance metrics for critical operations +## YAML Config Templates -### Performance Testing +Config templates in `src/inference_endpoint/config/templates/` are auto-generated from schema defaults. When you change `config/schema.py`, regenerate them: ```bash -# Run performance tests -pytest -m performance - -# Run benchmarks -pytest --benchmark-only - -# Compare with previous runs -pytest --benchmark-compare +python scripts/regenerate_templates.py ``` -## 🔍 Debugging +The pre-commit hook auto-regenerates templates when `schema.py`, `config.py`, or `regenerate_templates.py` change. CI validates templates are up to date via `--check` mode. 
-### Common Issues +Two variants are generated per mode (offline, online, concurrency): -1. **Import Errors**: Ensure `src/` is in Python path -2. **Test Failures**: Check test data and mock objects -3. **Performance Issues**: Use profiling tools to identify bottlenecks -4. **Async Issues**: Ensure proper event loop handling +- `_template.yaml` — minimal: only required fields + placeholders +- `_template_full.yaml` — all fields with schema defaults + inline `# options:` comments -### Debug Tools - -```bash -# Run with debug logging -inference-endpoint --verbose - -# Run tests with debug output -pytest -s -v - -# Use Python debugger -python -m pdb -m pytest test_file.py -``` - -## 📦 Package Management +## Package Management ### Adding Dependencies @@ -285,95 +209,37 @@ Add dependencies to `pyproject.toml` (always pin to exact versions with `==`): - **Runtime dependencies**: `[project.dependencies]` - **Optional groups** (dev, test, etc.): `[project.optional-dependencies]` -Install after updating: +After adding a dependency, run `pip-audit` (included in `dev` extras) to verify it has no known vulnerabilities. ```bash pip install -e ".[dev,test]" ``` -## 🚨 Troubleshooting - -### Common Problems - -**Pre-commit hooks failing:** - -```bash -# Update pre-commit -pre-commit autoupdate - -# Skip hooks temporarily -git commit --no-verify -``` - -**Tests failing:** +## Performance Considerations -```bash -# Clear Python cache -find . -type d -name "__pycache__" -delete -find . -type f -name "*.pyc" -delete +Code in `load_generator/`, `endpoint_client/worker.py`, and `async_utils/transport/` is latency-critical. In these paths: -# Reinstall package -pip install -e . 
-``` +- No `match` statements — use dict dispatch +- Use `dataclass(slots=True)` or `msgspec.Struct` for frequently instantiated classes +- Minimize async suspends +- Use `msgspec` over `json`/`pydantic` for serialization +- The HTTP client uses custom `ConnectionPool` with `httptools` parser — not `aiohttp`/`requests` -**Import errors:** +## Debugging ```bash -# Check Python path -python -c "import sys; print(sys.path)" - -# Ensure src is in path -export PYTHONPATH="${PYTHONPATH}:$(pwd)/src" -``` - -## 🤝 Contributing Guidelines - -### Pull Request Process - -1. **Fork the repository** and create a feature branch -2. **Make your changes** following the coding standards -3. **Add tests** for new functionality -4. **Update documentation** as needed -5. **Run all checks** locally before submitting -6. **Create a PR** with clear description and tests -7. **Address review comments** promptly - -### Commit Message Format +# Run with verbose logging +inference-endpoint -v benchmark offline ... -Use conventional commit format: +# Run tests with stdout visible +pytest -xvs tests/unit/path/to/test.py +# Use Python debugger +python -m pdb -m pytest tests/unit/path/to/test.py ``` -type(scope): description - -feat(core): add query lifecycle management -fix(api): resolve endpoint connection issue -docs(readme): update installation instructions -test(loadgen): add performance benchmarks -``` - -### Code Review Checklist - -- [ ] Code follows style guidelines -- [ ] Tests pass and coverage is adequate -- [ ] Documentation is updated -- [ ] Performance impact is considered -- [ ] Security implications are reviewed -- [ ] Error handling is appropriate - -## 📞 Getting Help - -- **Issues**: [GitHub Issues](https://github.com/mlperf/inference-endpoint/issues) -- **Discussions**: [GitHub Discussions](https://github.com/mlperf/inference-endpoint/discussions) -- **Documentation**: Check this guide and project docs -- **Team**: Reach out to the development team - -## 🎯 Next Steps -1. 
**Set up your environment** using this guide -2. **Explore the codebase** to understand the architecture -3. **Pick a component** to work on from the project board -4. **Start with tests** to understand the expected behavior -5. **Implement incrementally** with regular testing -6. **Ask questions** when you need help +## Getting Help -Happy coding! 🚀 +- **Issues**: [GitHub Issues](https://github.com/mlcommons/endpoints/issues) +- **Project Board**: [Q2 Board](https://github.com/orgs/mlcommons/projects/57) +- **Documentation**: See [docs/](.) directory for guides From 149cce0096cf2b151686fc4ab28cffce83f8bd88 Mon Sep 17 00:00:00 2001 From: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com> Date: Mon, 13 Apr 2026 18:01:41 -0500 Subject: [PATCH 14/14] Update dependencies Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com> --- pyproject.toml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pyproject.toml b/pyproject.toml index 19fa129d..67dfc865 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -47,7 +47,7 @@ dependencies = [ "transformers==5.4.0", "numpy==2.4.4", "datasets==4.8.4", - "Pillow==12.1.1", + "Pillow==12.2.0", "sentencepiece==0.2.1", "protobuf==7.34.1", "openai_harmony==0.0.8", @@ -82,7 +82,7 @@ test = [ # Includes optional dependencies for full test coverage "inference-endpoint[sql]", # Testing framework - "pytest==9.0.2", + "pytest==9.0.3", "pytest-asyncio==1.3.0", "pytest-cov==7.1.0", "pytest-benchmark==5.2.3",