From 060e9f612db01bab87bdeb618242851f258fdc44 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 13:39:09 -0700 Subject: [PATCH 01/14] docs: add project management design spec for labels, board, templates, and CONTRIBUTING.md Co-Authored-By: Claude Opus 4.6 (1M context) --- .../2026-04-07-project-management-design.md | 586 ++++++++++++++++++ 1 file changed, 586 insertions(+) create mode 100644 docs/superpowers/specs/2026-04-07-project-management-design.md diff --git a/docs/superpowers/specs/2026-04-07-project-management-design.md b/docs/superpowers/specs/2026-04-07-project-management-design.md new file mode 100644 index 00000000..b2d37156 --- /dev/null +++ b/docs/superpowers/specs/2026-04-07-project-management-design.md @@ -0,0 +1,586 @@ +# Project Management Design: Labels, Board, Templates, and CONTRIBUTING.md + +**Date:** 2026-04-07 +**Author:** Zhihan Jiang (nvzhihanj) +**Status:** Draft + +## Context + +The mlcommons/endpoints repository has 57 open issues with inconsistent labeling, +no issue templates, a minimal CONTRIBUTING.md, and no active project board. The +project has 3-4 core contributors (NVIDIA) and growing community participation +(Intel, MLCommons, external). The goal is to establish project management +infrastructure that serves the **broader MLCommons community** as the primary +audience — making it easy for external contributors to self-serve, pick up issues, +and understand the project roadmap. + +### Research Basis + +This design is informed by analysis of label taxonomies and project management +practices from: Kubernetes, PyTorch, vLLM, Ray, SGLang, MLCommons/inference, +and guidance from opensource.guide, GitHub Docs, CNCF, and Linux Foundation. + +### Phased Approach + +- **Phase 1 (now):** Labels, board, templates, CONTRIBUTING.md, issue migration +- **Phase 2 (when issue volume > 100 or contributors > 10):** Size/effort labels, + stale bot automation, iteration/sprint fields, disable blank issues + +--- + +## 1. 
Label Taxonomy (~28 labels) + +### Design Principles + +- **Prefixed naming** (`type:`, `priority:`, `area:`, `status:`) for filterability + and visual grouping — inspired by Ray and PyTorch +- **Coarse area labels** (7) grouping related modules — start coarse, split later +- **Severity-gradient colors** for priority — hotter = more urgent +- **Single color family** per label category for visual coherence + +### Type Labels + +| Label | Color | Description | +|-------|-------|-------------| +| `type: bug` | `#d73a4a` | Something isn't working | +| `type: feature` | `#a2eeef` | New feature or capability | +| `type: enhancement` | `#bfd4f2` | Improvement to existing functionality | +| `type: performance` | `#3ddd26` | Performance regression or improvement | +| `type: documentation` | `#0075ca` | Documentation only | +| `type: question` | `#d876e3` | Usage question or clarification | +| `type: RFC` | `#76fde7` | Request for comments / design proposal | +| `type: chore` | `#ededed` | Maintenance, deps, CI, tooling | + +### Priority Labels + +| Label | Color | Description | +|-------|-------|-------------| +| `priority: ShowStopper` | `#000000` | Drop everything — critical blocker, all hands on deck | +| `priority: P0` | `#b60205` | Critical — blocks release or users | +| `priority: P1` | `#d93f0b` | High — must address this cycle | +| `priority: P2` | `#fbca04` | Medium — address within quarter | +| `priority: P3` | `#0e8a16` | Low — backlog, nice to have | + +### Area Labels + +| Label | Color | Description | +|-------|-------|-------------| +| `area: core-engine` | `#c5def5` | Load generator, scheduler, async utils | +| `area: client` | `#c5def5` | Endpoint client, HTTP, transport, ZMQ | +| `area: metrics` | `#c5def5` | Event recorder, metrics reporter, reporting | +| `area: dataset` | `#c5def5` | Dataset manager, formats, predefined datasets | +| `area: config-cli` | `#c5def5` | Config schema, CLI commands, YAML | +| `area: evaluation` | `#c5def5` | Accuracy 
evaluation, scoring, extractors | +| `area: adapters` | `#c5def5` | OpenAI, SGLang protocol adapters | + +### Status Labels + +| Label | Color | Description | +|-------|-------|-------------| +| `status: needs-triage` | `#e99695` | New issue, awaiting review | +| `status: needs-info` | `#f9d0c4` | Awaiting more details from reporter | +| `status: blocked` | `#b60205` | Blocked on external dependency or decision | + +### Community Labels (keep existing) + +| Label | Color | Description | +|-------|-------|-------------| +| `good first issue` | `#7057ff` | Good for newcomers | +| `help wanted` | `#008672` | Extra attention needed | + +### Other (keep existing) + +| Label | Color | Description | +|-------|-------|-------------| +| `mlcommons` | `#e0703c` | MLCommons ruleset/submission integration | +| `dependencies` | `#9083cd` | Dependency updates | +| `security` | `#b60205` | Security vulnerability or hardening | +| `duplicate` | `#cfd3d7` | Duplicate issue | +| `invalid` | `#e4e669` | Not valid | +| `wontfix` | `#ffffff` | Will not be worked on | + +### Labels to Remove + +These are replaced by the prefixed equivalents above: + +| Old Label | Replaced By | +|-----------|-------------| +| `bug` | `type: bug` | +| `feature` | `type: feature` | +| `enhancement` | `type: enhancement` | +| `documentation` | `type: documentation` | +| `performance` | `type: performance` | +| `question` | `type: question` | +| `P0` | `priority: P0` | +| `P1` | `priority: P1` | +| `P2` | `priority: P2` | +| `ShowStopper` | `priority: ShowStopper` | +| `testing` | `type: chore` (context-dependent) | +| `accuracy` | `area: evaluation` | +| `dataset` | `area: dataset` | +| `Roadmap` | `type: RFC` | +| `blocked` | `status: blocked` | +| `rules` | `mlcommons` | +| `MLCommons` | `mlcommons` (lowercase) | + +--- + +## 2. 
Project Board #57 Structure + +### Status Columns + +``` +Inbox → Triage → Ready → In Progress → In Review → Done +``` + +| Column | Purpose | Entry Criteria | +|--------|---------|----------------| +| **Inbox** | New issues land here automatically | Auto-added when issue opened | +| **Triage** | Being evaluated for priority/area/assignee | Someone picked it up to review | +| **Ready** | Triaged, prioritized, ready to work on | Has priority + area labels | +| **In Progress** | Actively being worked on | Assigned, PR may be in flight | +| **In Review** | PR submitted, awaiting review | Linked PR exists | +| **Done** | Merged/resolved/closed | Auto-set when issue closed | + +### Custom Fields + +| Field | Type | Values | +|-------|------|--------| +| Priority | Single select | ShowStopper, P0, P1, P2, P3 | +| Area | Single select | core-engine, client, metrics, dataset, config-cli, evaluation, adapters, mlcommons | +| Target Release | Single select | v0.5.0, v1.0.0 (add as needed) | + +### Views (4) + +**1. Kanban (default)** +- Layout: Board +- Columns: Status field +- Group by: Priority (ShowStopper at top → P3 at bottom) +- Filter: status ≠ Done + +**2. Priority Table** +- Layout: Table +- Sort: Priority ascending (ShowStopper first), then updated date descending +- Columns: Title, Priority, Area, Status, Assignee, Target Release +- Filter: status ≠ Done + +**3. By Assignee** +- Layout: Table +- Group by: Assignee +- Sort: Priority ascending within each group +- Columns: Title, Priority, Area, Status +- Filter: status ≠ Done + +**4. 
Stale Issues** +- Layout: Table +- Sort: Updated date ascending (oldest first) +- Columns: Title, Priority, Area, Status, Assignee, Last Updated +- Filter: status ≠ Done AND last updated more than 30 days ago + +### Automations + +| Trigger | Action | +|---------|--------| +| Issue added to project | Set status → Inbox | +| Issue closed | Set status → Done | +| PR merged closing issue | Set status → Done | +| Item in Done 14+ days | Auto-archive | + +--- + +## 3. Issue Templates + +### Files + +- `.github/ISSUE_TEMPLATE/100-bug-report.yml` — Bug Report +- `.github/ISSUE_TEMPLATE/200-feature-request.yml` — Feature Request +- `.github/ISSUE_TEMPLATE/300-performance.yml` — Performance Issue +- `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` — Dataset Integration +- `.github/ISSUE_TEMPLATE/config.yml` — Template chooser config + +### 100-bug-report.yml + +```yaml +name: Bug Report +description: Report a bug or unexpected behavior +title: "[Bug]: " +labels: ["type: bug", "status: needs-triage"] +body: + - type: textarea + id: description + attributes: + label: Bug Description + description: What happened vs. what you expected + placeholder: "When I run X, I expected Y but got Z" + validations: + required: true + - type: textarea + id: reproduction + attributes: + label: Steps to Reproduce + value: | + 1. + 2. + 3. 
+ validations: + required: true + - type: textarea + id: environment + attributes: + label: Environment + description: OS, Python version, package version + placeholder: "OS: Ubuntu 22.04, Python 3.12, inference-endpoint v0.1.0" + validations: + required: true + - type: textarea + id: logs + attributes: + label: Relevant Logs + render: shell + - type: checkboxes + id: checklist + attributes: + label: Before submitting + options: + - label: I searched existing issues and found no duplicates + required: true +``` + +### 200-feature-request.yml + +```yaml +name: Feature Request +description: Suggest a new feature or enhancement +title: "[Feature]: " +labels: ["type: feature", "status: needs-triage"] +body: + - type: textarea + id: motivation + attributes: + label: Motivation + description: What problem does this solve? Why do you need it? + validations: + required: true + - type: textarea + id: proposal + attributes: + label: Proposed Solution + description: How should this work? Include API sketches if relevant. + validations: + required: true + - type: textarea + id: alternatives + attributes: + label: Alternatives Considered + - type: textarea + id: context + attributes: + label: Additional Context +``` + +### 300-performance.yml + +```yaml +name: Performance Issue +description: Report a performance regression or improvement opportunity +title: "[Perf]: " +labels: ["type: performance", "status: needs-triage"] +body: + - type: textarea + id: description + attributes: + label: Description + description: What performance issue did you observe? + placeholder: "QPS dropped from X to Y after upgrading to version Z" + validations: + required: true + - type: textarea + id: benchmark + attributes: + label: Benchmark Command + description: The exact command you ran + render: shell + validations: + required: true + - type: textarea + id: results + attributes: + label: Results + description: Expected vs actual numbers (QPS, latency, TTFT, TPOT, etc.) 
+ placeholder: | + Expected: ~5000 QPS, p99 latency < 200ms + Actual: ~2000 QPS, p99 latency 800ms + validations: + required: true + - type: textarea + id: environment + attributes: + label: Environment + description: Hardware, OS, Python version, endpoint server details + placeholder: | + Hardware: 8x A100 80GB + OS: Ubuntu 22.04 + Python: 3.12 + Server: vLLM 0.6.0, Llama-3-70B + Workers: 4 + validations: + required: true + - type: textarea + id: profiling + attributes: + label: Profiling Data (optional) + description: Any profiling output, flame graphs, or bottleneck analysis + render: shell + - type: checkboxes + id: checklist + attributes: + label: Before submitting + options: + - label: I searched existing issues and found no duplicates + required: true + - label: I ran with default settings before tuning + required: false +``` + +### 400-dataset-integration.yml + +```yaml +name: Dataset Integration +description: Request support for a new dataset or evaluation benchmark +title: "[Dataset]: " +labels: ["type: feature", "area: dataset", "status: needs-triage"] +body: + - type: textarea + id: dataset + attributes: + label: Dataset Information + description: Name, URL, and brief description + placeholder: | + Name: MATH-500 + URL: https://huggingface.co/datasets/... + Description: 500 competition math problems for testing reasoning + validations: + required: true + - type: dropdown + id: format + attributes: + label: Dataset Format + options: + - JSONL + - HuggingFace Dataset + - CSV + - JSON + - Parquet + - Other + validations: + required: true + - type: textarea + id: evaluation + attributes: + label: Evaluation Method + description: How should responses be scored? 
+ placeholder: "Exact match after extracting boxed answer, or pass@1 for code" + validations: + required: true + - type: textarea + id: samples + attributes: + label: Scale + description: Number of samples, expected prompt/response lengths + placeholder: "500 samples, avg prompt ~200 tokens, avg response ~500 tokens" + - type: textarea + id: context + attributes: + label: Additional Context + description: Related benchmarks, papers, or prior art +``` + +### config.yml + +```yaml +blank_issues_enabled: true +contact_links: + - name: Questions & Discussion + url: https://github.com/mlcommons/endpoints/discussions + about: Ask questions and discuss ideas before filing an issue +``` + +--- + +## 4. CONTRIBUTING.md + +Replace the existing minimal CONTRIBUTING.md with an expanded version (~250 lines) +covering: + +1. **Ways to Contribute** — links to all 4 issue templates, plus docs, PR reviews, + `good first issue` and `help wanted` labels +2. **Development Setup** — prerequisites, fork/clone, venv, `pip install -e ".[dev,test]"`, + pre-commit install, local echo server testing +3. **Code Style and Conventions** — ruff, mypy, line length 88, double quotes, + conventional commits, license headers, serialization conventions + (msgspec vs pydantic), performance-sensitive code guidelines +4. **Testing** — pytest commands, markers (`unit`, `integration`, `slow`, + `performance`), `@pytest.mark.asyncio(mode="strict")`, >90% coverage target, + use real fixtures over mocks +5. **Submitting Changes** — branch naming (`feat/`, `fix/`, `docs/`), PR template, + CI checks, review expectations (2-3 business days), review criteria +6. **Issue Guidelines** — search first, use templates, issue lifecycle + (Inbox → Triage → Ready → In Progress → In Review → Done), priority levels table +7. **MLCommons CLA** — existing CLA requirements preserved + +--- + +## 5. 
Issue Migration Plan + +### Duplicate Resolution + +Close duplicates with a comment explaining the closure and linking to the primary +issue. Copy any unique context from the duplicate into a comment on the primary +issue so no information is lost. + +| Close | Primary | Reason | +|-------|---------|--------| +| #205 "fully async benchmark" | #255 "Make Loadgen Async" | Same goal, #255 is cleaner | +| #170 "warmup with random dataset" | #86 "Warmup runs" | Subset of #86 | +| #226 "Initial multi-turn enabling" | #232 "multi-turn implementation" | Same feature | +| #29 "submission checker for 6.0" | #79 "submission checker compat mode" | #29 is version-specific, superseded | +| #207 "speedup tokenizer report" | #208 "optimize report generation" | #207 is a specific approach to #208 | +| #83 "Q1 Roadmap" | #223 "Phase 2 Roadmap" | Superseded | + +**Evaluation:** #73 "random dataset support" — keep if random dataset has value +beyond warmup use case; otherwise close as duplicate of #86. + +### Label Reassignment + +All 57 open issues are reassigned from old labels to the new prefixed taxonomy. +Full mapping follows, organized by priority tier. 
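Mechanically, each row in the mapping corresponds to a single REST call: the "set labels for an issue" endpoint (`PUT /repos/{owner}/{repo}/issues/{number}/labels`) replaces the issue's entire label set, so legacy labels drop off in the same operation. A minimal sketch, using issue #86 and its target labels from the mapping below; the `DRY_RUN` guard is illustrative, not part of the plan:

```shell
# Build the JSON payload for one issue's new label set, then (optionally) apply it.
# PUT replaces all existing labels, so old labels are removed in the same call.
REPO="mlcommons/endpoints"
ISSUE=86
PAYLOAD=$(python3 -c 'import json, sys; print(json.dumps({"labels": sys.argv[1:]}))' \
  "priority: P0" "type: feature" "area: core-engine")
echo "$PAYLOAD"

if [ "${DRY_RUN:-1}" -eq 0 ]; then   # set DRY_RUN=0 to actually relabel
  TOKEN=$(gh auth token 2>&1)
  curl -s -X PUT \
    -H "Authorization: token $TOKEN" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/$REPO/issues/$ISSUE/labels" \
    -d "$PAYLOAD"
fi
```

With `DRY_RUN` left at its default the script only prints the payload; setting `DRY_RUN=0` performs the live call.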
+ +#### ShowStopper + +| # | Title | Labels | +|---|-------|--------| +| 84 | Pareto clarification | `priority: ShowStopper`, `area: config-cli`, `mlcommons` | +| 8 | Parity with MLPerf LoadGen | `priority: ShowStopper`, `type: performance`, `area: core-engine` | +| 4 | Accuracy evaluation for LLMs | `priority: ShowStopper`, `type: feature`, `area: evaluation` | + +#### P0 + +| # | Title | Labels | +|---|-------|--------| +| 86 | Warmup runs | `priority: P0`, `type: feature`, `area: core-engine` | +| 183 | Pub/Sub event recorder | `priority: P0`, `type: feature`, `area: metrics` | +| 138 | CI stress test upper bound | `priority: P0`, `type: chore`, `area: core-engine` | +| 6 | Final report structure | `priority: P0`, `type: feature`, `area: metrics` | +| 5 | Submission ruleset + config | `priority: P0`, `type: feature`, `area: config-cli`, `mlcommons` | + +#### P1 + +| # | Title | Labels | +|---|-------|--------| +| 9 | Roofline analysis | `priority: P1`, `type: performance`, `area: core-engine` | +| 255 | Make Loadgen Async | `priority: P1`, `type: feature`, `area: core-engine` | +| 269 | Low concurrency timeouts | `priority: P1`, `type: bug`, `area: client` | +| 237 | CLI fix --load-pattern + --target-qps | `priority: P1`, `type: bug`, `area: config-cli` | +| 219 | target_qps hardcoded in Offline | `priority: P1`, `type: bug`, `area: config-cli` | +| 221 | RuntimeSettings non-reproducible | `priority: P1`, `type: bug`, `area: config-cli` | +| 202 | max_throughput connection timeouts | `priority: P1`, `type: bug`, `area: client` | +| 199 | Perf discrepancy submission vs perf config | `priority: P1`, `type: bug`, `area: config-cli` | +| 217 | BURST and STEP load patterns | `priority: P1`, `type: feature`, `area: core-engine` | +| 222 | KVStore/ServiceLauncher lack tests | `priority: P1`, `type: chore`, `area: core-engine` | +| 220 | SGLang adapter tests skipped | `priority: P1`, `type: chore`, `area: adapters` | +| 182 | Text vs token perf on TRTLLM | `priority: 
P1`, `type: performance`, `area: metrics` | +| 179 | Humanity's Last Exam | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | +| 178 | Healthbench integration | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | +| 177 | MATH500 dataset | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | +| 176 | MMLU/MMLU-Pro | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | +| 173 | Investigate mlcr failures | `priority: P1`, `type: bug`, `mlcommons` | +| 113 | DeepSeek | `priority: P1`, `type: feature` | +| 210 | Wan2.2-T2V support | `priority: P1`, `type: feature` | +| 10 | System bottleneck tests | `priority: P1`, `type: performance`, `area: core-engine` | +| 7 | Runtime visualization | `priority: P1`, `type: feature`, `area: metrics` | + +#### P2 + +| # | Title | Labels | +|---|-------|--------| +| 268 | Phase 2 model selection | `priority: P2`, `type: feature` | +| 254 | Handling failed requests | `priority: P2`, `type: feature`, `area: client` | +| 232 | Multi-turn implementation | `priority: P2`, `type: feature`, `area: dataset` | +| 224 | Multiple perf configs | `priority: P2`, `type: feature`, `area: config-cli` | +| 208 | Optimize report generation | `priority: P2`, `type: performance`, `area: metrics` | +| 158 | SGLang adapter + OpenAI compat | `priority: P2`, `type: feature`, `area: adapters` | +| 125 | Multi-concurrency scans | `priority: P2`, `type: feature`, `area: core-engine` | +| 115 | Clarify default metric | `priority: P2`, `type: enhancement`, `area: config-cli` | +| 79 | Submission checker compat mode | `priority: P2`, `type: feature`, `mlcommons` | +| 73 | Random dataset support | `priority: P2`, `type: feature`, `area: dataset` | +| 68 | Official model name mapping | `priority: P2`, `type: feature`, `area: config-cli`, `mlcommons` | +| 58 | Config-template mapping | `priority: P2`, `type: feature`, `area: config-cli`, `mlcommons` | +| 213 | PostGres dup element | 
`priority: P2`, `type: bug`, `mlcommons` | +| 133 | llama.cpp incompatibility | `priority: P2`, `type: bug`, `area: client` | +| 174 | Better error logging mlcr | `priority: P2`, `type: enhancement`, `mlcommons` | +| 229 | Endpoints test environment | `priority: P2`, `type: chore` | +| 228 | Endpoints Vision document | `priority: P2`, `type: documentation` | +| 227 | DB and Object Store elements | `priority: P2`, `type: feature` | +| 212 | UBI Storage layer | `priority: P2`, `type: feature` | + +#### P3 + +| # | Title | Labels | +|---|-------|--------| +| 99 | Local mode errors | `priority: P3`, `type: bug`, `good first issue` | +| 50 | LlaMa3-405b support | `priority: P3`, `type: feature` | +| 204 | Documentation cleanup | `priority: P3`, `type: documentation` | +| 190 | Skills, design docs, tooling | `priority: P3`, `type: chore` | +| 181 | Sweep qwen scripts | `priority: P3`, `type: feature` | + +#### Other (no priority) + +| # | Title | Labels | +|---|-------|--------| +| 223 | Phase 2 Roadmap | `type: RFC` | +| 267 | Bump transformers | `type: chore`, `dependencies`, `security` | + +### Q2 Board Population + +**Add to board #57 (~40 issues):** All ShowStopper, P0, P1, and P2 issues. +Initial status: **Triage** (existing issues need priority confirmation from team). + +**Not on Q2 board (~5 issues):** P3 issues (#99, #50, #204, #190, #181) and +dependabot (#267). + +### Milestones + +Create milestones as releases are planned: +- `v0.5.0` — first milestone, assign issues as release scope is defined +- `v1.0.0` — future + +--- + +## 6. Phase 2 (Future) + +Trigger when issue volume > 100 or contributors > 10: + +- Add `size: S`, `size: M`, `size: L`, `size: XL` effort labels +- Disable blank issues in `config.yml` +- Add stale bot (apply `status: stale` after 90 days, close after 30 more) +- Add iteration/sprint fields to board if team adopts time-boxed cycles +- Split coarse area labels if any accumulates > 20 issues + +--- + +## 7. 
Migration Procedure + +Order of operations for the migration: + +1. **Create new labels** — all `type:`, `priority:`, `area:`, `status:` labels +2. **Relabel existing issues** — apply new labels per the mapping above +3. **Remove old labels from issues** — strip legacy labels +4. **Close duplicates** — comment with explanation + link to primary, copy unique + context to primary issue +5. **Delete old labels** — remove legacy labels from the repository +6. **Add issues to board #57** — all ShowStopper through P2 +7. **Set board status** — all migrated issues start in Triage +8. **Configure board automations** — auto-add, auto-done, auto-archive +9. **Create issue templates** — add all 4 YAML templates + config.yml +10. **Update CONTRIBUTING.md** — replace with expanded version +11. **Commit and push** — templates + CONTRIBUTING.md in a single PR From d76e0100eaddd5818b0eafee355999b01afb0137 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 14:44:21 -0700 Subject: [PATCH 02/14] docs: update design spec with priority corrections, PR linkages, and dedup cleanup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Priority changes: #217→P2, #178→P2, #179→P2, #173→P2, #268→P1, #232→P0, #9→P1 Added open PR to issue linkage table. Co-Authored-By: Claude Opus 4.6 (1M context) --- .../2026-04-07-project-management-design.md | 33 +++++++++++++++---- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/docs/superpowers/specs/2026-04-07-project-management-design.md b/docs/superpowers/specs/2026-04-07-project-management-design.md index b2d37156..43e5d446 100644 --- a/docs/superpowers/specs/2026-04-07-project-management-design.md +++ b/docs/superpowers/specs/2026-04-07-project-management-design.md @@ -468,6 +468,7 @@ Full mapping follows, organized by priority tier. 
| # | Title | Labels | |---|-------|--------| | 86 | Warmup runs | `priority: P0`, `type: feature`, `area: core-engine` | +| 232 | Multi-turn implementation | `priority: P0`, `type: feature`, `area: dataset` | | 183 | Pub/Sub event recorder | `priority: P0`, `type: feature`, `area: metrics` | | 138 | CI stress test upper bound | `priority: P0`, `type: chore`, `area: core-engine` | | 6 | Final report structure | `priority: P0`, `type: feature`, `area: metrics` | @@ -485,17 +486,14 @@ Full mapping follows, organized by priority tier. | 221 | RuntimeSettings non-reproducible | `priority: P1`, `type: bug`, `area: config-cli` | | 202 | max_throughput connection timeouts | `priority: P1`, `type: bug`, `area: client` | | 199 | Perf discrepancy submission vs perf config | `priority: P1`, `type: bug`, `area: config-cli` | -| 217 | BURST and STEP load patterns | `priority: P1`, `type: feature`, `area: core-engine` | | 222 | KVStore/ServiceLauncher lack tests | `priority: P1`, `type: chore`, `area: core-engine` | | 220 | SGLang adapter tests skipped | `priority: P1`, `type: chore`, `area: adapters` | | 182 | Text vs token perf on TRTLLM | `priority: P1`, `type: performance`, `area: metrics` | -| 179 | Humanity's Last Exam | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | -| 178 | Healthbench integration | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | | 177 | MATH500 dataset | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | | 176 | MMLU/MMLU-Pro | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | -| 173 | Investigate mlcr failures | `priority: P1`, `type: bug`, `mlcommons` | | 113 | DeepSeek | `priority: P1`, `type: feature` | | 210 | Wan2.2-T2V support | `priority: P1`, `type: feature` | +| 268 | Phase 2 model selection | `priority: P1`, `type: feature` | | 10 | System bottleneck tests | `priority: P1`, `type: performance`, `area: core-engine` | | 7 | Runtime visualization | `priority: 
P1`, `type: feature`, `area: metrics` | @@ -503,9 +501,11 @@ Full mapping follows, organized by priority tier. | # | Title | Labels | |---|-------|--------| -| 268 | Phase 2 model selection | `priority: P2`, `type: feature` | | 254 | Handling failed requests | `priority: P2`, `type: feature`, `area: client` | -| 232 | Multi-turn implementation | `priority: P2`, `type: feature`, `area: dataset` | +| 217 | BURST and STEP load patterns | `priority: P2`, `type: feature`, `area: core-engine` | +| 179 | Humanity's Last Exam | `priority: P2`, `type: feature`, `area: evaluation`, `area: dataset` | +| 178 | Healthbench integration | `priority: P2`, `type: feature`, `area: evaluation`, `area: dataset` | +| 173 | Investigate mlcr failures | `priority: P2`, `type: bug`, `mlcommons` | | 224 | Multiple perf configs | `priority: P2`, `type: feature`, `area: config-cli` | | 208 | Optimize report generation | `priority: P2`, `type: performance`, `area: metrics` | | 158 | SGLang adapter + OpenAI compat | `priority: P2`, `type: feature`, `area: adapters` | @@ -583,4 +583,23 @@ Order of operations for the migration: 8. **Configure board automations** — auto-add, auto-done, auto-archive 9. **Create issue templates** — add all 4 YAML templates + config.yml 10. **Update CONTRIBUTING.md** — replace with expanded version -11. **Commit and push** — templates + CONTRIBUTING.md in a single PR +11. **Link open PRs to issues** — add "Relates to #N" comments where applicable +12. 
**Commit and push** — templates + CONTRIBUTING.md in a single PR + +### Open PR → Issue Linkages + +| PR | Linked Issue | Relationship | +|----|-------------|--------------| +| #255 Make Loadgen Async | #255 (same) | PR is the issue | +| #237 CLI fix --load-pattern + --target-qps | #237 (same) | PR is the issue | +| #226 Initial multi-turn enabling | #232 multi-turn implementation | PR implements #232; #226 issue closed as dup | +| #207 Speedup tokenizer report | #208 optimize report generation | PR implements #208; #207 issue closed as dup | +| #205 Fully async benchmark | #255 Make Loadgen Async | Duplicate PR; #205 issue closed as dup | +| #204 Documentation cleanup | #204 (same) | PR is the issue | +| #190 Skills, design docs, tooling | #190 (same) | PR is the issue | +| #181 Sweep qwen scripts | #181 (same) | PR is the issue | +| #170 Warmup with random dataset | #86 Warmup runs | PR implements #86; #170 issue closed as dup | +| #158 SGLang adapter + OpenAI compat | #158 (same) | PR is the issue | +| #125 Multi-concurrency scans | #125 (same) | PR is the issue | +| #79 Submission checker compat | #79 (same) + #29 (superseded) | PR is the issue | +| #267 Bump transformers | #267 (dependabot) | PR is the issue | From cdf2ed329f3b044719e21ca7586b6ac389494ec0 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 14:51:51 -0700 Subject: [PATCH 03/14] docs: add project management implementation plan 13-task plan covering labels, board, templates, CONTRIBUTING.md, issue migration, duplicate closure, PR linkages, and board automation setup. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .../plans/2026-04-07-project-management.md | 1092 +++++++++++++++++ 1 file changed, 1092 insertions(+) create mode 100644 docs/superpowers/plans/2026-04-07-project-management.md diff --git a/docs/superpowers/plans/2026-04-07-project-management.md b/docs/superpowers/plans/2026-04-07-project-management.md new file mode 100644 index 00000000..5dff6134 --- /dev/null +++ b/docs/superpowers/plans/2026-04-07-project-management.md @@ -0,0 +1,1092 @@ +# Project Management Infrastructure Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Set up labels, project board, issue templates, CONTRIBUTING.md, and migrate all 57 open issues for the mlcommons/endpoints GitHub repository. + +**Architecture:** All GitHub API interactions use `curl` with auth token (the `gh` CLI has TLS certificate issues in this environment). Board configuration uses the GitHub GraphQL API for Projects V2. File changes (templates, CONTRIBUTING.md) are committed locally and pushed as a PR. + +**Tech Stack:** GitHub REST API, GitHub GraphQL API, curl, bash, git + +**IMPORTANT — API access pattern:** The `gh` CLI cannot make API calls due to TLS errors. Every API call must use this pattern: +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" "https://api.github.com/..." +``` +For GraphQL: +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{"query":"..."}' +``` + +**IMPORTANT — Label names with colons:** GitHub label names containing spaces and colons must be URL-encoded in REST API paths. For example, `type: bug` becomes `type%3A%20bug` in URLs. 
When creating labels via POST body (JSON), use the literal name. + +--- + +## File Structure + +No new source code files. Changes are: + +- **Create:** `.github/ISSUE_TEMPLATE/100-bug-report.yml` +- **Create:** `.github/ISSUE_TEMPLATE/200-feature-request.yml` +- **Create:** `.github/ISSUE_TEMPLATE/300-performance.yml` +- **Create:** `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` +- **Create:** `.github/ISSUE_TEMPLATE/config.yml` +- **Modify:** `CONTRIBUTING.md` (full rewrite) + +All other changes are GitHub API operations (labels, board, issues) — no local files. + +--- + +### Task 1: Create New Labels + +Create all 23 new labels on the repository via the REST API. Existing labels that are being kept (`good first issue`, `help wanted`, `dependencies`, `security`, `duplicate`, `invalid`, `wontfix`) are untouched. The `mlcommons` label needs to be created fresh (the old `MLCommons` with capital M will be removed later). + +**Files:** None (API only) + +- [ ] **Step 1: Create all type labels** + +Run this script. 
It creates 8 type labels:
+
+```bash
+TOKEN=$(gh auth token 2>&1)
+REPO="mlcommons/endpoints"
+
+for label_json in \
+  '{"name":"type: bug","color":"d73a4a","description":"Something isn'\''t working"}' \
+  '{"name":"type: feature","color":"a2eeef","description":"New feature or capability"}' \
+  '{"name":"type: enhancement","color":"bfd4f2","description":"Improvement to existing functionality"}' \
+  '{"name":"type: performance","color":"3ddd26","description":"Performance regression or improvement"}' \
+  '{"name":"type: documentation","color":"0075ca","description":"Documentation only"}' \
+  '{"name":"type: question","color":"d876e3","description":"Usage question or clarification"}' \
+  '{"name":"type: RFC","color":"76fde7","description":"Request for comments / design proposal"}' \
+  '{"name":"type: chore","color":"ededed","description":"Maintenance, deps, CI, tooling"}'; do
+  echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')"
+  curl -s -X POST \
+    -H "Authorization: token $TOKEN" \
+    -H "Accept: application/vnd.github+json" \
+    "https://api.github.com/repos/$REPO/labels" \
+    -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(" -> " + str(d.get("name", d.get("message", "error"))))'
+done
+```
+
+Expected: 8 lines showing each label name created successfully.
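The URL-encoding rule from the access-pattern notes can be sanity-checked offline; later cleanup steps that DELETE or PATCH these labels will need the encoded name in the request path. A small sketch, where the `encode_label` helper is illustrative rather than part of the plan:

```shell
# URL-encode a label name for use in a REST API path segment.
# safe="" forces the colon and the space to be percent-encoded too.
encode_label() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$1"
}

encode_label "type: bug"      # prints type%3A%20bug
encode_label "priority: P0"   # prints priority%3A%20P0
```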
+ +- [ ] **Step 2: Create all priority labels** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +for label_json in \ + '{"name":"priority: ShowStopper","color":"000000","description":"Drop everything — critical blocker, all hands on deck"}' \ + '{"name":"priority: P0","color":"b60205","description":"Critical — blocks release or users"}' \ + '{"name":"priority: P1","color":"d93f0b","description":"High — must address this cycle"}' \ + '{"name":"priority: P2","color":"fbca04","description":"Medium — address within quarter"}' \ + '{"name":"priority: P3","color":"0e8a16","description":"Low — backlog, nice to have"}'; do + echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')" + curl -s -X POST \ + -H "Authorization: token $TOKEN" \ + -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/labels" \ + -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(f" -> {d.get(\"name\", d.get(\"message\", \"error\"))}")' +done +``` + +Expected: 5 labels created. 
+ +- [ ] **Step 3: Create all area labels** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +for label_json in \ + '{"name":"area: core-engine","color":"c5def5","description":"Load generator, scheduler, async utils"}' \ + '{"name":"area: client","color":"c5def5","description":"Endpoint client, HTTP, transport, ZMQ"}' \ + '{"name":"area: metrics","color":"c5def5","description":"Event recorder, metrics reporter, reporting"}' \ + '{"name":"area: dataset","color":"c5def5","description":"Dataset manager, formats, predefined datasets"}' \ + '{"name":"area: config-cli","color":"c5def5","description":"Config schema, CLI commands, YAML"}' \ + '{"name":"area: evaluation","color":"c5def5","description":"Accuracy evaluation, scoring, extractors"}' \ + '{"name":"area: adapters","color":"c5def5","description":"OpenAI, SGLang protocol adapters"}'; do + echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')" + curl -s -X POST \ + -H "Authorization: token $TOKEN" \ + -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/labels" \ + -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(f" -> {d.get(\"name\", d.get(\"message\", \"error\"))}")' +done +``` + +Expected: 7 labels created. 
+ +- [ ] **Step 4: Create status labels and mlcommons label** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +for label_json in \ + '{"name":"status: needs-triage","color":"e99695","description":"New issue, awaiting review"}' \ + '{"name":"status: needs-info","color":"f9d0c4","description":"Awaiting more details from reporter"}' \ + '{"name":"status: blocked","color":"b60205","description":"Blocked on external dependency or decision"}' \ + '{"name":"mlcommons","color":"e0703c","description":"MLCommons ruleset/submission integration"}'; do + echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')" + curl -s -X POST \ + -H "Authorization: token $TOKEN" \ + -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/labels" \ + -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(f" -> {d.get(\"name\", d.get(\"message\", \"error\"))}")' +done +``` + +Expected: 4 labels created (mlcommons may say "already_exists" if the old `MLCommons` case-insensitively matches — if so, update it in a later step). + +- [ ] **Step 5: Verify all new labels exist** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" \ + "https://api.github.com/repos/mlcommons/endpoints/labels?per_page=100" | \ + python3 -c " +import sys, json +labels = json.load(sys.stdin) +names = sorted([l['name'] for l in labels]) +print(f'Total labels: {len(names)}') +for n in names: + print(f' {n}') +" +``` + +Expected: All new `type:`, `priority:`, `area:`, `status:` labels present alongside existing labels. + +--- + +### Task 2: Relabel All Open Issues + +Apply new labels and remove old labels for every open issue, following the spec's mapping exactly. This is done in batches by priority tier. + +**Files:** None (API only) + +**IMPORTANT:** The GitHub `PUT /repos/{owner}/{repo}/issues/{number}/labels` endpoint **replaces** all labels on an issue. 
So each call must include the complete set of new labels for that issue. + +- [ ] **Step 1: Relabel ShowStopper issues** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# #84 - Pareto clarification +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/84/labels" \ + -d '{"labels":["priority: ShowStopper","area: config-cli","mlcommons"]}' | python3 -c 'import sys,json; print(f"#84: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #8 - Parity with MLPerf LoadGen +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/8/labels" \ + -d '{"labels":["priority: ShowStopper","type: performance","area: core-engine"]}' | python3 -c 'import sys,json; print(f"#8: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #4 - Accuracy evaluation for LLMs +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/4/labels" \ + -d '{"labels":["priority: ShowStopper","type: feature","area: evaluation"]}' | python3 -c 'import sys,json; print(f"#4: {[l[\"name\"] for l in json.load(sys.stdin)]}")' +``` + +Expected: Each issue prints its new label set. 
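Because PUT replaces the whole set, any legacy label an issue should keep (for example `good first issue` on #99 or `dependencies` on #267) has to be restated in the payload. A small offline sketch of that rule; `KEEP` mirrors the kept-labels list from Task 1, and `replacement_labels` is a hypothetical helper, not part of any script above:

```python
import json

# Legacy labels this plan keeps (from Task 1); everything else is dropped
# when an issue is relabeled.
KEEP = {"good first issue", "help wanted", "dependencies", "security",
        "duplicate", "invalid", "wontfix"}

def replacement_labels(current, new):
    """Full label set for a PUT call: the new labels, plus any kept legacy
    labels already on the issue. All other existing labels are dropped,
    because PUT replaces the entire set."""
    kept = [label for label in current if label in KEEP]
    return json.dumps({"labels": new + kept})

# e.g. issue #99, currently labelled ["bug", "good first issue"]:
print(replacement_labels(["bug", "good first issue"],
                         ["priority: P3", "type: bug"]))
# -> {"labels": ["priority: P3", "type: bug", "good first issue"]}
```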
+ +- [ ] **Step 2: Relabel P0 issues** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# #86 - Warmup runs +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/86/labels" \ + -d '{"labels":["priority: P0","type: feature","area: core-engine"]}' | python3 -c 'import sys,json; print(f"#86: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #232 - Multi-turn implementation +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/232/labels" \ + -d '{"labels":["priority: P0","type: feature","area: dataset"]}' | python3 -c 'import sys,json; print(f"#232: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #183 - Pub/Sub event recorder +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/183/labels" \ + -d '{"labels":["priority: P0","type: feature","area: metrics"]}' | python3 -c 'import sys,json; print(f"#183: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #138 - CI stress test upper bound +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/138/labels" \ + -d '{"labels":["priority: P0","type: chore","area: core-engine"]}' | python3 -c 'import sys,json; print(f"#138: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #6 - Final report structure +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/6/labels" \ + -d '{"labels":["priority: P0","type: feature","area: metrics"]}' | python3 -c 'import sys,json; print(f"#6: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# #5 - Submission ruleset + config +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + 
"https://api.github.com/repos/$REPO/issues/5/labels" \ + -d '{"labels":["priority: P0","type: feature","area: config-cli","mlcommons"]}' | python3 -c 'import sys,json; print(f"#5: {[l[\"name\"] for l in json.load(sys.stdin)]}")' +``` + +Expected: 6 issues relabeled. + +- [ ] **Step 3: Relabel P1 issues** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +declare -A P1_LABELS +P1_LABELS[9]='["priority: P1","type: performance","area: core-engine"]' +P1_LABELS[255]='["priority: P1","type: feature","area: core-engine"]' +P1_LABELS[269]='["priority: P1","type: bug","area: client"]' +P1_LABELS[237]='["priority: P1","type: bug","area: config-cli"]' +P1_LABELS[219]='["priority: P1","type: bug","area: config-cli"]' +P1_LABELS[221]='["priority: P1","type: bug","area: config-cli"]' +P1_LABELS[202]='["priority: P1","type: bug","area: client"]' +P1_LABELS[199]='["priority: P1","type: bug","area: config-cli"]' +P1_LABELS[222]='["priority: P1","type: chore","area: core-engine"]' +P1_LABELS[220]='["priority: P1","type: chore","area: adapters"]' +P1_LABELS[182]='["priority: P1","type: performance","area: metrics"]' +P1_LABELS[177]='["priority: P1","type: feature","area: evaluation","area: dataset"]' +P1_LABELS[176]='["priority: P1","type: feature","area: evaluation","area: dataset"]' +P1_LABELS[113]='["priority: P1","type: feature"]' +P1_LABELS[210]='["priority: P1","type: feature"]' +P1_LABELS[268]='["priority: P1","type: feature"]' +P1_LABELS[10]='["priority: P1","type: performance","area: core-engine"]' +P1_LABELS[7]='["priority: P1","type: feature","area: metrics"]' + +for issue in "${!P1_LABELS[@]}"; do + curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/$issue/labels" \ + -d "{\"labels\":${P1_LABELS[$issue]}}" | python3 -c "import sys,json; print(f'#$issue: {[l[\"name\"] for l in json.load(sys.stdin)]}')" +done +``` + +Expected: 18 issues relabeled. 
+ +- [ ] **Step 4: Relabel P2 issues** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +declare -A P2_LABELS +P2_LABELS[254]='["priority: P2","type: feature","area: client"]' +P2_LABELS[217]='["priority: P2","type: feature","area: core-engine"]' +P2_LABELS[179]='["priority: P2","type: feature","area: evaluation","area: dataset"]' +P2_LABELS[178]='["priority: P2","type: feature","area: evaluation","area: dataset"]' +P2_LABELS[173]='["priority: P2","type: bug","mlcommons"]' +P2_LABELS[224]='["priority: P2","type: feature","area: config-cli"]' +P2_LABELS[208]='["priority: P2","type: performance","area: metrics"]' +P2_LABELS[158]='["priority: P2","type: feature","area: adapters"]' +P2_LABELS[125]='["priority: P2","type: feature","area: core-engine"]' +P2_LABELS[115]='["priority: P2","type: enhancement","area: config-cli"]' +P2_LABELS[79]='["priority: P2","type: feature","mlcommons"]' +P2_LABELS[73]='["priority: P2","type: feature","area: dataset"]' +P2_LABELS[68]='["priority: P2","type: feature","area: config-cli","mlcommons"]' +P2_LABELS[58]='["priority: P2","type: feature","area: config-cli","mlcommons"]' +P2_LABELS[213]='["priority: P2","type: bug","mlcommons"]' +P2_LABELS[133]='["priority: P2","type: bug","area: client"]' +P2_LABELS[174]='["priority: P2","type: enhancement","mlcommons"]' +P2_LABELS[229]='["priority: P2","type: chore"]' +P2_LABELS[228]='["priority: P2","type: documentation"]' +P2_LABELS[227]='["priority: P2","type: feature"]' +P2_LABELS[212]='["priority: P2","type: feature"]' + +for issue in "${!P2_LABELS[@]}"; do + curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/$issue/labels" \ + -d "{\"labels\":${P2_LABELS[$issue]}}" | python3 -c "import sys,json; print(f'#$issue: {[l[\"name\"] for l in json.load(sys.stdin)]}')" +done +``` + +Expected: 21 issues relabeled. 
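Hand-written mappings like the ones above are easy to get subtly wrong; a hedged offline sketch of a sanity check over the mapping (only a few representative entries are reproduced here; the invariant checked is exactly one `priority:` label per issue, and at most one `type:` label):

```python
# A few representative entries from the relabel steps above; the full mapping
# would list every open issue.
MAPPING = {
    84: ["priority: ShowStopper", "area: config-cli", "mlcommons"],
    86: ["priority: P0", "type: feature", "area: core-engine"],
    177: ["priority: P1", "type: feature", "area: evaluation", "area: dataset"],
    99: ["priority: P3", "type: bug", "good first issue"],
}

def check(mapping):
    """Return a list of human-readable problems; empty means the mapping holds."""
    problems = []
    for issue, labels in sorted(mapping.items()):
        n_priority = sum(label.startswith("priority: ") for label in labels)
        n_type = sum(label.startswith("type: ") for label in labels)
        if n_priority != 1:
            problems.append(f"#{issue}: expected 1 priority label, got {n_priority}")
        if n_type > 1:
            problems.append(f"#{issue}: more than one type label")
    return problems

print(check(MAPPING))  # -> []
```

Running this over the full mapping before firing the PUT calls catches copy-paste slips (a missing priority, doubled types) while they are still cheap to fix.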
+ +- [ ] **Step 5: Relabel P3 and other issues** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# P3 issues +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/99/labels" \ + -d '{"labels":["priority: P3","type: bug","good first issue"]}' | python3 -c 'import sys,json; print(f"#99: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/50/labels" \ + -d '{"labels":["priority: P3","type: feature"]}' | python3 -c 'import sys,json; print(f"#50: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/204/labels" \ + -d '{"labels":["priority: P3","type: documentation"]}' | python3 -c 'import sys,json; print(f"#204: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/190/labels" \ + -d '{"labels":["priority: P3","type: chore"]}' | python3 -c 'import sys,json; print(f"#190: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/181/labels" \ + -d '{"labels":["priority: P3","type: feature"]}' | python3 -c 'import sys,json; print(f"#181: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +# Other (no priority) +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/223/labels" \ + -d '{"labels":["type: RFC"]}' | python3 -c 'import sys,json; print(f"#223: {[l[\"name\"] for l in json.load(sys.stdin)]}")' + +curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: 
application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/267/labels" \ + -d '{"labels":["type: chore","dependencies","security"]}' | python3 -c 'import sys,json; print(f"#267: {[l[\"name\"] for l in json.load(sys.stdin)]}")' +``` + +Expected: 7 issues relabeled. + +- [ ] **Step 6: Verify relabeling — spot check 5 issues** + +```bash +TOKEN=$(gh auth token 2>&1) +for issue in 84 232 269 208 99; do + curl -s -H "Authorization: token $TOKEN" \ + "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \ + python3 -c "import sys,json; d=json.load(sys.stdin); print(f'#{d[\"number\"]} {d[\"title\"]}: {[l[\"name\"] for l in d[\"labels\"]]}')" +done +``` + +Expected: Each issue shows only its new prefixed labels. + +--- + +### Task 3: Close Duplicate Issues + +For each duplicate, first read its body to preserve unique context, then comment on the primary issue with that context, then close the duplicate with an explanation. + +**Files:** None (API only) + +- [ ] **Step 1: Close #205 as duplicate of #255 (async benchmark)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# Get #205 body for context preservation +BODY_205=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/205" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")') + +# Comment on primary #255 with context from #205 +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/255/comments" \ + -d "$(python3 -c " +import json +body = '''Context preserved from duplicate #205 (fully async benchmark): + +$BODY_205''' +print(json.dumps({'body': body})) +")" | python3 -c 'import sys,json; print(f"Commented on #255: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +# Comment on #205 explaining closure +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + 
"https://api.github.com/repos/$REPO/issues/205/comments" \ + -d '{"body":"Closing as duplicate of #255 (Make Loadgen Async). Both issues target the same goal of making the benchmark fully async. Unique context from this issue has been copied to #255."}' | python3 -c 'import sys,json; print(f"Commented on #205: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +# Close #205 +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/205" \ + -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print(f"#205 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +Expected: #205 closed, context preserved on #255. + +- [ ] **Step 2: Close #170 as duplicate of #86 (warmup)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +BODY_170=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/170" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")') + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/86/comments" \ + -d "$(python3 -c " +import json +body = '''Context preserved from duplicate #170 (warmup with random dataset): + +$BODY_170''' +print(json.dumps({'body': body})) +")" | python3 -c 'import sys,json; print(f"Commented on #86: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/170/comments" \ + -d '{"body":"Closing as duplicate of #86 (Warmup runs). This issue describes a specific warmup implementation approach (random dataset) which is a subset of #86. 
Unique context has been copied to #86."}' | python3 -c 'import sys,json; print(f"Commented on #170: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/170" \ + -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print(f"#170 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +- [ ] **Step 3: Close #226 as duplicate of #232 (multi-turn)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +BODY_226=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/226" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")') + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/232/comments" \ + -d "$(python3 -c " +import json +body = '''Context preserved from duplicate #226 (Initial multi-turn enabling): + +$BODY_226''' +print(json.dumps({'body': body})) +")" | python3 -c 'import sys,json; print(f"Commented on #232: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/226/comments" \ + -d '{"body":"Closing as duplicate of #232 (multi-turn implementation). Both track the same multi-turn feature. 
Unique context has been copied to #232."}' | python3 -c 'import sys,json; print(f"Commented on #226: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/226" \ + -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print(f"#226 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +- [ ] **Step 4: Close #29 as superseded by #79 (submission checker)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +BODY_29=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/29" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")') + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/79/comments" \ + -d "$(python3 -c " +import json +body = '''Context preserved from superseded #29 (submission checker for 6.0): + +$BODY_29''' +print(json.dumps({'body': body})) +")" | python3 -c 'import sys,json; print(f"Commented on #79: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/29/comments" \ + -d '{"body":"Closing as superseded by #79 (submission checker compatibility mode). #29 was version-specific (6.0) while #79 covers the general compatibility feature. 
Context has been preserved on #79."}' | python3 -c 'import sys,json; print(f"Commented on #29: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/29" \ + -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print(f"#29 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +- [ ] **Step 5: Close #207 as duplicate of #208 (report generation)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +BODY_207=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/207" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")') + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/208/comments" \ + -d "$(python3 -c " +import json +body = '''Context preserved from duplicate #207 (speedup tokenizer report generation): + +$BODY_207''' +print(json.dumps({'body': body})) +")" | python3 -c 'import sys,json; print(f"Commented on #208: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/207/comments" \ + -d '{"body":"Closing as duplicate of #208 (optimize report generation). #207 describes a specific approach (parallel tokenization) to #208'\''s broader goal. 
Context has been preserved on #208."}' | python3 -c 'import sys,json; print(f"Commented on #207: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/207" \ + -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print(f"#207 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +- [ ] **Step 6: Close #83 as superseded by #223 (roadmap)** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/83/comments" \ + -d '{"body":"Closing as superseded by #223 (Phase 2 Roadmap). The Q1 roadmap is complete and Phase 2 planning has taken over."}' | python3 -c 'import sys,json; print(f"Commented on #83: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/83" \ + -d '{"state":"closed","state_reason":"completed"}' | python3 -c 'import sys,json; print(f"#83 state: {json.load(sys.stdin).get(\"state\",\"error\")}")' +``` + +--- + +### Task 4: Delete Legacy Labels + +Remove old labels that have been replaced. Only delete after all issues have been relabeled (Task 2 complete). + +**Files:** None (API only) + +- [ ] **Step 1: Delete all legacy labels** + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# URL-encode label names: spaces→%20, colons are fine in DELETE paths +for label in "bug" "feature" "enhancement" "documentation" "performance" "question" \ + "P0" "P1" "P2" "ShowStopper" "testing" "accuracy" "dataset" "Roadmap" "blocked" \ + "rules" "MLCommons"; do + encoded=$(python3 -c "import urllib.parse; print(urllib.parse.quote('$label'))") + echo -n "Deleting '$label'... 
" + STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X DELETE \ + -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/labels/$encoded") + if [ "$STATUS" = "204" ]; then echo "deleted"; elif [ "$STATUS" = "404" ]; then echo "not found (already gone)"; else echo "status $STATUS"; fi +done +``` + +Expected: Each label prints "deleted" or "not found". No errors. + +- [ ] **Step 2: Verify final label set** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" \ + "https://api.github.com/repos/mlcommons/endpoints/labels?per_page=100" | \ + python3 -c " +import sys, json +labels = json.load(sys.stdin) +names = sorted([l['name'] for l in labels]) +print(f'Total labels: {len(names)}') +for n in names: + print(f' {n}') +" +``` + +Expected: Only new prefixed labels plus kept labels (`good first issue`, `help wanted`, `mlcommons`, `dependencies`, `security`, `duplicate`, `invalid`, `wontfix`). No old labels remain. + +--- + +### Task 5: Configure Project Board #57 + +Set up the board with status field options, custom fields, and 4 views using the GraphQL API. + +**Files:** None (API only) + +**NOTE:** The board already exists with ID `PVT_kwDOBAnwDc4BTQvY`. We need to configure its fields and views. + +- [ ] **Step 1: Get the board's field IDs** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { fields(first: 20) { nodes { ... on ProjectV2Field { id name } ... on ProjectV2SingleSelectField { id name options { id name } } ... on ProjectV2IterationField { id name } } } } } }"}' | python3 -m json.tool +``` + +Expected: JSON listing all existing fields with their IDs. Look for the "Status" field and its current options. Record the Status field ID for next steps. 
+ +- [ ] **Step 2: Update the Status field with 6 options** + +Using the Status field ID from Step 1, update its options. The GraphQL mutation is `updateProjectV2Field`. First, clear existing options and set the 6 new ones. + +**Note:** You must adapt the field ID from Step 1's output. Replace `STATUS_FIELD_ID` below with the actual ID. + +```bash +TOKEN=$(gh auth token 2>&1) + +# Get current status field ID (adapt if needed) +STATUS_FIELD_ID="" + +# Update status field options using the updateProjectV2SingleSelectField mutation +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{ + "query": "mutation { updateProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", fieldId: \"'"$STATUS_FIELD_ID"'\", singleSelectOptions: [{name: \"Inbox\", color: GRAY}, {name: \"Triage\", color: YELLOW}, {name: \"Ready\", color: BLUE}, {name: \"In Progress\", color: ORANGE}, {name: \"In Review\", color: PURPLE}, {name: \"Done\", color: GREEN}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id options { id name } } } }" + }' | python3 -m json.tool +``` + +Expected: Returns the updated Status field with 6 options. + +- [ ] **Step 3: Create Priority custom field** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{ + "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Priority\", singleSelectOptions: [{name: \"ShowStopper\", color: RED}, {name: \"P0\", color: RED}, {name: \"P1\", color: ORANGE}, {name: \"P2\", color: YELLOW}, {name: \"P3\", color: GREEN}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }" + }' | python3 -m json.tool +``` + +Expected: Priority field created with 5 options. 
+ +- [ ] **Step 4: Create Area custom field** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{ + "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Area\", singleSelectOptions: [{name: \"core-engine\", color: BLUE}, {name: \"client\", color: BLUE}, {name: \"metrics\", color: BLUE}, {name: \"dataset\", color: BLUE}, {name: \"config-cli\", color: BLUE}, {name: \"evaluation\", color: BLUE}, {name: \"adapters\", color: BLUE}, {name: \"mlcommons\", color: PURPLE}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }" + }' | python3 -m json.tool +``` + +Expected: Area field created with 8 options. + +- [ ] **Step 5: Create Target Release custom field** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{ + "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Target Release\", singleSelectOptions: [{name: \"v0.5.0\", color: GRAY}, {name: \"v1.0.0\", color: GRAY}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }" + }' | python3 -m json.tool +``` + +Expected: Target Release field created. + +- [ ] **Step 6: Verify all fields exist** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + -X POST https://api.github.com/graphql \ + -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { fields(first: 20) { nodes { ... on ProjectV2Field { id name } ... 
on ProjectV2SingleSelectField { id name options { id name } } } } } } }"}' | python3 -m json.tool +``` + +Expected: Status (6 options), Priority (5 options), Area (8 options), Target Release (2 options) all present. + +--- + +### Task 6: Add Issues to Board #57 + +Add all ShowStopper through P2 issues (~40 after dedup) to the project board and set their status to Triage. + +**Files:** None (API only) + +- [ ] **Step 1: Get issue node IDs for all Q2 issues** + +We need the GraphQL node IDs for each issue to add them to the project. Batch-fetch them: + +```bash +TOKEN=$(gh auth token 2>&1) + +# All issue numbers to add to board (ShowStopper + P0 + P1 + P2) +ISSUES="84 8 4 86 232 183 138 6 5 9 255 269 237 219 221 202 199 222 220 182 177 176 113 210 268 10 7 254 217 179 178 173 224 208 158 125 115 79 73 68 58 213 133 174 229 228 227 212" + +for issue in $ISSUES; do + NODE_ID=$(curl -s -H "Authorization: token $TOKEN" \ + "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \ + python3 -c 'import sys,json; print(json.load(sys.stdin)["node_id"])') + echo "$issue $NODE_ID" +done +``` + +Expected: A list of issue numbers and their node IDs. Save this output — you'll need it for Step 2. + +- [ ] **Step 2: Add each issue to the project** + +For each issue, use the `addProjectV2ItemById` mutation. Process in batches to avoid rate limiting: + +```bash +TOKEN=$(gh auth token 2>&1) +PROJECT_ID="PVT_kwDOBAnwDc4BTQvY" + +# Use the node IDs from Step 1. 
Example for one issue: +# curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ +# -d '{"query":"mutation { addProjectV2ItemById(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", contentId: \"NODE_ID_HERE\"}) { item { id } } }"}' + +# Batch all issues: +ISSUES="84 8 4 86 232 183 138 6 5 9 255 269 237 219 221 202 199 222 220 182 177 176 113 210 268 10 7 254 217 179 178 173 224 208 158 125 115 79 73 68 58 213 133 174 229 228 227 212" + +for issue in $ISSUES; do + NODE_ID=$(curl -s -H "Authorization: token $TOKEN" \ + "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \ + python3 -c 'import sys,json; print(json.load(sys.stdin)["node_id"])') + + ITEM_ID=$(curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d "{\"query\":\"mutation { addProjectV2ItemById(input: {projectId: \\\"$PROJECT_ID\\\", contentId: \\\"$NODE_ID\\\"}) { item { id } } }\"}" | \ + python3 -c 'import sys,json; print(json.load(sys.stdin)["data"]["addProjectV2ItemById"]["item"]["id"])') + + echo "#$issue added: $ITEM_ID" + sleep 0.5 # Rate limit courtesy +done +``` + +Expected: Each issue prints its project item ID. All ~47 issues added. + +- [ ] **Step 3: Set all items to Triage status** + +After adding items, set their Status field to "Triage". You need the Status field ID and the "Triage" option ID from Task 5 Step 1/2. 
+ +```bash +TOKEN=$(gh auth token 2>&1) +PROJECT_ID="PVT_kwDOBAnwDc4BTQvY" +STATUS_FIELD_ID="" +TRIAGE_OPTION_ID="" + +# For each item added in Step 2, set status to Triage +# Use the item IDs printed in Step 2 +for ITEM_ID in ; do + curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d "{\"query\":\"mutation { updateProjectV2ItemFieldValue(input: {projectId: \\\"$PROJECT_ID\\\", itemId: \\\"$ITEM_ID\\\", fieldId: \\\"$STATUS_FIELD_ID\\\", value: {singleSelectOptionId: \\\"$TRIAGE_OPTION_ID\\\"}}) { projectV2Item { id } } }\"}" | \ + python3 -c 'import sys,json; d=json.load(sys.stdin); print(f"Set triage: {d}")' + sleep 0.3 +done +``` + +Expected: All items set to Triage status. + +- [ ] **Step 4: Verify board population** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { items(first: 100) { totalCount nodes { content { ... on Issue { number title } } } } } } }"}' | \ + python3 -c " +import sys, json +data = json.load(sys.stdin) +items = data['data']['node']['items'] +print(f'Total items on board: {items[\"totalCount\"]}') +for item in items['nodes']: + c = item['content'] + print(f' #{c[\"number\"]} {c[\"title\"]}') +" +``` + +Expected: ~47 issues listed on the board. + +--- + +### Task 7: Create Board Views + +Create the 4 views on the project board. The default view already exists (rename to Kanban); create 3 additional views. + +**Files:** None (API only) + +- [ ] **Step 1: List existing views** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { views(first: 10) { nodes { id name number layout } } } } }"}' | python3 -m json.tool +``` + +Expected: At least one default view. Record its ID. 
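Rather than eyeballing the `json.tool` dump, the ID can be captured directly. A sketch, assuming the project's only view so far is the default one (so `views(first: 1)` returns it), with fallbacks so the snippet degrades to an empty value when run offline:

```shell
TOKEN=$(gh auth token 2>&1 || true)

# Grab the first (default) view's ID for use in the next step
DEFAULT_VIEW_ID=$(curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { views(first: 1) { nodes { id } } } } }"}' | \
  python3 -c 'import sys,json; print(json.load(sys.stdin)["data"]["node"]["views"]["nodes"][0]["id"])' 2>/dev/null) \
  || DEFAULT_VIEW_ID=""
echo "DEFAULT_VIEW_ID=$DEFAULT_VIEW_ID"
```

If more than one view already exists, match on `name`/`number` in the Step 1 output instead of taking the first node blindly.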
+ +- [ ] **Step 2: Update default view to Kanban board layout** + +```bash +TOKEN=$(gh auth token 2>&1) +DEFAULT_VIEW_ID="" + +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d "{\"query\":\"mutation { updateProjectV2View(input: {projectId: \\\"PVT_kwDOBAnwDc4BTQvY\\\", viewId: \\\"$DEFAULT_VIEW_ID\\\", name: \\\"Kanban\\\", layout: BOARD_LAYOUT}) { projectV2View { id name layout } } }\"}" | python3 -m json.tool +``` + +Expected: Default view renamed to "Kanban" with BOARD_LAYOUT. + +- [ ] **Step 3: Create Priority Table view** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"Priority Table\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool +``` + +Expected: New "Priority Table" view created with TABLE_LAYOUT. + +- [ ] **Step 4: Create By Assignee view** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"By Assignee\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool +``` + +Expected: New "By Assignee" view created. + +- [ ] **Step 5: Create Stale Issues view** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"Stale Issues\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool +``` + +Expected: New "Stale Issues" view created. 
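The hand-escaped `\\\"` sequences in these mutation payloads are easy to get wrong. As an alternative, `json.dumps` can build the request body; a sketch for the Step 3 mutation (the `curl` line is left commented out so the snippet has no side effects):

```shell
# Build the createProjectV2View payload programmatically instead of hand-escaping quotes
PAYLOAD=$(python3 -c '
import json
name = "Priority Table"
query = ("mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", "
         f"name: \"{name}\", layout: TABLE_LAYOUT}}) {{ projectV2View {{ id name layout }} }} }}")
print(json.dumps({"query": query}))
')
echo "$PAYLOAD"
# TOKEN=$(gh auth token 2>&1)
# curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql -d "$PAYLOAD"
```

The same pattern applies to every mutation in Tasks 6 and 7.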
+ +- [ ] **Step 6: Verify all 4 views exist** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \ + -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { views(first: 10) { nodes { id name number layout } } } } }"}' | python3 -m json.tool +``` + +Expected: 4 views — Kanban (BOARD_LAYOUT), Priority Table (TABLE_LAYOUT), By Assignee (TABLE_LAYOUT), Stale Issues (TABLE_LAYOUT). + +**NOTE:** View-level sorting, grouping, and filtering must be configured manually in the GitHub web UI after views are created. The GraphQL API supports creating views and setting layout, but fine-grained sort/group/filter configuration is not fully exposed via API. After this task, open https://github.com/orgs/mlcommons/projects/57 and configure: +- Kanban: Group by Priority +- Priority Table: Sort by Priority field ascending +- By Assignee: Group by Assignee +- Stale Issues: Sort by Updated ascending, filter to items not updated in 30+ days + +--- + +### Task 8: Create Issue Templates + +Write the 4 YAML issue form templates and the config file to the local repo. + +**Files:** +- Create: `.github/ISSUE_TEMPLATE/100-bug-report.yml` +- Create: `.github/ISSUE_TEMPLATE/200-feature-request.yml` +- Create: `.github/ISSUE_TEMPLATE/300-performance.yml` +- Create: `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` +- Create: `.github/ISSUE_TEMPLATE/config.yml` + +- [ ] **Step 1: Create the ISSUE_TEMPLATE directory** + +```bash +mkdir -p .github/ISSUE_TEMPLATE +``` + +- [ ] **Step 2: Write 100-bug-report.yml** + +Write to `.github/ISSUE_TEMPLATE/100-bug-report.yml` with the exact content from the design spec Section 3, `100-bug-report.yml`. + +- [ ] **Step 3: Write 200-feature-request.yml** + +Write to `.github/ISSUE_TEMPLATE/200-feature-request.yml` with the exact content from the design spec Section 3, `200-feature-request.yml`. 
+ +- [ ] **Step 4: Write 300-performance.yml** + +Write to `.github/ISSUE_TEMPLATE/300-performance.yml` with the exact content from the design spec Section 3, `300-performance.yml`. + +- [ ] **Step 5: Write 400-dataset-integration.yml** + +Write to `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` with the exact content from the design spec Section 3, `400-dataset-integration.yml`. + +- [ ] **Step 6: Write config.yml** + +Write to `.github/ISSUE_TEMPLATE/config.yml`: + +```yaml +blank_issues_enabled: true +contact_links: + - name: Questions & Discussion + url: https://github.com/mlcommons/endpoints/discussions + about: Ask questions and discuss ideas before filing an issue +``` + +- [ ] **Step 7: Verify all template files exist** + +```bash +ls -la .github/ISSUE_TEMPLATE/ +``` + +Expected: 5 files — `100-bug-report.yml`, `200-feature-request.yml`, `300-performance.yml`, `400-dataset-integration.yml`, `config.yml`. + +- [ ] **Step 8: Commit issue templates** + +```bash +git add .github/ISSUE_TEMPLATE/ +git commit -m "chore: add issue templates (bug, feature, performance, dataset) + +Co-Authored-By: Claude Opus 4.6 (1M context) " +``` + +--- + +### Task 9: Update CONTRIBUTING.md + +Replace the existing 10-line CONTRIBUTING.md with the expanded ~250-line version. + +**Files:** +- Modify: `CONTRIBUTING.md` (full rewrite) + +- [ ] **Step 1: Write the new CONTRIBUTING.md** + +Write the full CONTRIBUTING.md content as designed in Section 4 of the spec. The full text was presented during brainstorming and approved. It includes these sections: + +1. Welcome and Table of Contents +2. Ways to Contribute (links to all 4 issue templates) +3. Development Setup (prerequisites, fork/clone, venv, pip install, pre-commit, echo server) +4. Code Style and Conventions (ruff, mypy, line length 88, conventional commits, serialization, performance-sensitive code) +5. Testing (pytest commands, markers, async mode, coverage, fixtures) +6. 
Submitting Changes (branch naming, PR process, review criteria) +7. Issue Guidelines (templates, lifecycle, priority levels table) +8. MLCommons CLA (existing CLA requirements preserved) +9. Questions section + +- [ ] **Step 2: Commit CONTRIBUTING.md** + +```bash +git add CONTRIBUTING.md +git commit -m "docs: expand CONTRIBUTING.md with development guide, testing, and issue guidelines + +Co-Authored-By: Claude Opus 4.6 (1M context) " +``` + +--- + +### Task 10: Link Open PRs to Issues + +Add comments on open PRs that implement issues different from their own number, creating explicit linkage. + +**Files:** None (API only) + +- [ ] **Step 1: Link PRs to their corresponding issues** + +Only PRs where the PR number differs from the issue it implements need explicit linking: + +```bash +TOKEN=$(gh auth token 2>&1) +REPO="mlcommons/endpoints" + +# PR #226 implements issue #232 (multi-turn) +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/226/comments" \ + -d '{"body":"Relates to #232 (multi-turn implementation). This PR provides the initial multi-turn enabling work tracked by #232."}' | python3 -c 'import sys,json; print(f"PR #226 linked to #232: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +# PR #207 implements issue #208 (report generation optimization) +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/207/comments" \ + -d '{"body":"Relates to #208 (optimize report generation). 
This PR implements parallel tokenization as one approach to #208."}' | python3 -c 'import sys,json; print(f"PR #207 linked to #208: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +# PR #170 implements issue #86 (warmup runs) +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/170/comments" \ + -d '{"body":"Relates to #86 (Warmup runs). This PR implements warmup with random dataset as part of #86."}' | python3 -c 'import sys,json; print(f"PR #170 linked to #86: {json.load(sys.stdin).get(\"id\",\"error\")}")' + +# PR #205 relates to issue #255 (Make Loadgen Async) +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/issues/205/comments" \ + -d '{"body":"Relates to #255 (Make Loadgen Async). Both this PR and #255 target the same async benchmark goal."}' | python3 -c 'import sys,json; print(f"PR #205 linked to #255: {json.load(sys.stdin).get(\"id\",\"error\")}")' +``` + +Expected: 4 comments posted linking PRs to their primary issues. + +--- + +### Task 11: Push and Create PR + +Push the local commits (issue templates + CONTRIBUTING.md) as a PR to the repository. + +**Files:** None (git operations) + +- [ ] **Step 1: Create a feature branch** + +```bash +git checkout -b chore/project-management-setup +``` + +- [ ] **Step 2: Cherry-pick the commits onto the branch** + +If you committed on main, reset main and cherry-pick onto the new branch. Otherwise if you're already on the branch, skip this. 
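One concrete sequence for that case, demonstrated in a throwaway repo so it is safe to dry-run. The `origin/main` ref and the two commits are simulated in the setup block; in the real checkout you would run only the three commands after it. Branching at the tip and rewinding `main` achieves the same result as an explicit cherry-pick:

```shell
# --- setup: throwaway repo standing in for a clone with two local-only commits on main
cd "$(mktemp -d)"
git init -q -b main
git config user.email "you@example.com" && git config user.name "you"
echo base > f.txt && git add f.txt && git commit -qm "base"
git update-ref refs/remotes/origin/main HEAD        # stand-in for the real origin/main
echo t > t.txt && git add t.txt && git commit -qm "chore: add issue templates"
echo c > c.txt && git add c.txt && git commit -qm "docs: expand CONTRIBUTING.md"

# --- recovery: the branch keeps both commits, main is rewound to the remote state
git checkout -q -b chore/project-management-setup   # new branch at the current tip
git branch -f main origin/main                      # rewind local main without checking it out
git log --oneline origin/main..HEAD                 # should list exactly the two commits
```

Nothing is lost if a step goes wrong: the commits stay reachable from the feature branch, and `git reflog` can recover the old `main` tip.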
+ +- [ ] **Step 3: Push to remote** + +```bash +git push -u origin chore/project-management-setup +``` + +- [ ] **Step 4: Create the PR** + +```bash +TOKEN=$(gh auth token 2>&1) +curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/mlcommons/endpoints/pulls" \ + -d '{ + "title": "chore: add issue templates, expand CONTRIBUTING.md, and project management setup", + "body": "## Summary\n\n- Add 4 YAML issue form templates (bug report, feature request, performance issue, dataset integration)\n- Expand CONTRIBUTING.md with development setup, code style, testing, PR process, and issue guidelines\n- Part of the project management infrastructure setup (labels, board, and issue migration done via API)\n\n## Related\n\nDesign spec: docs/superpowers/specs/2026-04-07-project-management-design.md\n\n## Test plan\n\n- [ ] Verify issue templates render correctly on GitHub (New Issue page)\n- [ ] Verify CONTRIBUTING.md renders correctly\n- [ ] Verify all links in CONTRIBUTING.md work\n\n🤖 Generated with [Claude Code](https://claude.com/claude-code)", + "head": "chore/project-management-setup", + "base": "main" + }' | python3 -c 'import sys,json; d=json.load(sys.stdin); print(f"PR created: {d.get(\"html_url\", d.get(\"message\", \"error\"))}")' +``` + +Expected: PR URL printed. + +--- + +### Task 12: Enable Board Automations + +Configure the built-in automations on project board #57 via the GitHub web UI. + +**Files:** None (manual UI configuration) + +**NOTE:** GitHub Projects V2 built-in automations (auto-add, auto-archive, auto-set status on close) are not configurable via the GraphQL API. They must be enabled manually. 
+ +- [ ] **Step 1: Open project settings** + +Navigate to: https://github.com/orgs/mlcommons/projects/57/settings + +- [ ] **Step 2: Enable "Auto-add" workflow** + +Under Workflows → Auto-add to project: +- Enable the workflow +- Filter: `is:issue is:open repo:mlcommons/endpoints` +- This ensures all new issues are automatically added to the board with Inbox status + +- [ ] **Step 3: Enable "Item closed" workflow** + +Under Workflows → Item closed: +- Enable the workflow +- Set status to: Done + +- [ ] **Step 4: Enable "Pull request merged" workflow** + +Under Workflows → Pull request merged: +- Enable the workflow +- Set status to: Done + +- [ ] **Step 5: Enable "Auto-archive items"** + +Under Workflows → Auto-archive items: +- Enable the workflow +- Archive items that have been Done for 14 days + +--- + +### Task 13: Configure Board Views in UI + +Fine-tune the sort, group, and filter settings for each view in the GitHub web UI. + +**Files:** None (manual UI configuration) + +- [ ] **Step 1: Configure Kanban view** + +Open: https://github.com/orgs/mlcommons/projects/57/views/1 +- Set layout to Board (should already be set) +- Column field: Status +- Group by: Priority (ShowStopper at top) +- Filter: `status:Inbox,Triage,Ready,"In Progress","In Review"` + +- [ ] **Step 2: Configure Priority Table view** + +Open the Priority Table view +- Sort by: Priority ascending (ShowStopper first) +- Show columns: Title, Priority, Area, Status, Assignee, Target Release +- Filter: exclude Done items + +- [ ] **Step 3: Configure By Assignee view** + +Open the By Assignee view +- Group by: Assignee +- Sort by: Priority ascending within each group +- Show columns: Title, Priority, Area, Status + +- [ ] **Step 4: Configure Stale Issues view** + +Open the Stale Issues view +- Sort by: Updated date ascending (oldest first) +- Show columns: Title, Priority, Area, Status, Assignee, Updated +- Filter: exclude Done, show only items not updated in 30+ days From 
b939e9c20c553bdf280c40656afb697324c2c590 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 15:08:39 -0700 Subject: [PATCH 04/14] chore: add issue templates (bug, feature, performance, dataset) Four YAML-based issue form templates: - 100-bug-report.yml: structured bug reporting - 200-feature-request.yml: feature proposals with motivation - 300-performance.yml: performance regressions with benchmark data - 400-dataset-integration.yml: new dataset/eval requests - config.yml: template chooser with Discussions link Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/ISSUE_TEMPLATE/100-bug-report.yml | 43 ++++++++++++++ .../ISSUE_TEMPLATE/200-feature-request.yml | 27 +++++++++ .github/ISSUE_TEMPLATE/300-performance.yml | 59 +++++++++++++++++++ .../400-dataset-integration.yml | 48 +++++++++++++++ .github/ISSUE_TEMPLATE/config.yml | 5 ++ 5 files changed, 182 insertions(+) create mode 100644 .github/ISSUE_TEMPLATE/100-bug-report.yml create mode 100644 .github/ISSUE_TEMPLATE/200-feature-request.yml create mode 100644 .github/ISSUE_TEMPLATE/300-performance.yml create mode 100644 .github/ISSUE_TEMPLATE/400-dataset-integration.yml create mode 100644 .github/ISSUE_TEMPLATE/config.yml diff --git a/.github/ISSUE_TEMPLATE/100-bug-report.yml b/.github/ISSUE_TEMPLATE/100-bug-report.yml new file mode 100644 index 00000000..4cf5b586 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/100-bug-report.yml @@ -0,0 +1,43 @@ +name: Bug Report +description: Report a bug or unexpected behavior +title: "[Bug]: " +labels: ["type: bug", "status: needs-triage"] +body: + - type: textarea + id: description + attributes: + label: Bug Description + description: What happened vs. what you expected + placeholder: "When I run X, I expected Y but got Z" + validations: + required: true + - type: textarea + id: reproduction + attributes: + label: Steps to Reproduce + value: | + 1. + 2. + 3. 
+ validations: + required: true + - type: textarea + id: environment + attributes: + label: Environment + description: OS, Python version, package version + placeholder: "OS: Ubuntu 22.04, Python 3.12, inference-endpoint v0.1.0" + validations: + required: true + - type: textarea + id: logs + attributes: + label: Relevant Logs + render: shell + - type: checkboxes + id: checklist + attributes: + label: Before submitting + options: + - label: I searched existing issues and found no duplicates + required: true diff --git a/.github/ISSUE_TEMPLATE/200-feature-request.yml b/.github/ISSUE_TEMPLATE/200-feature-request.yml new file mode 100644 index 00000000..3aa7de25 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/200-feature-request.yml @@ -0,0 +1,27 @@ +name: Feature Request +description: Suggest a new feature or enhancement +title: "[Feature]: " +labels: ["type: feature", "status: needs-triage"] +body: + - type: textarea + id: motivation + attributes: + label: Motivation + description: What problem does this solve? Why do you need it? + validations: + required: true + - type: textarea + id: proposal + attributes: + label: Proposed Solution + description: How should this work? Include API sketches if relevant. + validations: + required: true + - type: textarea + id: alternatives + attributes: + label: Alternatives Considered + - type: textarea + id: context + attributes: + label: Additional Context diff --git a/.github/ISSUE_TEMPLATE/300-performance.yml b/.github/ISSUE_TEMPLATE/300-performance.yml new file mode 100644 index 00000000..d2aa9007 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/300-performance.yml @@ -0,0 +1,59 @@ +name: Performance Issue +description: Report a performance regression or improvement opportunity +title: "[Perf]: " +labels: ["type: performance", "status: needs-triage"] +body: + - type: textarea + id: description + attributes: + label: Description + description: What performance issue did you observe? 
+ placeholder: "QPS dropped from X to Y after upgrading to version Z" + validations: + required: true + - type: textarea + id: benchmark + attributes: + label: Benchmark Command + description: The exact command you ran + render: shell + validations: + required: true + - type: textarea + id: results + attributes: + label: Results + description: Expected vs actual numbers (QPS, latency, TTFT, TPOT, etc.) + placeholder: | + Expected: ~5000 QPS, p99 latency < 200ms + Actual: ~2000 QPS, p99 latency 800ms + validations: + required: true + - type: textarea + id: environment + attributes: + label: Environment + description: Hardware, OS, Python version, endpoint server details + placeholder: | + Hardware: 8x A100 80GB + OS: Ubuntu 22.04 + Python: 3.12 + Server: vLLM 0.6.0, Llama-3-70B + Workers: 4 + validations: + required: true + - type: textarea + id: profiling + attributes: + label: Profiling Data (optional) + description: Any profiling output, flame graphs, or bottleneck analysis + render: shell + - type: checkboxes + id: checklist + attributes: + label: Before submitting + options: + - label: I searched existing issues and found no duplicates + required: true + - label: I ran with default settings before tuning + required: false diff --git a/.github/ISSUE_TEMPLATE/400-dataset-integration.yml b/.github/ISSUE_TEMPLATE/400-dataset-integration.yml new file mode 100644 index 00000000..67c6673f --- /dev/null +++ b/.github/ISSUE_TEMPLATE/400-dataset-integration.yml @@ -0,0 +1,48 @@ +name: Dataset Integration +description: Request support for a new dataset or evaluation benchmark +title: "[Dataset]: " +labels: ["type: feature", "area: dataset", "status: needs-triage"] +body: + - type: textarea + id: dataset + attributes: + label: Dataset Information + description: Name, URL, and brief description + placeholder: | + Name: MATH-500 + URL: https://huggingface.co/datasets/... 
+ Description: 500 competition math problems for testing reasoning + validations: + required: true + - type: dropdown + id: format + attributes: + label: Dataset Format + options: + - JSONL + - HuggingFace Dataset + - CSV + - JSON + - Parquet + - Other + validations: + required: true + - type: textarea + id: evaluation + attributes: + label: Evaluation Method + description: How should responses be scored? + placeholder: "Exact match after extracting boxed answer, or pass@1 for code" + validations: + required: true + - type: textarea + id: samples + attributes: + label: Scale + description: Number of samples, expected prompt/response lengths + placeholder: "500 samples, avg prompt ~200 tokens, avg response ~500 tokens" + - type: textarea + id: context + attributes: + label: Additional Context + description: Related benchmarks, papers, or prior art diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 00000000..4ac37a65 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1,5 @@ +blank_issues_enabled: true +contact_links: + - name: Questions & Discussion + url: https://github.com/mlcommons/endpoints/discussions + about: Ask questions and discuss ideas before filing an issue From 202dbc2e02ae9e0747102166baa926f0792a1a99 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 15:08:45 -0700 Subject: [PATCH 05/14] docs: expand CONTRIBUTING.md with development guide, testing, and issue guidelines Replace minimal 10-line CONTRIBUTING.md with comprehensive guide covering: - Ways to contribute with links to issue templates - Development setup (venv, pip install, pre-commit, echo server) - Code style (ruff, mypy, conventional commits, serialization) - Testing (pytest markers, async mode, coverage, fixtures) - PR process and review expectations - Issue lifecycle and priority levels - MLCommons CLA requirements Co-Authored-By: Claude Opus 4.6 (1M context) --- CONTRIBUTING.md | 214 
++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 208 insertions(+), 6 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8de1bbe9..8b264dcc 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,11 +1,213 @@ -## Contributing +# Contributing to MLPerf Inference Endpoints -The best way to contribute to the MLCommons is to get involved with one of our many project communities. You can find more information about getting involved with MLCommons [here](https://mlcommons.org/community/). +Welcome! We're glad you're interested in contributing. This project is part of +[MLCommons](https://mlcommons.org/) and aims to build a high-performance +benchmarking tool for LLM inference endpoints targeting 50k+ QPS. -Generally we encourage people to become MLCommons members if they wish to contribute to MLCommons projects, but outside pull requests are very welcome too. +## Table of Contents -Regardless of whether you are a member, your organization (or you as an individual contributor) needs to sign the MLCommons Contributor License Agreement (CLA). Please submit your GitHub username to the [MLCommons Subscription form](https://mlcommons.org/community/subscribe/) to start that process. +- [Ways to Contribute](#ways-to-contribute) +- [Development Setup](#development-setup) +- [Code Style and Conventions](#code-style-and-conventions) +- [Testing](#testing) +- [Submitting Changes](#submitting-changes) +- [Issue Guidelines](#issue-guidelines) +- [MLCommons CLA](#mlcommons-cla) -MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and issue a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that cla-bot and other checks pass for your pull requests. +## Ways to Contribute -For project-specific development standards (code style, test requirements, pre-commit hooks, commit format), see the [Development Guide](docs/DEVELOPMENT.md). 
+- **Report bugs** — use the [Bug Report](https://github.com/mlcommons/endpoints/issues/new?template=100-bug-report.yml) template +- **Request features** — use the [Feature Request](https://github.com/mlcommons/endpoints/issues/new?template=200-feature-request.yml) template +- **Report performance issues** — use the [Performance Issue](https://github.com/mlcommons/endpoints/issues/new?template=300-performance.yml) template +- **Request dataset support** — use the [Dataset Integration](https://github.com/mlcommons/endpoints/issues/new?template=400-dataset-integration.yml) template +- **Improve documentation** — fix typos, clarify guides, add examples +- **Pick up an issue** — look for [`good first issue`](https://github.com/mlcommons/endpoints/labels/good%20first%20issue) or [`help wanted`](https://github.com/mlcommons/endpoints/labels/help%20wanted) +- **Review PRs** — thoughtful reviews are as valuable as code + +## Development Setup + +### Prerequisites + +- Python 3.12+ (3.12 recommended) +- Git +- A Unix-like OS (Linux or macOS) + +### Getting Started + +```bash +# Fork and clone +git clone https://github.com//endpoints.git +cd endpoints + +# Create virtual environment +python3.12 -m venv venv +source venv/bin/activate + +# Install with dev and test extras +pip install -e ".[dev,test]" + +# Install pre-commit hooks +pre-commit install + +# Verify your setup +pytest -m unit -x --timeout=60 +``` + +### Local Testing with Echo Server + +```bash +# Start a local echo server +python -m inference_endpoint.testing.echo_server --port 8765 + +# Run a quick probe +inference-endpoint probe --endpoints http://localhost:8765 --model test-model +``` + +## Code Style and Conventions + +### Formatting and Linting + +We use [ruff](https://docs.astral.sh/ruff/) for formatting and linting, and +[mypy](https://mypy-lang.org/) for type checking. Pre-commit hooks enforce +these automatically. 
+ +```bash +# Run all checks manually +pre-commit run --all-files +``` + +### Key Conventions + +- **Line length:** 88 characters +- **Quotes:** Double quotes +- **License headers:** Required on all Python files (auto-added by pre-commit) +- **Commit messages:** [Conventional commits](https://www.conventionalcommits.org/) — `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:` +- **Comments:** Only where the *why* isn't obvious from the code. No over-documenting. + +### Serialization + +- **Hot-path data** (Query, QueryResult, StreamChunk): `msgspec.Struct` — encode/decode with `msgspec.json`, not stdlib json +- **Configuration**: `pydantic.BaseModel` for validation +- **Do not** use `dataclass` where neighboring types use `msgspec` + +### Performance-Sensitive Code + +Code in `load_generator/`, `endpoint_client/worker.py`, and `async_utils/transport/` +is latency-critical. In these paths: + +- No `match` statements — use dict dispatch +- Minimize async suspends +- No pydantic validation or excessive logging +- Use `msgspec` over `json`/`pydantic` for serialization + +## Testing + +### Running Tests + +```bash +# All tests (excludes slow/performance) +pytest + +# Unit tests only +pytest -m unit + +# Integration tests +pytest -m integration + +# Single file +pytest -xvs tests/unit/path/to/test_file.py + +# With coverage +pytest --cov=src --cov-report=html +``` + +### Test Markers + +Every test function **must** have a marker: + +```python +@pytest.mark.unit +@pytest.mark.asyncio(mode="strict") # for async tests — must use strict mode +async def test_something(): + ... +``` + +Available markers: `unit`, `integration`, `slow`, `performance`, `run_explicitly` + +### Coverage + +Target **>90% coverage** for all new code. Use existing fixtures from +`tests/conftest.py` (e.g., `mock_http_echo_server`, `mock_http_oracle_server`, +`dummy_dataset`) rather than mocking. 
+ +## Submitting Changes + +### Branch Naming + +``` +feat/short-description +fix/short-description +docs/short-description +``` + +### Pull Request Process + +1. **Create a focused PR** — one logical change per PR +2. **Fill out the PR template** — describe what, why, and how to test +3. **Ensure CI passes** — `pre-commit run --all-files` and `pytest -m unit` locally before pushing +4. **Link related issues** — use `Closes #123` or `Relates to #123` +5. **Expect review within 2-3 business days** — reviewers are auto-assigned based on changed files + +### What We Look For in Reviews + +- Does it follow existing patterns in the codebase? +- Are tests included and meaningful (not mock-heavy)? +- Is it focused — no unrelated refactoring or over-engineering? +- Does it avoid adding unnecessary dependencies? + +### After Review + +- Address feedback with new commits (don't force-push during review) +- Once approved, a maintainer will merge + +## Issue Guidelines + +### Before Filing + +1. Search [existing issues](https://github.com/mlcommons/endpoints/issues) for duplicates +2. Use the appropriate issue template +3. Provide enough detail to reproduce or understand the request + +### Issue Lifecycle + +New issues are auto-added to our [project board](https://github.com/orgs/mlcommons/projects/57) +and flow through: **Inbox → Triage → Ready → In Progress → In Review → Done** + +### Priority Levels + +| Priority | Meaning | +|----------|---------| +| **ShowStopper** | Drop everything — critical blocker | +| **P0** | Blocks release or users | +| **P1** | Must address this cycle | +| **P2** | Address within quarter | +| **P3** | Backlog, nice to have | + +## MLCommons CLA + +All contributors must sign the +[MLCommons Contributor License Agreement](https://mlcommons.org/membership/membership-overview/). +A CLA bot will check your PR automatically. + +To sign up: +1. Visit the [MLCommons Subscription form](https://mlcommons.org/membership/membership-overview/) +2. 
Submit your GitHub username +3. The CLA bot will verify on your next PR + +Pull requests from non-members are welcome — you'll be prompted to sign the CLA +during the PR process. + +## Questions? + +Open a [Discussion](https://github.com/mlcommons/endpoints/discussions) or +file an issue. We aim to respond within a few business days. From 971650038c6bc15c33c798ff1cfe070d1c1ac53e Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Tue, 7 Apr 2026 15:55:51 -0700 Subject: [PATCH 06/14] feat: add GitHub Action to sync labels to project board fields One-way sync: when priority: or area: labels change on an issue, the corresponding board custom field is automatically updated. Labels are the single source of truth. Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/workflows/sync-labels-to-board.yml | 150 +++++++++++++++++++++ 1 file changed, 150 insertions(+) create mode 100644 .github/workflows/sync-labels-to-board.yml diff --git a/.github/workflows/sync-labels-to-board.yml b/.github/workflows/sync-labels-to-board.yml new file mode 100644 index 00000000..8a3eaf83 --- /dev/null +++ b/.github/workflows/sync-labels-to-board.yml @@ -0,0 +1,150 @@ +name: Sync Labels to Project Board + +on: + issues: + types: [labeled, unlabeled] + +env: + PROJECT_ID: "PVT_kwDOBAnwDc4BTQvY" + # These IDs are populated from the board's GraphQL field configuration. + # To find them: query the board fields via GraphQL and extract option IDs. 
+ PRIORITY_FIELD_ID: "PVTSSF_lADOBAnwDc4BTQvYzhBKk68" + AREA_FIELD_ID: "PVTSSF_lADOBAnwDc4BTQvYzhBKk7A" + +jobs: + sync-labels: + runs-on: ubuntu-latest + steps: + - name: Sync priority and area labels to board fields + uses: actions/github-script@v7 + with: + script: | + const issue = context.payload.issue; + const labels = issue.labels.map(l => l.name); + + // --- Field and option ID mappings --- + // Priority field + const PRIORITY_FIELD_ID = process.env.PRIORITY_FIELD_ID; + const PRIORITY_MAP = { + 'priority: ShowStopper': process.env.SHOWSTOPPER_OPTION_ID, + 'priority: P0': process.env.P0_OPTION_ID, + 'priority: P1': process.env.P1_OPTION_ID, + 'priority: P2': process.env.P2_OPTION_ID, + 'priority: P3': process.env.P3_OPTION_ID, + }; + + // Area field + const AREA_FIELD_ID = process.env.AREA_FIELD_ID; + const AREA_MAP = { + 'area: core-engine': process.env.CORE_ENGINE_OPTION_ID, + 'area: client': process.env.CLIENT_OPTION_ID, + 'area: metrics': process.env.METRICS_OPTION_ID, + 'area: dataset': process.env.DATASET_OPTION_ID, + 'area: config-cli': process.env.CONFIG_CLI_OPTION_ID, + 'area: evaluation': process.env.EVALUATION_OPTION_ID, + 'area: adapters': process.env.ADAPTERS_OPTION_ID, + 'area: mlcommons': process.env.MLCOMMONS_OPTION_ID, + }; + + const PROJECT_ID = process.env.PROJECT_ID; + + // Find the board item for this issue + const findItemQuery = ` + query($projectId: ID!, $cursor: String) { + node(id: $projectId) { + ... on ProjectV2 { + items(first: 100, after: $cursor) { + nodes { + id + content { + ... 
on Issue { number } + } + } + pageInfo { hasNextPage endCursor } + } + } + } + } + `; + + let itemId = null; + let cursor = null; + while (!itemId) { + const result = await github.graphql(findItemQuery, { + projectId: PROJECT_ID, + cursor: cursor, + }); + const items = result.node.items; + const match = items.nodes.find( + n => n.content && n.content.number === issue.number + ); + if (match) { + itemId = match.id; + break; + } + if (!items.pageInfo.hasNextPage) break; + cursor = items.pageInfo.endCursor; + } + + if (!itemId) { + core.info(`Issue #${issue.number} not found on board, skipping.`); + return; + } + + // Helper to update a single-select field + async function setField(fieldId, optionId) { + if (!optionId) { + // Clear the field + await github.graphql(` + mutation($projectId: ID!, $itemId: ID!, $fieldId: ID!) { + clearProjectV2ItemFieldValue(input: { + projectId: $projectId, itemId: $itemId, fieldId: $fieldId + }) { projectV2Item { id } } + } + `, { projectId: PROJECT_ID, itemId, fieldId }); + } else { + await github.graphql(` + mutation($projectId: ID!, $itemId: ID!, $fieldId: ID!, $optionId: String!) { + updateProjectV2ItemFieldValue(input: { + projectId: $projectId, itemId: $itemId, fieldId: $fieldId, + value: { singleSelectOptionId: $optionId } + }) { projectV2Item { id } } + } + `, { projectId: PROJECT_ID, itemId, fieldId, optionId }); + } + } + + // Sync priority: find the highest-priority label on the issue + const priorityOrder = [ + 'priority: ShowStopper', + 'priority: P0', + 'priority: P1', + 'priority: P2', + 'priority: P3', + ]; + const activePriority = priorityOrder.find(p => labels.includes(p)); + const priorityOptionId = activePriority ? PRIORITY_MAP[activePriority] : null; + await setField(PRIORITY_FIELD_ID, priorityOptionId); + core.info(`Priority set to: ${activePriority || '(cleared)'}`); + + // Sync area: use the first area label found + const activeArea = labels.find(l => l.startsWith('area: ')); + const areaOptionId = activeArea ? 
AREA_MAP[activeArea] : null; + await setField(AREA_FIELD_ID, areaOptionId); + core.info(`Area set to: ${activeArea || '(cleared)'}`); + env: + PRIORITY_FIELD_ID: ${{ env.PRIORITY_FIELD_ID }} + AREA_FIELD_ID: ${{ env.AREA_FIELD_ID }} + SHOWSTOPPER_OPTION_ID: "26ab336c" + P0_OPTION_ID: "d3612dd9" + P1_OPTION_ID: "7ff45c96" + P2_OPTION_ID: "e41b2ee9" + P3_OPTION_ID: "d4d24170" + CORE_ENGINE_OPTION_ID: "db5c9511" + CLIENT_OPTION_ID: "ffeff676" + METRICS_OPTION_ID: "04637e5a" + DATASET_OPTION_ID: "b493fd0d" + CONFIG_CLI_OPTION_ID: "ae1f5588" + EVALUATION_OPTION_ID: "96e592b6" + ADAPTERS_OPTION_ID: "6c615274" + MLCOMMONS_OPTION_ID: "d5eff045" From 542466d8f5a9aab8c5850132f47b464d69068ae8 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Wed, 8 Apr 2026 09:58:16 -0700 Subject: [PATCH 07/14] chore: clean up repo structure and overhaul README MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Remove .cursor/rules/ (migrated to CLAUDE.md/AGENTS.md) - Remove docs/superpowers/ plans and specs (local-only artifacts) - Add .cursor/ and docs/superpowers/ to .gitignore - Overhaul README.md: remove emojis, remove inline contributor list (use git log/ATTRIBUTION instead), align architecture section with AGENTS.md, add badges, streamline to match OSS best practices - Contributors section removed — credit lives in git history and ATTRIBUTION file Co-Authored-By: Claude Opus 4.6 (1M context) --- .cursor/rules/endpoint-rules.mdc | 118 -- .cursor/rules/msgspec-patterns.mdc | 534 -------- .cursor/rules/python-antipatterns.mdc | 658 ---------- .gitignore | 9 +- README.md | 214 +--- .../plans/2026-04-07-project-management.md | 1092 ----------------- .../2026-04-07-project-management-design.md | 605 --------- 7 files changed, 71 insertions(+), 3159 deletions(-) delete mode 100644 .cursor/rules/endpoint-rules.mdc delete mode 100644 .cursor/rules/msgspec-patterns.mdc delete mode 100644 .cursor/rules/python-antipatterns.mdc delete mode 100644 
docs/superpowers/plans/2026-04-07-project-management.md
 delete mode 100644 docs/superpowers/specs/2026-04-07-project-management-design.md

diff --git a/.cursor/rules/endpoint-rules.mdc b/.cursor/rules/endpoint-rules.mdc
deleted file mode 100644
index aff2d460..00000000
--- a/.cursor/rules/endpoint-rules.mdc
+++ /dev/null
@@ -1,118 +0,0 @@
----
-description:
-globs:
-alwaysApply: true
----
-# Cursor Rules for Python Project Development
-
-## Core Development Principles
-
-### 1. Planning-First Development
-- **Strict Separation**: Implementation MUST NOT begin until planning for the current step is complete
-- All architectural decisions, component interfaces, and implementation approaches must be documented before coding
-- Each development cycle follows: Plan → Review Plan → Implement → Update Documentation
-
-### 2. Testing Requirements
-- **Mandatory Unit Tests**: Every new component that requires testing MUST have corresponding unit tests
-- **Pre-commit Validation**: All unit tests and pre-commit checks MUST pass before pushing to the main repository
-- **No Exceptions**: Failed tests or checks block all commits until resolved
-
-### 3. Scratchpad Documentation System
-All planning and tracking must be maintained in the `.cursor_artifacts/` directory. 
- -#### Required Files: -- `.cursor_artifacts/hierarchy.md` - Project folder structure, module organization, and architectural overview -- `.cursor_artifacts/progress.md` - Current status, completed tasks, next steps, and milestone tracking -- `.cursor_artifacts/learning.md` - Technical insights, lessons learned, design decisions, and gotchas -- `.cursor_artifacts/design.md` - System design, component interfaces, data models, and API specifications -- `.cursor_artifacts/testing-strategy.md` - Test plans, coverage requirements, and testing approaches -- `.cursor_artifacts/deployment.md` - Deployment procedures, environment configs, and release notes -- `.cursor_artifacts/refactoring-log.md` - Planned and completed refactoring activities with justifications, keep empty if there's no major refactoring - -#### File Management: -- **Size Limit**: Each scratchpad file MUST NOT exceed 1000 lines -- **Regular Maintenance**: Split large files into focused sub-documents when approaching limit -- **Consistent Updates**: Update relevant scratchpad files after each implementation phase - -### 4. Commit and Review Standards -- **Post-Implementation Updates**: Always update `.cursor_artifacts/` scratchpad files after each implementation -- **Small, Focused Changes**: Keep commits and reviews reasonably sized for effective review -- **Clear Commit Messages**: Use conventional commit format with clear descriptions -- **Documentation Sync**: Ensure documentation reflects current implementation state - -### 5. Python Best Practices -- Follow PEP 8 style guidelines and modern Python idioms -- Use type hints for all function signatures and complex variables -- Implement proper error handling with specific exception types -- Apply SOLID principles and clean code practices -- Use dataclasses, context managers, and pathlib where appropriate -- Follow async/await patterns for asynchronous code -- Implement proper logging instead of print statements - -### 6. 
Change Control and Approval -#### Automatic Approval (Small Changes): -- Bug fixes within existing functionality -- Adding unit tests -- Documentation updates -- Minor refactoring within single functions/methods -- Code formatting and style improvements - -#### User Approval Required (Significant Changes): -- **Major Refactoring**: Restructuring classes, modules, or architectural changes -- **API Changes**: Modifying public interfaces or breaking changes -- **Large Deletions**: Removing significant portions of existing code, documentation, or scratchpad content -- **New Dependencies**: Adding external libraries or changing build requirements -- **Database Schema Changes**: Migrations or structural data changes - -#### Approval Process: -1. Document proposed changes in appropriate `.cursor_artifacts/` file -2. Clearly outline impact, benefits, and risks -3. Request explicit user approval before implementation -4. Provide rollback plan for significant changes - -### 7. Comprehensive Testing Strategy -- **Test Coverage**: Aim for >90% code coverage for business logic -- **Test Types**: Unit tests, integration tests, and end-to-end tests as appropriate -- **Edge Cases**: Test boundary conditions, error scenarios, and edge cases -- **Test Documentation**: Clear test descriptions explaining what is being tested and why -- **Mock Strategy**: Use appropriate mocking for external dependencies -- **Performance Tests**: Include performance benchmarks for critical paths -- **Test Data**: Use factories or fixtures for consistent test data setup - -### 8. 
Additional Development Standards
-
-#### Code Quality:
-- Use static analysis tools (pylint, mypy, black, isort)
-- Implement pre-commit hooks for automated quality checks
-- Regular code reviews focusing on maintainability and performance
-- Document complex algorithms and business logic
-
-#### Version Control:
-- Use feature branches for all development work
-- Squash commits when merging to maintain a clean history
-- Tag releases with semantic versioning
-- Maintain a changelog with user-facing changes
-
-#### Security and Performance:
-- Validate all user inputs and sanitize outputs
-- Use secure coding practices (no hardcoded secrets, proper authentication)
-- Profile performance-critical code sections
-- Monitor and log security-relevant events
-
-#### Dependencies and Environment:
-- Pin dependency versions in requirements files
-- Use virtual environments for all development work
-- Document environment setup and deployment procedures
-- Regular dependency updates with testing
-
-## Enforcement
-These rules are mandatory for all development work. Violations should be caught in pre-commit hooks, code review, or the CI/CD pipeline. Any rule exceptions require explicit documentation and user approval.
-
-## Other user-defined rules
-- Always double-check the validity of the output; never hallucinate or lie about things that you don't know about.
-- Avoid refactoring the whole project, and always ask for permission before doing a major refactor.
-- Look for clues and never be lazy about validating the facts.
-- Be diligent in checking whether a component has already been implemented and can be reused. Avoid re-implementing wheels for parts that have already been built in the project. Think twice about whether the reused component actually fits the logic. If necessary, always use a single source of truth in the code repo (e.g. VERSION) instead of hardcoding it everywhere in the code
-- If the logic is incomplete in the code, add a comment about it. 
Don't just assume the user will dig and find it out.
-- Follow the best practices of whatever language you are writing in. For example, in Python, don't add a lazy import unless it has been carefully considered.
-- When running pytest, make sure you pipe the output either to the command line or to a file, so you don't need to run it repeatedly to grep for a failed test.
diff --git a/.cursor/rules/msgspec-patterns.mdc b/.cursor/rules/msgspec-patterns.mdc
deleted file mode 100644
index fa637ea9..00000000
--- a/.cursor/rules/msgspec-patterns.mdc
+++ /dev/null
@@ -1,534 +0,0 @@
----
-description: python performance critical code ; python msgspec usage guide
-alwaysApply: false
----
-## 2. Use Structs for Structured Data
-
-**Rule:** Always prefer `msgspec.Struct` over `dict`, `dataclasses`, or `attrs` for structured data with a known schema.
-
-**Why:** Structs are 5-60x faster for common operations and are optimized for encoding/decoding.
-
-```python
-# BAD: Using dict or dataclass
-from dataclasses import dataclass
-
-@dataclass
-class UserBad:
-    name: str
-    email: str
-    age: int
-
-# GOOD: Using msgspec.Struct
-import msgspec
-
-class User(msgspec.Struct):
-    name: str
-    email: str
-    age: int
-
-# Usage
-user = User(name="alice", email="alice@example.com", age=30)
-data = msgspec.json.encode(user)
-decoded = msgspec.json.decode(data, type=User)
-```
-
----
-
-## 3. Omit Default Values
-
-**Rule:** Set `omit_defaults=True` on Struct definitions when default values are known on both encoding and decoding ends.
-
-**Why:** Reduces encoded message size and improves both encoding and decoding performance.
-
-```python
-# BAD: Encoding all fields including defaults
-class ConfigBad(msgspec.Struct):
-    host: str = "localhost"
-    port: int = 8080
-    debug: bool = False
-    timeout: int = 30
-
-# GOOD: Omit default values
-class Config(msgspec.Struct, omit_defaults=True):
-    host: str = "localhost"
-    port: int = 8080
-    debug: bool = False
-    timeout: int = 30
-
-# Only non-default values are encoded
-config = Config(host="production.example.com")
-data = msgspec.json.encode(config)
-# Result: b'{"host":"production.example.com"}' instead of the full object
-```
-
----
-
-## 4. Avoid Decoding Unused Fields
-
-**Rule:** Define smaller "view" Struct types that only contain the fields you actually need.
-
-**Why:** msgspec skips decoding fields not defined in your Struct, reducing allocations and CPU time.
-
-```python
-# BAD: Decoding the entire large object when you only need a few fields
-class FullTweet(msgspec.Struct):
-    id: int
-    id_str: str
-    full_text: str
-    user: dict
-    entities: dict
-    extended_entities: dict
-    retweet_count: int
-    favorite_count: int
-    # ... many more fields
-
-# GOOD: Define minimal structs for your use case
-class User(msgspec.Struct):
-    name: str
-
-class TweetView(msgspec.Struct):
-    user: User
-    full_text: str
-    favorite_count: int
-
-# Only these 3 fields are decoded, the rest is skipped
-tweet = msgspec.json.decode(large_json_response, type=TweetView)
-print(tweet.user.name)  # Access only what you need
-```
-
----
-
-## 5. Use encode_into for Buffer Reuse
-
-**Rule:** Benchmark `Encoder.encode_into()` with a pre-allocated `bytearray` against `encode()` in hot loops, and prefer it where it wins.
-
-**Why:** Avoids allocating a new `bytes` object for each encode operation.
-
-```python
-# BAD: New bytes object allocated for each message
-def send_messages_bad(socket, msgs):
-    encoder = msgspec.msgpack.Encoder()
-    for msg in msgs:
-        data = encoder.encode(msg)  # New bytes object each time
-        socket.sendall(data)
-
-# POSSIBLY GOOD — ALWAYS MEASURE: Reuse a buffer
-def send_messages_good(socket, msgs):
-    encoder = msgspec.msgpack.Encoder()
-    buffer = bytearray(1024)  # Pre-allocate once
-
-    for msg in msgs:
-        encoder.encode_into(msg, buffer)  # Reuse the buffer (returns None; buffer resized to fit)
-        socket.sendall(buffer)  # The buffer now holds exactly the encoded message
-```
-
----
-
-## 6. Line-Delimited JSON (NDJSON)
-
-**Rule:** Benchmark `encode_into()` with `buffer.extend()` for line-delimited JSON to avoid copies.
-
-**Why:** Avoids unnecessary copying when appending newlines to JSON messages.
-
-```python
-# BAD: Unnecessary copy with bytes concatenation
-def write_ndjson_bad(file, messages):
-    for msg in messages:
-        json_msg = msgspec.json.encode(msg)
-        full_payload = json_msg + b'\n'  # Creates a copy
-        file.write(full_payload)
-
-# POSSIBLY GOOD — ALWAYS MEASURE: Avoid the copy with encode_into
-def write_ndjson_good(file, messages):
-    encoder = msgspec.json.Encoder()
-    buffer = bytearray(64)  # Pre-allocate with a reasonable size
-
-    for msg in messages:
-        encoder.encode_into(msg, buffer)  # Overwrites the buffer in place
-        buffer.extend(b"\n")  # Append the delimiter without copying the message
-        file.write(buffer)
-```
-
----
-
-## 7. Length-Prefix Framing
-
-**Rule:** Use `encode_into()` with an offset for length-prefix framing.
-
-**Why:** Efficiently prepends the message length without extra copies. 
-
-```python
-import msgspec
-
-def send_length_prefixed(socket, msg):
-    encoder = msgspec.msgpack.Encoder()
-    buffer = bytearray(64)
-
-    # Encode into the buffer, leaving 4 bytes at the front for the length prefix
-    encoder.encode_into(msg, buffer, 4)
-    n = len(buffer) - 4  # encode_into returns None; the buffer is resized to fit
-
-    # Write the message length as a 4-byte big-endian integer at the start
-    buffer[:4] = n.to_bytes(4, "big")
-
-    socket.sendall(buffer)
-
-async def prefixed_send(stream, buffer: bytes) -> None:
-    """Write a length-prefixed buffer to an async stream"""
-    prefix = len(buffer).to_bytes(4, "big")
-    stream.write(prefix)
-    stream.write(buffer)
-    await stream.drain()
-
-async def prefixed_recv(stream) -> bytes:
-    """Read a length-prefixed buffer from an async stream"""
-    prefix = await stream.readexactly(4)
-    n = int.from_bytes(prefix, "big")
-    return await stream.readexactly(n)
-```
-
----
-
-## 8. Use MessagePack Instead of JSON
-
-**Rule:** Consider using `msgspec.msgpack` instead of `msgspec.json` for internal APIs.
-
-**Why:** MessagePack is a more compact binary format and can be more performant than JSON.
-
-```python
-import msgspec
-
-class Event(msgspec.Struct):
-    type: str
-    data: dict
-    timestamp: float
-
-# Use MessagePack for internal service communication
-encoder = msgspec.msgpack.Encoder()
-decoder = msgspec.msgpack.Decoder(Event)
-
-event = Event(type="user_login", data={"user_id": 123}, timestamp=1703424000.0)
-packed = encoder.encode(event)  # More compact than JSON
-decoded = decoder.decode(packed)
-```
-
----
-
-## 9. Use gc=False for Long-Lived Objects
-
-**Rule:** Set `gc=False` on Struct types that will never participate in reference cycles and are long-lived.
-
-**Why:** Reduces garbage collector overhead and pause times by up to 75x.
-
-### What is gc=False?
-
-The `gc=False` option tells Python's garbage collector to never track instances of that Struct type.
-By default, Python's cyclic garbage collector tracks objects that could potentially participate in reference cycles. 
-When you set `gc=False`, you're telling msgspec: "I guarantee these objects will never be part of a reference cycle, so don't bother tracking them." - -### Performance Impact - -Key takeaways: -- `gc=False` reduces GC pause time by 75x compared to standard classes -- `gc=False` saves 16 bytes per instance (no GC header needed) -- Regular msgspec structs are already 6x faster for GC than standard classes - -### When to Use gc=False - -Use `gc=False` when: -- You're allocating a large number of Struct objects at once (e.g., decoding a large JSON response with thousands of items) -- You have long-lived Struct objects in memory (e.g., a large cache of data objects) -- Your Struct only contains scalar/primitive values (ints, floats, strings, bools, bytes) -- You are 100% certain the Struct will NEVER participate in a reference cycle - -DO NOT use `gc=False` when: -- Your Struct contains references to itself or other Structs (potential cycles) -- Your Struct is part of a parent-child relationship where parent references child and child references parent -- You're unsure whether cycles could occur - -ALWAYS MEASURE performance impact. - -### Decision Tree: Should I Use gc=False? - -``` -Should I use gc=False? -| -+-- Does your Struct only contain scalar types (int, float, str, bool, bytes)? -| +-- YES --> SAFE to use gc=False -| -+-- Does your Struct contain lists/dicts but YOU control what goes in them? -| +-- Will you EVER put the struct itself (or a parent) into those containers? -| +-- NO --> Probably safe, but test carefully -| +-- YES/MAYBE --> Do NOT use gc=False -| -+-- Does your Struct have a reference to another Struct of the same type? -| +-- YES --> Do NOT use gc=False (e.g., tree nodes, linked lists) -| -+-- Is your Struct part of a parent-child bidirectional relationship? 
-|   +-- YES --> Do NOT use gc=False
-|
-+-- When in doubt --> Do NOT use gc=False
-```
-
-### Examples
-
-```python
-# SAFE: Simple data objects with only scalar values
-class Point(msgspec.Struct, gc=False):
-    x: float
-    y: float
-    z: float
-
-class LogEntry(msgspec.Struct, gc=False):
-    timestamp: float
-    level: str
-    message: str
-    source: str
-
-class CacheEntry(msgspec.Struct, gc=False):
-    key: str
-    value: str
-    ttl: int
-    created_at: float
-
-# SAFE: Structs containing only tuples of scalars
-class Package(msgspec.Struct, gc=False):
-    name: str
-    version: str
-    depends: tuple[str, ...]  # immutable tuple of strings
-    size: int
-
-# UNSAFE: Self-referential structures - DO NOT use gc=False
-class TreeNode(msgspec.Struct):  # NO gc=False here!
-    value: int
-    children: list["TreeNode"]
-    parent: "TreeNode | None" = None
-```
-
-### Real-World Example: Decoding Large JSON
-
-```python
-import msgspec
-from typing import Union
-
-# When decoding large JSON files (like package repositories),
-# gc=False significantly improves performance
-class Package(msgspec.Struct, gc=False):
-    build: str
-    build_number: int
-    depends: tuple[str, ...]  # Use tuple, not list - immutable
-    md5: str
-    name: str
-    sha256: str
-    subdir: str
-    version: str
-    license: str = ""
-    noarch: Union[str, bool, None] = None
-    size: int = 0
-    timestamp: int = 0
-
-class RepoData(msgspec.Struct, gc=False):
-    repodata_version: int
-    info: dict
-    packages: dict[str, Package]
-    removed: tuple[str, ...]  # Use tuple, not list
-
-# Create a typed decoder for maximum performance
-decoder = msgspec.json.Decoder(RepoData)
-
-def load_repo_data(path: str) -> RepoData:
-    with open(path, "rb") as f:
-        return decoder.decode(f.read())
-```
-
-## 10. Use array_like=True for Maximum Performance
-
-**Rule:** Set `array_like=True` when both ends know the field schema and you need maximum performance.
-
-**Why:** Encodes structs as arrays instead of objects, removing field names from the message. 
-
-```python
-# Standard encoding includes field names
-class PointStandard(msgspec.Struct):
-    x: float
-    y: float
-    z: float
-
-# Encodes as: b'{"x":1.0,"y":2.0,"z":3.0}'
-
-# Array-like encoding removes field names
-class Point(msgspec.Struct, array_like=True):
-    x: float
-    y: float
-    z: float
-
-point = Point(1.0, 2.0, 3.0)
-data = msgspec.json.encode(point)
-# Result: b'[1.0,2.0,3.0]' - smaller and faster
-
-decoded = msgspec.json.decode(data, type=Point)
-# Works correctly: Point(x=1.0, y=2.0, z=3.0)
-```
-
----
-
-## 11. Tagged Unions for Polymorphic Types
-
-**Rule:** Use `tag=True` on Struct types when handling multiple message types in a single union.
-
-**Why:** Enables efficient discrimination between types during decoding.
-
-```python
-import msgspec
-
-# Define request types with tagging
-class GetRequest(msgspec.Struct, tag=True):
-    key: str
-
-class PutRequest(msgspec.Struct, tag=True):
-    key: str
-    value: str
-
-class DeleteRequest(msgspec.Struct, tag=True):
-    key: str
-
-class ListRequest(msgspec.Struct, tag=True):
-    prefix: str = ""
-
-# Union type for all requests
-Request = GetRequest | PutRequest | DeleteRequest | ListRequest
-
-# Single decoder handles all types
-decoder = msgspec.msgpack.Decoder(Request)
-
-# Decoding automatically determines the correct type
-data = msgspec.msgpack.encode(PutRequest(key="foo", value="bar"))
-request = decoder.decode(data)
-
-match request:
-    case GetRequest(key):
-        print(f"Get: {key}")
-    case PutRequest(key, value):
-        print(f"Put: {key}={value}")
-    case DeleteRequest(key):
-        print(f"Delete: {key}")
-    case ListRequest(prefix):
-        print(f"List: {prefix}")
-```
-
----
-
-## 12. Use Struct Configuration Options
-
-**Rule:** Combine Struct options for cleaner, more robust code.
-
-```python
-import msgspec
-
-class Base(
-    msgspec.Struct,
-    omit_defaults=True,          # Don't encode default values
-    forbid_unknown_fields=True,  # Error on unknown fields (good for config files)
-    rename="kebab",              # Use kebab-case in JSON (my_field -> my-field)
-):
-    """Base class with common configuration."""
-    pass
-
-class ServerConfig(Base):
-    host: str = "localhost"
-    port: int = 8080
-    max_connections: int = 100
-    enable_ssl: bool = False
-
-# Decodes kebab-case JSON: {"host": "prod", "max-connections": 500}
-config = msgspec.json.decode(
-    b'{"host":"prod","max-connections":500}',
-    type=ServerConfig
-)
-# Result: ServerConfig(host='prod', port=8080, max_connections=500, enable_ssl=False)
-```
-
----
-
-## 13. TOML Configuration Files
-
-**Rule:** Use msgspec for parsing pyproject.toml and other TOML config files with validation.
-
-```python
-import msgspec
-from typing import Any
-
-class BuildSystem(msgspec.Struct, omit_defaults=True, rename="kebab"):
-    requires: list[str] = []
-    build_backend: str | None = None
-
-class Project(msgspec.Struct, omit_defaults=True, rename="kebab"):
-    name: str | None = None
-    version: str | None = None
-    description: str | None = None
-    requires_python: str | None = None
-    dependencies: list[str] = []
-
-class PyProject(msgspec.Struct, omit_defaults=True, rename="kebab"):
-    build_system: BuildSystem | None = None
-    project: Project | None = None
-    tool: dict[str, dict[str, Any]] = {}
-
-def load_pyproject(path: str) -> PyProject:
-    with open(path, "rb") as f:
-        return msgspec.toml.decode(f.read(), type=PyProject)
-```
-
-## Common Patterns
-
-### API Response Handler
-
-```python
-import msgspec
-from typing import TypeVar, Generic
-
-T = TypeVar('T')
-
-class APIResponse(msgspec.Struct, Generic[T], omit_defaults=True):
-    data: T | None = None
-    error: str | None = None
-    status: int = 200
-
-class User(msgspec.Struct):
-    id: int
-    name: str
-    email: str
-
-# Create a typed decoder for the specific response type
-user_response_decoder = msgspec.json.Decoder(APIResponse[User])
-
-def parse_user_response(raw: bytes) -> APIResponse[User]:
-    return user_response_decoder.decode(raw)
-```
-
-## Struct Configuration Options Summary
-
-| Option | Description | Default |
-|--------|-------------|---------|
-| `omit_defaults` | Omit fields with default values when encoding | `False` |
-| `forbid_unknown_fields` | Error on unknown fields when decoding | `False` |
-| `frozen` | Make instances immutable and hashable | `False` |
-| `order` | Generate ordering methods (`__lt__`, etc.) | `False` |
-| `eq` | Generate equality methods | `True` |
-| `kw_only` | Make all fields keyword-only | `False` |
-| `tag` | Enable tagged union support | `None` |
-| `tag_field` | Field name for the tag | `"type"` |
-| `rename` | Rename fields for encoding/decoding | `None` |
-| `array_like` | Encode/decode as arrays instead of objects | `False` |
-| `gc` | Enable garbage collector tracking | `True` |
-| `weakref` | Enable weak reference support | `False` |
-| `dict` | Add `__dict__` attribute | `False` |
-| `cache_hash` | Cache the hash value | `False` |
-
----
-
-## References
-
-- Official Documentation: https://jcristharif.com/msgspec/
-- Performance Tips: https://jcristharif.com/msgspec/perf-tips.html
-- Structs Documentation: https://jcristharif.com/msgspec/structs.html
-- GC Configuration: https://jcristharif.com/msgspec/structs.html#struct-gc
diff --git a/.cursor/rules/python-antipatterns.mdc b/.cursor/rules/python-antipatterns.mdc
deleted file mode 100644
index ece51ff2..00000000
--- a/.cursor/rules/python-antipatterns.mdc
+++ /dev/null
@@ -1,658 +0,0 @@
----
-globs: **/*.py
-alwaysApply: false
----
-
-Try to avoid these performance antipatterns in the Python code you write:
-
-***
-
-### 1. 
**Match statements (sequence)** -- **Slow** -```python -def sequence_match_logical(): - seq = ["🐸", "🐛", "🦋", "🪲"] - frogs = 0 - for _ in range(100_000): - if isinstance(seq, Sequence) and len(seq) > 0 and seq[0] == "🐸": - frogs += 1 -``` -- **Fast** -```python -def sequence_match_statement(): - seq = ["🐸", "🐛", "🦋", "🪲"] - frogs = 0 - for _ in range(100_000): - match seq: - case ["🐸", *_]: frogs += 1 -``` - -*** - -### 2. **Match statements (literal)** -- **Slow** -```python -def literal_match_logical(): - seq = ["🐊", "🐛", "🐈", "🦋", "🪲", "🐳"] - butterflies, caterpillars, beetles = 0, 0, 0 - for _ in range(100_000): - for x in seq: - if x == "🦋": - butterflies += 1 - elif x == "🐛": - caterpillars += 1 - elif x == "🪲": - beetles += 1 -``` -- **Fast** -```python -def literal_match_statement(): - seq = ["🐊", "🐛", "🐈", "🦋", "🪲", "🐳"] - butterflies, caterpillars, beetles = 0, 0, 0 - for _ in range(100_000): - for x in seq: - match x: - case "🦋": butterflies += 1 - case "🐛": caterpillars += 1 - case "🪲": beetles += 1 -``` - -*** - -### 3. **Match statements (mapping)** -- **Slow** -```python -def mapping_match_logical(): - boats = [ - {"🐓": 1}, {"🦊": 1, "🌽": 1}, - {"🐓": 1, "🌽": 1}, {"🐓": 1, "🦊": 1}, - ] - problems = valid_boats = 0 - for _ in range(100_000): - for boat in boats: - if isinstance(boat, Mapping): - if "🐓" in boat and "🌽" in boat: - problems += 1 - elif "🐓" in boat and "🦊" in boat: - problems += 1 - else: - valid_boats += 1 -``` -- **Fast** -```python -def mapping_match_statement(): - boats = [ - {"🐓": 1}, {"🦊": 1, "🌽": 1}, - {"🐓": 1, "🌽": 1}, {"🐓": 1, "🦊": 1}, - ] - problems = valid_boats = 0 - for _ in range(100_000): - for boat in boats: - match boat: - case {"🐓": _, "🌽": _}: problems += 1 - case {"🐓": _, "🦊": _}: problems += 1 - case _: valid_boats += 1 -``` - -*** - -### 4. 
**Match statements (classes)** -- **Slow** -```python -def bench_class_matching_logical(): - drivers = [ - Driver(name="Max Verstappen", team="Red Bull"), - Driver(name="Sergio Perez", team="Red Bull"), - Driver(name="Charles Leclerc", team="Ferrari"), - Driver(name="Lewis Hamilton", team="Mercedes"), - ] - for _ in range(100_000): - for driver in drivers: - if not isinstance(driver, Driver): - desc = "Invalid request" - elif driver.name == "Max Verstappen": - desc = "Max Verstappen, the current world #1" - elif driver.team == "Ferrari": - desc = f"{driver.name}, a Ferrari driver!! 🐎" - else: - desc = f"{driver.name}, a {driver.team} driver." -``` -- **Fast** -```python -def bench_class_matching_statement(): - drivers = [ - Driver(name="Max Verstappen", team="Red Bull"), - Driver(name="Sergio Perez", team="Red Bull"), - Driver(name="Charles Leclerc", team="Ferrari"), - Driver(name="Lewis Hamilton", team="Mercedes"), - ] - for _ in range(100_000): - for driver in drivers: - match driver: - case Driver(name="Max Verstappen"): desc = "Max Verstappen, the current world #1" - case Driver(name=name, team="Ferrari"): desc = f"{name}, a Ferrari driver!! 🐎" - case Driver(name=name, team=team): desc = f"{name}, a {team} driver." - case _: desc = "Invalid request" -``` - -*** - -### 5. **Inline globals in loop** -- **Slow** -```python -def global_constant_in_loop(): - total = MY_GLOBAL_CONSTANT_A - for i in range(10_000): - total += i * MY_GLOBAL_CONSTANT_C -``` -- **Fast** -```python -def local_constant_in_loop(): - total = 3.14 - for i in range(10_000): - total += i * 1234 -``` - -*** - -### 6. 
**GC with higher threshold** -- **Slow** -```python -def load_with_gc(): - t1, t2, t3 = gc.get_threshold() - gc.set_threshold(1000, 20, 20) - for _ in range(100_000): - _cyclic_references() - gc.set_threshold(t1, t2, t3) -``` -- **Fast** -```python -def load_gc_at_end(): - t1, t2, t3 = gc.get_threshold() - gc.set_threshold(10, 10, 10) - for _ in range(100_000): - _cyclic_references() - gc.set_threshold(t1, t2, t3) -``` - -*** - -### 7. **Importing specific name instead of namespace** -- **Slow** -```python -def dotted_import(): - for _ in range(100_000): - return os.path.exists('/') -``` -- **Fast** -```python -def direct_import(): - for _ in range(100_000): - return exists('/') -``` - -*** - -### 8. **Refactoring Try..except outside a loop** -- **Slow** -```python -def try_in_loop(): - items = {'a': 1} - for _ in range(100_000): - try: - _ = items['a'] - except Exception: - pass -``` -- **Fast** -```python -def try_outside_loop(): - items = {'a': 1} - try: - for _ in range(100_000): - _ = items['a'] - except Exception: - pass -``` - -*** - -### 9. **Class instead of dataclass** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 10. **Namedtuple instead of dataclass** -- **Slow** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 11. 
**class instead of namedtuple** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 12. **namedtuple class instead of namedtuple** -- **Slow** -```python -def attributes_in_namedtuple_type(): - class Pet(typing.NamedTuple): - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 13. **dict instead of class** -- **Slow** -```python -def attributes_in_dict(): - for _ in range(100_000): - dog = {"legs": 4, "noise": "woof"} - str(dog) -``` -- **Fast** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 14. **class with slots** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_class_with_slots(): - class Pet: - legs: int - noise: str - __slots__ = 'legs', 'noise' - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 15. 
**dataclass with slots** -- **Slow** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass_with_slots(): - @dataclass(slots=True) - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 16. **Using a list comprehension to filter another list** -- **Slow** -```python -def filter_list_as_loop(): - result = [] - inputs = range(100_000) - for i in inputs: - if i % 2: - result.append(i) -``` -- **Fast** -```python -def filter_list_as_comprehension(): - inputs = range(100_000) - result = [i for i in inputs if i % 2] -``` - -*** - -### 17. **Join list comprehension instead of generator expression** -- **Slow** -```python -def join_list_comprehension(): - words = ['data', 'type', 'is', 'so', 'long', 'now'] - for x in range(100_000): - ''.join([ele.title() for ele in words]) -``` -- **Fast** -```python -def join_generator_expression(): - words = ['data', 'type', 'is', 'so', 'long', 'now'] - for x in range(100_000): - ''.join(ele.title() for ele in words) -``` - -*** - -### 18. **Using fullmatch instead of anchors** -- **Slow** -```python -def regex_with_anchors(): - SNAKE_CASE_RE = re.compile(r'^([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)$') - tests = ['data_type', 'data_type_', '_dataType', 'dataType', 'data type'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.match(test_str) -``` -- **Fast** -```python -def regex_with_fullmatch(): - SNAKE_CASE_RE = re.compile(r'([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)') - tests = ['data_type', 'data_type_', '_dataType', 'dataType', 'data type'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.fullmatch(test_str) -``` - -*** - -### 19. 
**Using a-zA-Z instead of IGNORECASE** -- **Slow** -```python -def regex_with_capitalrange(): - SNAKE_CASE_RE = re.compile(r'([a-zA-Z]+\d*_[a-zA-Z\d_]*|_+[a-zA-Z\d]+[a-zA-Z\d_]*)') - tests = ['data_type', 'data_type_URL', '_DataType', 'DataTypeURL', 'Data Type URL'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.fullmatch(test_str) -``` -- **Fast** -```python -def regex_with_ignorecase(): - SNAKE_CASE_RE = re.compile(r'([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)', re.IGNORECASE) - tests = ['data_type', 'data_type_URL', '_DataType', 'DataTypeURL', 'Data Type URL'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.fullmatch(test_str) -``` - -*** - -### 20. **Kwargs for known keyword args** -- **Slow** -```python -def keyword_call(): - func_with_kwargs(a=1, b=2, c=3) -``` -- **Fast** -```python -def positional_call(): - func_with_named_args(a=1, b=2, c=3) -``` - -*** - -### 21. **Tiny Functions** -- **Slow** -```python -def use_tiny_func(): - x = 1 - for n in range(100_000): - add(x, n) - add(n, x) -``` -- **Fast** -```python -def inline_tiny_func(): - x = 1 - for n in range(100_000): - x + n - n + x -``` - -*** - -### 22. **Slicing with memoryview instead of bytes** -- **Slow** -```python -def bytes_slice(): - word = b'A' * 1000 - for i in range(1000): - n = word[0:i] -``` -- **Fast** -```python -def memoryview_slice(): - word = memoryview(b'A' * 1000) - for i in range(1000): - n = word[0:i] -``` - -*** - -### 23. **Loop invariant Code Motion** -- **Slow** -```python -def before(): - x = (1, 2, 3, 4) - i = 6 - for j in range(100_000): - len(x) * i + j -``` -- **Fast** -```python -def after(): - x = (1, 2, 3, 4) - i = 6 - x_i = len(x) * i - for j in range(100_000): - x_i + j -``` - -*** - -### 24. 
**Copy slice to Local** -- **Slow** -```python -def slice_as_local(): - x = list(range(100_000)) - y = list(range(100_000)) - for n in range(100_000): - x[n] + y[n] - x[n] + y[n] - x[n] + y[n] - x[n] + y[n] - x[n] + y[n] -``` -- **Fast** -```python -def slice_copy_to_fast(): - x = list(range(100_000)) - y = list(range(100_000)) - for n in range(100_000): - i = x[n] - j = y[n] - i + j - i + j - i + j - i + j - i + j -``` - -*** - -### 25. **Copy name to Local** -- **Slow** -```python -def as_local(): - for _ in range(100_000): - x + y - x + y - x + y - x + y - x + y -``` -- **Fast** -```python -def copy_name_to_fast(): - i = x - j = y - for _ in range(100_000): - i + j - i + j - i + j - i + j - i + j -``` - -*** - -### 26. **Copy dict item to Local** -- **Slow** -```python -def dont_copy_dict_key_to_fast(): - for _ in range(100_000): - d["x"] + d["y"] - d["x"] + d["y"] - d["x"] + d["y"] - d["x"] + d["y"] - d["x"] + d["y"] -``` -- **Fast** -```python -def copy_dict_key_to_fast(): - i = d["x"] - j = d["y"] - for _ in range(100_000): - i + j - i + j - i + j - i + j - i + j -``` - -*** - -### 27. 
**Copy class attr to Local**
-- **Slow**
-```python
-def dont_copy_attr_to_fast():
-    for _ in range(100_000):
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-```
-- **Fast**
-```python
-def copy_attr_to_fast():
-    i = foo.x
-    j = foo.y
-    for _ in range(100_000):
-        i + j
-        i + j
-        i + j
-        i + j
-        i + j
-```
-
-***
-
-Each case above pairs a slow (anti-pattern) snippet with its fast (optimized) counterpart, in the order the benchmarks are defined.
-
-Source: [tonybaloney/anti-patterns](https://github.com/tonybaloney/anti-patterns/blob/master/README.md)
diff --git a/.gitignore b/.gitignore
index 8dc22a68..6681801b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -189,10 +189,7 @@ outputs/
 # Example vLLM virtualenv
 examples/03_BenchmarkComparison/vllm_venv/
 
-# Agent artifacts (local development only)
+# AI tool artifacts (local development only)
 .cursor_artifacts/
-.claude/agent-memory/
-
-# User-specific local rules (local Docker dev); do not commit
-.cursor/rules/local-docker-dev.mdc
-CLAUDE.local.md
+.cursor/
+docs/superpowers/
diff --git a/README.md b/README.md
index 9af4eb85..2a1a178f 100644
--- a/README.md
+++ b/README.md
@@ -1,209 +1,131 @@
-# MLPerf® Inference Endpoint Benchmarking System
+# MLPerf Inference Endpoint Benchmarking System
 
-A high-performance benchmarking tool for LLM endpoints.
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
+[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/)
+[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen.svg)](https://pre-commit.com/)
 
-## Quick Start
+A high-performance benchmarking tool for LLM inference endpoints, targeting 50k+ QPS. Part of [MLCommons](https://mlcommons.org/).
 
-### Installation
+## Quick Start
 
-**Requirements**: Python 3.12+ (Python 3.12 is recommended for optimal performance. GIL-less mode in higher Python versions is not yet supported.)
+**Requirements:** Python 3.12+ (3.12 recommended) ```bash -# Clone the repository -# Note: This repo will be migrated to https://github.com/mlcommons/endpoints git clone https://github.com/mlcommons/endpoints.git cd endpoints - -# Create virtual environment -python3.12 -m venv venv -source venv/bin/activate - -# As a user +python3.12 -m venv venv && source venv/bin/activate pip install . - -# As a developer (with development and test extras) -pip install -e ".[dev,test]" -pre-commit install ``` -### Basic Usage - ```bash -# Show help -inference-endpoint --help - -# Show system information -inference-endpoint -v info - # Test endpoint connectivity inference-endpoint probe \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B -# Run offline benchmark (max throughput - uses all dataset samples) +# Run offline benchmark (max throughput) inference-endpoint benchmark offline \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B \ --dataset tests/datasets/dummy_1k.jsonl -# Run online benchmark (sustained QPS - requires --target-qps, --load-pattern) +# Run online benchmark (sustained QPS) inference-endpoint benchmark online \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B \ --dataset tests/datasets/dummy_1k.jsonl \ --load-pattern poisson \ --target-qps 100 - -# With explicit sample count -inference-endpoint benchmark offline \ - --endpoints http://your-endpoint:8000 \ - --model Qwen/Qwen3-8B \ - --dataset tests/datasets/dummy_1k.jsonl \ - --num-samples 5000 ``` -### Running Locally +### Local Testing ```bash -# Start local echo server -python3 -m inference_endpoint.testing.echo_server --port 8765 & - -# Test with dummy dataset (included in repo) +# Start local echo server and run a benchmark against it +python -m inference_endpoint.testing.echo_server --port 8765 & inference-endpoint benchmark offline \ --endpoints http://localhost:8765 \ - --model Qwen/Qwen3-8B \ + --model test-model \ --dataset tests/datasets/dummy_1k.jsonl - -# Stop 
echo server pkill -f echo_server ``` -See [Local Testing Guide](docs/LOCAL_TESTING.md) for detailed instructions. - -### Running Tests and Examples - -```bash -# Install test dependencies -pip install ".[test]" - -# Run tests (excluding performance and explicit-run tests) -pytest -m "not performance and not run_explicitly" - -# Run examples: follow instructions in examples/*/README.md -``` +See [Local Testing Guide](docs/LOCAL_TESTING.md) for more details. -## 📚 Documentation - -- [AGENTS.md](AGENTS.md) - Architecture, conventions, and AI agent guidelines -- [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) - Command-line interface guide -- [Local Testing Guide](docs/LOCAL_TESTING.md) - Test with echo server -- [Development Guide](docs/DEVELOPMENT.md) - How to contribute and develop -- [Performance Architecture](docs/PERF_ARCHITECTURE.md) - Hot-path design and tuning -- [Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) - CPU affinity and client tuning -- [GitHub Setup Guide](docs/GITHUB_SETUP.md) - GitHub authentication and setup - -### Component Design Specs - -Each top-level component under `src/inference_endpoint/` has a corresponding spec: - -| Component | Spec | -| ----------------- | ---------------------------------------------------------------- | -| Core types | [docs/core/DESIGN.md](docs/core/DESIGN.md) | -| Load generator | [docs/load_generator/DESIGN.md](docs/load_generator/DESIGN.md) | -| Endpoint client | [docs/endpoint_client/DESIGN.md](docs/endpoint_client/DESIGN.md) | -| Metrics | [docs/metrics/DESIGN.md](docs/metrics/DESIGN.md) | -| Config | [docs/config/DESIGN.md](docs/config/DESIGN.md) | -| Async utils | [docs/async_utils/DESIGN.md](docs/async_utils/DESIGN.md) | -| Dataset manager | [docs/dataset_manager/DESIGN.md](docs/dataset_manager/DESIGN.md) | -| Commands (CLI) | [docs/commands/DESIGN.md](docs/commands/DESIGN.md) | -| OpenAI adapter | [docs/openai/DESIGN.md](docs/openai/DESIGN.md) | -| SGLang adapter | 
[docs/sglang/DESIGN.md](docs/sglang/DESIGN.md) | -| Evaluation | [docs/evaluation/DESIGN.md](docs/evaluation/DESIGN.md) | -| Testing utilities | [docs/testing/DESIGN.md](docs/testing/DESIGN.md) | -| Profiling | [docs/profiling/DESIGN.md](docs/profiling/DESIGN.md) | -| Plugins | [docs/plugins/DESIGN.md](docs/plugins/DESIGN.md) | -| Utils | [docs/utils/DESIGN.md](docs/utils/DESIGN.md) | - -## 🎯 Architecture - -The system follows a modular, event-driven architecture: +## Architecture ``` -Dataset Manager ──► Load Generator ──► Endpoint Client ──► External Endpoint - │ - Metrics Collector - (event logging + reporting) +Dataset Manager ──> Load Generator ──> Endpoint Client ──> External Endpoint + | + Metrics Collector (EventRecorder + MetricsReporter) ``` -- **Dataset Manager**: Loads benchmark datasets and applies transform pipelines -- **Load Generator**: Central orchestrator — controls timing (scheduler), issues queries, and emits sample events -- **Endpoint Client**: Multi-process HTTP worker pool communicating over ZMQ IPC -- **Metrics Collector**: Receives sample events from Load Generator; writes to SQLite (EventRecorder), aggregates after the run (MetricsReporter) - -## Accuracy Evaluation - -You can run accuracy evaluation with Pass@1 scoring by specifying accuracy datasets in the benchmark -configuration. 
Currently, Inference Endpoints provides the following pre-defined accuracy benchmarks: +| Component | Purpose | +|-----------|---------| +| **Load Generator** | Central orchestrator: `BenchmarkSession` owns lifecycle, `Scheduler` controls timing | +| **Endpoint Client** | Multi-process HTTP workers communicating via ZMQ IPC | +| **Dataset Manager** | Loads JSONL, HuggingFace, CSV, JSON, Parquet datasets | +| **Metrics** | SQLite-backed event recording, aggregation (QPS, latency, TTFT, TPOT) | +| **Config** | Pydantic-based YAML schema, CLI auto-generated via cyclopts | -- GPQA (default: GPQA Diamond) -- AIME (default: AIME 2025) -- LiveCodeBench (default: lite, release_v6) +### Benchmark Modes -However, LiveCodeBench will not work out-of-the-box and requires some additional setup. See the -[LiveCodeBench](src/inference_endpoint/evaluation/livecodebench/README.md) documentation for -details and explanations. +- **Offline** (`max_throughput`): Burst all queries at once for peak throughput measurement +- **Online** (`poisson`): Fixed QPS with Poisson arrival distribution for latency profiling +- **Concurrency**: Fixed concurrent request count -## 🚧 Pending Features +### Performance Design -The following features are planned for future releases: +The hot path is optimized for minimal overhead: -- [ ] **Submission Ruleset Integration** - Full MLPerf submission workflow support -- [ ] **Documentation Generation and Hosting** - Sphinx-based API documentation with GitHub Pages +- Multi-process workers with ZMQ IPC (not threads) +- `uvloop` + `eager_task_factory` for async performance +- `msgspec` for zero-copy serialization on the data path +- Custom HTTP connection pooling with `httptools` parser +- CPU affinity support for performance tuning -## 🤝 Contributing - -We welcome contributions! 
Please see our [Development Guide](docs/DEVELOPMENT.md) for details on: - -- Setting up your development environment -- Code style and quality standards -- Testing requirements -- Pull request process +## Accuracy Evaluation -## 🙏 Acknowledgements +Run accuracy evaluation with Pass@1 scoring using pre-defined benchmarks: -This project draws inspiration from and learns from the following excellent projects: +- **GPQA** (default: GPQA Diamond) +- **AIME** (default: AIME 2025) +- **LiveCodeBench** (default: lite, release_v6) — requires [additional setup](src/inference_endpoint/dataset_manager/predefined/livecodebench/README.md) -- [MLCommons Inference](https://github.com/mlcommons/inference) - MLPerf Inference benchmark suite -- [AIPerf](https://github.com/ai-dynamo/aiperf) - AI model performance profiling framework -- [SGLang GenAI-Bench](https://github.com/sgl-project/genai-bench) - Token-level performance evaluation tool -- [vLLM Benchmarks](https://github.com/vllm-project/vllm/tree/main/benchmarks) - Performance benchmarking tools for vLLM -- [InferenceMAX](https://github.com/InferenceMAX/InferenceMAX) - LLM inference optimization toolkit +## Documentation -We are grateful to these communities for their contributions to LLM benchmarking and performance analysis. 
+| Guide | Description | +|-------|-------------| +| [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) | Command-line interface guide | +| [CLI Design](docs/CLI_DESIGN.md) | CLI architecture and design decisions | +| [Local Testing](docs/LOCAL_TESTING.md) | Test with the echo server | +| [Client Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) | Endpoint client optimization | +| [Performance Architecture](docs/PERF_ARCHITECTURE.md) | Performance architecture deep dive | +| [Development Guide](docs/DEVELOPMENT.md) | Development setup and workflow | +| [CONTRIBUTING.md](CONTRIBUTING.md) | How to contribute | -## 📄 License +## Contributing -This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE.md) file for -details. +We welcome contributions from the community. See [CONTRIBUTING.md](CONTRIBUTING.md) for: -## 🔗 Links +- Development setup and prerequisites +- Code style (ruff, mypy, conventional commits) +- Testing requirements (>90% coverage, pytest markers) +- Pull request process and review expectations -- [MLCommons](https://mlcommons.org/) - Machine Learning Performance Standards -- [Project Repository](https://github.com/mlcommons/endpoints) -- [MLPerf Inference](https://mlcommons.org/benchmarks/inference/) +Issues are tracked on our [project board](https://github.com/orgs/mlcommons/projects/57). Look for [`good first issue`](https://github.com/mlcommons/endpoints/labels/good%20first%20issue) or [`help wanted`](https://github.com/mlcommons/endpoints/labels/help%20wanted) to get started. -## 👥 Contributors +All contributors must sign the [MLCommons CLA](https://mlcommons.org/membership/membership-overview/). -Credits to core contributors of the project: +## Acknowledgements -- MLCommons Committee -- NVIDIA: Zhihan Jiang, Rashid Kaleem, Viraat Chandra, Alice Cheng -- ... +This project draws inspiration from: -See [ATTRIBUTION](ATTRIBUTION) for detailed attribution information. 
+- [MLCommons Inference](https://github.com/mlcommons/inference) — MLPerf Inference benchmark suite +- [AIPerf](https://github.com/ai-dynamo/aiperf) — AI model performance profiling +- [SGLang GenAI-Bench](https://github.com/sgl-project/genai-bench) — Token-level performance evaluation +- [vLLM Benchmarks](https://github.com/vllm-project/vllm/tree/main/benchmarks) — Performance benchmarking for vLLM -## 📞 Support +## License -- **Issues**: [GitHub Issues](https://github.com/mlcommons/endpoints/issues) -- **Discussions**: [GitHub Discussions](https://github.com/mlcommons/endpoints/discussions) -- **Documentation**: See [docs/](docs/) directory for guides +Apache License 2.0 — see [LICENSE](LICENSE) for details. diff --git a/docs/superpowers/plans/2026-04-07-project-management.md b/docs/superpowers/plans/2026-04-07-project-management.md deleted file mode 100644 index 5dff6134..00000000 --- a/docs/superpowers/plans/2026-04-07-project-management.md +++ /dev/null @@ -1,1092 +0,0 @@ -# Project Management Infrastructure Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Set up labels, project board, issue templates, CONTRIBUTING.md, and migrate all 57 open issues for the mlcommons/endpoints GitHub repository. - -**Architecture:** All GitHub API interactions use `curl` with auth token (the `gh` CLI has TLS certificate issues in this environment). Board configuration uses the GitHub GraphQL API for Projects V2. File changes (templates, CONTRIBUTING.md) are committed locally and pushed as a PR. - -**Tech Stack:** GitHub REST API, GitHub GraphQL API, curl, bash, git - -**IMPORTANT — API access pattern:** The `gh` CLI cannot make API calls due to TLS errors. 
Every API call must use this pattern: -```bash -TOKEN=$(gh auth token 2>&1) -curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" "https://api.github.com/..." -``` -For GraphQL: -```bash -TOKEN=$(gh auth token 2>&1) -curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - -X POST https://api.github.com/graphql \ - -d '{"query":"..."}' -``` - -**IMPORTANT — Label names with colons:** GitHub label names containing spaces and colons must be URL-encoded in REST API paths. For example, `type: bug` becomes `type%3A%20bug` in URLs. When creating labels via POST body (JSON), use the literal name. - ---- - -## File Structure - -No new source code files. Changes are: - -- **Create:** `.github/ISSUE_TEMPLATE/100-bug-report.yml` -- **Create:** `.github/ISSUE_TEMPLATE/200-feature-request.yml` -- **Create:** `.github/ISSUE_TEMPLATE/300-performance.yml` -- **Create:** `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` -- **Create:** `.github/ISSUE_TEMPLATE/config.yml` -- **Modify:** `CONTRIBUTING.md` (full rewrite) - -All other changes are GitHub API operations (labels, board, issues) — no local files. - ---- - -### Task 1: Create New Labels - -Create all 23 new labels on the repository via the REST API. Existing labels that are being kept (`good first issue`, `help wanted`, `dependencies`, `security`, `duplicate`, `invalid`, `wontfix`) are untouched. The `mlcommons` label needs to be created fresh (the old `MLCommons` with capital M will be removed later). - -**Files:** None (API only) - -- [ ] **Step 1: Create all type labels** - -Run this script. 
It creates 8 type labels:
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-for label_json in \
-  '{"name":"type: bug","color":"d73a4a","description":"Something isn'\''t working"}' \
-  '{"name":"type: feature","color":"a2eeef","description":"New feature or capability"}' \
-  '{"name":"type: enhancement","color":"bfd4f2","description":"Improvement to existing functionality"}' \
-  '{"name":"type: performance","color":"3ddd26","description":"Performance regression or improvement"}' \
-  '{"name":"type: documentation","color":"0075ca","description":"Documentation only"}' \
-  '{"name":"type: question","color":"d876e3","description":"Usage question or clarification"}' \
-  '{"name":"type: RFC","color":"76fde7","description":"Request for comments / design proposal"}' \
-  '{"name":"type: chore","color":"ededed","description":"Maintenance, deps, CI, tooling"}'; do
-  echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')"
-  curl -s -X POST \
-    -H "Authorization: token $TOKEN" \
-    -H "Accept: application/vnd.github+json" \
-    "https://api.github.com/repos/$REPO/labels" \
-    -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(" ->", d.get("name", d.get("message", "error")))'
-done
-```
-
-Expected: 8 lines showing each label name created successfully.
- [ ] **Step 2: Create all priority labels**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-for label_json in \
-  '{"name":"priority: ShowStopper","color":"000000","description":"Drop everything — critical blocker, all hands on deck"}' \
-  '{"name":"priority: P0","color":"b60205","description":"Critical — blocks release or users"}' \
-  '{"name":"priority: P1","color":"d93f0b","description":"High — must address this cycle"}' \
-  '{"name":"priority: P2","color":"fbca04","description":"Medium — address within quarter"}' \
-  '{"name":"priority: P3","color":"0e8a16","description":"Low — backlog, nice to have"}'; do
-  echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')"
-  curl -s -X POST \
-    -H "Authorization: token $TOKEN" \
-    -H "Accept: application/vnd.github+json" \
-    "https://api.github.com/repos/$REPO/labels" \
-    -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(" ->", d.get("name", d.get("message", "error")))'
-done
-```
-
-Expected: 5 labels created.
- [ ] **Step 3: Create all area labels**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-for label_json in \
-  '{"name":"area: core-engine","color":"c5def5","description":"Load generator, scheduler, async utils"}' \
-  '{"name":"area: client","color":"c5def5","description":"Endpoint client, HTTP, transport, ZMQ"}' \
-  '{"name":"area: metrics","color":"c5def5","description":"Event recorder, metrics reporter, reporting"}' \
-  '{"name":"area: dataset","color":"c5def5","description":"Dataset manager, formats, predefined datasets"}' \
-  '{"name":"area: config-cli","color":"c5def5","description":"Config schema, CLI commands, YAML"}' \
-  '{"name":"area: evaluation","color":"c5def5","description":"Accuracy evaluation, scoring, extractors"}' \
-  '{"name":"area: adapters","color":"c5def5","description":"OpenAI, SGLang protocol adapters"}'; do
-  echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')"
-  curl -s -X POST \
-    -H "Authorization: token $TOKEN" \
-    -H "Accept: application/vnd.github+json" \
-    "https://api.github.com/repos/$REPO/labels" \
-    -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(" ->", d.get("name", d.get("message", "error")))'
-done
-```
-
-Expected: 7 labels created.
- [ ] **Step 4: Create status labels and mlcommons label**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-for label_json in \
-  '{"name":"status: needs-triage","color":"e99695","description":"New issue, awaiting review"}' \
-  '{"name":"status: needs-info","color":"f9d0c4","description":"Awaiting more details from reporter"}' \
-  '{"name":"status: blocked","color":"b60205","description":"Blocked on external dependency or decision"}' \
-  '{"name":"mlcommons","color":"e0703c","description":"MLCommons ruleset/submission integration"}'; do
-  echo "Creating: $(echo "$label_json" | python3 -c 'import sys,json; print(json.load(sys.stdin)["name"])')"
-  curl -s -X POST \
-    -H "Authorization: token $TOKEN" \
-    -H "Accept: application/vnd.github+json" \
-    "https://api.github.com/repos/$REPO/labels" \
-    -d "$label_json" | python3 -c 'import sys,json; d=json.load(sys.stdin); print(" ->", d.get("name", d.get("message", "error")))'
-done
-```
-
-Expected: 4 labels created (mlcommons may say "already_exists" if the old `MLCommons` case-insensitively matches — if so, update it in a later step).
-
-- [ ] **Step 5: Verify all new labels exist**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-curl -s -H "Authorization: token $TOKEN" \
-  "https://api.github.com/repos/mlcommons/endpoints/labels?per_page=100" | \
-  python3 -c "
-import sys, json
-labels = json.load(sys.stdin)
-names = sorted([l['name'] for l in labels])
-print(f'Total labels: {len(names)}')
-for n in names:
-    print(f'  {n}')
-"
-```
-
-Expected: All new `type:`, `priority:`, `area:`, `status:` labels present alongside existing labels.
-
----
-
-### Task 2: Relabel All Open Issues
-
-Apply new labels and remove old labels for every open issue, following the spec's mapping exactly. This is done in batches by priority tier.
-
-**Files:** None (API only)
-
-**IMPORTANT:** The GitHub `PUT /repos/{owner}/{repo}/issues/{number}/labels` endpoint **replaces** all labels on an issue.
So each call must include the complete set of new labels for that issue.
-
-- [ ] **Step 1: Relabel ShowStopper issues**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-# #84 - Pareto clarification
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/84/labels" \
-  -d '{"labels":["priority: ShowStopper","area: config-cli","mlcommons"]}' | python3 -c 'import sys,json; print("#84:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #8 - Parity with MLPerf LoadGen
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/8/labels" \
-  -d '{"labels":["priority: ShowStopper","type: performance","area: core-engine"]}' | python3 -c 'import sys,json; print("#8:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #4 - Accuracy evaluation for LLMs
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/4/labels" \
-  -d '{"labels":["priority: ShowStopper","type: feature","area: evaluation"]}' | python3 -c 'import sys,json; print("#4:", [l["name"] for l in json.load(sys.stdin)])'
-```
-
-Expected: Each issue prints its new label set.
- [ ] **Step 2: Relabel P0 issues**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-# #86 - Warmup runs
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/86/labels" \
-  -d '{"labels":["priority: P0","type: feature","area: core-engine"]}' | python3 -c 'import sys,json; print("#86:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #232 - Multi-turn implementation
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/232/labels" \
-  -d '{"labels":["priority: P0","type: feature","area: dataset"]}' | python3 -c 'import sys,json; print("#232:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #183 - Pub/Sub event recorder
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/183/labels" \
-  -d '{"labels":["priority: P0","type: feature","area: metrics"]}' | python3 -c 'import sys,json; print("#183:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #138 - CI stress test upper bound
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/138/labels" \
-  -d '{"labels":["priority: P0","type: chore","area: core-engine"]}' | python3 -c 'import sys,json; print("#138:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #6 - Final report structure
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/6/labels" \
-  -d '{"labels":["priority: P0","type: feature","area: metrics"]}' | python3 -c 'import sys,json; print("#6:", [l["name"] for l in json.load(sys.stdin)])'
-
-# #5 - Submission ruleset + config
-curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-  "https://api.github.com/repos/$REPO/issues/5/labels" \
-  -d '{"labels":["priority: P0","type: feature","area: config-cli","mlcommons"]}' | python3 -c 'import sys,json; print("#5:", [l["name"] for l in json.load(sys.stdin)])'
-```
-
-Expected: 6 issues relabeled.
-
-- [ ] **Step 3: Relabel P1 issues**
-
-```bash
-TOKEN=$(gh auth token 2>&1)
-REPO="mlcommons/endpoints"
-
-declare -A P1_LABELS
-P1_LABELS[9]='["priority: P1","type: performance","area: core-engine"]'
-P1_LABELS[255]='["priority: P1","type: feature","area: core-engine"]'
-P1_LABELS[269]='["priority: P1","type: bug","area: client"]'
-P1_LABELS[237]='["priority: P1","type: bug","area: config-cli"]'
-P1_LABELS[219]='["priority: P1","type: bug","area: config-cli"]'
-P1_LABELS[221]='["priority: P1","type: bug","area: config-cli"]'
-P1_LABELS[202]='["priority: P1","type: bug","area: client"]'
-P1_LABELS[199]='["priority: P1","type: bug","area: config-cli"]'
-P1_LABELS[222]='["priority: P1","type: chore","area: core-engine"]'
-P1_LABELS[220]='["priority: P1","type: chore","area: adapters"]'
-P1_LABELS[182]='["priority: P1","type: performance","area: metrics"]'
-P1_LABELS[177]='["priority: P1","type: feature","area: evaluation","area: dataset"]'
-P1_LABELS[176]='["priority: P1","type: feature","area: evaluation","area: dataset"]'
-P1_LABELS[113]='["priority: P1","type: feature"]'
-P1_LABELS[210]='["priority: P1","type: feature"]'
-P1_LABELS[268]='["priority: P1","type: feature"]'
-P1_LABELS[10]='["priority: P1","type: performance","area: core-engine"]'
-P1_LABELS[7]='["priority: P1","type: feature","area: metrics"]'
-
-for issue in "${!P1_LABELS[@]}"; do
-  curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
-    "https://api.github.com/repos/$REPO/issues/$issue/labels" \
-    -d "{\"labels\":${P1_LABELS[$issue]}}" | python3 -c "import sys,json; print(f'#$issue: {[l[\"name\"] for l in json.load(sys.stdin)]}')"
-done
-```
-
-Expected: 18 issues relabeled.

- [ ] **Step 4: Relabel P2 issues**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

declare -A P2_LABELS
P2_LABELS[254]='["priority: P2","type: feature","area: client"]'
P2_LABELS[217]='["priority: P2","type: feature","area: core-engine"]'
P2_LABELS[179]='["priority: P2","type: feature","area: evaluation","area: dataset"]'
P2_LABELS[178]='["priority: P2","type: feature","area: evaluation","area: dataset"]'
P2_LABELS[173]='["priority: P2","type: bug","mlcommons"]'
P2_LABELS[224]='["priority: P2","type: feature","area: config-cli"]'
P2_LABELS[208]='["priority: P2","type: performance","area: metrics"]'
P2_LABELS[158]='["priority: P2","type: feature","area: adapters"]'
P2_LABELS[125]='["priority: P2","type: feature","area: core-engine"]'
P2_LABELS[115]='["priority: P2","type: enhancement","area: config-cli"]'
P2_LABELS[79]='["priority: P2","type: feature","mlcommons"]'
P2_LABELS[73]='["priority: P2","type: feature","area: dataset"]'
P2_LABELS[68]='["priority: P2","type: feature","area: config-cli","mlcommons"]'
P2_LABELS[58]='["priority: P2","type: feature","area: config-cli","mlcommons"]'
P2_LABELS[213]='["priority: P2","type: bug","mlcommons"]'
P2_LABELS[133]='["priority: P2","type: bug","area: client"]'
P2_LABELS[174]='["priority: P2","type: enhancement","mlcommons"]'
P2_LABELS[229]='["priority: P2","type: chore"]'
P2_LABELS[228]='["priority: P2","type: documentation"]'
P2_LABELS[227]='["priority: P2","type: feature"]'
P2_LABELS[212]='["priority: P2","type: feature"]'

for issue in "${!P2_LABELS[@]}"; do
  curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/$REPO/issues/$issue/labels" \
    -d "{\"labels\":${P2_LABELS[$issue]}}" | python3 -c "import sys,json; print('#$issue:', [l['name'] for l in json.load(sys.stdin)])"
done
```

Expected: 21 issues relabeled.

- [ ] **Step 5: Relabel P3 and other issues**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

# P3 issues
curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/99/labels" \
  -d '{"labels":["priority: P3","type: bug","good first issue"]}' | python3 -c 'import sys,json; print("#99:", [l["name"] for l in json.load(sys.stdin)])'

curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/50/labels" \
  -d '{"labels":["priority: P3","type: feature"]}' | python3 -c 'import sys,json; print("#50:", [l["name"] for l in json.load(sys.stdin)])'

curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/204/labels" \
  -d '{"labels":["priority: P3","type: documentation"]}' | python3 -c 'import sys,json; print("#204:", [l["name"] for l in json.load(sys.stdin)])'

curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/190/labels" \
  -d '{"labels":["priority: P3","type: chore"]}' | python3 -c 'import sys,json; print("#190:", [l["name"] for l in json.load(sys.stdin)])'

curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/181/labels" \
  -d '{"labels":["priority: P3","type: feature"]}' | python3 -c 'import sys,json; print("#181:", [l["name"] for l in json.load(sys.stdin)])'

# Other (no priority)
curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/223/labels" \
  -d '{"labels":["type: RFC"]}' | python3 -c 'import sys,json; print("#223:", [l["name"] for l in json.load(sys.stdin)])'

curl -s -X PUT -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/267/labels" \
  -d '{"labels":["type: chore","dependencies","security"]}' | python3 -c 'import sys,json; print("#267:", [l["name"] for l in json.load(sys.stdin)])'
```

Expected: 7 issues relabeled.

- [ ] **Step 6: Verify relabeling — spot check 5 issues**

```bash
TOKEN=$(gh auth token)
for issue in 84 232 269 208 99; do
  curl -s -H "Authorization: token $TOKEN" \
    "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \
    python3 -c 'import sys,json; d=json.load(sys.stdin); print("#{} {}: {}".format(d["number"], d["title"], [l["name"] for l in d["labels"]]))'
done
```

Expected: Each issue shows only its new prefixed labels.

---

### Task 3: Close Duplicate Issues

For each duplicate, first read its body to preserve unique context, then comment on the primary issue with that context, then close the duplicate with an explanation.

**Files:** None (API only)

- [ ] **Step 1: Close #205 as duplicate of #255 (async benchmark)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

# Get #205 body for context preservation
BODY_205=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/205" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")')

# Comment on primary #255 with context from #205.
# Build the JSON payload in Python, passing the body via the environment so
# quotes and backslashes in the issue body cannot break the payload.
PAYLOAD=$(BODY="$BODY_205" python3 -c 'import json,os; print(json.dumps({"body": "Context preserved from duplicate #205 (fully async benchmark):\n\n" + os.environ["BODY"]}))')
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/255/comments" \
  -d "$PAYLOAD" | python3 -c 'import sys,json; print("Commented on #255:", json.load(sys.stdin).get("id","error"))'

# Comment on #205 explaining closure
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/205/comments" \
  -d '{"body":"Closing as duplicate of #255 (Make Loadgen Async). Both issues target the same goal of making the benchmark fully async. Unique context from this issue has been copied to #255."}' | python3 -c 'import sys,json; print("Commented on #205:", json.load(sys.stdin).get("id","error"))'

# Close #205
curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/205" \
  -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print("#205 state:", json.load(sys.stdin).get("state","error"))'
```

Expected: #205 closed, context preserved on #255.

- [ ] **Step 2: Close #170 as duplicate of #86 (warmup)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

BODY_170=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/170" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")')

PAYLOAD=$(BODY="$BODY_170" python3 -c 'import json,os; print(json.dumps({"body": "Context preserved from duplicate #170 (warmup with random dataset):\n\n" + os.environ["BODY"]}))')
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/86/comments" \
  -d "$PAYLOAD" | python3 -c 'import sys,json; print("Commented on #86:", json.load(sys.stdin).get("id","error"))'

curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/170/comments" \
  -d '{"body":"Closing as duplicate of #86 (Warmup runs). This issue describes a specific warmup implementation approach (random dataset) which is a subset of #86. Unique context has been copied to #86."}' | python3 -c 'import sys,json; print("Commented on #170:", json.load(sys.stdin).get("id","error"))'

curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/170" \
  -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print("#170 state:", json.load(sys.stdin).get("state","error"))'
```

- [ ] **Step 3: Close #226 as duplicate of #232 (multi-turn)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

BODY_226=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/226" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")')

PAYLOAD=$(BODY="$BODY_226" python3 -c 'import json,os; print(json.dumps({"body": "Context preserved from duplicate #226 (Initial multi-turn enabling):\n\n" + os.environ["BODY"]}))')
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/232/comments" \
  -d "$PAYLOAD" | python3 -c 'import sys,json; print("Commented on #232:", json.load(sys.stdin).get("id","error"))'

curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/226/comments" \
  -d '{"body":"Closing as duplicate of #232 (multi-turn implementation). Both track the same multi-turn feature. Unique context has been copied to #232."}' | python3 -c 'import sys,json; print("Commented on #226:", json.load(sys.stdin).get("id","error"))'

curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/226" \
  -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print("#226 state:", json.load(sys.stdin).get("state","error"))'
```

- [ ] **Step 4: Close #29 as superseded by #79 (submission checker)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

BODY_29=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/29" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")')

PAYLOAD=$(BODY="$BODY_29" python3 -c 'import json,os; print(json.dumps({"body": "Context preserved from superseded #29 (submission checker for 6.0):\n\n" + os.environ["BODY"]}))')
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/79/comments" \
  -d "$PAYLOAD" | python3 -c 'import sys,json; print("Commented on #79:", json.load(sys.stdin).get("id","error"))'

curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/29/comments" \
  -d '{"body":"Closing as superseded by #79 (submission checker compatibility mode). #29 was version-specific (6.0) while #79 covers the general compatibility feature. Context has been preserved on #79."}' | python3 -c 'import sys,json; print("Commented on #29:", json.load(sys.stdin).get("id","error"))'

curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/29" \
  -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print("#29 state:", json.load(sys.stdin).get("state","error"))'
```

- [ ] **Step 5: Close #207 as duplicate of #208 (report generation)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

BODY_207=$(curl -s -H "Authorization: token $TOKEN" "https://api.github.com/repos/$REPO/issues/207" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("body","") or "(no body)")')

PAYLOAD=$(BODY="$BODY_207" python3 -c 'import json,os; print(json.dumps({"body": "Context preserved from duplicate #207 (speedup tokenizer report generation):\n\n" + os.environ["BODY"]}))')
curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/208/comments" \
  -d "$PAYLOAD" | python3 -c 'import sys,json; print("Commented on #208:", json.load(sys.stdin).get("id","error"))'

curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/207/comments" \
  -d '{"body":"Closing as duplicate of #208 (optimize report generation). #207 describes a specific approach (parallel tokenization) to #208'\''s broader goal. Context has been preserved on #208."}' | python3 -c 'import sys,json; print("Commented on #207:", json.load(sys.stdin).get("id","error"))'

curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/207" \
  -d '{"state":"closed","state_reason":"not_planned"}' | python3 -c 'import sys,json; print("#207 state:", json.load(sys.stdin).get("state","error"))'
```

- [ ] **Step 6: Close #83 as superseded by #223 (roadmap)**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/83/comments" \
  -d '{"body":"Closing as superseded by #223 (Phase 2 Roadmap). The Q1 roadmap is complete and Phase 2 planning has taken over."}' | python3 -c 'import sys,json; print("Commented on #83:", json.load(sys.stdin).get("id","error"))'

curl -s -X PATCH -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/83" \
  -d '{"state":"closed","state_reason":"completed"}' | python3 -c 'import sys,json; print("#83 state:", json.load(sys.stdin).get("state","error"))'
```

---

### Task 4: Delete Legacy Labels

Remove old labels that have been replaced. Only delete after all issues have been relabeled (Task 2 complete).

**Files:** None (API only)

- [ ] **Step 1: Delete all legacy labels**

```bash
TOKEN=$(gh auth token)
REPO="mlcommons/endpoints"

# URL-encode label names: spaces become %20; colons are fine in DELETE paths
for label in "bug" "feature" "enhancement" "documentation" "performance" "question" \
    "P0" "P1" "P2" "ShowStopper" "testing" "accuracy" "dataset" "Roadmap" "blocked" \
    "rules" "MLCommons"; do
  encoded=$(python3 -c 'import sys,urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$label")
  echo -n "Deleting '$label'... "
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X DELETE \
    -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/$REPO/labels/$encoded")
  if [ "$STATUS" = "204" ]; then echo "deleted"; elif [ "$STATUS" = "404" ]; then echo "not found (already gone)"; else echo "status $STATUS"; fi
done
```

Expected: Each label prints "deleted" or "not found". No errors.

- [ ] **Step 2: Verify final label set**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" \
  "https://api.github.com/repos/mlcommons/endpoints/labels?per_page=100" | \
  python3 -c "
import sys, json
labels = json.load(sys.stdin)
names = sorted([l['name'] for l in labels])
print('Total labels:', len(names))
for n in names:
    print(' ', n)
"
```

Expected: Only new prefixed labels plus kept labels (`good first issue`, `help wanted`, `mlcommons`, `dependencies`, `security`, `duplicate`, `invalid`, `wontfix`). No old labels remain.

---

### Task 5: Configure Project Board #57

Set up the board with status field options, custom fields, and 4 views using the GraphQL API.

**Files:** None (API only)

**NOTE:** The board already exists with ID `PVT_kwDOBAnwDc4BTQvY`. We need to configure its fields and views.

- [ ] **Step 1: Get the board's field IDs**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { fields(first: 20) { nodes { ... on ProjectV2Field { id name } ... on ProjectV2SingleSelectField { id name options { id name } } ... on ProjectV2IterationField { id name } } } } } }"}' | python3 -m json.tool
```

Expected: JSON listing all existing fields with their IDs. Look for the "Status" field and its current options. Record the Status field ID for the next steps.
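Rather than eyeballing the raw JSON, a small filter can pull the IDs out of that response. The helper below is a sketch, not part of the spec — `find_single_select` and the sample IDs are invented for illustration — but it works on the shape the Step 1 query returns:

```python
def find_single_select(payload, field_name):
    """Return (field_id, {option_name: option_id}) for a single-select field
    in a response shaped like the Step 1 GraphQL query output."""
    for node in payload["data"]["node"]["fields"]["nodes"]:
        # Inline fragments that matched nothing come back as empty objects, so use .get()
        if node.get("name") == field_name:
            return node["id"], {o["name"]: o["id"] for o in node.get("options", [])}
    raise KeyError(f"no single-select field named {field_name!r}")


# Demo against a made-up response fragment (real single-select field IDs look like PVTSSF_...):
sample = {"data": {"node": {"fields": {"nodes": [
    {},  # a field type that matched no inline fragment
    {"id": "PVTF_title", "name": "Title"},
    {"id": "PVTSSF_status", "name": "Status",
     "options": [{"id": "opt_inbox", "name": "Inbox"},
                 {"id": "opt_triage", "name": "Triage"}]},
]}}}}

field_id, options = find_single_select(sample, "Status")
print(field_id)            # PVTSSF_status
print(options["Triage"])   # opt_triage
```

Running the real Step 1 output through `json.load` and this function yields the Status field ID and the Triage option ID needed later in Task 6.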

- [ ] **Step 2: Update the Status field with 6 options**

Using the Status field ID from Step 1, update its options with the `updateProjectV2Field` mutation. The mutation replaces the existing option list wholesale with the 6 new options.

**Note:** You must adapt the field ID from Step 1's output. Replace `STATUS_FIELD_ID` below with the actual ID.

```bash
TOKEN=$(gh auth token)

# Paste the Status field ID from Step 1
STATUS_FIELD_ID=""

# Replace the Status field's options via the updateProjectV2Field mutation
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{
    "query": "mutation { updateProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", fieldId: \"'"$STATUS_FIELD_ID"'\", singleSelectOptions: [{name: \"Inbox\", color: GRAY}, {name: \"Triage\", color: YELLOW}, {name: \"Ready\", color: BLUE}, {name: \"In Progress\", color: ORANGE}, {name: \"In Review\", color: PURPLE}, {name: \"Done\", color: GREEN}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id options { id name } } } }"
  }' | python3 -m json.tool
```

Expected: Returns the updated Status field with 6 options.

- [ ] **Step 3: Create Priority custom field**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{
    "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Priority\", singleSelectOptions: [{name: \"ShowStopper\", color: RED}, {name: \"P0\", color: RED}, {name: \"P1\", color: ORANGE}, {name: \"P2\", color: YELLOW}, {name: \"P3\", color: GREEN}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }"
  }' | python3 -m json.tool
```

Expected: Priority field created with 5 options.

- [ ] **Step 4: Create Area custom field**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{
    "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Area\", singleSelectOptions: [{name: \"core-engine\", color: BLUE}, {name: \"client\", color: BLUE}, {name: \"metrics\", color: BLUE}, {name: \"dataset\", color: BLUE}, {name: \"config-cli\", color: BLUE}, {name: \"evaluation\", color: BLUE}, {name: \"adapters\", color: BLUE}, {name: \"mlcommons\", color: PURPLE}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }"
  }' | python3 -m json.tool
```

Expected: Area field created with 8 options.

- [ ] **Step 5: Create Target Release custom field**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{
    "query": "mutation { createProjectV2Field(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", dataType: SINGLE_SELECT, name: \"Target Release\", singleSelectOptions: [{name: \"v0.5.0\", color: GRAY}, {name: \"v1.0.0\", color: GRAY}]}) { projectV2Field { ... on ProjectV2SingleSelectField { id name options { id name } } } }"
  }' | python3 -m json.tool
```

Expected: Target Release field created.

- [ ] **Step 6: Verify all fields exist**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \
  -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { fields(first: 20) { nodes { ... on ProjectV2Field { id name } ... on ProjectV2SingleSelectField { id name options { id name } } } } } } }"}' | python3 -m json.tool
```

Expected: Status (6 options), Priority (5 options), Area (8 options), Target Release (2 options) all present.

---

### Task 6: Add Issues to Board #57

Add all ShowStopper through P2 issues (48 after closing duplicates) to the project board and set their status to Triage.

**Files:** None (API only)

- [ ] **Step 1: Get issue node IDs for all Q2 issues**

We need the GraphQL node IDs for each issue to add them to the project. Batch-fetch them:

```bash
TOKEN=$(gh auth token)

# All issue numbers to add to the board (ShowStopper + P0 + P1 + P2)
ISSUES="84 8 4 86 232 183 138 6 5 9 255 269 237 219 221 202 199 222 220 182 177 176 113 210 268 10 7 254 217 179 178 173 224 208 158 125 115 79 73 68 58 213 133 174 229 228 227 212"

for issue in $ISSUES; do
  NODE_ID=$(curl -s -H "Authorization: token $TOKEN" \
    "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \
    python3 -c 'import sys,json; print(json.load(sys.stdin)["node_id"])')
  echo "$issue $NODE_ID"
done
```

Expected: A list of issue numbers and their node IDs. Save this output — you'll need it for Step 2.

- [ ] **Step 2: Add each issue to the project**

For each issue, use the `addProjectV2ItemById` mutation. Process in batches to avoid rate limiting:

```bash
TOKEN=$(gh auth token)
PROJECT_ID="PVT_kwDOBAnwDc4BTQvY"

# Use the node IDs from Step 1. Example for one issue:
# curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
#   -d '{"query":"mutation { addProjectV2ItemById(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", contentId: \"NODE_ID_HERE\"}) { item { id } } }"}'

# Batch all issues:
ISSUES="84 8 4 86 232 183 138 6 5 9 255 269 237 219 221 202 199 222 220 182 177 176 113 210 268 10 7 254 217 179 178 173 224 208 158 125 115 79 73 68 58 213 133 174 229 228 227 212"

for issue in $ISSUES; do
  NODE_ID=$(curl -s -H "Authorization: token $TOKEN" \
    "https://api.github.com/repos/mlcommons/endpoints/issues/$issue" | \
    python3 -c 'import sys,json; print(json.load(sys.stdin)["node_id"])')

  ITEM_ID=$(curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
    -d "{\"query\":\"mutation { addProjectV2ItemById(input: {projectId: \\\"$PROJECT_ID\\\", contentId: \\\"$NODE_ID\\\"}) { item { id } } }\"}" | \
    python3 -c 'import sys,json; print(json.load(sys.stdin)["data"]["addProjectV2ItemById"]["item"]["id"])')

  echo "#$issue added: $ITEM_ID"
  sleep 0.5  # Rate limit courtesy
done
```

Expected: Each issue prints its project item ID. All 48 issues added.

- [ ] **Step 3: Set all items to Triage status**

After adding items, set their Status field to "Triage". You need the Status field ID and the "Triage" option ID from Task 5 Step 1/2.

```bash
TOKEN=$(gh auth token)
PROJECT_ID="PVT_kwDOBAnwDc4BTQvY"
STATUS_FIELD_ID=""
TRIAGE_OPTION_ID=""

# For each item added in Step 2, set status to Triage.
# Paste the item IDs printed in Step 2 into ITEM_IDS:
ITEM_IDS=""

for ITEM_ID in $ITEM_IDS; do
  curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
    -d "{\"query\":\"mutation { updateProjectV2ItemFieldValue(input: {projectId: \\\"$PROJECT_ID\\\", itemId: \\\"$ITEM_ID\\\", fieldId: \\\"$STATUS_FIELD_ID\\\", value: {singleSelectOptionId: \\\"$TRIAGE_OPTION_ID\\\"}}) { projectV2Item { id } } }\"}" | \
    python3 -c 'import sys,json; d=json.load(sys.stdin); print("Set triage:", d)'
  sleep 0.3
done
```

Expected: All items set to Triage status.

- [ ] **Step 4: Verify board population**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { items(first: 100) { totalCount nodes { content { ... on Issue { number title } } } } } } }"}' | \
  python3 -c "
import sys, json
data = json.load(sys.stdin)
items = data['data']['node']['items']
print('Total items on board:', items['totalCount'])
for item in items['nodes']:
    c = item['content']
    print('  #{} {}'.format(c['number'], c['title']))
"
```

Expected: 48 issues listed on the board.

---

### Task 7: Create Board Views

Create the 4 views on the project board. The default view already exists (rename it to Kanban); create 3 additional views.

**Files:** None (API only)

- [ ] **Step 1: List existing views**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { views(first: 10) { nodes { id name number layout } } } } }"}' | python3 -m json.tool
```

Expected: At least one default view. Record its ID.

- [ ] **Step 2: Update default view to Kanban board layout**

```bash
TOKEN=$(gh auth token)
# Paste the default view ID from Step 1
DEFAULT_VIEW_ID=""

curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d "{\"query\":\"mutation { updateProjectV2View(input: {projectId: \\\"PVT_kwDOBAnwDc4BTQvY\\\", viewId: \\\"$DEFAULT_VIEW_ID\\\", name: \\\"Kanban\\\", layout: BOARD_LAYOUT}) { projectV2View { id name layout } } }\"}" | python3 -m json.tool
```

Expected: Default view renamed to "Kanban" with BOARD_LAYOUT.

- [ ] **Step 3: Create Priority Table view**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"Priority Table\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool
```

Expected: New "Priority Table" view created with TABLE_LAYOUT.

- [ ] **Step 4: Create By Assignee view**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"By Assignee\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool
```

Expected: New "By Assignee" view created.

- [ ] **Step 5: Create Stale Issues view**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"mutation { createProjectV2View(input: {projectId: \"PVT_kwDOBAnwDc4BTQvY\", name: \"Stale Issues\", layout: TABLE_LAYOUT}) { projectV2View { id name layout } } }"}' | python3 -m json.tool
```

Expected: New "Stale Issues" view created.

- [ ] **Step 6: Verify all 4 views exist**

```bash
TOKEN=$(gh auth token)
curl -s -H "Authorization: token $TOKEN" -X POST https://api.github.com/graphql \
  -d '{"query":"{ node(id: \"PVT_kwDOBAnwDc4BTQvY\") { ... on ProjectV2 { views(first: 10) { nodes { id name number layout } } } } }"}' | python3 -m json.tool
```

Expected: 4 views — Kanban (BOARD_LAYOUT), Priority Table (TABLE_LAYOUT), By Assignee (TABLE_LAYOUT), Stale Issues (TABLE_LAYOUT).

**NOTE:** View-level sorting, grouping, and filtering must be configured manually in the GitHub web UI after the views are created. The GraphQL API supports creating views and setting layout, but fine-grained sort/group/filter configuration is not fully exposed via the API. After this task, open https://github.com/orgs/mlcommons/projects/57 and configure:

- Kanban: Group by Priority
- Priority Table: Sort by Priority field ascending
- By Assignee: Group by Assignee
- Stale Issues: Sort by Updated ascending, filter to items not updated in 30+ days

---

### Task 8: Create Issue Templates

Write the 4 YAML issue form templates and the config file to the local repo.

**Files:**
- Create: `.github/ISSUE_TEMPLATE/100-bug-report.yml`
- Create: `.github/ISSUE_TEMPLATE/200-feature-request.yml`
- Create: `.github/ISSUE_TEMPLATE/300-performance.yml`
- Create: `.github/ISSUE_TEMPLATE/400-dataset-integration.yml`
- Create: `.github/ISSUE_TEMPLATE/config.yml`

- [ ] **Step 1: Create the ISSUE_TEMPLATE directory**

```bash
mkdir -p .github/ISSUE_TEMPLATE
```

- [ ] **Step 2: Write 100-bug-report.yml**

Write to `.github/ISSUE_TEMPLATE/100-bug-report.yml` with the exact content from the design spec Section 3, `100-bug-report.yml`.

- [ ] **Step 3: Write 200-feature-request.yml**

Write to `.github/ISSUE_TEMPLATE/200-feature-request.yml` with the exact content from the design spec Section 3, `200-feature-request.yml`.
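The template bodies are copied verbatim from Section 3 of the spec and are not reproduced here. For orientation only, a GitHub issue form is a YAML file of this general shape — the fields below are an illustrative sketch, not the actual spec content:

```yaml
name: Bug Report
description: Something isn't working
labels: ["type: bug"]
body:
  - type: textarea
    id: description
    attributes:
      label: What happened?
      description: Include the command you ran and the full error output.
    validations:
      required: true
  - type: input
    id: version
    attributes:
      label: Version / commit
    validations:
      required: false
```

The numeric filename prefixes (`100-`, `200-`, ...) exist to control ordering: GitHub lists templates on the New Issue chooser page alphabetically by filename.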
- -- [ ] **Step 4: Write 300-performance.yml** - -Write to `.github/ISSUE_TEMPLATE/300-performance.yml` with the exact content from the design spec Section 3, `300-performance.yml`. - -- [ ] **Step 5: Write 400-dataset-integration.yml** - -Write to `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` with the exact content from the design spec Section 3, `400-dataset-integration.yml`. - -- [ ] **Step 6: Write config.yml** - -Write to `.github/ISSUE_TEMPLATE/config.yml`: - -```yaml -blank_issues_enabled: true -contact_links: - - name: Questions & Discussion - url: https://github.com/mlcommons/endpoints/discussions - about: Ask questions and discuss ideas before filing an issue -``` - -- [ ] **Step 7: Verify all template files exist** - -```bash -ls -la .github/ISSUE_TEMPLATE/ -``` - -Expected: 5 files — `100-bug-report.yml`, `200-feature-request.yml`, `300-performance.yml`, `400-dataset-integration.yml`, `config.yml`. - -- [ ] **Step 8: Commit issue templates** - -```bash -git add .github/ISSUE_TEMPLATE/ -git commit -m "chore: add issue templates (bug, feature, performance, dataset) - -Co-Authored-By: Claude Opus 4.6 (1M context) " -``` - ---- - -### Task 9: Update CONTRIBUTING.md - -Replace the existing 10-line CONTRIBUTING.md with the expanded ~250-line version. - -**Files:** -- Modify: `CONTRIBUTING.md` (full rewrite) - -- [ ] **Step 1: Write the new CONTRIBUTING.md** - -Write the full CONTRIBUTING.md content as designed in Section 4 of the spec. The full text was presented during brainstorming and approved. It includes these sections: - -1. Welcome and Table of Contents -2. Ways to Contribute (links to all 4 issue templates) -3. Development Setup (prerequisites, fork/clone, venv, pip install, pre-commit, echo server) -4. Code Style and Conventions (ruff, mypy, line length 88, conventional commits, serialization, performance-sensitive code) -5. Testing (pytest commands, markers, async mode, coverage, fixtures) -6. 
Submitting Changes (branch naming, PR process, review criteria) -7. Issue Guidelines (templates, lifecycle, priority levels table) -8. MLCommons CLA (existing CLA requirements preserved) -9. Questions section - -- [ ] **Step 2: Commit CONTRIBUTING.md** - -```bash -git add CONTRIBUTING.md -git commit -m "docs: expand CONTRIBUTING.md with development guide, testing, and issue guidelines - -Co-Authored-By: Claude Opus 4.6 (1M context) " -``` - ---- - -### Task 10: Link Open PRs to Issues - -Add comments on open PRs that implement issues different from their own number, creating explicit linkage. - -**Files:** None (API only) - -- [ ] **Step 1: Link PRs to their corresponding issues** - -Only PRs where the PR number differs from the issue it implements need explicit linking: - -```bash -TOKEN=$(gh auth token 2>&1) -REPO="mlcommons/endpoints" - -# PR #226 implements issue #232 (multi-turn) -curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - "https://api.github.com/repos/$REPO/issues/226/comments" \ - -d '{"body":"Relates to #232 (multi-turn implementation). This PR provides the initial multi-turn enabling work tracked by #232."}' | python3 -c 'import sys,json; print(f"PR #226 linked to #232: {json.load(sys.stdin).get(\"id\",\"error\")}")' - -# PR #207 implements issue #208 (report generation optimization) -curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - "https://api.github.com/repos/$REPO/issues/207/comments" \ - -d '{"body":"Relates to #208 (optimize report generation). 
This PR implements parallel tokenization as one approach to #208."}' | python3 -c 'import sys,json; print(f"PR #207 linked to #208: {json.load(sys.stdin).get(\"id\",\"error\")}")' - -# PR #170 implements issue #86 (warmup runs) -curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - "https://api.github.com/repos/$REPO/issues/170/comments" \ - -d '{"body":"Relates to #86 (Warmup runs). This PR implements warmup with random dataset as part of #86."}' | python3 -c 'import sys,json; print(f"PR #170 linked to #86: {json.load(sys.stdin).get(\"id\",\"error\")}")' - -# PR #205 relates to issue #255 (Make Loadgen Async) -curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - "https://api.github.com/repos/$REPO/issues/205/comments" \ - -d '{"body":"Relates to #255 (Make Loadgen Async). Both this PR and #255 target the same async benchmark goal."}' | python3 -c 'import sys,json; print(f"PR #205 linked to #255: {json.load(sys.stdin).get(\"id\",\"error\")}")' -``` - -Expected: 4 comments posted linking PRs to their primary issues. - ---- - -### Task 11: Push and Create PR - -Push the local commits (issue templates + CONTRIBUTING.md) as a PR to the repository. - -**Files:** None (git operations) - -- [ ] **Step 1: Create a feature branch** - -```bash -git checkout -b chore/project-management-setup -``` - -- [ ] **Step 2: Cherry-pick the commits onto the branch** - -If you committed on main, reset main and cherry-pick onto the new branch. Otherwise if you're already on the branch, skip this. 
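If the two commits (templates + CONTRIBUTING.md) were made on `main` before the branch existed, Steps 1–2 can be collapsed into one small helper. This is a sketch, not part of the plan: `recover_onto_branch` is a hypothetical name, and it assumes `origin/main` does not yet contain the local commits.

```shell
# Sketch of the Step 2 recovery path: keep the commits on the new branch,
# then rewind local main to match the remote.
recover_onto_branch() {
  branch="${1:-chore/project-management-setup}"
  # Create the branch at the current tip — it keeps the local commits.
  git checkout -b "$branch"
  # Move local main back to the remote tip without touching the worktree.
  git branch --force main origin/main
}
```

Afterwards `main` again matches `origin/main`, the feature branch carries the commits, and Step 3's push proceeds unchanged.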
- -- [ ] **Step 3: Push to remote** - -```bash -git push -u origin chore/project-management-setup -``` - -- [ ] **Step 4: Create the PR** - -```bash -TOKEN=$(gh auth token 2>&1) -curl -s -X POST -H "Authorization: token $TOKEN" -H "Accept: application/vnd.github+json" \ - "https://api.github.com/repos/mlcommons/endpoints/pulls" \ - -d '{ - "title": "chore: add issue templates, expand CONTRIBUTING.md, and project management setup", - "body": "## Summary\n\n- Add 4 YAML issue form templates (bug report, feature request, performance issue, dataset integration)\n- Expand CONTRIBUTING.md with development setup, code style, testing, PR process, and issue guidelines\n- Part of the project management infrastructure setup (labels, board, and issue migration done via API)\n\n## Related\n\nDesign spec: docs/superpowers/specs/2026-04-07-project-management-design.md\n\n## Test plan\n\n- [ ] Verify issue templates render correctly on GitHub (New Issue page)\n- [ ] Verify CONTRIBUTING.md renders correctly\n- [ ] Verify all links in CONTRIBUTING.md work\n\n🤖 Generated with [Claude Code](https://claude.com/claude-code)", - "head": "chore/project-management-setup", - "base": "main" - }' | python3 -c 'import sys,json; d=json.load(sys.stdin); print(f"PR created: {d.get(\"html_url\", d.get(\"message\", \"error\"))}")' -``` - -Expected: PR URL printed. - ---- - -### Task 12: Enable Board Automations - -Configure the built-in automations on project board #57 via the GitHub web UI. - -**Files:** None (manual UI configuration) - -**NOTE:** GitHub Projects V2 built-in automations (auto-add, auto-archive, auto-set status on close) are not configurable via the GraphQL API. They must be enabled manually. 
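Although the workflows must be enabled by hand, their state can still be read back afterwards (e.g. via `gh api graphql` on the project's `workflows` connection, assuming the current ProjectV2 GraphQL schema exposes it) and compared against the expected set. A minimal sketch of that comparison — `missing_workflows` is a hypothetical helper, and the workflow names mirror the steps of this task:

```python
# Sketch: check manually enabled board automations against the expected
# set. `fetched` mirrors the `projectV2.workflows.nodes` shape returned
# by GitHub's GraphQL API (assumed schema).
EXPECTED_ENABLED = {
    "Auto-add to project",
    "Item closed",
    "Pull request merged",
    "Auto-archive items",
}


def missing_workflows(fetched):
    """Return expected workflow names that are absent or still disabled."""
    enabled = {w["name"] for w in fetched if w.get("enabled")}
    return sorted(EXPECTED_ENABLED - enabled)
```

An empty return means all four automations from this task are live; anything else lists what still needs to be flipped on in the UI.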
- -- [ ] **Step 1: Open project settings** - -Navigate to: https://github.com/orgs/mlcommons/projects/57/settings - -- [ ] **Step 2: Enable "Auto-add" workflow** - -Under Workflows → Auto-add to project: -- Enable the workflow -- Filter: `is:issue is:open repo:mlcommons/endpoints` -- This ensures all new issues are automatically added to the board with Inbox status - -- [ ] **Step 3: Enable "Item closed" workflow** - -Under Workflows → Item closed: -- Enable the workflow -- Set status to: Done - -- [ ] **Step 4: Enable "Pull request merged" workflow** - -Under Workflows → Pull request merged: -- Enable the workflow -- Set status to: Done - -- [ ] **Step 5: Enable "Auto-archive items"** - -Under Workflows → Auto-archive items: -- Enable the workflow -- Archive items that have been Done for 14 days - ---- - -### Task 13: Configure Board Views in UI - -Fine-tune the sort, group, and filter settings for each view in the GitHub web UI. - -**Files:** None (manual UI configuration) - -- [ ] **Step 1: Configure Kanban view** - -Open: https://github.com/orgs/mlcommons/projects/57/views/1 -- Set layout to Board (should already be set) -- Column field: Status -- Group by: Priority (ShowStopper at top) -- Filter: `status:Inbox,Triage,Ready,"In Progress","In Review"` - -- [ ] **Step 2: Configure Priority Table view** - -Open the Priority Table view -- Sort by: Priority ascending (ShowStopper first) -- Show columns: Title, Priority, Area, Status, Assignee, Target Release -- Filter: exclude Done items - -- [ ] **Step 3: Configure By Assignee view** - -Open the By Assignee view -- Group by: Assignee -- Sort by: Priority ascending within each group -- Show columns: Title, Priority, Area, Status - -- [ ] **Step 4: Configure Stale Issues view** - -Open the Stale Issues view -- Sort by: Updated date ascending (oldest first) -- Show columns: Title, Priority, Area, Status, Assignee, Updated -- Filter: exclude Done, show only items not updated in 30+ days diff --git 
a/docs/superpowers/specs/2026-04-07-project-management-design.md b/docs/superpowers/specs/2026-04-07-project-management-design.md deleted file mode 100644 index 43e5d446..00000000 --- a/docs/superpowers/specs/2026-04-07-project-management-design.md +++ /dev/null @@ -1,605 +0,0 @@ -# Project Management Design: Labels, Board, Templates, and CONTRIBUTING.md - -**Date:** 2026-04-07 -**Author:** Zhihan Jiang (nvzhihanj) -**Status:** Draft - -## Context - -The mlcommons/endpoints repository has 57 open issues with inconsistent labeling, -no issue templates, a minimal CONTRIBUTING.md, and no active project board. The -project has 3-4 core contributors (NVIDIA) and growing community participation -(Intel, MLCommons, external). The goal is to establish project management -infrastructure that serves the **broader MLCommons community** as the primary -audience — making it easy for external contributors to self-serve, pick up issues, -and understand the project roadmap. - -### Research Basis - -This design is informed by analysis of label taxonomies and project management -practices from: Kubernetes, PyTorch, vLLM, Ray, SGLang, MLCommons/inference, -and guidance from opensource.guide, GitHub Docs, CNCF, and Linux Foundation. - -### Phased Approach - -- **Phase 1 (now):** Labels, board, templates, CONTRIBUTING.md, issue migration -- **Phase 2 (when issue volume > 100 or contributors > 10):** Size/effort labels, - stale bot automation, iteration/sprint fields, disable blank issues - ---- - -## 1. 
Label Taxonomy (~28 labels) - -### Design Principles - -- **Prefixed naming** (`type:`, `priority:`, `area:`, `status:`) for filterability - and visual grouping — inspired by Ray and PyTorch -- **Coarse area labels** (7) grouping related modules — start coarse, split later -- **Severity-gradient colors** for priority — hotter = more urgent -- **Single color family** per label category for visual coherence - -### Type Labels - -| Label | Color | Description | -|-------|-------|-------------| -| `type: bug` | `#d73a4a` | Something isn't working | -| `type: feature` | `#a2eeef` | New feature or capability | -| `type: enhancement` | `#bfd4f2` | Improvement to existing functionality | -| `type: performance` | `#3ddd26` | Performance regression or improvement | -| `type: documentation` | `#0075ca` | Documentation only | -| `type: question` | `#d876e3` | Usage question or clarification | -| `type: RFC` | `#76fde7` | Request for comments / design proposal | -| `type: chore` | `#ededed` | Maintenance, deps, CI, tooling | - -### Priority Labels - -| Label | Color | Description | -|-------|-------|-------------| -| `priority: ShowStopper` | `#000000` | Drop everything — critical blocker, all hands on deck | -| `priority: P0` | `#b60205` | Critical — blocks release or users | -| `priority: P1` | `#d93f0b` | High — must address this cycle | -| `priority: P2` | `#fbca04` | Medium — address within quarter | -| `priority: P3` | `#0e8a16` | Low — backlog, nice to have | - -### Area Labels - -| Label | Color | Description | -|-------|-------|-------------| -| `area: core-engine` | `#c5def5` | Load generator, scheduler, async utils | -| `area: client` | `#c5def5` | Endpoint client, HTTP, transport, ZMQ | -| `area: metrics` | `#c5def5` | Event recorder, metrics reporter, reporting | -| `area: dataset` | `#c5def5` | Dataset manager, formats, predefined datasets | -| `area: config-cli` | `#c5def5` | Config schema, CLI commands, YAML | -| `area: evaluation` | `#c5def5` | Accuracy 
evaluation, scoring, extractors | -| `area: adapters` | `#c5def5` | OpenAI, SGLang protocol adapters | - -### Status Labels - -| Label | Color | Description | -|-------|-------|-------------| -| `status: needs-triage` | `#e99695` | New issue, awaiting review | -| `status: needs-info` | `#f9d0c4` | Awaiting more details from reporter | -| `status: blocked` | `#b60205` | Blocked on external dependency or decision | - -### Community Labels (keep existing) - -| Label | Color | Description | -|-------|-------|-------------| -| `good first issue` | `#7057ff` | Good for newcomers | -| `help wanted` | `#008672` | Extra attention needed | - -### Other (keep existing) - -| Label | Color | Description | -|-------|-------|-------------| -| `mlcommons` | `#e0703c` | MLCommons ruleset/submission integration | -| `dependencies` | `#9083cd` | Dependency updates | -| `security` | `#b60205` | Security vulnerability or hardening | -| `duplicate` | `#cfd3d7` | Duplicate issue | -| `invalid` | `#e4e669` | Not valid | -| `wontfix` | `#ffffff` | Will not be worked on | - -### Labels to Remove - -These are replaced by the prefixed equivalents above: - -| Old Label | Replaced By | -|-----------|-------------| -| `bug` | `type: bug` | -| `feature` | `type: feature` | -| `enhancement` | `type: enhancement` | -| `documentation` | `type: documentation` | -| `performance` | `type: performance` | -| `question` | `type: question` | -| `P0` | `priority: P0` | -| `P1` | `priority: P1` | -| `P2` | `priority: P2` | -| `ShowStopper` | `priority: ShowStopper` | -| `testing` | `type: chore` (context-dependent) | -| `accuracy` | `area: evaluation` | -| `dataset` | `area: dataset` | -| `Roadmap` | `type: RFC` | -| `blocked` | `status: blocked` | -| `rules` | `mlcommons` | -| `MLCommons` | `mlcommons` (lowercase) | - ---- - -## 2. 
Project Board #57 Structure - -### Status Columns - -``` -Inbox → Triage → Ready → In Progress → In Review → Done -``` - -| Column | Purpose | Entry Criteria | -|--------|---------|----------------| -| **Inbox** | New issues land here automatically | Auto-added when issue opened | -| **Triage** | Being evaluated for priority/area/assignee | Someone picked it up to review | -| **Ready** | Triaged, prioritized, ready to work on | Has priority + area labels | -| **In Progress** | Actively being worked on | Assigned, PR may be in flight | -| **In Review** | PR submitted, awaiting review | Linked PR exists | -| **Done** | Merged/resolved/closed | Auto-set when issue closed | - -### Custom Fields - -| Field | Type | Values | -|-------|------|--------| -| Priority | Single select | ShowStopper, P0, P1, P2, P3 | -| Area | Single select | core-engine, client, metrics, dataset, config-cli, evaluation, adapters, mlcommons | -| Target Release | Single select | v0.5.0, v1.0.0 (add as needed) | - -### Views (4) - -**1. Kanban (default)** -- Layout: Board -- Columns: Status field -- Group by: Priority (ShowStopper at top → P3 at bottom) -- Filter: status ≠ Done - -**2. Priority Table** -- Layout: Table -- Sort: Priority ascending (ShowStopper first), then updated date descending -- Columns: Title, Priority, Area, Status, Assignee, Target Release -- Filter: status ≠ Done - -**3. By Assignee** -- Layout: Table -- Group by: Assignee -- Sort: Priority ascending within each group -- Columns: Title, Priority, Area, Status -- Filter: status ≠ Done - -**4. 
Stale Issues** -- Layout: Table -- Sort: Updated date ascending (oldest first) -- Columns: Title, Priority, Area, Status, Assignee, Last Updated -- Filter: status ≠ Done AND last updated more than 30 days ago - -### Automations - -| Trigger | Action | -|---------|--------| -| Issue added to project | Set status → Inbox | -| Issue closed | Set status → Done | -| PR merged closing issue | Set status → Done | -| Item in Done 14+ days | Auto-archive | - ---- - -## 3. Issue Templates - -### Files - -- `.github/ISSUE_TEMPLATE/100-bug-report.yml` — Bug Report -- `.github/ISSUE_TEMPLATE/200-feature-request.yml` — Feature Request -- `.github/ISSUE_TEMPLATE/300-performance.yml` — Performance Issue -- `.github/ISSUE_TEMPLATE/400-dataset-integration.yml` — Dataset Integration -- `.github/ISSUE_TEMPLATE/config.yml` — Template chooser config - -### 100-bug-report.yml - -```yaml -name: Bug Report -description: Report a bug or unexpected behavior -title: "[Bug]: " -labels: ["type: bug", "status: needs-triage"] -body: - - type: textarea - id: description - attributes: - label: Bug Description - description: What happened vs. what you expected - placeholder: "When I run X, I expected Y but got Z" - validations: - required: true - - type: textarea - id: reproduction - attributes: - label: Steps to Reproduce - value: | - 1. - 2. - 3. 
- validations: - required: true - - type: textarea - id: environment - attributes: - label: Environment - description: OS, Python version, package version - placeholder: "OS: Ubuntu 22.04, Python 3.12, inference-endpoint v0.1.0" - validations: - required: true - - type: textarea - id: logs - attributes: - label: Relevant Logs - render: shell - - type: checkboxes - id: checklist - attributes: - label: Before submitting - options: - - label: I searched existing issues and found no duplicates - required: true -``` - -### 200-feature-request.yml - -```yaml -name: Feature Request -description: Suggest a new feature or enhancement -title: "[Feature]: " -labels: ["type: feature", "status: needs-triage"] -body: - - type: textarea - id: motivation - attributes: - label: Motivation - description: What problem does this solve? Why do you need it? - validations: - required: true - - type: textarea - id: proposal - attributes: - label: Proposed Solution - description: How should this work? Include API sketches if relevant. - validations: - required: true - - type: textarea - id: alternatives - attributes: - label: Alternatives Considered - - type: textarea - id: context - attributes: - label: Additional Context -``` - -### 300-performance.yml - -```yaml -name: Performance Issue -description: Report a performance regression or improvement opportunity -title: "[Perf]: " -labels: ["type: performance", "status: needs-triage"] -body: - - type: textarea - id: description - attributes: - label: Description - description: What performance issue did you observe? - placeholder: "QPS dropped from X to Y after upgrading to version Z" - validations: - required: true - - type: textarea - id: benchmark - attributes: - label: Benchmark Command - description: The exact command you ran - render: shell - validations: - required: true - - type: textarea - id: results - attributes: - label: Results - description: Expected vs actual numbers (QPS, latency, TTFT, TPOT, etc.) 
- placeholder: | - Expected: ~5000 QPS, p99 latency < 200ms - Actual: ~2000 QPS, p99 latency 800ms - validations: - required: true - - type: textarea - id: environment - attributes: - label: Environment - description: Hardware, OS, Python version, endpoint server details - placeholder: | - Hardware: 8x A100 80GB - OS: Ubuntu 22.04 - Python: 3.12 - Server: vLLM 0.6.0, Llama-3-70B - Workers: 4 - validations: - required: true - - type: textarea - id: profiling - attributes: - label: Profiling Data (optional) - description: Any profiling output, flame graphs, or bottleneck analysis - render: shell - - type: checkboxes - id: checklist - attributes: - label: Before submitting - options: - - label: I searched existing issues and found no duplicates - required: true - - label: I ran with default settings before tuning - required: false -``` - -### 400-dataset-integration.yml - -```yaml -name: Dataset Integration -description: Request support for a new dataset or evaluation benchmark -title: "[Dataset]: " -labels: ["type: feature", "area: dataset", "status: needs-triage"] -body: - - type: textarea - id: dataset - attributes: - label: Dataset Information - description: Name, URL, and brief description - placeholder: | - Name: MATH-500 - URL: https://huggingface.co/datasets/... - Description: 500 competition math problems for testing reasoning - validations: - required: true - - type: dropdown - id: format - attributes: - label: Dataset Format - options: - - JSONL - - HuggingFace Dataset - - CSV - - JSON - - Parquet - - Other - validations: - required: true - - type: textarea - id: evaluation - attributes: - label: Evaluation Method - description: How should responses be scored? 
- placeholder: "Exact match after extracting boxed answer, or pass@1 for code" - validations: - required: true - - type: textarea - id: samples - attributes: - label: Scale - description: Number of samples, expected prompt/response lengths - placeholder: "500 samples, avg prompt ~200 tokens, avg response ~500 tokens" - - type: textarea - id: context - attributes: - label: Additional Context - description: Related benchmarks, papers, or prior art -``` - -### config.yml - -```yaml -blank_issues_enabled: true -contact_links: - - name: Questions & Discussion - url: https://github.com/mlcommons/endpoints/discussions - about: Ask questions and discuss ideas before filing an issue -``` - ---- - -## 4. CONTRIBUTING.md - -Replace the existing minimal CONTRIBUTING.md with an expanded version (~250 lines) -covering: - -1. **Ways to Contribute** — links to all 4 issue templates, plus docs, PR reviews, - `good first issue` and `help wanted` labels -2. **Development Setup** — prerequisites, fork/clone, venv, `pip install -e ".[dev,test]"`, - pre-commit install, local echo server testing -3. **Code Style and Conventions** — ruff, mypy, line length 88, double quotes, - conventional commits, license headers, serialization conventions - (msgspec vs pydantic), performance-sensitive code guidelines -4. **Testing** — pytest commands, markers (`unit`, `integration`, `slow`, - `performance`), `@pytest.mark.asyncio(mode="strict")`, >90% coverage target, - use real fixtures over mocks -5. **Submitting Changes** — branch naming (`feat/`, `fix/`, `docs/`), PR template, - CI checks, review expectations (2-3 business days), review criteria -6. **Issue Guidelines** — search first, use templates, issue lifecycle - (Inbox → Triage → Ready → In Progress → In Review → Done), priority levels table -7. **MLCommons CLA** — existing CLA requirements preserved - ---- - -## 5. 
Issue Migration Plan - -### Duplicate Resolution - -Close duplicates with a comment explaining the closure and linking to the primary -issue. Copy any unique context from the duplicate into a comment on the primary -issue so no information is lost. - -| Close | Primary | Reason | -|-------|---------|--------| -| #205 "fully async benchmark" | #255 "Make Loadgen Async" | Same goal, #255 is cleaner | -| #170 "warmup with random dataset" | #86 "Warmup runs" | Subset of #86 | -| #226 "Initial multi-turn enabling" | #232 "multi-turn implementation" | Same feature | -| #29 "submission checker for 6.0" | #79 "submission checker compat mode" | #29 is version-specific, superseded | -| #207 "speedup tokenizer report" | #208 "optimize report generation" | #207 is a specific approach to #208 | -| #83 "Q1 Roadmap" | #223 "Phase 2 Roadmap" | Superseded | - -**Evaluation:** #73 "random dataset support" — keep if random dataset has value -beyond warmup use case; otherwise close as duplicate of #86. - -### Label Reassignment - -All 57 open issues are reassigned from old labels to the new prefixed taxonomy. -Full mapping follows, organized by priority tier. 
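Applied mechanically, the relabeling reduces to a lookup from legacy label to prefixed replacement. A hedged sketch — `OLD_TO_NEW` shows only a subset of the full old→new mapping from this spec, and `migrate` is a hypothetical helper whose output would feed the Issues API calls:

```python
# Sketch: derive the add/remove label sets for one issue from its current
# labels. Only part of the old -> new mapping from this spec is shown.
OLD_TO_NEW = {
    "bug": "type: bug",
    "feature": "type: feature",
    "enhancement": "type: enhancement",
    "documentation": "type: documentation",
    "performance": "type: performance",
    "question": "type: question",
    "P0": "priority: P0",
    "P1": "priority: P1",
    "P2": "priority: P2",
    "ShowStopper": "priority: ShowStopper",
    "accuracy": "area: evaluation",
    "dataset": "area: dataset",
    "blocked": "status: blocked",
}


def migrate(labels):
    """Return (to_add, to_remove) for one issue's current label list."""
    to_remove = [label for label in labels if label in OLD_TO_NEW]
    to_add = sorted({OLD_TO_NEW[label] for label in to_remove})
    return to_add, to_remove
```

Labels outside the mapping (e.g. `good first issue`, `help wanted`) pass through untouched, matching the "keep existing" tables above.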
- -#### ShowStopper - -| # | Title | Labels | -|---|-------|--------| -| 84 | Pareto clarification | `priority: ShowStopper`, `area: config-cli`, `mlcommons` | -| 8 | Parity with MLPerf LoadGen | `priority: ShowStopper`, `type: performance`, `area: core-engine` | -| 4 | Accuracy evaluation for LLMs | `priority: ShowStopper`, `type: feature`, `area: evaluation` | - -#### P0 - -| # | Title | Labels | -|---|-------|--------| -| 86 | Warmup runs | `priority: P0`, `type: feature`, `area: core-engine` | -| 232 | Multi-turn implementation | `priority: P0`, `type: feature`, `area: dataset` | -| 183 | Pub/Sub event recorder | `priority: P0`, `type: feature`, `area: metrics` | -| 138 | CI stress test upper bound | `priority: P0`, `type: chore`, `area: core-engine` | -| 6 | Final report structure | `priority: P0`, `type: feature`, `area: metrics` | -| 5 | Submission ruleset + config | `priority: P0`, `type: feature`, `area: config-cli`, `mlcommons` | - -#### P1 - -| # | Title | Labels | -|---|-------|--------| -| 9 | Roofline analysis | `priority: P1`, `type: performance`, `area: core-engine` | -| 255 | Make Loadgen Async | `priority: P1`, `type: feature`, `area: core-engine` | -| 269 | Low concurrency timeouts | `priority: P1`, `type: bug`, `area: client` | -| 237 | CLI fix --load-pattern + --target-qps | `priority: P1`, `type: bug`, `area: config-cli` | -| 219 | target_qps hardcoded in Offline | `priority: P1`, `type: bug`, `area: config-cli` | -| 221 | RuntimeSettings non-reproducible | `priority: P1`, `type: bug`, `area: config-cli` | -| 202 | max_throughput connection timeouts | `priority: P1`, `type: bug`, `area: client` | -| 199 | Perf discrepancy submission vs perf config | `priority: P1`, `type: bug`, `area: config-cli` | -| 222 | KVStore/ServiceLauncher lack tests | `priority: P1`, `type: chore`, `area: core-engine` | -| 220 | SGLang adapter tests skipped | `priority: P1`, `type: chore`, `area: adapters` | -| 182 | Text vs token perf on TRTLLM | `priority: P1`, 
`type: performance`, `area: metrics` | -| 177 | MATH500 dataset | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | -| 176 | MMLU/MMLU-Pro | `priority: P1`, `type: feature`, `area: evaluation`, `area: dataset` | -| 113 | DeepSeek | `priority: P1`, `type: feature` | -| 210 | Wan2.2-T2V support | `priority: P1`, `type: feature` | -| 268 | Phase 2 model selection | `priority: P1`, `type: feature` | -| 10 | System bottleneck tests | `priority: P1`, `type: performance`, `area: core-engine` | -| 7 | Runtime visualization | `priority: P1`, `type: feature`, `area: metrics` | - -#### P2 - -| # | Title | Labels | -|---|-------|--------| -| 254 | Handling failed requests | `priority: P2`, `type: feature`, `area: client` | -| 217 | BURST and STEP load patterns | `priority: P2`, `type: feature`, `area: core-engine` | -| 179 | Humanity's Last Exam | `priority: P2`, `type: feature`, `area: evaluation`, `area: dataset` | -| 178 | Healthbench integration | `priority: P2`, `type: feature`, `area: evaluation`, `area: dataset` | -| 173 | Investigate mlcr failures | `priority: P2`, `type: bug`, `mlcommons` | -| 224 | Multiple perf configs | `priority: P2`, `type: feature`, `area: config-cli` | -| 208 | Optimize report generation | `priority: P2`, `type: performance`, `area: metrics` | -| 158 | SGLang adapter + OpenAI compat | `priority: P2`, `type: feature`, `area: adapters` | -| 125 | Multi-concurrency scans | `priority: P2`, `type: feature`, `area: core-engine` | -| 115 | Clarify default metric | `priority: P2`, `type: enhancement`, `area: config-cli` | -| 79 | Submission checker compat mode | `priority: P2`, `type: feature`, `mlcommons` | -| 73 | Random dataset support | `priority: P2`, `type: feature`, `area: dataset` | -| 68 | Official model name mapping | `priority: P2`, `type: feature`, `area: config-cli`, `mlcommons` | -| 58 | Config-template mapping | `priority: P2`, `type: feature`, `area: config-cli`, `mlcommons` | -| 213 | PostGres dup element | 
`priority: P2`, `type: bug`, `mlcommons` | -| 133 | llama.cpp incompatibility | `priority: P2`, `type: bug`, `area: client` | -| 174 | Better error logging mlcr | `priority: P2`, `type: enhancement`, `mlcommons` | -| 229 | Endpoints test environment | `priority: P2`, `type: chore` | -| 228 | Endpoints Vision document | `priority: P2`, `type: documentation` | -| 227 | DB and Object Store elements | `priority: P2`, `type: feature` | -| 212 | UBI Storage layer | `priority: P2`, `type: feature` | - -#### P3 - -| # | Title | Labels | -|---|-------|--------| -| 99 | Local mode errors | `priority: P3`, `type: bug`, `good first issue` | -| 50 | LlaMa3-405b support | `priority: P3`, `type: feature` | -| 204 | Documentation cleanup | `priority: P3`, `type: documentation` | -| 190 | Skills, design docs, tooling | `priority: P3`, `type: chore` | -| 181 | Sweep qwen scripts | `priority: P3`, `type: feature` | - -#### Other (no priority) - -| # | Title | Labels | -|---|-------|--------| -| 223 | Phase 2 Roadmap | `type: RFC` | -| 267 | Bump transformers | `type: chore`, `dependencies`, `security` | - -### Q2 Board Population - -**Add to board #57 (~40 issues):** All ShowStopper, P0, P1, and P2 issues. -Initial status: **Triage** (existing issues need priority confirmation from team). - -**Not on Q2 board (~5 issues):** P3 issues (#99, #50, #204, #190, #181) and -dependabot (#267). - -### Milestones - -Create milestones as releases are planned: -- `v0.5.0` — first milestone, assign issues as release scope is defined -- `v1.0.0` — future - ---- - -## 6. Phase 2 (Future) - -Trigger when issue volume > 100 or contributors > 10: - -- Add `size: S`, `size: M`, `size: L`, `size: XL` effort labels -- Disable blank issues in `config.yml` -- Add stale bot (apply `status: stale` after 90 days, close after 30 more) -- Add iteration/sprint fields to board if team adopts time-boxed cycles -- Split coarse area labels if any accumulates > 20 issues - ---- - -## 7. 
Migration Procedure - -Order of operations for the migration: - -1. **Create new labels** — all `type:`, `priority:`, `area:`, `status:` labels -2. **Relabel existing issues** — apply new labels per the mapping above -3. **Remove old labels from issues** — strip legacy labels -4. **Close duplicates** — comment with explanation + link to primary, copy unique - context to primary issue -5. **Delete old labels** — remove legacy labels from the repository -6. **Add issues to board #57** — all ShowStopper through P2 -7. **Set board status** — all migrated issues start in Triage -8. **Configure board automations** — auto-add, auto-done, auto-archive -9. **Create issue templates** — add all 4 YAML templates + config.yml -10. **Update CONTRIBUTING.md** — replace with expanded version -11. **Link open PRs to issues** — add "Relates to #N" comments where applicable -12. **Commit and push** — templates + CONTRIBUTING.md in a single PR - -### Open PR → Issue Linkages - -| PR | Linked Issue | Relationship | -|----|-------------|--------------| -| #255 Make Loadgen Async | #255 (same) | PR is the issue | -| #237 CLI fix --load-pattern + --target-qps | #237 (same) | PR is the issue | -| #226 Initial multi-turn enabling | #232 multi-turn implementation | PR implements #232; #226 issue closed as dup | -| #207 Speedup tokenizer report | #208 optimize report generation | PR implements #208; #207 issue closed as dup | -| #205 Fully async benchmark | #255 Make Loadgen Async | Duplicate PR; #205 issue closed as dup | -| #204 Documentation cleanup | #204 (same) | PR is the issue | -| #190 Skills, design docs, tooling | #190 (same) | PR is the issue | -| #181 Sweep qwen scripts | #181 (same) | PR is the issue | -| #170 Warmup with random dataset | #86 Warmup runs | PR implements #86; #170 issue closed as dup | -| #158 SGLang adapter + OpenAI compat | #158 (same) | PR is the issue | -| #125 Multi-concurrency scans | #125 (same) | PR is the issue | -| #79 Submission checker compat | #79 
(same) + #29 (superseded) | PR is the issue | -| #267 Bump transformers | #267 (dependabot) | PR is the issue | From b1ab1c7ba1abeb976ea129480fdd09c96d4e409d Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Wed, 8 Apr 2026 09:59:56 -0700 Subject: [PATCH 08/14] style: apply prettier formatting to README and CONTRIBUTING Co-Authored-By: Claude Opus 4.6 (1M context) --- CONTRIBUTING.md | 15 ++++++++------- README.md | 32 ++++++++++++++++---------------- 2 files changed, 24 insertions(+), 23 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8b264dcc..bd346de2 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -82,7 +82,7 @@ pre-commit run --all-files - **Quotes:** Double quotes - **License headers:** Required on all Python files (auto-added by pre-commit) - **Commit messages:** [Conventional commits](https://www.conventionalcommits.org/) — `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:` -- **Comments:** Only where the *why* isn't obvious from the code. No over-documenting. +- **Comments:** Only where the _why_ isn't obvious from the code. No over-documenting. ### Serialization @@ -185,13 +185,13 @@ and flow through: **Inbox → Triage → Ready → In Progress → In Review → ### Priority Levels -| Priority | Meaning | -|----------|---------| +| Priority | Meaning | +| --------------- | ---------------------------------- | | **ShowStopper** | Drop everything — critical blocker | -| **P0** | Blocks release or users | -| **P1** | Must address this cycle | -| **P2** | Address within quarter | -| **P3** | Backlog, nice to have | +| **P0** | Blocks release or users | +| **P1** | Must address this cycle | +| **P2** | Address within quarter | +| **P3** | Backlog, nice to have | ## MLCommons CLA @@ -200,6 +200,7 @@ All contributors must sign the A CLA bot will check your PR automatically. To sign up: + 1. Visit the [MLCommons Subscription form](https://mlcommons.org/membership/membership-overview/) 2. Submit your GitHub username 3. 
The CLA bot will verify on your next PR diff --git a/README.md b/README.md index 2a1a178f..b81cf8ba 100644 --- a/README.md +++ b/README.md @@ -60,13 +60,13 @@ Dataset Manager ──> Load Generator ──> Endpoint Client ──> External Metrics Collector (EventRecorder + MetricsReporter) ``` -| Component | Purpose | -|-----------|---------| -| **Load Generator** | Central orchestrator: `BenchmarkSession` owns lifecycle, `Scheduler` controls timing | -| **Endpoint Client** | Multi-process HTTP workers communicating via ZMQ IPC | -| **Dataset Manager** | Loads JSONL, HuggingFace, CSV, JSON, Parquet datasets | -| **Metrics** | SQLite-backed event recording, aggregation (QPS, latency, TTFT, TPOT) | -| **Config** | Pydantic-based YAML schema, CLI auto-generated via cyclopts | +| Component | Purpose | +| ------------------- | ------------------------------------------------------------------------------------ | +| **Load Generator** | Central orchestrator: `BenchmarkSession` owns lifecycle, `Scheduler` controls timing | +| **Endpoint Client** | Multi-process HTTP workers communicating via ZMQ IPC | +| **Dataset Manager** | Loads JSONL, HuggingFace, CSV, JSON, Parquet datasets | +| **Metrics** | SQLite-backed event recording, aggregation (QPS, latency, TTFT, TPOT) | +| **Config** | Pydantic-based YAML schema, CLI auto-generated via cyclopts | ### Benchmark Modes @@ -94,15 +94,15 @@ Run accuracy evaluation with Pass@1 scoring using pre-defined benchmarks: ## Documentation -| Guide | Description | -|-------|-------------| -| [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) | Command-line interface guide | -| [CLI Design](docs/CLI_DESIGN.md) | CLI architecture and design decisions | -| [Local Testing](docs/LOCAL_TESTING.md) | Test with the echo server | -| [Client Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) | Endpoint client optimization | -| [Performance Architecture](docs/PERF_ARCHITECTURE.md) | Performance architecture deep dive | -| [Development 
Guide](docs/DEVELOPMENT.md) | Development setup and workflow | -| [CONTRIBUTING.md](CONTRIBUTING.md) | How to contribute | +| Guide | Description | +| -------------------------------------------------------------- | ------------------------------------- | +| [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) | Command-line interface guide | +| [CLI Design](docs/CLI_DESIGN.md) | CLI architecture and design decisions | +| [Local Testing](docs/LOCAL_TESTING.md) | Test with the echo server | +| [Client Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) | Endpoint client optimization | +| [Performance Architecture](docs/PERF_ARCHITECTURE.md) | Performance architecture deep dive | +| [Development Guide](docs/DEVELOPMENT.md) | Development setup and workflow | +| [CONTRIBUTING.md](CONTRIBUTING.md) | How to contribute | ## Contributing From b5961aace99c747cb23b308fd0f57876a69ce771 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Wed, 8 Apr 2026 10:01:00 -0700 Subject: [PATCH 09/14] fix: remove invalid mode='strict' from @pytest.mark.asyncio examples MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Strict asyncio mode is configured globally in pyproject.toml via asyncio_mode = "strict". The marker does not accept a mode argument — passing it causes errors in recent pytest-asyncio versions. Fixed in: CONTRIBUTING.md, AGENTS.md, docs/DEVELOPMENT.md Co-Authored-By: Claude Opus 4.6 (1M context) --- AGENTS.md | 4 +- CONTRIBUTING.md | 2 +- docs/DEVELOPMENT.md | 164 +++++++++++++++++++++----------------------- 3 files changed, 82 insertions(+), 88 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 52a3dbb5..eb0349ca 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -240,7 +240,7 @@ See [Development Guide](docs/DEVELOPMENT.md) for full setup and workflow details @pytest.mark.run_explicitly # Only run when explicitly selected ``` -**Async tests**: Use `@pytest.mark.asyncio(mode="strict")` — the project uses strict asyncio mode. 
+**Async tests**: Use `@pytest.mark.asyncio` — strict mode is configured globally in `pyproject.toml` (`asyncio_mode = "strict"`). Do NOT pass `mode="strict"` to the marker — it's not a valid argument. **Key fixtures** (defined in `tests/conftest.py`): @@ -342,7 +342,7 @@ Known failure modes when AI tools generate code for this project. Reference thes - **Generating mock-heavy tests for integration scenarios**: This project has real echo/oracle server fixtures. AI tends to mock HTTP calls even when `mock_http_echo_server` or `mock_http_oracle_server` fixtures exist and should be used. - **Missing test markers**: Every test function needs `@pytest.mark.unit`, `@pytest.mark.integration`, or another marker. AI-generated tests almost always omit markers, which breaks CI filtering. -- **Wrong asyncio mode**: Tests must use `@pytest.mark.asyncio(mode="strict")` — AI often writes bare `@pytest.mark.asyncio` or forgets it entirely, causing silent test skips or failures. +- **Wrong asyncio marker**: Tests must use bare `@pytest.mark.asyncio` — strict mode is configured globally in `pyproject.toml`. Do NOT pass `mode="strict"` to the marker (it's not a valid argument and will cause errors). AI sometimes hallucinates this parameter. - **Fabricating fixture names**: AI may invent fixtures that don't exist in `conftest.py`. Always check that referenced fixtures actually exist before using them. ### Code Style & Repo Conventions diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index bd346de2..0cb0c164 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -127,7 +127,7 @@ Every test function **must** have a marker: ```python @pytest.mark.unit -@pytest.mark.asyncio(mode="strict") # for async tests — must use strict mode +@pytest.mark.asyncio # strict mode is configured globally in pyproject.toml async def test_something(): ... 
``` diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md index af32da1d..4c95246f 100644 --- a/docs/DEVELOPMENT.md +++ b/docs/DEVELOPMENT.md @@ -2,7 +2,7 @@ This guide provides everything you need to contribute to the MLPerf Inference Endpoint Benchmarking System. -## Getting Started +## 🚀 Getting Started ### Prerequisites @@ -14,48 +14,40 @@ This guide provides everything you need to contribute to the MLPerf Inference En ### Development Environment Setup ```bash -# 1. Fork https://github.com/mlcommons/endpoints on GitHub, then clone your fork -git clone https://github.com/YOUR_USERNAME/endpoints.git -cd endpoints +# 1. Clone the repository +git clone https://github.com/mlperf/inference-endpoint.git +cd inference-endpoint -# 2. Add the upstream repo as a remote -git remote add upstream https://github.com/mlcommons/endpoints.git - -# 3. Create virtual environment (Python 3.12+ required) +# 2. Create virtual environment (Python 3.12+ required) python3.12 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate -# 4. Install development dependencies +# 3. Install development dependencies pip install -e ".[dev,test]" -# 5. Install pre-commit hooks +# 4. Install pre-commit hooks pre-commit install -# 6. Verify installation +# 5. 
Verify installation inference-endpoint --version pytest --version ``` -## Project Structure +## 🏗️ Project Structure ``` -endpoints/ +inference-endpoint/ ├── src/inference_endpoint/ # Main package source -│ ├── main.py # Entry point and CLI app -│ ├── exceptions.py # Project-wide exception types -│ ├── async_utils/ # Event loop, ZMQ transport, pub/sub +│ ├── cli.py # Command-line interface │ ├── commands/ # CLI command implementations │ ├── config/ # Configuration and schema management │ ├── core/ # Core types and orchestration │ ├── dataset_manager/ # Dataset handling and loading │ ├── endpoint_client/ # HTTP/ZMQ endpoint communication -│ ├── evaluation/ # Accuracy evaluation and scoring │ ├── load_generator/ # Load generation and scheduling │ ├── metrics/ # Performance measurement and reporting │ ├── openai/ # OpenAI API compatibility -│ ├── plugins/ # Plugin system │ ├── profiling/ # Performance profiling tools -│ ├── sglang/ # SGLang API adapter │ ├── testing/ # Test utilities (echo server, etc.) │ └── utils/ # Common utilities ├── tests/ # Test suite @@ -68,7 +60,7 @@ endpoints/ └── scripts/ # Utility scripts ``` -## Testing +## 🧪 Testing ### Running Tests @@ -111,36 +103,24 @@ import pytest from inference_endpoint.core.types import Query class TestQuery: - @pytest.mark.unit def test_query_creation(self): """Test creating a basic query.""" - query = Query(data={"prompt": "Test", "model": "test-model"}) - assert query.data["prompt"] == "Test" - assert query.data["model"] == "test-model" + query = Query(prompt="Test", model="test-model") + assert query.prompt == "Test" + assert query.model == "test-model" - @pytest.mark.unit - @pytest.mark.asyncio(mode="strict") + @pytest.mark.asyncio async def test_async_operation(self): """Test async operations.""" # Your async test here pass ``` -## Code Quality +## 📝 Code Quality ### Pre-commit Hooks -The project uses pre-commit hooks to ensure code quality. 
- -Hooks that run automatically on commit: - -- trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements -- `ruff` (lint + autofix) and `ruff-format` -- `mypy` type checking -- `prettier` for YAML/JSON/Markdown -- License header enforcement (Apache 2.0 SPDX header required on all Python files, added by `scripts/add_license_header.py`) - -**Always run `pre-commit run --all-files` before committing.** +The project uses pre-commit hooks to ensure code quality: ```bash # Install hooks (done during setup) @@ -151,12 +131,13 @@ pre-commit run # Run all hooks on all files pre-commit run --all-files + +# Skip hooks (use sparingly) +git commit --no-verify ``` ### Code Formatting -Configuration: `ruff` (line-length 88, target Python 3.12), `ruff-format` (double quotes, space indent). - ```bash # Format code with ruff ruff format src/ tests/ @@ -178,17 +159,12 @@ mypy src/ pre-commit run --all-files ``` -## Development Workflow +## 🔧 Development Workflow ### 1. Feature Development ```bash -# Sync your fork with upstream before starting -git fetch upstream -git checkout main -git merge upstream/main - -# Create a feature branch on your fork +# Create feature branch git checkout -b feature/your-feature-name # Make changes and test @@ -199,7 +175,7 @@ pre-commit run --all-files git add . 
git commit -m "feat: add your feature description" -# Push to your fork and open a PR against mlcommons/endpoints +# Push and create PR git push origin feature/your-feature-name ``` @@ -221,15 +197,42 @@ When developing a new component: - **Performance Tests**: Ensure no performance regressions - **Documentation**: Update docs for new features -## Documentation +## 📚 Documentation ### Writing Documentation -- **Code Comments**: Add comments only where the _why_ is not obvious from the code; avoid restating what the code does +- **Code Comments**: Use docstrings for all public APIs - **README Updates**: Update README.md for user-facing changes +- **API Documentation**: Document new interfaces and changes - **Examples**: Provide usage examples for new features -## Performance Considerations +### Documentation Standards + +```python +def process_query(query: Query) -> QueryResult: + """ + Process a query and return the result. + + Args: + query: The query to process + + Returns: + QueryResult containing the processed response + + Raises: + QueryError: If the query cannot be processed + + Example: + >>> query = Query(prompt="Hello") + >>> result = process_query(query) + >>> print(result.content) + 'Hello there!' + """ + # Implementation here + pass +``` + +## 🚀 Performance Considerations ### Development Guidelines @@ -251,7 +254,7 @@ pytest --benchmark-only pytest --benchmark-compare ``` -## Debugging +## 🔍 Debugging ### Common Issues @@ -273,22 +276,7 @@ pytest -s -v python -m pdb -m pytest test_file.py ``` -## YAML Config Templates - -Config templates in `src/inference_endpoint/config/templates/` are auto-generated from schema defaults. When you change `config/schema.py`, regenerate them: - -```bash -python scripts/regenerate_templates.py -``` - -The pre-commit hook auto-regenerates templates when `schema.py`, `config.py`, or `regenerate_templates.py` change. CI validates templates are up to date via `--check` mode. 
- -Two variants are generated per mode (offline, online, concurrency): - -- `_template.yaml` — minimal: only required fields + placeholders -- `_template_full.yaml` — all fields with schema defaults + inline `# options:` comments - -## Package Management +## 📦 Package Management ### Adding Dependencies @@ -303,7 +291,7 @@ Install after updating: pip install -e ".[dev,test]" ``` -## Troubleshooting +## 🚨 Troubleshooting ### Common Problems @@ -338,20 +326,17 @@ python -c "import sys; print(sys.path)" export PYTHONPATH="${PYTHONPATH}:$(pwd)/src" ``` -## Contributing Guidelines +## 🤝 Contributing Guidelines ### Pull Request Process -1. **Fork** `mlcommons/endpoints` on GitHub -2. **Clone your fork** and add `upstream` as a remote (see [Development Environment Setup](#development-environment-setup)) -3. **Sync with upstream** (`git fetch upstream && git merge upstream/main`) before starting work -4. **Create a feature branch** on your fork (`git checkout -b feature/your-feature-name`) -5. **Make your changes** following the coding standards -6. **Add tests** for new functionality -7. **Update documentation** as needed -8. **Run all checks** locally: `pytest` and `pre-commit run --all-files` -9. **Push to your fork** and open a PR against `mlcommons/endpoints:main` -10. **Address review comments** promptly +1. **Fork the repository** and create a feature branch +2. **Make your changes** following the coding standards +3. **Add tests** for new functionality +4. **Update documentation** as needed +5. **Run all checks** locally before submitting +6. **Create a PR** with clear description and tests +7. **Address review comments** promptly ### Commit Message Format @@ -366,8 +351,6 @@ docs(readme): update installation instructions test(loadgen): add performance benchmarks ``` -Allowed types: `feat`, `fix`, `docs`, `test`, `chore`, `refactor`, `perf`, `ci`. 
- ### Code Review Checklist - [ ] Code follows style guidelines @@ -377,9 +360,20 @@ Allowed types: `feat`, `fix`, `docs`, `test`, `chore`, `refactor`, `perf`, `ci`. - [ ] Security implications are reviewed - [ ] Error handling is appropriate -## Getting Help +## 📞 Getting Help -- **Issues**: [GitHub Issues](https://github.com/mlcommons/endpoints/issues) -- **Discussions**: [GitHub Discussions](https://github.com/mlcommons/endpoints/discussions) +- **Issues**: [GitHub Issues](https://github.com/mlperf/inference-endpoint/issues) +- **Discussions**: [GitHub Discussions](https://github.com/mlperf/inference-endpoint/discussions) - **Documentation**: Check this guide and project docs - **Team**: Reach out to the development team + +## 🎯 Next Steps + +1. **Set up your environment** using this guide +2. **Explore the codebase** to understand the architecture +3. **Pick a component** to work on from the project board +4. **Start with tests** to understand the expected behavior +5. **Implement incrementally** with regular testing +6. **Ask questions** when you need help + +Happy coding! 🚀 From cc3af95b2fa9518830686f5e4ab45f713377008d Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Wed, 8 Apr 2026 10:06:31 -0700 Subject: [PATCH 10/14] docs: remove CLA line from README Contributing section MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CLA details are already in CONTRIBUTING.md — no need to duplicate in README. Co-Authored-By: Claude Opus 4.6 (1M context) --- README.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/README.md b/README.md index b81cf8ba..a14ed18b 100644 --- a/README.md +++ b/README.md @@ -115,8 +115,6 @@ We welcome contributions from the community. See [CONTRIBUTING.md](CONTRIBUTING. Issues are tracked on our [project board](https://github.com/orgs/mlcommons/projects/57). 
Look for [`good first issue`](https://github.com/mlcommons/endpoints/labels/good%20first%20issue) or [`help wanted`](https://github.com/mlcommons/endpoints/labels/help%20wanted) to get started. -All contributors must sign the [MLCommons CLA](https://mlcommons.org/membership/membership-overview/). - ## Acknowledgements This project draws inspiration from: From 22c646ec70903f01b300b995334b99b3d4fb689b Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Wed, 8 Apr 2026 10:15:43 -0700 Subject: [PATCH 11/14] docs: strengthen pre-commit requirement in AGENTS.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Make it explicit that pre-commit must run before every commit, no exceptions. Hooks may modify files — stage changes and commit once. Co-Authored-By: Claude Opus 4.6 (1M context) --- AGENTS.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index eb0349ca..6fec5395 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -21,7 +21,7 @@ pytest -m integration # Integration tests only pytest --cov=src --cov-report=html # With coverage pytest -xvs tests/unit/path/to/test_file.py # Single test file -# Code quality (run before commits) +# Code quality — MUST run before every commit, no exceptions pre-commit run --all-files # Local testing with echo server @@ -215,7 +215,7 @@ All of these run automatically on commit: - License header enforcement - `regenerate-templates`: auto-regenerates YAML config templates from schema defaults when `schema.py`, `config.py`, or `regenerate_templates.py` change -**Always run `pre-commit run --all-files` before committing.** +**IMPORTANT: Always run `pre-commit run --all-files` before every commit.** Hooks may modify files (prettier, ruff-format, license headers). If files are modified, stage the changes and commit once. Never commit without running pre-commit first. See [Development Guide](docs/DEVELOPMENT.md) for full setup and workflow details. 
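The commit-message convention repeated throughout these patches (conventional commits: `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:`, with `refactor:` and `ci:` also listed as allowed in docs/DEVELOPMENT.md) can be checked mechanically. A minimal illustrative sketch; the helper name and regex are assumptions, not project code:

```python
import re

# Types taken from the docs in this patch series; the regex itself is an
# illustrative assumption, not part of the repository.
_CONVENTIONAL = re.compile(
    r"^(feat|fix|docs|test|chore|refactor|perf|ci)"  # allowed type
    r"(\([a-z0-9_-]+\))?"                            # optional scope, e.g. (core)
    r": \S.*$"                                       # colon, space, non-empty subject
)

def is_conventional(subject: str) -> bool:
    """Return True if a commit subject line follows the conventional format."""
    return _CONVENTIONAL.match(subject) is not None
```

For example, `feat(core): add query lifecycle management` passes, while `update stuff` does not.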
From d75f3c60c1cac390b9e2001d9128570f5d8665a8 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Mon, 13 Apr 2026 15:34:46 -0700 Subject: [PATCH 12/14] fix: remove Discussions references (feature not enabled) Remove Discussions link from issue template config.yml and CONTRIBUTING.md since GitHub Discussions is not enabled on this repo. Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/ISSUE_TEMPLATE/config.yml | 4 ---- CONTRIBUTING.md | 3 +-- 2 files changed, 1 insertion(+), 6 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml index 4ac37a65..0086358d 100644 --- a/.github/ISSUE_TEMPLATE/config.yml +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -1,5 +1 @@ blank_issues_enabled: true -contact_links: - - name: Questions & Discussion - url: https://github.com/mlcommons/endpoints/discussions - about: Ask questions and discuss ideas before filing an issue diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 0cb0c164..db06a18c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -210,5 +210,4 @@ during the PR process. ## Questions? -Open a [Discussion](https://github.com/mlcommons/endpoints/discussions) or -file an issue. We aim to respond within a few business days. +File an [issue](https://github.com/mlcommons/endpoints/issues). We aim to respond within a few business days. 
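The DEVELOPMENT.md overhaul in the next patch codifies "dict dispatch instead of `match` statements" for latency-critical paths. A generic sketch of the pattern; the handler names and event kinds here are purely illustrative, not taken from the project:

```python
from typing import Any, Callable

def _on_start(payload: dict[str, Any]) -> str:
    return f"start:{payload['id']}"

def _on_token(payload: dict[str, Any]) -> str:
    return f"token:{payload['text']}"

# Built once at import time; each dispatch is then a single dict lookup,
# avoiding per-call `match` pattern evaluation inside a hot loop.
_DISPATCH: dict[str, Callable[[dict[str, Any]], str]] = {
    "start": _on_start,
    "token": _on_token,
}

def handle_event(kind: str, payload: dict[str, Any]) -> str:
    handler = _DISPATCH.get(kind)
    if handler is None:
        raise ValueError(f"unknown event kind: {kind}")
    return handler(payload)
```

The table stays fixed while handlers can be added without touching the dispatch site, which is why this style is often preferred over `match` in hot paths.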
From 1975e9ae4b5f34cf402d2a85234343a112815895 Mon Sep 17 00:00:00 2001 From: Zhihan Jiang Date: Mon, 13 Apr 2026 15:42:04 -0700 Subject: [PATCH 13/14] =?UTF-8?q?docs:=20overhaul=20DEVELOPMENT.md=20?= =?UTF-8?q?=E2=80=94=20fix=20stale=20URLs,=20add=20fork=20workflow,=20alig?= =?UTF-8?q?n=20with=20AGENTS.md?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix repo URL: mlperf/inference-endpoint → mlcommons/endpoints - Add proper fork workflow (fork → clone → add upstream → branch → PR) - Update project structure to match current codebase (add evaluation, sglang, plugins, async_utils; fix entry point main.py not cli.py) - Remove emoji headers for consistency - Fix test example: add required markers, correct asyncio usage - Remove "skip hooks" advice (contradicts project policy) - Remove verbose docstring example (contradicts minimal-comments policy) - Remove Discussions references (feature not enabled) - Add YAML config templates section - Add performance considerations aligned with AGENTS.md - Add key test fixtures section Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/DEVELOPMENT.md | 378 ++++++++++++++------------------------------ 1 file changed, 122 insertions(+), 256 deletions(-) diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md index 4c95246f..e4e2d3de 100644 --- a/docs/DEVELOPMENT.md +++ b/docs/DEVELOPMENT.md @@ -1,282 +1,206 @@ # Development Guide -This guide provides everything you need to contribute to the MLPerf Inference Endpoint Benchmarking System. +This guide covers the development setup and workflow for the MLPerf Inference Endpoint Benchmarking System. For contribution guidelines, see [CONTRIBUTING.md](../CONTRIBUTING.md). 
-## 🚀 Getting Started +## Getting Started ### Prerequisites -- **Python**: 3.12+ (Python 3.12 is recommended for optimal performance) +- **Python**: 3.12+ (3.12 recommended) - **Git**: Latest version -- **Virtual Environment**: Python venv or conda -- **IDE**: VS Code, PyCharm, or your preferred editor +- **OS**: Linux or macOS (Windows is not supported) ### Development Environment Setup ```bash -# 1. Clone the repository -git clone https://github.com/mlperf/inference-endpoint.git -cd inference-endpoint +# 1. Fork https://github.com/mlcommons/endpoints on GitHub, then clone your fork +git clone https://github.com/YOUR_USERNAME/endpoints.git +cd endpoints -# 2. Create virtual environment (Python 3.12+ required) +# 2. Add the upstream repo as a remote +git remote add upstream https://github.com/mlcommons/endpoints.git + +# 3. Create virtual environment (Python 3.12+ required) python3.12 -m venv venv -source venv/bin/activate # On Windows: venv\Scripts\activate +source venv/bin/activate -# 3. Install development dependencies +# 4. Install development dependencies pip install -e ".[dev,test]" -# 4. Install pre-commit hooks +# 5. Install pre-commit hooks pre-commit install -# 5. Verify installation +# 6. 
Verify installation inference-endpoint --version pytest --version ``` -## 🏗️ Project Structure +## Project Structure ``` -inference-endpoint/ +endpoints/ ├── src/inference_endpoint/ # Main package source -│ ├── cli.py # Command-line interface +│ ├── main.py # Entry point and CLI app +│ ├── exceptions.py # Project-wide exception types +│ ├── async_utils/ # Event loop, ZMQ transport, pub/sub │ ├── commands/ # CLI command implementations │ ├── config/ # Configuration and schema management │ ├── core/ # Core types and orchestration │ ├── dataset_manager/ # Dataset handling and loading │ ├── endpoint_client/ # HTTP/ZMQ endpoint communication +│ ├── evaluation/ # Accuracy evaluation and scoring │ ├── load_generator/ # Load generation and scheduling │ ├── metrics/ # Performance measurement and reporting │ ├── openai/ # OpenAI API compatibility +│ ├── plugins/ # Plugin system │ ├── profiling/ # Performance profiling tools +│ ├── sglang/ # SGLang API adapter │ ├── testing/ # Test utilities (echo server, etc.) 
│ └── utils/ # Common utilities ├── tests/ # Test suite │ ├── unit/ # Unit tests │ ├── integration/ # Integration tests -│ ├── performance/ # Performance tests -│ └── datasets/ # Test datasets +│ ├── performance/ # Performance benchmarks +│ └── datasets/ # Test data (dummy_1k.jsonl, squad_pruned/) ├── docs/ # Documentation ├── examples/ # Usage examples └── scripts/ # Utility scripts ``` -## 🧪 Testing +## Testing ### Running Tests ```bash -# Run all tests +# All tests (excludes slow/performance) pytest -# Run with coverage -pytest --cov=src --cov-report=html - -# Run specific test categories -pytest -m unit # Unit tests only -pytest -m integration # Integration tests only -pytest -m performance # Performance tests only (no timeout) +# Unit tests only +pytest -m unit -# Run tests in parallel -pytest -n auto +# Integration tests +pytest -m integration -# Run tests with verbose output -pytest -v +# Single file with verbose output +pytest -xvs tests/unit/path/to/test_file.py -# Run specific test file -pytest tests/unit/test_core_types.py - -# Run with output to file (recommended) -pytest -v 2>&1 | tee test_results.log +# With coverage +pytest --cov=src --cov-report=html ``` -### Test Structure +### Test Markers -- **Unit Tests** (`tests/unit/`): Test individual components in isolation -- **Integration Tests** (`tests/integration/`): Test component interactions with real servers -- **Performance Tests** (`tests/performance/`): Test performance characteristics (marked with @pytest.mark.performance, no timeout) -- **Test Datasets** (`tests/datasets/`): Sample datasets for testing (dummy_1k.jsonl, squad_pruned/) - -### Writing Tests +Every test function **must** have a marker: ```python import pytest -from inference_endpoint.core.types import Query - -class TestQuery: - def test_query_creation(self): - """Test creating a basic query.""" - query = Query(prompt="Test", model="test-model") - assert query.prompt == "Test" - assert query.model == "test-model" - - 
@pytest.mark.asyncio - async def test_async_operation(self): - """Test async operations.""" - # Your async test here - pass -``` -## 📝 Code Quality +@pytest.mark.unit +def test_something(): + ... -### Pre-commit Hooks +@pytest.mark.unit +@pytest.mark.asyncio # strict mode is configured globally in pyproject.toml +async def test_async_something(): + ... +``` -The project uses pre-commit hooks to ensure code quality: +Available markers: `unit`, `integration`, `slow`, `performance`, `run_explicitly` -```bash -# Install hooks (done during setup) -pre-commit install +### Key Fixtures -# Run all hooks on staged files -pre-commit run +Defined in `tests/conftest.py` — use these instead of mocking: -# Run all hooks on all files -pre-commit run --all-files +- `mock_http_echo_server` — real HTTP echo server on dynamic port +- `mock_http_oracle_server` — dataset-driven response server +- `dummy_dataset` — in-memory test dataset +- `events_db` — pre-populated SQLite events database -# Skip hooks (use sparingly) -git commit --no-verify -``` +### Coverage -### Code Formatting +Target **>90% coverage** for all new code. -```bash -# Format code with ruff -ruff format src/ tests/ +## Code Quality -# Check formatting without changing files -ruff format --check src/ tests/ -``` +### Pre-commit Hooks -### Linting +All of these run automatically on commit: -```bash -# Run ruff linter -ruff check src/ tests/ +- trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements +- `ruff` (lint + autofix) and `ruff-format` +- `mypy` type checking +- `prettier` for YAML/JSON/Markdown +- License header enforcement +- YAML template validation and regeneration -# Run mypy for type checking -mypy src/ +**IMPORTANT: Always run `pre-commit run --all-files` before every commit.** Hooks may modify files. If files are modified, stage the changes and commit once. 
-# Run all quality checks +```bash +# Run all hooks pre-commit run --all-files + +# Install hooks (done during setup) +pre-commit install ``` -## 🔧 Development Workflow +### Code Style + +- **Formatter/Linter**: `ruff` (line-length 88, target Python 3.12) +- **Type checking**: `mypy` +- **Formatting**: `ruff-format` (double quotes, space indent) +- **License headers**: Required on all Python files (auto-added by pre-commit) +- **Commit messages**: [Conventional commits](https://www.conventionalcommits.org/) — `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:` +- **Comments**: Only where the _why_ isn't obvious from the code -### 1. Feature Development +## Development Workflow + +### Feature Development ```bash -# Create feature branch -git checkout -b feature/your-feature-name +# Sync your fork with upstream before starting +git fetch upstream +git checkout main +git merge upstream/main + +# Create a feature branch on your fork +git checkout -b feat/your-feature-name # Make changes and test pytest pre-commit run --all-files # Commit changes -git add . +git add git commit -m "feat: add your feature description" -# Push and create PR -git push origin feature/your-feature-name +# Push to your fork and open a PR against mlcommons/endpoints +git push origin feat/your-feature-name ``` -### 2. Component Development - -When developing a new component: - -1. **Create the component directory** in `src/inference_endpoint/` -2. **Add `__init__.py`** with component description -3. **Implement the component** following the established patterns -4. **Add tests** in the corresponding `tests/unit/` directory -5. **Update main package** `__init__.py` if needed -6. **Add dependencies** to `pyproject.toml` under `[project.dependencies]` or `[project.optional-dependencies]` +### Branch Naming -### 3. 
Testing Strategy - -- **Unit Tests**: >90% coverage required -- **Integration Tests**: Test component interactions -- **Performance Tests**: Ensure no performance regressions -- **Documentation**: Update docs for new features - -## 📚 Documentation - -### Writing Documentation - -- **Code Comments**: Use docstrings for all public APIs -- **README Updates**: Update README.md for user-facing changes -- **API Documentation**: Document new interfaces and changes -- **Examples**: Provide usage examples for new features - -### Documentation Standards - -```python -def process_query(query: Query) -> QueryResult: - """ - Process a query and return the result. - - Args: - query: The query to process - - Returns: - QueryResult containing the processed response - - Raises: - QueryError: If the query cannot be processed - - Example: - >>> query = Query(prompt="Hello") - >>> result = process_query(query) - >>> print(result.content) - 'Hello there!' - """ - # Implementation here - pass +``` +feat/short-description +fix/short-description +docs/short-description ``` -## 🚀 Performance Considerations - -### Development Guidelines - -- **Async First**: Use async/await for I/O operations -- **Memory Efficiency**: Minimize object creation in hot paths -- **Profiling**: Use pytest-benchmark for performance testing -- **Monitoring**: Add performance metrics for critical operations +## YAML Config Templates -### Performance Testing +Config templates in `src/inference_endpoint/config/templates/` are auto-generated from schema defaults. When you change `config/schema.py`, regenerate them: ```bash -# Run performance tests -pytest -m performance - -# Run benchmarks -pytest --benchmark-only - -# Compare with previous runs -pytest --benchmark-compare +python scripts/regenerate_templates.py ``` -## 🔍 Debugging +The pre-commit hook auto-regenerates templates when `schema.py`, `config.py`, or `regenerate_templates.py` change. CI validates templates are up to date via `--check` mode. 
-### Common Issues +Two variants are generated per mode (offline, online, concurrency): -1. **Import Errors**: Ensure `src/` is in Python path -2. **Test Failures**: Check test data and mock objects -3. **Performance Issues**: Use profiling tools to identify bottlenecks -4. **Async Issues**: Ensure proper event loop handling +- `_template.yaml` — minimal: only required fields + placeholders +- `_template_full.yaml` — all fields with schema defaults + inline `# options:` comments -### Debug Tools - -```bash -# Run with debug logging -inference-endpoint --verbose - -# Run tests with debug output -pytest -s -v - -# Use Python debugger -python -m pdb -m pytest test_file.py -``` - -## 📦 Package Management +## Package Management ### Adding Dependencies @@ -285,95 +209,37 @@ Add dependencies to `pyproject.toml` (always pin to exact versions with `==`): - **Runtime dependencies**: `[project.dependencies]` - **Optional groups** (dev, test, etc.): `[project.optional-dependencies]` -Install after updating: +After adding a dependency, run `pip-audit` (included in `dev` extras) to verify it has no known vulnerabilities. ```bash pip install -e ".[dev,test]" ``` -## 🚨 Troubleshooting - -### Common Problems - -**Pre-commit hooks failing:** - -```bash -# Update pre-commit -pre-commit autoupdate - -# Skip hooks temporarily -git commit --no-verify -``` - -**Tests failing:** +## Performance Considerations -```bash -# Clear Python cache -find . -type d -name "__pycache__" -delete -find . -type f -name "*.pyc" -delete +Code in `load_generator/`, `endpoint_client/worker.py`, and `async_utils/transport/` is latency-critical. In these paths: -# Reinstall package -pip install -e . 
-``` +- No `match` statements — use dict dispatch +- Use `dataclass(slots=True)` or `msgspec.Struct` for frequently instantiated classes +- Minimize async suspends +- Use `msgspec` over `json`/`pydantic` for serialization +- The HTTP client uses custom `ConnectionPool` with `httptools` parser — not `aiohttp`/`requests` -**Import errors:** +## Debugging ```bash -# Check Python path -python -c "import sys; print(sys.path)" - -# Ensure src is in path -export PYTHONPATH="${PYTHONPATH}:$(pwd)/src" -``` - -## 🤝 Contributing Guidelines - -### Pull Request Process - -1. **Fork the repository** and create a feature branch -2. **Make your changes** following the coding standards -3. **Add tests** for new functionality -4. **Update documentation** as needed -5. **Run all checks** locally before submitting -6. **Create a PR** with clear description and tests -7. **Address review comments** promptly - -### Commit Message Format +# Run with verbose logging +inference-endpoint -v benchmark offline ... -Use conventional commit format: +# Run tests with stdout visible +pytest -xvs tests/unit/path/to/test.py +# Use Python debugger +python -m pdb -m pytest tests/unit/path/to/test.py ``` -type(scope): description - -feat(core): add query lifecycle management -fix(api): resolve endpoint connection issue -docs(readme): update installation instructions -test(loadgen): add performance benchmarks -``` - -### Code Review Checklist - -- [ ] Code follows style guidelines -- [ ] Tests pass and coverage is adequate -- [ ] Documentation is updated -- [ ] Performance impact is considered -- [ ] Security implications are reviewed -- [ ] Error handling is appropriate - -## 📞 Getting Help - -- **Issues**: [GitHub Issues](https://github.com/mlperf/inference-endpoint/issues) -- **Discussions**: [GitHub Discussions](https://github.com/mlperf/inference-endpoint/discussions) -- **Documentation**: Check this guide and project docs -- **Team**: Reach out to the development team - -## 🎯 Next Steps -1. 
**Set up your environment** using this guide -2. **Explore the codebase** to understand the architecture -3. **Pick a component** to work on from the project board -4. **Start with tests** to understand the expected behavior -5. **Implement incrementally** with regular testing -6. **Ask questions** when you need help +## Getting Help -Happy coding! 🚀 +- **Issues**: [GitHub Issues](https://github.com/mlcommons/endpoints/issues) +- **Project Board**: [Q2 Board](https://github.com/orgs/mlcommons/projects/57) +- **Documentation**: See [docs/](.) directory for guides From 149cce0096cf2b151686fc4ab28cffce83f8bd88 Mon Sep 17 00:00:00 2001 From: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com> Date: Mon, 13 Apr 2026 18:01:41 -0500 Subject: [PATCH 14/14] Update dependencies Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com> --- pyproject.toml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pyproject.toml b/pyproject.toml index 19fa129d..67dfc865 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -47,7 +47,7 @@ dependencies = [ "transformers==5.4.0", "numpy==2.4.4", "datasets==4.8.4", - "Pillow==12.1.1", + "Pillow==12.2.0", "sentencepiece==0.2.1", "protobuf==7.34.1", "openai_harmony==0.0.8", @@ -82,7 +82,7 @@ test = [ # Includes optional dependencies for full test coverage "inference-endpoint[sql]", # Testing framework - "pytest==9.0.2", + "pytest==9.0.3", "pytest-asyncio==1.3.0", "pytest-cov==7.1.0", "pytest-benchmark==5.2.3",