diff --git a/.cursor/rules/endpoint-rules.mdc b/.cursor/rules/endpoint-rules.mdc deleted file mode 100644 index aff2d460..00000000 --- a/.cursor/rules/endpoint-rules.mdc +++ /dev/null @@ -1,118 +0,0 @@ ---- -description: -globs: -alwaysApply: true ---- -# Cursor Rules for Python Project Development - -## Core Development Principles - -### 1. Planning-First Development -- **Strict Separation**: Implementation MUST NOT begin until planning for the current step is complete -- All architectural decisions, component interfaces, and implementation approaches must be documented before coding -- Each development cycle follows: Plan → Review Plan → Implement → Update Documentation - -### 2. Testing Requirements -- **Mandatory Unit Tests**: Every new component that requires testing MUST have corresponding unit tests -- **Pre-commit Validation**: All unit tests and pre-commit checks MUST pass before pushing to main repository -- **No Exceptions**: Failed tests or checks block all commits until resolved - -### 3. Scratchpad Documentation System -All planning and tracking must be maintained in `.cursor_artifacts/` directory. 
- -#### Required Files: -- `.cursor_artifacts/hierarchy.md` - Project folder structure, module organization, and architectural overview -- `.cursor_artifacts/progress.md` - Current status, completed tasks, next steps, and milestone tracking -- `.cursor_artifacts/learning.md` - Technical insights, lessons learned, design decisions, and gotchas -- `.cursor_artifacts/design.md` - System design, component interfaces, data models, and API specifications -- `.cursor_artifacts/testing-strategy.md` - Test plans, coverage requirements, and testing approaches -- `.cursor_artifacts/deployment.md` - Deployment procedures, environment configs, and release notes -- `.cursor_artifacts/refactoring-log.md` - Planned and completed refactoring activities with justifications, keep empty if there's no major refactoring - -#### File Management: -- **Size Limit**: Each scratchpad file MUST NOT exceed 1000 lines -- **Regular Maintenance**: Split large files into focused sub-documents when approaching limit -- **Consistent Updates**: Update relevant scratchpad files after each implementation phase - -### 4. Commit and Review Standards -- **Post-Implementation Updates**: Always update `.cursor_artifacts/` scratchpad files after each implementation -- **Small, Focused Changes**: Keep commits and reviews reasonably sized for effective review -- **Clear Commit Messages**: Use conventional commit format with clear descriptions -- **Documentation Sync**: Ensure documentation reflects current implementation state - -### 5. Python Best Practices -- Follow PEP 8 style guidelines and modern Python idioms -- Use type hints for all function signatures and complex variables -- Implement proper error handling with specific exception types -- Apply SOLID principles and clean code practices -- Use dataclasses, context managers, and pathlib where appropriate -- Follow async/await patterns for asynchronous code -- Implement proper logging instead of print statements - -### 6. 
Change Control and Approval -#### Automatic Approval (Small Changes): -- Bug fixes within existing functionality -- Adding unit tests -- Documentation updates -- Minor refactoring within single functions/methods -- Code formatting and style improvements - -#### User Approval Required (Significant Changes): -- **Major Refactoring**: Restructuring classes, modules, or architectural changes -- **API Changes**: Modifying public interfaces or breaking changes -- **Large Deletions**: Removing significant portions of existing code, documentation, or scratchpad content -- **New Dependencies**: Adding external libraries or changing build requirements -- **Database Schema Changes**: Migrations or structural data changes - -#### Approval Process: -1. Document proposed changes in appropriate `.cursor_artifacts/` file -2. Clearly outline impact, benefits, and risks -3. Request explicit user approval before implementation -4. Provide rollback plan for significant changes - -### 7. Comprehensive Testing Strategy -- **Test Coverage**: Aim for >90% code coverage for business logic -- **Test Types**: Unit tests, integration tests, and end-to-end tests as appropriate -- **Edge Cases**: Test boundary conditions, error scenarios, and edge cases -- **Test Documentation**: Clear test descriptions explaining what is being tested and why -- **Mock Strategy**: Use appropriate mocking for external dependencies -- **Performance Tests**: Include performance benchmarks for critical paths -- **Test Data**: Use factories or fixtures for consistent test data setup - -### 8. 
Additional Development Standards - -#### Code Quality: -- Use static analysis tools (pylint, mypy, black, isort) -- Implement pre-commit hooks for automated quality checks -- Regular code reviews focusing on maintainability and performance -- Document complex algorithms and business logic - -#### Version Control: -- Use feature branches for all development work -- Squash commits when merging to maintain clean history -- Tag releases with semantic versioning -- Maintain changelog with user-facing changes - -#### Security and Performance: -- Validate all user inputs and sanitize outputs -- Use secure coding practices (no hardcoded secrets, proper authentication) -- Profile performance-critical code sections -- Monitor and log security-relevant events - -#### Dependencies and Environment: -- Pin dependency versions in requirements files -- Use virtual environments for all development work -- Document environment setup and deployment procedures -- Regular dependency updates with testing - -## Enforcement -These rules are mandatory for all development work. Violations should be caught in pre-commit hooks, code review, or the CI/CD pipeline. Any rule exceptions require explicit documentation and user approval. - -## Other user-defined rules -- Always double-check the validity of the output; never hallucinate or make claims about things you don't know. -- Avoid refactoring the whole project, and always ask for permission before doing a major refactor. -- Look for clues and never be lazy about validating the facts. -- Be diligent in checking whether a component has already been implemented and can be reused. Avoid re-implementing wheels for parts that have already been built in the project. Think twice about whether the reused components actually fit the logic. If necessary, always use a single source of truth in the code repo (e.g. VERSION) instead of randomly hardcoding it everywhere in the code. -- If the logic is incomplete in the code, add a comment about it. 
Don't just assume the user will dig and find it out. -- Follow the best practices of whatever language you are writing in. For example, in Python, don't use a lazy import unless it has been carefully thought through. -- When running pytest, make sure you pipe the output to the command line or to a file, so you don't need to rerun it repeatedly to grep for a failed test. diff --git a/.cursor/rules/msgspec-patterns.mdc b/.cursor/rules/msgspec-patterns.mdc deleted file mode 100644 index fa637ea9..00000000 --- a/.cursor/rules/msgspec-patterns.mdc +++ /dev/null @@ -1,534 +0,0 @@ ---- -description: python performance critical code ; python msgspec usage guide -alwaysApply: false ---- -## 2. Use Structs for Structured Data - -**Rule:** Always prefer `msgspec.Struct` over `dict`, `dataclasses`, or `attrs` for structured data with a known schema. - -**Why:** Structs are 5-60x faster for common operations and are optimized for encoding/decoding. - -```python -# BAD: Using dict or dataclass -from dataclasses import dataclass - -@dataclass -class UserBad: - name: str - email: str - age: int - -# GOOD: Using msgspec.Struct -import msgspec - -class User(msgspec.Struct): - name: str - email: str - age: int - -# Usage -user = User(name="alice", email="alice@example.com", age=30) -data = msgspec.json.encode(user) -decoded = msgspec.json.decode(data, type=User) -``` - ---- - -## 3. Omit Default Values - -**Rule:** Set `omit_defaults=True` on Struct definitions when default values are known on both encoding and decoding ends. - -**Why:** Reduces encoded message size and improves both encoding and decoding performance. - -```python -# BAD: Encoding all fields including defaults -class ConfigBad(msgspec.Struct): - host: str = "localhost" - port: int = 8080 - debug: bool = False - timeout: int = 30 - -# GOOD: Omit default values -class Config(msgspec.Struct, omit_defaults=True): - host: str = "localhost" - port: int = 8080 - debug: bool = False - timeout: int = 30 - -# Only non-default values are encoded -config = Config(host="production.example.com") -data = msgspec.json.encode(config) -# Result: b'{"host":"production.example.com"}' instead of the full object -``` - ---- - -## 4. Avoid Decoding Unused Fields - -**Rule:** Define smaller "view" Struct types that only contain the fields you actually need. - -**Why:** msgspec skips decoding fields not defined in your Struct, reducing allocations and CPU time. - -```python -# BAD: Decoding the entire large object when you only need a few fields -class FullTweet(msgspec.Struct): - id: int - id_str: str - full_text: str - user: dict - entities: dict - extended_entities: dict - retweet_count: int - favorite_count: int - # ... many more fields - -# GOOD: Define minimal structs for your use case -class User(msgspec.Struct): - name: str - -class TweetView(msgspec.Struct): - user: User - full_text: str - favorite_count: int - -# Only these 3 fields are decoded, the rest is skipped -tweet = msgspec.json.decode(large_json_response, type=TweetView) -print(tweet.user.name) # Access only what you need -``` - ---- - -## 5. Use encode_into for Buffer Reuse - -**Rule:** In hot loops, benchmark `Encoder.encode_into()` with a pre-allocated `bytearray` against `encode()`, and use it where it measures faster. - -**Why:** Avoids allocating a new `bytes` object for each encode operation. - -```python -# BAD: New bytes object allocated for each message -def send_messages_bad(socket, msgs): - encoder = msgspec.msgpack.Encoder() - for msg in msgs: - data = encoder.encode(msg) # New bytes object each time - socket.sendall(data) - -# POSSIBLY-GOOD, ALWAYS MEASURE: Reuse a buffer -def send_messages_good(socket, msgs): - encoder = msgspec.msgpack.Encoder() - buffer = bytearray(1024) # Pre-allocate once - - for msg in msgs: - n = encoder.encode_into(msg, buffer) # Reuse buffer - socket.sendall(memoryview(buffer)[:n]) # Send only the encoded bytes -``` - ---- - -## 6. Line-Delimited JSON (NDJSON) - -**Rule:** Benchmark `encode_into()` with a reused buffer (optionally together with `buffer.extend()`) for line-delimited JSON to avoid copies. - -**Why:** Avoids unnecessary copying when appending newlines to JSON messages. - -```python -# BAD: Unnecessary copy with bytes concatenation -def write_ndjson_bad(file, messages): - for msg in messages: - json_msg = msgspec.json.encode(msg) - full_payload = json_msg + b'\n' # Creates a copy - file.write(full_payload) - -# POSSIBLY-GOOD, ALWAYS MEASURE: Zero-copy with encode_into -def write_ndjson_good(file, messages): - encoder = msgspec.json.Encoder() - buffer = bytearray(64) # Pre-allocate with a reasonable size - - for msg in messages: - n = encoder.encode_into(msg, buffer) - file.write(memoryview(buffer)[:n]) # Write only the encoded bytes - file.write(b"\n") -``` - ---- - -## 7. Length-Prefix Framing - -**Rule:** Use `encode_into()` with an offset for length-prefix framing. - -**Why:** Efficiently prepends the message length without extra copies. - -```python -import msgspec - -def send_length_prefixed(socket, msg): - encoder = msgspec.msgpack.Encoder() - buffer = bytearray(64) - - # Encode into buffer, leaving 4 bytes at front for length prefix - n = encoder.encode_into(msg, buffer, 4) - - # Write message length as 4-byte big-endian integer at the start - buffer[:4] = n.to_bytes(4, "big") - - socket.sendall(memoryview(buffer)[:4 + n]) - -async def prefixed_send(stream, buffer: bytes) -> None: - """Write a length-prefixed buffer to an async stream""" - prefix = len(buffer).to_bytes(4, "big") - stream.write(prefix) - stream.write(buffer) - await stream.drain() - -async def prefixed_recv(stream) -> bytes: - """Read a length-prefixed buffer from an async stream""" - prefix = await stream.readexactly(4) - n = int.from_bytes(prefix, "big") - return await stream.readexactly(n) -``` - ---- - -## 8. Use MessagePack Instead of JSON - -**Rule:** Consider using `msgspec.msgpack` instead of `msgspec.json` for internal APIs. - -**Why:** MessagePack is a more compact binary format and can be more performant than JSON. - -```python -import msgspec - -class Event(msgspec.Struct): - type: str - data: dict - timestamp: float - -# Use MessagePack for internal service communication -encoder = msgspec.msgpack.Encoder() -decoder = msgspec.msgpack.Decoder(Event) - -event = Event(type="user_login", data={"user_id": 123}, timestamp=1703424000.0) -packed = encoder.encode(event) # More compact than JSON -decoded = decoder.decode(packed) -``` - ---- - -## 9. Use gc=False for Long-Lived Objects - -**Rule:** Set `gc=False` on Struct types that will never participate in reference cycles and are long-lived. - -**Why:** Reduces garbage collector overhead and pause times by up to 75x. - -### What is gc=False? - -The `gc=False` option tells Python's garbage collector to never track instances of that Struct type. -By default, Python's cyclic garbage collector tracks objects that could potentially participate in reference cycles. 
-When you set `gc=False`, you're telling msgspec: "I guarantee these objects will never be part of a reference cycle, so don't bother tracking them." - -### Performance Impact - -Key takeaways: -- `gc=False` reduces GC pause time by 75x compared to standard classes -- `gc=False` saves 16 bytes per instance (no GC header needed) -- Regular msgspec structs are already 6x faster for GC than standard classes - -### When to Use gc=False - -Use `gc=False` when: -- You're allocating a large number of Struct objects at once (e.g., decoding a large JSON response with thousands of items) -- You have long-lived Struct objects in memory (e.g., a large cache of data objects) -- Your Struct only contains scalar/primitive values (ints, floats, strings, bools, bytes) -- You are 100% certain the Struct will NEVER participate in a reference cycle - -DO NOT use `gc=False` when: -- Your Struct contains references to itself or other Structs (potential cycles) -- Your Struct is part of a parent-child relationship where parent references child and child references parent -- You're unsure whether cycles could occur - -ALWAYS MEASURE performance impact. - -### Decision Tree: Should I Use gc=False? - -``` -Should I use gc=False? -| -+-- Does your Struct only contain scalar types (int, float, str, bool, bytes)? -| +-- YES --> SAFE to use gc=False -| -+-- Does your Struct contain lists/dicts but YOU control what goes in them? -| +-- Will you EVER put the struct itself (or a parent) into those containers? -| +-- NO --> Probably safe, but test carefully -| +-- YES/MAYBE --> Do NOT use gc=False -| -+-- Does your Struct have a reference to another Struct of the same type? -| +-- YES --> Do NOT use gc=False (e.g., tree nodes, linked lists) -| -+-- Is your Struct part of a parent-child bidirectional relationship? 
-| +-- YES --> Do NOT use gc=False -| -+-- When in doubt --> Do NOT use gc=False -``` - -### Examples - -```python -# SAFE: Simple data objects with only scalar values -class Point(msgspec.Struct, gc=False): - x: float - y: float - z: float - -class LogEntry(msgspec.Struct, gc=False): - timestamp: float - level: str - message: str - source: str - -class CacheEntry(msgspec.Struct, gc=False): - key: str - value: str - ttl: int - created_at: float - -# SAFE: Structs containing only tuples of scalars -class Package(msgspec.Struct, gc=False): - name: str - version: str - depends: tuple[str, ...] # immutable tuple of strings - size: int - -# UNSAFE: Self-referential structures - DO NOT use gc=False -class TreeNode(msgspec.Struct): # NO gc=False here! - value: int - children: list["TreeNode"] - parent: "TreeNode | None" = None -``` - -### Real-World Example: Decoding Large JSON - -```python -import msgspec -from typing import Union - -# When decoding large JSON files (like package repositories), -# gc=False significantly improves performance -class Package(msgspec.Struct, gc=False): - build: str - build_number: int - depends: tuple[str, ...] # Use tuple, not list - immutable - md5: str - name: str - sha256: str - subdir: str - version: str - license: str = "" - noarch: Union[str, bool, None] = None - size: int = 0 - timestamp: int = 0 - -class RepoData(msgspec.Struct, gc=False): - repodata_version: int - info: dict - packages: dict[str, Package] - removed: tuple[str, ...] # Use tuple, not list - -# Create a typed decoder for maximum performance -decoder = msgspec.json.Decoder(RepoData) - -def load_repo_data(path: str) -> RepoData: - with open(path, "rb") as f: - return decoder.decode(f.read()) -``` - -## 10. Use array_like=True for Maximum Performance - -**Rule:** Set `array_like=True` when both ends know the field schema and you need maximum performance. - -**Why:** Encodes structs as arrays instead of objects, removing field names from the message. 
- -```python -# Standard encoding includes field names -class PointStandard(msgspec.Struct): - x: float - y: float - z: float - -# Encodes as: b'{"x":1.0,"y":2.0,"z":3.0}' - -# Array-like encoding removes field names -class Point(msgspec.Struct, array_like=True): - x: float - y: float - z: float - -point = Point(1.0, 2.0, 3.0) -data = msgspec.json.encode(point) -# Result: b'[1.0,2.0,3.0]' - smaller and faster - -decoded = msgspec.json.decode(data, type=Point) -# Works correctly: Point(x=1.0, y=2.0, z=3.0) -``` - ---- - -## 11. Tagged Unions for Polymorphic Types - -**Rule:** Use `tag=True` on Struct types when handling multiple message types in a single union. - -**Why:** Enables efficient discrimination between types during decoding. - -```python -import msgspec - -# Define request types with tagging -class GetRequest(msgspec.Struct, tag=True): - key: str - -class PutRequest(msgspec.Struct, tag=True): - key: str - value: str - -class DeleteRequest(msgspec.Struct, tag=True): - key: str - -class ListRequest(msgspec.Struct, tag=True): - prefix: str = "" - -# Union type for all requests -Request = GetRequest | PutRequest | DeleteRequest | ListRequest - -# Single decoder handles all types -decoder = msgspec.msgpack.Decoder(Request) - -# Decoding automatically determines the correct type -data = msgspec.msgpack.encode(PutRequest(key="foo", value="bar")) -request = decoder.decode(data) - -match request: - case GetRequest(key): - print(f"Get: {key}") - case PutRequest(key, value): - print(f"Put: {key}={value}") - case DeleteRequest(key): - print(f"Delete: {key}") - case ListRequest(prefix): - print(f"List: {prefix}") -``` - ---- - -## 12. Use Struct Configuration Options - -**Rule:** Combine Struct options for cleaner, more robust code. - -```python -import msgspec - -class Base( - msgspec.Struct, - omit_defaults=True, # Don't encode default values - forbid_unknown_fields=True, # Error on unknown fields (good for config files) - rename="kebab", # Use kebab-case in JSON (my_field -> my-field) -): - """Base class with common configuration.""" - pass - -class ServerConfig(Base): - host: str = "localhost" - port: int = 8080 - max_connections: int = 100 - enable_ssl: bool = False - -# Decodes kebab-case JSON: {"host": "prod", "max-connections": 500} -config = msgspec.json.decode( - b'{"host":"prod","max-connections": 500}', - type=ServerConfig -) -# Result: ServerConfig(host='prod', port=8080, max_connections=500, enable_ssl=False) -``` - ---- - -## 13. TOML Configuration Files - -**Rule:** Use msgspec for parsing pyproject.toml and other TOML config files with validation. - -```python -import msgspec -from typing import Any - -class BuildSystem(msgspec.Struct, omit_defaults=True, rename="kebab"): - requires: list[str] = [] - build_backend: str | None = None - -class Project(msgspec.Struct, omit_defaults=True, rename="kebab"): - name: str | None = None - version: str | None = None - description: str | None = None - requires_python: str | None = None - dependencies: list[str] = [] - -class PyProject(msgspec.Struct, omit_defaults=True, rename="kebab"): - build_system: BuildSystem | None = None - project: Project | None = None - tool: dict[str, dict[str, Any]] = {} - -def load_pyproject(path: str) -> PyProject: - with open(path, "rb") as f: - return msgspec.toml.decode(f.read(), type=PyProject) -``` - -## Common Patterns - -### API Response Handler - -```python -import msgspec -from typing import TypeVar, Generic - -T = TypeVar('T') - -class APIResponse(msgspec.Struct, Generic[T], omit_defaults=True): - data: T | None = None - error: str | None = None - status: int = 200 - -class User(msgspec.Struct): - id: int - name: str - email: str - -# Create typed decoder for specific response type -user_response_decoder = msgspec.json.Decoder(APIResponse[User]) - -def parse_user_response(raw: bytes) -> APIResponse[User]: - return user_response_decoder.decode(raw) -``` - -## Struct Configuration Options Summary - -| Option | Description | Default | -|--------|-------------|---------| -| `omit_defaults` | Omit fields with default values when encoding | `False` | -| `forbid_unknown_fields` | Error on unknown fields when decoding | `False` | -| `frozen` | Make instances immutable and hashable | `False` | -| `order` | Generate ordering methods (`__lt__`, etc.) | `False` | -| `eq` | Generate equality methods | `True` | -| `kw_only` | Make all fields keyword-only | `False` | -| `tag` | Enable tagged union support | `None` | -| `tag_field` | Field name for the tag | `"type"` | -| `rename` | Rename fields for encoding/decoding | `None` | -| `array_like` | Encode/decode as arrays instead of objects | `False` | -| `gc` | Enable garbage collector tracking | `True` | -| `weakref` | Enable weak reference support | `False` | -| `dict` | Add `__dict__` attribute | `False` | -| `cache_hash` | Cache the hash value | `False` | - ---- - -## References - -- Official Documentation: https://jcristharif.com/msgspec/ -- Performance Tips: https://jcristharif.com/msgspec/perf-tips.html -- Structs Documentation: https://jcristharif.com/msgspec/structs.html -- GC Configuration: https://jcristharif.com/msgspec/structs.html#struct-gc diff --git a/.cursor/rules/python-antipatterns.mdc b/.cursor/rules/python-antipatterns.mdc deleted file mode 100644 index ece51ff2..00000000 --- a/.cursor/rules/python-antipatterns.mdc +++ /dev/null @@ -1,658 +0,0 @@ ---- -globs: **/*.py -alwaysApply: false ---- - -Try to avoid these performance antipatterns in the Python code you write (imports are elided in the snippets below): - -*** - -### 1. 
**Match statements (sequence)** -- **Slow** -```python -def sequence_match_logical(): - seq = ["๐Ÿธ", "๐Ÿ›", "๐Ÿฆ‹", "๐Ÿชฒ"] - frogs = 0 - for _ in range(100_000): - if isinstance(seq, Sequence) and len(seq) > 0 and seq[0] == "๐Ÿธ": - frogs += 1 -``` -- **Fast** -```python -def sequence_match_statement(): - seq = ["๐Ÿธ", "๐Ÿ›", "๐Ÿฆ‹", "๐Ÿชฒ"] - frogs = 0 - for _ in range(100_000): - match seq: - case ["๐Ÿธ", *_]: frogs += 1 -``` - -*** - -### 2. **Match statements (literal)** -- **Slow** -```python -def literal_match_logical(): - seq = ["๐ŸŠ", "๐Ÿ›", "๐Ÿˆ", "๐Ÿฆ‹", "๐Ÿชฒ", "๐Ÿณ"] - butterflies, caterpillars, beetles = 0, 0, 0 - for _ in range(100_000): - for x in seq: - if x == "๐Ÿฆ‹": - butterflies += 1 - elif x == "๐Ÿ›": - caterpillars += 1 - elif x == "๐Ÿชฒ": - beetles += 1 -``` -- **Fast** -```python -def literal_match_statement(): - seq = ["๐ŸŠ", "๐Ÿ›", "๐Ÿˆ", "๐Ÿฆ‹", "๐Ÿชฒ", "๐Ÿณ"] - butterflies, caterpillars, beetles = 0, 0, 0 - for _ in range(100_000): - for x in seq: - match x: - case "๐Ÿฆ‹": butterflies += 1 - case "๐Ÿ›": caterpillars += 1 - case "๐Ÿชฒ": beetles += 1 -``` - -*** - -### 3. **Match statements (mapping)** -- **Slow** -```python -def mapping_match_logical(): - boats = [ - {"๐Ÿ“": 1}, {"๐ŸฆŠ": 1, "๐ŸŒฝ": 1}, - {"๐Ÿ“": 1, "๐ŸŒฝ": 1}, {"๐Ÿ“": 1, "๐ŸฆŠ": 1}, - ] - problems = valid_boats = 0 - for _ in range(100_000): - for boat in boats: - if isinstance(boat, Mapping): - if "๐Ÿ“" in boat and "๐ŸŒฝ" in boat: - problems += 1 - elif "๐Ÿ“" in boat and "๐ŸฆŠ" in boat: - problems += 1 - else: - valid_boats += 1 -``` -- **Fast** -```python -def mapping_match_statement(): - boats = [ - {"๐Ÿ“": 1}, {"๐ŸฆŠ": 1, "๐ŸŒฝ": 1}, - {"๐Ÿ“": 1, "๐ŸŒฝ": 1}, {"๐Ÿ“": 1, "๐ŸฆŠ": 1}, - ] - problems = valid_boats = 0 - for _ in range(100_000): - for boat in boats: - match boat: - case {"๐Ÿ“": _, "๐ŸŒฝ": _}: problems += 1 - case {"๐Ÿ“": _, "๐ŸฆŠ": _}: problems += 1 - case _: valid_boats += 1 -``` - -*** - -### 4. 
**Match statements (classes)** -- **Slow** -```python -def bench_class_matching_logical(): - drivers = [ - Driver(name="Max Verstappen", team="Red Bull"), - Driver(name="Sergio Perez", team="Red Bull"), - Driver(name="Charles Leclerc", team="Ferrari"), - Driver(name="Lewis Hamilton", team="Mercedes"), - ] - for _ in range(100_000): - for driver in drivers: - if not isinstance(driver, Driver): - desc = "Invalid request" - elif driver.name == "Max Verstappen": - desc = "Max Verstappen, the current world #1" - elif driver.team == "Ferrari": - desc = f"{driver.name}, a Ferrari driver!! ๐ŸŽ" - else: - desc = f"{driver.name}, a {driver.team} driver." -``` -- **Fast** -```python -def bench_class_matching_statement(): - drivers = [ - Driver(name="Max Verstappen", team="Red Bull"), - Driver(name="Sergio Perez", team="Red Bull"), - Driver(name="Charles Leclerc", team="Ferrari"), - Driver(name="Lewis Hamilton", team="Mercedes"), - ] - for _ in range(100_000): - for driver in drivers: - match driver: - case Driver(name="Max Verstappen"): desc = "Max Verstappen, the current world #1" - case Driver(name=name, team="Ferrari"): desc = f"{name}, a Ferrari driver!! ๐ŸŽ" - case Driver(name=name, team=team): desc = f"{name}, a {team} driver." - case _: desc = "Invalid request" -``` - -*** - -### 5. **Inline globals in loop** -- **Slow** -```python -def global_constant_in_loop(): - total = MY_GLOBAL_CONSTANT_A - for i in range(10_000): - total += i * MY_GLOBAL_CONSTANT_C -``` -- **Fast** -```python -def local_constant_in_loop(): - total = 3.14 - for i in range(10_000): - total += i * 1234 -``` - -*** - -### 6. 
**GC with higher threshold** -- **Slow** -```python -def load_with_gc(): - t1, t2, t3 = gc.get_threshold() - gc.set_threshold(10, 10, 10) - for _ in range(100_000): - _cyclic_references() - gc.set_threshold(t1, t2, t3) -``` -- **Fast** -```python -def load_gc_at_end(): - t1, t2, t3 = gc.get_threshold() - gc.set_threshold(1000, 20, 20) - for _ in range(100_000): - _cyclic_references() - gc.set_threshold(t1, t2, t3) -``` - -*** - -### 7. **Importing specific name instead of namespace** -- **Slow** -```python -def dotted_import(): - for _ in range(100_000): - os.path.exists('/') -``` -- **Fast** -```python -def direct_import(): - for _ in range(100_000): - exists('/') -``` - -*** - -### 8. **Refactoring Try..except outside a loop** -- **Slow** -```python -def try_in_loop(): - items = {'a': 1} - for _ in range(100_000): - try: - _ = items['a'] - except Exception: - pass -``` -- **Fast** -```python -def try_outside_loop(): - items = {'a': 1} - try: - for _ in range(100_000): - _ = items['a'] - except Exception: - pass -``` - -*** - -### 9. **Class instead of dataclass** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 10. **Namedtuple instead of dataclass** -- **Slow** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 11. 
**class instead of namedtuple** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 12. **namedtuple class instead of namedtuple** -- **Slow** -```python -def attributes_in_namedtuple_type(): - class Pet(typing.NamedTuple): - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 13. **dict instead of class** -- **Slow** -```python -def attributes_in_dict(): - for _ in range(100_000): - dog = {"legs": 4, "noise": "woof"} - str(dog) -``` -- **Fast** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 14. **class with slots** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_class_with_slots(): - class Pet: - legs: int - noise: str - __slots__ = 'legs', 'noise' - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 15. 
**dataclass with slots** -- **Slow** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass_with_slots(): - @dataclass(slots=True) - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 16. **Using a list comprehension to filter another list** -- **Slow** -```python -def filter_list_as_loop(): - result = [] - inputs = range(100_000) - for i in inputs: - if i % 2: - result.append(i) -``` -- **Fast** -```python -def filter_list_as_comprehension(): - inputs = range(100_000) - result = [i for i in inputs if i % 2] -``` - -*** - -### 17. **Join list comprehension instead of generator expression** -- **Slow** -```python -def join_list_comprehension(): - words = ['data', 'type', 'is', 'so', 'long', 'now'] - for x in range(100_000): - ''.join([ele.title() for ele in words]) -``` -- **Fast** -```python -def join_generator_expression(): - words = ['data', 'type', 'is', 'so', 'long', 'now'] - for x in range(100_000): - ''.join(ele.title() for ele in words) -``` - -*** - -### 18. **Using fullmatch instead of anchors** -- **Slow** -```python -def regex_with_anchors(): - SNAKE_CASE_RE = re.compile(r'^([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)$') - tests = ['data_type', 'data_type_', '_dataType', 'dataType', 'data type'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.match(test_str) -``` -- **Fast** -```python -def regex_with_fullmatch(): - SNAKE_CASE_RE = re.compile(r'([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)') - tests = ['data_type', 'data_type_', '_dataType', 'dataType', 'data type'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.fullmatch(test_str) -``` - -*** - -### 19. 
**Using a-zA-Z instead of IGNORECASE**
-- **Slow**
-```python
-import re
-
-def regex_with_capitalrange():
-    SNAKE_CASE_RE = re.compile(r'([a-zA-Z]+\d*_[a-zA-Z\d_]*|_+[a-zA-Z\d]+[a-zA-Z\d_]*)')
-    tests = ['data_type', 'data_type_URL', '_DataType', 'DataTypeURL', 'Data Type URL']
-    for x in range(100_000):
-        for test_str in tests:
-            SNAKE_CASE_RE.fullmatch(test_str)
-```
-- **Fast**
-```python
-import re
-
-def regex_with_ignorecase():
-    SNAKE_CASE_RE = re.compile(r'([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)', re.IGNORECASE)
-    tests = ['data_type', 'data_type_URL', '_DataType', 'DataTypeURL', 'Data Type URL']
-    for x in range(100_000):
-        for test_str in tests:
-            SNAKE_CASE_RE.fullmatch(test_str)
-```
-
-***
-
-### 20. **Kwargs for known keyword args**
-- **Slow**
-```python
-def func_with_kwargs(**kwargs): pass  # minimal stand-in so the snippet runs
-
-def keyword_call():
-    func_with_kwargs(a=1, b=2, c=3)
-```
-- **Fast**
-```python
-def func_with_named_args(a, b, c): pass  # minimal stand-in so the snippet runs
-
-def positional_call():
-    func_with_named_args(a=1, b=2, c=3)
-```
-
-***
-
-### 21. **Tiny Functions**
-- **Slow**
-```python
-def add(a, b):  # minimal stand-in so the snippet runs
-    return a + b
-
-def use_tiny_func():
-    x = 1
-    for n in range(100_000):
-        add(x, n)
-        add(n, x)
-```
-- **Fast**
-```python
-def inline_tiny_func():
-    x = 1
-    for n in range(100_000):
-        x + n
-        n + x
-```
-
-***
-
-### 22. **Slicing bytes instead of memoryview**
-- **Slow**
-```python
-def bytes_slice():
-    word = b'A' * 1000
-    for i in range(1000):
-        n = word[0:i]  # each bytes slice copies the data
-```
-- **Fast**
-```python
-def memoryview_slice():
-    word = memoryview(b'A' * 1000)
-    for i in range(1000):
-        n = word[0:i]  # memoryview slices share the buffer (no copy)
-```
-
-***
-
-### 23. **Loop-invariant code motion**
-- **Slow**
-```python
-def before():
-    x = (1, 2, 3, 4)
-    i = 6
-    for j in range(100_000):
-        len(x) * i + j
-```
-- **Fast**
-```python
-def after():
-    x = (1, 2, 3, 4)
-    i = 6
-    x_i = len(x) * i  # hoisted: computed once, not on every iteration
-    for j in range(100_000):
-        x_i + j
-```
-
-***
-
-### 24.
**Copy slice to Local**
-- **Slow**
-```python
-def slice_as_local():
-    x = list(range(100_000))
-    y = list(range(100_000))
-    for n in range(100_000):
-        x[n] + y[n]
-        x[n] + y[n]
-        x[n] + y[n]
-        x[n] + y[n]
-        x[n] + y[n]
-```
-- **Fast**
-```python
-def slice_copy_to_fast():
-    x = list(range(100_000))
-    y = list(range(100_000))
-    for n in range(100_000):
-        i = x[n]  # index once, reuse the local
-        j = y[n]
-        i + j
-        i + j
-        i + j
-        i + j
-        i + j
-```
-
-***
-
-### 25. **Copy name to Local**
-- **Slow**
-```python
-x, y = 1, 2  # module-level globals (implied by the benchmark)
-
-def as_local():
-    for _ in range(100_000):
-        x + y
-        x + y
-        x + y
-        x + y
-        x + y
-```
-- **Fast**
-```python
-x, y = 1, 2  # module-level globals (implied by the benchmark)
-
-def copy_name_to_fast():
-    i = x  # one global lookup each, then locals
-    j = y
-    for _ in range(100_000):
-        i + j
-        i + j
-        i + j
-        i + j
-        i + j
-```
-
-***
-
-### 26. **Copy dict item to Local**
-- **Slow**
-```python
-d = {"x": 1, "y": 2}  # implied by the benchmark
-
-def dont_copy_dict_key_to_fast():
-    for _ in range(100_000):
-        d["x"] + d["y"]
-        d["x"] + d["y"]
-        d["x"] + d["y"]
-        d["x"] + d["y"]
-        d["x"] + d["y"]
-```
-- **Fast**
-```python
-d = {"x": 1, "y": 2}  # implied by the benchmark
-
-def copy_dict_key_to_fast():
-    i = d["x"]
-    j = d["y"]
-    for _ in range(100_000):
-        i + j
-        i + j
-        i + j
-        i + j
-        i + j
-```
-
-***
-
-### 27.
**Copy class attr to Local**
-- **Slow**
-```python
-class Foo:  # implied by the benchmark
-    x = 1
-    y = 2
-
-foo = Foo()
-
-def dont_copy_attr_to_fast():
-    for _ in range(100_000):
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-```
-- **Fast**
-```python
-class Foo:  # implied by the benchmark
-    x = 1
-    y = 2
-
-foo = Foo()
-
-def copy_attr_to_fast():
-    i = foo.x  # one attribute lookup each, then locals
-    j = foo.y
-    for _ in range(100_000):
-        i + j
-        i + j
-        i + j
-        i + j
-        i + j
-```
-
-***
-
-Each case above pairs a slow (anti-pattern) variant with a fast (optimized) variant, following the ordering and results of the upstream benchmark suite.
-
-[1](https://github.com/tonybaloney/anti-patterns/blob/master/README.md)
diff --git a/.github/ISSUE_TEMPLATE/100-bug-report.yml b/.github/ISSUE_TEMPLATE/100-bug-report.yml
new file mode 100644
index 00000000..4cf5b586
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/100-bug-report.yml
@@ -0,0 +1,43 @@
+name: Bug Report
+description: Report a bug or unexpected behavior
+title: "[Bug]: "
+labels: ["type: bug", "status: needs-triage"]
+body:
+  - type: textarea
+    id: description
+    attributes:
+      label: Bug Description
+      description: What happened vs. what you expected
+      placeholder: "When I run X, I expected Y but got Z"
+    validations:
+      required: true
+  - type: textarea
+    id: reproduction
+    attributes:
+      label: Steps to Reproduce
+      value: |
+        1.
+        2.
+        3.
+    validations:
+      required: true
+  - type: textarea
+    id: environment
+    attributes:
+      label: Environment
+      description: OS, Python version, package version
+      placeholder: "OS: Ubuntu 22.04, Python 3.12, inference-endpoint v0.1.0"
+    validations:
+      required: true
+  - type: textarea
+    id: logs
+    attributes:
+      label: Relevant Logs
+      render: shell
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Before submitting
+      options:
+        - label: I searched existing issues and found no duplicates
+          required: true
diff --git a/.github/ISSUE_TEMPLATE/200-feature-request.yml b/.github/ISSUE_TEMPLATE/200-feature-request.yml
new file mode 100644
index 00000000..3aa7de25
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/200-feature-request.yml
@@ -0,0 +1,27 @@
+name: Feature Request
+description: Suggest a new feature or enhancement
+title: "[Feature]: "
+labels: ["type: feature", "status: needs-triage"]
+body:
+  - type: textarea
+    id: motivation
+    attributes:
+      label: Motivation
+      description: What problem does this solve? Why do you need it?
+    validations:
+      required: true
+  - type: textarea
+    id: proposal
+    attributes:
+      label: Proposed Solution
+      description: How should this work? Include API sketches if relevant.
+    validations:
+      required: true
+  - type: textarea
+    id: alternatives
+    attributes:
+      label: Alternatives Considered
+  - type: textarea
+    id: context
+    attributes:
+      label: Additional Context
diff --git a/.github/ISSUE_TEMPLATE/300-performance.yml b/.github/ISSUE_TEMPLATE/300-performance.yml
new file mode 100644
index 00000000..d2aa9007
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/300-performance.yml
@@ -0,0 +1,59 @@
+name: Performance Issue
+description: Report a performance regression or improvement opportunity
+title: "[Perf]: "
+labels: ["type: performance", "status: needs-triage"]
+body:
+  - type: textarea
+    id: description
+    attributes:
+      label: Description
+      description: What performance issue did you observe?
+      placeholder: "QPS dropped from X to Y after upgrading to version Z"
+    validations:
+      required: true
+  - type: textarea
+    id: benchmark
+    attributes:
+      label: Benchmark Command
+      description: The exact command you ran
+      render: shell
+    validations:
+      required: true
+  - type: textarea
+    id: results
+    attributes:
+      label: Results
+      description: Expected vs actual numbers (QPS, latency, TTFT, TPOT, etc.)
+      placeholder: |
+        Expected: ~5000 QPS, p99 latency < 200ms
+        Actual: ~2000 QPS, p99 latency 800ms
+    validations:
+      required: true
+  - type: textarea
+    id: environment
+    attributes:
+      label: Environment
+      description: Hardware, OS, Python version, endpoint server details
+      placeholder: |
+        Hardware: 8x A100 80GB
+        OS: Ubuntu 22.04
+        Python: 3.12
+        Server: vLLM 0.6.0, Llama-3-70B
+        Workers: 4
+    validations:
+      required: true
+  - type: textarea
+    id: profiling
+    attributes:
+      label: Profiling Data (optional)
+      description: Any profiling output, flame graphs, or bottleneck analysis
+      render: shell
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Before submitting
+      options:
+        - label: I searched existing issues and found no duplicates
+          required: true
+        - label: I ran with default settings before tuning
+          required: false
diff --git a/.github/ISSUE_TEMPLATE/400-dataset-integration.yml b/.github/ISSUE_TEMPLATE/400-dataset-integration.yml
new file mode 100644
index 00000000..67c6673f
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/400-dataset-integration.yml
@@ -0,0 +1,48 @@
+name: Dataset Integration
+description: Request support for a new dataset or evaluation benchmark
+title: "[Dataset]: "
+labels: ["type: feature", "area: dataset", "status: needs-triage"]
+body:
+  - type: textarea
+    id: dataset
+    attributes:
+      label: Dataset Information
+      description: Name, URL, and brief description
+      placeholder: |
+        Name: MATH-500
+        URL: https://huggingface.co/datasets/...
+        Description: 500 competition math problems for testing reasoning
+    validations:
+      required: true
+  - type: dropdown
+    id: format
+    attributes:
+      label: Dataset Format
+      options:
+        - JSONL
+        - HuggingFace Dataset
+        - CSV
+        - JSON
+        - Parquet
+        - Other
+    validations:
+      required: true
+  - type: textarea
+    id: evaluation
+    attributes:
+      label: Evaluation Method
+      description: How should responses be scored?
+      placeholder: "Exact match after extracting boxed answer, or pass@1 for code"
+    validations:
+      required: true
+  - type: textarea
+    id: samples
+    attributes:
+      label: Scale
+      description: Number of samples, expected prompt/response lengths
+      placeholder: "500 samples, avg prompt ~200 tokens, avg response ~500 tokens"
+  - type: textarea
+    id: context
+    attributes:
+      label: Additional Context
+      description: Related benchmarks, papers, or prior art
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 00000000..0086358d
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
+blank_issues_enabled: true
diff --git a/.github/workflows/sync-labels-to-board.yml b/.github/workflows/sync-labels-to-board.yml
new file mode 100644
index 00000000..8a3eaf83
--- /dev/null
+++ b/.github/workflows/sync-labels-to-board.yml
@@ -0,0 +1,150 @@
+name: Sync Labels to Project Board
+
+on:
+  issues:
+    types: [labeled, unlabeled]
+
+env:
+  PROJECT_ID: "PVT_kwDOBAnwDc4BTQvY"
+  # These IDs are populated from the board's GraphQL field configuration.
+  # To find them: query the board fields via GraphQL and extract option IDs.
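+  # For example, with the GitHub CLI the board's field and option IDs can be
+  # listed like this (a sketch; the org login and project number below are
+  # taken from the project-board URL referenced in CONTRIBUTING.md):
+  #
+  #   gh api graphql -f query='
+  #     query {
+  #       organization(login: "mlcommons") {
+  #         projectV2(number: 57) {
+  #           fields(first: 20) {
+  #             nodes {
+  #               ... on ProjectV2SingleSelectField { id name options { id name } }
+  #             }
+  #           }
+  #         }
+  #       }
+  #     }'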
+  PRIORITY_FIELD_ID: "PVTSSF_lADOBAnwDc4BTQvYzhBKk68"
+  AREA_FIELD_ID: "PVTSSF_lADOBAnwDc4BTQvYzhBKk7A"
+
+jobs:
+  sync-labels:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Sync priority and area labels to board fields
+        uses: actions/github-script@v7
+        with:
+          script: |
+            const issue = context.payload.issue;
+            const labels = issue.labels.map(l => l.name);
+
+            // --- Field and option ID mappings ---
+            // Priority field
+            const PRIORITY_FIELD_ID = process.env.PRIORITY_FIELD_ID;
+            const PRIORITY_MAP = {
+              'priority: ShowStopper': process.env.SHOWSTOPPER_OPTION_ID,
+              'priority: P0': process.env.P0_OPTION_ID,
+              'priority: P1': process.env.P1_OPTION_ID,
+              'priority: P2': process.env.P2_OPTION_ID,
+              'priority: P3': process.env.P3_OPTION_ID,
+            };
+
+            // Area field
+            const AREA_FIELD_ID = process.env.AREA_FIELD_ID;
+            const AREA_MAP = {
+              'area: core-engine': process.env.CORE_ENGINE_OPTION_ID,
+              'area: client': process.env.CLIENT_OPTION_ID,
+              'area: metrics': process.env.METRICS_OPTION_ID,
+              'area: dataset': process.env.DATASET_OPTION_ID,
+              'area: config-cli': process.env.CONFIG_CLI_OPTION_ID,
+              'area: evaluation': process.env.EVALUATION_OPTION_ID,
+              'area: adapters': process.env.ADAPTERS_OPTION_ID,
+              'area: mlcommons': process.env.MLCOMMONS_OPTION_ID,
+            };
+
+            const PROJECT_ID = process.env.PROJECT_ID;
+
+            // Find the board item for this issue
+            const findItemQuery = `
+              query($projectId: ID!, $cursor: String) {
+                node(id: $projectId) {
+                  ... on ProjectV2 {
+                    items(first: 100, after: $cursor) {
+                      nodes {
+                        id
+                        content {
+                          ...
on Issue { number }
+                        }
+                      }
+                      pageInfo { hasNextPage endCursor }
+                    }
+                  }
+                }
+              }
+            `;
+
+            let itemId = null;
+            let cursor = null;
+            while (!itemId) {
+              const result = await github.graphql(findItemQuery, {
+                projectId: PROJECT_ID,
+                cursor: cursor,
+              });
+              const items = result.node.items;
+              const match = items.nodes.find(
+                n => n.content && n.content.number === issue.number
+              );
+              if (match) {
+                itemId = match.id;
+                break;
+              }
+              if (!items.pageInfo.hasNextPage) break;
+              cursor = items.pageInfo.endCursor;
+            }
+
+            if (!itemId) {
+              core.info(`Issue #${issue.number} not found on board, skipping.`);
+              return;
+            }
+
+            // Helper to update a single-select field
+            async function setField(fieldId, optionId) {
+              if (!optionId) {
+                // Clear the field
+                await github.graphql(`
+                  mutation($projectId: ID!, $itemId: ID!, $fieldId: ID!) {
+                    clearProjectV2ItemFieldValue(input: {
+                      projectId: $projectId, itemId: $itemId, fieldId: $fieldId
+                    }) { projectV2Item { id } }
+                  }
+                `, { projectId: PROJECT_ID, itemId, fieldId });
+              } else {
+                await github.graphql(`
+                  mutation($projectId: ID!, $itemId: ID!, $fieldId: ID!, $optionId: String!) {
+                    updateProjectV2ItemFieldValue(input: {
+                      projectId: $projectId, itemId: $itemId, fieldId: $fieldId,
+                      value: { singleSelectOptionId: $optionId }
+                    }) { projectV2Item { id } }
+                  }
+                `, { projectId: PROJECT_ID, itemId, fieldId, optionId });
+              }
+            }
+
+            // Sync priority: find the highest-priority label on the issue
+            const priorityOrder = [
+              'priority: ShowStopper',
+              'priority: P0',
+              'priority: P1',
+              'priority: P2',
+              'priority: P3',
+            ];
+            const activePriority = priorityOrder.find(p => labels.includes(p));
+            const priorityOptionId = activePriority ? PRIORITY_MAP[activePriority] : null;
+            await setField(PRIORITY_FIELD_ID, priorityOptionId);
+            core.info(`Priority set to: ${activePriority || '(cleared)'}`);
+
+            // Sync area: use the first area label found
+            const activeArea = labels.find(l => l.startsWith('area: '));
+            const areaOptionId = activeArea ?
AREA_MAP[activeArea] : null;
+            await setField(AREA_FIELD_ID, areaOptionId);
+            core.info(`Area set to: ${activeArea || '(cleared)'}`);
+        env:
+          PRIORITY_FIELD_ID: ${{ env.PRIORITY_FIELD_ID }}
+          AREA_FIELD_ID: ${{ env.AREA_FIELD_ID }}
+          SHOWSTOPPER_OPTION_ID: "26ab336c"
+          P0_OPTION_ID: "d3612dd9"
+          P1_OPTION_ID: "7ff45c96"
+          P2_OPTION_ID: "e41b2ee9"
+          P3_OPTION_ID: "d4d24170"
+          CORE_ENGINE_OPTION_ID: "db5c9511"
+          CLIENT_OPTION_ID: "ffeff676"
+          METRICS_OPTION_ID: "04637e5a"
+          DATASET_OPTION_ID: "b493fd0d"
+          CONFIG_CLI_OPTION_ID: "ae1f5588"
+          EVALUATION_OPTION_ID: "96e592b6"
+          ADAPTERS_OPTION_ID: "6c615274"
+          MLCOMMONS_OPTION_ID: "d5eff045"
diff --git a/.gitignore b/.gitignore
index 8dc22a68..6681801b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -189,10 +189,7 @@ outputs/
 # Example vLLM virtualenv
 examples/03_BenchmarkComparison/vllm_venv/
 
-# Agent artifacts (local development only)
+# AI tool artifacts (local development only)
 .cursor_artifacts/
-.claude/agent-memory/
-
-# User-specific local rules (local Docker dev); do not commit
-.cursor/rules/local-docker-dev.mdc
-CLAUDE.local.md
+.cursor/
+docs/superpowers/
diff --git a/AGENTS.md b/AGENTS.md
index 52a3dbb5..6fec5395 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -21,7 +21,7 @@ pytest -m integration # Integration tests only
 pytest --cov=src --cov-report=html # With coverage
 pytest -xvs tests/unit/path/to/test_file.py # Single test file
 
-# Code quality (run before commits)
+# Code quality — MUST run before every commit, no exceptions
 pre-commit run --all-files
 
 # Local testing with echo server
@@ -215,7 +215,7 @@ All of these run automatically on commit:
 - License header enforcement
 - `regenerate-templates`: auto-regenerates YAML config templates from schema defaults when `schema.py`, `config.py`, or `regenerate_templates.py` change
 
-**Always run `pre-commit run --all-files` before committing.**
+**IMPORTANT: Always run `pre-commit run --all-files` before every commit.** Hooks may modify files (prettier, ruff-format,
license headers). If files are modified, stage the changes and commit once. Never commit without running pre-commit first. See [Development Guide](docs/DEVELOPMENT.md) for full setup and workflow details. @@ -240,7 +240,7 @@ See [Development Guide](docs/DEVELOPMENT.md) for full setup and workflow details @pytest.mark.run_explicitly # Only run when explicitly selected ``` -**Async tests**: Use `@pytest.mark.asyncio(mode="strict")` โ€” the project uses strict asyncio mode. +**Async tests**: Use `@pytest.mark.asyncio` โ€” strict mode is configured globally in `pyproject.toml` (`asyncio_mode = "strict"`). Do NOT pass `mode="strict"` to the marker โ€” it's not a valid argument. **Key fixtures** (defined in `tests/conftest.py`): @@ -342,7 +342,7 @@ Known failure modes when AI tools generate code for this project. Reference thes - **Generating mock-heavy tests for integration scenarios**: This project has real echo/oracle server fixtures. AI tends to mock HTTP calls even when `mock_http_echo_server` or `mock_http_oracle_server` fixtures exist and should be used. - **Missing test markers**: Every test function needs `@pytest.mark.unit`, `@pytest.mark.integration`, or another marker. AI-generated tests almost always omit markers, which breaks CI filtering. -- **Wrong asyncio mode**: Tests must use `@pytest.mark.asyncio(mode="strict")` โ€” AI often writes bare `@pytest.mark.asyncio` or forgets it entirely, causing silent test skips or failures. +- **Wrong asyncio marker**: Tests must use bare `@pytest.mark.asyncio` โ€” strict mode is configured globally in `pyproject.toml`. Do NOT pass `mode="strict"` to the marker (it's not a valid argument and will cause errors). AI sometimes hallucinates this parameter. - **Fabricating fixture names**: AI may invent fixtures that don't exist in `conftest.py`. Always check that referenced fixtures actually exist before using them. 
### Code Style & Repo Conventions diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8de1bbe9..db06a18c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,11 +1,213 @@ -## Contributing +# Contributing to MLPerf Inference Endpoints -The best way to contribute to the MLCommons is to get involved with one of our many project communities. You can find more information about getting involved with MLCommons [here](https://mlcommons.org/community/). +Welcome! We're glad you're interested in contributing. This project is part of +[MLCommons](https://mlcommons.org/) and aims to build a high-performance +benchmarking tool for LLM inference endpoints targeting 50k+ QPS. -Generally we encourage people to become MLCommons members if they wish to contribute to MLCommons projects, but outside pull requests are very welcome too. +## Table of Contents -Regardless of whether you are a member, your organization (or you as an individual contributor) needs to sign the MLCommons Contributor License Agreement (CLA). Please submit your GitHub username to the [MLCommons Subscription form](https://mlcommons.org/community/subscribe/) to start that process. +- [Ways to Contribute](#ways-to-contribute) +- [Development Setup](#development-setup) +- [Code Style and Conventions](#code-style-and-conventions) +- [Testing](#testing) +- [Submitting Changes](#submitting-changes) +- [Issue Guidelines](#issue-guidelines) +- [MLCommons CLA](#mlcommons-cla) -MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and issue a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that cla-bot and other checks pass for your pull requests. +## Ways to Contribute -For project-specific development standards (code style, test requirements, pre-commit hooks, commit format), see the [Development Guide](docs/DEVELOPMENT.md). 
+- **Report bugs** — use the [Bug Report](https://github.com/mlcommons/endpoints/issues/new?template=100-bug-report.yml) template
+- **Request features** — use the [Feature Request](https://github.com/mlcommons/endpoints/issues/new?template=200-feature-request.yml) template
+- **Report performance issues** — use the [Performance Issue](https://github.com/mlcommons/endpoints/issues/new?template=300-performance.yml) template
+- **Request dataset support** — use the [Dataset Integration](https://github.com/mlcommons/endpoints/issues/new?template=400-dataset-integration.yml) template
+- **Improve documentation** — fix typos, clarify guides, add examples
+- **Pick up an issue** — look for [`good first issue`](https://github.com/mlcommons/endpoints/labels/good%20first%20issue) or [`help wanted`](https://github.com/mlcommons/endpoints/labels/help%20wanted)
+- **Review PRs** — thoughtful reviews are as valuable as code
+
+## Development Setup
+
+### Prerequisites
+
+- Python 3.12+ (3.12 recommended)
+- Git
+- A Unix-like OS (Linux or macOS)
+
+### Getting Started
+
+```bash
+# Fork and clone
+git clone https://github.com/<your-username>/endpoints.git
+cd endpoints
+
+# Create virtual environment
+python3.12 -m venv venv
+source venv/bin/activate
+
+# Install with dev and test extras
+pip install -e ".[dev,test]"
+
+# Install pre-commit hooks
+pre-commit install
+
+# Verify your setup
+pytest -m unit -x --timeout=60
+```
+
+### Local Testing with Echo Server
+
+```bash
+# Start a local echo server
+python -m inference_endpoint.testing.echo_server --port 8765
+
+# Run a quick probe
+inference-endpoint probe --endpoints http://localhost:8765 --model test-model
+```
+
+## Code Style and Conventions
+
+### Formatting and Linting
+
+We use [ruff](https://docs.astral.sh/ruff/) for formatting and linting, and
+[mypy](https://mypy-lang.org/) for type checking. Pre-commit hooks enforce
+these automatically.
+ +```bash +# Run all checks manually +pre-commit run --all-files +``` + +### Key Conventions + +- **Line length:** 88 characters +- **Quotes:** Double quotes +- **License headers:** Required on all Python files (auto-added by pre-commit) +- **Commit messages:** [Conventional commits](https://www.conventionalcommits.org/) โ€” `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:` +- **Comments:** Only where the _why_ isn't obvious from the code. No over-documenting. + +### Serialization + +- **Hot-path data** (Query, QueryResult, StreamChunk): `msgspec.Struct` โ€” encode/decode with `msgspec.json`, not stdlib json +- **Configuration**: `pydantic.BaseModel` for validation +- **Do not** use `dataclass` where neighboring types use `msgspec` + +### Performance-Sensitive Code + +Code in `load_generator/`, `endpoint_client/worker.py`, and `async_utils/transport/` +is latency-critical. In these paths: + +- No `match` statements โ€” use dict dispatch +- Minimize async suspends +- No pydantic validation or excessive logging +- Use `msgspec` over `json`/`pydantic` for serialization + +## Testing + +### Running Tests + +```bash +# All tests (excludes slow/performance) +pytest + +# Unit tests only +pytest -m unit + +# Integration tests +pytest -m integration + +# Single file +pytest -xvs tests/unit/path/to/test_file.py + +# With coverage +pytest --cov=src --cov-report=html +``` + +### Test Markers + +Every test function **must** have a marker: + +```python +@pytest.mark.unit +@pytest.mark.asyncio # strict mode is configured globally in pyproject.toml +async def test_something(): + ... +``` + +Available markers: `unit`, `integration`, `slow`, `performance`, `run_explicitly` + +### Coverage + +Target **>90% coverage** for all new code. Use existing fixtures from +`tests/conftest.py` (e.g., `mock_http_echo_server`, `mock_http_oracle_server`, +`dummy_dataset`) rather than mocking. 
+## Submitting Changes
+
+### Branch Naming
+
+```
+feat/short-description
+fix/short-description
+docs/short-description
+```
+
+### Pull Request Process
+
+1. **Create a focused PR** — one logical change per PR
+2. **Fill out the PR template** — describe what, why, and how to test
+3. **Ensure CI passes** — run `pre-commit run --all-files` and `pytest -m unit` locally before pushing
+4. **Link related issues** — use `Closes #123` or `Relates to #123`
+5. **Expect review within 2-3 business days** — reviewers are auto-assigned based on changed files
+
+### What We Look For in Reviews
+
+- Does it follow existing patterns in the codebase?
+- Are tests included and meaningful (not mock-heavy)?
+- Is it focused — no unrelated refactoring or over-engineering?
+- Does it avoid adding unnecessary dependencies?
+
+### After Review
+
+- Address feedback with new commits (don't force-push during review)
+- Once approved, a maintainer will merge
+
+## Issue Guidelines
+
+### Before Filing
+
+1. Search [existing issues](https://github.com/mlcommons/endpoints/issues) for duplicates
+2. Use the appropriate issue template
+3. Provide enough detail to reproduce or understand the request
+
+### Issue Lifecycle
+
+New issues are auto-added to our [project board](https://github.com/orgs/mlcommons/projects/57)
+and flow through: **Inbox → Triage → Ready → In Progress → In Review → Done**
+
+### Priority Levels
+
+| Priority        | Meaning                            |
+| --------------- | ---------------------------------- |
+| **ShowStopper** | Drop everything — critical blocker |
+| **P0**          | Blocks release or users            |
+| **P1**          | Must address this cycle            |
+| **P2**          | Address within quarter             |
+| **P3**          | Backlog, nice to have              |
+
+## MLCommons CLA
+
+All contributors must sign the
+[MLCommons Contributor License Agreement](https://mlcommons.org/membership/membership-overview/).
+A CLA bot will check your PR automatically.
+
+To sign up:
+
+1.
Visit the [MLCommons Subscription form](https://mlcommons.org/membership/membership-overview/) +2. Submit your GitHub username +3. The CLA bot will verify on your next PR + +Pull requests from non-members are welcome โ€” you'll be prompted to sign the CLA +during the PR process. + +## Questions? + +File an [issue](https://github.com/mlcommons/endpoints/issues). We aim to respond within a few business days. diff --git a/README.md b/README.md index 9af4eb85..a14ed18b 100644 --- a/README.md +++ b/README.md @@ -1,209 +1,129 @@ -# MLPerfยฎ Inference Endpoint Benchmarking System +# MLPerf Inference Endpoint Benchmarking System -A high-performance benchmarking tool for LLM endpoints. +[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) +[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/) +[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen.svg)](https://pre-commit.com/) -## Quick Start +A high-performance benchmarking tool for LLM inference endpoints, targeting 50k+ QPS. Part of [MLCommons](https://mlcommons.org/). -### Installation +## Quick Start -**Requirements**: Python 3.12+ (Python 3.12 is recommended for optimal performance. GIL-less mode in higher Python versions is not yet supported.) +**Requirements:** Python 3.12+ (3.12 recommended) ```bash -# Clone the repository -# Note: This repo will be migrated to https://github.com/mlcommons/endpoints git clone https://github.com/mlcommons/endpoints.git cd endpoints - -# Create virtual environment -python3.12 -m venv venv -source venv/bin/activate - -# As a user +python3.12 -m venv venv && source venv/bin/activate pip install . 
- -# As a developer (with development and test extras) -pip install -e ".[dev,test]" -pre-commit install ``` -### Basic Usage - ```bash -# Show help -inference-endpoint --help - -# Show system information -inference-endpoint -v info - # Test endpoint connectivity inference-endpoint probe \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B -# Run offline benchmark (max throughput - uses all dataset samples) +# Run offline benchmark (max throughput) inference-endpoint benchmark offline \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B \ --dataset tests/datasets/dummy_1k.jsonl -# Run online benchmark (sustained QPS - requires --target-qps, --load-pattern) +# Run online benchmark (sustained QPS) inference-endpoint benchmark online \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B \ --dataset tests/datasets/dummy_1k.jsonl \ --load-pattern poisson \ --target-qps 100 - -# With explicit sample count -inference-endpoint benchmark offline \ - --endpoints http://your-endpoint:8000 \ - --model Qwen/Qwen3-8B \ - --dataset tests/datasets/dummy_1k.jsonl \ - --num-samples 5000 ``` -### Running Locally +### Local Testing ```bash -# Start local echo server -python3 -m inference_endpoint.testing.echo_server --port 8765 & - -# Test with dummy dataset (included in repo) +# Start local echo server and run a benchmark against it +python -m inference_endpoint.testing.echo_server --port 8765 & inference-endpoint benchmark offline \ --endpoints http://localhost:8765 \ - --model Qwen/Qwen3-8B \ + --model test-model \ --dataset tests/datasets/dummy_1k.jsonl - -# Stop echo server pkill -f echo_server ``` -See [Local Testing Guide](docs/LOCAL_TESTING.md) for detailed instructions. 
- -### Running Tests and Examples - -```bash -# Install test dependencies -pip install ".[test]" - -# Run tests (excluding performance and explicit-run tests) -pytest -m "not performance and not run_explicitly" - -# Run examples: follow instructions in examples/*/README.md -``` +See [Local Testing Guide](docs/LOCAL_TESTING.md) for more details. -## ๐Ÿ“š Documentation - -- [AGENTS.md](AGENTS.md) - Architecture, conventions, and AI agent guidelines -- [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) - Command-line interface guide -- [Local Testing Guide](docs/LOCAL_TESTING.md) - Test with echo server -- [Development Guide](docs/DEVELOPMENT.md) - How to contribute and develop -- [Performance Architecture](docs/PERF_ARCHITECTURE.md) - Hot-path design and tuning -- [Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) - CPU affinity and client tuning -- [GitHub Setup Guide](docs/GITHUB_SETUP.md) - GitHub authentication and setup - -### Component Design Specs - -Each top-level component under `src/inference_endpoint/` has a corresponding spec: - -| Component | Spec | -| ----------------- | ---------------------------------------------------------------- | -| Core types | [docs/core/DESIGN.md](docs/core/DESIGN.md) | -| Load generator | [docs/load_generator/DESIGN.md](docs/load_generator/DESIGN.md) | -| Endpoint client | [docs/endpoint_client/DESIGN.md](docs/endpoint_client/DESIGN.md) | -| Metrics | [docs/metrics/DESIGN.md](docs/metrics/DESIGN.md) | -| Config | [docs/config/DESIGN.md](docs/config/DESIGN.md) | -| Async utils | [docs/async_utils/DESIGN.md](docs/async_utils/DESIGN.md) | -| Dataset manager | [docs/dataset_manager/DESIGN.md](docs/dataset_manager/DESIGN.md) | -| Commands (CLI) | [docs/commands/DESIGN.md](docs/commands/DESIGN.md) | -| OpenAI adapter | [docs/openai/DESIGN.md](docs/openai/DESIGN.md) | -| SGLang adapter | [docs/sglang/DESIGN.md](docs/sglang/DESIGN.md) | -| Evaluation | [docs/evaluation/DESIGN.md](docs/evaluation/DESIGN.md) | -| Testing utilities 
| [docs/testing/DESIGN.md](docs/testing/DESIGN.md) | -| Profiling | [docs/profiling/DESIGN.md](docs/profiling/DESIGN.md) | -| Plugins | [docs/plugins/DESIGN.md](docs/plugins/DESIGN.md) | -| Utils | [docs/utils/DESIGN.md](docs/utils/DESIGN.md) | - -## ๐ŸŽฏ Architecture - -The system follows a modular, event-driven architecture: +## Architecture ``` -Dataset Manager โ”€โ”€โ–บ Load Generator โ”€โ”€โ–บ Endpoint Client โ”€โ”€โ–บ External Endpoint - โ”‚ - Metrics Collector - (event logging + reporting) +Dataset Manager โ”€โ”€> Load Generator โ”€โ”€> Endpoint Client โ”€โ”€> External Endpoint + | + Metrics Collector (EventRecorder + MetricsReporter) ``` -- **Dataset Manager**: Loads benchmark datasets and applies transform pipelines -- **Load Generator**: Central orchestrator โ€” controls timing (scheduler), issues queries, and emits sample events -- **Endpoint Client**: Multi-process HTTP worker pool communicating over ZMQ IPC -- **Metrics Collector**: Receives sample events from Load Generator; writes to SQLite (EventRecorder), aggregates after the run (MetricsReporter) +| Component | Purpose | +| ------------------- | ------------------------------------------------------------------------------------ | +| **Load Generator** | Central orchestrator: `BenchmarkSession` owns lifecycle, `Scheduler` controls timing | +| **Endpoint Client** | Multi-process HTTP workers communicating via ZMQ IPC | +| **Dataset Manager** | Loads JSONL, HuggingFace, CSV, JSON, Parquet datasets | +| **Metrics** | SQLite-backed event recording, aggregation (QPS, latency, TTFT, TPOT) | +| **Config** | Pydantic-based YAML schema, CLI auto-generated via cyclopts | -## Accuracy Evaluation - -You can run accuracy evaluation with Pass@1 scoring by specifying accuracy datasets in the benchmark -configuration. 
Currently, Inference Endpoints provides the following pre-defined accuracy benchmarks: - -- GPQA (default: GPQA Diamond) -- AIME (default: AIME 2025) -- LiveCodeBench (default: lite, release_v6) - -However, LiveCodeBench will not work out-of-the-box and requires some additional setup. See the -[LiveCodeBench](src/inference_endpoint/evaluation/livecodebench/README.md) documentation for -details and explanations. +### Benchmark Modes -## ๐Ÿšง Pending Features +- **Offline** (`max_throughput`): Burst all queries at once for peak throughput measurement +- **Online** (`poisson`): Fixed QPS with Poisson arrival distribution for latency profiling +- **Concurrency**: Fixed concurrent request count -The following features are planned for future releases: +### Performance Design -- [ ] **Submission Ruleset Integration** - Full MLPerf submission workflow support -- [ ] **Documentation Generation and Hosting** - Sphinx-based API documentation with GitHub Pages +The hot path is optimized for minimal overhead: -## ๐Ÿค Contributing +- Multi-process workers with ZMQ IPC (not threads) +- `uvloop` + `eager_task_factory` for async performance +- `msgspec` for zero-copy serialization on the data path +- Custom HTTP connection pooling with `httptools` parser +- CPU affinity support for performance tuning -We welcome contributions! 
Please see our [Development Guide](docs/DEVELOPMENT.md) for details on: - -- Setting up your development environment -- Code style and quality standards -- Testing requirements -- Pull request process - -## 🙏 Acknowledgements +## Accuracy Evaluation -This project draws inspiration from and learns from the following excellent projects: +Run accuracy evaluation with Pass@1 scoring using pre-defined benchmarks: -- [MLCommons Inference](https://github.com/mlcommons/inference) - MLPerf Inference benchmark suite -- [AIPerf](https://github.com/ai-dynamo/aiperf) - AI model performance profiling framework -- [SGLang GenAI-Bench](https://github.com/sgl-project/genai-bench) - Token-level performance evaluation tool -- [vLLM Benchmarks](https://github.com/vllm-project/vllm/tree/main/benchmarks) - Performance benchmarking tools for vLLM -- [InferenceMAX](https://github.com/InferenceMAX/InferenceMAX) - LLM inference optimization toolkit +- **GPQA** (default: GPQA Diamond) +- **AIME** (default: AIME 2025) +- **LiveCodeBench** (default: lite, release_v6) — requires [additional setup](src/inference_endpoint/dataset_manager/predefined/livecodebench/README.md) -We are grateful to these communities for their contributions to LLM benchmarking and performance analysis. 
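For context on the Pass@1 scoring above: a common way to compute it is the unbiased pass@k estimator, `1 - C(n - c, k) / C(n, k)`, over `n` generated samples of which `c` are correct. A minimal sketch (illustrative only; the function name is hypothetical and this is not the repository's actual scoring code):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator over n samples with c correct.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        # Every k-subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Pass@1 reduces to the fraction of correct samples, c / n.
print(pass_at_k(4, 2, 1))  # 0.5
print(pass_at_k(1, 1, 1))  # 1.0
```

With a single sample per problem (n = 1, k = 1), this is simply "did the one answer pass", averaged over problems.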
+## Documentation -## 📄 License +| Guide | Description | +| -------------------------------------------------------------- | ------------------------------------- | +| [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) | Command-line interface guide | +| [CLI Design](docs/CLI_DESIGN.md) | CLI architecture and design decisions | +| [Local Testing](docs/LOCAL_TESTING.md) | Test with the echo server | +| [Client Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) | Endpoint client optimization | +| [Performance Architecture](docs/PERF_ARCHITECTURE.md) | Performance architecture deep dive | +| [Development Guide](docs/DEVELOPMENT.md) | Development setup and workflow | +| [CONTRIBUTING.md](CONTRIBUTING.md) | How to contribute | -This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE.md) file for -details. +## Contributing -## 🔗 Links +We welcome contributions from the community. See [CONTRIBUTING.md](CONTRIBUTING.md) for: -- [MLCommons](https://mlcommons.org/) - Machine Learning Performance Standards -- [Project Repository](https://github.com/mlcommons/endpoints) -- [MLPerf Inference](https://mlcommons.org/benchmarks/inference/) +- Development setup and prerequisites +- Code style (ruff, mypy, conventional commits) +- Testing requirements (>90% coverage, pytest markers) +- Pull request process and review expectations -## 👥 Contributors +Issues are tracked on our [project board](https://github.com/orgs/mlcommons/projects/57). Look for [`good first issue`](https://github.com/mlcommons/endpoints/labels/good%20first%20issue) or [`help wanted`](https://github.com/mlcommons/endpoints/labels/help%20wanted) to get started. -Credits to core contributors of the project: +## Acknowledgements -- MLCommons Committee -- NVIDIA: Zhihan Jiang, Rashid Kaleem, Viraat Chandra, Alice Cheng -- ... +This project draws inspiration from: -See [ATTRIBUTION](ATTRIBUTION) for detailed attribution information. 
+- [MLCommons Inference](https://github.com/mlcommons/inference) — MLPerf Inference benchmark suite +- [AIPerf](https://github.com/ai-dynamo/aiperf) — AI model performance profiling +- [SGLang GenAI-Bench](https://github.com/sgl-project/genai-bench) — Token-level performance evaluation +- [vLLM Benchmarks](https://github.com/vllm-project/vllm/tree/main/benchmarks) — Performance benchmarking for vLLM -## 📞 Support +## License -- **Issues**: [GitHub Issues](https://github.com/mlcommons/endpoints/issues) -- **Discussions**: [GitHub Discussions](https://github.com/mlcommons/endpoints/discussions) -- **Documentation**: See [docs/](docs/) directory for guides +Apache License 2.0 — see [LICENSE](LICENSE) for details. diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md index af32da1d..e4e2d3de 100644 --- a/docs/DEVELOPMENT.md +++ b/docs/DEVELOPMENT.md @@ -1,15 +1,14 @@ # Development Guide -This guide provides everything you need to contribute to the MLPerf Inference Endpoint Benchmarking System. +This guide covers the development setup and workflow for the MLPerf Inference Endpoint Benchmarking System. For contribution guidelines, see [CONTRIBUTING.md](../CONTRIBUTING.md). ## Getting Started ### Prerequisites -- **Python**: 3.12+ (Python 3.12 is recommended for optimal performance) +- **Python**: 3.12+ (3.12 recommended) - **Git**: Latest version -- **Virtual Environment**: Python venv or conda -- **IDE**: VS Code, PyCharm, or your preferred editor +- **OS**: Linux or macOS (Windows is not supported) ### Development Environment Setup @@ -23,7 +22,7 @@ git remote add upstream https://github.com/mlcommons/endpoints.git # 3. Create virtual environment (Python 3.12+ required) python3.12 -m venv venv -source venv/bin/activate # On Windows: venv\Scripts\activate +source venv/bin/activate # 4. 
Install development dependencies pip install -e ".[dev,test]" @@ -61,8 +60,8 @@ endpoints/ ├── tests/ # Test suite │ ├── unit/ # Unit tests │ ├── integration/ # Integration tests -│ ├── performance/ # Performance tests -│ └── datasets/ # Test datasets +│ ├── performance/ # Performance benchmarks +│ └── datasets/ # Test data (dummy_1k.jsonl, squad_pruned/) ├── docs/ # Documentation ├── examples/ # Usage examples └── scripts/ # Utility scripts @@ -73,114 +72,89 @@ endpoints/ ### Running Tests ```bash -# Run all tests +# All tests (excludes slow/performance) pytest -# Run with coverage +# Unit tests only +pytest -m unit + +# Integration tests +pytest -m integration + +# Single file with verbose output +pytest -xvs tests/unit/path/to/test_file.py + +# With coverage pytest --cov=src --cov-report=html +``` -# Run specific test categories -pytest -m unit # Unit tests only -pytest -m integration # Integration tests only -pytest -m performance # Performance tests only (no timeout) +### Test Markers -# Run tests in parallel -pytest -n auto +Every test function **must** have a marker: -# Run tests with verbose output -pytest -v +```python import pytest -# Run specific test file -pytest tests/unit/test_core_types.py +@pytest.mark.unit +def test_something(): + ... -# Run with output to file (recommended) -pytest -v 2>&1 | tee test_results.log +@pytest.mark.unit +@pytest.mark.asyncio # strict mode is configured globally in pyproject.toml +async def test_async_something(): + ... 
``` -### Test Structure +Available markers: `unit`, `integration`, `slow`, `performance`, `run_explicitly` -- **Unit Tests** (`tests/unit/`): Test individual components in isolation -- **Integration Tests** (`tests/integration/`): Test component interactions with real servers -- **Performance Tests** (`tests/performance/`): Test performance characteristics (marked with @pytest.mark.performance, no timeout) -- **Test Datasets** (`tests/datasets/`): Sample datasets for testing (dummy_1k.jsonl, squad_pruned/) +### Key Fixtures -### Writing Tests +Defined in `tests/conftest.py` — use these instead of mocking: -```python -import pytest -from inference_endpoint.core.types import Query - -class TestQuery: - @pytest.mark.unit - def test_query_creation(self): - """Test creating a basic query.""" - query = Query(data={"prompt": "Test", "model": "test-model"}) - assert query.data["prompt"] == "Test" - assert query.data["model"] == "test-model" - - @pytest.mark.unit - @pytest.mark.asyncio(mode="strict") - async def test_async_operation(self): - """Test async operations.""" - # Your async test here - pass -``` +- `mock_http_echo_server` — real HTTP echo server on dynamic port +- `mock_http_oracle_server` — dataset-driven response server +- `dummy_dataset` — in-memory test dataset +- `events_db` — pre-populated SQLite events database + +### Coverage + +Target **>90% coverage** for all new code. ## Code Quality ### Pre-commit Hooks -The project uses pre-commit hooks to ensure code quality. 
- -Hooks that run automatically on commit: +All of these run automatically on commit: - trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements - `ruff` (lint + autofix) and `ruff-format` - `mypy` type checking - `prettier` for YAML/JSON/Markdown -- License header enforcement (Apache 2.0 SPDX header required on all Python files, added by `scripts/add_license_header.py`) +- License header enforcement +- YAML template validation and regeneration -**Always run `pre-commit run --all-files` before committing.** +**IMPORTANT: Always run `pre-commit run --all-files` before every commit.** Hooks may modify files. If files are modified, re-stage the changes and commit again. ```bash -# Install hooks (done during setup) -pre-commit install - -# Run all hooks on staged files -pre-commit run - -# Run all hooks on all files +# Run all hooks pre-commit run --all-files -``` - -### Code Formatting - -Configuration: `ruff` (line-length 88, target Python 3.12), `ruff-format` (double quotes, space indent). -```bash -# Format code with ruff -ruff format src/ tests/ - -# Check formatting without changing files -ruff format --check src/ tests/ +# Install hooks (done during setup) +pre-commit install ``` -### Linting - -```bash -# Run ruff linter -ruff check src/ tests/ +### Code Style -# Run mypy for type checking -mypy src/ - -# Run all quality checks -pre-commit run --all-files -``` +- **Formatter/Linter**: `ruff` (line-length 88, target Python 3.12) +- **Type checking**: `mypy` +- **Formatting**: `ruff-format` (double quotes, space indent) +- **License headers**: Required on all Python files (auto-added by pre-commit) +- **Commit messages**: [Conventional commits](https://www.conventionalcommits.org/) — `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:` +- **Comments**: Only where the _why_ isn't obvious from the code ## Development Workflow -### 1. 
Feature Development +### Feature Development ```bash # Sync your fork with upstream before starting @@ -189,88 +163,26 @@ git checkout main git merge upstream/main # Create a feature branch on your fork -git checkout -b feature/your-feature-name +git checkout -b feat/your-feature-name # Make changes and test pytest pre-commit run --all-files # Commit changes git add . git commit -m "feat: add your feature description" # Push to your fork and open a PR against mlcommons/endpoints -git push origin feature/your-feature-name +git push origin feat/your-feature-name ``` -### 2. Component Development - -When developing a new component: - -1. **Create the component directory** in `src/inference_endpoint/` -2. **Add `__init__.py`** with component description -3. **Implement the component** following the established patterns -4. **Add tests** in the corresponding `tests/unit/` directory -5. **Update main package** `__init__.py` if needed -6. **Add dependencies** to `pyproject.toml` under `[project.dependencies]` or `[project.optional-dependencies]` - -### 3. 
Testing Strategy - -- **Unit Tests**: >90% coverage required -- **Integration Tests**: Test component interactions -- **Performance Tests**: Ensure no performance regressions -- **Documentation**: Update docs for new features - -## Documentation - -### Writing Documentation - -- **Code Comments**: Add comments only where the _why_ is not obvious from the code; avoid restating what the code does -- **README Updates**: Update README.md for user-facing changes -- **Examples**: Provide usage examples for new features - -## Performance Considerations - -### Development Guidelines +### Branch Naming -- **Async First**: Use async/await for I/O operations -- **Memory Efficiency**: Minimize object creation in hot paths -- **Profiling**: Use pytest-benchmark for performance testing -- **Monitoring**: Add performance metrics for critical operations - -### Performance Testing - -```bash -# Run performance tests -pytest -m performance - -# Run benchmarks -pytest --benchmark-only - -# Compare with previous runs -pytest --benchmark-compare ``` - -## Debugging - -### Common Issues - -1. **Import Errors**: Ensure `src/` is in Python path -2. **Test Failures**: Check test data and mock objects -3. **Performance Issues**: Use profiling tools to identify bottlenecks -4. **Async Issues**: Ensure proper event loop handling - -### Debug Tools - -```bash -# Run with debug logging -inference-endpoint --verbose - -# Run tests with debug output -pytest -s -v - -# Use Python debugger -python -m pdb -m pytest test_file.py +feat/short-description +fix/short-description +docs/short-description ``` ## YAML Config Templates @@ -297,89 +209,37 @@ Add dependencies to `pyproject.toml` (always pin to exact versions with `==`): - **Runtime dependencies**: `[project.dependencies]` - **Optional groups** (dev, test, etc.): `[project.optional-dependencies]` -Install after updating: +After adding a dependency, run `pip-audit` (included in `dev` extras) to verify it has no known vulnerabilities. 
```bash pip install -e ".[dev,test]" ``` -## Troubleshooting - -### Common Problems - -**Pre-commit hooks failing:** - -```bash -# Update pre-commit -pre-commit autoupdate - -# Skip hooks temporarily -git commit --no-verify -``` - -**Tests failing:** +## Performance Considerations -```bash -# Clear Python cache -find . -type d -name "__pycache__" -delete -find . -type f -name "*.pyc" -delete +Code in `load_generator/`, `endpoint_client/worker.py`, and `async_utils/transport/` is latency-critical. In these paths: -# Reinstall package -pip install -e . -``` +- No `match` statements — use dict dispatch +- Use `dataclass(slots=True)` or `msgspec.Struct` for frequently instantiated classes +- Minimize async suspends +- Use `msgspec` over `json`/`pydantic` for serialization +- The HTTP client uses custom `ConnectionPool` with `httptools` parser — not `aiohttp`/`requests` -**Import errors:** +## Debugging ```bash -# Check Python path -python -c "import sys; print(sys.path)" - -# Ensure src is in path -export PYTHONPATH="${PYTHONPATH}:$(pwd)/src" -``` - -## Contributing Guidelines +# Run with verbose logging +inference-endpoint -v benchmark offline ... -### Pull Request Process +# Run tests with stdout visible +pytest -xvs tests/unit/path/to/test.py -1. **Fork** `mlcommons/endpoints` on GitHub -2. **Clone your fork** and add `upstream` as a remote (see [Development Environment Setup](#development-environment-setup)) -3. **Sync with upstream** (`git fetch upstream && git merge upstream/main`) before starting work -4. **Create a feature branch** on your fork (`git checkout -b feature/your-feature-name`) -5. **Make your changes** following the coding standards -6. **Add tests** for new functionality -7. **Update documentation** as needed -8. **Run all checks** locally: `pytest` and `pre-commit run --all-files` -9. **Push to your fork** and open a PR against `mlcommons/endpoints:main` -10. 
**Address review comments** promptly - -### Commit Message Format - -Use conventional commit format: - -``` -type(scope): description - -feat(core): add query lifecycle management -fix(api): resolve endpoint connection issue -docs(readme): update installation instructions -test(loadgen): add performance benchmarks +# Use Python debugger +python -m pdb -m pytest tests/unit/path/to/test.py ``` -Allowed types: `feat`, `fix`, `docs`, `test`, `chore`, `refactor`, `perf`, `ci`. - -### Code Review Checklist - -- [ ] Code follows style guidelines -- [ ] Tests pass and coverage is adequate -- [ ] Documentation is updated -- [ ] Performance impact is considered -- [ ] Security implications are reviewed -- [ ] Error handling is appropriate - ## Getting Help - **Issues**: [GitHub Issues](https://github.com/mlcommons/endpoints/issues) -- **Discussions**: [GitHub Discussions](https://github.com/mlcommons/endpoints/discussions) -- **Documentation**: Check this guide and project docs -- **Team**: Reach out to the development team +- **Project Board**: [Q2 Board](https://github.com/orgs/mlcommons/projects/57) +- **Documentation**: See [docs/](.) directory for guides diff --git a/pyproject.toml b/pyproject.toml index 19fa129d..67dfc865 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -47,7 +47,7 @@ dependencies = [ "transformers==5.4.0", "numpy==2.4.4", "datasets==4.8.4", - "Pillow==12.1.1", + "Pillow==12.2.0", "sentencepiece==0.2.1", "protobuf==7.34.1", "openai_harmony==0.0.8", @@ -82,7 +82,7 @@ test = [ # Includes optional dependencies for full test coverage "inference-endpoint[sql]", # Testing framework - "pytest==9.0.2", + "pytest==9.0.3", "pytest-asyncio==1.3.0", "pytest-cov==7.1.0", "pytest-benchmark==5.2.3",