diff --git a/.cursor/rules/endpoint-rules.mdc b/.cursor/rules/endpoint-rules.mdc deleted file mode 100644 index aff2d460..00000000 --- a/.cursor/rules/endpoint-rules.mdc +++ /dev/null @@ -1,118 +0,0 @@ ---- -description: -globs: -alwaysApply: true ---- -# Cursor Rules for Python Project Development - -## Core Development Principles - -### 1. Planning-First Development -- **Strict Separation**: Implementation MUST NOT begin until planning for the current step is complete -- All architectural decisions, component interfaces, and implementation approaches must be documented before coding -- Each development cycle follows: Plan → Review Plan → Implement → Update Documentation - -### 2. Testing Requirements -- **Mandatory Unit Tests**: Every new component that requires testing MUST have corresponding unit tests -- **Pre-commit Validation**: All unit tests and pre-commit checks MUST pass before pushing to main repository -- **No Exceptions**: Failed tests or checks block all commits until resolved - -### 3. Scratchpad Documentation System -All planning and tracking must be maintained in `.cursor_artifacts/` directory. 
- -#### Required Files: -- `.cursor_artifacts/hierarchy.md` - Project folder structure, module organization, and architectural overview -- `.cursor_artifacts/progress.md` - Current status, completed tasks, next steps, and milestone tracking -- `.cursor_artifacts/learning.md` - Technical insights, lessons learned, design decisions, and gotchas -- `.cursor_artifacts/design.md` - System design, component interfaces, data models, and API specifications -- `.cursor_artifacts/testing-strategy.md` - Test plans, coverage requirements, and testing approaches -- `.cursor_artifacts/deployment.md` - Deployment procedures, environment configs, and release notes -- `.cursor_artifacts/refactoring-log.md` - Planned and completed refactoring activities with justifications, keep empty if there's no major refactoring - -#### File Management: -- **Size Limit**: Each scratchpad file MUST NOT exceed 1000 lines -- **Regular Maintenance**: Split large files into focused sub-documents when approaching limit -- **Consistent Updates**: Update relevant scratchpad files after each implementation phase - -### 4. Commit and Review Standards -- **Post-Implementation Updates**: Always update `.cursor_artifacts/` scratchpad files after each implementation -- **Small, Focused Changes**: Keep commits and reviews reasonably sized for effective review -- **Clear Commit Messages**: Use conventional commit format with clear descriptions -- **Documentation Sync**: Ensure documentation reflects current implementation state - -### 5. Python Best Practices -- Follow PEP 8 style guidelines and modern Python idioms -- Use type hints for all function signatures and complex variables -- Implement proper error handling with specific exception types -- Apply SOLID principles and clean code practices -- Use dataclasses, context managers, and pathlib where appropriate -- Follow async/await patterns for asynchronous code -- Implement proper logging instead of print statements - -### 6. 
Change Control and Approval -#### Automatic Approval (Small Changes): -- Bug fixes within existing functionality -- Adding unit tests -- Documentation updates -- Minor refactoring within single functions/methods -- Code formatting and style improvements - -#### User Approval Required (Significant Changes): -- **Major Refactoring**: Restructuring classes, modules, or architectural changes -- **API Changes**: Modifying public interfaces or breaking changes -- **Large Deletions**: Removing significant portions of existing code, documentation, or scratchpad content -- **New Dependencies**: Adding external libraries or changing build requirements -- **Database Schema Changes**: Migrations or structural data changes - -#### Approval Process: -1. Document proposed changes in appropriate `.cursor_artifacts/` file -2. Clearly outline impact, benefits, and risks -3. Request explicit user approval before implementation -4. Provide rollback plan for significant changes - -### 7. Comprehensive Testing Strategy -- **Test Coverage**: Aim for >90% code coverage for business logic -- **Test Types**: Unit tests, integration tests, and end-to-end tests as appropriate -- **Edge Cases**: Test boundary conditions, error scenarios, and edge cases -- **Test Documentation**: Clear test descriptions explaining what is being tested and why -- **Mock Strategy**: Use appropriate mocking for external dependencies -- **Performance Tests**: Include performance benchmarks for critical paths -- **Test Data**: Use factories or fixtures for consistent test data setup - -### 8. 
Additional Development Standards - -#### Code Quality: -- Use static analysis tools (pylint, mypy, black, isort) -- Implement pre-commit hooks for automated quality checks -- Regular code reviews focusing on maintainability and performance -- Document complex algorithms and business logic - -#### Version Control: -- Use feature branches for all development work -- Squash commits when merging to maintain clean history -- Tag releases with semantic versioning -- Maintain changelog with user-facing changes - -#### Security and Performance: -- Validate all user inputs and sanitize outputs -- Use secure coding practices (no hardcoded secrets, proper authentication) -- Profile performance-critical code sections -- Monitor and log security-relevant events - -#### Dependencies and Environment: -- Pin dependency versions in requirements files -- Use virtual environments for all development work -- Document environment setup and deployment procedures -- Regular dependency updates with testing - -## Enforcement -These rules are mandatory for all development work. Violations should be caught in pre-commit hooks, code review, or the CI/CD pipeline. Any rule exceptions require explicit documentation and user approval. - -## Other user-defined rules -- Always double-check the validity of the output; never hallucinate or make claims about things you don't know. -- Avoid refactoring the whole project, and always ask for permission before doing a major refactor. -- Look for clues and never be lazy about validating the facts. -- Be diligent in checking whether a component has already been implemented and can be reused. Avoid re-implementing wheels for parts that have already been built in the project. Think twice about whether the reused components actually fit the logic. If necessary, always use a single source of truth in the code repo (e.g. VERSION) instead of randomly hardcoding it everywhere in the code. -- If the logic is incomplete in the code, add a comment about it. 
Don't just assume the user will dig and find it out. -- Follow the best practices of whatever language you are writing in. For example, in Python, don't use a lazy import unless it has been carefully thought through. -- When running pytest, make sure you pipe the output to the command line or to a file, so you don't need to rerun it repeatedly to grep for a failed test. diff --git a/.cursor/rules/msgspec-patterns.mdc b/.cursor/rules/msgspec-patterns.mdc deleted file mode 100644 index fa637ea9..00000000 --- a/.cursor/rules/msgspec-patterns.mdc +++ /dev/null @@ -1,534 +0,0 @@ ---- -description: python performance critical code ; python msgspec usage guide -alwaysApply: false ---- -## 2. Use Structs for Structured Data - -**Rule:** Always prefer `msgspec.Struct` over `dict`, `dataclasses`, or `attrs` for structured data with a known schema. - -**Why:** Structs are 5-60x faster for common operations and are optimized for encoding/decoding. - -```python -# BAD: Using dict or dataclass -from dataclasses import dataclass - -@dataclass -class UserBad: - name: str - email: str - age: int - -# GOOD: Using msgspec.Struct -import msgspec - -class User(msgspec.Struct): - name: str - email: str - age: int - -# Usage -user = User(name="alice", email="alice@example.com", age=30) -data = msgspec.json.encode(user) -decoded = msgspec.json.decode(data, type=User) -``` - ---- - -## 3. Omit Default Values - -**Rule:** Set `omit_defaults=True` on Struct definitions when default values are known on both encoding and decoding ends. - -**Why:** Reduces encoded message size and improves both encoding and decoding performance. - -```python -# BAD: Encoding all fields including defaults -class ConfigBad(msgspec.Struct): - host: str = "localhost" - port: int = 8080 - debug: bool = False - timeout: int = 30 - -# GOOD: Omit default values -class Config(msgspec.Struct, omit_defaults=True): - host: str = "localhost" - port: int = 8080 - debug: bool = False - timeout: int = 30 - -# Only non-default values are encoded -config = Config(host="production.example.com") -data = msgspec.json.encode(config) -# Result: b'{"host":"production.example.com"}' instead of the full object -``` - ---- - -## 4. Avoid Decoding Unused Fields - -**Rule:** Define smaller "view" Struct types that only contain the fields you actually need. - -**Why:** msgspec skips decoding fields not defined in your Struct, reducing allocations and CPU time. - -```python -# BAD: Decoding the entire large object when you only need a few fields -class FullTweet(msgspec.Struct): - id: int - id_str: str - full_text: str - user: dict - entities: dict - extended_entities: dict - retweet_count: int - favorite_count: int - # ... many more fields - -# GOOD: Define minimal structs for your use case -class User(msgspec.Struct): - name: str - -class TweetView(msgspec.Struct): - user: User - full_text: str - favorite_count: int - -# Only these 3 fields are decoded, the rest is skipped -tweet = msgspec.json.decode(large_json_response, type=TweetView) -print(tweet.user.name) # Access only what you need -``` - ---- - -## 5. Use encode_into for Buffer Reuse - -**Rule:** In hot loops, benchmark `Encoder.encode_into()` with a pre-allocated `bytearray` against `encode()`, and use it where it measures faster. - -**Why:** Avoids allocating a new `bytes` object for each encode operation. - -```python -# BAD: New bytes object allocated for each message -def send_messages_bad(socket, msgs): - encoder = msgspec.msgpack.Encoder() - for msg in msgs: - data = encoder.encode(msg) # New bytes object each time - socket.sendall(data) - -# POSSIBLY-GOOD, ALWAYS MEASURE: Reuse a buffer -def send_messages_good(socket, msgs): - encoder = msgspec.msgpack.Encoder() - buffer = bytearray(1024) # Pre-allocate once - - for msg in msgs: - n = encoder.encode_into(msg, buffer) # Reuse buffer - socket.sendall(memoryview(buffer)[:n]) # Send only the encoded bytes -``` - ---- - -## 6. Line-Delimited JSON (NDJSON) - -**Rule:** Benchmark `encode_into()` with a reused buffer (optionally together with `buffer.extend()`) for line-delimited JSON to avoid copies. - -**Why:** Avoids unnecessary copying when appending newlines to JSON messages. - -```python -# BAD: Unnecessary copy with bytes concatenation -def write_ndjson_bad(file, messages): - for msg in messages: - json_msg = msgspec.json.encode(msg) - full_payload = json_msg + b'\n' # Creates a copy - file.write(full_payload) - -# POSSIBLY-GOOD, ALWAYS MEASURE: Zero-copy with encode_into -def write_ndjson_good(file, messages): - encoder = msgspec.json.Encoder() - buffer = bytearray(64) # Pre-allocate with a reasonable size - - for msg in messages: - n = encoder.encode_into(msg, buffer) - file.write(memoryview(buffer)[:n]) # Write only the encoded bytes - file.write(b"\n") -``` - ---- - -## 7. Length-Prefix Framing - -**Rule:** Use `encode_into()` with an offset for length-prefix framing. - -**Why:** Efficiently prepends the message length without extra copies. - -```python -import msgspec - -def send_length_prefixed(socket, msg): - encoder = msgspec.msgpack.Encoder() - buffer = bytearray(64) - - # Encode into buffer, leaving 4 bytes at front for length prefix - n = encoder.encode_into(msg, buffer, 4) - - # Write message length as 4-byte big-endian integer at the start - buffer[:4] = n.to_bytes(4, "big") - - socket.sendall(memoryview(buffer)[:4 + n]) - -async def prefixed_send(stream, buffer: bytes) -> None: - """Write a length-prefixed buffer to an async stream""" - prefix = len(buffer).to_bytes(4, "big") - stream.write(prefix) - stream.write(buffer) - await stream.drain() - -async def prefixed_recv(stream) -> bytes: - """Read a length-prefixed buffer from an async stream""" - prefix = await stream.readexactly(4) - n = int.from_bytes(prefix, "big") - return await stream.readexactly(n) -``` - ---- - -## 8. Use MessagePack Instead of JSON - -**Rule:** Consider using `msgspec.msgpack` instead of `msgspec.json` for internal APIs. - -**Why:** MessagePack is a more compact binary format and can be more performant than JSON. - -```python -import msgspec - -class Event(msgspec.Struct): - type: str - data: dict - timestamp: float - -# Use MessagePack for internal service communication -encoder = msgspec.msgpack.Encoder() -decoder = msgspec.msgpack.Decoder(Event) - -event = Event(type="user_login", data={"user_id": 123}, timestamp=1703424000.0) -packed = encoder.encode(event) # More compact than JSON -decoded = decoder.decode(packed) -``` - ---- - -## 9. Use gc=False for Long-Lived Objects - -**Rule:** Set `gc=False` on Struct types that will never participate in reference cycles and are long-lived. - -**Why:** Reduces garbage collector overhead and pause times by up to 75x. - -### What is gc=False? - -The `gc=False` option tells Python's garbage collector to never track instances of that Struct type. -By default, Python's cyclic garbage collector tracks objects that could potentially participate in reference cycles. 
-When you set `gc=False`, you're telling msgspec: "I guarantee these objects will never be part of a reference cycle, so don't bother tracking them." - -### Performance Impact - -Key takeaways: -- `gc=False` reduces GC pause time by 75x compared to standard classes -- `gc=False` saves 16 bytes per instance (no GC header needed) -- Regular msgspec structs are already 6x faster for GC than standard classes - -### When to Use gc=False - -Use `gc=False` when: -- You're allocating a large number of Struct objects at once (e.g., decoding a large JSON response with thousands of items) -- You have long-lived Struct objects in memory (e.g., a large cache of data objects) -- Your Struct only contains scalar/primitive values (ints, floats, strings, bools, bytes) -- You are 100% certain the Struct will NEVER participate in a reference cycle - -DO NOT use `gc=False` when: -- Your Struct contains references to itself or other Structs (potential cycles) -- Your Struct is part of a parent-child relationship where parent references child and child references parent -- You're unsure whether cycles could occur - -ALWAYS MEASURE performance impact. - -### Decision Tree: Should I Use gc=False? - -``` -Should I use gc=False? -| -+-- Does your Struct only contain scalar types (int, float, str, bool, bytes)? -| +-- YES --> SAFE to use gc=False -| -+-- Does your Struct contain lists/dicts but YOU control what goes in them? -| +-- Will you EVER put the struct itself (or a parent) into those containers? -| +-- NO --> Probably safe, but test carefully -| +-- YES/MAYBE --> Do NOT use gc=False -| -+-- Does your Struct have a reference to another Struct of the same type? -| +-- YES --> Do NOT use gc=False (e.g., tree nodes, linked lists) -| -+-- Is your Struct part of a parent-child bidirectional relationship? 
-| +-- YES --> Do NOT use gc=False -| -+-- When in doubt --> Do NOT use gc=False -``` - -### Examples - -```python -# SAFE: Simple data objects with only scalar values -class Point(msgspec.Struct, gc=False): - x: float - y: float - z: float - -class LogEntry(msgspec.Struct, gc=False): - timestamp: float - level: str - message: str - source: str - -class CacheEntry(msgspec.Struct, gc=False): - key: str - value: str - ttl: int - created_at: float - -# SAFE: Structs containing only tuples of scalars -class Package(msgspec.Struct, gc=False): - name: str - version: str - depends: tuple[str, ...] # immutable tuple of strings - size: int - -# UNSAFE: Self-referential structures - DO NOT use gc=False -class TreeNode(msgspec.Struct): # NO gc=False here! - value: int - children: list["TreeNode"] - parent: "TreeNode | None" = None -``` - -### Real-World Example: Decoding Large JSON - -```python -import msgspec -from typing import Union - -# When decoding large JSON files (like package repositories), -# gc=False significantly improves performance -class Package(msgspec.Struct, gc=False): - build: str - build_number: int - depends: tuple[str, ...] # Use tuple, not list - immutable - md5: str - name: str - sha256: str - subdir: str - version: str - license: str = "" - noarch: Union[str, bool, None] = None - size: int = 0 - timestamp: int = 0 - -class RepoData(msgspec.Struct, gc=False): - repodata_version: int - info: dict - packages: dict[str, Package] - removed: tuple[str, ...] # Use tuple, not list - -# Create a typed decoder for maximum performance -decoder = msgspec.json.Decoder(RepoData) - -def load_repo_data(path: str) -> RepoData: - with open(path, "rb") as f: - return decoder.decode(f.read()) -``` - -## 10. Use array_like=True for Maximum Performance - -**Rule:** Set `array_like=True` when both ends know the field schema and you need maximum performance. - -**Why:** Encodes structs as arrays instead of objects, removing field names from the message. 
- -```python -# Standard encoding includes field names -class PointStandard(msgspec.Struct): - x: float - y: float - z: float - -# Encodes as: b'{"x":1.0,"y":2.0,"z":3.0}' - -# Array-like encoding removes field names -class Point(msgspec.Struct, array_like=True): - x: float - y: float - z: float - -point = Point(1.0, 2.0, 3.0) -data = msgspec.json.encode(point) -# Result: b'[1.0,2.0,3.0]' - smaller and faster - -decoded = msgspec.json.decode(data, type=Point) -# Works correctly: Point(x=1.0, y=2.0, z=3.0) -``` - ---- - -## 11. Tagged Unions for Polymorphic Types - -**Rule:** Use `tag=True` on Struct types when handling multiple message types in a single union. - -**Why:** Enables efficient discrimination between types during decoding. - -```python -import msgspec - -# Define request types with tagging -class GetRequest(msgspec.Struct, tag=True): - key: str - -class PutRequest(msgspec.Struct, tag=True): - key: str - value: str - -class DeleteRequest(msgspec.Struct, tag=True): - key: str - -class ListRequest(msgspec.Struct, tag=True): - prefix: str = "" - -# Union type for all requests -Request = GetRequest | PutRequest | DeleteRequest | ListRequest - -# Single decoder handles all types -decoder = msgspec.msgpack.Decoder(Request) - -# Decoding automatically determines the correct type -data = msgspec.msgpack.encode(PutRequest(key="foo", value="bar")) -request = decoder.decode(data) - -match request: - case GetRequest(key): - print(f"Get: {key}") - case PutRequest(key, value): - print(f"Put: {key}={value}") - case DeleteRequest(key): - print(f"Delete: {key}") - case ListRequest(prefix): - print(f"List: {prefix}") -``` - ---- - -## 12. Use Struct Configuration Options - -**Rule:** Combine Struct options for cleaner, more robust code. - -```python -import msgspec - -class Base( - msgspec.Struct, - omit_defaults=True, # Don't encode default values - forbid_unknown_fields=True, # Error on unknown fields (good for config files) - rename="kebab", # Use kebab-case in JSON (my_field -> my-field) -): - """Base class with common configuration.""" - pass - -class ServerConfig(Base): - host: str = "localhost" - port: int = 8080 - max_connections: int = 100 - enable_ssl: bool = False - -# Decodes kebab-case JSON: {"host": "prod", "max-connections": 500} -config = msgspec.json.decode( - b'{"host":"prod","max-connections": 500}', - type=ServerConfig -) -# Result: ServerConfig(host='prod', port=8080, max_connections=500, enable_ssl=False) -``` - ---- - -## 13. TOML Configuration Files - -**Rule:** Use msgspec for parsing pyproject.toml and other TOML config files with validation. - -```python -import msgspec -from typing import Any - -class BuildSystem(msgspec.Struct, omit_defaults=True, rename="kebab"): - requires: list[str] = [] - build_backend: str | None = None - -class Project(msgspec.Struct, omit_defaults=True, rename="kebab"): - name: str | None = None - version: str | None = None - description: str | None = None - requires_python: str | None = None - dependencies: list[str] = [] - -class PyProject(msgspec.Struct, omit_defaults=True, rename="kebab"): - build_system: BuildSystem | None = None - project: Project | None = None - tool: dict[str, dict[str, Any]] = {} - -def load_pyproject(path: str) -> PyProject: - with open(path, "rb") as f: - return msgspec.toml.decode(f.read(), type=PyProject) -``` - -## Common Patterns - -### API Response Handler - -```python -import msgspec -from typing import TypeVar, Generic - -T = TypeVar('T') - -class APIResponse(msgspec.Struct, Generic[T], omit_defaults=True): - data: T | None = None - error: str | None = None - status: int = 200 - -class User(msgspec.Struct): - id: int - name: str - email: str - -# Create typed decoder for specific response type -user_response_decoder = msgspec.json.Decoder(APIResponse[User]) - -def parse_user_response(raw: bytes) -> APIResponse[User]: - return user_response_decoder.decode(raw) -``` - -## Struct Configuration Options Summary - -| Option | Description | Default | -|--------|-------------|---------| -| `omit_defaults` | Omit fields with default values when encoding | `False` | -| `forbid_unknown_fields` | Error on unknown fields when decoding | `False` | -| `frozen` | Make instances immutable and hashable | `False` | -| `order` | Generate ordering methods (`__lt__`, etc.) | `False` | -| `eq` | Generate equality methods | `True` | -| `kw_only` | Make all fields keyword-only | `False` | -| `tag` | Enable tagged union support | `None` | -| `tag_field` | Field name for the tag | `"type"` | -| `rename` | Rename fields for encoding/decoding | `None` | -| `array_like` | Encode/decode as arrays instead of objects | `False` | -| `gc` | Enable garbage collector tracking | `True` | -| `weakref` | Enable weak reference support | `False` | -| `dict` | Add `__dict__` attribute | `False` | -| `cache_hash` | Cache the hash value | `False` | - ---- - -## References - -- Official Documentation: https://jcristharif.com/msgspec/ -- Performance Tips: https://jcristharif.com/msgspec/perf-tips.html -- Structs Documentation: https://jcristharif.com/msgspec/structs.html -- GC Configuration: https://jcristharif.com/msgspec/structs.html#struct-gc diff --git a/.cursor/rules/python-antipatterns.mdc b/.cursor/rules/python-antipatterns.mdc deleted file mode 100644 index ece51ff2..00000000 --- a/.cursor/rules/python-antipatterns.mdc +++ /dev/null @@ -1,658 +0,0 @@ ---- -globs: **/*.py -alwaysApply: false ---- - -Try to avoid these performance antipatterns in the Python code you write (imports are elided in the snippets below): - -*** - -### 1. 
**Match statements (sequence)** -- **Slow** -```python -def sequence_match_logical(): - seq = ["๐Ÿธ", "๐Ÿ›", "๐Ÿฆ‹", "๐Ÿชฒ"] - frogs = 0 - for _ in range(100_000): - if isinstance(seq, Sequence) and len(seq) > 0 and seq[0] == "๐Ÿธ": - frogs += 1 -``` -- **Fast** -```python -def sequence_match_statement(): - seq = ["๐Ÿธ", "๐Ÿ›", "๐Ÿฆ‹", "๐Ÿชฒ"] - frogs = 0 - for _ in range(100_000): - match seq: - case ["๐Ÿธ", *_]: frogs += 1 -``` - -*** - -### 2. **Match statements (literal)** -- **Slow** -```python -def literal_match_logical(): - seq = ["๐ŸŠ", "๐Ÿ›", "๐Ÿˆ", "๐Ÿฆ‹", "๐Ÿชฒ", "๐Ÿณ"] - butterflies, caterpillars, beetles = 0, 0, 0 - for _ in range(100_000): - for x in seq: - if x == "๐Ÿฆ‹": - butterflies += 1 - elif x == "๐Ÿ›": - caterpillars += 1 - elif x == "๐Ÿชฒ": - beetles += 1 -``` -- **Fast** -```python -def literal_match_statement(): - seq = ["๐ŸŠ", "๐Ÿ›", "๐Ÿˆ", "๐Ÿฆ‹", "๐Ÿชฒ", "๐Ÿณ"] - butterflies, caterpillars, beetles = 0, 0, 0 - for _ in range(100_000): - for x in seq: - match x: - case "๐Ÿฆ‹": butterflies += 1 - case "๐Ÿ›": caterpillars += 1 - case "๐Ÿชฒ": beetles += 1 -``` - -*** - -### 3. **Match statements (mapping)** -- **Slow** -```python -def mapping_match_logical(): - boats = [ - {"๐Ÿ“": 1}, {"๐ŸฆŠ": 1, "๐ŸŒฝ": 1}, - {"๐Ÿ“": 1, "๐ŸŒฝ": 1}, {"๐Ÿ“": 1, "๐ŸฆŠ": 1}, - ] - problems = valid_boats = 0 - for _ in range(100_000): - for boat in boats: - if isinstance(boat, Mapping): - if "๐Ÿ“" in boat and "๐ŸŒฝ" in boat: - problems += 1 - elif "๐Ÿ“" in boat and "๐ŸฆŠ" in boat: - problems += 1 - else: - valid_boats += 1 -``` -- **Fast** -```python -def mapping_match_statement(): - boats = [ - {"๐Ÿ“": 1}, {"๐ŸฆŠ": 1, "๐ŸŒฝ": 1}, - {"๐Ÿ“": 1, "๐ŸŒฝ": 1}, {"๐Ÿ“": 1, "๐ŸฆŠ": 1}, - ] - problems = valid_boats = 0 - for _ in range(100_000): - for boat in boats: - match boat: - case {"๐Ÿ“": _, "๐ŸŒฝ": _}: problems += 1 - case {"๐Ÿ“": _, "๐ŸฆŠ": _}: problems += 1 - case _: valid_boats += 1 -``` - -*** - -### 4. 
**Match statements (classes)** -- **Slow** -```python -def bench_class_matching_logical(): - drivers = [ - Driver(name="Max Verstappen", team="Red Bull"), - Driver(name="Sergio Perez", team="Red Bull"), - Driver(name="Charles Leclerc", team="Ferrari"), - Driver(name="Lewis Hamilton", team="Mercedes"), - ] - for _ in range(100_000): - for driver in drivers: - if not isinstance(driver, Driver): - desc = "Invalid request" - elif driver.name == "Max Verstappen": - desc = "Max Verstappen, the current world #1" - elif driver.team == "Ferrari": - desc = f"{driver.name}, a Ferrari driver!! ๐ŸŽ" - else: - desc = f"{driver.name}, a {driver.team} driver." -``` -- **Fast** -```python -def bench_class_matching_statement(): - drivers = [ - Driver(name="Max Verstappen", team="Red Bull"), - Driver(name="Sergio Perez", team="Red Bull"), - Driver(name="Charles Leclerc", team="Ferrari"), - Driver(name="Lewis Hamilton", team="Mercedes"), - ] - for _ in range(100_000): - for driver in drivers: - match driver: - case Driver(name="Max Verstappen"): desc = "Max Verstappen, the current world #1" - case Driver(name=name, team="Ferrari"): desc = f"{name}, a Ferrari driver!! ๐ŸŽ" - case Driver(name=name, team=team): desc = f"{name}, a {team} driver." - case _: desc = "Invalid request" -``` - -*** - -### 5. **Inline globals in loop** -- **Slow** -```python -def global_constant_in_loop(): - total = MY_GLOBAL_CONSTANT_A - for i in range(10_000): - total += i * MY_GLOBAL_CONSTANT_C -``` -- **Fast** -```python -def local_constant_in_loop(): - total = 3.14 - for i in range(10_000): - total += i * 1234 -``` - -*** - -### 6. 
**GC with higher threshold** -- **Slow** -```python -def load_with_gc(): - t1, t2, t3 = gc.get_threshold() - gc.set_threshold(10, 10, 10) - for _ in range(100_000): - _cyclic_references() - gc.set_threshold(t1, t2, t3) -``` -- **Fast** -```python -def load_gc_at_end(): - t1, t2, t3 = gc.get_threshold() - gc.set_threshold(1000, 20, 20) - for _ in range(100_000): - _cyclic_references() - gc.set_threshold(t1, t2, t3) -``` - -*** - -### 7. **Importing specific name instead of namespace** -- **Slow** -```python -def dotted_import(): - for _ in range(100_000): - os.path.exists('/') -``` -- **Fast** -```python -def direct_import(): - for _ in range(100_000): - exists('/') -``` - -*** - -### 8. **Refactoring Try..except outside a loop** -- **Slow** -```python -def try_in_loop(): - items = {'a': 1} - for _ in range(100_000): - try: - _ = items['a'] - except Exception: - pass -``` -- **Fast** -```python -def try_outside_loop(): - items = {'a': 1} - try: - for _ in range(100_000): - _ = items['a'] - except Exception: - pass -``` - -*** - -### 9. **Class instead of dataclass** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 10. **Namedtuple instead of dataclass** -- **Slow** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 11. 
**class instead of namedtuple** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 12. **namedtuple class instead of namedtuple** -- **Slow** -```python -def attributes_in_namedtuple_type(): - class Pet(typing.NamedTuple): - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_namedtuple(): - Pet = namedtuple("Pet", "legs noise") - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 13. **dict instead of class** -- **Slow** -```python -def attributes_in_dict(): - for _ in range(100_000): - dog = {"legs": 4, "noise": "woof"} - str(dog) -``` -- **Fast** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 14. **class with slots** -- **Slow** -```python -def attributes_in_class(): - class Pet: - legs: int - noise: str - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_class_with_slots(): - class Pet: - legs: int - noise: str - __slots__ = 'legs', 'noise' - def __init__(self, legs, noise): self.legs = legs; self.noise = noise - def __repr__(self): return "" - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 15. 
**dataclass with slots** -- **Slow** -```python -def attributes_in_dataclass(): - @dataclass - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` -- **Fast** -```python -def attributes_in_dataclass_with_slots(): - @dataclass(slots=True) - class Pet: - legs: int - noise: str - for _ in range(100_000): - dog = Pet(4, "woof") - str(dog) -``` - -*** - -### 16. **Using a list comprehension to filter another list** -- **Slow** -```python -def filter_list_as_loop(): - result = [] - inputs = range(100_000) - for i in inputs: - if i % 2: - result.append(i) -``` -- **Fast** -```python -def filter_list_as_comprehension(): - inputs = range(100_000) - result = [i for i in inputs if i % 2] -``` - -*** - -### 17. **Join list comprehension instead of generator expression** -- **Slow** -```python -def join_list_comprehension(): - words = ['data', 'type', 'is', 'so', 'long', 'now'] - for x in range(100_000): - ''.join([ele.title() for ele in words]) -``` -- **Fast** -```python -def join_generator_expression(): - words = ['data', 'type', 'is', 'so', 'long', 'now'] - for x in range(100_000): - ''.join(ele.title() for ele in words) -``` - -*** - -### 18. **Using fullmatch instead of anchors** -- **Slow** -```python -def regex_with_anchors(): - SNAKE_CASE_RE = re.compile(r'^([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)$') - tests = ['data_type', 'data_type_', '_dataType', 'dataType', 'data type'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.match(test_str) -``` -- **Fast** -```python -def regex_with_fullmatch(): - SNAKE_CASE_RE = re.compile(r'([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)') - tests = ['data_type', 'data_type_', '_dataType', 'dataType', 'data type'] - for x in range(100_000): - for test_str in tests: - SNAKE_CASE_RE.fullmatch(test_str) -``` - -*** - -### 19. 
**Using a-zA-Z instead of IGNORECASE**
-- **Slow**
-```python
-import re
-
-def regex_with_capitalrange():
-    SNAKE_CASE_RE = re.compile(r'([a-zA-Z]+\d*_[a-zA-Z\d_]*|_+[a-zA-Z\d]+[a-zA-Z\d_]*)')
-    tests = ['data_type', 'data_type_URL', '_DataType', 'DataTypeURL', 'Data Type URL']
-    for x in range(100_000):
-        for test_str in tests:
-            SNAKE_CASE_RE.fullmatch(test_str)
-```
-- **Fast**
-```python
-import re
-
-def regex_with_ignorecase():
-    SNAKE_CASE_RE = re.compile(r'([a-z]+\d*_[a-z\d_]*|_+[a-z\d]+[a-z\d_]*)', re.IGNORECASE)
-    tests = ['data_type', 'data_type_URL', '_DataType', 'DataTypeURL', 'Data Type URL']
-    for x in range(100_000):
-        for test_str in tests:
-            SNAKE_CASE_RE.fullmatch(test_str)
-```
-
-***
-
-### 20. **Kwargs for known keyword args**
-- **Slow**
-```python
-def func_with_kwargs(**kwargs): pass  # minimal stand-in so the snippet runs
-
-def keyword_call():
-    func_with_kwargs(a=1, b=2, c=3)
-```
-- **Fast**
-```python
-def func_with_named_args(a, b, c): pass  # minimal stand-in so the snippet runs
-
-def positional_call():
-    func_with_named_args(a=1, b=2, c=3)
-```
-
-***
-
-### 21. **Tiny Functions**
-- **Slow**
-```python
-def add(a, b):  # minimal stand-in so the snippet runs
-    return a + b
-
-def use_tiny_func():
-    x = 1
-    for n in range(100_000):
-        add(x, n)
-        add(n, x)
-```
-- **Fast**
-```python
-def inline_tiny_func():
-    x = 1
-    for n in range(100_000):
-        x + n
-        n + x
-```
-
-***
-
-### 22. **Slicing bytes instead of memoryview**
-- **Slow**
-```python
-def bytes_slice():
-    word = b'A' * 1000
-    for i in range(1000):
-        n = word[0:i]  # each bytes slice copies the data
-```
-- **Fast**
-```python
-def memoryview_slice():
-    word = memoryview(b'A' * 1000)
-    for i in range(1000):
-        n = word[0:i]  # memoryview slices share the buffer (no copy)
-```
-
-***
-
-### 23. **Loop-invariant code motion**
-- **Slow**
-```python
-def before():
-    x = (1, 2, 3, 4)
-    i = 6
-    for j in range(100_000):
-        len(x) * i + j
-```
-- **Fast**
-```python
-def after():
-    x = (1, 2, 3, 4)
-    i = 6
-    x_i = len(x) * i  # hoisted: computed once, not on every iteration
-    for j in range(100_000):
-        x_i + j
-```
-
-***
-
-### 24.
**Copy slice to Local**
-- **Slow**
-```python
-def slice_as_local():
-    x = list(range(100_000))
-    y = list(range(100_000))
-    for n in range(100_000):
-        x[n] + y[n]
-        x[n] + y[n]
-        x[n] + y[n]
-        x[n] + y[n]
-        x[n] + y[n]
-```
-- **Fast**
-```python
-def slice_copy_to_fast():
-    x = list(range(100_000))
-    y = list(range(100_000))
-    for n in range(100_000):
-        i = x[n]  # index once, reuse the local
-        j = y[n]
-        i + j
-        i + j
-        i + j
-        i + j
-        i + j
-```
-
-***
-
-### 25. **Copy name to Local**
-- **Slow**
-```python
-x, y = 1, 2  # module-level globals (implied by the benchmark)
-
-def as_local():
-    for _ in range(100_000):
-        x + y
-        x + y
-        x + y
-        x + y
-        x + y
-```
-- **Fast**
-```python
-x, y = 1, 2  # module-level globals (implied by the benchmark)
-
-def copy_name_to_fast():
-    i = x  # one global lookup each, then locals
-    j = y
-    for _ in range(100_000):
-        i + j
-        i + j
-        i + j
-        i + j
-        i + j
-```
-
-***
-
-### 26. **Copy dict item to Local**
-- **Slow**
-```python
-d = {"x": 1, "y": 2}  # implied by the benchmark
-
-def dont_copy_dict_key_to_fast():
-    for _ in range(100_000):
-        d["x"] + d["y"]
-        d["x"] + d["y"]
-        d["x"] + d["y"]
-        d["x"] + d["y"]
-        d["x"] + d["y"]
-```
-- **Fast**
-```python
-d = {"x": 1, "y": 2}  # implied by the benchmark
-
-def copy_dict_key_to_fast():
-    i = d["x"]
-    j = d["y"]
-    for _ in range(100_000):
-        i + j
-        i + j
-        i + j
-        i + j
-        i + j
-```
-
-***
-
-### 27.
**Copy class attr to Local**
-- **Slow**
-```python
-class Foo:  # implied by the benchmark
-    x = 1
-    y = 2
-
-foo = Foo()
-
-def dont_copy_attr_to_fast():
-    for _ in range(100_000):
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-        foo.x + foo.y
-```
-- **Fast**
-```python
-class Foo:  # implied by the benchmark
-    x = 1
-    y = 2
-
-foo = Foo()
-
-def copy_attr_to_fast():
-    i = foo.x  # one attribute lookup each, then locals
-    j = foo.y
-    for _ in range(100_000):
-        i + j
-        i + j
-        i + j
-        i + j
-        i + j
-```
-
-***
-
-Each case above pairs a slow (anti-pattern) variant with a fast (optimized) variant, following the ordering and results of the upstream benchmark suite.
-
-[1](https://github.com/tonybaloney/anti-patterns/blob/master/README.md)
diff --git a/.github/ISSUE_TEMPLATE/100-bug-report.yml b/.github/ISSUE_TEMPLATE/100-bug-report.yml
new file mode 100644
index 00000000..4cf5b586
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/100-bug-report.yml
@@ -0,0 +1,43 @@
+name: Bug Report
+description: Report a bug or unexpected behavior
+title: "[Bug]: "
+labels: ["type: bug", "status: needs-triage"]
+body:
+  - type: textarea
+    id: description
+    attributes:
+      label: Bug Description
+      description: What happened vs. what you expected
+      placeholder: "When I run X, I expected Y but got Z"
+    validations:
+      required: true
+  - type: textarea
+    id: reproduction
+    attributes:
+      label: Steps to Reproduce
+      value: |
+        1.
+        2.
+        3.
+    validations:
+      required: true
+  - type: textarea
+    id: environment
+    attributes:
+      label: Environment
+      description: OS, Python version, package version
+      placeholder: "OS: Ubuntu 22.04, Python 3.12, inference-endpoint v0.1.0"
+    validations:
+      required: true
+  - type: textarea
+    id: logs
+    attributes:
+      label: Relevant Logs
+      render: shell
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Before submitting
+      options:
+        - label: I searched existing issues and found no duplicates
+          required: true
diff --git a/.github/ISSUE_TEMPLATE/200-feature-request.yml b/.github/ISSUE_TEMPLATE/200-feature-request.yml
new file mode 100644
index 00000000..3aa7de25
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/200-feature-request.yml
@@ -0,0 +1,27 @@
+name: Feature Request
+description: Suggest a new feature or enhancement
+title: "[Feature]: "
+labels: ["type: feature", "status: needs-triage"]
+body:
+  - type: textarea
+    id: motivation
+    attributes:
+      label: Motivation
+      description: What problem does this solve? Why do you need it?
+    validations:
+      required: true
+  - type: textarea
+    id: proposal
+    attributes:
+      label: Proposed Solution
+      description: How should this work? Include API sketches if relevant.
+    validations:
+      required: true
+  - type: textarea
+    id: alternatives
+    attributes:
+      label: Alternatives Considered
+  - type: textarea
+    id: context
+    attributes:
+      label: Additional Context
diff --git a/.github/ISSUE_TEMPLATE/300-performance.yml b/.github/ISSUE_TEMPLATE/300-performance.yml
new file mode 100644
index 00000000..d2aa9007
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/300-performance.yml
@@ -0,0 +1,59 @@
+name: Performance Issue
+description: Report a performance regression or improvement opportunity
+title: "[Perf]: "
+labels: ["type: performance", "status: needs-triage"]
+body:
+  - type: textarea
+    id: description
+    attributes:
+      label: Description
+      description: What performance issue did you observe?
+      placeholder: "QPS dropped from X to Y after upgrading to version Z"
+    validations:
+      required: true
+  - type: textarea
+    id: benchmark
+    attributes:
+      label: Benchmark Command
+      description: The exact command you ran
+      render: shell
+    validations:
+      required: true
+  - type: textarea
+    id: results
+    attributes:
+      label: Results
+      description: Expected vs actual numbers (QPS, latency, TTFT, TPOT, etc.)
+      placeholder: |
+        Expected: ~5000 QPS, p99 latency < 200ms
+        Actual: ~2000 QPS, p99 latency 800ms
+    validations:
+      required: true
+  - type: textarea
+    id: environment
+    attributes:
+      label: Environment
+      description: Hardware, OS, Python version, endpoint server details
+      placeholder: |
+        Hardware: 8x A100 80GB
+        OS: Ubuntu 22.04
+        Python: 3.12
+        Server: vLLM 0.6.0, Llama-3-70B
+        Workers: 4
+    validations:
+      required: true
+  - type: textarea
+    id: profiling
+    attributes:
+      label: Profiling Data (optional)
+      description: Any profiling output, flame graphs, or bottleneck analysis
+      render: shell
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Before submitting
+      options:
+        - label: I searched existing issues and found no duplicates
+          required: true
+        - label: I ran with default settings before tuning
+          required: false
diff --git a/.github/ISSUE_TEMPLATE/400-dataset-integration.yml b/.github/ISSUE_TEMPLATE/400-dataset-integration.yml
new file mode 100644
index 00000000..67c6673f
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/400-dataset-integration.yml
@@ -0,0 +1,48 @@
+name: Dataset Integration
+description: Request support for a new dataset or evaluation benchmark
+title: "[Dataset]: "
+labels: ["type: feature", "area: dataset", "status: needs-triage"]
+body:
+  - type: textarea
+    id: dataset
+    attributes:
+      label: Dataset Information
+      description: Name, URL, and brief description
+      placeholder: |
+        Name: MATH-500
+        URL: https://huggingface.co/datasets/...
+        Description: 500 competition math problems for testing reasoning
+    validations:
+      required: true
+  - type: dropdown
+    id: format
+    attributes:
+      label: Dataset Format
+      options:
+        - JSONL
+        - HuggingFace Dataset
+        - CSV
+        - JSON
+        - Parquet
+        - Other
+    validations:
+      required: true
+  - type: textarea
+    id: evaluation
+    attributes:
+      label: Evaluation Method
+      description: How should responses be scored?
+      placeholder: "Exact match after extracting boxed answer, or pass@1 for code"
+    validations:
+      required: true
+  - type: textarea
+    id: samples
+    attributes:
+      label: Scale
+      description: Number of samples, expected prompt/response lengths
+      placeholder: "500 samples, avg prompt ~200 tokens, avg response ~500 tokens"
+  - type: textarea
+    id: context
+    attributes:
+      label: Additional Context
+      description: Related benchmarks, papers, or prior art
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 00000000..0086358d
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
+blank_issues_enabled: true
diff --git a/.github/workflows/sync-labels-to-board.yml b/.github/workflows/sync-labels-to-board.yml
new file mode 100644
index 00000000..8a3eaf83
--- /dev/null
+++ b/.github/workflows/sync-labels-to-board.yml
@@ -0,0 +1,150 @@
+name: Sync Labels to Project Board
+
+on:
+  issues:
+    types: [labeled, unlabeled]
+
+env:
+  PROJECT_ID: "PVT_kwDOBAnwDc4BTQvY"
+  # These IDs are populated from the board's GraphQL field configuration.
+  # To find them: query the board fields via GraphQL and extract option IDs.
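+  # For example, with the GitHub CLI the board's field and option IDs can be
+  # listed like this (a sketch; the org login and project number below are
+  # taken from the project-board URL referenced in CONTRIBUTING.md):
+  #
+  #   gh api graphql -f query='
+  #     query {
+  #       organization(login: "mlcommons") {
+  #         projectV2(number: 57) {
+  #           fields(first: 20) {
+  #             nodes {
+  #               ... on ProjectV2SingleSelectField { id name options { id name } }
+  #             }
+  #           }
+  #         }
+  #       }
+  #     }'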
+  PRIORITY_FIELD_ID: "PVTSSF_lADOBAnwDc4BTQvYzhBKk68"
+  AREA_FIELD_ID: "PVTSSF_lADOBAnwDc4BTQvYzhBKk7A"
+
+jobs:
+  sync-labels:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Sync priority and area labels to board fields
+        uses: actions/github-script@v7
+        with:
+          script: |
+            const issue = context.payload.issue;
+            const labels = issue.labels.map(l => l.name);
+
+            // --- Field and option ID mappings ---
+            // Priority field
+            const PRIORITY_FIELD_ID = process.env.PRIORITY_FIELD_ID;
+            const PRIORITY_MAP = {
+              'priority: ShowStopper': process.env.SHOWSTOPPER_OPTION_ID,
+              'priority: P0': process.env.P0_OPTION_ID,
+              'priority: P1': process.env.P1_OPTION_ID,
+              'priority: P2': process.env.P2_OPTION_ID,
+              'priority: P3': process.env.P3_OPTION_ID,
+            };
+
+            // Area field
+            const AREA_FIELD_ID = process.env.AREA_FIELD_ID;
+            const AREA_MAP = {
+              'area: core-engine': process.env.CORE_ENGINE_OPTION_ID,
+              'area: client': process.env.CLIENT_OPTION_ID,
+              'area: metrics': process.env.METRICS_OPTION_ID,
+              'area: dataset': process.env.DATASET_OPTION_ID,
+              'area: config-cli': process.env.CONFIG_CLI_OPTION_ID,
+              'area: evaluation': process.env.EVALUATION_OPTION_ID,
+              'area: adapters': process.env.ADAPTERS_OPTION_ID,
+              'area: mlcommons': process.env.MLCOMMONS_OPTION_ID,
+            };
+
+            const PROJECT_ID = process.env.PROJECT_ID;
+
+            // Find the board item for this issue
+            const findItemQuery = `
+              query($projectId: ID!, $cursor: String) {
+                node(id: $projectId) {
+                  ... on ProjectV2 {
+                    items(first: 100, after: $cursor) {
+                      nodes {
+                        id
+                        content {
+                          ...
on Issue { number }
+                        }
+                      }
+                      pageInfo { hasNextPage endCursor }
+                    }
+                  }
+                }
+              }
+            `;
+
+            let itemId = null;
+            let cursor = null;
+            while (!itemId) {
+              const result = await github.graphql(findItemQuery, {
+                projectId: PROJECT_ID,
+                cursor: cursor,
+              });
+              const items = result.node.items;
+              const match = items.nodes.find(
+                n => n.content && n.content.number === issue.number
+              );
+              if (match) {
+                itemId = match.id;
+                break;
+              }
+              if (!items.pageInfo.hasNextPage) break;
+              cursor = items.pageInfo.endCursor;
+            }
+
+            if (!itemId) {
+              core.info(`Issue #${issue.number} not found on board, skipping.`);
+              return;
+            }
+
+            // Helper to update a single-select field
+            async function setField(fieldId, optionId) {
+              if (!optionId) {
+                // Clear the field
+                await github.graphql(`
+                  mutation($projectId: ID!, $itemId: ID!, $fieldId: ID!) {
+                    clearProjectV2ItemFieldValue(input: {
+                      projectId: $projectId, itemId: $itemId, fieldId: $fieldId
+                    }) { projectV2Item { id } }
+                  }
+                `, { projectId: PROJECT_ID, itemId, fieldId });
+              } else {
+                await github.graphql(`
+                  mutation($projectId: ID!, $itemId: ID!, $fieldId: ID!, $optionId: String!) {
+                    updateProjectV2ItemFieldValue(input: {
+                      projectId: $projectId, itemId: $itemId, fieldId: $fieldId,
+                      value: { singleSelectOptionId: $optionId }
+                    }) { projectV2Item { id } }
+                  }
+                `, { projectId: PROJECT_ID, itemId, fieldId, optionId });
+              }
+            }
+
+            // Sync priority: find the highest-priority label on the issue
+            const priorityOrder = [
+              'priority: ShowStopper',
+              'priority: P0',
+              'priority: P1',
+              'priority: P2',
+              'priority: P3',
+            ];
+            const activePriority = priorityOrder.find(p => labels.includes(p));
+            const priorityOptionId = activePriority ? PRIORITY_MAP[activePriority] : null;
+            await setField(PRIORITY_FIELD_ID, priorityOptionId);
+            core.info(`Priority set to: ${activePriority || '(cleared)'}`);
+
+            // Sync area: use the first area label found
+            const activeArea = labels.find(l => l.startsWith('area: '));
+            const areaOptionId = activeArea ?
AREA_MAP[activeArea] : null;
+            await setField(AREA_FIELD_ID, areaOptionId);
+            core.info(`Area set to: ${activeArea || '(cleared)'}`);
+        env:
+          PRIORITY_FIELD_ID: ${{ env.PRIORITY_FIELD_ID }}
+          AREA_FIELD_ID: ${{ env.AREA_FIELD_ID }}
+          SHOWSTOPPER_OPTION_ID: "26ab336c"
+          P0_OPTION_ID: "d3612dd9"
+          P1_OPTION_ID: "7ff45c96"
+          P2_OPTION_ID: "e41b2ee9"
+          P3_OPTION_ID: "d4d24170"
+          CORE_ENGINE_OPTION_ID: "db5c9511"
+          CLIENT_OPTION_ID: "ffeff676"
+          METRICS_OPTION_ID: "04637e5a"
+          DATASET_OPTION_ID: "b493fd0d"
+          CONFIG_CLI_OPTION_ID: "ae1f5588"
+          EVALUATION_OPTION_ID: "96e592b6"
+          ADAPTERS_OPTION_ID: "6c615274"
+          MLCOMMONS_OPTION_ID: "d5eff045"
diff --git a/.gitignore b/.gitignore
index 8dc22a68..6681801b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -189,10 +189,7 @@ outputs/
 # Example vLLM virtualenv
 examples/03_BenchmarkComparison/vllm_venv/
 
-# Agent artifacts (local development only)
+# AI tool artifacts (local development only)
 .cursor_artifacts/
-.claude/agent-memory/
-
-# User-specific local rules (local Docker dev); do not commit
-.cursor/rules/local-docker-dev.mdc
-CLAUDE.local.md
+.cursor/
+docs/superpowers/
diff --git a/AGENTS.md b/AGENTS.md
index 52a3dbb5..6fec5395 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -21,7 +21,7 @@ pytest -m integration # Integration tests only
 pytest --cov=src --cov-report=html # With coverage
 pytest -xvs tests/unit/path/to/test_file.py # Single test file
 
-# Code quality (run before commits)
+# Code quality — MUST run before every commit, no exceptions
 pre-commit run --all-files
 
 # Local testing with echo server
@@ -215,7 +215,7 @@ All of these run automatically on commit:
 - License header enforcement
 - `regenerate-templates`: auto-regenerates YAML config templates from schema defaults when `schema.py`, `config.py`, or `regenerate_templates.py` change
 
-**Always run `pre-commit run --all-files` before committing.**
+**IMPORTANT: Always run `pre-commit run --all-files` before every commit.** Hooks may modify files (prettier, ruff-format,
license headers). If files are modified, stage the changes and commit once. Never commit without running pre-commit first. See [Development Guide](docs/DEVELOPMENT.md) for full setup and workflow details. @@ -240,7 +240,7 @@ See [Development Guide](docs/DEVELOPMENT.md) for full setup and workflow details @pytest.mark.run_explicitly # Only run when explicitly selected ``` -**Async tests**: Use `@pytest.mark.asyncio(mode="strict")` โ€” the project uses strict asyncio mode. +**Async tests**: Use `@pytest.mark.asyncio` โ€” strict mode is configured globally in `pyproject.toml` (`asyncio_mode = "strict"`). Do NOT pass `mode="strict"` to the marker โ€” it's not a valid argument. **Key fixtures** (defined in `tests/conftest.py`): @@ -342,7 +342,7 @@ Known failure modes when AI tools generate code for this project. Reference thes - **Generating mock-heavy tests for integration scenarios**: This project has real echo/oracle server fixtures. AI tends to mock HTTP calls even when `mock_http_echo_server` or `mock_http_oracle_server` fixtures exist and should be used. - **Missing test markers**: Every test function needs `@pytest.mark.unit`, `@pytest.mark.integration`, or another marker. AI-generated tests almost always omit markers, which breaks CI filtering. -- **Wrong asyncio mode**: Tests must use `@pytest.mark.asyncio(mode="strict")` โ€” AI often writes bare `@pytest.mark.asyncio` or forgets it entirely, causing silent test skips or failures. +- **Wrong asyncio marker**: Tests must use bare `@pytest.mark.asyncio` โ€” strict mode is configured globally in `pyproject.toml`. Do NOT pass `mode="strict"` to the marker (it's not a valid argument and will cause errors). AI sometimes hallucinates this parameter. - **Fabricating fixture names**: AI may invent fixtures that don't exist in `conftest.py`. Always check that referenced fixtures actually exist before using them. 
### Code Style & Repo Conventions diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8de1bbe9..db06a18c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,11 +1,213 @@ -## Contributing +# Contributing to MLPerf Inference Endpoints -The best way to contribute to the MLCommons is to get involved with one of our many project communities. You can find more information about getting involved with MLCommons [here](https://mlcommons.org/community/). +Welcome! We're glad you're interested in contributing. This project is part of +[MLCommons](https://mlcommons.org/) and aims to build a high-performance +benchmarking tool for LLM inference endpoints targeting 50k+ QPS. -Generally we encourage people to become MLCommons members if they wish to contribute to MLCommons projects, but outside pull requests are very welcome too. +## Table of Contents -Regardless of whether you are a member, your organization (or you as an individual contributor) needs to sign the MLCommons Contributor License Agreement (CLA). Please submit your GitHub username to the [MLCommons Subscription form](https://mlcommons.org/community/subscribe/) to start that process. +- [Ways to Contribute](#ways-to-contribute) +- [Development Setup](#development-setup) +- [Code Style and Conventions](#code-style-and-conventions) +- [Testing](#testing) +- [Submitting Changes](#submitting-changes) +- [Issue Guidelines](#issue-guidelines) +- [MLCommons CLA](#mlcommons-cla) -MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and issue a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that cla-bot and other checks pass for your pull requests. +## Ways to Contribute -For project-specific development standards (code style, test requirements, pre-commit hooks, commit format), see the [Development Guide](docs/DEVELOPMENT.md). 
+- **Report bugs** — use the [Bug Report](https://github.com/mlcommons/endpoints/issues/new?template=100-bug-report.yml) template
+- **Request features** — use the [Feature Request](https://github.com/mlcommons/endpoints/issues/new?template=200-feature-request.yml) template
+- **Report performance issues** — use the [Performance Issue](https://github.com/mlcommons/endpoints/issues/new?template=300-performance.yml) template
+- **Request dataset support** — use the [Dataset Integration](https://github.com/mlcommons/endpoints/issues/new?template=400-dataset-integration.yml) template
+- **Improve documentation** — fix typos, clarify guides, add examples
+- **Pick up an issue** — look for [`good first issue`](https://github.com/mlcommons/endpoints/labels/good%20first%20issue) or [`help wanted`](https://github.com/mlcommons/endpoints/labels/help%20wanted)
+- **Review PRs** — thoughtful reviews are as valuable as code
+
+## Development Setup
+
+### Prerequisites
+
+- Python 3.12+ (3.12 recommended)
+- Git
+- A Unix-like OS (Linux or macOS)
+
+### Getting Started
+
+```bash
+# Fork and clone
+git clone https://github.com/<your-username>/endpoints.git
+cd endpoints
+
+# Create virtual environment
+python3.12 -m venv venv
+source venv/bin/activate
+
+# Install with dev and test extras
+pip install -e ".[dev,test]"
+
+# Install pre-commit hooks
+pre-commit install
+
+# Verify your setup
+pytest -m unit -x --timeout=60
+```
+
+### Local Testing with Echo Server
+
+```bash
+# Start a local echo server
+python -m inference_endpoint.testing.echo_server --port 8765
+
+# Run a quick probe
+inference-endpoint probe --endpoints http://localhost:8765 --model test-model
+```
+
+## Code Style and Conventions
+
+### Formatting and Linting
+
+We use [ruff](https://docs.astral.sh/ruff/) for formatting and linting, and
+[mypy](https://mypy-lang.org/) for type checking. Pre-commit hooks enforce
+these automatically.
+ +```bash +# Run all checks manually +pre-commit run --all-files +``` + +### Key Conventions + +- **Line length:** 88 characters +- **Quotes:** Double quotes +- **License headers:** Required on all Python files (auto-added by pre-commit) +- **Commit messages:** [Conventional commits](https://www.conventionalcommits.org/) โ€” `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:` +- **Comments:** Only where the _why_ isn't obvious from the code. No over-documenting. + +### Serialization + +- **Hot-path data** (Query, QueryResult, StreamChunk): `msgspec.Struct` โ€” encode/decode with `msgspec.json`, not stdlib json +- **Configuration**: `pydantic.BaseModel` for validation +- **Do not** use `dataclass` where neighboring types use `msgspec` + +### Performance-Sensitive Code + +Code in `load_generator/`, `endpoint_client/worker.py`, and `async_utils/transport/` +is latency-critical. In these paths: + +- No `match` statements โ€” use dict dispatch +- Minimize async suspends +- No pydantic validation or excessive logging +- Use `msgspec` over `json`/`pydantic` for serialization + +## Testing + +### Running Tests + +```bash +# All tests (excludes slow/performance) +pytest + +# Unit tests only +pytest -m unit + +# Integration tests +pytest -m integration + +# Single file +pytest -xvs tests/unit/path/to/test_file.py + +# With coverage +pytest --cov=src --cov-report=html +``` + +### Test Markers + +Every test function **must** have a marker: + +```python +@pytest.mark.unit +@pytest.mark.asyncio # strict mode is configured globally in pyproject.toml +async def test_something(): + ... +``` + +Available markers: `unit`, `integration`, `slow`, `performance`, `run_explicitly` + +### Coverage + +Target **>90% coverage** for all new code. Use existing fixtures from +`tests/conftest.py` (e.g., `mock_http_echo_server`, `mock_http_oracle_server`, +`dummy_dataset`) rather than mocking. 
+## Submitting Changes
+
+### Branch Naming
+
+```
+feat/short-description
+fix/short-description
+docs/short-description
+```
+
+### Pull Request Process
+
+1. **Create a focused PR** — one logical change per PR
+2. **Fill out the PR template** — describe what, why, and how to test
+3. **Ensure CI passes** — run `pre-commit run --all-files` and `pytest -m unit` locally before pushing
+4. **Link related issues** — use `Closes #123` or `Relates to #123`
+5. **Expect review within 2-3 business days** — reviewers are auto-assigned based on changed files
+
+### What We Look For in Reviews
+
+- Does it follow existing patterns in the codebase?
+- Are tests included and meaningful (not mock-heavy)?
+- Is it focused — no unrelated refactoring or over-engineering?
+- Does it avoid adding unnecessary dependencies?
+
+### After Review
+
+- Address feedback with new commits (don't force-push during review)
+- Once approved, a maintainer will merge
+
+## Issue Guidelines
+
+### Before Filing
+
+1. Search [existing issues](https://github.com/mlcommons/endpoints/issues) for duplicates
+2. Use the appropriate issue template
+3. Provide enough detail to reproduce or understand the request
+
+### Issue Lifecycle
+
+New issues are auto-added to our [project board](https://github.com/orgs/mlcommons/projects/57)
+and flow through: **Inbox → Triage → Ready → In Progress → In Review → Done**
+
+### Priority Levels
+
+| Priority        | Meaning                            |
+| --------------- | ---------------------------------- |
+| **ShowStopper** | Drop everything — critical blocker |
+| **P0**          | Blocks release or users            |
+| **P1**          | Must address this cycle            |
+| **P2**          | Address within quarter             |
+| **P3**          | Backlog, nice to have              |
+
+## MLCommons CLA
+
+All contributors must sign the
+[MLCommons Contributor License Agreement](https://mlcommons.org/membership/membership-overview/).
+A CLA bot will check your PR automatically.
+
+To sign up:
+
+1.
Visit the [MLCommons Subscription form](https://mlcommons.org/membership/membership-overview/) +2. Submit your GitHub username +3. The CLA bot will verify on your next PR + +Pull requests from non-members are welcome โ€” you'll be prompted to sign the CLA +during the PR process. + +## Questions? + +File an [issue](https://github.com/mlcommons/endpoints/issues). We aim to respond within a few business days. diff --git a/README.md b/README.md index 9af4eb85..a14ed18b 100644 --- a/README.md +++ b/README.md @@ -1,209 +1,129 @@ -# MLPerfยฎ Inference Endpoint Benchmarking System +# MLPerf Inference Endpoint Benchmarking System -A high-performance benchmarking tool for LLM endpoints. +[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) +[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/) +[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen.svg)](https://pre-commit.com/) -## Quick Start +A high-performance benchmarking tool for LLM inference endpoints, targeting 50k+ QPS. Part of [MLCommons](https://mlcommons.org/). -### Installation +## Quick Start -**Requirements**: Python 3.12+ (Python 3.12 is recommended for optimal performance. GIL-less mode in higher Python versions is not yet supported.) +**Requirements:** Python 3.12+ (3.12 recommended) ```bash -# Clone the repository -# Note: This repo will be migrated to https://github.com/mlcommons/endpoints git clone https://github.com/mlcommons/endpoints.git cd endpoints - -# Create virtual environment -python3.12 -m venv venv -source venv/bin/activate - -# As a user +python3.12 -m venv venv && source venv/bin/activate pip install . 
- -# As a developer (with development and test extras) -pip install -e ".[dev,test]" -pre-commit install ``` -### Basic Usage - ```bash -# Show help -inference-endpoint --help - -# Show system information -inference-endpoint -v info - # Test endpoint connectivity inference-endpoint probe \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B -# Run offline benchmark (max throughput - uses all dataset samples) +# Run offline benchmark (max throughput) inference-endpoint benchmark offline \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B \ --dataset tests/datasets/dummy_1k.jsonl -# Run online benchmark (sustained QPS - requires --target-qps, --load-pattern) +# Run online benchmark (sustained QPS) inference-endpoint benchmark online \ --endpoints http://your-endpoint:8000 \ --model Qwen/Qwen3-8B \ --dataset tests/datasets/dummy_1k.jsonl \ --load-pattern poisson \ --target-qps 100 - -# With explicit sample count -inference-endpoint benchmark offline \ - --endpoints http://your-endpoint:8000 \ - --model Qwen/Qwen3-8B \ - --dataset tests/datasets/dummy_1k.jsonl \ - --num-samples 5000 ``` -### Running Locally +### Local Testing ```bash -# Start local echo server -python3 -m inference_endpoint.testing.echo_server --port 8765 & - -# Test with dummy dataset (included in repo) +# Start local echo server and run a benchmark against it +python -m inference_endpoint.testing.echo_server --port 8765 & inference-endpoint benchmark offline \ --endpoints http://localhost:8765 \ - --model Qwen/Qwen3-8B \ + --model test-model \ --dataset tests/datasets/dummy_1k.jsonl - -# Stop echo server pkill -f echo_server ``` -See [Local Testing Guide](docs/LOCAL_TESTING.md) for detailed instructions. 
- -### Running Tests and Examples - -```bash -# Install test dependencies -pip install ".[test]" - -# Run tests (excluding performance and explicit-run tests) -pytest -m "not performance and not run_explicitly" - -# Run examples: follow instructions in examples/*/README.md -``` +See [Local Testing Guide](docs/LOCAL_TESTING.md) for more details. -## ๐Ÿ“š Documentation - -- [AGENTS.md](AGENTS.md) - Architecture, conventions, and AI agent guidelines -- [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) - Command-line interface guide -- [Local Testing Guide](docs/LOCAL_TESTING.md) - Test with echo server -- [Development Guide](docs/DEVELOPMENT.md) - How to contribute and develop -- [Performance Architecture](docs/PERF_ARCHITECTURE.md) - Hot-path design and tuning -- [Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) - CPU affinity and client tuning -- [GitHub Setup Guide](docs/GITHUB_SETUP.md) - GitHub authentication and setup - -### Component Design Specs - -Each top-level component under `src/inference_endpoint/` has a corresponding spec: - -| Component | Spec | -| ----------------- | ---------------------------------------------------------------- | -| Core types | [docs/core/DESIGN.md](docs/core/DESIGN.md) | -| Load generator | [docs/load_generator/DESIGN.md](docs/load_generator/DESIGN.md) | -| Endpoint client | [docs/endpoint_client/DESIGN.md](docs/endpoint_client/DESIGN.md) | -| Metrics | [docs/metrics/DESIGN.md](docs/metrics/DESIGN.md) | -| Config | [docs/config/DESIGN.md](docs/config/DESIGN.md) | -| Async utils | [docs/async_utils/DESIGN.md](docs/async_utils/DESIGN.md) | -| Dataset manager | [docs/dataset_manager/DESIGN.md](docs/dataset_manager/DESIGN.md) | -| Commands (CLI) | [docs/commands/DESIGN.md](docs/commands/DESIGN.md) | -| OpenAI adapter | [docs/openai/DESIGN.md](docs/openai/DESIGN.md) | -| SGLang adapter | [docs/sglang/DESIGN.md](docs/sglang/DESIGN.md) | -| Evaluation | [docs/evaluation/DESIGN.md](docs/evaluation/DESIGN.md) | -| Testing utilities 
| [docs/testing/DESIGN.md](docs/testing/DESIGN.md) | -| Profiling | [docs/profiling/DESIGN.md](docs/profiling/DESIGN.md) | -| Plugins | [docs/plugins/DESIGN.md](docs/plugins/DESIGN.md) | -| Utils | [docs/utils/DESIGN.md](docs/utils/DESIGN.md) | - -## ๐ŸŽฏ Architecture - -The system follows a modular, event-driven architecture: +## Architecture ``` -Dataset Manager โ”€โ”€โ–บ Load Generator โ”€โ”€โ–บ Endpoint Client โ”€โ”€โ–บ External Endpoint - โ”‚ - Metrics Collector - (event logging + reporting) +Dataset Manager โ”€โ”€> Load Generator โ”€โ”€> Endpoint Client โ”€โ”€> External Endpoint + | + Metrics Collector (EventRecorder + MetricsReporter) ``` -- **Dataset Manager**: Loads benchmark datasets and applies transform pipelines -- **Load Generator**: Central orchestrator โ€” controls timing (scheduler), issues queries, and emits sample events -- **Endpoint Client**: Multi-process HTTP worker pool communicating over ZMQ IPC -- **Metrics Collector**: Receives sample events from Load Generator; writes to SQLite (EventRecorder), aggregates after the run (MetricsReporter) +| Component | Purpose | +| ------------------- | ------------------------------------------------------------------------------------ | +| **Load Generator** | Central orchestrator: `BenchmarkSession` owns lifecycle, `Scheduler` controls timing | +| **Endpoint Client** | Multi-process HTTP workers communicating via ZMQ IPC | +| **Dataset Manager** | Loads JSONL, HuggingFace, CSV, JSON, Parquet datasets | +| **Metrics** | SQLite-backed event recording, aggregation (QPS, latency, TTFT, TPOT) | +| **Config** | Pydantic-based YAML schema, CLI auto-generated via cyclopts | -## Accuracy Evaluation - -You can run accuracy evaluation with Pass@1 scoring by specifying accuracy datasets in the benchmark -configuration. 
Currently, Inference Endpoints provides the following pre-defined accuracy benchmarks: - -- GPQA (default: GPQA Diamond) -- AIME (default: AIME 2025) -- LiveCodeBench (default: lite, release_v6) - -However, LiveCodeBench will not work out-of-the-box and requires some additional setup. See the -[LiveCodeBench](src/inference_endpoint/evaluation/livecodebench/README.md) documentation for -details and explanations. +### Benchmark Modes -## ๐Ÿšง Pending Features +- **Offline** (`max_throughput`): Burst all queries at once for peak throughput measurement +- **Online** (`poisson`): Fixed QPS with Poisson arrival distribution for latency profiling +- **Concurrency**: Fixed concurrent request count -The following features are planned for future releases: +### Performance Design -- [ ] **Submission Ruleset Integration** - Full MLPerf submission workflow support -- [ ] **Documentation Generation and Hosting** - Sphinx-based API documentation with GitHub Pages +The hot path is optimized for minimal overhead: -## ๐Ÿค Contributing +- Multi-process workers with ZMQ IPC (not threads) +- `uvloop` + `eager_task_factory` for async performance +- `msgspec` for zero-copy serialization on the data path +- Custom HTTP connection pooling with `httptools` parser +- CPU affinity support for performance tuning -We welcome contributions! 
Please see our [Development Guide](docs/DEVELOPMENT.md) for details on: - -- Setting up your development environment -- Code style and quality standards -- Testing requirements -- Pull request process - -## 🙏 Acknowledgements +## Accuracy Evaluation -This project draws inspiration from and learns from the following excellent projects: +Run accuracy evaluation with Pass@1 scoring using pre-defined benchmarks: -- [MLCommons Inference](https://github.com/mlcommons/inference) - MLPerf Inference benchmark suite -- [AIPerf](https://github.com/ai-dynamo/aiperf) - AI model performance profiling framework -- [SGLang GenAI-Bench](https://github.com/sgl-project/genai-bench) - Token-level performance evaluation tool -- [vLLM Benchmarks](https://github.com/vllm-project/vllm/tree/main/benchmarks) - Performance benchmarking tools for vLLM -- [InferenceMAX](https://github.com/InferenceMAX/InferenceMAX) - LLM inference optimization toolkit +- **GPQA** (default: GPQA Diamond) +- **AIME** (default: AIME 2025) +- **LiveCodeBench** (default: lite, release_v6) — requires [additional setup](src/inference_endpoint/dataset_manager/predefined/livecodebench/README.md) -We are grateful to these communities for their contributions to LLM benchmarking and performance analysis. 
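For context on the Pass@1 scoring above: a common way to compute it is the unbiased pass@k estimator, `1 - C(n - c, k) / C(n, k)`, over `n` generated samples of which `c` are correct. A minimal sketch (illustrative only; the function name is hypothetical and this is not the repository's actual scoring code):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator over n samples with c correct.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        # Every k-subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Pass@1 reduces to the fraction of correct samples, c / n.
print(pass_at_k(4, 2, 1))  # 0.5
print(pass_at_k(1, 1, 1))  # 1.0
```

With a single sample per problem (n = 1, k = 1), this is simply "did the one answer pass", averaged over problems.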
+## Documentation -## 📄 License +| Guide | Description | +| -------------------------------------------------------------- | ------------------------------------- | +| [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) | Command-line interface guide | +| [CLI Design](docs/CLI_DESIGN.md) | CLI architecture and design decisions | +| [Local Testing](docs/LOCAL_TESTING.md) | Test with the echo server | +| [Client Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) | Endpoint client optimization | +| [Performance Architecture](docs/PERF_ARCHITECTURE.md) | Performance architecture deep dive | +| [Development Guide](docs/DEVELOPMENT.md) | Development setup and workflow | +| [CONTRIBUTING.md](CONTRIBUTING.md) | How to contribute | -This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE.md) file for -details. +## Contributing -## 🔗 Links +We welcome contributions from the community. See [CONTRIBUTING.md](CONTRIBUTING.md) for: -- [MLCommons](https://mlcommons.org/) - Machine Learning Performance Standards -- [Project Repository](https://github.com/mlcommons/endpoints) -- [MLPerf Inference](https://mlcommons.org/benchmarks/inference/) +- Development setup and prerequisites +- Code style (ruff, mypy, conventional commits) +- Testing requirements (>90% coverage, pytest markers) +- Pull request process and review expectations -## 👥 Contributors +Issues are tracked on our [project board](https://github.com/orgs/mlcommons/projects/57). Look for [`good first issue`](https://github.com/mlcommons/endpoints/labels/good%20first%20issue) or [`help wanted`](https://github.com/mlcommons/endpoints/labels/help%20wanted) to get started. -Credits to core contributors of the project: +## Acknowledgements -- MLCommons Committee -- NVIDIA: Zhihan Jiang, Rashid Kaleem, Viraat Chandra, Alice Cheng -- ... +This project draws inspiration from: -See [ATTRIBUTION](ATTRIBUTION) for detailed attribution information. 
+- [MLCommons Inference](https://github.com/mlcommons/inference) — MLPerf Inference benchmark suite +- [AIPerf](https://github.com/ai-dynamo/aiperf) — AI model performance profiling +- [SGLang GenAI-Bench](https://github.com/sgl-project/genai-bench) — Token-level performance evaluation +- [vLLM Benchmarks](https://github.com/vllm-project/vllm/tree/main/benchmarks) — Performance benchmarking for vLLM -## 📞 Support +## License -- **Issues**: [GitHub Issues](https://github.com/mlcommons/endpoints/issues) -- **Discussions**: [GitHub Discussions](https://github.com/mlcommons/endpoints/discussions) -- **Documentation**: See [docs/](docs/) directory for guides +Apache License 2.0 — see [LICENSE](LICENSE) for details. diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md index af32da1d..e4e2d3de 100644 --- a/docs/DEVELOPMENT.md +++ b/docs/DEVELOPMENT.md @@ -1,15 +1,14 @@ # Development Guide -This guide provides everything you need to contribute to the MLPerf Inference Endpoint Benchmarking System. +This guide covers the development setup and workflow for the MLPerf Inference Endpoint Benchmarking System. For contribution guidelines, see [CONTRIBUTING.md](../CONTRIBUTING.md). ## Getting Started ### Prerequisites -- **Python**: 3.12+ (Python 3.12 is recommended for optimal performance) +- **Python**: 3.12+ (3.12 recommended) - **Git**: Latest version -- **Virtual Environment**: Python venv or conda -- **IDE**: VS Code, PyCharm, or your preferred editor +- **OS**: Linux or macOS (Windows is not supported) ### Development Environment Setup @@ -23,7 +22,7 @@ git remote add upstream https://github.com/mlcommons/endpoints.git # 3. Create virtual environment (Python 3.12+ required) python3.12 -m venv venv -source venv/bin/activate # On Windows: venv\Scripts\activate +source venv/bin/activate # 4. 
Install development dependencies pip install -e ".[dev,test]" @@ -61,8 +60,8 @@ endpoints/ ├── tests/ # Test suite │ ├── unit/ # Unit tests │ ├── integration/ # Integration tests -│ ├── performance/ # Performance tests -│ └── datasets/ # Test datasets +│ ├── performance/ # Performance benchmarks +│ └── datasets/ # Test data (dummy_1k.jsonl, squad_pruned/) ├── docs/ # Documentation ├── examples/ # Usage examples └── scripts/ # Utility scripts @@ -73,114 +72,89 @@ endpoints/ ### Running Tests ```bash -# Run all tests +# All tests (excludes slow/performance) pytest -# Run with coverage +# Unit tests only +pytest -m unit + +# Integration tests +pytest -m integration + +# Single file with verbose output +pytest -xvs tests/unit/path/to/test_file.py + +# With coverage pytest --cov=src --cov-report=html +``` -# Run specific test categories -pytest -m unit # Unit tests only -pytest -m integration # Integration tests only -pytest -m performance # Performance tests only (no timeout) +### Test Markers -# Run tests in parallel -pytest -n auto +Every test function **must** have a marker: -# Run tests with verbose output -pytest -v +```python import pytest -# Run specific test file -pytest tests/unit/test_core_types.py +@pytest.mark.unit +def test_something(): + ... -# Run with output to file (recommended) -pytest -v 2>&1 | tee test_results.log +@pytest.mark.unit +@pytest.mark.asyncio # strict mode is configured globally in pyproject.toml +async def test_async_something(): + ... 
``` -### Test Structure +Available markers: `unit`, `integration`, `slow`, `performance`, `run_explicitly` -- **Unit Tests** (`tests/unit/`): Test individual components in isolation -- **Integration Tests** (`tests/integration/`): Test component interactions with real servers -- **Performance Tests** (`tests/performance/`): Test performance characteristics (marked with @pytest.mark.performance, no timeout) -- **Test Datasets** (`tests/datasets/`): Sample datasets for testing (dummy_1k.jsonl, squad_pruned/) +### Key Fixtures -### Writing Tests +Defined in `tests/conftest.py` — use these instead of mocking: -```python -import pytest -from inference_endpoint.core.types import Query - -class TestQuery: - @pytest.mark.unit - def test_query_creation(self): - """Test creating a basic query.""" - query = Query(data={"prompt": "Test", "model": "test-model"}) - assert query.data["prompt"] == "Test" - assert query.data["model"] == "test-model" - - @pytest.mark.unit - @pytest.mark.asyncio(mode="strict") - async def test_async_operation(self): - """Test async operations.""" - # Your async test here - pass -``` +- `mock_http_echo_server` — real HTTP echo server on dynamic port +- `mock_http_oracle_server` — dataset-driven response server +- `dummy_dataset` — in-memory test dataset +- `events_db` — pre-populated SQLite events database + +### Coverage + +Target **>90% coverage** for all new code. ## Code Quality ### Pre-commit Hooks -The project uses pre-commit hooks to ensure code quality. 
- -Hooks that run automatically on commit: +All of these run automatically on commit: - trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements - `ruff` (lint + autofix) and `ruff-format` - `mypy` type checking - `prettier` for YAML/JSON/Markdown -- License header enforcement (Apache 2.0 SPDX header required on all Python files, added by `scripts/add_license_header.py`) +- License header enforcement +- YAML template validation and regeneration -**Always run `pre-commit run --all-files` before committing.** +**IMPORTANT: Always run `pre-commit run --all-files` before every commit.** Hooks may modify files. If files are modified, re-stage the changes and commit again. ```bash -# Install hooks (done during setup) -pre-commit install - -# Run all hooks on staged files -pre-commit run - -# Run all hooks on all files +# Run all hooks pre-commit run --all-files -``` - -### Code Formatting - -Configuration: `ruff` (line-length 88, target Python 3.12), `ruff-format` (double quotes, space indent). -```bash -# Format code with ruff -ruff format src/ tests/ - -# Check formatting without changing files -ruff format --check src/ tests/ +# Install hooks (done during setup) +pre-commit install ``` -### Linting - -```bash -# Run ruff linter -ruff check src/ tests/ +### Code Style -# Run mypy for type checking -mypy src/ - -# Run all quality checks -pre-commit run --all-files -``` +- **Formatter/Linter**: `ruff` (line-length 88, target Python 3.12) +- **Type checking**: `mypy` +- **Formatting**: `ruff-format` (double quotes, space indent) +- **License headers**: Required on all Python files (auto-added by pre-commit) +- **Commit messages**: [Conventional commits](https://www.conventionalcommits.org/) — `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:` +- **Comments**: Only where the _why_ isn't obvious from the code ## Development Workflow -### 1. 
Feature Development +### Feature Development ```bash # Sync your fork with upstream before starting @@ -189,88 +163,26 @@ git checkout main git merge upstream/main # Create a feature branch on your fork -git checkout -b feature/your-feature-name +git checkout -b feat/your-feature-name # Make changes and test pytest pre-commit run --all-files # Commit changes git add . git commit -m "feat: add your feature description" # Push to your fork and open a PR against mlcommons/endpoints -git push origin feature/your-feature-name +git push origin feat/your-feature-name ``` -### 2. Component Development - -When developing a new component: - -1. **Create the component directory** in `src/inference_endpoint/` -2. **Add `__init__.py`** with component description -3. **Implement the component** following the established patterns -4. **Add tests** in the corresponding `tests/unit/` directory -5. **Update main package** `__init__.py` if needed -6. **Add dependencies** to `pyproject.toml` under `[project.dependencies]` or `[project.optional-dependencies]` - -### 3. 
Testing Strategy - -- **Unit Tests**: >90% coverage required -- **Integration Tests**: Test component interactions -- **Performance Tests**: Ensure no performance regressions -- **Documentation**: Update docs for new features - -## Documentation - -### Writing Documentation - -- **Code Comments**: Add comments only where the _why_ is not obvious from the code; avoid restating what the code does -- **README Updates**: Update README.md for user-facing changes -- **Examples**: Provide usage examples for new features - -## Performance Considerations - -### Development Guidelines +### Branch Naming -- **Async First**: Use async/await for I/O operations -- **Memory Efficiency**: Minimize object creation in hot paths -- **Profiling**: Use pytest-benchmark for performance testing -- **Monitoring**: Add performance metrics for critical operations - -### Performance Testing - -```bash -# Run performance tests -pytest -m performance - -# Run benchmarks -pytest --benchmark-only - -# Compare with previous runs -pytest --benchmark-compare ``` - -## Debugging - -### Common Issues - -1. **Import Errors**: Ensure `src/` is in Python path -2. **Test Failures**: Check test data and mock objects -3. **Performance Issues**: Use profiling tools to identify bottlenecks -4. **Async Issues**: Ensure proper event loop handling - -### Debug Tools - -```bash -# Run with debug logging -inference-endpoint --verbose - -# Run tests with debug output -pytest -s -v - -# Use Python debugger -python -m pdb -m pytest test_file.py +feat/short-description +fix/short-description +docs/short-description ``` ## YAML Config Templates @@ -297,89 +209,37 @@ Add dependencies to `pyproject.toml` (always pin to exact versions with `==`): - **Runtime dependencies**: `[project.dependencies]` - **Optional groups** (dev, test, etc.): `[project.optional-dependencies]` -Install after updating: +After adding a dependency, run `pip-audit` (included in `dev` extras) to verify it has no known vulnerabilities. 
```bash pip install -e ".[dev,test]" ``` -## Troubleshooting - -### Common Problems - -**Pre-commit hooks failing:** - -```bash -# Update pre-commit -pre-commit autoupdate - -# Skip hooks temporarily -git commit --no-verify -``` - -**Tests failing:** +## Performance Considerations -```bash -# Clear Python cache -find . -type d -name "__pycache__" -delete -find . -type f -name "*.pyc" -delete +Code in `load_generator/`, `endpoint_client/worker.py`, and `async_utils/transport/` is latency-critical. In these paths: -# Reinstall package -pip install -e . -``` +- No `match` statements — use dict dispatch +- Use `dataclass(slots=True)` or `msgspec.Struct` for frequently instantiated classes +- Minimize async suspends +- Use `msgspec` over `json`/`pydantic` for serialization +- The HTTP client uses custom `ConnectionPool` with `httptools` parser — not `aiohttp`/`requests` -**Import errors:** +## Debugging ```bash -# Check Python path -python -c "import sys; print(sys.path)" - -# Ensure src is in path -export PYTHONPATH="${PYTHONPATH}:$(pwd)/src" -``` - -## Contributing Guidelines +# Run with verbose logging +inference-endpoint -v benchmark offline ... -### Pull Request Process +# Run tests with stdout visible +pytest -xvs tests/unit/path/to/test.py -1. **Fork** `mlcommons/endpoints` on GitHub -2. **Clone your fork** and add `upstream` as a remote (see [Development Environment Setup](#development-environment-setup)) -3. **Sync with upstream** (`git fetch upstream && git merge upstream/main`) before starting work -4. **Create a feature branch** on your fork (`git checkout -b feature/your-feature-name`) -5. **Make your changes** following the coding standards -6. **Add tests** for new functionality -7. **Update documentation** as needed -8. **Run all checks** locally: `pytest` and `pre-commit run --all-files` -9. **Push to your fork** and open a PR against `mlcommons/endpoints:main` -10. 
**Address review comments** promptly - -### Commit Message Format - -Use conventional commit format: - -``` -type(scope): description - -feat(core): add query lifecycle management -fix(api): resolve endpoint connection issue -docs(readme): update installation instructions -test(loadgen): add performance benchmarks +# Use Python debugger +python -m pdb -m pytest tests/unit/path/to/test.py ``` -Allowed types: `feat`, `fix`, `docs`, `test`, `chore`, `refactor`, `perf`, `ci`. - -### Code Review Checklist - -- [ ] Code follows style guidelines -- [ ] Tests pass and coverage is adequate -- [ ] Documentation is updated -- [ ] Performance impact is considered -- [ ] Security implications are reviewed -- [ ] Error handling is appropriate - ## Getting Help - **Issues**: [GitHub Issues](https://github.com/mlcommons/endpoints/issues) -- **Discussions**: [GitHub Discussions](https://github.com/mlcommons/endpoints/discussions) -- **Documentation**: Check this guide and project docs -- **Team**: Reach out to the development team +- **Project Board**: [Q2 Board](https://github.com/orgs/mlcommons/projects/57) +- **Documentation**: See [docs/](.) directory for guides diff --git a/pyproject.toml b/pyproject.toml index 19fa129d..67dfc865 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -47,7 +47,7 @@ dependencies = [ "transformers==5.4.0", "numpy==2.4.4", "datasets==4.8.4", - "Pillow==12.1.1", + "Pillow==12.2.0", "sentencepiece==0.2.1", "protobuf==7.34.1", "openai_harmony==0.0.8", @@ -82,7 +82,7 @@ test = [ # Includes optional dependencies for full test coverage "inference-endpoint[sql]", # Testing framework - "pytest==9.0.2", + "pytest==9.0.3", "pytest-asyncio==1.3.0", "pytest-cov==7.1.0", "pytest-benchmark==5.2.3",