49 changes: 49 additions & 0 deletions .claude/skills/opsml-rust-python/SKILL.md
@@ -0,0 +1,49 @@
---
name: opsml-rust-python
description: Repo-local OpsML skill for Rust core, Python bindings, PyO3, maturin, card/registry/server/client logic, Python API exports, generated stubs, and cross-language tests. Use when working in OpsML crates, `py-opsml`, PyO3-exposed types, Rust errors that cross Python, card or registry behavior, server/client contracts, or Python-visible SDK behavior. Do not use for Svelte UI work; use `opsml-ui` instead.
---

# OpsML Rust/Python

Use this skill as the source of truth for OpsML work where Rust core logic is exposed to Python through PyO3. OpsML is not a generic Rust/Python package: cards are the central abstraction, Rust owns the business logic, and Python is a thin ergonomic API over the Rust core.

Start by locating the layer you are changing:
- Rust core design, traits, ownership, performance, cloning, async, or crate-local API shape: read `references/rust-core.md`.
- Rust card, registry, storage, SQL, server, auth, events, or shared contracts: read `references/architecture.md`.
- PyO3 classes, `#[pymethods]`, GIL usage, nested `#[pyclass]` fields, or Python lifetimes: read `references/pyo3-boundaries.md`.
- Error types, `PyErr`, server envelopes, CLI errors, or Python exceptions: read `references/errors.md`.
- Python exports, `__all__`, generated stubs, maturin setup, or Python-visible SDK behavior: read `references/python-api-and-stubs.md`.
- Tests, linting, formatting, or command selection: read `references/testing-workflows.md`.
- Agent-readable APIs, structured errors, validation, lint sensors, or harness work: read `references/agent-harness.md`.

Follow these repo-specific rules:
- Keep core behavior in Rust. Python should expose a typed, ergonomic API and small helpers, not duplicate card, registry, storage, or validation logic.
- Design Rust APIs around domain-owned data, precise traits, and explicit ownership before thinking about the Python binding.
- Keep Python lifetimes out of Rust-only code. Introduce `Python<'py>`, `Bound<'py, PyAny>`, `Py<PyAny>`, and `PyErr` only where code crosses the Python boundary.
- Do not store `PyErr` in reusable Rust error enums. Convert Python errors into string-backed Rust variants, then convert Rust errors back into Python exceptions at the PyO3 boundary.
- For `#[pyclass]` fields whose type is also `#[pyclass]`, do not use `#[pyo3(get, set)]`. Implement manual `#[getter]` and `#[setter]` methods with `IntoPyObjectExt` and `extract`.
- Prefer zero-cost Rust abstractions: enums with delegated trait impls, static dispatch, concrete types, precise errors, iterators, references, ownership transfer, and `Arc` only where shared state is real.
- Treat speculative `Clone`, broad abstractions, unnecessary allocation, and Python-driven core design as design smells. Add them only for concrete call sites.
- Make new Rust core logic testable without Python whenever possible. Add Python tests for Python-visible workflows.
- Write errors and API contracts so humans and coding agents can debug them: stable names, clear fields, concise messages, and actionable hints.
- Use repository workflow tooling from `mise.toml`. Do not invent ad hoc commands when a `mise run ...` task exists.
- Inspect current dependency versions in `Cargo.toml`, `py-opsml/pyproject.toml`, and lockfiles before relying on version-specific behavior.

When making Rust/PyO3 changes, inspect in this order:
1. The Rust crate that owns the domain behavior.
2. Shared types in `opsml-types/src/contracts/` if the behavior crosses server/client/Python boundaries.
3. PyO3 module registration under `py-opsml/src/`.
4. Python package re-exports under `py-opsml/python/opsml/`.
5. Generated stubs under `py-opsml/python/opsml/_opsml.pyi` and `py-opsml/python/opsml/stubs/`.
6. Rust and Python tests that model the user journey.

Use these verification commands when relevant:
- Rust formatting: `mise run format`
- Rust linting: `mise run lints`
- Targeted Rust tests: `cargo test -p <crate> <test_name> -- --nocapture --test-threads=1`
- Full Rust aggregate when justified: `mise run test:unit`
- Rebuild Python bindings after PyO3-exposed Rust changes: `mise run py:setup`
- Python linting: `mise run py:lints`
- Python unit tests: `mise run py:test:unit`

Prefer narrow, local edits that match existing OpsML patterns. Broaden the architecture only when the current domain boundary is truly wrong for the user workflow.
55 changes: 55 additions & 0 deletions .claude/skills/opsml-rust-python/references/agent-harness.md
@@ -0,0 +1,55 @@
# Agent-Friendly OpsML Work

OpsML is being shaped for both human developers and coding agents. The key direction is harness engineering: give agents strong guides before they act and strong sensors after they act.

## Design For Agents And Humans

Code should be readable by humans first and stable enough for agents to modify safely:
- Keep names domain-specific and unambiguous.
- Keep functions small enough that invariants are visible.
- Prefer typed contracts over loose dictionaries or stringly-typed conventions.
- Surface failures with stable codes, fields, hints, and docs where the repo supports them.
- Keep side effects at clear boundaries.

## Same Envelope Principle

Harness work should converge on the same structured shape across layers:
- HTTP responses.
- PyO3 exceptions.
- CLI output.
- `card.validate()`.
- `opsml lint`.
- Integrity checks.
- Eval results.

Use fields such as:
- `code`
- `field`
- `hint` or `suggested_action`
- `doc_url`
- `retry`

Agents should not need to parse paragraphs to understand what field to fix.
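
As a minimal sketch of that shared shape, assuming a hypothetical `ErrorEnvelope` type (not an existing OpsML type) and an assumed `key=value` rendering, the same value could back an HTTP body, a Python exception message, or a CLI line:

```rust
/// Hypothetical envelope type. The field names come from the list above;
/// the Rust types and the `key=value` rendering are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ErrorEnvelope {
    pub code: &'static str,
    pub field: Option<String>,
    pub hint: Option<String>,
    pub doc_url: Option<String>,
    pub retry: bool,
}

impl ErrorEnvelope {
    pub fn new(code: &'static str) -> Self {
        Self { code, field: None, hint: None, doc_url: None, retry: false }
    }

    pub fn with_field(mut self, field: &str) -> Self {
        self.field = Some(field.to_string());
        self
    }

    pub fn with_hint(mut self, hint: &str) -> Self {
        self.hint = Some(hint.to_string());
        self
    }

    /// Render a single machine-readable line; absent fields are skipped.
    pub fn to_line(&self) -> String {
        let mut parts = vec![format!("code={}", self.code)];
        if let Some(field) = &self.field {
            parts.push(format!("field={field}"));
        }
        if let Some(hint) = &self.hint {
            parts.push(format!("hint={hint:?}"));
        }
        if let Some(url) = &self.doc_url {
            parts.push(format!("doc_url={url}"));
        }
        parts.push(format!("retry={}", self.retry));
        parts.join(" ")
    }
}
```

An agent reading `code=... field=name ...` can locate the failing field without interpreting prose; the error code used in any real output would come from the repo, not from this sketch.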

## Validation And Sensors

When adding governance behavior, think in layers:
- Edit-time or local lint sensors.
- Rust-native validation on core types.
- Registry/server chokepoints.
- Post-hoc integrity checks.
- Behavior evals for prompts and agents.

The Rust core should own validations that define durable OpsML correctness. Python should expose them ergonomically and test them as user workflows.
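
A rough sketch of Rust-native validation on a core type follows. `CardMeta`, `ValidationIssue`, the issue codes, and the three-part numeric version rule are all hypothetical stand-ins chosen for illustration; the real card types and rules live in the OpsML crates.

```rust
/// Hypothetical issue record; one entry per failing field.
#[derive(Debug, PartialEq, Eq)]
pub struct ValidationIssue {
    pub code: &'static str,
    pub field: &'static str,
    pub hint: &'static str,
}

/// Hypothetical stand-in for card metadata; not an OpsML type.
pub struct CardMeta {
    pub space: String,
    pub name: String,
    pub version: String,
}

impl CardMeta {
    /// Collect every issue instead of stopping at the first one, so a
    /// caller (human or agent) can fix all fields in a single pass.
    pub fn validate(&self) -> Vec<ValidationIssue> {
        let mut issues = Vec::new();
        if self.space.trim().is_empty() {
            issues.push(ValidationIssue {
                code: "SPACE_EMPTY",
                field: "space",
                hint: "set a non-empty space",
            });
        }
        if self.name.trim().is_empty() {
            issues.push(ValidationIssue {
                code: "NAME_EMPTY",
                field: "name",
                hint: "set a non-empty name",
            });
        }
        // Assumed rule: exactly three dot-separated, all-digit parts.
        let parts: Vec<&str> = self.version.split('.').collect();
        let well_formed = parts.len() == 3
            && parts
                .iter()
                .all(|p| !p.is_empty() && p.chars().all(|c| c.is_ascii_digit()));
        if !well_formed {
            issues.push(ValidationIssue {
                code: "VERSION_FORMAT",
                field: "version",
                hint: "use MAJOR.MINOR.PATCH with numeric parts",
            });
        }
        issues
    }
}
```

Because nothing here touches Python, the rules can be covered by ordinary Rust unit tests, and a Python binding would only need to surface the returned issues.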

## Documentation Near APIs

Public Rust and Python APIs should include useful docs when they define:
- User-visible behavior.
- Required invariants.
- Error conditions.
- Security constraints.
- Serialization formats.
- Cross-language boundary behavior.

Do not add noisy comments that restate simple code. Add concise comments when they preserve hard-won context, such as why a Python lifetime is intentionally kept at the boundary.
73 changes: 73 additions & 0 deletions .claude/skills/opsml-rust-python/references/architecture.md
@@ -0,0 +1,73 @@
# OpsML Rust/Python Architecture

OpsML is an AI lifecycle platform organized around cards: versioned, encrypted, registry-tracked records for data, models, experiments, prompts, services, agents, and skills. Python users create and operate cards, but the durable behavior belongs in Rust.

## Core Rule

Rust is the source of truth for:
- Card structure and validation.
- Registry behavior.
- Server contracts and route behavior.
- Storage, SQL, encryption, auth, events, and versioning.
- Serialization formats that persist or cross process boundaries.

Python should provide:
- Ergonomic constructors and usage patterns.
- Thin re-exports from `_opsml`.
- Small Python-only helpers where Python libraries are the natural boundary.
- User-journey tests for public Python behavior.

Do not implement durable business logic twice, once in Rust and once in Python. If Python and Rust disagree, the design is already drifting.

## Important Crates

- `opsml-cards`: PyO3 card structs and card-specific behavior.
- `opsml-registry`: Python-facing `CardRegistry`; dispatches to local or server-backed operations.
- `opsml-types`: shared contract types, enums, and request/response shapes.
- `opsml-server`: Axum routes, middleware, API handlers, server errors.
- `opsml-client`: Rust HTTP client used by Python bindings in server mode.
- `opsml-sql`: database abstraction over SQLite, PostgreSQL, and MySQL.
- `opsml-storage`: storage abstraction over local and cloud backends.
- `opsml-crypt`: artifact encryption.
- `opsml-experiment`, `opsml-genai`, `opsml-service`: domain-specific card logic.
- `py-opsml`: Python package and PyO3 extension wiring.

Read `AGENTS.md` for the full crate map before changing a cross-cutting path.

## Enum-Based Backends

OpsML favors enum dispatch for core backends:
- `StorageClientEnum` delegates `StorageClient` methods to local/S3/GCS/Azure variants.
- `SqlClientEnum` delegates SQL/card logic to SQLite/PostgreSQL/MySQL variants.

When adding a backend or domain variant, follow this pattern before reaching for `Box<dyn Trait>`.
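
The pattern can be sketched as below. The `Storage` trait, `MemoryStorage`, `NullStorage`, and `StorageEnum` are hypothetical and far smaller than the real `StorageClient` and `StorageClientEnum`; the point is only that the enum implements the trait by matching on its variants, so callers get static dispatch without a trait object.

```rust
use std::collections::HashMap;

/// Hypothetical trait; the real `StorageClient` surface is assumed to differ.
pub trait Storage {
    fn put(&mut self, key: &str, bytes: Vec<u8>);
    fn get(&self, key: &str) -> Option<&[u8]>;
    fn name(&self) -> &'static str;
}

/// In-memory backend used only for this sketch.
#[derive(Default)]
pub struct MemoryStorage {
    objects: HashMap<String, Vec<u8>>,
}

/// Backend that discards every write.
#[derive(Default)]
pub struct NullStorage;

impl Storage for MemoryStorage {
    fn put(&mut self, key: &str, bytes: Vec<u8>) {
        self.objects.insert(key.to_string(), bytes);
    }
    fn get(&self, key: &str) -> Option<&[u8]> {
        self.objects.get(key).map(|v| v.as_slice())
    }
    fn name(&self) -> &'static str {
        "memory"
    }
}

impl Storage for NullStorage {
    fn put(&mut self, _key: &str, _bytes: Vec<u8>) {}
    fn get(&self, _key: &str) -> Option<&[u8]> {
        None
    }
    fn name(&self) -> &'static str {
        "null"
    }
}

/// One variant per backend; adding a backend means adding a variant
/// and one arm in each delegating method.
pub enum StorageEnum {
    Memory(MemoryStorage),
    Null(NullStorage),
}

impl Storage for StorageEnum {
    fn put(&mut self, key: &str, bytes: Vec<u8>) {
        match self {
            Self::Memory(s) => s.put(key, bytes),
            Self::Null(s) => s.put(key, bytes),
        }
    }
    fn get(&self, key: &str) -> Option<&[u8]> {
        match self {
            Self::Memory(s) => s.get(key),
            Self::Null(s) => s.get(key),
        }
    }
    fn name(&self) -> &'static str {
        match self {
            Self::Memory(s) => s.name(),
            Self::Null(s) => s.name(),
        }
    }
}
```

The compiler flags every `match` that misses a new variant, which is one reason to prefer this over `Box<dyn Trait>` when the set of backends is closed.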

## Contracts And Routes

Shared request/response types belong in `opsml-types/src/contracts/`.

Server routes live under `/opsml/api` and follow the existing handler shape:
- `State<Arc<AppState>>` for dependencies.
- `Extension<UserPermissions>` for protected routes, even read-only routes.
- `Query(...)` or `Json(...)` for inputs.
- `Result<Json<Response>, (StatusCode, Json<OpsmlServerError>)>` or an established local equivalent.

Use `parse_qs_query::<T>(&uri)` for query strings containing `Vec<T>`.

## Registry Modes

`CardRegistry` supports:
- Local mode: direct filesystem/SQLite-backed registry operations.
- Server mode: HTTP proxy through `opsml-client`.

New behavior should preserve both modes unless the feature is explicitly server-only. A change that works through the Python package but not through the Rust registry/server paths is usually in the wrong layer.

## Artifact Encryption

Card artifacts are encrypted before storage. Do not bypass:
- `create_artifact_key()`
- `create_and_store_encrypted_file()`
- `download_artifact()` plus decryption
- `ArtifactKey` as the database source of truth

Security-sensitive changes need targeted tests around key lookup, upload/download paths, and error behavior.
84 changes: 84 additions & 0 deletions .claude/skills/opsml-rust-python/references/errors.md
@@ -0,0 +1,84 @@
# Errors Across Rust And Python

OpsML errors should be clear enough for humans and structured enough for agents. They should name what failed, where possible include the affected field/resource, and preserve enough context to fix the issue without parsing vague prose.

## No Stored `PyErr`

Do not store `PyErr` inside reusable Rust error enums. `PyErr` pulls Python runtime and lifetime concerns into pure Rust code and can cause linker errors (for example, unresolved libpython symbols) or GIL-related failures in Rust tests.

Use the canonical pattern in `crates/opsml_cards/src/skill/error.rs`:
- Rust error variants store Rust-owned data such as `String`.
- `From<PyErr> for SkillError` converts to a string-backed variant.
- `From<SkillError> for PyErr` exists only behind the Python feature and maps to a Python exception at the boundary.

Preferred shape:

```rust
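#[cfg(feature = "python")]
use pyo3::PyErr;

#[derive(thiserror::Error, Debug)]
pub enum DomainError {
    // String-backed variant: owns only Rust data.
    #[error("{0}")]
    Error(String),

    #[error(transparent)]
    Io(#[from] std::io::Error),
}

// Python -> Rust: flatten the Python error into a string at the boundary.
#[cfg(feature = "python")]
impl From<PyErr> for DomainError {
    fn from(err: PyErr) -> Self {
        DomainError::Error(err.to_string())
    }
}

// Rust -> Python: raise a Python exception only at the PyO3 boundary.
#[cfg(feature = "python")]
impl From<DomainError> for PyErr {
    fn from(err: DomainError) -> PyErr {
        pyo3::exceptions::PyRuntimeError::new_err(err.to_string())
    }
}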
```

Avoid this in core errors:

```rust
#[error(transparent)]
Python(#[from] PyErr)
```

Only use direct Python error storage in code that is permanently Python-only and cannot be reached by Rust tests or Rust core logic. That should be rare in OpsML.

## Transitive Error Chains

The rule applies transitively. If `CardError` wraps `ModelInterfaceError`, and `ModelInterfaceError` stores `PyErr`, then `CardError` is contaminated too.

When adding `#[from]` variants, inspect wrapped errors for PyO3 types. Prefer converting upstream errors to string-backed variants at the boundary.
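
A std-only sketch of breaking the chain is shown here. `InterfaceError` is a hypothetical stand-in for an upstream error that would hold a PyO3 type in real code, and `load_interface` / `load_card` are invented for illustration. The wrapping error keeps only the rendered message, so it stays free of the upstream type.

```rust
use std::fmt;

/// Stand-in for an upstream error that, in real code, might hold a PyO3 type.
#[derive(Debug)]
pub struct InterfaceError {
    pub message: String,
}

impl fmt::Display for InterfaceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

/// Wrapping error: stores a `String`, not the upstream error itself,
/// so nothing from the upstream type leaks into this enum.
#[derive(Debug, PartialEq, Eq)]
pub enum CardError {
    Interface(String),
}

impl fmt::Display for CardError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CardError::Interface(msg) => write!(f, "interface error: {msg}"),
        }
    }
}

impl std::error::Error for CardError {}

impl From<InterfaceError> for CardError {
    fn from(err: InterfaceError) -> Self {
        CardError::Interface(err.to_string())
    }
}

/// Hypothetical upstream call that fails on an empty path.
fn load_interface(path: &str) -> Result<usize, InterfaceError> {
    if path.is_empty() {
        return Err(InterfaceError { message: "empty interface path".to_string() });
    }
    Ok(path.len())
}

/// `?` applies the `From` impl, converting to the string-backed variant.
pub fn load_card(path: &str) -> Result<usize, CardError> {
    let size = load_interface(path)?;
    Ok(size)
}
```

The trade-off is that the original error value is lost after conversion, so the message needs to carry enough context on its own.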

## Server Error Envelope

Server handlers should use the existing `OpsmlServerError` helpers and structured fields where available. For new agent-facing or validation work, prefer stable data:
- `code`
- `field`
- `hint` or `suggested_action`
- `doc_url`
- `retry`

Prefer one stable error shape across HTTP, PyO3, CLI, lint output, validation, and eval. A Python caller, CLI user, UI route, and coding agent should be able to recognize the same failure without parsing unrelated prose.

## Human And Agent Debuggability

Error messages should:
- Name the operation that failed.
- Include the resource identifier when safe, such as card UID, space/name/version, file path, or field.
- Avoid generic messages like "invalid input" when the failing field is known.
- Avoid logging secrets, tokens, encryption keys, or provider credentials.
- Keep wording concise and stable enough for tests and agents.

## Mapping To Python Exceptions

Choose Python exception types deliberately:
- Invalid user input: `PyValueError`.
- Missing key or field: `PyKeyError` or `PyValueError`, depending on existing local style.
- Filesystem/IO: `PyOSError`.
- Runtime integration failure: `PyRuntimeError`.

Follow nearby mappings in the same crate before introducing a new exception style.
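
One way to keep the mapping deliberate is to express it as a single exhaustive `match`. The sketch below is std-only, so it returns the Python exception class name as a string where real boundary code would call the corresponding PyO3 type's `new_err`. The `DomainError` variants and the `python_exception` helper are hypothetical.

```rust
use std::fmt;

/// Hypothetical domain error with one variant per category listed above.
#[derive(Debug)]
pub enum DomainError {
    InvalidInput { field: String, reason: String },
    MissingKey(String),
    Io(std::io::Error),
    Integration(String),
}

impl fmt::Display for DomainError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidInput { field, reason } => {
                write!(f, "invalid input for `{field}`: {reason}")
            }
            Self::MissingKey(key) => write!(f, "missing key `{key}`"),
            Self::Io(err) => write!(f, "io error: {err}"),
            Self::Integration(msg) => write!(f, "integration failure: {msg}"),
        }
    }
}

impl DomainError {
    /// Name of the Python exception class each variant would map to.
    /// Real boundary code would construct the exception via PyO3 instead.
    pub fn python_exception(&self) -> &'static str {
        match self {
            Self::InvalidInput { .. } => "ValueError",
            Self::MissingKey(_) => "KeyError",
            Self::Io(_) => "OSError",
            Self::Integration(_) => "RuntimeError",
        }
    }
}
```

Keeping the mapping in one place means a new variant cannot be added without choosing its exception type, because the `match` stops compiling until it is handled.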
89 changes: 89 additions & 0 deletions .claude/skills/opsml-rust-python/references/pyo3-boundaries.md
@@ -0,0 +1,89 @@
# PyO3 Boundary Rules

PyO3 code is a boundary layer. Keep Python lifetimes, `PyErr`, and object extraction at the edge so Rust core code remains testable with normal Rust tests.

## Boundary Types

Use these types only when the code is actually crossing into Python:
- `Python<'py>`
- `Bound<'py, PyAny>`
- `Py<PyAny>`
- `PyErr`
- `#[pyclass]`, `#[pymethods]`, `#[pyfunction]`

Pure Rust functions should generally accept and return Rust types, not Python-bound objects.

## Constructors

For Python constructors, keep extraction and conversion near `#[new]`, then call a Rust-native constructor such as `new_rs`, `from_config`, or a domain-specific builder.

Preferred shape:

```rust
#[cfg(feature = "python")]
use pyo3::prelude::*;

#[cfg(feature = "python")]
#[pymethods]
impl SkillCard {
    #[new]
    pub fn new(skill: &Bound<'_, PyAny>, space: Option<&str>) -> Result<Self, SkillError> {
        // Extract at the boundary, then hand off to the Rust-native constructor.
        let skill = skill.extract::<AgentSkillStandard>()?;
        Self::new_rs(skill, space, None, None, None, None, None, None)
    }
}
```

Keep the Rust-native constructor usable without Python.
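
To illustrate what "usable without Python" buys, here is a std-only sketch of a Rust-native constructor. `Skill`, `Card`, `CardError`, the two-argument `new_rs`, and the `"default"` fallback space are all hypothetical simplifications, not the real `SkillCard` API.

```rust
/// Hypothetical, simplified stand-ins for the real card types.
#[derive(Debug, PartialEq, Eq)]
pub struct Skill {
    pub name: String,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Card {
    pub skill: Skill,
    pub space: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CardError {
    EmptySkillName,
}

impl Card {
    /// Rust-native constructor: takes and returns only Rust types, so it
    /// can be exercised by plain `cargo test` with no interpreter present.
    /// The "default" fallback for `space` is an assumption of this sketch.
    pub fn new_rs(skill: Skill, space: Option<&str>) -> Result<Self, CardError> {
        if skill.name.trim().is_empty() {
            return Err(CardError::EmptySkillName);
        }
        Ok(Self {
            skill,
            space: space.unwrap_or("default").to_string(),
        })
    }
}
```

With this split, a `#[new]` wrapper only has to extract its arguments and forward them, and all validation is tested once on the Rust side.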

## Nested `#[pyclass]` Fields

If a `#[pyclass]` struct has a field whose type is itself a `#[pyclass]`, do not put `#[pyo3(get, set)]` on that field. PyO3-generated accessors can leak Python lifetimes into pure Rust call sites and tests.

Use the canonical pattern in `crates/opsml_cards/src/skill/card.rs`:
- `skill` field getter at `SkillCard::skill`.
- `set_skill` setter using `extract`.
- `dependencies` getter using `IntoPyObjectExt`.
- `set_dependencies` setter using `extract::<Vec<SkillDependency>>()`.

Preferred shape:

```rust
#[getter]
pub fn skill<'py>(&self, py: Python<'py>) -> Result<Bound<'py, PyAny>, SkillError> {
    Ok(self.skill.clone().into_bound_py_any(py)?)
}

#[setter]
pub fn set_skill(&mut self, skill: &Bound<'_, PyAny>) -> Result<(), SkillError> {
    self.skill = skill.extract::<AgentSkillStandard>()?;
    Ok(())
}
```

## Python Objects In Cards

Some cards hold Python-owned objects, such as model or data interfaces. Keep GIL acquisition scoped tightly:
- Acquire the GIL only where calling Python methods or extracting Python objects.
- Convert Python-side data into Rust metadata before serialization.
- Do not attempt to serialize `Py<PyAny>` directly.
- Reconstruct Python-facing objects only at load/deserialization boundaries where the Python API needs them.

## Feature Gates

Respect existing `python` and `server` feature gates. If a Rust unit test fails with Python linking or libpython errors, inspect for leaked PyO3 types in core code or transitive error chains.

Common causes:
- `PyErr` stored in a reusable error enum.
- `#[pyo3(get, set)]` on nested `#[pyclass]` fields.
- Python-only imports not guarded with `#[cfg(feature = "python")]`.

## Module Registration

New Python-visible Rust functions/classes must be wired through `py-opsml/src/lib.rs` and the appropriate submodule registration function, such as:
- `card::add_card_module(m)?`
- `data::add_data_module(m)?`
- `model::add_model_module(m)?`
- `experiment::add_experiment_module(m)?`
- `agent::add_agent_module(m)?`
- `service::add_service_module(m)?`
- `types::add_types_module(m)?`

Do not stop after adding `#[pyclass]`; registration, Python exports, stubs, and tests are part of the public API surface.