From eba78c184ba23fda5030b6fd057a86faf381e546 Mon Sep 17 00:00:00 2001
From: Brian Krabach
Date: Mon, 16 Mar 2026 11:55:57 -0700
Subject: [PATCH 1/6] fix(critical): add __dict__ support to RustCoordinator for dynamic attribute assignment
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PyO3 #[pyclass] types do not support arbitrary attribute assignment by
default: they lack __dict__. The old Python ModuleCoordinator was a Python
*subclass* of RustCoordinator, giving it __dict__ automatically. When we
removed _rust_wrappers.py in v1.2.3 and aliased ModuleCoordinator directly
to RustCoordinator, we broke any code that dynamically sets attributes on
the coordinator at runtime (e.g. coordinator.session_state,
coordinator._tool_dispatch_context, coordinator._arbitrary_attr, etc.).

The session_state crash in v1.2.3 and the _tool_dispatch_context crash in
v1.2.4 are both symptoms of this same root cause.

Fix: add the 'dict' flag to #[pyclass], which enables Python __dict__ on
RustCoordinator. Dynamic attribute assignment now works for all attributes
(past, present, and future) without needing to enumerate them as explicit
Rust fields.

The explicit session_state field added in v1.2.4 is retained for backward
compatibility; its getter/setter still works correctly alongside __dict__
support.
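A pure-Python analogy may help show the failure mode and the fix. It is a sketch under stated assumptions, not PyO3 code: a class whose `__slots__` omits `"__dict__"` stands in for a `#[pyclass]` without the `dict` flag, and the class names and the `can_set_dynamic_attr` helper are hypothetical.

```python
"""Pure-Python analogy for PyO3's #[pyclass(dict)] flag (illustrative only)."""


class NoDictCoordinator:
    # Stands in for a #[pyclass] without `dict`: no per-instance __dict__,
    # so only the declared slot ("inner") can be assigned.
    __slots__ = ("inner",)


class SubclassCoordinator(NoDictCoordinator):
    # Stands in for the old Python ModuleCoordinator subclass: a subclass
    # that does not declare __slots__ automatically gets a __dict__.
    pass


class DictCoordinator:
    # Roughly analogous to #[pyclass(dict)]: listing "__dict__" in
    # __slots__ re-enables arbitrary attribute assignment.
    __slots__ = ("inner", "__dict__")


def can_set_dynamic_attr(obj: object, name: str = "_tool_dispatch_context") -> bool:
    """Return True if `obj` accepts assignment of an undeclared attribute."""
    try:
        setattr(obj, name, {"example": True})
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    for cls in (NoDictCoordinator, SubclassCoordinator, DictCoordinator):
        print(f"{cls.__name__}: {can_set_dynamic_attr(cls())}")
```

Running it prints `NoDictCoordinator: False`, `SubclassCoordinator: True` and `DictCoordinator: True`, one per line: the slotted base rejects the assignment with `AttributeError`, while both the subclass (the old `ModuleCoordinator` pattern) and the explicit `"__dict__"` slot accept it.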
Verified:
  cargo check -p amplifier-core-py ✓
  cargo clippy -p amplifier-core-py ✓
  maturin develop ✓
  pytest tests/ (549 passed, 1 skipped) ✓
  dynamic attr test: _tool_dispatch_context, session_state, _arbitrary_attr ✓
---
 bindings/python/src/coordinator/mod.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bindings/python/src/coordinator/mod.rs b/bindings/python/src/coordinator/mod.rs
index a690d96..e778431 100644
--- a/bindings/python/src/coordinator/mod.rs
+++ b/bindings/python/src/coordinator/mod.rs
@@ -31,7 +31,7 @@ mod mount_points;
 /// The `mount_points` dict is directly accessible and mutable from Python,
 /// matching `ModuleCoordinator.mount_points` behavior that the ecosystem
 /// (pytest_plugin, testing.py) depends on.
-#[pyclass(name = "RustCoordinator", subclass)]
+#[pyclass(name = "RustCoordinator", subclass, dict)]
 pub(crate) struct PyCoordinator {
     /// Rust kernel coordinator (for reset_turn, injection tracking, config).
     pub(crate) inner: Arc,

From aaba2b108328d5c877f8efaa81a5c55e48ab7f8f Mon Sep 17 00:00:00 2001
From: Brian Krabach
Date: Mon, 16 Mar 2026 11:56:59 -0700
Subject: [PATCH 2/6] feat: add E2E smoke test script for pre-release validation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Creates scripts/e2e-smoke-test.sh, a containerized end-to-end smoke test
that validates a locally-built amplifier-core wheel in a clean Docker
environment before any release is tagged.

The test:
1. Builds a wheel from local source via maturin build --release
2. Creates a fresh python:3.12-slim Docker container (isolated, no host
   pollution)
3. Installs amplifier from GitHub (CLI + foundation from git, core from PyPI)
4. Overrides amplifier-core with the local wheel via uv pip --force-reinstall
5. Runs the real smoke prompt with a real LLM provider (Anthropic):
   'Ask recipe author to run one of its example recipes'
6. Detects failures: Python exceptions, AttributeErrors, tool failures,
   timeouts
7. Reports PASS/FAIL with clear output
8. Cleans up the container automatically (trap EXIT)

This test exercises the full integration path:
  CLI startup → session init → LLM call → delegate tool → sub-session
  spawn → recipe tool → bundle file resolution → recipe execution →
  result return

All three recent incidents (session_state crash v1.2.3,
_tool_dispatch_context crash v1.2.4, version mismatch PyPI publish) would
have been caught by this test before tagging.

Usage:
  ./scripts/e2e-smoke-test.sh               # Build + test (~5 min)
  ./scripts/e2e-smoke-test.sh --skip-build  # Reuse existing dist/ wheel

Reads ANTHROPIC_API_KEY from environment or ~/.amplifier/keys.env.
---
 scripts/e2e-smoke-test.sh | 234 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 234 insertions(+)
 create mode 100755 scripts/e2e-smoke-test.sh

diff --git a/scripts/e2e-smoke-test.sh b/scripts/e2e-smoke-test.sh
new file mode 100755
index 0000000..d1dedf0
--- /dev/null
+++ b/scripts/e2e-smoke-test.sh
@@ -0,0 +1,234 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# E2E Smoke Test for amplifier-core
+# Runs a real amplifier session in an isolated Docker container
+# with the locally-built amplifier-core wheel, using real LLM providers.
+#
+# Prerequisites:
+#   - Docker installed and running
+#   - ANTHROPIC_API_KEY set in environment (or in ~/.amplifier/keys.env)
+#   - maturin installed (pip install maturin)
+#
+# Usage:
+#   ./scripts/e2e-smoke-test.sh              # Build wheel + run test
+#   ./scripts/e2e-smoke-test.sh --skip-build # Use existing wheel in dist/
+#
+# Environment variables:
+#   SMOKE_PROMPT     Override the default test prompt
+#   SMOKE_TIMEOUT    Override the timeout in seconds (default: 180)
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
+WHEEL_DIR="$REPO_DIR/dist"
+CONTAINER_NAME="amplifier-e2e-smoke-$$"
+SKIP_BUILD=false
+SMOKE_PROMPT="${SMOKE_PROMPT:-Ask recipe author to run one of its example recipes}"
+TIMEOUT_SECONDS="${SMOKE_TIMEOUT:-180}"
+
+# Parse args
+for arg in "$@"; do
+  case $arg in
+    --skip-build) SKIP_BUILD=true ;;
+    --help)
+      echo "Usage: $0 [--skip-build]"
+      echo ""
+      echo "Options:"
+      echo "  --skip-build   Use existing wheel in dist/ instead of rebuilding"
+      echo ""
+      echo "Environment variables:"
+      echo "  ANTHROPIC_API_KEY   Required (or set in ~/.amplifier/keys.env)"
+      echo "  SMOKE_PROMPT        Test prompt (default: 'Ask recipe author to run one of its example recipes')"
+      echo "  SMOKE_TIMEOUT       Timeout in seconds (default: 180)"
+      exit 0
+      ;;
+  esac
+done
+
+# Colors
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+CYAN='\033[0;36m'
+NC='\033[0m'
+
+log()  { echo -e "${YELLOW}[smoke-test]${NC} $*"; }
+info() { echo -e "${CYAN}[smoke-test]${NC} $*"; }
+pass() { echo -e "${GREEN}[PASS]${NC} $*"; }
+fail() { echo -e "${RED}[FAIL]${NC} $*"; exit 1; }
+
+cleanup() {
+  log "Cleaning up container $CONTAINER_NAME..."
+  docker rm -f "$CONTAINER_NAME" 2>/dev/null || true
+}
+trap cleanup EXIT
+
+# ---------------------------------------------------------------------------
+# Step 0: Resolve API keys
+# ---------------------------------------------------------------------------
+
+# If ANTHROPIC_API_KEY is not set, try to load from ~/.amplifier/keys.env
+if [[ -z "${ANTHROPIC_API_KEY:-}" ]]; then
+  KEYS_ENV="$HOME/.amplifier/keys.env"
+  if [[ -f "$KEYS_ENV" ]]; then
+    log "Loading API keys from $KEYS_ENV..."
+    # shellcheck disable=SC1090
+    set -a
+    source "$KEYS_ENV"
+    set +a
+  fi
+fi
+
+[[ -z "${ANTHROPIC_API_KEY:-}" ]] && fail "ANTHROPIC_API_KEY not set. Set it in your environment or in ~/.amplifier/keys.env"
+command -v docker &>/dev/null || fail "Docker not installed or not in PATH"
+
+# ---------------------------------------------------------------------------
+# Step 1: Build wheel
+# ---------------------------------------------------------------------------
+
+if [[ "$SKIP_BUILD" == "false" ]]; then
+  log "Building wheel from local source (this takes ~2 minutes)..."
+  mkdir -p "$WHEEL_DIR"
+  rm -f "$WHEEL_DIR"/amplifier_core-*.whl
+  (cd "$REPO_DIR" && maturin build --release --out "$WHEEL_DIR") \
+    || fail "Wheel build failed — check maturin output above"
+  log "Wheel build complete."
+else
+  log "Skipping build (--skip-build). Using existing wheel in $WHEEL_DIR/"
+fi
+
+WHEEL=$(ls "$WHEEL_DIR"/amplifier_core-*.whl 2>/dev/null | head -1)
+[[ -z "$WHEEL" ]] && fail "No wheel found in $WHEEL_DIR/ — run without --skip-build first"
+log "Using wheel: $(basename "$WHEEL")"
+
+# ---------------------------------------------------------------------------
+# Step 2: Create container
+# ---------------------------------------------------------------------------
+
+log "Creating isolated Docker container..."
+docker run -d --name "$CONTAINER_NAME" \
+  -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
+  -e OPENAI_API_KEY="${OPENAI_API_KEY:-}" \
+  -e AZURE_OPENAI_API_KEY="${AZURE_OPENAI_API_KEY:-}" \
+  python:3.12-slim \
+  sleep 3600 \
+  || fail "Container creation failed"
+
+info "Container: $CONTAINER_NAME"
+
+# ---------------------------------------------------------------------------
+# Step 3: Bootstrap container (git + uv)
+# ---------------------------------------------------------------------------
+
+log "Installing dependencies in container (git, uv)..."
+docker exec "$CONTAINER_NAME" bash -c "
+  apt-get update -qq && apt-get install -y -qq git >/dev/null 2>&1
+  pip install -q uv
+  echo 'Bootstrap OK'
+" || fail "Dependency install failed"
+
+# ---------------------------------------------------------------------------
+# Step 4: Install amplifier from git
+# ---------------------------------------------------------------------------
+
+log "Installing amplifier from GitHub (amplifier-core from PyPI, CLI+foundation from git)..."
+docker exec "$CONTAINER_NAME" bash -c "
+  export PATH=/root/.local/bin:\$PATH
+  uv tool install git+https://github.com/microsoft/amplifier@main 2>&1 | tail -5
+  echo 'Install OK'
+" || fail "Amplifier install failed"
+
+# ---------------------------------------------------------------------------
+# Step 5: Override amplifier-core with local wheel
+# ---------------------------------------------------------------------------
+
+log "Injecting local wheel into container..."
+docker cp "$WHEEL" "$CONTAINER_NAME:/tmp/local-core.whl" \
+  || fail "Wheel copy to container failed"
+
+log "Overriding amplifier-core with local wheel..."
+docker exec "$CONTAINER_NAME" bash -c "
+  uv pip install \
+    --python /root/.local/share/uv/tools/amplifier/bin/python3 \
+    --force-reinstall --no-deps \
+    /tmp/local-core.whl 2>&1 | tail -3
+  echo 'Override OK'
+" || fail "Wheel override failed"
+
+# ---------------------------------------------------------------------------
+# Step 6: Verify installed version
+# ---------------------------------------------------------------------------
+
+log "Verifying installed version..."
+INSTALLED_VERSION=$(docker exec "$CONTAINER_NAME" bash -c "
+  export PATH=/root/.local/bin:\$PATH
+  amplifier --version 2>&1
+")
+info "Installed: $INSTALLED_VERSION"
+
+# ---------------------------------------------------------------------------
+# Step 7: Run the smoke test
+# ---------------------------------------------------------------------------
+
+echo ""
+log "============================================================"
+log "  SMOKE TEST START"
+log "  Prompt: '$SMOKE_PROMPT'"
+log "  Timeout: ${TIMEOUT_SECONDS}s"
+log "============================================================"
+echo ""
+
+# Run the smoke test; capture output even if timeout exits non-zero
+SMOKE_EXIT_CODE=0
+SMOKE_OUTPUT=$(docker exec "$CONTAINER_NAME" bash -c "
+  export PATH=/root/.local/bin:\$PATH
+  timeout $TIMEOUT_SECONDS amplifier run '$SMOKE_PROMPT' 2>&1
+" 2>&1) || SMOKE_EXIT_CODE=$?
+
+# ---------------------------------------------------------------------------
+# Step 8: Evaluate results
+# ---------------------------------------------------------------------------
+
+echo ""
+echo "============================================================"
+echo "  SMOKE TEST OUTPUT (last 40 lines):"
+echo "============================================================"
+echo "$SMOKE_OUTPUT" | tail -40
+echo "============================================================"
+echo ""
+
+# Check for Python exceptions / attribute errors (hard failures)
+ERROR_PATTERNS="Traceback|TypeError|AttributeError|no attribute|object has no attribute|ImportError|ModuleNotFoundError|RuntimeError|KeyError|ValueError"
+if echo "$SMOKE_OUTPUT" | grep -qE "$ERROR_PATTERNS"; then
+  echo "============================================================"
+  echo "  ERRORS DETECTED:"
+  echo "============================================================"
+  echo "$SMOKE_OUTPUT" | grep -E "$ERROR_PATTERNS" | head -20
+  echo "============================================================"
+  fail "Smoke test FAILED — Python exceptions detected in output"
+fi
+
+# Check for tool failures (the exact pattern that caught our bugs)
+TOOL_FAILURE_COUNT=$(echo "$SMOKE_OUTPUT" | grep -cE "Tool .+ failed:" || true)
+if [[ "$TOOL_FAILURE_COUNT" -gt 0 ]]; then
+  echo "============================================================"
+  echo "  TOOL FAILURES DETECTED ($TOOL_FAILURE_COUNT):"
+  echo "============================================================"
+  echo "$SMOKE_OUTPUT" | grep -E "Tool .+ failed:" | head -10
+  echo "============================================================"
+  fail "Smoke test FAILED — $TOOL_FAILURE_COUNT tool failure(s) detected"
+fi
+
+# Check for timeout (exit code 124 from the 'timeout' command)
+if [[ "$SMOKE_EXIT_CODE" -eq 124 ]]; then
+  fail "Smoke test TIMED OUT after ${TIMEOUT_SECONDS}s — increase SMOKE_TIMEOUT or investigate"
+fi
+
+# If we got here: no exceptions, no tool failures, no timeout
+echo ""
+pass "========================================================"
+pass "  SMOKE TEST PASSED"
+pass "  $INSTALLED_VERSION"
+pass "  No crashes, no tool failures, no timeout"
+pass "========================================================"
+echo ""

From 51bd4793149275ba9c70721c684cb0b395f363c4 Mon Sep 17 00:00:00 2001
From: Brian Krabach
Date: Mon, 16 Mar 2026 12:05:49 -0700
Subject: [PATCH 3/6] fix: preserve wheel filename in E2E smoke test script
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The wheel was being copied into the container as 'local-core.whl', which
is not a valid wheel filename (missing Python tags). uv pip install
rejected it with:

  error: The wheel filename "local-core.whl" is invalid: Must have a Python tag

The script continued anyway: inside the bash -c block, uv's exit status
was lost in the `| tail -3` pipe, and the block's own status came from the
trailing `echo 'Override OK'`, which always succeeds. The override
therefore failed silently and the smoke test ran against the
previously-installed version (v1.2.2 from PyPI), masking bugs.

Fixes:
- Preserve the original wheel filename when copying into the container
  (e.g. amplifier_core-1.2.4-cp312-cp312-manylinux_2_34_aarch64.whl)
- Capture the uv pip install output and check it explicitly for a success
  confirmation (grep for 'installed' or 'already satisfied')
- After the override, parse the version from the wheel filename and
  verify that amplifier --version reports the expected 'core X.Y.Z'
- Fail loudly at each check so the smoke test never runs against the
  wrong version
---
 scripts/e2e-smoke-test.sh | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/scripts/e2e-smoke-test.sh b/scripts/e2e-smoke-test.sh
index d1dedf0..a0ebe7e 100755
--- a/scripts/e2e-smoke-test.sh
+++ b/scripts/e2e-smoke-test.sh
@@ -143,17 +143,25 @@ docker exec "$CONTAINER_NAME" bash -c "
 # ---------------------------------------------------------------------------
 
 log "Injecting local wheel into container..."
-docker cp "$WHEEL" "$CONTAINER_NAME:/tmp/local-core.whl" \
+WHEEL_BASENAME=$(basename "$WHEEL")
+docker cp "$WHEEL" "$CONTAINER_NAME:/tmp/$WHEEL_BASENAME" \
   || fail "Wheel copy to container failed"
+log "Wheel copied as: $WHEEL_BASENAME"
 
 log "Overriding amplifier-core with local wheel..."
-docker exec "$CONTAINER_NAME" bash -c "
+OVERRIDE_OUTPUT=$(docker exec "$CONTAINER_NAME" bash -c "
   uv pip install \
     --python /root/.local/share/uv/tools/amplifier/bin/python3 \
     --force-reinstall --no-deps \
-    /tmp/local-core.whl 2>&1 | tail -3
-  echo 'Override OK'
-" || fail "Wheel override failed"
+    '/tmp/$WHEEL_BASENAME' 2>&1
+") || fail "Wheel override failed — uv pip install returned non-zero"
+
+# Confirm uv actually installed the package (not silently skipped / errored)
+if ! echo "$OVERRIDE_OUTPUT" | grep -qiE "installed|already satisfied"; then
+  echo "$OVERRIDE_OUTPUT"
+  fail "Wheel override failed — uv did not report a successful install. See output above."
+fi
+log "Override output: $(echo "$OVERRIDE_OUTPUT" | tail -3)"
 
 # ---------------------------------------------------------------------------
 # Step 6: Verify installed version
@@ -166,6 +174,15 @@ INSTALLED_VERSION=$(docker exec "$CONTAINER_NAME" bash -c "
 ")
 info "Installed: $INSTALLED_VERSION"
 
+# Confirm the core version actually changed to our local build
+LOCAL_CORE_VER=$(echo "$WHEEL_BASENAME" | grep -oP '(?<=amplifier_core-)[^-]+' || true)
+if [[ -n "$LOCAL_CORE_VER" ]]; then
+  if ! echo "$INSTALLED_VERSION" | grep -q "core $LOCAL_CORE_VER"; then
+    fail "Wheel override did not take effect — expected 'core $LOCAL_CORE_VER' but got: $INSTALLED_VERSION"
+  fi
+  log "Core version confirmed: $LOCAL_CORE_VER ✓"
+fi
+
 # ---------------------------------------------------------------------------
 # Step 7: Run the smoke test
 # ---------------------------------------------------------------------------

From 2e9de36feb7459af06ec2e74a2e26a74ac4b14dd Mon Sep 17 00:00:00 2001
From: Brian Krabach
Date: Mon, 16 Mar 2026 14:41:27 -0700
Subject: [PATCH 4/6] style: apply cargo fmt to fix CI

---
 Cargo.lock                             | 4 ++--
 bindings/python/src/coordinator/mod.rs | 2 --
 uv.lock                                | 2 +-
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/Cargo.lock b/Cargo.lock
index 37617ea..8a4216b 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -40,7 +40,7 @@ checksum = "e9d4ee0d472d1cd2e28c97dfa124b3d8d992e10eb0a035f33f5d12e3a177ba3b"
 
 [[package]]
 name = "amplifier-core"
-version = "1.2.3"
+version = "1.2.4"
 dependencies = [
  "chrono",
  "log",
@@ -76,7 +76,7 @@ dependencies = [
 
 [[package]]
 name = "amplifier-core-py"
-version = "1.2.3"
+version = "1.2.4"
 dependencies = [
  "amplifier-core",
  "log",
diff --git a/bindings/python/src/coordinator/mod.rs b/bindings/python/src/coordinator/mod.rs
index e778431..1cf65ad 100644
--- a/bindings/python/src/coordinator/mod.rs
+++ b/bindings/python/src/coordinator/mod.rs
@@ -478,5 +478,3 @@ impl PyCoordinator {
         Ok(dict)
     }
 }
-
-
diff --git a/uv.lock b/uv.lock
index 3ff7e63..800d2d0 100644
--- a/uv.lock
+++ b/uv.lock
@@ -4,7 +4,7 @@ requires-python = ">=3.11"
 
 [[package]]
 name = "amplifier-core"
-version = "1.2.3"
+version = "1.2.4"
 source = { editable = "." }
 dependencies = [
     { name = "click" },

From 4607534015e231a3e726d5057f8962b9a6692a5b Mon Sep 17 00:00:00 2001
From: Brian Krabach
Date: Mon, 16 Mar 2026 14:54:10 -0700
Subject: [PATCH 5/6] test: fix version assertion to use semver check instead of hardcoded 1.0.0

---
 bindings/python/tests/test_schema_sync.py | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/bindings/python/tests/test_schema_sync.py b/bindings/python/tests/test_schema_sync.py
index 62ecf22..4763c49 100644
--- a/bindings/python/tests/test_schema_sync.py
+++ b/bindings/python/tests/test_schema_sync.py
@@ -11,7 +11,14 @@ def test_rust_engine_has_version():
     """Verify _engine exposes __version__."""
     from amplifier_core._engine import __version__
 
-    assert __version__ == "1.0.0"
+    # Version comes from env!("CARGO_PKG_VERSION") — must match Cargo.toml
+    assert __version__ is not None
+    assert isinstance(__version__, str)
+    assert len(__version__) > 0
+    # Verify it looks like a semver version (X.Y.Z)
+    parts = __version__.split(".")
+    assert len(parts) == 3, f"Expected semver X.Y.Z, got {__version__}"
+    assert all(p.isdigit() for p in parts), f"Expected numeric semver, got {__version__}"
 
 
 def test_rust_engine_has_types():

From fac87d20b6ef533ea0e7ed1e637e35f4fc8f731d Mon Sep 17 00:00:00 2001
From: Brian Krabach
Date: Mon, 16 Mar 2026 15:47:30 -0700
Subject: [PATCH 6/6] docs: add E2E smoke test gate and incident playbook to release process

---
 agents/core-expert.md               |  9 ++++-
 context/release-mandate.md          | 58 ++++++++++++++++++++++++++++-
 docs/CORE_DEVELOPMENT_PRINCIPLES.md | 13 ++++++-
 3 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/agents/core-expert.md b/agents/core-expert.md
index 1ee0af4..77da8a1 100644
--- a/agents/core-expert.md
+++ b/agents/core-expert.md
@@ -56,8 +56,14 @@ Provide:
 - Note: this applies specifically to amplifier-core (PyPI package), not all ecosystem repos
 - The three files that must be bumped in sync: `pyproject.toml`, `crates/amplifier-core/Cargo.toml`, `bindings/python/Cargo.toml`
 - The atomic script: `python scripts/bump_version.py X.Y.Z`
+- **The E2E smoke test**: `./scripts/e2e-smoke-test.sh` must pass before tagging. It validates
+  the built wheel in an isolated Docker container with a real LLM session. This gate was added
+  after three incidents in v1.2.3/v1.2.4 where the wheel was broken but all unit/integration
+  tests passed.
 - The tag push: `git tag vX.Y.Z && git push origin main --tags`
 - Why: `v*` tag triggers `rust-core-wheels.yml` → PyPI publish; this is the only path to production
+- **Incident recovery**: If a broken version reaches PyPI, see the Incident Playbook in
+  `context/release-mandate.md` — yank on PyPI, fix forward, never reuse a version number.
 
 ---
 
@@ -242,7 +248,8 @@ See @core:docs/contracts/[NAME]_CONTRACT.md for complete specification.
 - **Two-implementation rule** before promoting anything
 - **Backward compatibility** is sacred
 - **Reference contract docs** - don't copy their content
-- **Release gate is mandatory** — every merge to amplifier-core main requires a version bump, `v*` tag, and push before the next PR starts. See CORE_DEVELOPMENT_PRINCIPLES.md §10.
+- **Release gate is mandatory** — every merge to amplifier-core main requires a version bump, E2E smoke test, `v*` tag, and push before the next PR starts. See CORE_DEVELOPMENT_PRINCIPLES.md §10.
+- **E2E before tagging** — `./scripts/e2e-smoke-test.sh` must pass before any `v*` tag. Unit tests are necessary but not sufficient; the v1.2.3/v1.2.4 incidents proved that 549 passing tests don't guarantee the wheel works.
 
 **Your Mantra**: "The center stays still so the edges can move fast. I help ensure the kernel remains tiny, stable, and boring."
diff --git a/context/release-mandate.md b/context/release-mandate.md
index c213ed2..edbbe73 100644
--- a/context/release-mandate.md
+++ b/context/release-mandate.md
@@ -40,12 +40,66 @@ This rule exists **specifically** because `amplifier-core` publishes to PyPI and
    - `pyproject.toml` (line 3)
    - `crates/amplifier-core/Cargo.toml` (line 3)
    - `bindings/python/Cargo.toml` (line 3)
-3. Commit, tag, and push:
+3. Run the E2E smoke test (mandatory since v1.2.5):
+   ```bash
+   ./scripts/e2e-smoke-test.sh
+   ```
+   This builds a wheel from local source, installs it in an isolated Docker container alongside
+   the real `amplifier` CLI, and runs a real LLM-powered session exercising tool dispatch,
+   agent delegation, and recipe execution. It catches:
+   - Import/attribute errors in the Rust↔Python bridge
+   - Session startup crashes
+   - Tool dispatch failures
+   - Any Python exception during a real agent loop
+
+   **Requirements:** Docker running, `ANTHROPIC_API_KEY` set (or in `~/.amplifier/keys.env`).
+   Takes ~5 minutes. **Do not tag until this passes.**
+
+4. Commit, tag, and push:
    ```bash
    git commit -am "chore: bump version to X.Y.Z"
    git tag vX.Y.Z
    git push origin main --tags
    ```
-4. The `v*` tag triggers `rust-core-wheels.yml` → builds wheels for all platforms → publishes to PyPI.
+5. The `v*` tag triggers `rust-core-wheels.yml` → builds wheels for all platforms → publishes to PyPI.
 
 Full process details: `docs/CORE_DEVELOPMENT_PRINCIPLES.md` §10 — The Release Gate.
+
+---
+
+## Incident Playbook: When a Broken Version Reaches PyPI
+
+*Added after the v1.2.3/v1.2.4 incidents (March 2026).*
+
+### The Problem
+
+Once a version is published to PyPI, `uv tool install amplifier` and `pip install amplifier-core`
+serve it immediately. There is no "rollback" button. For `uv tool install` users specifically,
+there is no fast local rollback — users must wait for a fix.
+
+### The Playbook
+
+1. **Yank the broken version on PyPI** (immediately, ~30 seconds):
+   - Go to https://pypi.org/manage/project/amplifier-core/release/X.Y.Z/
+   - Click "Options" → "Yank release"
+   - Add reason: "Broken: [brief description]"
+
+   Yanking tells pip/uv to skip this version for new installs. New `amplifier update`
+   invocations will resolve to the last non-yanked version.
+
+2. **Fix forward** — do NOT try to reuse the yanked version number:
+   - Fix the bug on `main`
+   - Run the E2E smoke test (`./scripts/e2e-smoke-test.sh`)
+   - Bump to the next PATCH version
+   - Tag + push as normal
+
+3. **Post-mortem**: Add the incident to the history below.
+
+### Incident History
+
+| Version | Date | Root Cause | Impact | Resolution |
+|---------|------|-----------|--------|------------|
+| v1.0.7→v1.0.8 | 2026-03-03 | RetryConfig break for provider-anthropic | Provider users broken | Emergency hotfix |
+| v1.2.3 | 2026-03-16 | `session_state` crash — missing dict field on RustCoordinator | CLI startup crashed | Yanked |
+| v1.2.4 | 2026-03-16 | `_tool_dispatch_context` crash — RustCoordinator lacked `__dict__` | All tool dispatch crashed | Yanked |
+| v1.2.4 | 2026-03-16 | Version files not bumped before tagging | PyPI publish rejected (400) | Re-tagged |
diff --git a/docs/CORE_DEVELOPMENT_PRINCIPLES.md b/docs/CORE_DEVELOPMENT_PRINCIPLES.md
index dac2a85..6400404 100644
--- a/docs/CORE_DEVELOPMENT_PRINCIPLES.md
+++ b/docs/CORE_DEVELOPMENT_PRINCIPLES.md
@@ -120,6 +120,7 @@ Each test layer has a distinct purpose:
 | **Proto equivalence tests** | Proto expansion matches hand-written Rust types | `src/generated/equivalence_tests.rs` |
 | **Python tests** | Behavioral compatibility — the PyO3 bridge works correctly | `tests/`, `bindings/python/tests/` |
 | **Switchover tests** | Python API contract — same imports, same behavior after Rust migration | `bindings/python/tests/test_switchover_*.py` |
+| **E2E smoke test** | Real-world validation — the built wheel works in an isolated install with real LLM calls | `scripts/e2e-smoke-test.sh` |
 
 **The compiler is the first test.** If it compiles and clippy is clean, the structural correctness bar is already met. Tests then verify behavior, not shape.
 
@@ -192,14 +193,22 @@ This rule applies **specifically to amplifier-core** because of its PyPI distrib
    - `crates/amplifier-core/Cargo.toml` (line 3)
    - `bindings/python/Cargo.toml` (line 3)
 
-3. **Commit, tag, and push:**
+3. **Run the E2E smoke test** before tagging:
+   ```bash
+   ./scripts/e2e-smoke-test.sh
+   ```
+   Validates the built wheel in an isolated Docker container with a real LLM session.
+   Requires Docker and `ANTHROPIC_API_KEY`. Do not tag until it passes.
+   See `context/release-mandate.md` for full details and incident playbook.
+
+4. **Commit, tag, and push:**
    ```bash
    git commit -am "chore: bump version to X.Y.Z"
    git tag vX.Y.Z
    git push origin main --tags
    ```
 
-4. **Verify CI triggers.** The `v*` tag triggers `rust-core-wheels.yml`, which builds wheels for all platforms (Linux x86/aarch64, macOS, Windows) and publishes to PyPI. The next PR does not start until PyPI publish is confirmed.
+5. **Verify CI triggers.** The `v*` tag triggers `rust-core-wheels.yml`, which builds wheels for all platforms (Linux x86/aarch64, macOS, Windows) and publishes to PyPI. The next PR does not start until PyPI publish is confirmed.
 
 ### Why the Script Exists