ci: migrate 15 of 20 jobs to smithy self-hosted runners#98

Merged
avrabe merged 1 commit into main from smithy-migration
May 10, 2026

avrabe (Contributor) commented May 3, 2026

Summary

Conservative migration of mcp's CI to the pulseengine self-hosted fleet,
mirroring spar (#201), rivet (#262), kiln (#247), and gale (#35). 15 of
20 jobs move to smithy rust-cpu / light classes; 5 stay on hosted
because they depend on docker buildx, the podman-docker shim, the macOS
+ Windows matrix, or the upstream cargo audit parser fix that smithy
hasn't shipped yet. Each kept-hosted job has an in-place comment naming
the reason.

Migrated -> smithy

| Class | Jobs |
| --- | --- |
| rust-cpu (12 G) | coverage, multi-version-testing, validate-framework-fast, stdio-integration-tests, python-sdk-compatibility, external-validator-integration, benchmark-validation, validate-release, quick-validation, validation-specific-tests, compatibility-check, validate-external-servers |
| light (4 G) | changes, pr-report, update-compatibility-matrix |

stdio-integration-tests and python-sdk-compatibility need Node/Python
respectively; smithy ships Node LTS via nvm and Python 3.12, so both
work without extra setup.
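As a rough illustration of what the per-job change looks like, the sketch below routes one job to each smithy class and leaves one job hosted. The label sets (`[self-hosted, smithy, rust-cpu]` and `[self-hosted, smithy, light]`) and the step bodies are assumptions made for illustration; the actual label names and steps are defined by the fleet and the workflow files, not by this description.

```yaml
jobs:
  coverage:
    # Assumed label set for the rust-cpu class; the real labels may differ.
    runs-on: [self-hosted, smithy, rust-cpu]
    steps:
      - uses: actions/checkout@v4
      # Placeholder step; the real job runs the repository's coverage tooling.
      - run: cargo test --workspace

  changes:
    # Assumed label set for the light class; the real labels may differ.
    runs-on: [self-hosted, smithy, light]
    steps:
      - uses: actions/checkout@v4

  security-validation:
    # Kept hosted: see the in-place comment in the workflow for the reason.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
```

Rolling back a single job is then a one-line change of its `runs-on` value back to `ubuntu-latest`.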

Stays on ubuntu-latest (in-place reasons in this PR for context)

| Job | Why hosted |
| --- | --- |
| build-validation-image | docker buildx + GHCR push; smithy's podman-docker shim is untested for this |
| validate-in-container | pulls + runs the container image built above |
| validate-framework | matrix spans macos-latest and windows-latest; smithy is Linux x86_64 only |
| security-validation | runs cargo audit; smithy's pinned cargo-audit 0.21.x rejects CVSS 4.0 advisories (RUSTSEC-2026-0037). Move once smithy bumps to >=0.22.1 |
| cross-platform-validation | matrix spans macos-latest and windows-latest |

Workarounds applied

  • quick-validation (pr-validation.yml) and coverage (code-coverage.yml)
    had a Free up disk space step running sudo rm -rf on hosted-only
    paths (/usr/share/dotnet, /usr/local/lib/android, /opt/ghc,
    /opt/hostedtoolcache/CodeQL). Smithy redirects TMPDIR to a 500 G
    volume, so the cleanup is unnecessary and the sudo call would fail.
    Gated each with if: runner.environment == 'github-hosted' so the
    step is preserved (and still runs) for any future fallback to hosted.
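A minimal sketch of the gated step, assuming it removes exactly the four paths listed above in a single command (the real step in pr-validation.yml and code-coverage.yml may be structured differently):

```yaml
steps:
  - name: Free up disk space
    # runner.environment is 'github-hosted' on GitHub's runners and
    # 'self-hosted' elsewhere, so this step is skipped on smithy.
    if: runner.environment == 'github-hosted'
    run: |
      sudo rm -rf \
        /usr/share/dotnet \
        /usr/local/lib/android \
        /opt/ghc \
        /opt/hostedtoolcache/CodeQL
```

Gating on the runner environment rather than deleting the step keeps the workflow usable on both fleets without a second edit if a job is moved back to hosted.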

Expected win

Same as spar / rivet / kiln / gale: queue elimination on the
org-free-tier 20-concurrent cap dominates total wall time. mcp's PR
validation flows have meaningful compile + coverage work that fits
rust-cpu cleanly.

Test plan

  • All 15 migrated jobs land on smithy runners and finish green
  • All 5 hosted jobs remain unchanged
  • No EACCES events in smithy's journalctl -u smithy-trace-eacces.service
  • No "no space left on device" on coverage runs

Rollback

Revert this commit; all 15 jobs flip back to ubuntu-latest.

  rust-cpu     coverage, multi-version-testing, validate-framework-fast,
               stdio-integration-tests, python-sdk-compatibility,
               external-validator-integration, benchmark-validation,
               validate-release, quick-validation, validation-specific-tests,
               compatibility-check, validate-external-servers
  light        changes, pr-report, update-compatibility-matrix

Stays on ubuntu-latest:
  - build-validation-image       docker buildx + GHCR push, podman-docker shim untested
  - validate-in-container        pulls + runs container image
  - validate-framework           matrix spans macOS + Windows
  - security-validation          cargo audit; smithy's pinned cargo-audit (0.21.x) rejects CVSS 4.0
  - cross-platform-validation    matrix spans macOS + Windows

Workaround applied:
  Two jobs (quick-validation, coverage) had a "Free up disk space" step
  that runs `sudo rm -rf` on hosted-only paths (/usr/share/dotnet, etc.).
  Smithy redirects TMPDIR to a 500 G volume so the cleanup is unnecessary
  and would fail (no sudo). Gated with `if: runner.environment == 'github-hosted'`
  so the step still runs when CI falls back to hosted.

github-actions Bot commented May 3, 2026

PR Validation Results

Quick Validation: ❌

  • Format check
  • Clippy lints
  • Unit tests
  • Documentation

Summary: ❌ Some checks failed

avrabe merged commit 3e294ed into main on May 10, 2026
18 of 22 checks passed
avrabe deleted the smithy-migration branch on May 10, 2026 19:17