ci: [TEST RUN — DO NOT REVIEW] All CI optimizations combined for profiling #7039

Open
huth-stacks wants to merge 10 commits into stacks-network:develop from huth-stacks:ci/all-optimizations

@huth-stacks

⚠️ TEST RUN ONLY — DO NOT REVIEW OR MERGE ⚠️

This PR exists solely to measure the aggregate CI timing impact of all optimizations combined.
It requires upstream org runners (ubuntu-latest-m) which are not available on personal forks.
Will be closed after timing data is collected.


What's included (10 atomic changes)

| # | Change | Lines | Impact |
|---|---|---|---|
| P0 | Fix unit test failure masking | 3 | Correctness |
| P1 | Remove yml from paths-ignore | 1 | Correctness |
| P2 | Start cache building immediately | 4 deleted | ~28s off critical path |
| P3 | Larger runners (ubuntu-latest-m) | 3 | ~30-50% faster compilation |
| P4 | Timing-based test partitioning | +150 lines | ~10-12m off slowest partition |
| P6 | Constants check via artifact | 15 | 4m10s → 9s |
| P7 | Enable clippy caching | 1 | 5m → 1m on warm cache |
| P8 | CARGO_INCREMENTAL=0, debug=0 | 2 | ~10% compile + smaller cache |
| P9 | Remove coverage instrumentation | 1 | ~15% faster compile, smaller binaries |
| P10 | Skip cargo-hack on PRs | 1 deleted | ~13m off every PR |

Total: ~30 lines changed across 7 files.

Baseline (median of 3 recent upstream runs)

| Metric | Current |
|---|---|
| Total wall-clock | ~2h30m |
| nextest-archive | 15m33s |
| Slowest unit partition | 24m1s |
| Constants check | 4m10s |
| Cargo hack native | 13m14s |

Expected aggregate impact

Targeting sub-1-hour total pipeline time.

Security Checklist

  • No new permissions
  • No secrets exposure
  • No new third-party actions
  • All runner labels already in use by release workflow

@CLAassistant

CLAassistant commented Mar 25, 2026

CLA assistant check
All committers have signed the CLA.

@huth-stacks
Author

⚠️ DO NOT REVIEW — CI PERFORMANCE TEST RUN ⚠️

This PR exists solely to measure aggregate CI timing impact. It requires ubuntu-latest-m runners which are only available on org accounts, not personal forks.

Will be closed after timing data is collected. Not a real PR.

See also: #7038 (P3 individual test — larger runners only)

The unit-tests job had continue-on-error: true and the check-tests
job did not depend on unit-tests, causing test failures to be
silently swallowed. Remove continue-on-error and add unit-tests
to the check-tests needs array.

The paths-ignore block excluded **.yml files, which meant pushes
to master/develop/next containing only workflow file changes would
silently skip CI. Remove this exclusion so workflow changes are
always validated.
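
The skip condition described above can be sketched in Python. This is only a rough model: it assumes GitHub skips a push-triggered workflow when every changed file matches a `paths-ignore` pattern, and it approximates GitHub's glob matching with `fnmatchcase`, which is not identical. The `**.md` pattern and the file paths are hypothetical placeholders, not taken from the real workflow.

```python
from fnmatch import fnmatchcase


def ci_is_skipped(changed_files, paths_ignore):
    """Return True when every changed file matches at least one ignore pattern.

    Approximation only: fnmatchcase lets '*' cross '/' boundaries, which is
    close to how '**.yml' behaves, but GitHub's matcher has its own rules.
    """
    return bool(changed_files) and all(
        any(fnmatchcase(path, pattern) for pattern in paths_ignore)
        for path in changed_files
    )


# Hypothetical ignore lists; "**.md" stands in for any other excluded pattern.
ignore_before = ["**.md", "**.yml"]
ignore_after = ["**.md"]

workflow_only_push = [".github/workflows/ci.yml"]
mixed_push = [".github/workflows/ci.yml", "src/main.rs"]

print(ci_is_skipped(workflow_only_push, ignore_before))  # workflow-only push, old config
print(ci_is_skipped(workflow_only_push, ignore_after))   # same push, new config
print(ci_is_skipped(mixed_push, ignore_before))          # mixed push, old config
```

Under this model only the first call reports a skipped run, which is the case the change is meant to eliminate.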

The create-cache workflow waited for rustfmt, changelog-check, and
check-release before starting the 15-minute nextest archive build.
These gates are independent from compilation. Test workflows already
depend on both create-cache and the format checks independently,
so format failures still block test results.
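
To make the critical-path argument concrete, here is a small sketch that computes the longest chain through a job graph before and after the change. The 933 s and 1441 s figures come from the baseline (15m33s archive build, 24m1s slowest unit partition). The gate durations are illustrative assumptions: 28 s for the slowest gate, chosen to match the ~28s estimate, and shorter hypothetical values for the other two. The graph itself is simplified and is not a copy of the real workflows.

```python
from functools import lru_cache


def critical_path_seconds(durations, needs):
    """Length of the longest dependency chain in a job DAG.

    durations: job name -> run time in seconds
    needs:     job name -> list of jobs that must finish first
    """
    @lru_cache(maxsize=None)
    def finish(job):
        start = max((finish(dep) for dep in needs.get(job, ())), default=0)
        return start + durations[job]

    return max(finish(job) for job in durations)


durations = {
    "rustfmt": 28,           # assumed: slowest gate, matching the ~28s estimate
    "changelog-check": 10,   # hypothetical
    "check-release": 5,      # hypothetical
    "create-cache": 933,     # 15m33s nextest archive build (baseline)
    "unit-tests": 1441,      # 24m1s slowest unit partition (baseline)
}
gates = ["rustfmt", "changelog-check", "check-release"]

needs_before = {"create-cache": gates, "unit-tests": ["create-cache"] + gates}
needs_after = {"create-cache": [], "unit-tests": ["create-cache"] + gates}

before = critical_path_seconds(durations, needs_before)
after = critical_path_seconds(durations, needs_after)
print(before, after, before - after)
```

Because the test job still lists the gates in its own `needs`, a format failure blocks it in both graphs; only the start of the archive build moves earlier.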

Switch nextest-archive, cargo-hack native-targets, and constants-check
to ubuntu-latest-m (4 vCPU, 16GB RAM) for faster compilation. The
release workflow already uses these runners. Expected ~30-50% faster
compilation with negligible cost difference.

Add a Python script that reads JUnit XML timing data from previous
CI runs and uses greedy bin-packing to distribute tests into
time-balanced partitions. Falls back to hash-based partitioning
when no timing data is available (first run, or data expired).
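
A minimal sketch of the greedy approach with a hash-based fallback follows. It is not the contents of `split-tests-by-timing.py`; the function names, the longest-first ordering, and the choice to give unseen tests the mean known duration are all assumptions made for illustration.

```python
import hashlib
import heapq


def partition_by_timing(timings, num_partitions):
    """Greedy bin-packing: place the longest tests first, each into the
    partition with the smallest total duration so far."""
    heap = [(0.0, idx) for idx in range(num_partitions)]  # (load, partition)
    bins = [[] for _ in range(num_partitions)]
    # Sort by duration descending, then by name so ties are deterministic.
    for name, seconds in sorted(timings.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        bins[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))
    return bins


def partition_by_hash(test_names, num_partitions):
    """Fallback: a stable hash of the test name picks the partition."""
    bins = [[] for _ in range(num_partitions)]
    for name in sorted(test_names):
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        bins[int.from_bytes(digest[:8], "big") % num_partitions].append(name)
    return bins


def partition(test_names, timings, num_partitions):
    """Use timing data when any is available, otherwise fall back to hashing."""
    known = {name: timings[name] for name in test_names if name in timings}
    if not known:
        return partition_by_hash(test_names, num_partitions)
    # Assumption: tests with no recorded time get the mean known duration.
    default = sum(known.values()) / len(known)
    return partition_by_timing(
        {name: known.get(name, default) for name in test_names}, num_partitions
    )


example = {"a": 10, "b": 8, "c": 7, "d": 3, "e": 2}
print(partition_by_timing(example, 2))  # [['a', 'd', 'e'], ['b', 'c']]
```

In the example both partitions total 15 seconds. Greedy packing does not guarantee an optimal split in general, but it bounds how far the slowest partition can drift from the average.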

New files:
  .github/scripts/split-tests-by-timing.py - bin-packing script
  .config/nextest.toml - enables JUnit XML output for CI profile

Each partition uploads its JUnit XML as an artifact (90-day retention).
On subsequent runs, all partitions download this timing data and the
script assigns tests to minimize the slowest partition's duration.
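
Reading the timing data could look roughly like the sketch below. It assumes the conventional JUnit layout of `<testcase>` elements with `name`, `classname`, and `time` attributes. The key format and the rule of keeping the slowest observation per test are illustrative choices, and the sample test names are hypothetical.

```python
import xml.etree.ElementTree as ET


def timings_from_junit(xml_text):
    """Extract {test id: seconds} from one JUnit XML report."""
    timings = {}
    for case in ET.fromstring(xml_text).iter("testcase"):
        name = case.get("name")
        if name is None:
            continue
        # Assumed key format: "<classname> <name>".
        test_id = f"{case.get('classname', '')} {name}".strip()
        try:
            timings[test_id] = float(case.get("time", "0"))
        except ValueError:
            continue  # ignore malformed durations rather than abort
    return timings


def merge_timings(reports):
    """Combine per-partition reports, keeping the slowest time seen per test."""
    merged = {}
    for report in reports:
        for test_id, seconds in report.items():
            merged[test_id] = max(seconds, merged.get(test_id, 0.0))
    return merged


sample = """<testsuites>
  <testsuite name="mycrate">
    <testcase name="mod_a::slow_test" classname="mycrate" time="12.5"/>
    <testcase name="mod_b::fast_test" classname="mycrate" time="0.25"/>
  </testsuite>
</testsuites>"""

print(timings_from_junit(sample))
```

The merged dictionary is the kind of input a partitioning step would consume. An empty result corresponds to the first-run or expired-artifact case, where the hash-based fallback applies.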

POC: inlines the nextest command to test this approach. Production
implementation should integrate with stacks-network/actions.

Instead of compiling stacks-inspect from scratch (4+ minutes), the
nextest-archive job now generates the constants JSON and uploads it
as an artifact. The constants-check job downloads and diffs it,
reducing the check from 4 minutes to ~10 seconds.

Also adds create-cache to constants-check's needs in ci.yml so the
artifact is available before the download step runs.
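
The download-and-diff step can be pictured with the sketch below. It assumes the constants JSON is a flat object of key/value pairs and that the check fails on any difference. The constant names and values are hypothetical, and the real job may simply run a textual diff instead.

```python
import json


def diff_constants(expected, actual):
    """Return human-readable differences between two constants mappings."""
    diffs = []
    for key in sorted(expected.keys() | actual.keys()):
        if key not in actual:
            diffs.append(f"missing: {key}")
        elif key not in expected:
            diffs.append(f"unexpected: {key}")
        elif expected[key] != actual[key]:
            diffs.append(f"changed: {key}: {expected[key]!r} -> {actual[key]!r}")
    return diffs


# Hypothetical payloads: the committed reference and the downloaded artifact.
expected = json.loads('{"EXAMPLE_LIMIT": 100, "EXAMPLE_VERSION": 3}')
actual = json.loads('{"EXAMPLE_LIMIT": 200, "EXAMPLE_VERSION": 3, "EXAMPLE_NEW": 1}')

for line in diff_constants(expected, actual):
    print(line)
```

A comparison like this takes milliseconds, so the job's remaining time is dominated by runner startup and the artifact download rather than by compilation.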

The clippy workflow explicitly disabled caching, causing full
recompilation on every PR. Enable the built-in cargo/target
caching from actions-rust-lang/setup-rust-toolchain.

Set CARGO_INCREMENTAL=0 and CARGO_PROFILE_DEV_DEBUG=0 for the test
cache build. Incremental compilation adds ~10% overhead on clean CI
builds and bloats the target directory. Debug info level 2 (default)
increases binary size and link time without benefit in CI where we
don't debug interactively.

Remove -Cinstrument-coverage from RUSTFLAGS in create-cache.yml.
This eliminates ~15% compilation overhead, ~56% binary size bloat,
and per-test .profraw I/O from every PR run. Coverage data is no
longer collected per-PR. The coverage report job will skip gracefully.

Standard practice for large Rust projects — coverage on merge, not
per-PR. Can be restored by reverting this one-line change.

Remove pull_request from cargo-hack-check's trigger condition.
Feature combination checks still run on merge queue, releases,
and manual dispatch — catching issues before they reach develop.
PRs skip this 13-minute job for faster feedback.