
feat(ci): two-tier test split with service classification#115434

Draft
mchen-sentry wants to merge 8 commits into master from mchen/ci-two-tier-split-v2

feat(ci): two-tier test split with service classification#115434
mchen-sentry wants to merge 8 commits into
masterfrom
mchen/ci-two-tier-split-v2

@mchen-sentry

Today, tests that don't use Snuba still pay the full Snuba startup cost (~50s/shard). This PR splits the suite into tier1 (no Snuba: postgres, redis-cluster and kafka only, 5 shards) and tier2 (full stack, 17 shards). The total stays at 22 shards, but tier1 shards finish in ~12m instead of ~18m.

The split is driven by a classify-services workflow, run on demand once per branch, that instruments each test's socket calls and inspects its fixture markers to map test files to the services they depend on. The merged result is stored as an artifact, which the split-tiers job consumes on each backend run.
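
A minimal sketch of the runtime half of this classification, assuming a service is identified by the destination port of each `send`/`sendall` call. `ServiceTracker` and the `DEFAULT_PORTS` table are hypothetical illustrations, not the contents of `service_classifier.py`, and the static fixture-marker inspection is left out.

```python
import socket

# Hypothetical port -> service table; the real plugin may identify services differently.
DEFAULT_PORTS = {
    5432: "postgres",
    6379: "redis",
    9092: "kafka",
    1218: "snuba",
}


class ServiceTracker:
    """Records which services are contacted via socket.send/sendall while active."""

    def __init__(self, port_map=None):
        self.port_map = dict(DEFAULT_PORTS if port_map is None else port_map)
        self.services = set()
        self._orig_send = None
        self._orig_sendall = None

    def _record(self, sock):
        try:
            peer = sock.getpeername()
        except OSError:
            return  # unconnected socket: nothing to attribute
        # AF_INET peers are (host, port); AF_INET6 peers are (host, port, flow, scope).
        if isinstance(peer, tuple) and len(peer) >= 2:
            service = self.port_map.get(peer[1])
            if service is not None:
                self.services.add(service)

    def __enter__(self):
        tracker = self
        self._orig_send = orig_send = socket.socket.send
        self._orig_sendall = orig_sendall = socket.socket.sendall

        def send(sock, data, *args):
            tracker._record(sock)
            return orig_send(sock, data, *args)

        def sendall(sock, data, *args):
            tracker._record(sock)
            return orig_sendall(sock, data, *args)

        # Patch the class so every socket created during the test is observed.
        socket.socket.send = send
        socket.socket.sendall = sendall
        return self

    def __exit__(self, *exc):
        # Restore the original methods even if the test raised.
        socket.socket.send = self._orig_send
        socket.socket.sendall = self._orig_sendall
        return False
```

In a pytest plugin, a context like this would wrap each test's call phase (for example from a `pytest_runtest_call` hook wrapper) and record `tracker.services` against the test's file.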

  • service_classifier.py: pytest plugin that monkey-patches socket.send/sendall during test runs to detect which services each test actually contacts, combined with static inspection of fixture markers
  • classify-services.yml: runs classification across 22 shards and merges results
  • split-tests-by-tier.py: splits the merged JSON into tier1/tier2 file lists
  • backend-light job: 5 shards with only postgres + redis-cluster + kafka (no Snuba devservices), uses --dist=loadfile
  • backend-test drops to 17 shards when tiers are active and uses --dist=load for tier2 (Snuba-heavy tests have higher per-test variance)
  • Falls back to the normal 22-shard run when no classification artifact exists
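
The merge and split steps could look roughly like this, assuming each classification shard emits a JSON object mapping test file paths to lists of service names. The schema, the function names and the `TIER1_SERVICES` set (modelled on the services the backend-light job starts) are assumptions, not the actual `split-tests-by-tier.py`.

```python
import json
from collections import defaultdict

# Assumed: services available on the light (tier1) runners, per the backend-light job.
TIER1_SERVICES = frozenset({"postgres", "redis", "kafka"})


def merge_shards(shard_results):
    """Union the per-file service sets reported by each classification shard."""
    merged = defaultdict(set)
    for result in shard_results:
        for path, services in result.items():
            merged[path].update(services)
    return {path: sorted(services) for path, services in merged.items()}


def split_tiers(merged):
    """Tier1: files whose services all run on light runners; tier2: everything else."""
    tier1, tier2 = [], []
    for path in sorted(merged):
        target = tier1 if set(merged[path]) <= TIER1_SERVICES else tier2
        target.append(path)
    return tier1, tier2


def load_classification(path):
    """Return the merged classification, or None to fall back to the 22-shard run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

Treating anything outside `TIER1_SERVICES` as tier2 means an unrecognized service sends a file to the full stack, so a classification gap costs runtime rather than causing failures on the light runners.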

- service_classifier.py: hybrid static + runtime classification plugin
  that maps each test to its service dependencies (Snuba, Kafka, etc.)
- classify-services.yml: workflow to generate classification across 22 shards
- split-tests-by-tier.py: splits classification into tier1 (postgres-only)
  and tier2 (full Snuba stack) test lists
- backend.yml: add split-tiers + backend-light jobs, wire backend-test
  to use tier2 list when classification is available
- Selective testing (PRs) and tiers (master) are mutually exclusive

Hybrid distribution mode, based on experiment data: --dist=load cuts tier2
shard-time variance by 54% (179s -> 82s spread) by load-balancing individual
tests across workers, but it hurts tier1, where small, fast tests benefit from
fixture reuse via loadfile. Apply --dist=load to tier2 only, and only when
tiers are active.

backend-test without tiers (selective testing on PRs, or master without a
classification artifact) keeps --dist=loadfile for backwards compatibility.
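
The job selection described above can be summarized as a small decision function. This is a hypothetical model of the `backend.yml` logic, not workflow code: `plan_backend_jobs` and `Job` are illustrative names, non-PR runs are assumed to be master runs, and the selective-testing shard count is left unset because the PR does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Job:
    name: str
    shards: Optional[int]  # None: shard count not stated in the PR
    dist: str  # value passed to pytest-xdist's --dist option


def plan_backend_jobs(is_pull_request: bool, has_classification: bool) -> List[Job]:
    """Hypothetical model of which backend jobs run and how tests are distributed."""
    if is_pull_request:
        # Selective testing and tiers are mutually exclusive: PRs never use tiers.
        return [Job("backend-test", None, "loadfile")]
    if not has_classification:
        # Master without a classification artifact: the normal 22-shard run.
        return [Job("backend-test", 22, "loadfile")]
    return [
        # tier1: small, fast tests benefit from per-file fixture reuse.
        Job("backend-light", 5, "loadfile"),
        # tier2: per-test balancing cuts shard-time variance for Snuba-heavy tests.
        Job("backend-test", 17, "load"),
    ]
```

The tiered plan keeps the same 22-shard budget (5 + 17), so the win comes from tier1 shards skipping Snuba startup, not from extra runners.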

@github-actions (bot) added the Scope: Backend label (automatically applied to PRs that change backend components) on May 12, 2026