feat(ci): two-tier test split with service classification by mchen-sentry · Pull Request #115434 · getsentry/sentry

mchen-sentry · 2026-05-12T21:28:47Z

Tests that don't use Snuba still pay full Snuba startup cost (~50s/shard) today. This splits the suite into tier1 (postgres-only, 5 shards, no Snuba) and tier2 (full stack, 17 shards) — same 22 total shards, but tier1 shards finish in ~12m instead of ~18m.

The split is driven by a classify-services workflow (run once per branch, on-demand) that instruments each test's socket calls and fixture markers to map files to their service dependencies. The result is stored as an artifact and consumed by the split-tiers job on each backend run.
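A minimal sketch of the runtime half of that classification, assuming services can be told apart by the peer port of each socket. The names `record_services` and `DEFAULT_PORTS`, and the port numbers themselves, are hypothetical; the actual plugin may identify services differently.

```python
import contextlib
import socket

# Hypothetical port-to-service map (common default ports); the plugin's real
# mapping is not shown in this PR.
DEFAULT_PORTS = {1218: "snuba", 9092: "kafka", 6379: "redis", 5432: "postgres"}


@contextlib.contextmanager
def record_services(seen, port_map=DEFAULT_PORTS):
    """Patch socket.send/sendall and add each contacted service name to `seen`."""
    original_send = socket.socket.send
    original_sendall = socket.socket.sendall

    def note_peer(sock):
        try:
            peer = sock.getpeername()
        except OSError:
            return  # socket is not connected, nothing to record
        # Only (host, port, ...) tuples carry a port; skip unix sockets.
        if isinstance(peer, tuple) and len(peer) >= 2:
            service = port_map.get(peer[1])
            if service is not None:
                seen.add(service)

    def send(self, *args, **kwargs):
        note_peer(self)
        return original_send(self, *args, **kwargs)

    def sendall(self, *args, **kwargs):
        note_peer(self)
        return original_sendall(self, *args, **kwargs)

    socket.socket.send = send
    socket.socket.sendall = sendall
    try:
        yield seen
    finally:
        # Restore the original methods even if the test body raised.
        socket.socket.send = original_send
        socket.socket.sendall = original_sendall
```

In a pytest plugin, a context manager like this could wrap each test call so that the recorded set is keyed by test file.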

- service_classifier.py: pytest plugin that monkey-patches socket.send/sendall during test runs to detect which services each test actually contacts, combined with static fixture marker inspection
- classify-services.yml: runs classification across 22 shards and merges results
- split-tests-by-tier.py: splits the merged JSON into tier1/tier2 file lists
- backend-light job: 5 shards with only postgres + redis-cluster + kafka (no Snuba devservices), uses --dist=loadfile
- backend-test: drops to 17 shards when tiers are active, uses --dist=load for tier2 (snuba-heavy tests have higher per-test variance)
- Falls back to the normal 22-shard run when no classification artifact exists
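The split and fallback steps could look roughly like the sketch below. It assumes the merged JSON maps each test file path to a list of service names. That schema, the names `split_by_tier`, `load_tiers`, and `LIGHT_SERVICES`, and the rule that a file is tier1 when it needs nothing beyond the backend-light services are all assumptions, not taken from split-tests-by-tier.py.

```python
import json
from pathlib import Path

# Assumed: services available to the light job, per the backend-light description.
LIGHT_SERVICES = frozenset({"postgres", "redis", "kafka"})


def split_by_tier(classification, light_services=LIGHT_SERVICES):
    """Return (tier1, tier2) file lists from a {file: [services]} mapping."""
    tier1, tier2 = [], []
    for path in sorted(classification):
        needed = set(classification[path])
        # A file stays in tier1 only if every service it needs is available there.
        if needed <= light_services:
            tier1.append(path)
        else:
            tier2.append(path)
    return tier1, tier2


def load_tiers(artifact_path):
    """Return (tier1, tier2), or None when no classification artifact exists."""
    path = Path(artifact_path)
    if not path.exists():
        return None  # caller falls back to the untiered run
    return split_by_tier(json.loads(path.read_text()))
```

Returning None rather than empty lists keeps "no artifact" distinguishable from "artifact with no tests", which is what lets the caller pick the fallback run.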

- service_classifier.py: hybrid static + runtime classification plugin that maps each test to its service dependencies (Snuba, Kafka, etc.)
- classify-services.yml: workflow to generate classification across 22 shards
- split-tests-by-tier.py: splits classification into tier1 (postgres-only) and tier2 (full Snuba stack) test lists
- backend.yml: add split-tiers + backend-light jobs, wire backend-test to use tier2 list when classification is available
- Selective testing (PRs) and tiers (master) are mutually exclusive
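The gating described above can be summarized as a small decision function. This is an illustrative sketch only: `tiers_active` and `shard_plan` are hypothetical names, the real logic lives in backend.yml, and the shard count of selective PR runs is not stated here, so the sketch models only the tiered layout and the untiered 22-shard default.

```python
def tiers_active(selective: bool, classification_available: bool) -> bool:
    """Tiers apply only when a classification exists and the run is not selective."""
    return classification_available and not selective


def shard_plan(active: bool) -> dict:
    """Shard counts per job; both layouts total 22 shards."""
    if active:
        return {"backend-light": 5, "backend-test": 17}
    return {"backend-test": 22}
```

For example, `shard_plan(tiers_active(selective=False, classification_available=True))` gives the 5 + 17 layout, while a missing artifact gives the single 22-shard job.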

Hybrid distribution mode based on experiment data: --dist=load cuts tier 2 shard-time variance by 54% (179s -> 82s spread) by load-balancing individual tests across workers, but hurts tier 1 (where small fast tests benefit from fixture reuse via loadfile). Apply load only when tiers are active. Backend-test without tiers (selective PRs, master without classification) keeps --dist=loadfile for backwards compatibility.
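As a rough illustration of the rule above, the choice of pytest-xdist mode and the quoted variance figure can be written out as follows. The helper names `dist_flag` and `spread_reduction` are hypothetical; only the `--dist=load` and `--dist=loadfile` flags and the 179s and 82s figures come from the description.

```python
def dist_flag(job: str, tiers_on: bool) -> str:
    """Pick the pytest-xdist distribution flag for a job."""
    # Only the tier2 run (backend-test with tiers active) uses per-test balancing.
    if job == "backend-test" and tiers_on:
        return "--dist=load"
    # backend-light and untiered backend-test keep file-level grouping.
    return "--dist=loadfile"


def spread_reduction(before_s: float, after_s: float) -> float:
    """Fractional reduction in shard-time spread."""
    return (before_s - after_s) / before_s


# (179 - 82) / 179 is roughly 0.54, matching the 54% quoted above.
reduction_pct = round(spread_reduction(179, 82) * 100)
```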

mchen-sentry added 8 commits May 12, 2026 14:27

fix(ci): correct mypy type ignore codes in service_classifier

f9df328

fix(ci): broaden mypy ignores for socket monkey-patching

4731854

fix(ci): add redis-cluster/kafka service containers to backend-light

86349c4

fix(ci): reduce backend-test to 17 shards when tiers active (5+17=22)

b862333

fix(ci): filter classify runs by conclusion via jq, not --status flag

f7c4dae

feat(ci): two-tier test split with service classification

a8916c4

github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) May 12, 2026

mchen-sentry wants to merge 8 commits into master from mchen/ci-two-tier-split-v2
