Summary
The current replay coverage is useful, but too much of the core smoke/behavior validation still depends on Apple system apps and simulator-specific surfaces. That makes the suite less representative of real mobile app development patterns and more sensitive to OS churn and host-environment flakiness.
Proposed Improvements
- Move most smoke and interaction coverage off system apps and onto a controlled fixture app with stable accessibility identifiers.
- Keep system-app coverage only where it validates platform-specific behavior:
  - app launch / close
  - deep links
  - permissions / alerts
  - home / app switcher
  - screenshots / recordings
- Split replay coverage by intent:
  - smoke: fast, minimal confidence checks
  - behavior: selectors, fill, scroll, keyboard, lifecycle, alerts
  - benchmark/soak: longer end-to-end journeys
- Add flows that better match real mobile app usage:
  - cold start -> deep link -> navigation
  - text entry + keyboard show/dismiss
  - background -> foreground resume
  - modal/sheet open/close
  - permission prompt accept/dismiss/recovery
  - interrupted flow after home/app switcher
  - long-list rediscovery and scroll recovery
- Add one explicit host-focus regression test for the iOS simulator so we catch cases where Simulator steals macOS focus during automation.
- Prefer structural assertions over screenshot dependence in CI where possible.
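
As a rough illustration of what "structural assertions" could look like, here is a minimal Python sketch that checks an accessibility snapshot by identifier instead of comparing pixels. The snapshot shape (nested dicts with `id`, `label`, `enabled`, `children`) and the identifiers such as `login.submit` are assumptions made up for this example; the real snapshot format may differ.

```python
from typing import Any, Dict, Iterator, Optional

# Hypothetical snapshot shape: nested dicts with "id", "label", "enabled", "children".
Node = Dict[str, Any]


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def find_by_id(root: Node, identifier: str) -> Optional[Node]:
    """Return the first node whose accessibility identifier matches, or None."""
    return next((n for n in walk(root) if n.get("id") == identifier), None)


def assert_element(root: Node, identifier: str, **expected: Any) -> Node:
    """Assert that an element exists and that the given properties match."""
    node = find_by_id(root, identifier)
    if node is None:
        raise AssertionError(f"no element with id {identifier!r}")
    for key, want in expected.items():
        got = node.get(key)
        if got != want:
            raise AssertionError(
                f"{identifier}.{key}: expected {want!r}, got {got!r}"
            )
    return node


# Illustrative snapshot of a fixture-app login screen (made-up data).
snapshot: Node = {
    "id": "login.screen",
    "children": [
        {"id": "login.email", "label": "Email", "enabled": True},
        {"id": "login.submit", "label": "Sign in", "enabled": True},
    ],
}

assert_element(snapshot, "login.submit", label="Sign in", enabled=True)
```

A check like this fails with a message naming the element and property that diverged, which tends to be easier to triage in CI than a screenshot diff.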
Why This Matters
- System Settings / Simulator UI changes are outside our control and make otherwise-valid tests flaky.
- A fixture app gives us stable selectors and lets us model the patterns real mobile teams actually care about.
- The iOS simulator focus issue is a host-OS-visible regression and should have dedicated end-to-end coverage.
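
One possible shape for the host-focus check, sketched in Python: record the frontmost macOS app, run the automation step, then confirm the frontmost app is unchanged. The helper names (`frontmost_app`, `focus_stolen_by`) are hypothetical. The default probe shells out to `osascript` and System Events, which only works on macOS and may need an automation permission grant on the host; the probe is injectable so the logic can be exercised elsewhere.

```python
import subprocess
import time
from typing import Callable, Optional

# AppleScript that asks System Events for the frontmost process name.
FRONTMOST_SCRIPT = (
    'tell application "System Events" to get name of '
    "first application process whose frontmost is true"
)


def frontmost_app() -> str:
    """Return the frontmost app name via osascript (macOS only)."""
    result = subprocess.run(
        ["osascript", "-e", FRONTMOST_SCRIPT],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def focus_stolen_by(
    action: Callable[[], None],
    probe: Callable[[], str] = frontmost_app,
    settle_seconds: float = 0.0,
) -> Optional[str]:
    """Run `action`; return the app that took focus, or None if focus is unchanged."""
    before = probe()
    action()
    # Optionally give the host a moment to finish any window activation.
    time.sleep(settle_seconds)
    after = probe()
    return after if after != before else None
```

In a regression test, `action` would be the automation step under suspicion (for example, a hypothetical "boot simulator and open fixture app" call), and the test would fail if the function returns anything other than `None`.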
Suggested Starting Point
- Build a small fixture/demo app with stable ids and common mobile UI patterns.
- Migrate smoke and behavior replays from Settings/system-app journeys to that fixture.
- Keep benchmarks as longer, more realistic user journeys.
- Add a focused iOS simulator “does not steal host focus” regression test.
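
To track the migration, a small audit along these lines could list smoke and behavior replays that still target system apps. This is only a sketch: the `Replay` record, its fields, the replay names, and the `com.example.fixture` bundle id are all hypothetical, and treating a `com.apple.` bundle id prefix as "system app" is an assumption. Replays kept deliberately for platform-specific behavior are exempted by a flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replay:
    name: str
    tier: str                        # "smoke", "behavior", or "benchmark"
    bundle_id: str                   # app the replay drives
    platform_specific: bool = False  # kept on a system app on purpose


# Tiers the proposal wants moved onto the fixture app.
MIGRATE_TIERS = {"smoke", "behavior"}


def needs_migration(replay: Replay) -> bool:
    """True for smoke/behavior replays that still drive an Apple system app."""
    return (
        replay.tier in MIGRATE_TIERS
        and replay.bundle_id.startswith("com.apple.")
        and not replay.platform_specific
    )


def pending_migrations(replays: List[Replay]) -> List[str]:
    """Names of replays that should move to the fixture app."""
    return [r.name for r in replays if needs_migration(r)]


# Illustrative inventory (made-up replay names).
replays = [
    Replay("settings-smoke", "smoke", "com.apple.Preferences"),
    Replay("fixture-form-fill", "behavior", "com.example.fixture"),
    Replay("permission-alert", "behavior", "com.apple.Preferences",
           platform_specific=True),
    Replay("settings-long-journey", "benchmark", "com.apple.Preferences"),
]

print(pending_migrations(replays))
```

Running the audit in CI would give a shrinking list to work through, while benchmark journeys and intentionally platform-specific replays stay where they are.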