Summary
packages/core/e2e/dev.test.ts > dev e2e > should rebuild on imported step dependency change is failing intermittently on Windows runners with Turbopack. The other E2E suites (Local Dev, Local Postgres, Local Prod, Vercel Prod) and the Linux unit suite all pass — only the Windows shard hits this.
Symptom
The test rewrites a workflow's imported step file, then polls the manifest for ~50s waiting for the new step name to appear. Each poll attempt also triggers a workflow run; on the failing runs, that trigger request returns HTTP 500:
Failed to trigger workflow "importedStepOnlyWorkflow": 500
Error: Could not parse module '[project]/packages/core/dist/runtime/start.js', file not found
After the timeout vitest reports:
× should rebuild on imported step dependency change ~50s
FAIL packages/core/e2e/dev.test.ts > dev e2e > should rebuild on imported step dependency change
Error: Timed out after 50000ms waiting for manifest.json to include imported step hot-reload marker.
Last error: Failed to trigger workflow "importedStepOnlyWorkflow": 500
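For reference, the polling loop behind this message can be sketched as a small deadline-based helper. This is a hypothetical reconstruction, not the code in dev.test.ts: pollUntil, its option names, and the 1s interval are assumptions. Only the 50s budget and the shape of the timeout message come from the log above. Errors thrown by the check count as "not ready yet", and the last one is reported on timeout.

```typescript
// Hypothetical helper (not the actual dev.test.ts code): poll `check` until it
// returns true or the deadline passes. Errors thrown by `check` are treated as
// "not ready yet" and remembered so the timeout message can report the last one.
async function pollUntil(
  description: string,
  check: () => Promise<boolean>,
  { timeoutMs = 50_000, intervalMs = 1_000 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let lastError: unknown = undefined;
  while (Date.now() < deadline) {
    try {
      if (await check()) return; // condition met, stop polling
    } catch (err) {
      lastError = err; // e.g. a 500 from the workflow trigger
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  const last = lastError instanceof Error ? lastError.message : String(lastError ?? "none");
  throw new Error(`Timed out after ${timeoutMs}ms waiting for ${description}.\nLast error: ${last}`);
}
```

Called as pollUntil("manifest.json to include imported step hot-reload marker", check), a run that never recovers ends with the same two-line error vitest printed above.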
Root cause (already documented in the test)
The test source at packages/core/e2e/dev.test.ts:270-279 already calls this out:
Turbopack on Windows occasionally caches a stale resolver failure (e.g. Could not parse module '@workflow/core/dist/runtime/start.js') after an HMR cascade and returns 500 to every request until something invalidates its cache. Rewriting the api file is enough to force a fresh resolve on the next request, so we treat the 500 as transient and keep polling instead of bailing out.
That workaround was added in #1895 (2f52d14f3). It does help — most runs eventually recover — but on the failing runs the cache doesn't get invalidated within the 50s polling window even with the api-file rewrite.
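If the transient handling should be narrower than "any 500", one option is to keep polling only on the stale-resolver signature and fail fast on everything else. A minimal sketch, assuming the trigger response body contains the Turbopack message shown under Symptom (the log does not confirm where that text surfaces); TriggerFailure and isStaleResolverFailure are hypothetical names.

```typescript
// Hypothetical classifier: a trigger failure counts as the transient Turbopack
// stale-resolver case only if it is a 500 AND the body carries the
// "Could not parse module '...', file not found" message. Both the
// '[project]/...' and '@workflow/core/...' forms of the path match.
const STALE_RESOLVER_PATTERN = /Could not parse module '[^']+', file not found/;

interface TriggerFailure {
  status: number; // HTTP status returned by the workflow trigger
  body: string; // response body text (assumed to contain the resolver error)
}

function isStaleResolverFailure(failure: TriggerFailure): boolean {
  return failure.status === 500 && STALE_RESOLVER_PATTERN.test(failure.body);
}
```

The design choice here is that any other 500 is treated as a real failure, so a genuine regression does not burn the whole polling window before surfacing.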
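Pre-existing, not caused by any specific PR
Failing runs observed so far (run ID, branch, commit):
- 25302018883 on main at 1203dae7
- 25300558041 on main at 059821cb
- 25302467764 on pgp/serialize-abort-signal at 6ec2b957
So this is blocking PR check status for any PR whose Windows shard happens to hit the flake, including main itself.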
Suggestions
- Strengthen the cache-busting heuristic. The test currently rewrites the api file once per failed poll; it could also touch the workflows file, hit /api/_next/clear-cache if available, or restart the dev server after N consecutive 500s.
- Increase the timeout window only on Windows runners (e.g. if (process.platform === 'win32') timeoutMs = 120_000), at the cost of slower CI on real failures.
- Skip the test on Windows with test.skipIf(process.platform === 'win32') (equivalently test.runIf(process.platform !== 'win32')) so it stops blocking PR check status, and track a real fix separately. test.fails is a poor fit for a flake: it inverts the result, so the test would be reported as failing on every run where it happens to pass.
- Investigate Turbopack's resolver-cache invalidation. The underlying issue is that Turbopack on Windows keeps a stale "module not found" entry after the file is actually present. This may have a fix upstream.
In the short term, option 3 (skipping the test on Windows) unblocks PRs; longer term, option 4 (fixing the resolver-cache invalidation) is the right fix.
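The first two suggestions could be combined into an escalation policy keyed on consecutive 500s, plus a platform-dependent polling budget. A sketch under stated assumptions: the action names, the thresholds (2 and 5), and recoveryActions / pollTimeoutMs are all hypothetical, and /api/_next/clear-cache is left out because it is unclear whether that endpoint exists.

```typescript
// Hypothetical escalation policy for option 1, with option 2's timeout folded in.
// Thresholds are illustrative assumptions, not tuned values.
type RecoveryAction = "rewrite-api-file" | "rewrite-workflow-file" | "restart-dev-server";

const TOUCH_WORKFLOW_AFTER = 2; // consecutive 500s before also touching the workflows file
const RESTART_AFTER = 5; // consecutive 500s before restarting the dev server

function recoveryActions(consecutive500s: number): RecoveryAction[] {
  if (consecutive500s <= 0) return [];
  if (consecutive500s >= RESTART_AFTER) return ["restart-dev-server"];
  const actions: RecoveryAction[] = ["rewrite-api-file"]; // what the test does today
  if (consecutive500s >= TOUCH_WORKFLOW_AFTER) actions.push("rewrite-workflow-file");
  return actions;
}

function pollTimeoutMs(platform: string = process.platform): number {
  // Option 2: more room on Windows runners, the existing 50s budget elsewhere.
  return platform === "win32" ? 120_000 : 50_000;
}
```

The caller would reset its counter after any successful trigger and after a restart, and pass pollTimeoutMs() as the polling budget.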
How to reproduce
# In a Windows runner / VM
cd packages/core
pnpm vitest run e2e/dev.test.ts -t "should rebuild on imported step dependency change"
Re-run the suite a few times; expect intermittent failures with the 500 / stale-resolver error.
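To put a number on "a few times", a small driver can rerun the same command and count failures. A sketch, assuming Node with pnpm on PATH and packages/core as the working directory; runTestOnce and measureFlakeRate are hypothetical names. The command is passed as one quoted string with shell: true because on Windows pnpm is a .cmd shim, which Node does not launch without a shell.

```typescript
import { spawnSync } from "node:child_process";

// Same command as the reproduction steps above; the double-quoted -t pattern
// survives both cmd.exe and POSIX sh.
const TEST_COMMAND =
  'pnpm vitest run e2e/dev.test.ts -t "should rebuild on imported step dependency change"';

// Runs the single test once; returns true if vitest exited with status 0.
function runTestOnce(): boolean {
  const result = spawnSync(TEST_COMMAND, { shell: true, stdio: "inherit" });
  return result.status === 0;
}

// Reruns the test `attempts` times and reports how many runs failed.
function measureFlakeRate(attempts: number, runOnce: () => boolean = runTestOnce) {
  let failures = 0;
  for (let i = 1; i <= attempts; i++) {
    const passed = runOnce();
    if (!passed) failures++;
    console.log(`attempt ${i}/${attempts}: ${passed ? "pass" : "FAIL"}`);
  }
  return { attempts, failures, rate: attempts === 0 ? 0 : failures / attempts };
}
```

For example, measureFlakeRate(10) run from packages/core on a Windows VM prints one pass/FAIL line per attempt and returns the failure count and rate. The runOnce parameter exists so the counting logic can be exercised without spawning vitest.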