
Make resumeHook() resilient to transient hook_received event write failures #1834

Open
TooTallNate wants to merge 1 commit into main from resilient-resume-hook

Conversation

@TooTallNate
Member

Summary

Brings resumeHook() to feature-parity with start()'s resilient-start behavior: when events.create('hook_received') fails with a transient 429/5xx but the queue dispatch succeeded, the workflow runtime materializes the missing hook_received event from a payload carried on the queue message.

  • Adds optional hookInput field on WorkflowInvokePayloadSchema (hookId + resumeId + payload) and optional resumeId on HookReceivedEventSchema.eventData. Both additive/optional — no server changes required (verified against workflow-server's hook_received write path).
  • resumeHook() returns ResumedHook = Hook & { resilientResume?: boolean }; the flag is set when the fallback path was taken.
  • Uses sequential events.create → queue (not parallel): hook_received events have no entity-level conflict guard, so a duplicate written before the direct write commits would double-deliver to the workflow. Tradeoff is slightly more latency on the happy path in exchange for correctness.
  • Extracts isRetryableEventError into a shared helper used by both start() and resumeHook().
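
For readers skimming the thread, here is a minimal sketch of the sequential write-then-queue flow described in the bullets above. It is not the code in this PR: `SketchWorld`, `resumeHookSketch`, `isRetryable`, the `runId` field, and the numeric `status` property on errors are all hypothetical stand-ins. It also assumes a non-retryable event error fails fast without queueing.

```typescript
// Hypothetical shapes for illustration; the real World interface is richer.
interface HookInput {
  hookId: string;
  resumeId: string;
  payload: unknown;
}

interface SketchWorld {
  createHookReceived(hookId: string, resumeId: string, payload: unknown): Promise<void>;
  enqueue(message: { runId: string; hookInput?: HookInput }): Promise<void>;
}

// Stand-in for the shared helper. Assumes errors expose a numeric `status`.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === 'number' && (status === 429 || (status >= 500 && status <= 599));
}

async function resumeHookSketch(
  world: SketchWorld,
  runId: string,
  hookId: string,
  payload: unknown,
  mintResumeId: () => string,
): Promise<{ hookId: string; resilientResume?: boolean }> {
  // Client-minted idempotency key shared by both write paths.
  const resumeId = mintResumeId();
  let hookInput: HookInput | undefined;

  try {
    // Step 1: direct write, awaited before anything is queued.
    await world.createHookReceived(hookId, resumeId, payload);
  } catch (err) {
    // Assumption: non-retryable errors propagate and nothing is queued.
    if (!isRetryable(err)) throw err;
    // Retryable: carry the payload on the queue message instead.
    hookInput = { hookId, resumeId, payload };
  }

  // Step 2: queue dispatch. hookInput is present only on the fallback path.
  await world.enqueue(hookInput ? { runId, hookInput } : { runId });
  return hookInput ? { hookId, resilientResume: true } : { hookId };
}
```

Because the queue message is built only after the direct write has settled, the runtime never sees a `hookInput` for a write that succeeded, which is the property the sequential ordering is meant to buy.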

Scope

Does not extend the pattern to other primitives (steps, waits, wakeUp(), cancelRun()) — only start()/resumeHook() meet all three conditions that justify this pattern: externally-initiated + paired-with-queue-dispatch + carries-data-the-runtime-cannot-reconstruct. Step/wait events either run inside durable queue handlers (retry handles them) or carry no payload (a lighter mechanism would suffice).

Test plan

  • Added packages/core/src/runtime/resume-hook.test.ts (9 unit tests): happy path, all failure combinations (retryable/non-retryable events error, queue error, both), sequential-ordering check, legacy-spec fail-fast.
  • Added e2e test resilient resume: hookWorkflow receives payload when hook_received returns 500 in packages/core/e2e/e2e.test.ts — stubs the world to make events.create('hook_received') throw 500 and verifies the workflow still receives the payload.
  • Verified against nextjs-turbopack workbench locally: all 13 hook-related e2e tests pass (including the new one and existing resilient-start).
  • Caught and fixed a dedup race via @workflow/world-testing's hooks test (initial parallel-dispatch approach was double-delivering payloads; switched to sequential).
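
As a rough illustration of the stubbing idea in the test plan (not the actual test code), a wrapper like the one below makes the first N calls to an async function throw an error carrying a status code. The name `failFirstCalls` and the `status` property on the thrown error are assumptions made for this sketch.

```typescript
// Hypothetical test helper: wraps an async function so that its first
// `failures` calls reject with an error carrying the given status code.
function failFirstCalls<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  failures: number,
  status: number,
): (...args: A) => Promise<R> {
  let remaining = failures;
  return async (...args: A): Promise<R> => {
    if (remaining > 0) {
      remaining -= 1;
      // Assumed error shape: a plain Error with a numeric `status` field.
      throw Object.assign(new Error(`stubbed failure with status ${status}`), { status });
    }
    // Once the failure budget is spent, delegate to the real function.
    return fn(...args);
  };
}
```

Wrapping only the event write and leaving the queue untouched reproduces the "event write fails, queue dispatch succeeds" combination that the resilient path targets.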

…ilures

When events.create('hook_received') fails with a retryable error (429/5xx),
resumeHook() now dispatches the queue message with a `hookInput` payload
carrying the dehydrated hook payload. The workflow runtime materializes the
missing hook_received event from that payload on its next delivery, mirroring
the existing resilient-start behavior of start() / run_created / run_started.

Returned Hook carries a new `resilientResume: true` flag when the fallback
path was taken. Both write paths share a client-minted `resumeId` as an
idempotency key so the runtime can dedup if the direct write actually
committed but the client saw a transient error.

Uses a sequential write-then-queue flow (not parallel) to avoid a dedup race
on the happy path: hook_received events have no entity-level conflict guard
(unlike run_created), so a duplicate written before the direct write commits
would double-deliver the payload to the workflow.
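
A minimal sketch of the runtime-side dedup described in this commit message, assuming a simplified in-memory event log. The real runtime writes through the world's event API, and the type and function names here (`HookReceivedEvent`, `materializeHookReceived`) are illustrative only.

```typescript
interface HookInput {
  hookId: string;
  resumeId: string;
  payload: unknown;
}

// Simplified event shape; only the fields needed for the dedup check.
interface HookReceivedEvent {
  eventType: 'hook_received';
  hookId: string;
  eventData: { payload: unknown; resumeId?: string };
}

// Returns the log the replay should see. If the queue message carried a
// hookInput and no existing event has the same resumeId, one is appended.
function materializeHookReceived(
  log: readonly HookReceivedEvent[],
  hookInput?: HookInput,
): HookReceivedEvent[] {
  if (!hookInput) return [...log];

  const alreadyMaterialized = log.some(
    (e) => e.eventType === 'hook_received' && e.eventData.resumeId === hookInput.resumeId,
  );
  if (alreadyMaterialized) return [...log];

  return [
    ...log,
    {
      eventType: 'hook_received',
      hookId: hookInput.hookId,
      eventData: { payload: hookInput.payload, resumeId: hookInput.resumeId },
    },
  ];
}
```

Redelivering the same queue message is harmless in this sketch, because the second pass finds the `resumeId` already present and appends nothing.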
Copilot AI review requested due to automatic review settings April 23, 2026 06:52
@vercel
Contributor

vercel Bot commented Apr 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| example-nextjs-workflow-turbopack | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| example-nextjs-workflow-webpack | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| example-workflow | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| workbench-astro-workflow | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| workbench-express-workflow | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| workbench-fastify-workflow | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| workbench-hono-workflow | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| workbench-nitro-workflow | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| workbench-nuxt-workflow | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| workbench-sveltekit-workflow | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| workbench-vite-workflow | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| workflow-docs | Ready | Preview, Comment, Open in v0 | Apr 23, 2026 6:56am |
| workflow-swc-playground | Ready | Preview, Comment | Apr 23, 2026 6:56am |
| workflow-web | Ready | Preview, Comment | Apr 23, 2026 6:56am |

@changeset-bot

changeset-bot Bot commented Apr 23, 2026

🦋 Changeset detected

Latest commit: 63ceeb2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 21 packages
Name Type
@workflow/core Minor
workflow Minor
@workflow/world Minor
@workflow/builders Patch
@workflow/cli Patch
@workflow/next Patch
@workflow/nitro Patch
@workflow/vitest Patch
@workflow/web-shared Patch
@workflow/web Patch
@workflow/world-testing Patch
@workflow/ai Major
@workflow/world-local Patch
@workflow/world-postgres Patch
@workflow/world-vercel Patch
@workflow/astro Patch
@workflow/nest Patch
@workflow/rollup Patch
@workflow/sveltekit Patch
@workflow/vite Patch
@workflow/nuxt Patch

@github-actions
Contributor

github-actions Bot commented Apr 23, 2026

📊 Benchmark Results

📈 Comparing against baseline from main branch. Green 🟢 = faster, Red 🔺 = slower.

workflow with no steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Nitro 0.043s (~) 1.005s (~) 0.962s 10 1.00x
💻 Local Express 0.046s (+3.8%) 1.005s (~) 0.959s 10 1.07x
💻 Local Next.js (Turbopack) 0.048s 1.006s 0.958s 10 1.11x
🐘 Postgres Next.js (Turbopack) 0.054s 1.009s 0.956s 10 1.25x
🐘 Postgres Nitro 0.059s (-38.6% 🟢) 1.011s (-3.1%) 0.952s 10 1.36x
🐘 Postgres Express 0.064s (+10.5% 🔺) 1.010s (~) 0.946s 10 1.49x
workflow with 1 step

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 1.113s 2.005s 0.892s 10 1.00x
💻 Local Nitro 1.126s (~) 2.007s (~) 0.881s 10 1.01x
🐘 Postgres Next.js (Turbopack) 1.128s 2.009s 0.880s 10 1.01x
💻 Local Express 1.133s (+0.7%) 2.006s (~) 0.873s 10 1.02x
🐘 Postgres Express 1.140s (-0.6%) 2.010s (~) 0.870s 10 1.02x
🐘 Postgres Nitro 1.155s (+1.3%) 2.010s (~) 0.856s 10 1.04x
workflow with 10 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 10.827s 11.023s 0.196s 3 1.00x
🐘 Postgres Next.js (Turbopack) 10.834s 11.022s 0.188s 3 1.00x
🐘 Postgres Express 10.855s (-1.0%) 11.018s (~) 0.163s 3 1.00x
🐘 Postgres Nitro 10.924s (~) 11.024s (~) 0.100s 3 1.01x
💻 Local Nitro 10.925s (~) 11.022s (~) 0.097s 3 1.01x
💻 Local Express 11.009s (+0.8%) 11.694s (+6.1% 🔺) 0.684s 3 1.02x
workflow with 25 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 14.469s 15.025s 0.557s 4 1.00x
🐘 Postgres Nitro 14.550s (~) 15.024s (~) 0.475s 4 1.01x
💻 Local Next.js (Turbopack) 14.600s 15.027s 0.428s 4 1.01x
🐘 Postgres Express 14.625s (~) 15.023s (~) 0.398s 4 1.01x
💻 Local Nitro 14.988s (~) 15.028s (-6.3% 🟢) 0.040s 4 1.04x
💻 Local Express 15.078s (+0.7%) 16.033s (+6.7% 🔺) 0.956s 4 1.04x
workflow with 50 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 13.631s 14.020s 0.389s 7 1.00x
🐘 Postgres Nitro 13.966s (~) 14.451s (+1.0%) 0.485s 7 1.02x
🐘 Postgres Express 14.005s (~) 14.451s (-1.0%) 0.446s 7 1.03x
💻 Local Next.js (Turbopack) 16.037s 16.529s 0.493s 6 1.18x
💻 Local Express 16.718s (+0.7%) 17.034s (~) 0.316s 6 1.23x
💻 Local Nitro 16.758s (~) 17.032s (~) 0.274s 6 1.23x
Promise.all with 10 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 1.215s 2.010s 0.795s 15 1.00x
🐘 Postgres Nitro 1.273s (~) 2.011s (~) 0.737s 15 1.05x
🐘 Postgres Express 1.275s (+1.1%) 2.009s (~) 0.734s 15 1.05x
💻 Local Express 1.492s (~) 2.006s (~) 0.513s 15 1.23x
💻 Local Next.js (Turbopack) 1.515s 2.007s 0.491s 15 1.25x
💻 Local Nitro 1.529s (-6.3% 🟢) 2.006s (-3.3%) 0.477s 15 1.26x
Promise.all with 25 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Nitro 2.327s (-1.0%) 3.009s (~) 0.682s 10 1.00x
🐘 Postgres Express 2.356s (~) 3.009s (~) 0.653s 10 1.01x
🐘 Postgres Next.js (Turbopack) 2.404s 3.008s 0.604s 10 1.03x
💻 Local Next.js (Turbopack) 2.813s 3.008s 0.196s 10 1.21x
💻 Local Express 2.921s (-1.1%) 3.308s (-4.2%) 0.387s 10 1.26x
💻 Local Nitro 3.146s (~) 3.886s (~) 0.739s 8 1.35x
Promise.all with 50 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Nitro 3.476s (~) 4.011s (~) 0.535s 8 1.00x
🐘 Postgres Express 3.483s (~) 4.011s (~) 0.528s 8 1.00x
🐘 Postgres Next.js (Turbopack) 3.677s 4.011s 0.334s 8 1.06x
💻 Local Express 7.761s (-6.9% 🟢) 8.021s (-11.1% 🟢) 0.260s 4 2.23x
💻 Local Next.js (Turbopack) 8.167s 8.771s 0.604s 4 2.35x
💻 Local Nitro 8.398s (+0.6%) 9.022s (~) 0.625s 4 2.42x
Promise.race with 10 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 1.235s 2.008s 0.773s 15 1.00x
🐘 Postgres Express 1.258s (~) 2.009s (~) 0.751s 15 1.02x
🐘 Postgres Nitro 1.271s (+1.1%) 2.008s (~) 0.737s 15 1.03x
💻 Local Next.js (Turbopack) 1.517s 2.007s 0.489s 15 1.23x
💻 Local Nitro 1.527s (-18.1% 🟢) 2.006s (-14.3% 🟢) 0.479s 15 1.24x
💻 Local Express 1.555s (-17.9% 🟢) 2.006s (-15.2% 🟢) 0.450s 15 1.26x
Promise.race with 25 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Express 2.325s (-0.7%) 3.010s (~) 0.685s 10 1.00x
🐘 Postgres Nitro 2.347s (~) 3.009s (~) 0.662s 10 1.01x
🐘 Postgres Next.js (Turbopack) 2.365s 3.010s 0.645s 10 1.02x
💻 Local Next.js (Turbopack) 2.952s 3.676s 0.724s 9 1.27x
💻 Local Express 3.111s (-0.7%) 3.886s (+3.3%) 0.775s 8 1.34x
💻 Local Nitro 3.196s (+4.3%) 4.011s (+3.2%) 0.815s 8 1.37x
Promise.race with 50 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Nitro 3.469s (~) 4.011s (~) 0.542s 8 1.00x
🐘 Postgres Express 3.483s (~) 4.011s (~) 0.527s 8 1.00x
🐘 Postgres Next.js (Turbopack) 3.621s 4.011s 0.390s 8 1.04x
💻 Local Next.js (Turbopack) 8.024s 8.518s 0.494s 4 2.31x
💻 Local Nitro 8.991s (-1.7%) 9.774s (-2.5%) 0.783s 4 2.59x
💻 Local Express 9.273s (+5.4% 🔺) 10.026s (+8.1% 🔺) 0.753s 3 2.67x
workflow with 10 sequential data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 0.745s 1.006s 0.261s 60 1.00x
🐘 Postgres Nitro 0.809s (-1.4%) 1.006s (~) 0.197s 60 1.09x
🐘 Postgres Express 0.823s (-2.0%) 1.006s (-1.6%) 0.184s 60 1.10x
💻 Local Next.js (Turbopack) 0.846s 1.021s 0.175s 59 1.14x
💻 Local Nitro 0.989s (+0.9%) 1.250s (+14.3% 🔺) 0.260s 49 1.33x
💻 Local Express 1.033s (+5.0%) 1.881s (+74.8% 🔺) 0.848s 32 1.39x
workflow with 25 sequential data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 1.815s 2.007s 0.191s 45 1.00x
🐘 Postgres Express 1.964s (-0.6%) 2.257s (~) 0.293s 40 1.08x
🐘 Postgres Nitro 1.983s (+2.9%) 2.284s (+8.7% 🔺) 0.300s 40 1.09x
💻 Local Next.js (Turbopack) 2.728s 3.008s 0.280s 30 1.50x
💻 Local Nitro 3.062s (+0.9%) 3.884s (+3.3%) 0.823s 24 1.69x
💻 Local Express 3.133s (+3.9%) 3.885s (+8.3% 🔺) 0.752s 24 1.73x
workflow with 50 sequential data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 3.683s 4.010s 0.326s 30 1.00x
🐘 Postgres Nitro 3.898s (-5.0%) 4.148s (-9.9% 🟢) 0.249s 29 1.06x
🐘 Postgres Express 3.955s (-0.9%) 4.295s (-1.7%) 0.340s 28 1.07x
💻 Local Next.js (Turbopack) 8.741s 9.017s 0.277s 14 2.37x
💻 Local Express 9.026s (-2.0%) 9.556s (-4.6%) 0.531s 13 2.45x
💻 Local Nitro 9.240s (-0.6%) 10.018s (~) 0.778s 12 2.51x
workflow with 10 concurrent data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 0.243s 1.007s 0.764s 60 1.00x
🐘 Postgres Nitro 0.286s (+0.9%) 1.007s (~) 0.721s 60 1.18x
🐘 Postgres Express 0.287s (+1.7%) 1.007s (~) 0.720s 60 1.18x
💻 Local Next.js (Turbopack) 0.570s 1.004s 0.434s 60 2.35x
💻 Local Express 0.583s (+4.0%) 1.022s (+1.7%) 0.439s 59 2.40x
💻 Local Nitro 0.593s (-1.9%) 1.004s (-1.7%) 0.411s 60 2.44x
workflow with 25 concurrent data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 0.464s 1.006s 0.542s 90 1.00x
🐘 Postgres Nitro 0.495s (~) 1.006s (~) 0.512s 90 1.07x
🐘 Postgres Express 0.508s (~) 1.007s (~) 0.499s 90 1.09x
💻 Local Express 2.478s (-1.4%) 3.009s (~) 0.530s 30 5.34x
💻 Local Nitro 2.587s (+1.9%) 3.010s (~) 0.422s 30 5.58x
💻 Local Next.js (Turbopack) 2.588s 3.009s 0.421s 30 5.58x
workflow with 50 concurrent data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 0.728s 1.005s 0.277s 120 1.00x
🐘 Postgres Nitro 0.791s (~) 1.008s (~) 0.217s 120 1.09x
🐘 Postgres Express 0.794s (-3.1%) 1.008s (-0.9%) 0.215s 120 1.09x
💻 Local Next.js (Turbopack) 10.717s 11.299s 0.581s 11 14.72x
💻 Local Express 10.864s (-2.9%) 11.573s (-3.1%) 0.709s 11 14.92x
💻 Local Nitro 11.357s (+1.5%) 12.031s (+3.1%) 0.674s 10 15.59x
Stream Benchmarks (includes TTFB metrics)
workflow with stream

💻 Local Development

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 0.183s 1.003s 0.012s 1.018s 0.835s 10 1.00x
🐘 Postgres Next.js (Turbopack) 0.186s 1.001s 0.002s 1.010s 0.824s 10 1.02x
🐘 Postgres Nitro 0.201s (-1.8%) 0.997s (~) 0.002s (+6.7% 🔺) 1.010s (~) 0.809s 10 1.10x
💻 Local Nitro 0.202s (-5.3% 🟢) 1.004s (~) 0.013s (~) 1.019s (~) 0.816s 10 1.11x
🐘 Postgres Express 0.205s (~) 1.000s (~) 0.001s (-12.5% 🟢) 1.010s (~) 0.805s 10 1.12x
💻 Local Express 0.223s (+12.1% 🔺) 1.004s (~) 0.011s (-12.4% 🟢) 1.017s (~) 0.794s 10 1.22x
stream pipeline with 5 transform steps (1MB)

💻 Local Development

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 0.601s 1.027s 0.004s 1.039s 0.439s 58 1.00x
🐘 Postgres Nitro 0.607s (-2.7%) 1.005s (~) 0.004s (~) 1.022s (~) 0.415s 59 1.01x
🐘 Postgres Express 0.620s (-1.6%) 1.006s (~) 0.004s (+11.5% 🔺) 1.022s (~) 0.402s 59 1.03x
💻 Local Next.js (Turbopack) 0.665s 1.012s 0.010s 1.024s 0.359s 59 1.11x
💻 Local Express 0.783s (+3.5%) 1.013s (-1.6%) 0.010s (+1.9%) 1.024s (-1.5%) 0.241s 59 1.30x
💻 Local Nitro 0.965s (+15.0% 🔺) 1.011s (~) 0.011s (+13.5% 🔺) 1.228s (+10.1% 🔺) 0.263s 49 1.61x
10 parallel streams (1MB each)

💻 Local Development

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 0.899s 1.072s 0.000s 1.082s 0.182s 56 1.00x
🐘 Postgres Express 0.964s (~) 1.220s (-4.5%) 0.000s (-6.1% 🟢) 1.232s (-5.7% 🟢) 0.269s 49 1.07x
🐘 Postgres Nitro 0.970s (~) 1.169s (-6.3% 🟢) 0.000s (-100.0% 🟢) 1.184s (-5.8% 🟢) 0.214s 51 1.08x
💻 Local Nitro 1.234s (+1.0%) 2.020s (~) 0.000s (+233.3% 🔺) 2.022s (~) 0.788s 30 1.37x
💻 Local Express 1.242s (+1.4%) 2.022s (~) 0.000s (-20.0% 🟢) 2.024s (~) 0.782s 30 1.38x
💻 Local Next.js (Turbopack) 1.269s 2.021s 0.000s 2.024s 0.755s 30 1.41x
fan-out fan-in 10 streams (1MB each)

💻 Local Development

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 1.750s 2.146s 0.000s 2.152s 0.402s 28 1.00x
🐘 Postgres Nitro 1.820s (+1.6%) 2.139s (~) 0.000s (+100.0% 🔺) 2.152s (-1.0%) 0.332s 28 1.04x
🐘 Postgres Express 1.823s (+2.9%) 2.179s (~) 0.000s (+Infinity% 🔺) 2.189s (~) 0.366s 28 1.04x
💻 Local Express 3.438s (-0.8%) 4.034s (~) 0.001s (+58.3% 🔺) 4.038s (~) 0.599s 15 1.96x
💻 Local Nitro 3.503s (+3.4%) 4.099s (+1.7%) 0.000s (-37.5% 🟢) 4.101s (+1.6%) 0.598s 15 2.00x
💻 Local Next.js (Turbopack) 3.693s 4.098s 0.001s 4.102s 0.409s 15 2.11x

Summary

Fastest Framework by World

Winner determined by most benchmark wins

World 🥇 Fastest Framework Wins
💻 Local Next.js (Turbopack) 15/21
🐘 Postgres Next.js (Turbopack) 17/21
Fastest World by Framework

Winner determined by most benchmark wins

Framework 🥇 Fastest World Wins
Express 🐘 Postgres 19/21
Next.js (Turbopack) 🐘 Postgres 17/21
Nitro 🐘 Postgres 19/21
Column Definitions
  • Workflow Time: Runtime reported by workflow (completedAt - createdAt) - primary metric
  • TTFB: Time to First Byte - time from workflow start until first stream byte received (stream benchmarks only)
  • Slurp: Time from first byte to complete stream consumption (stream benchmarks only)
  • Wall Time: Total testbench time (trigger workflow + poll for result)
  • Overhead: Testbench overhead (Wall Time - Workflow Time)
  • Samples: Number of benchmark iterations run
  • vs Fastest: How much slower compared to the fastest configuration for this benchmark
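
To make the derived columns concrete, here is a small worked example using two rows from the "workflow with no steps" table. It is a sketch of the arithmetic in the definitions above, not the benchmark harness itself; the `Row` type and function names are made up for illustration. The bot presumably computes ratios from unrounded timings, so recomputing from the rounded figures shown can differ in the last digit for some rows.

```typescript
interface Row {
  framework: string;
  workflowTime: number; // seconds
  wallTime: number; // seconds
}

// Overhead = Wall Time - Workflow Time, shown with millisecond precision.
function overhead(row: Row): string {
  return (row.wallTime - row.workflowTime).toFixed(3);
}

// vs Fastest = this row's Workflow Time divided by the smallest one in the group.
function vsFastest(row: Row, rows: readonly Row[]): string {
  const fastest = Math.min(...rows.map((r) => r.workflowTime));
  return `${(row.workflowTime / fastest).toFixed(2)}x`;
}

const noStepRows: Row[] = [
  { framework: 'Nitro', workflowTime: 0.043, wallTime: 1.005 },
  { framework: 'Express', workflowTime: 0.046, wallTime: 1.005 },
];

for (const row of noStepRows) {
  console.log(row.framework, overhead(row), vsFastest(row, noStepRows));
}
// prints "Nitro 0.962 1.00x" then "Express 0.959 1.07x"
```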

Worlds:

  • 💻 Local: In-memory filesystem world (local development)
  • 🐘 Postgres: PostgreSQL database world (local development)
  • ▲ Vercel: Vercel production/preview deployment
  • 🌐 Turso: Community world (local development)
  • 🌐 MongoDB: Community world (local development)
  • 🌐 Redis: Community world (local development)
  • 🌐 Jazz: Community world (local development)

📋 View full workflow run


Some benchmark jobs failed:

  • Local: success
  • Postgres: success
  • Vercel: failure

Check the workflow run for details.

@github-actions
Contributor

github-actions Bot commented Apr 23, 2026

🧪 E2E Test Results

Some tests failed

Summary

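| | Passed | Failed | Skipped | Total |
| --- | --- | --- | --- | --- |
| ❌ 💻 Local Development | 1064 | 2 | 86 | 1152 |
| ✅ 📦 Local Production | 1066 | 0 | 86 | 1152 |
| ✅ 🐘 Local Postgres | 1066 | 0 | 86 | 1152 |
| ✅ 📋 Other | 270 | 0 | 18 | 288 |
| Total | 3466 | 2 | 276 | 3744 |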

❌ Failed Tests

💻 Local Development (2 failed)

vite-stable (2 failed):

  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack

Details by Category

❌ 💻 Local Development
App Passed Failed Skipped
✅ astro-stable 90 0 6
✅ express-stable 90 0 6
✅ fastify-stable 90 0 6
✅ hono-stable 90 0 6
✅ nextjs-turbopack-canary 77 0 19
✅ nextjs-turbopack-stable 96 0 0
✅ nextjs-webpack-canary 77 0 19
✅ nextjs-webpack-stable 96 0 0
✅ nitro-stable 90 0 6
✅ nuxt-stable 90 0 6
✅ sveltekit-stable 90 0 6
❌ vite-stable 88 2 6
✅ 📦 Local Production
App Passed Failed Skipped
✅ astro-stable 90 0 6
✅ express-stable 90 0 6
✅ fastify-stable 90 0 6
✅ hono-stable 90 0 6
✅ nextjs-turbopack-canary 77 0 19
✅ nextjs-turbopack-stable 96 0 0
✅ nextjs-webpack-canary 77 0 19
✅ nextjs-webpack-stable 96 0 0
✅ nitro-stable 90 0 6
✅ nuxt-stable 90 0 6
✅ sveltekit-stable 90 0 6
✅ vite-stable 90 0 6
✅ 🐘 Local Postgres
App Passed Failed Skipped
✅ astro-stable 90 0 6
✅ express-stable 90 0 6
✅ fastify-stable 90 0 6
✅ hono-stable 90 0 6
✅ nextjs-turbopack-canary 77 0 19
✅ nextjs-turbopack-stable 96 0 0
✅ nextjs-webpack-canary 77 0 19
✅ nextjs-webpack-stable 96 0 0
✅ nitro-stable 90 0 6
✅ nuxt-stable 90 0 6
✅ sveltekit-stable 90 0 6
✅ vite-stable 90 0 6
✅ 📋 Other
App Passed Failed Skipped
✅ e2e-local-dev-nest-stable 90 0 6
✅ e2e-local-postgres-nest-stable 90 0 6
✅ e2e-local-prod-nest-stable 90 0 6

📋 View full workflow run


Some E2E test jobs failed:

  • Vercel Prod: failure
  • Local Dev: failure
  • Local Prod: success
  • Local Postgres: success
  • Windows: cancelled

Check the workflow run for details.

Contributor

Copilot AI left a comment

Pull request overview

Adds resilient behavior to resumeHook() to mirror start()’s resilient-start pattern: if hook_received event creation fails transiently (429/5xx) but queue dispatch succeeds, the runtime can reconstruct (“materialize”) the missing hook_received event from data carried on the queue message.

Changes:

  • Extend queue payload and hook event schemas to optionally carry hookInput (hookId, resumeId, payload) and resumeId for dedup/materialization.
  • Update resumeHook() to mint a resumeId, attempt a direct hook_received write first, and fall back to queue-carried hookInput only on retryable event-write failures.
  • Add runtime-side materialization logic plus unit/e2e tests, and extract isRetryableEventError for shared use.
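
To illustrate why an optional `hookInput` is backward compatible, here is a hand-written guard over a simplified invoke payload. This is only a stand-in: the PR's actual `HookInputSchema` and `WorkflowInvokePayloadSchema` are not reproduced here, and the `runId` field and `parseInvokePayload` name are assumptions for the sketch.

```typescript
interface HookInput {
  hookId: string;
  resumeId: string;
  payload: unknown;
}

// Simplified payload: `hookInput` is optional, so older messages still parse.
interface WorkflowInvokePayload {
  runId: string;
  hookInput?: HookInput;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

function parseInvokePayload(raw: unknown): WorkflowInvokePayload {
  if (!isRecord(raw)) throw new Error('invalid invoke payload');
  const { runId, hookInput } = raw;
  if (typeof runId !== 'string') throw new Error('invalid invoke payload');

  // Messages produced before this change simply omit the field.
  if (hookInput === undefined) return { runId };

  if (!isRecord(hookInput)) throw new Error('invalid hookInput');
  const { hookId, resumeId } = hookInput;
  if (typeof hookId !== 'string' || typeof resumeId !== 'string' || !('payload' in hookInput)) {
    throw new Error('invalid hookInput');
  }
  return { runId, hookInput: { hookId, resumeId, payload: hookInput.payload } };
}
```

A message without `hookInput` and a message with a complete one both parse; only a malformed `hookInput` is rejected.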

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| packages/world/src/queue.ts | Adds HookInputSchema and optional hookInput on workflow invoke payloads. |
| packages/world/src/events.ts | Adds optional resumeId to hook_received event data for dedup. |
| packages/core/src/telemetry/semantic-conventions.ts | Adds span attributes for resilient resume and materialization. |
| packages/core/src/runtime/start.ts | Switches to shared isRetryableEventError helper. |
| packages/core/src/runtime/resume-hook.ts | Implements sequential write-then-queue behavior and resilient fallback signaling. |
| packages/core/src/runtime/resume-hook.test.ts | Adds unit coverage for resilient resume behavior and ordering. |
| packages/core/src/runtime/helpers.ts | Extracts shared isRetryableEventError. |
| packages/core/src/runtime.ts | Materializes missing hook_received from hookInput during workflow execution. |
| packages/core/e2e/e2e.test.ts | Adds e2e validating payload delivery when hook_received write fails with 500. |
| .changeset/resilient-resume-hook.md | Declares minor bumps and documents the new resilient resume behavior. |

Comment on lines +98 to +106
* ## Resilient resume
*
* `resumeHook()` fires the `hook_received` event creation and the workflow
* queue dispatch in parallel. If the event creation fails with a retryable
* error (429/5xx) but the queue dispatch succeeds, the workflow runtime will
* materialize the missing `hook_received` event from the payload carried on
* the queue message — the returned hook has `resilientResume: true` to
* signal this fallback path was taken. This mirrors the resilient-start
* behavior of {@link start}.

Copilot AI Apr 23, 2026

The JSDoc for the “Resilient resume” section says resumeHook() fires events.create and the queue dispatch in parallel, but the implementation below is explicitly sequential (events.create first, then queue) to avoid the dedup race. Update the doc comment to match the actual behavior so callers/operators aren’t misled about ordering/latency and failure modes.

Comment on lines +456 to +464
// When `resumeHook()` fires its hook_received event write and
// queue dispatch in parallel, the event write may fail with
// a transient 429/5xx while the queue dispatch succeeds.
// In that case `hookInput` is present on the queue payload,
// carrying the dehydrated payload + a client-minted
// idempotency key (`resumeId`). If no existing hook_received
// event already carries that `resumeId`, we materialize one
// here so the workflow replay sees the payload. Mirrors
// `start()`'s resilient path for run_created → run_started.

Copilot AI Apr 23, 2026

This comment explains resilient resume in terms of resumeHook() writing the event and queueing “in parallel”, but resumeHook() is now sequential specifically to avoid the duplicate-materialization race. Please update this block to reflect the current ordering and the actual condition for hookInput being present (only when the direct write failed with a retryable error).

Suggested change
- // When `resumeHook()` fires its hook_received event write and
- // queue dispatch in parallel, the event write may fail with
- // a transient 429/5xx while the queue dispatch succeeds.
- // In that case `hookInput` is present on the queue payload,
- // carrying the dehydrated payload + a client-minted
- // idempotency key (`resumeId`). If no existing hook_received
- // event already carries that `resumeId`, we materialize one
- // here so the workflow replay sees the payload. Mirrors
- // `start()`'s resilient path for run_created → run_started.
+ // `resumeHook()` now tries to write `hook_received` first and
+ // only enqueues a resume carrying `hookInput` if that direct
+ // write fails with a retryable error (for example, a transient
+ // 429/5xx). In that recovery path, `hookInput` contains the
+ // dehydrated payload plus the client-minted idempotency key
+ // (`resumeId`). If no existing `hook_received` event already
+ // carries that `resumeId`, we materialize one here so replay
+ // can see the payload while avoiding duplicate
+ // materialization.

Comment on lines +270 to +279

describe('isRetryableEventError', () => {
// Indirectly tested via resumeHook above. The helper is also unit-covered
// via start.test.ts's resilient start suite; no duplicate tests needed.
it('is exercised via resumeHook resilient resume tests', () => {
expect(SPEC_VERSION_CURRENT).toBeGreaterThanOrEqual(
SPEC_VERSION_SUPPORTS_CBOR_QUEUE_TRANSPORT
);
});
});

Copilot AI Apr 23, 2026

The describe('isRetryableEventError') block doesn’t actually test isRetryableEventError—it only asserts a relationship between spec version constants. This is brittle (unrelated constant changes could fail the suite) and provides no coverage of the helper’s behavior. Either remove this block or replace it with direct unit tests for isRetryableEventError (e.g., in helpers.test.ts).

Suggested change
- describe('isRetryableEventError', () => {
- // Indirectly tested via resumeHook above. The helper is also unit-covered
- // via start.test.ts's resilient start suite; no duplicate tests needed.
- it('is exercised via resumeHook resilient resume tests', () => {
- expect(SPEC_VERSION_CURRENT).toBeGreaterThanOrEqual(
- SPEC_VERSION_SUPPORTS_CBOR_QUEUE_TRANSPORT
- );
- });
- });
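
If the block is replaced rather than removed, direct coverage could look roughly like the table-driven check below. This is a sketch only: `isRetryableStatusError` is a stand-in written for this example (the real `isRetryableEventError` signature and error shape are not shown in this thread), and it assumes errors expose a numeric `status`.

```typescript
// Stand-in helper: treats 429 and 5xx as transient. Assumed error shape.
function isRetryableStatusError(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === 'number' && (status === 429 || (status >= 500 && status <= 599));
}

const withStatus = (status: unknown): Error => Object.assign(new Error('stub'), { status });

const retryableCases: Array<{ name: string; err: unknown; expected: boolean }> = [
  { name: '429 rate limited', err: withStatus(429), expected: true },
  { name: '500 server error', err: withStatus(500), expected: true },
  { name: '503 unavailable', err: withStatus(503), expected: true },
  { name: '400 bad request', err: withStatus(400), expected: false },
  { name: '404 not found', err: withStatus(404), expected: false },
  { name: '409 conflict', err: withStatus(409), expected: false },
  { name: 'error without status', err: new Error('plain'), expected: false },
  { name: 'non-numeric status', err: withStatus('500'), expected: false },
  { name: 'non-object value', err: 'oops', expected: false },
];

// Returns the names of the cases whose result differs from the expectation.
function failingCases(): string[] {
  return retryableCases
    .filter((c) => isRetryableStatusError(c.err) !== c.expected)
    .map((c) => c.name);
}
```

In a real suite each row would become its own test case so a failure names the offending input.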

@pranaygp pranaygp requested a review from VaguelySerious April 23, 2026 16:21
Member

@VaguelySerious VaguelySerious left a comment

LGTM, seems like a straight improvement.

I wish we had world-side constraints that allowed us to parallelize the calls like with start. Alternatively, we could completely drop the event creation and just do the queue, but that wouldn't be backwards compatible, so it'd be a bigger headache. So current PR state is fine.

Comment on lines +229 to +233
// First, attempt the direct hook_received event write. This is
// sequential (not parallel with queue dispatch) to avoid a race
// where the queue handler processes the message before the event
// write has committed, which would otherwise cause the runtime
// fallback to materialize a duplicate hook_received event.
Member

Would it? We don't have this issue with lazy start, and that's what the idempotency key is for, right? Or I guess it would require World-side support and we're avoiding that to keep the scope small?

Comment on lines +296 to +297
'Hook event creation failed, but the workflow was re-triggered via the queue. ' +
'The hook_received event will be materialized by the runtime via the resilient resume path.',
Member

A bit verbose

Suggested change
- 'Hook event creation failed, but the workflow was re-triggered via the queue. ' +
- 'The hook_received event will be materialized by the runtime via the resilient resume path.',
+ 'hook_received event could not immediately be created, re-trying via queue.',

(e.eventData as { resumeId?: string } | undefined)
?.resumeId === hookInput.resumeId
);
if (!alreadyMaterialized) {
Member

I'd usually say this isn't safe (TOCTOU race): We should pass resumeId to the world and the world should enforce idempotency if possible. I know it's hard in this case, because the World might not be able to enforce uniqueness on resumeId during insert.

However, since events.create -> queue is in sequence, and we only send hookInput if the former fails, this seems like a really niche extra check that doesn't hurt, though I'd assume alreadyMaterialized to always be false (unless there's another race condition) given the above
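
For context on the world-side alternative mentioned above, here is a rough sketch of what enforcing idempotency on `resumeId` at insert time could look like. It is purely hypothetical and not part of this PR or any existing World API: `IdempotentHookEventStore` and `createOnce` are invented names, and an in-memory `Map` stands in for what would be a unique constraint or conditional write in a real store.

```typescript
interface StoredHookEvent {
  hookId: string;
  resumeId: string;
  payload: unknown;
}

// Hypothetical store that makes "insert if resumeId is new" a single operation,
// instead of a separate read (check) followed by a write (use).
class IdempotentHookEventStore {
  private readonly byResumeId = new Map<string, StoredHookEvent>();

  createOnce(event: StoredHookEvent): { created: boolean; event: StoredHookEvent } {
    const existing = this.byResumeId.get(event.resumeId);
    if (existing) {
      // A second write with the same key is a no-op that returns the original.
      return { created: false, event: existing };
    }
    this.byResumeId.set(event.resumeId, event);
    return { created: true, event };
  }

  get size(): number {
    return this.byResumeId.size;
  }
}
```

In this single-threaded sketch the check and the insert cannot interleave with another writer. A database-backed world would need the equivalent guarantee from the storage layer, which is the part the comment above notes may be hard to provide.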

3 participants