Description
Using Workflow DevKit with Postgres World, I’m observing non-durable behavior for sleep() when the worker/server is down at the moment the sleep duration expires.
According to the docs:
- sleep: Suspends a workflow for a specified duration or until an end date without consuming any resources. Once the duration or end date passes, the workflow will resume execution.
- Postgres World is described as a long‑running, persistent world using PostgreSQL + pg-boss.
Based on this, I expect sleep() to be durable across process crashes, even if the worker is offline at the exact wake-up time.
Observed behavior:
- If the worker comes back before the sleep duration elapses, the workflow resumes as expected.
- If the worker comes back after the sleep duration has already passed, the workflow never resumes and the run remains stuck in a running state.
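The expectation can be stated as a small model. If a worker is running at the wake-up time, the run resumes then. If not, it should resume as soon as a worker comes back. The sketch below is hypothetical: `expectedResumeAt` and the `Outage` shape are illustrative names, not Workflow DevKit APIs, and it assumes worker outages do not overlap or touch.

```typescript
// Hypothetical model of the durable-sleep semantics expected from the docs.
// `Outage` and `expectedResumeAt` are illustrative names, not Workflow DevKit APIs.

/** A period with no running worker: [downAt, upAt), in epoch milliseconds. */
interface Outage {
  downAt: number;
  upAt: number;
}

/**
 * Earliest time a sleeping run should resume. Assumes outages do not
 * overlap or touch, so a worker is up again at each `upAt`.
 */
function expectedResumeAt(
  sleepStart: number,
  durationMs: number,
  outages: Outage[],
): number {
  const wakeAt = sleepStart + durationMs;
  for (const { downAt, upAt } of outages) {
    // Wake-up time falls inside an outage: the run is overdue and should
    // resume when the worker comes back, rather than never.
    if (wakeAt >= downAt && wakeAt < upAt) {
      return upAt;
    }
  }
  // A worker is up at the wake-up time, so the run resumes on schedule.
  return wakeAt;
}

// sleep(5s) starts at t=0; the worker stops at t=1s.
console.log(expectedResumeAt(0, 5_000, [{ downAt: 1_000, upAt: 4_000 }])); // 5000 (back before wake-up)
console.log(expectedResumeAt(0, 5_000, [{ downAt: 1_000, upAt: 6_000 }])); // 6000 (back after wake-up)
```

In this model the second case still resumes, just one second late. The reported behavior is that it never resumes.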
Environment
- World: postgres-world
- Worker: running with the environment variables described in the docs:
  - WORKFLOW_TARGET_WORLD=postgres-world
  - WORKFLOW_POSTGRES_URL=<same for app and worker>
  - WORKFLOW_POSTGRES_WORKER_CONCURRENCY set (default / reasonable value)
- The worker and app both point to the same PostgreSQL instance
Reproduction steps
Workflow Test Scenario
- Log "before sleep"
- sleep(5000) (5 seconds)
- Log "after sleep"
Timestamps in the scenarios below are in mm:ss.
Scenario A (works as expected)
[10:00] Start workflow; it calls sleep(5s)
[10:01] Crash/stop the server/worker
[10:04] Restart the server/worker (before 5s elapsed)
[10:05] Workflow resumes and logs "after sleep"
Scenario B (buggy behavior)
[10:00] Start workflow; it calls sleep(5s)
[10:01] Crash/stop the server/worker
[10:05+] Restart the server/worker (after 5s elapsed)
Result:
- Workflow does not resume
- Run remains stuck in the running state in the database
- No further steps execute
The only difference between scenarios A and B is whether the worker is alive at the wake-up time.
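A common way durable-timer systems handle the case in scenario B is a catch-up sweep when a worker starts: persisted sleeps whose wake-up time is already in the past are resumed immediately, instead of relying on a wake-up event that may have been missed while no worker was running. The sketch below illustrates that pattern with a hypothetical in-memory `TimerStore`. It is an assumption about what a fix could look like, not a description of how Postgres World or pg-boss actually schedules sleeps; `PendingSleep`, `TimerStore`, and `recoverOverdueSleeps` are illustrative names.

```typescript
// Hypothetical sketch of a startup "catch-up" sweep for durable timers.
// `PendingSleep`, `TimerStore`, and `recoverOverdueSleeps` are illustrative
// names only; they are not Workflow DevKit or pg-boss APIs.

interface PendingSleep {
  runId: string;
  wakeAt: number; // epoch ms at which the sleep should end
}

class TimerStore {
  private pending = new Map<string, PendingSleep>();

  /** Persist a sleep (a real implementation would write a database row). */
  add(sleep: PendingSleep): void {
    this.pending.set(sleep.runId, sleep);
  }

  /** Remove and return every sleep whose wake-up time is at or before `now`. */
  takeDue(now: number): PendingSleep[] {
    const due = Array.from(this.pending.values()).filter((s) => s.wakeAt <= now);
    for (const s of due) this.pending.delete(s.runId);
    return due;
  }

  get size(): number {
    return this.pending.size;
  }
}

/**
 * Called once when a worker starts: resumes every run whose sleep expired
 * while no worker was running, so scenario B behaves like scenario A.
 */
function recoverOverdueSleeps(
  store: TimerStore,
  now: number,
  resume: (runId: string) => void,
): string[] {
  const resumed: string[] = [];
  for (const s of store.takeDue(now)) {
    resume(s.runId);
    resumed.push(s.runId);
  }
  return resumed;
}

// Scenario B: sleep(5s) starts at t=0, the worker restarts at t=6s.
const store = new TimerStore();
store.add({ runId: "run-b", wakeAt: 5_000 });
store.add({ runId: "run-later", wakeAt: 60_000 }); // not yet due
const resumed = recoverOverdueSleeps(store, 6_000, (id) => console.log(`resuming ${id}`));
console.log(resumed.join(", ")); // run-b
```

Only overdue sleeps are resumed by the sweep; sleeps that are still in the future stay pending and would be handled by the normal scheduling path.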
Minimal workflow:
```ts
import { sleep } from "workflow";

export async function handleUserSignup() {
  "use workflow";

  await logStarted();
  await sleep(2000); // durable sleep: 2 seconds
  await log1();
  // Stop the worker after "Log 1" is printed; restarting it after this
  // second sleep has expired reproduces scenario B.
  await sleep(2000);
  await log2();

  return { status: "success" };
}

async function logStarted() {
  "use step";
  console.log("Started");
}

async function log1() {
  "use step";
  console.log("Log 1 - Stop here to test crash recovery");
}

async function log2() {
  "use step";
  console.log("Log 2 - Success!");
}
```