A native durable execution engine that allows developers to write normal code (using loops, conditionals, and standard concurrency) that becomes durable when wrapped in the engine. The engine ensures that workflows can be interrupted at any point and resume from the exact point of failure without re-executing previously completed steps.
- Step Primitive: Any function with side effects must be wrapped in a `step` call, which provides durability guarantees through checkpointing.
- Memoization: The engine checks a persistent SQLite database before executing a step. If the step has been executed before for that specific workflow instance, it returns the cached result (see the sketch after this list).
- Sequence Management: The engine tracks a logical sequence ID for each step execution, allowing it to handle loops and conditional logic correctly, even when steps share the same name.
- Concurrency Support: Steps can run in parallel using Java's `CompletableFuture`, with thread-safe database operations.
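Conceptually, the step primitive behaves like a memoized function call. The sketch below illustrates the idea only; `loadCachedResult` and `saveResult` are hypothetical placeholders for the engine's SQLite persistence, and `Callable` stands in for the engine's own `StepFunction` interface:

```java
// Minimal sketch of the memoize-then-execute idea behind a step call.
// loadCachedResult and saveResult are hypothetical placeholders for the
// engine's SQLite-backed persistence; this is not the engine's actual code.
import java.util.Optional;
import java.util.concurrent.Callable;

class MemoizationSketch {
    <T> T step(String stepKey, Callable<T> fn, Class<T> clazz) throws Exception {
        Optional<T> cached = loadCachedResult(stepKey, clazz); // check SQLite first
        if (cached.isPresent()) {
            return cached.get();                               // already done: skip re-execution
        }
        T result = fn.call();                                  // run the side-effecting work
        saveResult(stepKey, result);                           // checkpoint the result
        return result;
    }

    private <T> Optional<T> loadCachedResult(String key, Class<T> clazz) { return Optional.empty(); }
    private <T> void saveResult(String key, T result) { }
}
```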
- ✅ Durability: Workflows survive crashes and resume from the last completed step
- ✅ Type Safety: Generic-based API supports any return type
- ✅ Concurrency: Parallel step execution with thread-safe database operations
- ✅ Zombie Step Protection: Handles crashes between execution and commit
- ✅ No DSL Required: Write workflows in standard, idiomatic Java code
- ✅ Automatic Sequence IDs: Internal atomic tracking supports loops without manual ID management
- SQLite WAL Mode: Enabled Write-Ahead Logging to allow concurrent read and write operations without blocking.
- Busy Timeout: Configured a 5000 ms busy timeout to handle lock contention gracefully during high concurrency (see the configuration sketch after this list).
- Minimal Locking: Database locks are only held during atomic state transitions (checking a result, marking in-progress, or storing the final result).
- SQL Injection Protection: All database interactions use `PreparedStatement` with parameterized queries to ensure data security.
- Type Safety: Utilizes Java Generics and `TypeToken`s to ensure compile-time type checking and safe deserialization of complex objects.
- Atomic State Transitions: Uses a state machine (`pending` -> `in_progress` -> `completed`) to ensure that partially executed steps are never mistakenly treated as finished.
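As a rough illustration of how the WAL and busy-timeout settings can be applied over plain JDBC (the database path and class name are illustrative; the engine's `Database` class may configure this differently):

```java
// Sketch: opening a SQLite connection over JDBC with WAL mode and a
// 5000 ms busy timeout. Path and class name are illustrative assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

class SqliteConfigSketch {
    static Connection open(String dbPath) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:sqlite:" + dbPath);
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("PRAGMA journal_mode=WAL;");   // concurrent readers alongside a writer
            stmt.execute("PRAGMA busy_timeout=5000;");  // wait up to 5 s instead of failing on locks
        }
        return conn;
    }
}
```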
The engine handles the case where a process crashes after a side effect has started but before its result is saved. By checkpointing the in_progress state, the engine detects interrupted steps upon restart and triggers a safe re-execution (Replay Pattern).
- Java 17 or higher
- Maven 3.6 or higher
```bash
mvn clean install
mvn exec:java -Dexec.mainClass="Main" -Dexec.args="-workflow-id onboarding-001 -name 'Jane Smith' -email 'jane@example.com'"
```

Or compile and run directly:

```bash
mvn package
java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id onboarding-001 -name "Jane Smith" -email "jane@example.com"
```

To test durability, you can interrupt the workflow at any time using Ctrl+C. The workflow will stop, and when you run the same command again, it will resume from where it left off.
```bash
# Start workflow
java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id onboarding-001

# Press Ctrl+C to simulate crash

# Resume workflow (same command)
java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id onboarding-001
```

To start fresh, use the `-reset` flag:

```bash
java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id onboarding-001 -reset
```

The example workflow demonstrates:
- Step 1: Create Record (Sequential)
- Step 2 & 3: Provision Laptop & Access (Parallel)
- Step 4: Send Welcome Email (Sequential)
```java
// Step 1: Create Record
Employee employee = dc.step("create_record", () -> {
    Employee emp = new Employee();
    emp.setId("EMP-001");
    emp.setName("John");
    return emp;
}, Employee.class);

// Step 2 & 3: Parallel execution
Map<String, DurableContext.StepFunction<String>> parallelSteps = new HashMap<>();
parallelSteps.put("provision_laptop", () -> "LAPTOP-001");
parallelSteps.put("provision_access", () -> "ACCESS-001");
Map<String, String> results = Concurrency.runParallel(dc, parallelSteps, String.class);

// Step 4: Send Email
dc.step("send_welcome_email", () -> {
    // Send email
    return true;
}, Boolean.class);
```

The engine uses an atomic sequence counter to track step execution order. Each call to `step` increments the sequence number, creating a unique `step_key` by combining the step ID with the sequence number (e.g., `"create_record:1"`, `"create_record:2"`); a minimal sketch of this key derivation follows the list below.
This approach handles:
- Loops: Each iteration gets a unique sequence number
- Conditionals: Steps in different branches get different sequence numbers
- Parallel Steps: Each parallel step gets its own sequence number
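The derivation itself could look roughly like this; the class and field names below are assumptions, not the engine's exact code:

```java
// Hypothetical sketch: deriving unique step keys from an atomic per-workflow counter.
import java.util.concurrent.atomic.AtomicInteger;

class SequenceSketch {
    private final AtomicInteger sequence = new AtomicInteger(0);

    String nextStepKey(String stepId) {
        // First call with "process_item" yields "process_item:1", the next ":2", and so on.
        return stepId + ":" + sequence.incrementAndGet();
    }
}
```

In the loop below, each iteration therefore checkpoints under its own key: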
```java
for (int i = 0; i < 3; i++) {
    final int index = i;
    String result = dc.step("process_item", () -> {
        // This will create step keys: "process_item:1", "process_item:2", "process_item:3"
        return "item-" + index;
    }, String.class);
}
```

The engine ensures thread safety during parallel execution through multiple mechanisms:
- WAL Mode: SQLite is configured with Write-Ahead Logging (WAL) mode, which allows concurrent reads and writes
- Busy Timeout: A 5-second busy timeout is set to handle `SQLITE_BUSY` errors gracefully
- ReentrantLock Protection: Critical sections (database writes) are protected with a `ReentrantLock`
- Check Phase: Thread-safe read to check if step exists (no lock needed for reads in WAL mode)
- Mark In Progress: Atomic insert with lock protection
- Execute: Function execution (no database access)
- Store Result: Atomic update with lock protection
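A rough sketch of this four-phase flow, with a `ReentrantLock` guarding only the write phases; the helper methods are placeholders for the engine's actual SQLite operations, and `Callable` stands in for its `StepFunction` interface:

```java
// Sketch of the check / mark / execute / store flow. resultExists, loadResult,
// markInProgress and storeResult are hypothetical stand-ins for database calls.
import java.util.concurrent.Callable;
import java.util.concurrent.locks.ReentrantLock;

class LockedStepFlowSketch {
    private final ReentrantLock writeLock = new ReentrantLock();

    <T> T run(String stepKey, Callable<T> fn) throws Exception {
        if (resultExists(stepKey)) {        // 1. Check: plain read, safe under WAL without a lock
            return loadResult(stepKey);
        }
        writeLock.lock();                   // 2. Mark in progress: write performed under the lock
        try {
            markInProgress(stepKey);
        } finally {
            writeLock.unlock();
        }
        T result = fn.call();               // 3. Execute: user code, no database access
        writeLock.lock();                   // 4. Store result: write performed under the lock
        try {
            storeResult(stepKey, result);
        } finally {
            writeLock.unlock();
        }
        return result;
    }

    private boolean resultExists(String key) { return false; }
    private <T> T loadResult(String key) { return null; }
    private void markInProgress(String key) { }
    private <T> void storeResult(String key, T result) { }
}
```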
The engine protects against "zombie steps" (crashes between execution and commit) by:
- Marking steps as "in_progress" before execution
- On resume, detecting steps stuck in "in_progress" state
- Automatically marking them as failed and re-executing
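A sketch of what that recovery sweep might look like against the `steps` table shown below; the exact SQL and method are assumptions, not the engine's actual code:

```java
// Hypothetical recovery sweep: on resume, mark steps stuck in "in_progress"
// as "failed" so the engine re-executes (replays) them.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class ZombieRecoverySketch {
    static int recoverZombieSteps(Connection conn, String workflowId) throws SQLException {
        String sql = "UPDATE steps SET status = 'failed' "
                   + "WHERE workflow_id = ? AND status = 'in_progress'";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, workflowId);
            return ps.executeUpdate();  // number of interrupted steps that will be replayed
        }
    }
}
```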
```sql
CREATE TABLE steps (
    workflow_id TEXT NOT NULL,
    step_key TEXT NOT NULL,
    status TEXT NOT NULL,
    result BLOB,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (workflow_id, step_key)
);
```

- `workflow_id`: Unique identifier for the workflow instance
- `step_key`: Combination of step ID and sequence number (e.g., `"create_record:1"`)
- `status`: One of `"in_progress"`, `"completed"`, or `"failed"`
- `result`: JSON-serialized result of the step execution
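For example, the memoization lookup against this table might use a parameterized query roughly like the following; this is a sketch, not the engine's exact code, and the Gson deserialization is an assumption based on the listed dependency:

```java
// Sketch: reading a previously stored result for (workflow_id, step_key) and
// deserializing it from JSON. Class and method names here are illustrative.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import com.google.gson.Gson;

class StepLookupSketch {
    static <T> T findCompletedResult(Connection conn, String workflowId,
                                     String stepKey, Class<T> clazz) throws Exception {
        String sql = "SELECT result FROM steps "
                   + "WHERE workflow_id = ? AND step_key = ? AND status = 'completed'";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, workflowId);
            ps.setString(2, stepKey);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;                       // step has not completed yet
                }
                String json = rs.getString("result");  // JSON-serialized result
                return new Gson().fromJson(json, clazz);
            }
        }
    }
}
```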
step<T>(String id, StepFunction<T> fn, Class<T> clazz) throws Exception
Executes a function with durability guarantees. Returns the cached result if the step was previously executed.
Parameters:
- `id`: Step identifier (can be reused in loops)
- `fn`: Functional interface that returns type `T`
- `clazz`: Class object for type `T` (for JSON deserialization)
Returns:
- Result of type `T` if successful
- Throws `Exception` if execution fails
runParallel<T>(DurableContext dc, Map<String, StepFunction<T>> steps, Class<T> clazz) throws Exception
Executes multiple steps in parallel using Java's CompletableFuture.
Parameters:
- `dc`: Durable context for the workflow
- `steps`: Map of step IDs to functions
- `clazz`: Class object for type `T`
Returns:
- Map of step IDs to results
- Throws `Exception` if any step fails
Creates a new durable execution context.
Parameters:
- `workflowID`: Unique workflow identifier
- `db`: SQLite database connection
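Putting the pieces together, a workflow might be wired up roughly as follows; the `Database.connect` helper is a hypothetical placeholder, and the constructor call is an assumption based on the parameters described above:

```java
// Sketch: creating a context and running one sequential step plus two parallel steps.
// Database.connect(...) is a hypothetical helper; the real bootstrap code may differ.
import java.sql.Connection;
import java.util.HashMap;
import java.util.Map;

public class WorkflowSketch {
    public static void main(String[] args) throws Exception {
        Connection db = Database.connect("workflows.db");             // assumed helper
        DurableContext dc = new DurableContext("onboarding-001", db);

        // Sequential step: memoized after the first successful run.
        String recordId = dc.step("create_record", () -> "EMP-001", String.class);

        // Parallel steps via runParallel.
        Map<String, DurableContext.StepFunction<String>> steps = new HashMap<>();
        steps.put("provision_laptop", () -> "LAPTOP-001");
        steps.put("provision_access", () -> "ACCESS-001");
        Map<String, String> results = Concurrency.runParallel(dc, steps, String.class);

        System.out.println(recordId + " -> " + results);
    }
}
```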
- Start a workflow:

  ```bash
  java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id test-001
  ```

- Interrupt it during execution (Ctrl+C)
- Resume the same workflow:

  ```bash
  java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id test-001
  ```

- Observe that completed steps are skipped and the workflow resumes from the last incomplete step.
```
durable-execution-engine/
├── src/main/java/
│   ├── engine/
│   │   ├── DurableContext.java         # Core Step primitive and DurableContext
│   │   ├── Database.java               # Database initialization and schema
│   │   └── Concurrency.java            # Parallel execution support
│   ├── examples/
│   │   └── onboarding/
│   │       ├── Employee.java           # Employee data class
│   │       └── OnboardingWorkflow.java # Example employee onboarding workflow
│   └── Main.java                       # CLI tool
├── pom.xml                             # Maven project definition
└── README.md                           # This file
```
- `org.xerial:sqlite-jdbc:3.44.1.0`: SQLite JDBC driver for Java
- `com.google.code.gson:gson:2.10.1`: JSON serialization library
```bash
# Compile the project
mvn clean compile

# Package as JAR with dependencies
mvn clean package

# Run unit tests
mvn -Dtest=DurableEngineTest test

# Run stress test (100 concurrent workflows)
mvn -Dtest=StressTest test

# Run a single test case
mvn -Dtest=DurableEngineTest#testDurableLoop test
```
This project is created for educational purposes as part of an assignment.